gapa
(Rafał Gapski)
September 20, 2023, 10:17am
Currently I have an Azure blob remote with a crypt layer added over it. When I use copy or sync commands for large files and abort in the middle of an upload, re-running the same command seems to start from the beginning of the file. Would adding an additional chunker layer make resuming possible? Are there any disadvantages to a chunker remote? And how does it behave when large files are modified - does it remove unused chunks when a file becomes much smaller?
rclone today does not support resuming uploads - neither to a remote like Azure directly, nor with chunker added on top.
These issues are tracked, but so far there has not been much progress.
rclone:master
← mcalman:add-resume-interface
opened 02:58PM - 01 Sep 20 UTC
Added an interface for resuming failed uploads. Later on this interface can be implemented by any backend where resumes are possible.
#87
#### What is the purpose of this change?
Adds an optional interface for resuming uploads that can later be implemented by any backend where resuming uploads is possible. Once implemented by a backend, this will allow uploads to continue after a failure, even when restarting Rclone operations.
#### Was the change discussed in an issue or in the forum before?
https://github.com/rclone/rclone/issues/87
https://forum.rclone.org/t/drive-resume-resumable-upload-implementation/18289/4
https://forum.rclone.org/t/chunker-resume-upload-contribution/18571
#### Checklist
- [x] I have read the [contribution guidelines](https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md#submitting-a-pull-request).
- [x] I have added tests for all changes in this PR if appropriate.
- [x] I have added documentation for the changes if appropriate.
- [x] All commit messages are in [house style](https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md#commit-messages).
- [x] I'm done, this Pull Request is ready for review :-)
opened 10:41AM - 23 Mar 21 UTC
enhancement
Remote: Chunker
hashing
metadata
resume
### What is your current rclone version?
1.54.1
### What problem are you trying to solve?
This ticket requests the _Resume_ feature in the chunker backend.
from https://github.com/rclone/rclone/issues/87#issuecomment-671506822 by @mcalman:
> I'm interested in addressing the case when an upload is interrupted for a large file, and must be restarted. It would be nice if the user was able to resume uploading that file from where they left off [...]
> I have been looking into using the chunker backend to support an upload resume feature. I noticed that when an upload is done with chunker and is quit during a file upload, the chunks that have already been uploaded are left on the remote, but then ignored. I have been working on modifying rclone chunker to check for these existing chunks, and if present, use them rather than re-upload those chunks.
Note that here we ask only for _sequential resume_, which is irrelevant to the _multi-thread upload feature_ covered by requests #5041 (for chunker) and #4798 (general discussion).
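The sequential-resume idea quoted above can be sketched in a few lines of Go. This is purely illustrative (the names `chunkName`, `resumePlan` and the chunk-name format are made up for this sketch, not rclone APIs): list the chunks that survived the aborted transfer, and restart the upload at the first one that is missing.

```go
package main

import "fmt"

const chunkSize = 4 << 20 // illustrative chunk size

// chunkName builds a hypothetical per-chunk object name; real chunker
// names also carry a transaction ID.
func chunkName(base string, n int) string {
	return fmt.Sprintf("%s.rclone_chunk.%03d", base, n+1)
}

// resumePlan returns the index of the first chunk that must still be
// uploaded, given the set of chunk names found on the remote. Sequential
// resume only reuses an unbroken prefix of chunks.
func resumePlan(base string, totalChunks int, remoteHas map[string]bool) int {
	for i := 0; i < totalChunks; i++ {
		if !remoteHas[chunkName(base, i)] {
			return i // first missing chunk: resume here
		}
	}
	return totalChunks // everything already uploaded
}

func main() {
	remote := map[string]bool{
		"big.bin.rclone_chunk.001": true,
		"big.bin.rclone_chunk.002": true,
		// chunk 003 was lost when the transfer aborted
	}
	fmt.Println("resume from chunk", resumePlan("big.bin", 4, remote))
	// prints: resume from chunk 2 (zero-based)
}
```

Before reusing those chunks, a real implementation would still have to verify they match the current source file, which is what the hash discussion below is about.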
### How do you think rclone should be changed to solve that?
from the 1st https://github.com/rclone/rclone/pull/4547#discussion_r583522691:
> We can't just chain to the lower backend in general case. If a file is chunked, its remote will chain to a small metadata (or nothing if metadata is disabled). If it's not chunked, it can become chunked after resume, but we can't predict it [in a general case].
from the 2nd https://github.com/rclone/rclone/pull/4547#discussion_r583543773:
> Chunker can tolerate objects uploaded from multiple clients thanks to transactions [and save partially uploaded chunks per transaction].
> Later, upon a resume request it can select the "best" incomplete transaction given the rolling hash state and size of already uploaded chunks.
from the 3rd https://github.com/rclone/rclone/pull/4547#discussion_r583534761:
> Golang's [Hash](https://golang.org/pkg/hash/#example__binaryMarshaler) interface allows to [save](https://golang.org/pkg/encoding/#BinaryMarshaler)/restore intermediate hash state for any (TBC) type of hash.
> [The common Resume handler will] [keep](https://golang.org/pkg/encoding/base64/#example) it in the resume metadata json together with hash name,
> [and will] negotiate with [chunker] whether operation should be continued from the last point or [retry] from the start
from the 4th https://github.com/rclone/rclone/pull/4547#issuecomment-786889739:
> The use of intermediate (aka rolling or accrued) hashsums will prevent the following scenario:
> * user uploads a large file
> * network broken, upload canceled
> * source file is changed or another attempt is changing the partial upload on target
> * user asks to resume a file
> * rclone resumes (here we could have checked validity of partial upload and rewind from start)
> * after some hours rclone finds that fingerprint is wrong
from the 5th https://github.com/rclone/rclone/pull/4547#issuecomment-786593956:
> [Let's] add a new per-transaction control chunk to save info about partial hash and [probably] hashes of uploaded chunks.
>
> [Let's also] add a code that selects transaction to resume given a partial hash and the total uploaded size so far. Maybe select the "best" partial transaction (when rename is fast) or just pick a single partial transaction ID (when it's slow).
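The "select the best partial transaction" step can be sketched as a simple maximization under two constraints: the candidate must not be larger than the current source, and it must pass the hash check. Everything here (`partialTxn`, `bestTransaction`, the `validate` callback standing in for the rolling-hash comparison) is hypothetical, not rclone code:

```go
package main

import "fmt"

// partialTxn is the leftover of one aborted upload, identified by the
// chunker transaction ID recorded in the per-transaction control chunk.
type partialTxn struct {
	ID       string
	Uploaded int64 // bytes confirmed on the remote
}

// bestTransaction picks the resumable candidate with the most uploaded
// bytes; validate stands in for the rolling-hash fingerprint check.
func bestTransaction(txns []partialTxn, srcSize int64, validate func(partialTxn) bool) (partialTxn, bool) {
	var best partialTxn
	found := false
	for _, t := range txns {
		if t.Uploaded <= srcSize && validate(t) && (!found || t.Uploaded > best.Uploaded) {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	txns := []partialTxn{
		{"txn-a", 64 << 20},
		{"txn-b", 256 << 20},
		{"txn-c", 512 << 20}, // stale: fails the hash check
	}
	ok := func(t partialTxn) bool { return t.ID != "txn-c" }
	best, found := bestTransaction(txns, 1<<30, ok)
	fmt.Println(found, best.ID) // true txn-b
}
```

As the quote notes, when renames are slow a backend might skip the maximization and just keep a single partial transaction ID instead.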
The implementation will obey the _Resumer_ interface developed by PR #4547.
> In case of chunker the _resumer cache_ usage can be somewhat decreased because already uploaded chunks are isolated remotely and marked by a "transaction ID". The _resumer proper_ will just re-check them based on negotiations with chunker.
**NOTE** This change will create a new version of the chunker metadata and grow the number of tested combinations. I think we can commit this together with other chunker PRs on a _dedicated branch_ which will produce a beta release for public beta-testing. Later we can merge these commits together from there on the master branch using a single metadata version number.
### References
- Related to feature request #87 (Resume uploads)
- Depends on pull request #4547 (add Resumer interface)
- _Orthogonal_ to feature request #5041 (multi-thread uploads in chunker)
- _Orthogonal_ to discussion #4798 (multi-thread uploads for different backends)
- Related to thread https://forum.rclone.org/t/intelligent-faster-chunker-file-updates-on-checksum-enabled-remotes/22313/7
Given how old these issues are, it won't happen quickly - unless you have the skills and time to help.
Have a look at the docs.
gapa
(Rafał Gapski)
September 20, 2023, 10:30am
Thank you for the reply, I've subscribed to the attached pull request.
system
(system)
Closed
November 19, 2023, 10:31am
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.