gapa
(Rafał Gapski)
September 20, 2023, 10:17am
1
Currently I have an Azure blob remote with a crypt layer added over it. When I use the copy or sync commands on large files and abort in the middle of an upload, re-running the same command seems to start from the beginning of the file. Would adding an additional chunker layer make resuming possible? Are there any disadvantages to a chunker remote? And how does it behave when a large file is modified: does it remove unused chunks if the file becomes much smaller?
rclone today does not support resuming uploads, neither to a remote like Azure directly nor with an added chunker.
These issues are tracked, but so far there has not been much progress.
rclone:master ← mcalman:add-resume-interface (opened 02:58PM - 01 Sep 20 UTC)
Added an interface for resuming failed uploads. Later on this interface can be implemented by any backend where resumes are possible.
#87
#### What is the purpose of this change?
Adds an optional interface for resuming uploads that can later be implemented by any backend where resuming uploads is possible. Once implemented by a backend, this will allow uploads to continue after a failure, even when restarting Rclone operations.
#### Was the change discussed in an issue or in the forum before?
https://github.com/rclone/rclone/issues/87
https://forum.rclone.org/t/drive-resume-resumable-upload-implementation/18289/4
https://forum.rclone.org/t/chunker-resume-upload-contribution/18571
#### Checklist
- [x] I have read the [contribution guidelines](https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md#submitting-a-pull-request).
- [x] I have added tests for all changes in this PR if appropriate.
- [x] I have added documentation for the changes if appropriate.
- [x] All commit messages are in [house style](https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md#commit-messages).
- [x] I'm done, this Pull Request is ready for review :-)
opened 10:41AM - 23 Mar 21 UTC
Labels: enhancement, Remote: Chunker, hashing, metadata, resume
### What is your current rclone version?
1.54.1
### What problem are you trying to solve?
This ticket requests the _Resume_ feature in the chunker backend.
from https://github.com/rclone/rclone/issues/87#issuecomment-671506822 by @mcalman:
> I'm interested in addressing the case when an upload is interrupted for a large file, and must be restarted. It would be nice if the user was able to resume uploading that file from where they left off [...]
> I have been looking into using the chunker backend to support an upload resume feature. I noticed that when an upload is done with chunker and is quit during a file upload, the chunks that have already been uploaded are left on the remote, but then ignored. I have been working on modifying rclone chunker to check for these existing chunks, and if present, use them rather than re-upload those chunks.
Note that here we ask only for _sequential resume_, which is unrelated to the _multi-thread upload feature_ covered by requests #5041 (for chunker) and #4798 (general discussion).
### How do you think rclone should be changed to solve that?
from the 1st https://github.com/rclone/rclone/pull/4547#discussion_r583522691:
> We can't just chain to the lower backend in general case. If a file is chunked, its remote will chain to a small metadata (or nothing if metadata is disabled). If it's not chunked, it can become chunked after resume, but we can't predict it [in a general case].
from the 2nd https://github.com/rclone/rclone/pull/4547#discussion_r583543773:
> Chunker can tolerate objects uploaded from multiple clients thanks to transactions [and save partially uploaded chunks per transaction].
> Later, upon a resume request it can select the "best" incomplete transaction given the rolling hash state and size of already uploaded chunks.
from the 3rd https://github.com/rclone/rclone/pull/4547#discussion_r583534761:
> Golang's [Hash](https://golang.org/pkg/hash/#example__binaryMarshaler) interface allows to [save](https://golang.org/pkg/encoding/#BinaryMarshaler)/restore intermediate hash state for any (TBC) type of hash.
> [The common Resume handler will] [keep](https://golang.org/pkg/encoding/base64/#example) it in the resume metadata json together with hash name,
> [and will] negotiate with [chunker] whether operation should be continued from the last point or [retry] from the start
from the 4th https://github.com/rclone/rclone/pull/4547#issuecomment-786889739:
> The use of intermediate (aka rolling or accrued) hashsums will prevent the following scenario:
> * user uploads a large file
> * network broken, upload canceled
> * source file is changed or another attempt is changing the partial upload on target
> * user asks to resume a file
> * rclone resumes (here we could have checked validity of partial upload and rewind from start)
> * after some hours rclone finds that fingerprint is wrong
from the 5th https://github.com/rclone/rclone/pull/4547#issuecomment-786593956:
> [Let's] add a new per-transaction control chunk to save info about partial hash and [probably] hashes of uploaded chunks.
>
> [Let's also] add a code that selects transaction to resume given a partial hash and the total uploaded size so far. Maybe select the "best" partial transaction (when rename is fast) or just pick a single partial transaction ID (when it's slow).
The implementation will obey the _Resumer_ interface developed by PR #4547.
> In case of chunker the _resumer cache_ usage can be somewhat decreased because already uploaded chunks are isolated remotely and marked by a "transaction ID". The _resumer proper_ will just re-check them based on negotiations with chunker.
**NOTE** This change will create a new version of the chunker metadata and grow the number of tested combinations. I think we can commit this together with other chunker PRs on a _dedicated branch_ which will produce a beta release for public beta-testing. Later we can merge these commits together from there on the master branch using a single metadata version number.
### References
- Related to feature request #87 (Resume uploads)
- Depends on pull request #4547 (add Resumer interface)
- _Orthogonal_ to feature request #5041 (multi-thread uploads in chunker)
- _Orthogonal_ to discussion #4798 (multi-thread uploads for different backends)
- Related to thread https://forum.rclone.org/t/intelligent-faster-chunker-file-updates-on-checksum-enabled-remotes/22313/7
Given how old these issues are, it won't happen quickly, unless you have the skills and time to help.
Have a look at the docs.
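If you want to try a chunker layer anyway (it still won't give you resume), it is configured as a thin wrapper over your existing crypt remote. A minimal sketch; the remote names and chunk size below are placeholders, not taken from this thread:

```ini
[mychunker]
type = chunker
remote = mycrypt:
chunk_size = 100M
hash_type = md5
```

On the "shrinking a file" question: as I understand it, chunker re-uploads the whole file on modification and removes the previous chunk set on completion rather than patching chunks in place, though aborted transfers can leave orphaned partial chunks behind, as the issue above describes.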
gapa
(Rafał Gapski)
September 20, 2023, 10:30am
3
Thank you for the reply, I've subscribed to the attached pull request.
system
(system)
Closed
November 19, 2023, 10:31am
4
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.