Rclone fails to switch to multi-part uploads when a file is too large

What is the problem you are having with rclone?

When attempting to copy a 54.779 GiB local file to the remote with --s3-upload-concurrency higher than 2, rclone attempts a single PutBucket call. When that fails (because the file is too large), it seems to just stop instead of attempting a multipart upload.

rclone does not provide any visual indication that it is failing, other than the fact that progress does not advance.

It appears that it does eventually succeed, but it takes about 5 minutes to start. I have no idea whether this still qualifies as a bug or is just a minor issue...

This adventure goes deeper and deeper. After letting it run for a while, it starts about 50 multipart chunk uploads, running at around 150 MiB/s, but then stops starting new chunks. The transfer then slows down until it hits 0 B/s. Around 5 minutes later, it fires off another 10 chunks, speeds up, then slows down again.

This is on a connection that normally runs at around 400 MB/s, and while I don't expect full utilization, this still seems extremely odd...

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1
- os/version: ubuntu 22.10 (64 bit)
- os/kernel: 5.19.0-31-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Cloudflare R2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap --s3-upload-cutoff=100M --s3-chunk-size=100M --s3-upload-concurrency=10 -vv -P

The rclone config contents with secrets removed.

[Drive]
type = drive
scope = drive.readonly
token = {"access_token":"SOME_GOOGLE_ACCESS_TOKEN","token_type":"Bearer","refresh_token":"SOME_GOOGLE_REFRESH_TOKEN","expiry":"2022-06-20T18:06:37.677412+02:00"}
team_drive = 

[r2]
type = s3
provider = Cloudflare
access_key_id = ACCESS_KEY_ID
secret_access_key = SECRET_ACCESS_KEY
region = auto
endpoint = R2_ENDPOINT

A log from the command with the -vv flag

2023/02/20 18:15:29 DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "copy" "/media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles" "r2:/protomap" "--s3-upload-cutoff=100M" "--s3-chunk-size=100M" "--s3-upload-concurrency=10" "-vv" "-P"]
2023/02/20 18:15:29 DEBUG : Creating backend with remote "/media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles"
2023/02/20 18:15:29 DEBUG : Using config file from "/home/alastair/.config/rclone/rclone.conf"
2023/02/20 18:15:29 DEBUG : fs cache: adding new entry for parent of "/media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles", "/media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507"
2023/02/20 18:15:29 DEBUG : Creating backend with remote "r2:/protomap"
2023/02/20 18:15:29 DEBUG : r2: detected overridden config - adding "{ZdEwv}" suffix to name
2023/02/20 18:15:29 DEBUG : fs cache: renaming cache item "r2:/protomap" to be canonical "r2{ZdEwv}:protomap"
2023-02-20 18:15:30 DEBUG : planet.pmtiles: Need to transfer - File not found at Destination
Transferred:              0 B / 54.779 GiB, 0%, 0 B/s, ETA -
Transferred:            0 / 1, 0%
Elapsed time:        29.5s
Transferring:
 *                                planet.pmtiles:  0% /54.779Gi, 0/s, -

Rclone is most likely calculating the checksum for the upload here. You can disable this with

  --s3-disable-checksum   Don't store MD5 checksum with object metadata

The consequence is that large files will no longer have checksums, so rclone check won't work on them, etc.
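
For example, tacked onto the original command (the only change is the added flag):

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap --s3-upload-cutoff=100M --s3-chunk-size=100M --s3-upload-concurrency=10 --s3-disable-checksum -vv -P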

I think rclone's bandwidth calculations are probably being confused by the multipart upload. This isn't normally a problem, but you've set the chunk size quite large (100M). The bandwidth calculation will be correct in the end when the file has been uploaded. Do you have an alternative way of looking at the bandwidth used?
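
Roughly, and assuming each 100M chunk is buffered whole before it counts towards the stats:

  --s3-chunk-size=100M × --s3-upload-concurrency=10 ≈ 1,000 MiB in flight per file

so the progress figure can jump in ~100 MiB steps as chunks complete rather than tracking the wire rate smoothly.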

I note also that Cloudflare R2 has in the past been unhappy with large s3 upload concurrency. That may be fixed now, I don't know.

What I suggest you do is add --s3-disable-checksum so as not to confuse the timings, then try different values of chunk size and concurrency and see if you can find the optimum for Cloudflare R2.
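
A rough way to run that sweep (a sketch only: /path/to/testfile, the bucket path, and the value grid are placeholders, and using a smaller test file than the 54 GiB one will make the runs bearable):

for cs in 16M 32M 64M 128M; do
  for conc in 2 4 8; do
    echo "chunk=$cs concurrency=$conc"
    # --ignore-times forces the upload even if the file already exists at the destination
    time rclone copy /path/to/testfile r2:/protomap \
      --s3-disable-checksum --s3-chunk-size=$cs \
      --s3-upload-concurrency=$conc --ignore-times -P
  done
done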

I just re-ran the copy with --s3-disable-checksum and --s3-chunk-size=50M. It starts running at almost 200 MB/s, but then falls back down to zero once it hits 50 chunks, which almost makes me think that rclone only counts bandwidth usage when starting a chunk, not while the chunk is actually uploading.

The thing that makes it even more odd is that the Ubuntu System Monitor shows around 1 MB/s of transfer once the initial rush completes, which almost makes me think something within rclone is waiting for a response to a chunk upload, or something along those lines.

Full command from this point:

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/enamtest --s3-upload-cutoff=100M --s3-chunk-size=50M --s3-upload-concurrency=50 -vv -P --s3-disable-checksum

I think that may be the case, since the 100M chunks are buffered in the S3 SDK, which rclone has no control over.

Rclone has to wait for the response to each chunk upload before it starts uploading the next one.

I think you may be overloading R2 or hitting some rate limit there - try reducing --s3-upload-concurrency=50 to 4 (which is the default) and see what you get, then work up from there.
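
That is, taking your last command and changing only the concurrency:

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/enamtest --s3-upload-cutoff=100M --s3-chunk-size=50M --s3-upload-concurrency=4 -vv -P --s3-disable-checksum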

Looks like it is still the same issue. Weirdly enough, now it is even slow to start when I completely remove the extra flags, running only

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap -vv -P

I used to be able to run at a few MB/s without any configuration, but now that doesn't work either...

If you don't want the slow start, you need --s3-disable-checksum.

Hmm, things to try

Try setting --s3-upload-cutoff 1T (very large) so that files are only uploaded with single-part uploads - does that help?

Also try --s3-upload-concurrency 2
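
Spelled out against your earlier command, those two experiments would look something like this (illustrative only; whether R2 accepts a single-part upload that big is a separate question):

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap --s3-disable-checksum --s3-upload-cutoff=1T -vv -P

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap --s3-disable-checksum --s3-upload-concurrency=2 -vv -P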

I suspect this is caused by some new rate limit at Cloudflare - remember R2 is a relatively new service and we've seen quite a few changes in how it works over its brief life.

Hey, sorry for the really slow reply, I was away from my desk for the (admittedly long) weekend.

Running

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap --s3-disable-checksum --s3-upload-cutoff=5G --s3-upload-concurrency=2 -vv -P

I reduced the cutoff to 5G because that seems to be the maximum for R2. It progresses through chunks at an OK pace (around 5 seconds per chunk), but considering the chunks are small, it is still pretty slow.
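
Rough arithmetic on why that feels slow, assuming roughly the default 5 MiB chunk size since --s3-chunk-size wasn't set (rclone may bump it a little to stay under the 10,000-part limit):

  54.779 GiB ≈ 56,094 MiB → roughly 10,000-11,000 chunks
  at ~5 s per chunk with 2 in flight ≈ 25,000-28,000 s, i.e. 7-8 hours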

That is an improvement!

I'd try increasing --s3-chunk-size next to see if you can improve the performance. Also try higher values of --s3-upload-concurrency until the uploads start to slow down.
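
For example, something along these lines, stepping --s3-upload-concurrency up (4, 8, ...) between runs - the specific values are just a starting point, not anything tested against R2:

rclone copy /media/alastair/7a70a803-7d9e-4711-a945-5d1791ec1507/planet.pmtiles r2:/protomap --s3-disable-checksum --s3-chunk-size=64M --s3-upload-concurrency=4 -vv -P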

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.