Box remote + chunker, slow single-file upload performance. How to upload chunks in parallel?

What is the problem you are having with rclone?

I'm running rclone copy whenever new files are downloaded on my seedbox. It's driven by watchexec: it runs once when the first new episode is downloaded, and then again once that transfer finishes (which usually picks up the rest of the season).

The problem is that the first upload, with a single file, runs incredibly slowly (like 2-3MB/s). If it's trying to upload two files, I get roughly 6-10MB/s. And it appears somewhat linear up to ~30MB/s for lots of files (which feels like the Box limit).
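
For context, the watcher is along these lines (a sketch, not my exact script; watchexec's `--debounce` syntax varies between versions, and the 30s value here is illustrative):

```shell
# Sketch of the watcher setup. Debouncing batches rapid file events so
# only one rclone instance runs at a time. Check `watchexec --help` for
# your version's exact debounce flag syntax.
watchexec \
  --watch ~/Media \
  --debounce 30s \
  -- rclone copy ~/Media box-crypt-chunked:/Media \
       --ignore-existing --checksum --transfers=100
```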

Run the command 'rclone version' and share the full output of the command.

bigmoney@10:~/src/infinity$ rclone version
rclone v1.63.1

  • os/version: debian 10.13 (64 bit)
  • os/kernel: 4.19.0-22-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.20.6
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Box Business (5GB file size limit)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy ~/Media box-crypt-chunked:/Media --ignore-existing --progress --checksum --transfers=100

The rclone config contents with secrets removed.

[box]
type = box
token = ...

[box-crypt]
type = crypt
remote = box:/rclone/mount-a
password = ...
password2 = ...
filename_encoding = base32768

[box-crypt-chunked]
type = chunker
remote = box-crypt:/
chunk_size = 5000M
name_format = *.rcc###
hash_type = sha1all

A log from the command with the -vv flag

Didn't capture it with -vv yet, but here's the normal output showing the issue:

With one file:

Transferred:      768.859 MiB / 9.853 GiB, 8%, 2.361 MiB/s, ETA 1h5m47s
Checks:                56 / 56, 100%
Transferred:            0 / 1, 0%
Elapsed time:      3m54.3s
 * TV/Foundation (2021)/S…asStudio-Scrambled.mkv:  7% /9.853Gi, 2.361Mi/s, 

With two files:

Transferred:       12.572 GiB / 22.084 GiB, 57%, 8.113 MiB/s, ETA 20m
Checks:                56 / 56, 100%
Transferred:            0 / 2, 0%
Elapsed time:     25m10.9s
 * TV/Foundation (2021)/S….DV.HEVC-CasStudio.mkv: 67% /12.231Gi, 5.149Mi/s, 13m10s
 * TV/Foundation (2021)/S…asStudio-Scrambled.mkv: 43% /9.853Gi, 2.964Mi/s, 31m53s

I'll push a new version of my watch script that adds better logging tomorrow, unless the issue is immediately obvious to someone?

welcome to the forum,

based on your command and flags, i assume that you have read this

box is well known to be very slow as discussed in the forum

This is Box behaviour observed by other users. You should increase the --transfers flag value to maximise the benefit of parallelism. Try something like 100, for example. You have to experiment until you find the sweet spot.

Yep, that's what led me to post, since this comment:

whoever implemented box remote decided to use --transfers to control both at the same time

led me to think maybe I was just not using enough --transfers. But I set it to 100 and am not seeing any improvement for a single file.

not sure what to tell you except based on forum and docs, that is how box will perform.

Already have, I'm afraid. I'm seeing perfectly fine performance (30MB/s+) if I have 4 or more files to upload at once, but there doesn't seem to be any way to speed up a single file. Could it be that --transfers actually isn't attempting to parallelise multipart uploads?

I'm also curious to run a test with setting the chunk size to be smaller (~1GB), or turning off checksumming.
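
If it helps anyone reproduce, those experiments should be runnable without editing the config, since rclone exposes backend options as command-line flags (flag names below are assumed to mirror the chunker config option names; double-check with `rclone help flags`):

```shell
# Experiment 1: smaller chunks than the configured 5000M.
rclone copy ~/Media box-crypt-chunked:/Media \
  --ignore-existing --progress --checksum --transfers=100 \
  --chunker-chunk-size 1G

# Experiment 2: skip chunker checksumming for comparison
# (--checksum dropped since there'd be no hash to verify against).
rclone copy ~/Media box-crypt-chunked:/Media \
  --ignore-existing --progress --transfers=100 \
  --chunker-hash-type none
```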

why are you running rclone twice like that, to transfer just one file and to then transfer rest of season?
just run rclone one time, once the complete season has downloaded

no need to guess, as per the link i shared and you had already read, use a rclone debug log.

It is an unfortunate implementation IMO that --transfers controls both the number of chunks for multipart uploads and the number of files.

So when you set it to 100 and copy one large file, it will be uploaded in 100 chunks.

But if you have more than 100 files, they will each be uploaded as one chunk.


For files above 50 MiB rclone will use a chunked transfer. Rclone will upload up to --transfers chunks at the same time (shared among all the multipart uploads). Chunks are buffered in memory and are normally 8 MiB so increasing --transfers will increase memory use.
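
Taking that doc passage at face value, the memory cost of a large --transfers value is simple arithmetic (8 MiB is the documented default part size; the debug logs later in this thread show 32-64 MiB parts, so scale accordingly):

```shell
# Worst-case chunk buffer memory: --transfers concurrent parts, 8 MiB each.
transfers=100
part_mib=8
echo "$((transfers * part_mib)) MiB buffered"
```

So 100 transfers at the documented 8 MiB comes to ~800 MiB; at the 32-64 MiB part sizes seen in the logs it would be more like 3-6 GiB.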

If I was a heavy Box user I would look at changing it. Until somebody does it nothing will change.

And indeed the best way to look at this is for you to post a full DEBUG log. Then we can see what is going on. Nothing is perfect; maybe there is some bug, etc.

It's being triggered by a filesystem watcher, but debounced so that only one instance of rclone is running at a time. As soon as one file appears, it kicks off the first transfer, which takes ages.

I could also set it up so that each new file triggers its own rclone copyto, so I'd have multiple instances running simultaneously, but I wasn't sure if they'd conflict with each other. I also considered using rclone rcd and submitting each transfer as a separate _async job, but thought I'd try this first.
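
For the record, the rcd variant I had in mind looks roughly like this (a sketch; the paths and filename are illustrative, and auth is disabled for brevity):

```shell
# Start the rclone daemon once (localhost only, no auth -- add
# --rc-user/--rc-pass if anything else can reach the port).
rclone rcd --rc-addr localhost:5572 --rc-no-auth &

# Then have the watcher submit each new file as its own background job.
# _async=true returns a job ID immediately instead of blocking.
rclone rc operations/copyfile \
  srcFs=/home/bigmoney/Media srcRemote="TV/example.mkv" \
  dstFs=box-crypt-chunked:/Media dstRemote="TV/example.mkv" \
  _async=true
```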

no need to guess, as per the link i shared and you had already read, use a rclone debug log.

Yeah, I will do. As I mentioned, I couldn't easily restart this process with the extra debugging, so I was mainly curious whether something about my setup was obviously wrong. But I'll dig in and generate full logs tomorrow.

Whoa, really? If that's true, that absolutely sounds like it might be the culprit. Particularly with the chunker with checksumming: even <5GB files end up as two files to upload (one data, one metadata), as far as I understand. So it could be that even a "single file" ends up being sent as two files, both in one chunk.

Thanks for that though, that gives me a few experiments to run to see if I can track this down.

If I was a heavy Box user I would look at changing it. Until somebody does it nothing will change.

I haven't looked at the code at all yet (so far everything I've wanted to do with rclone has "just worked"), but if it really is just a matter of adding a --box-transfers-per-file flag and using that in place of --transfers, I should be able to figure that out.

I think people would be grateful if it were done. No idea why the original implementation was done in such an (IMO) weird way.

Well, by way of update, I've turned on debug logging and... I can't reproduce the slow performance. Perhaps I was being throttled as I had been experimenting with the API for a few hours?

Anyway here's a log of uploading a 25GB file across 5 chunks:

A couple of queries:

  • This says I finished uploading ~47GB. Is encryption really causing the file to be ~88% larger? Or is it miscounting something?
  • It's clearly uploading the chunks one at a time. Maybe the Box remote is fine as-is, but the chunker should upload in parallel?
  • It appears that 1-3 64Mi parts start being uploaded each second, but with the total speed being so fast, I can't tell whether they're happening in parallel or serially. If it logged an "Uploading part 3/79 complete" message, you'd be able to see whether they interleaved with the next upload. In any case, it's certainly not starting 100 simultaneous transfers.
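
One way to get at the serial-vs-parallel question without new log messages is to count how many "Uploading part" lines share the same timestamp; more than one distinct file per second means parts are at least being *started* concurrently. A rough sketch, with sample lines hardcoded for illustration (pipe a real debug log in instead):

```shell
# Count "Uploading part" debug lines per second of log time.
awk '/Uploading part/ { count[$1 " " $2]++ }
     END { for (s in count) print s, count[s] }' <<'EOF'
2023/08/28 02:32:44 DEBUG : fileA: Uploading part 40/83 offset 1.219Gi/2.572Gi part size 32Mi
2023/08/28 02:32:44 DEBUG : fileB: Uploading part 9/72 offset 256Mi/2.220Gi part size 32Mi
2023/08/28 02:32:45 DEBUG : fileC: Uploading part 40/75 offset 1.219Gi/2.325Gi part size 32Mi
EOF
```

On this sample it reports two part-starts at 02:32:44 and one at 02:32:45.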

Comparing that to a job where I had 3 files to upload (all <5GB, so single-chunk), you can see that the 3 files are started simultaneously:

  • Total bandwidth still seems double-counted
  • Each file appears to upload its parts sequentially, as they run at very different speeds:

After 1 minute of uploading, two files have uploaded 1.2GB, one has uploaded 256MB:

2023/08/28 02:32:44 DEBUG : <encrypted S01E03>: Uploading part 40/83 offset 1.219Gi/2.572Gi part size 32Mi
2023/08/28 02:32:44 DEBUG : <encrypted S01E05>: Uploading part 9/72 offset 256Mi/2.220Gi part size 32Mi
2023/08/28 02:32:45 DEBUG : <encrypted S01E04>: Uploading part 40/75 offset 1.219Gi/2.325Gi part size 32Mi

But that "slow file" appears to speed up as soon as the others finish, which is weird too:

  • First 2 mins it uploads 576Mi (other files finish here)
  • Uploads the remaining 1.65Gi in 26s
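
Working those two figures out as throughput makes the contrast concrete (numbers taken from the bullets above; 1.65 GiB converted to MiB):

```shell
# Per-phase throughput of the "slow" file.
awk 'BEGIN {
  printf "while other files upload: %.1f MiB/s\n", 576 / 120
  printf "after they finish:        %.1f MiB/s\n", 1.65 * 1024 / 26
}'
```

That's roughly 4.8 MiB/s versus 65 MiB/s, a ~13x jump the moment the other transfers stop competing.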

Which makes me think Box is just throttling connections, so what can you do? ¯\_(ツ)_/¯

Double counting is fixed in the latest beta and will be released in rclone v1.64.

Transfers do not start at the same time but should run in parallel.

What command and arguments did you use to upload these test files? I see you're using sha1all as the hash_type, is that still true?

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.