TLDR: In some situations chunker appears to stay active on small files, which wastes API calls and severely limits throughput on more restricted backends (e.g. Box). Chunker should automatically skip files smaller than the chunk size and fall back to the regular backend in those cases.
I am backing up some of our server's data to Box - around 58TB, with large files (>15GB) comprising ~95% of the data and the remaining 5% made up of millions of small files. Because of the large files, I used chunker. The large files were a higher priority, so I pushed them first. I was able to finish those ~54TB in a matter of days. However, once it got to the millions of smaller files, my transfer rates plummeted. I was getting a throughput of 40 files per minute, which steadily dropped to 6 files per minute over more than a month. Files would just sit in the queue at either 0% or 100% for minutes before going through. I've spent weeks tuning parameters to reduce API calls, but nothing really seemed to work.
Looking at the log, I noticed that the transactions for each file looked like this:
2023/07/... INFO : my_file.rclone_chunk.001_znavdm: Moved (server-side) to: my_file
2023/07/... INFO : my_file: Copied (new)
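A rough way to check how many transfers follow this pattern is to count the temp-chunk renames against the completed copies in the log. This is only a sketch: the helper names `count_chunk_renames` and `count_copied` are made up for illustration, and it assumes the temporary names always contain `.rclone_chunk.` as in the lines above.

```shell
# Count chunker temp-file renames and completed copies in an rclone log.
# Assumption: temporary chunk names contain ".rclone_chunk." as in the log above.

count_chunk_renames() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches (grep still prints 0).
    grep -c '\.rclone_chunk\..*: Moved (server-side) to: ' "$1" || true
}

count_copied() {
    grep -c ': Copied (new)' "$1" || true
}
```

For example, `count_chunk_renames logfile.log` and `count_copied logfile.log`. If the two numbers are close, nearly every file went through an upload to a temporary chunk name followed by a server-side rename.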
It seems chunker was making chunks even though these files are a few MB. I decided to restart the transfers keeping everything the same except for using box as the backend, instead of box-chunker. Just by doing that, my transfer rate jumped from 6 files/min to >200/min, so it's very likely that chunker was the culprit.
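As a possible workaround until chunker handles this itself, the file list can be split by size so that only large files go through box-chunker. The sketch below is untested against Box. `split_by_size` is a hypothetical helper, the output names `small.txt` and `large.txt` are placeholders, and it assumes the list holds one relative path per line with no comment lines. It also assumes, as far as I understand chunker, that files written directly to box below the chunk size are still read as ordinary files through box-chunker.

```shell
# Split a --files-from list into "small" and "large" lists by file size.
# Usage: split_by_size LIST SRC_DIR LIMIT_BYTES SMALL_OUT LARGE_OUT
# Files strictly smaller than LIMIT_BYTES go to SMALL_OUT, the rest to LARGE_OUT.
split_by_size() {
    sbs_list=$1; sbs_src=$2; sbs_limit=$3; sbs_small=$4; sbs_large=$5
    : > "$sbs_small"   # truncate/create both output lists
    : > "$sbs_large"
    while IFS= read -r sbs_path; do
        if [ ! -f "$sbs_src/$sbs_path" ]; then
            echo "skipping (not a regular file): $sbs_path" >&2
            continue
        fi
        # wc -c reports the size in bytes; $(( )) strips any padding
        sbs_size=$(( $(wc -c < "$sbs_src/$sbs_path") ))
        if [ "$sbs_size" -lt "$sbs_limit" ]; then
            printf '%s\n' "$sbs_path" >> "$sbs_small"
        else
            printf '%s\n' "$sbs_path" >> "$sbs_large"
        fi
    done < "$sbs_list"
}

# Example (not run here): 15G in rclone is 15 GiB.
# split_by_size file_list.txt indir "$((15 * 1024 * 1024 * 1024))" small.txt large.txt
# rclone copy --files-from="small.txt" indir "box:outdir"
# rclone copy --files-from="large.txt" indir "box-chunker:outdir"
```

Files exactly at the limit are sent through chunker, which should be the safe side since chunker can handle either case.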
For reference, here's my rclone invocation, chunker configuration, and rclone version (the latest at the time of writing this). I think some combination of flags is preventing chunker from seeing the file sizes on the local machine.
Invocation
rclone copy --files-from="file_list.txt" \
--fast-list --multi-thread-streams=20 \
--ignore-size --ignore-checksum \
--ignore-existing --log-file=logfile.log \
--log-level=INFO -P --transfers=120 \
--checkers=120 --skip-links \
indir "box-chunker:outdir"
Chunker
[box-chunker]
type = chunker
remote = box:
chunk_size = 15G
hash_type = sha1
rclone version
rclone v1.63.1-DEV
- os/version: centos 7.9.2009 (64 bit)
- os/kernel: 3.10.0-1127.13.1.el7.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.6
- go/linking: static
- go/tags: none