When using rclone for local network syncs on directories with a lot of files (1 million+), copying and checking interfere with one another.
A check-only run (when nothing, or only small files, has changed) takes about 30 minutes, and files copy over at 80 Mbyte/s. But as soon as the two run at the same time, both copying and checking slow to a crawl (copying a 15 GB file takes over 8 hours).
Please add a flag to stop checks while files are being copied, or something similar.
What is the problem you are having with rclone?
Copy and check slow to a crawl when they run simultaneously in a single rclone command over a directory with a huge number (1 million+) of files.
What is your rclone version (output from rclone version)
rclone 1.51.0
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Linux
Which cloud storage system are you using? (eg Google Drive)
Local Network
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Are you running multiple rclone commands at the same time?
It's a single rclone instance that syncs a local folder (1 million+ files) with a local network folder. But while it copies a larger file, rclone keeps running the checks in the background. The two operations interfere with one another because the read/write heads of the source and destination disks are jumping all over the place.
If I run the command on the sub-folders that contain the larger files (under 10,000 files), the copy itself runs at 80 Mbyte/s. If I then re-run the command on the directory I want synced, it finishes the 1 million+ checks in about 30 minutes. But if there is a large file that needs to be transferred, rclone will eventually find it and start copying it while the checks keep running. The transfer speed then drops to 200 Kbyte/s, because the disk heads constantly switch between reading/writing the file being transferred and checking the existing files.
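The two-pass workaround described above can be scripted roughly as follows. This is a minimal sketch: the function name two_pass_sync and its argument order are made up for illustration, it assumes SRC and DEST are local or mounted directory paths, and it assumes you already know which sub-folders hold the large files. It uses copy (not sync) for the first pass so nothing is deleted early; the final sync handles deletions.

```bash
# two_pass_sync SRC DEST SUBDIR...   (hypothetical helper, not part of rclone)
# Pass 1: copy each sub-folder that holds large files on its own, so the big
#         transfers do not compete with a million background checks for the disks.
# Pass 2: sync the whole tree; the large files already match, so this pass is
#         (almost) check-only.
two_pass_sync() {
  local src=$1 dest=$2
  shift 2
  local dir
  for dir in "$@"; do
    rclone copy "$src/$dir" "$dest/$dir" || return 1
  done
  rclone sync "$src" "$dest"
}
```

For example, two_pass_sync /mnt/data /mnt/nas/data videos isos would copy the (hypothetical) videos and isos folders first and then sync everything.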
Are you copying from an old-fashioned spinning HDD?
This would be relatively easy, but it would involve buffering info in memory about the files which need copying. That isn't enormous, say 1k per file which needs to be transferred. This mode would need to do something when --max-backlog was reached, maybe start the transfers anyway.
Suggestions for names for the flag?
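A rough worked estimate of that buffering cost, taking the 1k-per-file figure above as 1 kB and writing N for the number of files queued for transfer:

$$
M \approx N \times 1\,\text{kB}: \qquad N = 10^{4} \Rightarrow M \approx 10\,\text{MB}, \qquad N = 10^{6} \Rightarrow M \approx 1\,\text{GB}
$$

So an incremental run with ten thousand changed files needs only megabytes, and only a near-complete first sync of a 1 million+ file tree would approach a gigabyte, which is the case a --max-backlog fallback would have to handle.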
One thing I could do is make an output mode for rclone check which outputted files which needed transferring in a format suitable for the --files-from parameter.
You could then do something like:

```
rclone check source dest --differing-files-output > filez
rclone copy source dest --files-from filez --no-traverse
```
If you were willing to do a bit of scripting, you could munge the output of check as it is at the moment into a format suitable for --files-from. That might be useful as an experiment to see whether it is worthwhile implementing either this output mode or the flag suggested above.
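Here is one way that munging could look. It is a sketch under assumptions, not a tested recipe: it assumes rclone check's log lines have the form `2020/03/01 10:00:00 ERROR : <path>: <reason>`, where the reason is either "File not in <description of dest>" (file missing on the destination) or something ending in " differ" (e.g. "Sizes differ", "MD5 differ"). The function name check_to_files_from is hypothetical, you pass it the destination's description exactly as check prints it, and paths that themselves contain ": " would be split incorrectly.

```bash
# check_to_files_from DEST_DESC < check.log > filez   (hypothetical helper)
# Prints one path per line for files that are missing on the destination or
# differ from it, i.e. a list usable with --files-from.
check_to_files_from() {
  local dest_desc=$1 line path reason
  while IFS= read -r line; do
    # Only ERROR lines name individual files; skip NOTICE summaries etc.
    [[ $line == *" ERROR : "* ]] || continue
    line=${line#* ERROR : }   # drop "date time ERROR : "
    path=${line%%: *}         # path = everything before the first ": "
    reason=${line#*: }        # reason = everything after it
    case $reason in
      # Missing on dest or content/size mismatch: needs copying.
      # Files only on dest ("File not in <source>") are skipped.
      "File not in $dest_desc" | *" differ") printf '%s\n' "$path" ;;
    esac
  done
}
```

Typical use would be to run rclone check source dest 2> check.log (rclone logs to stderr, and check exits non-zero when it finds differences), feed check.log through the function into filez, and then run the rclone copy ... --files-from filez --no-traverse command shown earlier.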
That is a nice idea! --files-from - would be great for scripting. Would you please make a new issue on GitHub with that idea in it? It shouldn't be too difficult.
@RadarOReily
These scripts were written for something else, but do pretty much what you are describing. There are a few variations of difflist here, all using rclone commands and --files-from.
Coincidentally, just yesterday I also wrote a bash script for a friend who wanted to feed rclone one folder or one file at a time. Here is the simple version:
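```bash
#!/usr/bin/env bash
# USAGE: ./rc_one sync src: dest:   <= change sync to copy/move as needed
action=$1   # rclone subcommand: sync, copy or move
src=$2      # source, e.g. src:
dest=$3     # destination, e.g. dest:

# List the top-level folders of $src (-d: dirs only, -i: no indentation,
# --full-path: print full paths, --noreport: no summary line), then run
# one rclone command per folder.
while read -r name; do
  rclone "$action" "$src$name" "$dest$name" -vP
done < <(rclone tree -d -i --level 1 --full-path --noreport "$src" | sort -r)
```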
Remove -d if you want to process files as well, not just folders. Adjust or remove --level n depending on your needs (in his case he wanted to process one folder at a time, one level down).
And of course add any other rclone flags you need alongside -vP.
I'm curious, ncw/calisro: would --fast-list and/or something like --max-backlog=5000000 not help with the interference he describes? Specifically, I'm curious whether --fast-list is useful when doing this kind of checking.
Separately, we submitted an issue on GitHub a year or so ago for a similar challenge, suggesting a flag for rclone check that outputs file names only. After that, with your help, I created the difflist/diffmove scripts. I'll link that GitHub issue if I can find it.
--fast-list can help, yes. However, rclone now uses it automatically in quite a few places; for instance, if you run rclone ls or rclone lsf -R you'll be using --fast-list if the backend supports it.
Ah yes, I vaguely remember that! Do link it if you find it!