I've been using rclone sync for around 2 years now to sync backups to an offsite location. All is working great, but lately I've been looking into optimizing it.
I move around 10 TB of data per backup session over a 600/600 Mbit/s WAN link. I checksum the files (see below for the complete command) to be 100% sure there are no unnecessary transfers, since the files are quite large. During my testing it seems the --checksum flag makes rclone re-checksum every file on every sync, and with large files that takes time!
I've been looking for some time to see if it would be possible to store the checksums in a file instead and then just read that file. The idea: every time the script runs, if the checksum/mod-time/size are identical, don't checksum the file again; if the mod-time/size differ, do a new checksum and transfer if it differs.
Maybe I'm bad at explaining, but my main goal is to skip checksums if the file is identical. Maybe there's a smarter way?
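To make the idea concrete, something like this is roughly what I'm after (paths and the remote name are placeholders, and it only verifies, it doesn't sync):
# Write MD5 sums of the local backup set to a file
rclone hashsum MD5 /path/to/backups --output-file /root/backups.md5
# Later, check the offsite copy against the stored sums instead of re-hashing the source
rclone checksum MD5 /root/backups.md5 offsite-sftp:backups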
I've been running rsync as well (with zstd) and it's super nice, but it's single-threaded, so my Xeon L CPU maxes out and I can't saturate the network, even with --compress-level=1 set.
Run the command 'rclone version' and share the full output of the command.
rclone v1.68.2
os/version: debian 12.8 (64 bit)
os/kernel: 6.8.12-5-pve (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.23.3
go/linking: static
go/tags: none
Which cloud storage system are you using? (eg Google Drive)
NONE. Only SSH (SFTP)
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Mission failed. Running the same script again after the first run doesn't seem to cache the checksums. It takes the same amount of time, and it checks the checksums again....
Does the caching work only if I make a pre-run with something like
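(To be concrete, I was picturing a warm-up pass along these lines; the hasher-offsite name is just a stand-in for the hasher overlay, and my assumption from the docs is that any run that requests hashes through the overlay should fill its cache.)
# Hypothetical warm-up pass: request hashes through the hasher overlay so they get cached
rclone hashsum MD5 hasher-offsite: --output-file /dev/null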
I have never used hasher myself, but since the docs say it will "Cache checksums to help with slow hashing of large local or (S)FTP files", it sounded like a perfect fit for your case.
I think, since you have "slow" hashing on both ends of your sync, it needs a hasher overlay on both the local and the SFTP side. You'll have to experiment a bit here.
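I haven't tried it, but the setup I have in mind is a hasher overlay wrapping each side, roughly like this in rclone.conf (remote names and paths are invented; offsite-sftp stands for whatever your existing SFTP remote is called):
[hasher-local]
type = hasher
remote = /path/to/backups
hashes = md5
max_age = off

[hasher-offsite]
type = hasher
remote = offsite-sftp:backups
hashes = md5
max_age = off
The sync would then point at the overlays instead of the raw paths, e.g. rclone sync hasher-local: hasher-offsite: --checksum, so that after the first full pass both sides can in theory answer hash requests from their bolt cache as long as size/mod-time haven't changed.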
Also, I noticed you sync all the data every time. You could speed things up by only syncing recently changed files (--max-age). If you run it daily, then only sync files changed in the last 25h, for example. You could still run a full sync occasionally to make sure everything is fine.
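A sketch of what I mean (paths and the remote name are placeholders):
# Daily incremental pass: only consider files modified within the last 25 hours
rclone copy /path/to/backups offsite-sftp:backups --max-age 25h --checksum
# Occasional full pass to catch anything missed and to propagate deletions
rclone sync /path/to/backups offsite-sftp:backups --checksum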
Yeah, to me too. But either I'm doing it wrong or it's bugging out on me. The strange thing is that it says it's writing to bolt, but there's no file. So I'm guessing it's a bug.
2025/01/06 12:07:41 DEBUG : PBS-MAIN~hasher.bolt: Opened for writing in 74.394µs
2025/01/06 12:07:42 DEBUG : PBS-MAIN~hasher.bolt: released
root@pbs-main:~# ls -la .cache/rclone/
total 1
drwxr-xr-x 2 root root 2 Jan 6 12:08 .
drwxr-xr-x 5 root root 5 Sep 15 17:59 ..
root@pbs-offsite:~# ls -la .cache/rclone/kv/
total 34
drwx------ 2 root root 4 Jan 6 12:05 .
drwx------ 3 root root 3 Jan 5 17:29 ..
-rw------- 1 root root 262144 Jan 4 22:09 PBS-MAIN~hasher.bolt
The PBS-MAIN~hasher.bolt file is from an attempt last night at pre-checking everything. Running the script again doesn't update the file, and the run time is the same (15 minutes on my test files, which are already synced).