When running rclone to sync multi-GB data between high-performance file systems on Linux (Lustre / GPFS), it seems to use only 32k read/write sizes (verified with strace). Can this be changed (either via a parameter or in the source) to something much larger (like 1 MiB)? I tried --buffer-size and it doesn't seem to affect the read/write size. I'm not sure this is really a primary use case for rclone, but the tool is generally desirable for what we are doing (users moving large data sets).
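To illustrate what I mean by read/write size: in a plain copy loop, the chunk size handed to each read call determines how large the individual requests are, regardless of how much data is moved in total. The following is a minimal Python sketch of that relationship only. It is not rclone's code, and `RecordingReader` and `copy_and_record` are hypothetical names made up for the example.

```python
import io
import shutil


class RecordingReader:
    """Wraps a file-like object and records the size of every read() request."""

    def __init__(self, raw):
        self.raw = raw
        self.requests = []

    def read(self, size=-1):
        self.requests.append(size)
        return self.raw.read(size)


def copy_and_record(data, chunk_size):
    """Copy data through shutil.copyfileobj and return the requested read sizes."""
    src = RecordingReader(io.BytesIO(data))
    dst = io.BytesIO()
    shutil.copyfileobj(src, dst, chunk_size)
    assert dst.getvalue() == data  # the copy itself must be lossless
    return src.requests


payload = bytes(4 * 1024 * 1024)                # 4 MiB of zero bytes
small = copy_and_record(payload, 32 * 1024)     # 32 KiB chunks
large = copy_and_record(payload, 1024 * 1024)   # 1 MiB chunks
print(len(small), "read requests with 32 KiB chunks")
print(len(large), "read requests with 1 MiB chunks")
```

The same 4 MiB moves in both cases, but the larger chunk size needs far fewer requests, which is the effect I am hoping to get on the parallel file systems.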
Run the command 'rclone version' and share the full output of the command.
Using the rclone RPM on Red Hat 8.4.
rclone v1.55.1-DEV
os/type: linux
os/arch: amd64
go/version: go1.14.12
go/linking: dynamic
go/tags: none
Which cloud storage system are you using? (eg Google Drive)
Local storage (GPFS / Lustre)
The command you were trying to run (eg rclone copy /tmp remote:tmp)
I built the latest version and it did help some. I actually hadn't realized there was a post-copy checksum, and disabling that was a significant improvement. I also dropped multi-threaded transfers in favor of just increasing --transfers to get more file-level parallelism. Thanks!
The test / example data set is 51 directories, 3970 files, and 80GiB in size. The test system has about 1.2 GiB/s of throughput available that I want to keep full since we have several PB to move.
A simple "cp -rp" takes ~4m and an rsync takes ~5.5m. Rclone is substantially faster due to parallelism, at ~1.3m. I use these arguments:
Using 8 transfers / checkers and --ignore-checksum (to prevent the post-copy verify, which I assume shouldn't be necessary for a local copy) seems to be required to hit a stable 1.2 GiB/s. A few samples suggest that a buffer of around 64M or more helps a small amount.
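For comparison, the wall-clock times quoted above can be converted into average throughput for the 80 GiB test set. This is just arithmetic on the approximate timings from this post, so treat the results as rough; averages over a whole run will sit below the peak rate.

```python
GIB_TOTAL = 80  # size of the test data set in GiB (from this post)


def throughput_gib_s(size_gib, minutes):
    """Average throughput in GiB/s for a transfer that took the given minutes."""
    return size_gib / (minutes * 60)


# Approximate wall-clock times in minutes, as quoted in this post.
timings_min = {"cp -rp": 4.0, "rsync": 5.5, "rclone": 1.3}

rates = {tool: throughput_gib_s(GIB_TOTAL, m) for tool, m in timings_min.items()}
for tool, rate in rates.items():
    print(f"{tool}: {rate:.2f} GiB/s")
```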
We will need to be able to resync data sets: we do an initial sync and then update periodically until we have a window to switch to the new area, so it's important that we don't unnecessarily recopy data. One quirk I have seen is that some files seem to trigger a checksum even though the file is unchanged:
I'm not sure why this occurs. It may have something to do with differing storage technologies on the source and destination. Even if I add "--modify-window 1s", it still sometimes does the seemingly unnecessary read / checksum (I'm not sure whether it's doing that on the source or the destination).
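For what it's worth, this is the behaviour I would expect from a modification window. The sketch below is my own understanding of such a comparison, not rclone's actual implementation: `compare_modtime` is a hypothetical helper, the timestamp is made up, and the idea that a mismatch leads to a checksum is an assumption. It models one file system storing nanosecond timestamps and the other truncating to microseconds.

```python
NS_PER_SECOND = 1_000_000_000


def compare_modtime(src_mtime_ns, dst_mtime_ns, window_ns):
    """Return 'skip' if the two times agree within the window, else 'verify'.

    'verify' stands for whatever more expensive check follows a mismatch
    (assumed here to be a checksum).
    """
    if abs(src_mtime_ns - dst_mtime_ns) < window_ns:
        return "skip"
    return "verify"


src = 1_600_000_000_123_456_789   # hypothetical mtime with nanosecond precision
dst = src - src % 1_000           # the same mtime truncated to microseconds

print(compare_modtime(src, dst, 1))               # 1 ns window
print(compare_modtime(src, dst, NS_PER_SECOND))   # 1 s window
```

If precision were the only cause, a one-second window should absorb the difference, so the fact that I still see checksums with "--modify-window 1s" suggests something else is going on.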
My thought is that I can use rclone to do the bulk transfers and then use a final rsync to sync up permissions, ACLs, etc.
Are you using GlusterFS? That is renowned for slightly variable modification times.
I don't think it should be doing that. I couldn't repro it myself (using touch to change the age of a file). Can you come up with a simple repro I can try?
Sounds perfect.
Note that with rclone v1.59 you can use the -M flag, which will sync permissions, but it doesn't know anything about ACLs, so rsync might be your best choice there.