Rclone sync S3 to S3 runs for hours and copies nothing

This is most likely rclone doing HEAD requests to read the modtime of each object.

You can stop it doing this with the --size-only or the --checksum flags and the sync should start much quicker.
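For example, something like one of these, with src:bucket and dst:bucket standing in for your actual remotes:

# compare by size only, skipping the per-object modtime HEAD requests
rclone sync src:bucket dst:bucket --size-only

# or compare by checksum instead of modtime
rclone sync src:bucket dst:bucket --checksum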

Are a great number of those 80 million files in the same directory? That is what your out of memory error makes me think.

The problem is big syncs with millions of files in a single directory. Rclone syncs on a directory by directory basis, so you can have 10,000,000 directories with 1,000 files in each and it will sync fine, but if you have one directory with 100,000,000 files in it you are likely to need about 100GB of RAM to process it (roughly 1kB of RAM per object).

I have a plan for how to improve these large directory syncs. This would involve storing the sync info on disk rather than in memory.

I've even found a nice open source library to help.

All I need is a bit of time - or maybe some sponsorship.

@lc63 would your company like to sponsor an out of memory large sync mode for rclone? Consider taking out a support contract, which can help you get answers quicker and keeps the rclone project sustainable.


Meanwhile you can simulate an out of memory sync (one which doesn't hold the whole listing in RAM) using a bit of unix tooling like this.

First, read the file names (this is likely to take about 2 hours for you, I think):

rclone lsf --files-only -R src:bucket | sort > src
rclone lsf --files-only -R dst:bucket | sort > dst

Now use comm to work out which files need to be transferred and which need to be deleted:

comm -23 src dst > need-to-transfer
comm -13 src dst > need-to-delete

You now have a list of files you need to transfer from src to dst and another list of files in dst that aren't in src so should likely be deleted.
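If you do decide to remove them, rclone delete should accept the same --files-from filter as the other commands, though I'd run it with --dry-run first to check it is going to do what you expect:

rclone delete dst:bucket --files-from need-to-delete --dry-run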

Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like split -l 10000 need-to-transfer need-to-transfer- and run this on each chunk to transfer 10,000 files at a time (see the loop sketched below). The --files-from and --no-traverse flags mean that this won't list the source or the destination, so it will avoid using too much memory.

rclone copy src:bucket dst:bucket --files-from need-to-transfer-aa --no-traverse
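For example, a rough shell loop over all the chunks, assuming the need-to-transfer-aa, -ab, ... names produced by split above:

for chunk in need-to-transfer-*; do
    # --no-traverse stops rclone listing the destination for each batch
    rclone copy src:bucket dst:bucket --files-from "$chunk" --no-traverse
done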

I left a lot of details out, but that is the basic idea.

If you want, you can include the hash and/or size in the listings so you can work out whether changed files need syncing as well as new ones. This takes a bit more processing, as you need to strip that info off again before you make the need-to-transfer file.
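As a sketch of the size variant, assuming lsf's --format flag with p for path and s for size and its default ";" separator (check rclone lsf --help on your version):

# list path;size so a file whose size changed shows up as different
rclone lsf --files-only -R --format "ps" src:bucket | sort > src
rclone lsf --files-only -R --format "ps" dst:bucket | sort > dst

# strip the size back off before feeding the list to --files-from
comm -23 src dst | cut -d';' -f1 > need-to-transfer

Doing the same with hashes is similar, just with h in the --format string.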
