Rclone sync S3 to S3 runs for hours and copies nothing

The log is filled with:

2023/07/17 19:06:00 INFO  :
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:    3h55m0.0s

I hadn't noticed it, but indeed the copy is eventually killed by the OOM killer. The process ran for 3h55m. Data is not copied in batches of 1,000. I think the OOM kill is a consequence, not a cause.

Yes, it is a tough one. 80 million objects potentially require 80 GB of RAM (1 KB per object), so you'd probably need a machine with 128 GB of RAM...

@ncw - is there any clever way to limit memory usage for such big syncs? We can see more and more huge datasets handled by rclone, and maybe it would make sense to have a slower but less memory-hungry mode of operation?


This is most likely rclone doing HEAD requests to read the modtime.

You can stop it doing this with the --size-only or the --checksum flags and the sync should start much quicker.
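
For example, either of these does a sync without reading modtimes (a sketch - src:bucket and dst:bucket stand in for your remotes):

rclone sync src:bucket dst:bucket --size-only
rclone sync src:bucket dst:bucket --checksum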

Are a great number of those 80 million files in the same directory? That is what your out-of-memory error makes me think.

The problem is big syncs with millions of files in one directory. Rclone syncs on a directory-by-directory basis, so you can have 10,000,000 directories with 1,000 files in each and it will sync fine, but if you have a single directory with 100,000,000 files in it you are likely to need about 100 GB of RAM to process it.
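
As a quick way to check how your files are spread across directories, something like this will do (a sketch - it counts files per immediate parent directory and assumes standard Unix tools):

rclone lsf --files-only -R src:bucket \
  | awk '{ if (sub("/[^/]*$", "")) print; else print "." }' \
  | sort | uniq -c | sort -rn | head

The top entries show the directories holding the most files.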

I have a plan on how to improve these large directory syncs. This would involve storing the sync info on disk.

I've even found a nice open source library to help.

All I need is a bit of time - or maybe some sponsorship.

@lc63 would your company like to sponsor an out-of-memory large sync mode for rclone? Check out taking out a support contract, which can help you get answers quicker and keeps the rclone project sustainable.


Meanwhile, you can simulate an out-of-memory sync using a bit of Unix tooling like this.

First, read the file names (this is likely to take about 2 hours for you, I think):

rclone lsf --files-only -R src:bucket | sort > src
rclone lsf --files-only -R dst:bucket | sort > dst

Now use comm to find which files need to be transferred and which need to be deleted:

comm -23 src dst > need-to-transfer
comm -13 src dst > need-to-delete

You now have a list of files you need to transfer from src to dst and another list of files in dst that aren't in src so should likely be deleted.
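
For the deletions, once the transfers described below have run, something along these lines should work (a sketch - do a --dry-run pass first to check the list is what you expect):

rclone delete dst:bucket --files-from need-to-delete --dry-run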

Then break the need-to-transfer file up into chunks of (say) 10,000 lines with something like split -l 10000 need-to-transfer need-to-transfer- and run this on each chunk to transfer 10,000 files at a time. The --files-from and --no-traverse flags mean that this won't list the source or the destination, so it avoids using too much memory.

rclone copy src:bucket dst:bucket --files-from need-to-transfer-aa --no-traverse
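
To run that over every chunk, a minimal loop like this will do (a sketch, assuming the split prefix above so the chunks are named need-to-transfer-aa, need-to-transfer-ab and so on):

for chunk in need-to-transfer-*; do
  rclone copy src:bucket dst:bucket --files-from "$chunk" --no-traverse
done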

I left a lot of details out, but that is the basic idea.

If you want, you can include the hash and/or size in the listing so you can work out whether you need to sync changes or not. This takes a bit more processing, as you need to strip that info off before you make the need-to-transfer file.
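
Something like this could do it for the transfer side (a sketch - it assumes MD5 hashes are available on both remotes and that no path contains the default ';' separator):

rclone lsf --files-only -R --format "ph" src:bucket | sort > src
rclone lsf --files-only -R --format "ph" dst:bucket | sort > dst
comm -23 src dst | cut -d';' -f1 > need-to-transfer

The delete list needs a little more care, as a changed file shows up in both comm outputs - once with its new hash and once with its old one.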


Thank you for looking into it. This is exactly what I thought: somebody simply has to implement disk-backed directory listing caching. Once things move into the tens of millions of objects, a different approach is needed. Most users will never hit these limits, but enterprise users will.

Thank you very much for your reply and your time. My company doesn't want to take out a support contract at the moment, but it's not out of the question in the long term.

I will implement the proposed workaround. I'll report back here. Thanks for your time.

Let us know how you get on. We should probably write up this technique in the docs.

The copy works (with hashes for updates). However, the estimated processing time for the initial backup is significant: 221 days on Scaleway Object Storage and 60 days on OVH High Performance S3 (we'll be reassessing our needs because of the restoration time). The price of quantity.

I'll explain this technique in the documentation. On the rclone wiki (Home · rclone/rclone Wiki · GitHub)?


What about the forum and the how-to guides? I think the wiki has become a bit obscure and not many people look there.

FYI,

Wiki: Big syncs with millions of files · rclone/rclone Wiki · GitHub
Forum/How-to: Big syncs with millions of files.

Regards.


This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.