Unable to migrate huge S3 Bucket (500 million objects / around 100 TB)

ncw · April 27, 2022, 3:53pm

Rclone will use something like 1 KB of RAM per object or folder in a directory, so those 500 million objects will take something like 500 GB of RAM...

What you can do is do the copy from a file list. First generate the list with something like

rclone lsf --files-only -R remote_src:bucket_source > source-files

This should complete without using too much RAM.

Then you can use this to do the copy. Note that --no-traverse prevents the directory listing of the destination and --no-check-dest stops rclone checking whether each file already exists on the destination, which will speed things up.

rclone copy --files-from source-files --no-traverse --no-check-dest remote_src:bucket_source remote_tgt:bucket_target

I think you'll need to split source-files up into chunks (say 100,000 lines each) and run the copy once per chunk, so that step doesn't use too much memory either.
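
Something along these lines might work for the chunking. This is only a rough sketch using the standard split utility. The function names (split_list, copy_chunks) and the chunk- prefix are made up for illustration, and I'm assuming the same remotes and flags as in the copy command above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of rclone).

# split_list FILE LINES PREFIX
# Split FILE into pieces of at most LINES lines, named PREFIXaa, PREFIXab, ...
split_list() {
    split -l "$2" "$1" "$3"
}

# copy_chunks PREFIX SRC DST
# Run one rclone copy per chunk file, stopping at the first failure
# so the failed chunk can be retried on its own.
copy_chunks() {
    for chunk in "$1"*; do
        # Skip the unexpanded pattern if no chunk files exist.
        [ -e "$chunk" ] || continue
        echo "copying files listed in $chunk"
        rclone copy --files-from "$chunk" --no-traverse --no-check-dest "$2" "$3" || return 1
    done
}

# Example usage (uncomment to run):
# split_list source-files 100000 chunk-
# copy_chunks chunk- remote_src:bucket_source remote_tgt:bucket_target
```

Keeping one rclone run per chunk means that if a run fails you only need to redo that chunk rather than the whole list. Note that the default two-letter suffixes only give 676 chunks before split runs out of names, so for 500 million lines you would need a larger chunk size or a longer suffix (split -a).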

I'd like to make rclone deal with this more sensibly and I have ideas for doing that - maybe your company would like to sponsor me to add those features?