S3 to S3 copy: 1 billion objects

strod · February 18, 2019, 3:09pm

Hi!
I’ve been testing rclone for a data migration. I need to move data between two on-premises S3-compatible storage systems.
rclone performed well on 10 million objects.
But the production environment has more than 1 billion objects. Two questions:

1. Is rclone ready to list and sync that many objects? Has anyone had a similar experience?
2. Is it possible to sync buckets that are continuously changing?

Thanks.

calisro · February 19, 2019, 1:47pm

The number of files shouldn’t matter, as rclone works in batches anyway. It should perform the same with 100,000 objects as with a billion; it’ll just take longer.

rclone will copy files as it goes. Subsequent runs will pick up any files that changed in the meantime.

ncw · February 25, 2019, 12:37pm

It should be. The main limitation will be how many objects are in a “directory”. Rclone keeps that many in memory at once. Assuming you haven’t got all 1 billion objects in a single directory then you should be fine.
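
A rough way to check this before migrating is to count how many objects sit directly under each "/"-separated prefix, since that is what rclone treats as a directory on S3. The sketch below assumes you already have a list of object keys (for example from an inventory export or a sample); the function names are hypothetical, and it only finds the biggest directories rather than estimating memory use.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parent_dir(key: str) -> str:
    """Return the "/"-separated prefix treated as the key's directory ("" = bucket root)."""
    return key.rsplit("/", 1)[0] if "/" in key else ""


def largest_directories(keys: Iterable[str], top: int = 3) -> List[Tuple[str, int]]:
    """Count direct children per directory and return the `top` biggest ones.

    Only direct children are counted, because a directory listing does not
    include the contents of its subdirectories.
    """
    counts = Counter(parent_dir(k) for k in keys)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical sample keys, for illustration only.
    sample = ["logs/2019/a.gz", "logs/2019/b.gz", "logs/2019/c.gz", "img/x.jpg", "README"]
    for prefix, n in largest_directories(sample):
        print(f"{prefix or '<bucket root>'}: {n} objects")
```

If one prefix holds a large share of the billion objects, that directory is the one to watch for memory.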

Yes, there have been reports of other huge syncs, but I don’t recall one of 1 billion objects before.

There are certainly some optimizations you can do.

Check out --use-server-modtime and --update.
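
To illustrate why those two flags combine well, here is a simplified model (not rclone’s actual code) of the --update decision. It assumes each S3 object exposes a LastModified time from the listing and, optionally, an original modtime stored in metadata that costs an extra request to read. With --use-server-modtime the listing time is used directly, so no per-object metadata request is modelled. The class and function names are hypothetical, and rclone’s modify window is ignored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class S3Object:
    size: int
    server_mtime: datetime          # LastModified from the listing (upload time)
    meta_mtime: Optional[datetime]  # original modtime kept in metadata, if any


def modtime(obj: S3Object, use_server_modtime: bool, stats: Dict[str, int]) -> datetime:
    """Pick the timestamp used for the comparison."""
    if use_server_modtime:
        return obj.server_mtime  # already in the listing: no extra request
    stats["metadata_reads"] += 1  # models the extra per-object metadata request
    return obj.meta_mtime if obj.meta_mtime is not None else obj.server_mtime


def should_copy_update(src: S3Object, dst: Optional[S3Object],
                       use_server_modtime: bool, stats: Dict[str, int]) -> bool:
    """Simplified --update rule: never overwrite a destination that is newer."""
    if dst is None:
        return True  # missing on the destination
    src_t = modtime(src, use_server_modtime, stats)
    dst_t = modtime(dst, use_server_modtime, stats)
    if dst_t > src_t:
        return False  # destination is newer, so --update skips it
    if dst_t == src_t:
        return src.size != dst.size  # same time: fall back to comparing sizes
    return True  # source is newer
```

In an S3-to-S3 sync the destination’s LastModified is roughly when rclone copied the object, so a source object only looks newer if it was rewritten after the last pass, which is what a repeated sync needs to catch.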

You’ll probably want to use --checksum or --size-only in your sync to avoid the extra metadata read needed to get each object’s timestamp.
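
The trade-off between the comparison modes can be sketched as below. This is a simplified model, not rclone’s implementation: it assumes the object hash comes back in the listing (as an S3 ETag typically does for simple, non-multipart uploads) while the modtime lives in metadata that needs an extra request per object. It also ignores that rclone may just update a timestamp rather than re-copy when hashes match. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    size: int
    md5: str           # assumed available from the listing
    meta_mtime: float  # modtime stored in metadata; reading it costs a request


def treated_as_changed(src: Entry, dst: Optional[Entry], mode: str,
                       stats: Dict[str, int]) -> bool:
    """Simplified model of how each comparison mode decides a pair differs."""
    if dst is None:
        return True
    if src.size != dst.size:
        return True  # every mode compares size first
    if mode == "size-only":
        return False
    if mode == "checksum":
        return src.md5 != dst.md5  # hash from the listing, no extra request
    if mode == "default":
        stats["metadata_reads"] += 2  # both timestamps live in metadata
        return src.meta_mtime != dst.meta_mtime
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions --checksum costs no extra requests and still catches same-size edits, whereas --size-only is cheapest but will miss a change that leaves the size unchanged.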

Yes, rclone will do its best to sync such buckets. There may be a few errors in the logs where files get deleted or changed at the wrong moment, but the sync will continue. You might want to set --retries 1 so rclone doesn’t immediately retry the whole sync in this case; I imagine you’ll script it to run on your own schedule.
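
A minimal sketch of such a script, assuming rclone is on the PATH and two remotes are already configured (the remote and bucket names in the usage note are hypothetical). It only uses the flags mentioned above and runs a single pass, leaving the scheduling itself to cron or a similar tool.

```python
import subprocess
from typing import List


def build_sync_cmd(src: str, dst: str, compare: str = "checksum",
                   retries: int = 1) -> List[str]:
    """Build an `rclone sync` command line using the flags discussed above."""
    if compare not in ("checksum", "size-only"):
        raise ValueError("compare must be 'checksum' or 'size-only'")
    return ["rclone", "sync", src, dst, f"--{compare}", f"--retries={retries}"]


def run_sync_once(src: str, dst: str, compare: str = "checksum",
                  retries: int = 1) -> int:
    """Run one sync pass and return rclone's exit code.

    A non-zero code can be expected now and then on buckets that change
    during the run; the next scheduled pass picks those objects up.
    """
    completed = subprocess.run(build_sync_cmd(src, dst, compare, retries), check=False)
    return completed.returncode
```

For example, `run_sync_once("olds3:bucket", "news3:bucket")` would run `rclone sync olds3:bucket news3:bucket --checksum --retries=1` once; logging the exit code per run makes it easy to see whether passes are converging.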