I’ve been testing rclone for data migration. I need to move data between two on-premises S3-compatible storages.
rclone performed well on 10 million objects.
But the production environment has more than 1 billion objects. Three questions:
- Is rclone ready to list and sync that many objects?
- Has anyone had a similar experience?
- Is it possible to sync buckets that are continuously changing?
The number of files shouldn’t matter, as rclone processes them in batches anyway. It should perform the same with 100,000 objects as with a billion; it’ll just take longer.
rclone will just copy the files as it progresses along. Subsequent runs will pick up any files that changed along the way.
It should be. The main limitation is how many objects are in a single “directory”: rclone keeps that many objects in memory at once. Assuming you don’t have all 1 billion objects in one directory, you should be fine.
There have been reports of other huge syncs yes, but I don’t recall one at 1 billion objects before.
There are certainly some optimizations you can do.
Check out --use-server-modtime and --update.
You’ll probably want to use --checksum in your sync, or --size-only to avoid the metadata read for the timestamp.
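A sketch of what those invocations might look like, assuming placeholder remote names `src:` and `dst:` configured for the two S3-compatible storages (adjust to your own remotes and bucket names):

```shell
# Trust the server's modification time (skips a per-object metadata read)
# and only copy when the source is newer than the destination:
rclone sync src:bucket dst:bucket --use-server-modtime --update

# Alternatively, compare objects by size alone, skipping timestamps entirely:
rclone sync src:bucket dst:bucket --size-only
```

--size-only is the cheapest comparison but will miss changes that leave the object size unchanged, so pick the trade-off that matches your data.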
Yes, rclone will do its best to sync such buckets. There may be a few errors in the logs where files get deleted or changed at the wrong time, but the sync will continue. You might want to set --retries 1 so rclone doesn’t immediately retry the whole sync in that case; you’ll script it to run on your own schedule, I’d imagine.
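A minimal sketch of such a scheduled wrapper (e.g. run from cron); the remote names `src:` and `dst:` and the log path are placeholders, not anything rclone provides by default:

```shell
#!/bin/sh
# Scheduled one-pass sync of a live bucket. With --retries 1, rclone does
# not re-run the whole sync when some objects change mid-pass; the next
# scheduled run will pick those objects up instead.
rclone sync src:bucket dst:bucket \
  --retries 1 \
  --size-only \
  --log-file /var/log/rclone-sync.log \
  --log-level INFO

# A non-zero exit here can simply mean a few objects changed during the
# pass, so record it rather than failing the whole job.
echo "sync finished with exit code $?"
```

Running it from cron (or a systemd timer) keeps the bucket converging on each pass without rclone looping internally on transient errors.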