We host an internal Docker registry across 3 data centers; each DC's registry nodes connect to that DC's Ceph S3 storage.
We found that DC B and DC C are missing thousands of layers, so we want to copy them from DC A to B and C. The number of files and the total size are very large: close to 20 TB, or 1,180,813 files.
The reason I chose rclone over s3cmd is that rclone supports bucket-to-bucket copies between sites, whereas s3cmd doesn't unless you download first and upload again.
Questions:
1. We don't want to copy every file from source to destination, only the missing ones. Which is the better option, copy or sync?
2. In either case, what are the best flags to optimize the speed and reliability of the copy/sync, e.g. -v --log-file rclone.log --checkers=16 --transfers=16?
3. Can you share the right command with the right arguments to speed up the operation for large data sets?
Top tip first: on an S3 remote, reading the modification time takes an extra transaction, so using --checksum or --size-only will speed up a sync. I'd recommend one of those.
--fast-list may improve performance.
Setting --checkers and --transfers higher will use more network bandwidth and memory, and at some point it will become counterproductive. The defaults of 4 & 4 are quite conservative; I regularly use 64 & 64.
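Putting those flags together, a sketch of the sync (the remote names `dc-a:`/`dc-b:` and the bucket name `registry` are placeholders for illustration, not taken from the original):

```shell
# Hypothetical remote and bucket names (dc-a:, dc-b:, registry) -- adjust to your config.
# --size-only skips the extra per-object transaction for mod times; --fast-list reduces
# LIST calls at the cost of holding the listing in memory; 64/64 per the tip above.
rclone sync dc-a:registry dc-b:registry \
  --size-only \
  --fast-list \
  --checkers 64 \
  --transfers 64 \
  -v --log-file rclone.log
```

Since only missing files need transferring, `rclone copy` with the same flags would work just as well: both skip files that already match at the destination, and `copy` never deletes anything there.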
If you've got lots of really big files and you don't care about keeping the md5sum, use:

--s3-disable-checksum   Don't store MD5 checksum with object metadata
Increasing the chunk size will help with big files, at the cost of memory:

--s3-chunk-size SizeSuffix   Chunk size to use for uploading (default 5M)

Increase the upload concurrency if you have a small number of big files:

--s3-upload-concurrency int   Concurrency for multipart uploads (default 2)
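Combining the big-file flags with the earlier ones, an invocation might look like this (the remote/bucket names and the 64M / 8 values are illustrative assumptions to tune for your setup, not recommendations from the original):

```shell
# Hypothetical remotes and bucket; tune chunk size and concurrency to your memory budget.
# Each transfer can buffer roughly --s3-chunk-size * --s3-upload-concurrency of memory.
rclone sync dc-a:registry dc-c:registry \
  --size-only \
  --fast-list \
  --checkers 64 \
  --transfers 64 \
  --s3-disable-checksum \
  --s3-chunk-size 64M \
  --s3-upload-concurrency 8 \
  -v --log-file rclone.log
```

Note that --s3-disable-checksum only skips storing the MD5 in object metadata for multipart uploads; it doesn't affect the integrity checks rclone performs during the transfer itself.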