you asked "how do I have an estimate of time needed" and i think i answered that.
rclone calculates checksums for every file copied.
so your performance depends on much more than just the theoretical max network speed.
what is your cpu, how many cores, how many threads can it handle, and how much ram do you have?
is the server dedicated just to rclone, or are there other users?
you have a lot of data to move.
are the s3 files in use, are you able to saturate your connection without interfering?
i was suggesting what i do.
i often install rclone for different customers with different systems.
so i do about 10 test runs, tweaking the parameters and then i know for sure.
i would tweak --transfers and --checkers and add --progress to see the bandwidth used.
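for example, something like this - a rough sketch only, where source: and dest: are placeholder remote names and the numbers are just a starting point to tweak, not a recommendation:

rclone copy source:bucket dest:bucket --transfers 16 --checkers 32 --progress

run it on a small subset first, watch the bandwidth, then change the numbers and run again until it stops getting faster.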
Thanks for the --transfers and --checkers, progress is already added with -P.
both infrastructures are clusters with a load balancer in front, so the load balancer will probably be my bottleneck.
Have you ever tried a data sync with buckets this big?
I would like to avoid any possible program limitation, like array size, number of objects, etc.
-s3-upload-concurrency should be --s3-upload-concurrency
many users have copied larger data sets than you.
i have done a couple of 30+TB transfers.
rclone limits are dependent on the flags you use and the cpu and ram.
perhaps your computer cannot handle your settings or can handle much more.
you can get an estimate by reading this https://rclone.org/s3/#multipart-uploads
"Multipart uploads will use --transfers * --s3-upload-concurrency * --s3-chunk-size extra memory. Single part uploads do not use extra memory."
i got more specifications.
case 1) i have 19 buckets that in total contain 90TB and 500 million objects.
case 2) i have 2 buckets that in total contain 50TB and 500 million objects
case 1)
each bucket should contain 26,315,789 objects
the average size of objects should be 193.27 KiB
case 2)
each bucket should contain 250,000,000 objects
the average size of objects should be 107.37 KiB
Given those specs, how would you change the command to execute?
actually idk if all the files are at the top level of a single bucket, or if they are organised in folders and subfolders.
I'd recommend using the --checksum flag - this will save transactions for an S3 to S3 copy.
What the keys in the buckets look like is important. If the keys have / in them, simulating a directory hierarchy, rclone can copy them by loading one "directory" at a time.
However if there is no directory structure rclone will have to load them all into RAM at once. For 250M objects that will take lots of RAM! So much RAM that it might actually make copying with rclone impossible...
Assuming the files aren't all in one directory, rclone will hardly use any RAM - it will mostly be using network with a bit of CPU. A 4GB VM would be plenty I'd say.
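For example (made-up key names, just to show the shape I mean) - keys like

app1/2021/01/object000001
app1/2021/01/object000002

simulate directories rclone can list one at a time, whereas flat keys like

object000000001
object000000002

all live in one giant "directory" which has to be held in RAM in one go.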
Will you be repeating the copy, so trying to keep the source and destination in sync?
->However if there is no directory structure rclone will have to load them all into RAM at once. For 250M objects that will take lots of RAM! So much RAM that it might actually make copying with rclone impossible...
Any idea of the possible RAM usage? 128GB, 256GB?
->Will you be repeating the copy, so trying to keep the source and destination in sync?
There are some applications working on the source bucket; we need to migrate the data and then start the applications on the destination bucket. After that we will remove the old infrastructure.
So yes, we will probably run the sync more than once, to keep everything in sync.
Between 250-500 bytes per object I'd guess, so for 250M objects in one directory that's roughly 250M * 250-500 bytes = 62-125 GB, call it 64-128GB
If you are doing repeated syncing then it will use more memory as it has to hold the source objects and the destination objects in memory to compare them, so twice what I said above.
However I think that you can probably use --no-traverse to stop rclone caching the destination objects.
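As a starting point, something along these lines (the remote and bucket names are placeholders, and the flag values are just a guess to tune on your hardware):

rclone copy source:bucket dest:bucket --checksum --no-traverse --transfers 32 --checkers 64 -P

Re-run the same command for the later passes. As far as I understand it, --checksum compares sizes and the MD5/ETag that S3 already returns in the listing, rather than modification times which need an extra HEAD request per object, which is why it saves transactions.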