I am in the process of evaluating a migration method for our environment, and I was wondering whether rclone could handle our migration needs.
We have on-premises object storage that needs to be migrated to a different vendor. Our largest bucket holds roughly 2.5 billion small objects (average size around 20K). We’re using keys similar to this example: 0111f0/60/00/00/object.dat.
During the migration we would need to sustain around 1000-1500 TPS: anything faster will put too much stress on the source, and anything slower will take too long.
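As a rough check on what that rate means in practice, here is a back-of-the-envelope sketch. It assumes one transaction per object (listing and metadata calls are not counted) and reads "20K" as 20,000 bytes; the function names are only for illustration.

```python
SECONDS_PER_DAY = 86_400


def migration_days(total_objects: int, tps: float) -> float:
    """Days needed to copy total_objects at a steady rate of tps objects per second."""
    return total_objects / tps / SECONDS_PER_DAY


def throughput_mb_per_s(tps: float, avg_object_bytes: int) -> float:
    """Average data rate in MB/s, with 1 MB taken as 1,000,000 bytes."""
    return tps * avg_object_bytes / 1_000_000


if __name__ == "__main__":
    objects = 2_500_000_000   # largest bucket
    avg_size = 20_000         # assumption: "20K" read as 20,000 bytes
    for tps in (1000, 1500):
        days = migration_days(objects, tps)
        rate = throughput_mb_per_s(tps, avg_size)
        print(f"{tps} TPS: {days:.1f} days at about {rate:.0f} MB/s")
```

Under those assumptions the largest bucket alone would take roughly 19 to 29 days of uninterrupted copying, at only 20 to 30 MB/s of payload, so the request rate matters far more than bandwidth.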
As far as I understand, rclone requires quite a lot of memory and CPU when working with a large number of objects. Is it able to handle this many?
CPU isn't normally a problem, but memory might be.
You say your objects look like 0111f0/60/00/00/object.dat - rclone will treat that as a file path even though it is really just a database key.
The limiting factor is how many files there are in a "pseudo directory". Rclone needs to keep the info for each directory in memory. This takes about 1k of RAM per object, so if your largest directory has 1,000,000 objects in it then rclone will use about 1G of RAM for each of the --checkers.
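That rule of thumb can be turned into a quick estimate. This is only a sketch: the 1,000 bytes per object is the approximate figure quoted above, not a measured value, and the result is an upper bound that assumes every checker is listing a directory as large as the biggest one at the same time. The checker count of 8 is just an example value.

```python
BYTES_PER_OBJECT = 1_000  # assumption: the "about 1k per object" rule of thumb


def listing_memory_gb(largest_dir_objects: int, checkers: int = 1) -> float:
    """Rough upper bound on listing memory in GB (1 GB = 1e9 bytes)."""
    return largest_dir_objects * BYTES_PER_OBJECT * checkers / 1e9


if __name__ == "__main__":
    # One checker on a directory of 1,000,000 objects.
    print(listing_memory_gb(1_000_000))
    # Eight checkers, each assumed to be in an equally large directory.
    print(listing_memory_gb(1_000_000, checkers=8))
```

With these inputs the estimate is about 1 GB for a single checker and about 8 GB if eight checkers all hit directories of that size together.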
If you want to find the largest directory then you could do something like this
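One possible approach, as a sketch rather than the exact command meant here: pipe a recursive listing of keys, one per line, into a small script that counts objects per parent path. The script name `count_dirs.py` and the remote name `remote:bucket` are placeholders. A listing such as `rclone lsf -R --files-only remote:bucket` should produce suitable input.

```python
import sys
from collections import Counter
from posixpath import dirname


def count_per_directory(keys):
    """Count objects per pseudo directory (everything before the last '/')."""
    counts = Counter()
    for key in keys:
        key = key.strip()
        if key:  # skip blank lines
            counts[dirname(key)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage:
    #   rclone lsf -R --files-only remote:bucket | python3 count_dirs.py
    if sys.stdin.isatty():
        print("pipe a key listing into this script", file=sys.stderr)
    else:
        counts = count_per_directory(sys.stdin)
        # Show the ten largest pseudo directories, biggest first.
        for directory, n in counts.most_common(10):
            print(f"{n:>12}  {directory or '(bucket root)'}")
```

The counter keeps one entry per pseudo directory, so the script itself needs memory proportional to the number of distinct directories, and the full listing still has to be read from the source once.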
I ran this against one of our smaller buckets and it seems that the data is distributed evenly across file paths. If the large bucket is laid out the same way, 2.5 billion objects over 15 file paths is roughly 166 million objects per path, which at about 1k each would mean we need more than 166GB of memory. This should not be an issue, as we can allocate around 512-768GB if needed. I need to do some more testing and see if I can calculate this against the large bucket. That will take a long time to complete.
Luckily the bucket we need to migrate is WORM protected and there will not be any new writes to it during the migration. That should make things a bit easier. We just need a 1:1 copy of all objects. I will need to dig deeper into the needed parameters and start testing this tool a bit more.
One issue I’m wondering about is what happens if there’s a network failure or another problem during the migration. Is the only option to start from the beginning? All list operations against the source array are painful to execute. I have already noticed that there are no graceful stop or pause options.