How can I speed up a copy of a large directory (400TB)?

Hi Guys!

I need to migrate a large amount of data from azureblob to s3 (AWS). I am currently running an "rclone copy azureblob s3", but I want to know if it is possible to speed up the process in some way. For example, could I run an "rclone sync azureblob s3" in parallel with the "rclone copy azureblob s3"?

Thanks in advance!

not familiar with those backends, but you could check what limits they impose and then increase the number of parallel checks and transfers (--checkers and --transfers).

Most likely your bottleneck will be your internet connection, so if the data is divided into multiple folders and you have multiple independent fast connections, you could run a separate copy/sync over each connection, each one handling a different folder.
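As a rough sketch of the per-folder idea, something like this could generate one rclone command per top-level folder, to be started on separate machines or connections. The remote paths (`azureblob:container`, `s3:bucket`), the folder names, and the helper `build_copy_commands` are made up for illustration; only `rclone copy` and `--transfers` come from rclone itself.

```python
def build_copy_commands(folders, src_root="azureblob:container",
                        dst_root="s3:bucket", transfers=8):
    """Return one rclone argv list per folder.

    src_root, dst_root and the folder names are placeholders; replace
    them with your own remotes and layout.
    """
    commands = []
    for folder in folders:
        commands.append([
            "rclone", "copy",
            f"{src_root}/{folder}",   # source: one folder only
            f"{dst_root}/{folder}",   # destination: same folder name
            "--transfers", str(transfers),
        ])
    return commands


# Print the commands so each can be run on its own connection.
for argv in build_copy_commands(["folder-a", "folder-b"]):
    print(" ".join(argv))
```

Because each command covers a different folder, the jobs never touch the same objects, which is different from running a copy and a sync over the same path at once.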

Regardless, 400 TB will take a long time. Even maxing out a 1 Gbps connection 24/7, it would take roughly 40 days.
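For anyone who wants to check that estimate, here is a small back-of-the-envelope calculation. It assumes a perfectly saturated link with no protocol overhead, and shows that the answer shifts depending on whether "TB" means decimal terabytes or binary tebibytes. The function name `transfer_days` is just for this sketch.

```python
def transfer_days(size_bytes, link_bits_per_sec, efficiency=1.0):
    """Days needed to move size_bytes over a link running 24/7.

    efficiency is the fraction of the nominal link speed actually
    achieved (1.0 = ideal, no overhead); it is an assumption, not a
    measured value.
    """
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 86400  # seconds per day


TB = 10**12    # decimal terabyte
TIB = 2**40    # binary tebibyte
GBPS = 10**9   # 1 Gbps in bits per second

print(round(transfer_days(400 * TB, GBPS), 1))   # 37.0
print(round(transfer_days(400 * TIB, GBPS), 1))  # 40.7
```

Any real-world overhead (set `efficiency` below 1.0) pushes both numbers up.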

hello and welcome to the forum,

with that amount of data, the weak link will be the speed of your internet connection.
rclone can easily saturate my 1Gbps fiber optic connection.

best to run a few tests using default values and see how long it takes to transfer a couple of TB.
based on those results, it might be possible to tweak some flags.

use --progress and a debug log, to see what is going on...

no advantage to running multiple commands in that way.

how many total files need to be copied?

No. You could easily break consistency. And it isn't needed either! Just up --transfers.

Are you using --fast-list? Usually it is recommended for these bucket based remotes but I wonder if it is actually a problem now since it is waiting so long on listing before working. Not sure. Just a thought.

This is a serious amount of data. What infrastructure are you running this on? What time frame are you looking to have this finished in? There are 3rd party services to help with this for that amount of data. And for that size account (not to mention the $8400/month you're about to spend), they may have assistance.


So I use fionera's PR, coupled with --transfers 15.

Or you can use gclone or fclone to speed things up. The thing is, they ain't as stable as rclone itself, so I would not recommend them. Better safe than ending up with a mess that has to be restarted.

I am estimating around 10 days if you are diligently watching and restart it whenever it fails, or set up a systemd service that restarts automatically on failure.
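If systemd is not an option, the same restart-on-failure idea can be sketched as a small wrapper script. This is only an illustration: the helper `run_until_success`, the attempt limit, and the delay are arbitrary choices, and the rclone arguments in the commented example are placeholders.

```python
import subprocess
import sys
import time


def run_until_success(argv, max_attempts=50, delay_seconds=60):
    """Run a command repeatedly until it exits with status 0.

    Returns the number of attempts used. Raises RuntimeError if the
    command still fails after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(argv)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}",
              file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before retrying
    raise RuntimeError(f"command failed after {max_attempts} attempts")


# Example (placeholder remotes, not run here):
# run_until_success(["rclone", "copy", "azureblob:container", "s3:bucket",
#                    "--transfers", "15"])
```

Re-running a copy is reasonable here because rclone copy skips files that already exist unchanged on the destination, so each restart continues with what is still missing.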

In my experience, slow and steady is better than a worrisome 50 re-runs.

lol, it's like when you build up data properly to that level it is quite amazing.

Thanks, guys, for your comments and recommendations! I really appreciate it!

Thanks,
Regards

i would rent a cheap virtual machine from aws or microsoft and run rclone on it.
that way, you are not using local resources such as your internet connection.


Bandwidth cost would be expensive though.

I agree with the comment above about using a 3rd party data migration company: just pay to have a specialized company do it.