Fastest way to run a "one-off" migration of an entire Azure blob storage account to Amazon S3

What is the problem you are having with rclone?

Hello,

I have been asked to migrate an entire Azure Blob Storage container to Amazon S3. This will be a one-off migration, and the files in the Azure storage container will not change at all during the migration.

I have roughly 50 million blobs totalling around 14TB, but no single directory contains more than 1000 files.

Would rclone be a good tool to use for this task or am I overcomplicating things?

If so, are there any flags I should be looking at that could speed up the process? (The destination bucket will start empty, so I'm thinking there really isn't a need to do any kind of calculation of which files need transferring.)

I really appreciate your help, and apologies if this is posted in the wrong place - I'll happily move it to the correct place if someone could point out where that is.

Thanks for your time.

Run the command 'rclone version' and share the full output of the command.

I do not yet have rclone installed; this is an exploratory question about whether or not rclone would be a good choice for my task (but I would use the most recent version of rclone if so).

Which cloud storage system are you using? (eg Google Drive)

Azure Blob Storage Containers (Premium ZRS)

Amazon S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Not applicable to this question, although if someone does have a sample of a command I could run, that would be very helpful.

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

Not applicable to this question.

A log from the command that you were trying to run with the -vv flag

Not applicable to this question.

welcome to the forum,

i would rent a cheap vm from aws, in the same region as the bucket.
then run rclone in that vm using something simple.
rclone move azure: aws: --no-traverse --no-check-dest --retries=1
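
as a rough sketch, the two remotes in that command could be set up something like this first - the remote names, storage account, key, region and container/bucket names below are just placeholders, not values from your setup:

# azure blob source, using shared key auth (a sas url would also work)
rclone config create azure azureblob account=MYSTORAGEACCOUNT key=MYSTORAGEKEY

# s3 destination, picking up credentials from the ec2 instance profile / environment
rclone config create aws s3 provider=AWS env_auth=true region=us-east-1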

and i am sure @kapitainsky will have some suggestions :wink:

Have a look at this GitHub issue:

Looks like with your data structure you are in a good position to do it using rclone.

Both source and destination are enterprise-class storage remotes, so you do not have to worry about artificial throttling etc.

I would simply start with the defaults and measure what speed you get. Then increase transfers and checkers if your network connection is not saturated - that should really be the only bottleneck to think about.
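
For example, something along these lines - the remote, container and bucket names are placeholders, and the numbers are just a starting point to experiment with, not a recommendation:

# first pass with the defaults plus progress output, to see what throughput you get
rclone copy azure:source-container aws:dest-bucket --no-traverse --no-check-dest -P

# if the connection is not saturated, raise concurrency and compare
rclone copy azure:source-container aws:dest-bucket --no-traverse --no-check-dest --transfers=32 --checkers=64 -P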

Thanks both! I'll rent an EC2 instance then and give it a go. Would you have any suggestions for what I should start out with for transfers, or are the default values generally a pretty good starting place?

Once I start running rclone, is it generally safe to stop the process, change the value for transfers and then restart it?

Sorry if that's a silly question; I'm totally new to rclone.

i think it is mostly about saturating the internet connection and minimizing api calls.
exactly how to get there, just follow @kapitainsky's approach.

yes, you can restart.
fwiw, for testing, i would pick an individual folder on the source and run rclone just on that, see the result, then tweak as needed.
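
for example, something like this on a single folder - the container, bucket and folder paths are placeholders:

rclone copy azure:source-container/some/folder aws:dest-bucket/some/folder --no-traverse --no-check-dest --retries=1 -P

watch the speed with -P, then bump --transfers / --checkers and rerun on a different folder to compare.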

and then, in the end, consider https://rclone.org/sponsor/