Time estimation for migration

Hello,

I am trying to find out if there is a way to estimate the time for a migration from Source 1 to Source 2, given that I know the amount of data I'll be transferring.

Another question: are there any recommended specs for the machine that rclone needs to run on?

Thanks!

hello and welcome to the forum,

pick a subset of files that represents the overall data and transfer that using --progress
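
for example, something like this (remote and path names here are placeholders, not from this thread):

    rclone copy source:path/subset dest:path/subset --progress

--progress prints the live transfer rate, which you can then scale up to the full data size.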

basically, rclone does not use much memory or cpu, and uses zero disk space.

For the CPU, I beg to slightly differ IF you have a high-bandwidth machine doing the transfer (capable of something like 1Gb/s down and 1Gb/s up) and the source and destination remotes are crypted. rclone will hammer one core if you do this in one job.

Yes

If you have 100 GB to "migrate" and a 1Gb connection, it won't be faster than about 1000 s.
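
The arithmetic behind that figure:

    100 GB = 800 Gb
    800 Gb / 1 Gb/s = 800 s (~13 min)

Allowing for protocol overhead and the link not running at full rate, call it ~1000 s.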

There are too many variables, and you provided too few details, to give you a more sensible answer :)

If these are two remote directories, like Google Drive, Dropbox, or even just separate NASes, then the limiting factor will be your network speed.
If your source and destination directories are on the same machine, then disk IO is going to be your limiting factor.

You could use rclone copy with the -P flag and a small subset of your data. That will show your transfer rate as something like 64.388 MiB/s. You can then use a download time calculator to estimate the time based on the total amount of data you need to transfer.
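
The extrapolation itself is simple (using the example rate above):

    time (s) = total size (MiB) / observed rate (MiB/s)
    e.g. 1 TiB = 1,048,576 MiB, so 1,048,576 / 64.388 ≈ 16,285 s ≈ 4.5 hours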

FWIW, I just transferred 5.8TiB of data from Google to Dropbox using a Linode cloud server with a 1Gbps connection, and it took 27hrs and 13 minutes.
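
For scale, that works out to roughly 62 MiB/s sustained (5.8 TiB ≈ 6,081,741 MiB over 27 h 13 min ≈ 97,980 s) - about half the theoretical ~119 MiB/s of a 1Gbps link.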

Thank you, these are some good points!!
@H4v0k Thanks for your numbers, they give me a useful estimate!

To add more details to my query, I have to transfer about 10TB of data from Akamai NetStorage to Amazon S3.

I don't have a lot of insight into Akamai NetStorage currently, hence the issue. Based on their documentation, they support 1000 reads/sec and 90 concurrent connections.

I was wondering, if I have to make this faster, do I need to use 2-4 machines running rclone?

as mentioned above, you need to run a test on a subset of files.
based on the results, we can tweak the command.

Two first-world providers :)

It should fly given you have a fast internet connection. rclone will not be the bottleneck here.

You can, like @H4v0k, use a temporary server with a fast two-way connection (Linode, DigitalOcean, or why not Amazon).

10TB with a 1Gb connection will take no longer than 2 days max. I would not overcomplicate it with multiple machines unless, for whatever reason, you have to do it much faster.

I would do quick test runs with different combinations of flags,

e.g.

 --checkers 16 --transfers 8
 --checkers 32 --transfers 16
 --checkers 32 --transfers 24

to see what works best for your overall setup.
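
A full test command might look like this (remote names are placeholders for your configured remotes):

    rclone copy netstorage:path s3:bucket/path --checkers 32 --transfers 16 -P

-P shows live throughput, so you can compare the runs directly.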

Yeah, I know; however, I don't have access to NetStorage yet and, unfortunately, I have to give them an estimate before that :confused:
So I was trying to figure out if there was a way, or if using a machine with specific specs could help me figure this out.

ahh ok - in this case I would use my connection speed as a rough base for estimation.

multiply the time by 3 to leave some margin.
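
As a rough sketch with the numbers from this thread (assuming 10 TB over a 1 Gb/s connection):

    10 TB = 80,000 Gb
    80,000 Gb / 1 Gb/s = 80,000 s ≈ 22 h theoretical minimum
    x3 margin ≈ 67 h, so quote about 3 days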

Mention that it is all based on assumptions and throw in some caveats.

You should finish the real job faster and everybody will be happy.

Thanks, this helps! Let's say I am using EC2 (it's not my forte); which machine do I go with to get a 1Gbps connection?

Unless you run it on a Raspberry Pi, any modern computer should handle it without issues - even with 10Gb internet. Only moving to much higher bandwidths would require more hardware planning. I assume you do not have 10Gb+ internet.

Thanks!! This will help me create some preliminary estimates, and then I can make an actual estimate when I get access.

Sorry, I do not have it off the top of my head now :) But as you need it for less than 48h, do not penny-pinch too much. Better to have something that can rock. I would use 8+ cores and 32+ GB RAM just to sleep well knowing that it will handle all the traffic.

What you should check is the networking cost for 10TB of traffic (assume 12TB to leave some margin for testing).

well, that is a big problem, no way to offer an accurate estimate.

as this is a one-time transfer, no need to be super cheap with the vm specs; the cost difference will not matter much.
this should be more than enough but, as per @kapitainsky, make sure you can sleep.

note: for free ingress into Amazon, the cloud vm must be in the same region as the bucket.
as for egress from Akamai, I have no idea.

https://calculator.aws/#/addService/ec2-enhancement

Good to know, thanks!

Yes, I will look at the EC2 pricing for this estimate.

Thanks, I was trying to find this. Does this mean there will be no egress cost for EC2 when transferring to S3? I know for S3 it will be free.

Thanks!

I have just seen this posted:

Interesting ideas on how to tweak S3 connections - see "Optional Rclone Settings to Improve Performance".

Might be worth trying a few different options and using the fastest one.
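
As an example of what to benchmark on the S3 side (the values here are a starting point to test, not a recommendation):

    rclone copy netstorage:path s3:bucket/path -P \
        --transfers 16 \
        --s3-upload-concurrency 8 \
        --s3-chunk-size 64M

--s3-chunk-size and --s3-upload-concurrency control multipart uploads to S3; higher values speed up large files but use more memory per transfer.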

@spr0810 - when you are done with this transfer, let us know how long it took. Always useful to hear some real-life experience.

Thanks, sure, I will take a look!

Cool, good to know, I definitely will!

Definitely, thanks for all the useful tips; I will let the community know when I'm done with this migration.

I have one more question;
Looking at the documentation for NetStorage, it says -
"This will effectively allow commands like copy/copyto, move/moveto and sync to upload from local to remote and download from remote to local directories with symlinks."

Can I not directly copy from remote server 1 (NetStorage) to remote server 2 (S3)?
The NetStorage page only refers to copying to local directories.
Also, I didn't understand the concept of symlinks :confused: