I'm trying to migrate a large amount of data from Amazon S3 to DigitalOcean Spaces using rclone. The issue I'm facing is that rclone first downloads the data from S3 to local storage, then uploads it to Spaces. That isn't practical with 100+ TB of data.
I want to make use of server-side copy to avoid this unnecessary step, but I can't seem to find a way to enable it between two different hosting providers.
Command I'm using:
rclone sync s3:origin/folder-1/ spaces:dest/folder-1/
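As a sanity check before committing to a 100+ TB run, the same sync can be previewed first (remote and path names here are the ones from the command above; `--dry-run` and `-P` are standard rclone flags):

```shell
# Preview what would be transferred without copying anything.
rclone sync s3:origin/folder-1/ spaces:dest/folder-1/ --dry-run

# Then run for real with live progress reporting.
rclone sync s3:origin/folder-1/ spaces:dest/folder-1/ -P
```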
- os/arch: darwin/amd64
- go version: go1.13.1
We're using several buckets in Spaces with a hash ring to map resources across them, since Spaces tends to get slower if you have too many objects in one Space. We're moving for a few specific reasons, but the bottom line is that DO is cheaper for us.
I'll be in touch with the DO support team, but I wanted to see if I can get some good advice in this forum as well. Thanks for sharing your thoughts!
I do not think it is possible to do a server-side transfer between fundamentally different providers. You can do it between Google services, for example, but not in a case like this (I'm fairly certain, but won't say 100% because I've never had to do this personally).
So in short you need a middle-man, but it doesn't have to be your local system.
For mass transfers like this, a GCP virtual machine is often used (especially for transfers to Google services, as ingest there is free). The main benefit of running the rclone transfer on a virtual server is access to essentially unlimited bandwidth. rclone will not need to save any data locally, so a very minimal system (like a GCP micro instance) will suffice, but it will need a lot of that bandwidth. For reference, a micro instance can do about 40-45 MB/s, but a more powerful VM could go much faster given the right configuration.
You should definitely check whether there are any limits on ingest rate (i.e. GB per day uploaded) on whatever service you are migrating to, because if there are, you will no doubt hit them very fast. Not all paid premium services (especially the pay-per-use types) have limits like that, but some do.
EDIT: I am not so familiar with DigitalOcean, but I can't identify any obvious limit like the one I described. There is a small extra cost for additional storage, but I assume you are already aware of this.
Assuming you don't have limits to worry about, the external-VM approach is well suited for the job, and it doesn't have to cost much at all if you choose wisely. GCP is great, IMO, as you pay as you go by resources used. You do pay something for traffic exported out of Google, but it's peanuts, so even 100 TB as a one-time transfer is not really a problem; have a look at the price sheet if interested. (There's also a one-year trial you can activate for 300 USD worth of credits to test the system, which would be far more than you'd need here, I expect.)
Hey, the point I am making is that there is no need for an external middle-man. No need to use Google:
- Get a VPS from DO.
- Mount the DO storage inside the DO VPS.
- Run rclone inside the DO VPS, copying the Amazon S3 data to the mounted DO storage.
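The steps above can be sketched roughly as follows (remote names match those used earlier in the thread; note that rclone can also talk to both remotes directly over the network, without mounting the storage, since it streams the data through memory rather than writing it to disk):

```shell
# On a fresh DO droplet: install rclone via the official install script.
curl https://rclone.org/install.sh | sudo bash

# Configure the two remotes interactively; the names "s3" and "spaces"
# are assumed here to match the original command in the thread.
rclone config

# Copy straight from S3 to Spaces with progress reporting.
rclone sync s3:origin/folder-1/ spaces:dest/folder-1/ -P
```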
Sorry, I didn't catch that DO also has VMs. In that case, yes, that is the obvious choice here (I see their ingest is free as well), and that VM will obviously be on the same core network as the storage.
Sorry Monkeyman, my bad for not reading your answer properly before adding mine.
for a demon to apologize to a lowly monkeyman, me so proud and honored.
Thank you both! I'll try spinning up a VM on DO and go from there.
The best way of doing the transfer is with rclone sync as you've got it.
For extra speed, add --checksum, since both ends support MD5 checksums (otherwise you'll use extra transactions reading the modtime), and --fast-list, provided your VPS has enough memory to hold the entire directory listing. You can also increase the parallelism with --transfers 64 if you have lots of small files; both ends can take it.
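Putting those suggestions together, the tuned command might look like this (paths are the ones from the original post; all three flags are standard rclone options):

```shell
# Sync with checksum comparison, in-memory recursive listing,
# and 64 parallel transfers for many small files.
rclone sync s3:origin/folder-1/ spaces:dest/folder-1/ \
  --checksum \
  --fast-list \
  --transfers 64 \
  -P
```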