Need to transfer 200 TB from S3 to Google Drive

I need to transfer 200 TB from Amazon S3 to Google Drive.

What are my options here without going bankrupt?

Does GCE charge for downloading? I know uploading to Google Drive is free.

Ingress is free on GCE.

You'll still pay for the egress on S3 though :frowning:

Using --fast-list is a good idea for reducing transactions. For s3->drive, both backends support MD5 checksums, so I'd use --checksum for the transfer (best to avoid modtime checking on s3).
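For anyone scripting this, here is a minimal Python sketch that assembles an rclone copy command with the two flags mentioned above. The remote names `s3:my-bucket` and `gdrive:backup` are hypothetical placeholders, not something from this thread; substitute whatever you set up in `rclone config`.

```python
import shlex


def build_rclone_copy(src, dst):
    """Return the argument list for an rclone copy using --fast-list and --checksum."""
    return ["rclone", "copy", src, dst, "--fast-list", "--checksum"]


# Hypothetical remote names; replace with your own configured remotes.
cmd = build_rclone_copy("s3:my-bucket", "gdrive:backup")

# Print the command as a single shell-safe string for inspection.
print(shlex.join(cmd))
```

This only builds and prints the command. To actually start the transfer you could pass the list to `subprocess.run(cmd, check=True)`, assuming rclone is installed and on your PATH.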

200 TB x ~$8.5 = ~$1,700 USD

Ouch... I was wondering whether a snowball would be cost effective but it looks like they charge you the same costs for data out!

You don't need any more than a Google Cloud microinstance VM to run the transfer - which can be entirely free.

Google ingress and transfer within Google network is free.
There is not much to do about the egress fees on S3 though... I guess you just have to contact them and ask them what the cheapest option would be. Maybe they have another option when it comes to a one-time large-scale transfer like this. I am not really that familiar with S3, so I really can't say...

Another problem is that you will have a 750GB/day upload limit on Gdrive. While that is fairly generous in most practical use-cases, it's still going to take a long, long time to complete 200TB: about 267 days, to be specific. Not insurmountable - and the transfer can run automatically in the cloud VM - but it's still going to take some patience for sure...
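The 267-day figure is just the total size divided by the daily limit, rounded up. A quick sketch of that arithmetic, assuming 1 TB = 1000 GB and that the full 750 GB quota is used every single day (the function name is made up for illustration):

```python
import math


def days_to_upload(total_tb, daily_limit_gb=750, gb_per_tb=1000):
    """Days needed to upload total_tb when capped at daily_limit_gb per day."""
    total_gb = total_tb * gb_per_tb
    # Round up: a partial final day still counts as a day.
    return math.ceil(total_gb / daily_limit_gb)


print(days_to_upload(200))  # 267
```

In practice it will take somewhat longer, since any day where the transfer stalls or does not hit the full quota pushes the end date out.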

Do you know how to set up multiple public IPs on a single VM in GCE?
It's not for a load balancer, I just need 5 public IPs.

I have never done that myself, but I assume it is possible if you configure the machine with multiple virtual network cards. I am fairly sure a custom-config VM can do this, and GCP is quite flexible here. A VM like that, and multiple public IPs, are probably not within the free-usage tier though. That said, the actual cost of such a VM is pretty trivial, and besides, Google gives all new users a $300 credit / 1 year free trial which you can use to play around with. You can do a lot with those free credits.

It would help if you told me what you were trying to accomplish though, because there is a good chance there is a simpler way to achieve the actual end-goal. (more public IPs wouldn't have any relevance to upload limits anyway)

I want to download large amounts of data from a source without getting flagged, by splitting the download into 5 chunks, each one using a different IP.

Given the limitations of Gdrive in terms of data uploaded per day, and that GCE will charge per IP address, use one IP at a time and switch the IP as needed.

Are you talking about an Amazon S3 download limitation here? If so, you should specify exactly what it is, and then maybe I can suggest the best approach.

Or are you talking about uploading to Google?
If so I think you have misunderstood how the limitation works.

Pertaining to Google as the destination:
From a network perspective you are allowed to ingest (transfer into the Google network) as much data as you want for free, so that is not a problem at all. Therefore there is no need to bother with rotating IPs because there is no limit based on the IP you use.

As for actually storing the data once it enters the network (i.e. getting data into Gdrive) there is a 750GB/day limit - and this limit is per Google user. That is the only major limiting factor.
Out of respect to Nick and the well-being of the project I cannot discuss on this forum how to exploit or willfully break the TOS.

Besides - if you did that it would probably be pretty obvious if 5 user limits' worth of data were transferring daily for 2 months into the same Gdrive storage. Maybe Google wouldn't care, but I wouldn't bet on them not being able to see it. And would you really want to give them a (valid) reason to terminate an account with that much data on it?
