I'm using Google Compute Engine to download large amounts of data (usenet - from outside of the GCP infrastructure) and then upload to an existing encrypted remote on Google Drive. I've been doing this from my 100Mbit home server, but I'm trying out Compute Engine since there's much higher bandwidth.
Once the data is downloaded, it ought to be a pretty quick transfer from the VM to Drive, but I'm maxing out at around 25-30MB/s. The total amount transferred is under the daily cap, but I want to do it as fast as possible so I can power down my instance when I don't need it. I get the same speeds whether I use rclone copy or mount the remote and use rsync to do the transfer.
There are also possible bottlenecks in my VM instance, so there's a lot of tuning here and I'm not quite sure where to start.
Well first of all you don't have to download data and THEN upload it.
You can just copy from one remote to another - no temporary storage is needed on the GCP VM.
rclone copy Remote1:/Foldername Remote2:/foldername
or just sync or move if that's more appropriate obviously
Secondly, to increase upload performance specifically, I would suggest increasing --drive-chunk-size to 64M, but be aware this uses up to 64M of memory per transfer, so it's perhaps something you should lower if you are on a minimal VM. This has big benefits on home connections, but on "endless" bandwidth like on GCP it may have a much smaller impact.
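To see how the memory adds up: each in-flight transfer can buffer one full chunk, so the worst case is roughly chunk size times transfer count. A quick sketch (the chunk size and transfer count here are just illustrative values, not measurements):

```shell
# Worst-case RAM used for upload buffers: one chunk per in-flight transfer.
chunk_mb=64        # --drive-chunk-size 64M
transfers=4        # rclone's default --transfers
echo "up to $((chunk_mb * transfers)) MB of buffer RAM"
```

So even rclone's default 4 transfers at 64M chunks can want a quarter gigabyte, which matters on a micro instance.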
This only really benefits large files though. If your problem is that you have tons of smaller files, then no setting can really help you much, as Google Drive will only allow you to start about 2 transfers per second. That's a server-side limit we can't control (unless you archive files together into single bigger files).
--transfers 10 might help a small bit with many smaller files, but it can't override the 2/second transfer limit. At best it will help you keep your performance as close to that limit as possible.
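The archiving workaround mentioned above can be as simple as tar-ing the staging directory before uploading, so Drive only sees one new file instead of thousands. A rough sketch (the paths are made up for the example):

```shell
# Bundle many small files into one archive so Drive sees a single new file.
# /tmp/staging stands in for your real download/staging directory.
mkdir -p /tmp/staging
touch /tmp/staging/a.txt /tmp/staging/b.txt
tar -cf /tmp/smallfiles.tar -C /tmp/staging .
# then upload the single archive instead of the individual files, e.g.:
# rclone copy /tmp/smallfiles.tar Remote2:/foldername
tar -tf /tmp/smallfiles.tar
```

The trade-off is that you lose per-file access on the Drive side, so it only makes sense for data you treat as a unit anyway.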
I clarified the original post - my source data isn't in the GCP infrastructure, it's from the broader interwebs (usenet). So I do need a staging area, most likely, so as to avoid thrashing Google Drive's API quota. Currently I'm using a SSD Persistent Disk as my staging area for the high I/O.
Running htop, I'm only at 12-25% CPU usage during rclone copy operations, and VM is only at 192M of total RAM usage, so that's not my bottleneck.
Ok so data goes:
usenet --> GCP --> Gdrive ?
I don't know what options you have to access usenet. If you can access it via for example FTP or SFTP or a simple HTTP interface then rclone could connect directly to it. Otherwise then I suppose you may have to download it first. I'm not well versed with usenet.
And do consider that usenet might also have limits on its speed.
The limiting factors involved here would be:
- The Usenet source
- The amount of files (the 2/sec transfer limit on the Gdrive)
- The bandwidth on the GCP VM (I think this is typically 1.2Gbit per CPU)
- For downloads to GCP specifically, the speed of the VM's disk, which is mostly determined by the size you set for it
I can't say for sure which of these are specifically limiting you though, as that is down to the details that are unknown to me.
EDIT: Assuming you already have some data on GCP to transfer - the first and last on the list would be irrelevant. That is only relevant for getting the data to GCP in the first place (either directly to rclone or via a temp-download).
For tuning/testing I would suggest you test with a single very large file (to remove the Gdrive 2/sec limit from the equation, since you can't do anything about that anyway).
But do set that chunk-size up as suggested. It might have a decent impact. If you have the memory for it, go as high as 128M or even 256M, as long as you don't exceed what the VM has available, or else rclone will crash. Again, don't forget that memory is used per transfer.
This will help a lot with actually utilizing the bandwidth you have. Otherwise with small chunks the connection needs to stop and start a lot - and TCP needs a bit of time to "spool up" so on the default 5M it does this a lot and as the bandwidth gets higher this means more and more % of the time it's not transferring at full speed. Higher speeds do necessitate larger chunks. 64M is a sweet spot but technically larger can be marginally better if you can afford it.
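To put rough numbers on that spool-up effect, here is the chunk-count math for a 1 GiB file (the file size is just an example):

```shell
# How many chunked-upload requests (and TCP ramp-ups) a 1 GiB file needs:
file_mb=1024
echo "at 5M chunks:  $(( (file_mb + 4) / 5 )) requests"
echo "at 64M chunks: $(( file_mb / 64 )) requests"
```

Over 200 request start/stops at the default versus 16 at 64M is why the larger chunk size keeps the pipe full so much better.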
I'm using NZBGet, downloads are around 35MB/s which I'm happy with. It's more so that I was expecting to be able to move data from GCP to Drive with at least the same speed as usenet --> GCP, hoping for much faster.
I've been getting the configuration set up on an f1-micro instance since I don't need performance for testing, but I'm looking at 2Gbit / vCPU, and the f1-micro has 0.2 vCPUs. Theoretical maximum is about 50 MB/s.
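The arithmetic behind that 50 MB/s estimate, assuming the ~2Gbit-per-vCPU figure holds:

```shell
# 2 Gbit/s per vCPU * 0.2 vCPU = 0.4 Gbit/s = 400 Mbit/s; divide by 8 bits/byte:
echo "$(( 2000 * 2 / 10 / 8 )) MB/s theoretical max"
```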
I changed my config and set drive-chunk-size to 32M and got a nearly instant 20 MB/s boost in copy performance from GCP to Drive. I'm doing my testing with an ubuntu iso which roughly simulates my actual transfers.
Yes, a combo of the drive-chunks and a bit more transfers will help you.
But a micro is limited as you say, so eventually you will hit that BW limit.
Do note that with TCP you can expect around 10-15% overhead (and likely a couple of % from the HTTP API too). So if the theoretical max was 50MB/sec, then 40-45MB/sec is about the highest practical number you can expect to reach, I think. Beyond that you would need a VM with higher specs.
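Applying that overhead range to the 50MB/sec figure:

```shell
# 10-15% protocol overhead taken off a 50 MB/s theoretical ceiling:
awk 'BEGIN { printf "%.1f to %.1f MB/s practical\n", 50 * 0.85, 50 * 0.90 }'
```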
You probably have a quite limited memory then for chunks. 32M is decent too. It doesn't have to double if you want to fine-tune it, but it must be:
"Must a power of 2 >= 256k."
Since more transfers will mean more RAM usage too when you increase chunk size, the "best" combination there would depend on how large the average file is.
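That power-of-2 rule is easy to sanity-check with the usual bit trick (a power of 2 ANDed with itself minus one is zero). A quick shell sketch, working in KiB:

```shell
# Valid chunk size per the docs: a power of 2 that is >= 256k.
is_valid_chunk() {
  local kib=$1
  [ "$kib" -ge 256 ] && [ $(( kib & (kib - 1) )) -eq 0 ]
}
is_valid_chunk $(( 32 * 1024 )) && echo "32M is valid"
is_valid_chunk $(( 48 * 1024 )) || echo "48M is not (not a power of 2)"
```

So 32M, 64M, 128M are all fine, but an in-between value like 48M is not.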
At 64M I'm hitting the practical limit of the bandwidth and maxing out at about 43 MB/s.
I'll probably only run the instance for maybe 6 or so hours a week, especially considering the virtually unlimited bandwidth. My home server delivers the content from Drive, so GCP is only really being used for the extremely fast connection and free egress to Drive. I'll see if it's worth a few dollars a month.
Cool, this sounds about right
Bumped up to a beefier machine type to test things out, I'm getting over 100 MB/s from usenet now. Also this is what my rclone copy to Drive looks like:
Transferred: 39.622G / 86.447 GBytes, 46%, 135.545 MBytes/s, ETA 5m53s
Following up on this, it seems like I'm maxing out at around 42-43 MB/s (per file) to Drive on an rclone copy operation no matter how hefty the machine is. I've settled on a n1-highcpu-4 machine with a 500GB pd-ssd, which has a theoretical write limit of 173 MB/s (a bit higher reads). I'm getting that full speed from usenet, so the machine itself can handle 173 MB/s writes / reads.
Is there some sort of speed limit that the Drive API itself is imposing? Has anybody gotten faster transfer speeds than me? I'm not sure if there's anything else I can tune.
Yeah, I max out my gigabit when I have 3-4 uploads going:
Transferred: 96.800G / 471.619 GBytes, 21%, 96.237 MBytes/s, ETA 1h6m28s
Transferred: 102.458G / 471.622 GBytes, 22%, 96.256 MBytes/s, ETA 1h5m27s
Transferred: 108.082G / 471.622 GBytes, 23%, 96.241 MBytes/s, ETA 1h4m28s
Transferred: 113.737G / 471.625 GBytes, 24%, 96.254 MBytes/s, ETA 1h3m27s
Transferred: 119.428G / 471.626 GBytes, 25%, 96.296 MBytes/s, ETA 1h2m25s
Transferred: 125.121G / 471.626 GBytes, 27%, 96.335 MBytes/s, ETA 1h1m23s
Transferred: 130.800G / 471.629 GBytes, 28%, 96.360 MBytes/s, ETA 1h21s
My upload is straightforward:
# Move older local files to the cloud
/usr/bin/rclone move /local/ gcrypt: --log-file /opt/rclone/logs/upload.log -v --exclude-from /opt/rclone/scripts/excludes --delete-empty-src-dirs --user-agent animosityapp --fast-list --max-transfer 700G
and I use a chunk size of 128M in my config.
Your move command is essentially the same as mine, with 128M chunk size too. I suppose I should clarify that if I'm transferring multiple files, then I transfer faster than 40 MB/s in total, but per-file I can't get faster than that.
rclone move /pd/dl/done gcrypt_username:/media/content -P -v --exclude "/*unpack*/**" --ignore-case --no-traverse --delete-empty-src-dirs
Don't know. My connection isn't anywhere near 40 MB/sec, so it's not something I can even test.
Wouldn't be terribly surprised if there is something on the Google backend that sets a natural limit for what a single transfer can max out at.
But this said, it seems like quite the luxury-problem when you can bypass even that speed with just a few concurrent transfers.
If you actually consider this to be a problem, contact Google and pray they actually give you a concise answer I guess...
Some stones are better left unturned.
Comparing Usenet to a Cloud Storage provider is kind of like comparing a buffalo to a horse.
The way you tend to get speed on Usenet is by having many connections (8-12), so it threads and gives you the speed when downloading from Usenet.
You are uploading, which is not multi-threaded but a single thread.
A multi-threaded download from rclone easily maxes my gigabit.
In my experimenting I maxed GCE out at 200-210MB/s from usenet providers (I have several) to locally mounted SSDs, but "only" 30MB/s (~300mbps) to Gdrive (single file transfer)
AFAIK GCE limits bandwidth depending on the number of cores you select, so keep that in mind. I think it's 2Gbps per core, if they didn't change anything recently.
GCE bandwidth depends on vCPU count, block storage type, and block storage size. In this case though the limiting factor was just the single file transfer to Drive, the link just wasn't that fast (comparatively).