Best way to maximize 100gbps GCP gVNIC?

What is the problem you are having with rclone?

I want to perform Cloud-to-Cloud transfers using rclone that maximize the bandwidth I can get (100gbps)

I am currently seeing only 2.2gbps speeds when doing a GCS to crypt(GDrive) transfer. Even Ookla's speedtest CLI reports higher (albeit not close to 100gbps)

Run the command 'rclone version' and share the full output of the command.

rclone v1.53.3-DEV
- os/arch: linux/amd64
- go version: go1.18

Which cloud storage system are you using? (eg Google Drive)

rclone-crypt(Google Drive), GCS, rclone-crypt(Box) and Dropbox

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone -v copy GCS:takeout-export-HASH crypt-gdrive: --drive-chunk-size 512M

A log from the command with the -vv flag

2022/09/04 23:24:50 INFO  : 
Transferred:   	   16.008G / 17.252 TBytes, 0%, 273.852 MBytes/s, ETA 18h19m58s
Checks:                 4 / 4, 100%
Transferred:            4 / 5480, 0%
Elapsed time:       1m5.7s
Transferring:
 * tZ/1.zip: 87% /3.475G, 53.395M/s, 8s
 * tZ/2.zip: 20% /3.993G, 85.328M/s, 38s
 * tZ/3.zip: 16% /3.993G, 124.827M/s, 27s
 * tZ/4.zip:  2% /2.270G, 31.818M/s, 1m11s

Adding --transfers 16 and changing to --drive-chunk-size 1G also keeps it limited to 280MBytes/s

Is there a way to see if the source (GCS) side is slow or if the destination (rclone-crypt(GDrive)) is slow? Which one is the bottleneck?
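
For anyone reconciling the units here: the log reports MBytes/s while the target is quoted in gbps. Below is a minimal sketch of the conversion, assuming rclone's MBytes/s is binary (MiB/s) and that gbps means 10^9 bits per second. The function names are made up for illustration.

```shell
# Convert a rate in MiB/s to Gbit/s.
# Assumption: 1 MiB = 1048576 bytes and 1 Gbit = 1e9 bits.
mib_to_gbit() {
  awk -v mib="$1" 'BEGIN { printf "%.2f\n", mib * 1048576 * 8 / 1e9 }'
}

# Inverse direction: a link speed in Gbit/s expressed in MiB/s.
gbit_to_mib() {
  awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1e9 / 8 / 1048576 }'
}

mib_to_gbit 273.852   # the rate from the log above
gbit_to_mib 100       # the nominal gVNIC line rate
```

Under those assumptions 273.852 MiB/s is roughly 2.3 Gbit/s, and a full 100 Gbit/s link would correspond to about 11,900 MiB/s (about 11.6 GiB/s).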

That's a very old version so updating would be good.

What's the current bottleneck you are seeing? CPU?

I can't figure out what remote is what as you didn't share a rclone.conf.

To my understanding, you'll be paying egress out to dropbox so you'll get quite a bill if you transfer large amounts of data out.

Thanks for looking into this!

Version updated.

$rclone --version
rclone v1.59.1
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-1017-gcp (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.5
- go/linking: static
- go/tags: none

Here's the config (I removed Dropbox to make this easier to debug for now):

~$ rclone config show
Enter configuration password:
password:

[GCSTakeout]
type = google cloud storage
object_acl = authenticatedRead
token = 

[GDrive]
type = drive
client_id = x.apps.googleusercontent.com
client_secret = 
token = 
team_drive = x
root_folder_id = 

[rGDrive]
type = crypt
remote = Gdrive:folderpath
password = x
password2 = x-x

The latest command I've tried:

rclone -v copy GCSTakeout:takeout-export-HASH rGDrive: --drive-chunk-size 1G --transfers 16

It's not CPU bound afaict: usage stays under 10% on the GCP dashboard as well as in top.

I find that when I do two separate rclone copy commands (for different source buckets), I am able to get BW utilization to 600MiB/s (vs ~350MiB/s for just one run)

Is there a way to test the max bandwidth of a source or destination?

Are you maxing out one core maybe?

Usual things that cause bottlenecks

  • hashing - try --size-only - I don't think that is the case here
  • crypt - this uses a significant amount of CPU - maybe this is maxing out one core?

You can profile rclone's CPU usage with the --rc flag and the hints in the rc docs.

You can also run rclone with the --cpuprofile flag which will generate a file you can examine with go tool pprof.

  --cpuprofile string   Write cpu profile to file

Is it just a single file you are copying?

Like a speedtest command for rclone? Nice idea...

could have something like

rclone test speed remote:dir

and it would upload files of increasing sizes and download them again to check the speed.

I sometimes do a quick non-scientific download measurement like this:

rclone check --download myRemote:folder/with/large/files myRemote:folder/with/large/files --progress

It is a hack that will download and binary compare files from the same remote folder without hitting local disk.

Note: rclone check performs the download using the --checkers. You may need to increase/double them until you reach max speed.

Thanks @Ole and @ncw for taking the time to look at my issue!

  1. For testing if I'm CPU bound, I bypassed the crypt completely and had rclone upload 'directly' to GDrive with
    $ rclone -v copy GCSTakeout:takeout-export-HASH GDrive:/folder/ --drive-chunk-size 1G --transfers 16 --checkers 16 --progress
    (same config as above. Note that I'm using GDrive as the destination here, not rGDrive)

CPU usage is still low. It's 3~4% instead of the 6~10% with crypt.
Bandwidth utilization is still at ~280MiB/s though. I even threw in --checkers here but that didn't help either.

The transfer is of ~4000 files of 10+ GiB each (with that one folder being about ~16TiB)

Yep! But instead of random Ookla servers or M-Lab servers (for NDT) this would be the actual servers from the Cloud service you care about.

I did another experiment where I ran multiple rclone processes with the same copy command but to different folders. After 6 separate processes started (multiple tmux windows), I was able to hit a spike of 2GiB/s (and 15% CPU Utilization) for a brief moment before it came down to 1.8GiB/s

What is it about parallel processes/launches that rclone can't natively parallelize?
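
The multi-process experiment described above can be sketched as a small script instead of separate tmux windows. This is only an illustration under assumptions: the folder names passed in are hypothetical placeholders, the flags are the ones from the earlier copy command, and RUN defaults to echo so the script prints the commands rather than executing them.

```shell
# Dry run by default: RUN=echo prints each command instead of running it.
# Set RUN to an empty string to really launch rclone.
RUN="${RUN-echo}"
SRC="GCSTakeout:takeout-export-HASH"   # source bucket from the thread
DST="rGDrive:"                         # crypt remote from the thread

# Start one background rclone process per folder, then wait for all of them.
launch_copies() {
  for folder in "$@"; do
    $RUN rclone copy "$SRC/$folder" "$DST$folder" \
      --drive-chunk-size 1G --transfers 16 &
  done
  wait
}

# tA and tB are hypothetical folder names.
launch_copies tA tB
```

Because every process runs in the background, the order of their output is not guaranteed. Each process has its own HTTP client and connections, which is one plausible reason the aggregate rate scales with the number of processes.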

To some degree, this idea of maxing out at 100gbps is not that academic: 25gbps connections are showing up in some home internet packages now. I would love to debug this as rclone is such a great piece of software in general.

This is a cool idea @Ole!

I did this across 16 synchronized panes of tmux:

rclone check --download GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress

I was able to get to ~6GiB/s with ~50% CPU Utilization.

Running the same command as a single process still gives me ~300MiB/s.

You are right, it seems like something in rclone (including Go runtime, https support, etc.) is limiting your throughput. The big question is whether this is something configurable in rclone (or Go) or due to the design/architecture.

I think the first step is to check if there are any issues related to throttling or errors in the communication, so I suggest you try with debug output like this:

rclone check --download --log-file=check_download.log --log-level=DEBUG GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress

Next grep the log for "pacer:" - do you find any hits? Anything suspicious in the log?
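
A minimal sketch of that log check, assuming the log file name from the command above. The helper name count_pacer_hits is made up for illustration.

```shell
# Count the log lines that mention the pacer.
# grep -c prints the number of matching lines; it exits non-zero when
# there are none, so "|| true" keeps the function from failing in that case.
count_pacer_hits() {
  grep -c 'pacer:' "$1" || true
}

# Only inspect the log if the debug run has actually produced it.
if [ -f check_download.log ]; then
  count_pacer_hits check_download.log
fi
```

A count of 0 means no pacer messages were logged during the run; any other count is a hint to look at those lines for rate limiting or retries.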

Next, I am curious to understand just how many transfers are needed to max out the download speed. Can you tell us the rough download speed for each of these commands:

rclone check --download --checkers=4 GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress
rclone check --download --checkers=8 GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress
rclone check --download --checkers=16 GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress
rclone check --download --checkers=32 GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress

and the effect of some of the other tuning parameters:

rclone check --download --buffer-size=256M GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress
rclone check --download --use-mmap GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress
rclone check --download --multi-thread-streams=8 GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress
rclone check --download --disable-http2 GCSTakeout:takeout-export-HASH GCSTakeout:takeout-export-HASH --progress

The selected tuning parameters are inspired by this list of flags and the documentation.
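
The checkers sweep above could also be scripted. A rough sketch, assuming the same remote name. RUN defaults to echo so nothing is executed unless you clear it, and sweep_checkers is a made-up helper name.

```shell
# Dry run by default: RUN=echo prints each command instead of running it.
RUN="${RUN-echo}"
REMOTE="GCSTakeout:takeout-export-HASH"

# Run the same download check once per --checkers value, one after another.
sweep_checkers() {
  for n in "$@"; do
    $RUN rclone check --download --checkers="$n" "$REMOTE" "$REMOTE" --progress
  done
}

sweep_checkers 4 8 16 32
```

On a bucket this size each run would take a long time to finish, so in practice you would note the rate once it settles and interrupt the run before moving on to the next value.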

Do you know any tools that are able to perform faster downloads from GCS?

Sorry for the delay. These tests took time to run but I made some progress:
rclone check --download --log-file=check_download.log --log-level=DEBUG GCSTakeout:bucket GCSTakeout:bucket --progress
I checked the log file for pacer and found nothing.
Also, I did the same command with GDrive and still found nothing.

Here are the bandwidth numbers in MiB/s:

| Command | MiB/s |
| --- | --- |
| rclone check --download --checkers=4 GCSTakeout:bucket GCSTakeout:bucket --progress | 303 |
| rclone check --download --checkers=8 GCSTakeout:bucket GCSTakeout:bucket --progress | 275 |
| rclone check --download --checkers=16 GCSTakeout:bucket GCSTakeout:bucket --progress | 280 |
| rclone check --download --checkers=32 GCSTakeout:bucket GCSTakeout:bucket --progress | 280 |
| rclone check --download --buffer-size=256M GCSTakeout:bucket GCSTakeout:bucket --progress | 360 |
| rclone check --download --use-mmap GCSTakeout:bucket GCSTakeout:bucket --progress | 355 |
| rclone check --download --multi-thread-streams=8 GCSTakeout:bucket GCSTakeout:bucket --progress | 353 |
| rclone check --download --disable-http2 GCSTakeout:bucket GCSTakeout:bucket --progress | 2158 |
| ^ scaled to 16 via tmux | 6461 |
| rclone check --download --disable-http2 --checkers=32 --buffer-size=8G GCSTakeout:bucket GCSTakeout:bucket --progress | 6400 |
| rclone check --download --disable-http2 --checkers=32 --buffer-size=128M GCSTakeout:bucket GCSTakeout:bucket --progress | 6553 |
| rclone check --download --disable-http2 --checkers=128 --buffer-size=1G GCSTakeout:bucket GCSTakeout:bucket --progress | 5632 |
| rclone check --download --disable-http2 --checkers=64 --buffer-size=1G GCSTakeout:bucket GCSTakeout:bucket --progress | 5836 |
| ^ scaled to 16 via tmux | 6144 |

As you can see, --disable-http2 really unlocked a bottleneck, and increasing checkers on top of it helped further.

I also tried the same commands with GDrive instead of GCSTakeout. Here's the table, with CPU utilization and speed in MiB/s

|Command|CPU|MiB/s|
|---|---|---|
|rclone check --download --checkers=4 GDrive:folder GDrive:folder --progress|1%|175|
|rclone check --download --checkers=8 GDrive:folder GDrive:folder --progress|1.8%|370|
|rclone check --download --checkers=16 GDrive:folder GDrive:folder --progress|2.5%|710|
|rclone check --download --checkers=32 GDrive:folder GDrive:folder --progress|2.8%|1400|
|rclone check --download --checkers=128 GDrive:folder GDrive:folder --progress|26%|5632|
|rclone check --download --checkers=512 GDrive:folder GDrive:folder --progress|26%|5632|

Seems like here, checkers were the only thing that unlocked the bottleneck.

Again, can't get above ~6GiB/s
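
To put that ceiling in context, here is a small sketch that expresses a measured rate as a share of the nominal link speed. It assumes the figures are MiB/s of file payload and that the link is 100 Gbit/s with 1 Gbit = 10^9 bits. The helper name pct_of_link is made up.

```shell
# Share of the link used by a measured payload rate.
# $1 = measured rate in MiB/s, $2 = link speed in Gbit/s
pct_of_link() {
  awk -v mib="$1" -v link="$2" \
    'BEGIN { printf "%.1f\n", (mib * 1048576 * 8 / 1e9) / link * 100 }'
}

pct_of_link 6461 100   # best GCS result from the table
pct_of_link 5632 100   # best GDrive result from the table
```

Under those assumptions the best runs use a bit more than half and a bit less than half of the link respectively, counting file payload only.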

But we've made a lot of progress!

I tried a new copy command:
rclone -v copy --disable-http2 --checkers=32 GCSTakeout:bucket rGDrive:folder --drive-chunk-size=1G --transfers=32 --progress
and was able to get 2GiB/s at ~17% CPU Utilization.

Ah, this is because of this Go issue: x/net/http2: client can't fully utilize network resource (golang/go issue #37373).

This is the figure that you got for this experiment

Maybe you have arrived at the limits of the server?

When you do your test with --disable-http2 which seems to be the major improvement, what does CPU utilization look like?

Perfect and you did most of it yourself by cleverly building on top of my curious questions.

I would consider a sustained data transfer of 6 GiB/s very good. That does require good end-to-end hardware, software and setup. You are now reaching a download speed of about 80% of your speedtest result. Not bad at all when comparing a controlled file transfer with https encryption, directory traversal, checksums etc. with a raw http transfer of random data.

Also remember that the number you see in rclone is only the size of the files transferred. It doesn't show all the overhead communication needed to perform the file transfers, such as directory listings, reading/storing hash sums etc.

Perhaps we can improve, I have some ideas you can test.

I would

  • increase transfers to 128 based on your GDrive download test (5632/1400*32); I know that this is upload, but it is a good first guess.
  • typically use 2 to 4 times as many checkers as transfers when copying/syncing - especially when the number of files to be transferred is lower than the number of files in the target.
  • optimize Google Drive tuning parameters based on forum consensus, that is add --drive-pacer-min-sleep=10ms --drive-pacer-burst=200
  • keep the default --drive-chunk-size, unless you have tested and found a significant improvement. Note, the effect may change with the number of transfers - so it needs to be retested when changing --transfers.
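
The first guess for the number of transfers in the list above is a simple linear extrapolation. As a sketch, with the assumption that throughput scales in proportion to the number of parallel streams, and with a made-up helper name:

```shell
# Linear extrapolation of the stream count needed for a target rate.
# $1 = target MiB/s, $2 = measured MiB/s, $3 = streams used for the measurement
estimate_streams() {
  awk -v t="$1" -v m="$2" -v n="$3" 'BEGIN { printf "%.0f\n", t / m * n }'
}

estimate_streams 5632 1400 32
```

The result lands just above 128, which is why 128 is a reasonable round number to start from. Real scaling is rarely perfectly linear, so treat it as a starting point for testing.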

So something like this might be faster:

rclone -v copy --disable-http2 --checkers=256 --transfers=128 --drive-pacer-min-sleep=10ms --drive-pacer-burst=200 GCSTakeout:bucket rGDrive:folder --progress

or this (with large chunk-size and fewer transfers to save memory):

rclone -v copy --disable-http2 --checkers=128 --transfers=64 --drive-chunk-size=512M --drive-pacer-min-sleep=10ms --drive-pacer-burst=200 GCSTakeout:bucket rGDrive:folder --progress
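
A rough way to compare the memory footprint of these variants, as a sketch: the Drive backend docs note that upload chunks are buffered in memory, one per transfer, so the chunk buffers alone need about transfers times chunk size. The 8 MiB default chunk size is an assumption to verify against your rclone version, and the helper name is made up. Other buffers (for example --buffer-size) are not counted.

```shell
# Approximate memory used by Drive upload chunk buffers, in GiB.
# $1 = --transfers, $2 = --drive-chunk-size in MiB
upload_buffer_gib() {
  awk -v n="$1" -v c="$2" 'BEGIN { printf "%.1f\n", n * c / 1024 }'
}

upload_buffer_gib 128 8      # 128 transfers, assumed default chunk size
upload_buffer_gib 64 512     # 64 transfers with --drive-chunk-size=512M
upload_buffer_gib 32 1024    # the earlier test: 32 transfers with 1G chunks
```

Under these assumptions both large-chunk variants need about 32 GiB for chunk buffers, while the default chunk size stays around 1 GiB even at 128 transfers.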

I don't know if you clear the target folder before each test. If you don't, consider adding --ignore-times to force a comparable retransfer of all files.

If one of them is faster, then tweak the transfers, checkers and chunk-size up/down until you find the sweet spot.

You may also be able to optimize the command to your specific data/situation. Here are some flags to consider/test:
--no-check-dest
--no-traverse

Here it is again important to stick to the KISS principle of only testing/adding one flag at a time and only keeping the ones that really make a significant difference.

Note: I don't use GDrive that much, but my impression is that you can only upload 750GB/24h - are you aware of this?