Performance of Rclone vs Azcopy

I'm noticing a very slow data transfer rate with rclone (compared to Azcopy) when uploading from local disk to an Azure Blob container.

I'm testing on a 19G file which takes 3 mins to upload via Azcopy. Rclone copy takes 31 mins for the same file. I have another 9.8 GB file which takes 1.6 mins to upload via Azcopy and 15 mins to upload using rclone.

What am I missing?

rclone v1.57.0
- os/version: centos 7.6.1810 (64 bit)
- os/kernel: 3.10.0-957.5.1.el7.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.17.2
- go/linking: static
- go/tags: none

Storage system: Microsoft Azure Blob Storage

Command:
rclone copy test2.tar.gz my_azure_demo:demo -P

Config (secrets removed):
type = azureblob
sas_url = https://<myaccount><sastoken>

Log (with -vv):
2021/11/10 06:01:23 DEBUG : rclone: Version "v1.57.0" starting with parameters ["../rclone-v1.57.0-linux-amd64/rclone" "copy" "test2.tar.gz" "my_azure_demo:demo" "-P" "-vv"]
2021/11/10 06:01:23 DEBUG : Creating backend with remote "test2.tar.gz"
2021/11/10 06:01:23 DEBUG : Using config file from "/u/<user>/.config/rclone/rclone.conf"
2021/11/10 06:01:23 DEBUG : fs cache: adding new entry for parent of "test2.tar.gz", "<cwd>"
2021/11/10 06:01:23 DEBUG : Creating backend with remote "my_azure_demo:demo"
2021-11-10 06:01:23 DEBUG : test2.tar.gz: Need to transfer - File not found at Destination
2021-11-10 06:32:38 DEBUG : test2.tar.gz: md5 = 1badc2baf3dd320cd4449e4554a28428 OK
2021-11-10 06:32:38 INFO  : test2.tar.gz: Copied (new)
Transferred:       19.037 GiB / 19.037 GiB, 100%, 9.609 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:     31m14.8s
2021/11/10 06:32:38 INFO  : 
Transferred:       19.037 GiB / 19.037 GiB, 100%, 9.609 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:     31m14.8s

2021/11/10 06:32:38 DEBUG : 12 go routines active

Thank you in advance!

The defaults for rclone are set reasonably conservatively so they don't use too much CPU / memory.

If you want to make your transfer run quicker then try increasing

  --azureblob-chunk-size SizeSuffix             Upload chunk size (<= 100 MiB) (default 4Mi)

This will use more memory as rclone will have 4 or so chunks in memory at once per transfer. The default is probably too conservative - I can't remember why I chose 4MiB though...

I noticed that rclone uses 4 upload streams and that isn't configurable. If you can't make rclone match the speed of azcopy then I can make that number configurable too.

azcopy is written in Go and uses the same library as rclone so rclone should be able to match its performance.

Note also that rclone calculates MD5 checksums in advance, which slows it down (but is good for data integrity). If you want more speed, you can turn these off with

  --azureblob-disable-checksum                  Don't store MD5 checksum with object metadata

Thank you for the detailed reply. I have a better understanding now.

With --azureblob-chunk-size 102400 --azureblob-disable-checksum, the data transfer rates are not improving. The rates I observed range from 4x (best case) to 10x (worst case) slower than Azcopy.

Like you said, changing the number of upload streams (i.e. MaxBuffers) might be the way forward.

From the top output, I can confirm that rclone is reporting just 0.46G RES vs 3.3G for azcopy. Likewise, the VIRT reported is just 1.1G vs 4.4G for azcopy. Clearly, azcopy by default is using much more memory.

Please let me know if more data points are needed.

Also, I'd be happy to test any subsequent changes and share feedback.

Thank you once again!

Interesting tests, thank you.

Here is a version of rclone with an --azureblob-upload-concurrency flag. The default is 4 simultaneous uploads as before.

I also took the 100M cap off --azureblob-chunk-size as I don't think it is needed any more.

I'd be very interested in your experiments!

Note that you can specify the chunk size as 128M or 1G which you might find easier.
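As a rough illustration of how these size values compare, here is a minimal Python sketch. It assumes the convention described in rclone's documentation, where a bare number is read as KiB and suffixes such as K, M and G are binary multiples. The function name size_to_bytes is hypothetical, and this is a simplified stand-in, not rclone's actual parser.

```python
# Simplified illustration only; this is NOT rclone's own parser.
# Assumption: a bare number means KiB, and K/M/G are binary multiples.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def size_to_bytes(text: str) -> int:
    """Convert a size string such as '102400', '4Mi', '128M' or '1G' to bytes."""
    s = text.strip().upper()
    # Accept spellings like '4Mi' or '4MiB' by dropping the trailing 'I'/'IB'.
    for tail in ("IB", "I"):
        if s.endswith(tail):
            s = s[: -len(tail)]
            break
    if s and s[-1] in UNITS:
        return int(float(s[:-1]) * UNITS[s[-1]])
    return int(float(s) * UNITS["K"])  # bare number: assumed to be KiB


for value in ("102400", "4Mi", "128M", "1G"):
    print(f"{value:>7} -> {size_to_bytes(value) // 1024**2} MiB")
```

Under that assumption, the 102400 used in the earlier test corresponds to 100 MiB, which was the old upper limit for the chunk size.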

v1.58.0-beta.5881.1b9534dae.fix-azureblob-concurrency on branch fix-azureblob-concurrency (uploaded in 15-30 mins)

Thank you, this helps! I'll report the findings. I'm running multiple iterations of several combinations of concurrency and chunk size settings.

Thank you - very interested in your findings. I looked at the source of the azcopy tool but haven't yet worked out what their default settings are! The code is quite complicated and obviously optimized for performance.

The data below is the average transfer rate in Mbps (more than 300 uploads were tested overall to generate this summary). The performance clearly improves with smaller chunk sizes and more concurrency.

For comparison - Azcopy averages ~865 Mbps and the NIC is rated 1,000 Mbps. To match Azcopy's data transfer rates, concurrency 64 and chunk size 4M seem to be good settings.
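The thread mixes two units: rclone reports MiB/s while these measurements are in Mbps. The small Python sketch below converts between them so the figures can be compared directly. It assumes Mbps means 10^6 bits per second and MiB means 2^20 bytes; the helper names are made up for this example.

```python
# Unit conversion sketch. Assumptions: 1 Mbps = 10**6 bits/s, 1 MiB = 2**20 bytes.
BITS_PER_MEGABIT = 1_000_000
BYTES_PER_MIB = 1024 * 1024


def mbps_to_mib_per_s(mbps: float) -> float:
    """Megabits per second -> mebibytes per second."""
    return mbps * BITS_PER_MEGABIT / 8 / BYTES_PER_MIB


def mib_per_s_to_mbps(mib_per_s: float) -> float:
    """Mebibytes per second -> megabits per second."""
    return mib_per_s * BYTES_PER_MIB * 8 / BITS_PER_MEGABIT


print(f"azcopy  865 Mbps    = {mbps_to_mib_per_s(865):.1f} MiB/s")
print(f"NIC     1000 Mbps   = {mbps_to_mib_per_s(1000):.1f} MiB/s")
# 9.609 MiB/s is the rate rclone reported in the log from the first post.
print(f"rclone  9.609 MiB/s = {mib_per_s_to_mbps(9.609):.1f} Mbps")
```

On these assumptions, azcopy's ~865 Mbps is roughly 103 MiB/s, while the default-settings rclone run from the first post works out to about 81 Mbps.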

At the very least, allowing users to set --azureblob-upload-concurrency is a very valuable enhancement!

Thank you!

That is a fantastic bit of testing - thank you.

Increasing concurrency seems to be a consistent win up to 32 or 64.

However, increasing chunk size seems to be a net loss, which is not what I expected.

From that table I don't see any reason to increase the default chunk size of 4M.

I could increase the default concurrency though.

I did a few tests myself on a cloud VM and it seems that the performance increase starts off quite linear when increasing concurrency. The memory usage goes up by approx chunk_size * concurrency as you might expect.

chunk_size 4MiB

concurrency   speed (MiB/s)
4             8.7
8             13
16            28
32            56
64            113
128           170

We could certainly afford to increase the default from 4 - I could make it 16 which would make the memory usage per transfer be 64M which is in line with the other rclone backends.
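The memory estimate above (approximately chunk_size * concurrency) can be tabulated against the measured speeds with a short Python sketch. The speed figures are copied from the table in this post; the function name buffer_memory_mib is hypothetical, and the estimate covers upload buffers only, not rclone's total memory use.

```python
# Rough estimate only: upload buffer memory ~= chunk_size * concurrency.
CHUNK_SIZE_MIB = 4

# (concurrency, measured speed in MiB/s) copied from the table in this post.
MEASURED = [(4, 8.7), (8, 13), (16, 28), (32, 56), (64, 113), (128, 170)]


def buffer_memory_mib(chunk_size_mib: int, concurrency: int) -> int:
    """Approximate buffer memory per transfer, in MiB."""
    return chunk_size_mib * concurrency


print("concurrency  memory (MiB)  speed (MiB/s)  MiB/s per stream")
for concurrency, speed in MEASURED:
    memory = buffer_memory_mib(CHUNK_SIZE_MIB, concurrency)
    per_stream = speed / concurrency
    print(f"{concurrency:>11}  {memory:>12}  {speed:>13}  {per_stream:>16.2f}")
```

With a 4 MiB chunk size, a concurrency of 16 gives the 64 MiB per transfer mentioned above, and 64 would need about 256 MiB.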

What performance do you see for chunk size 4M with concurrency 4,8,16,32?

Yes, I agree with everything you mentioned. It's certainly a good idea to increase the default concurrency.

I'm already testing chunk size 4M with the remaining concurrency values and will keep you posted.

Here is the updated data -

With concurrency 16, it still falls short of Azcopy - whereas 64 looks good.

Thank you!

Thanks for that testing :)

I've raised the default to 16 - I don't think I can go higher than that by default without blowing up all the Raspberry Pis running rclone!

I've put a note in the help about raising it for more performance. You can also set this in the config file as upload_concurrency = 64, which is convenient!

I've merged this to master now - it will appear in this beta and all subsequent ones.

v1.58.0-beta.5899.df07964db on branch master (uploaded in 15-30 mins)

Thank you again for doing the performance tests - very useful!

I understand and very much appreciate this prompt change. Thanks a bunch!
