I'm noticing a very slow data transfer rate with rclone (compared to Azcopy) when copying from local disk to an Azure Blob container.
I'm testing on a 19G file which takes 3 mins to upload via Azcopy. Rclone copy takes 31 mins for the same file. I have another 9.8 GB file which takes 1.6 mins to upload via Azcopy and 15 mins to upload using rclone.
This will use more memory as rclone will have 4 or so chunks in memory at once per transfer. The default is probably too conservative - I can't remember why I chose 4MiB though...
I noticed that rclone uses 4 upload streams and that this isn't configurable. If you can't make rclone match the speed of azcopy then I can make that number configurable too.
azcopy is written in Go and uses the same library as rclone so rclone should be able to match its performance.
Note also rclone will be making md5 checksums in advance which will slow it down (but is good for data integrity). You can turn these off with
--azureblob-disable-checksum Don't store MD5 checksum with object metadata
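As a sketch, a copy with checksums disabled might look like this (the remote name and paths here are placeholders, not from the original report):

```shell
# Skip the up-front MD5 calculation. Faster, but the checksum
# won't be stored in the blob's metadata for later verification.
rclone copy /path/to/bigfile azureblob:container --azureblob-disable-checksum
```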
Thank you for the detailed reply. I have a better understanding now.
With --azureblob-chunk-size 102400 --azureblob-disable-checksum, the data transfer rates did not improve. The rates I observed varied between 4x (best case) and 10x (worst case) slower than Azcopy.
Like you said, changing the number of upload streams (i.e. MaxBuffers) might be the way forward.
From the top output, I can confirm that rclone reports just 0.46G RES vs 3.3G for azcopy. Likewise, the VIRT reported is just 1.1G vs 4.4G for azcopy. Clearly, azcopy is using much more memory by default.
Please let me know if more data points are needed.
Also, I'd be happy to test any subsequent changes and share feedback.
Thank you - very interested in your findings. I looked at the source of the azcopy tool but haven't worked out what their default settings are yet! The code is quite complicated and obviously optimized for performance.
The data below is the Average Transfer Rate in Mbps (>300 uploads are tested overall to generate this summary). The performance clearly improves with smaller chunk sizes and more concurrency.
For comparison - Azcopy averages ~865 Mbps and the NIC is rated 1,000 Mbps. To match Azcopy's data transfer rates, concurrency 64 and chunk size 4M seem to be good settings.
At the very least, allowing users to set --azureblob-upload-concurrency is a very valuable enhancement!
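Assuming the flag lands as --azureblob-upload-concurrency, an invocation matching those settings might look like this (remote name and paths are placeholders):

```shell
# Concurrency 64 with 4M chunks - roughly 256M of chunk buffers per transfer.
rclone copy /data/bigfile azureblob:container \
  --azureblob-chunk-size 4M \
  --azureblob-upload-concurrency 64 \
  --azureblob-disable-checksum
```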
Increasing concurrency seems to be a consistent win up to 32 or 64.
However increasing chunk size seems to be a net loss, which is not what I expected.
From that table I don't see any reason to increase the default chunk size of 4M.
I could increase the default concurrency though.
I did a few tests myself on a cloud VM and it seems that the performance increase starts off quite linear when increasing concurrency. The memory usage goes up by approx chunk_size * concurrency as you might expect.
chunk_size 4MiB

| concurrency | speed (MiB/s) |
|------------:|--------------:|
|           4 |           8.7 |
|           8 |            13 |
|          16 |            28 |
|          32 |            56 |
|          64 |           113 |
|         128 |           170 |
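That chunk_size * concurrency memory estimate can be sketched in a few lines (a rough estimate only; actual usage will be somewhat higher due to other buffers):

```python
# Rough per-transfer memory estimate for rclone's azureblob uploads:
# buffered chunk data ~= chunk_size * concurrency.

MIB = 1024 * 1024

def upload_memory_bytes(chunk_size_mib: int, concurrency: int) -> int:
    """Approximate bytes of buffered chunk data per transfer."""
    return chunk_size_mib * MIB * concurrency

for concurrency in (4, 8, 16, 32, 64, 128):
    mem_mib = upload_memory_bytes(4, concurrency) // MIB
    print(f"concurrency {concurrency:>3}: ~{mem_mib} MiB per transfer")
```

At 4MiB chunks, concurrency 16 buffers about 64 MiB per transfer and concurrency 64 about 256 MiB, which lines up with the RES figures observed above.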
We could certainly afford to increase the default from 4 - I could make it 16, which would make the memory usage per transfer 64M, in line with the other rclone backends.
What performance do you see for chunk size 4M with concurrency 4,8,16,32?
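One way to gather those numbers would be a loop like this (file name and remote are placeholders, and it assumes a build where the concurrency flag exists):

```shell
# Benchmark sketch: time the same upload at several concurrency levels.
for c in 4 8 16 32; do
  echo "concurrency $c:"
  time rclone copyto bigfile.bin azureblob:container/bigfile.bin \
    --azureblob-chunk-size 4M \
    --azureblob-upload-concurrency "$c"
done
```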
I've raised the default to 16 - I don't think I can go higher than that by default without blowing up all the Raspberry Pis running rclone!
I've put a note in the help about raising it for more performance. You can also set this as a parameter in the config file (upload_concurrency = 64), which is convenient!
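In the config file that would look something like this (the remote name and account details are placeholders):

```ini
[azureblob]
type = azureblob
account = myaccount
key = ...
upload_concurrency = 64
```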
I've merged this to master now - it will appear in this beta and all subsequent ones.