Multi-thread download disk IO

Hi,

Posting this in the hope that it might help someone avoid the debugging process I went through to solve this mystery! It's a bit of an edge case, but definitely relevant.

I'm using rclone to download a lot of data (~100GB/day) from an FTP share onto an Azure premium file share (not blob). I have this set up using the Docker container in an Azure container instance with the share mounted to it. Transfers were working for most files, but slowed way down on larger (~300MB) files.

Looking into the logs, I saw that my file share was being throttled because I was maxing out its IO operations per second. My share gives 40 MiB/s of bandwidth, but only 200 IOPS, so IOPS was the limiting factor. After trying a bunch of settings (notably buffer size, available memory, and the number of transfers), I stumbled on the multi-threaded downloading feature, which kicks in when downloads are larger than 250MB!
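To make the arithmetic concrete, here is a rough sketch of how the two caps interact. It assumes every write counts as exactly one IO regardless of its size, which is a simplification and not a statement about how Azure meters IO. The function names and the sample IO sizes are made up for illustration; only the 200 IOPS and 40 MiB/s figures come from the post above.

```python
def effective_throughput_mib_s(io_size_kib, iops_limit=200, bandwidth_mib_s=40.0):
    """Throughput ceiling when both an IOPS cap and a bandwidth cap apply.

    Assumption: each write counts as one IO, whatever its size.
    """
    iops_bound = iops_limit * io_size_kib / 1024  # MiB/s reachable at the IOPS cap
    return min(bandwidth_mib_s, iops_bound)


def break_even_io_size_kib(iops_limit=200, bandwidth_mib_s=40.0):
    """Average IO size at which both caps are reached at the same time."""
    return bandwidth_mib_s * 1024 / iops_limit


if __name__ == "__main__":
    print("break-even IO size (KiB):", break_even_io_size_kib())
    # Illustrative IO sizes only, not measurements from the transfer.
    for size_kib in (64, 128, 256, 1024):
        print(size_kib, "KiB ->", effective_throughput_mib_s(size_kib), "MiB/s")
```

Under that assumption, writes averaging less than about 205 KiB hit the 200 IOPS cap before the 40 MiB/s cap, so anything that increases the number of write operations per byte moves throughput further from the bandwidth limit.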

Long story short, disabling this by setting --multi-thread-streams to 0 solves the issue. The transfer is still IO throttled (moving data from one high-performance centre to another is fast), but throughput is now much closer to the theoretical bandwidth.
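For anyone scripting this, a minimal sketch of the invocation might look like the following. The remote name "ftpremote:data" and the mount path "/mnt/share" are hypothetical placeholders, and the helper function is invented for illustration; the only part taken from the post is the --multi-thread-streams flag set to 0.

```python
import shlex


def build_rclone_copy_cmd(source, dest, multi_thread_streams=0):
    """Return the argv list for an rclone copy with a given stream count."""
    return [
        "rclone", "copy", source, dest,
        "--multi-thread-streams", str(multi_thread_streams),
    ]


if __name__ == "__main__":
    # Hypothetical source remote and destination path; substitute your own.
    cmd = build_rclone_copy_cmd("ftpremote:data", "/mnt/share")
    print(shlex.join(cmd))
    # To actually run the transfer: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the paths contain spaces.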

Not sure how widely applicable this issue is. Presumably any download to an IO-limited disk would be affected, but an IO limit without a corresponding bandwidth cap is pretty rare outside of a cloud computing setting. Thought I'd pass along the note though!

Interesting point. Multi-thread streams normally speed up downloads to local disks quite a lot, but you are 100% correct that this comes at the cost of more local disk IO, and in particular more non-local (non-sequential) disk IO. It shouldn't be a massive multiplier though, as the same bytes get downloaded in the end.

However, I recently fixed a problem with multi-thread downloads not setting the sparse file flag on Windows, which makes the problem a lot worse because the first thing that happens is that the OS fills the file with zeros. You don't say whether you are using Windows, but that could be a problem. If so, it would be fixed by the latest beta.

Interesting. I’m happy to know that my gut feel for the processes going on here was on the right track!

This test was done in an Azure Docker container instance, which I’m 90% sure runs on Linux. However, the local storage is a mounted SMB share running on Windows, so that might play a role. I’ll update this whenever the patch gets pushed to the Docker container, and see if that sparse flag fixes the issue.

No idea whether sparse files will work on SMB over Windows. The code change was only for Windows. Linux expects all file systems to be sparse capable, or to write zeros if not, so I suspect the code change won't help here. Disabling multi-thread copies would though!
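If you want a rough idea of how a particular mount handles writes at an offset, something like the sketch below could help. It is not rclone's code, just a probe: it writes one byte far into a new file and compares the apparent size with the space reported as allocated. The function names are made up, st_blocks is not available on Windows, and the numbers reported through a network mount may not reflect what the server really allocates, so treat the result as a hint only.

```python
import os
import tempfile


def write_at_offset(path, offset, payload=b"x"):
    """Create a file and write payload at offset, leaving a hole before it."""
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(payload)
    return os.stat(path)


def allocated_bytes(st):
    """Bytes reported as allocated, or None where st_blocks is missing."""
    blocks = getattr(st, "st_blocks", None)  # absent on Windows
    return None if blocks is None else blocks * 512  # st_blocks is in 512-byte units


if __name__ == "__main__":
    # A temp dir is used as a stand-in; point `directory` at the mount to test.
    with tempfile.TemporaryDirectory() as directory:
        probe = os.path.join(directory, "sparse_probe.bin")
        st = write_at_offset(probe, 16 * 1024 * 1024)
        print("apparent size:", st.st_size)
        print("allocated:", allocated_bytes(st))
```

If the allocated figure is far below the apparent size, the hole was kept sparse; if the two are close, the gap was most likely filled with zeros, which is extra write IO before any real data lands.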
