Multi threaded downloads - comments and testers needed

Hello @ncw,

Yep, it is working great here -- and you are more than welcome, thank you for all the hard work you've put, and continue to put, into making rclone so great a piece of software.

Cheers,
-- Durval.

I've just worked out what it is doing when it is pausing after the download - it is calculating the MD5SUM on the file it has just downloaded. I verified this by killing rclone with SIGQUIT and examining the backtraces. For a sequential download this would be done as it goes along but a parallel download can't do that so it has to read the file again and MD5SUM it.
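
To illustrate why the second pass is needed, here is a minimal Python sketch. It is not rclone's code: the function names and the in-memory "file" are hypothetical. A sequential download can feed each chunk to the hasher as it arrives, whereas chunks that land out of order can only be hashed once the whole file is in place.

```python
import hashlib
import io

def sequential_download(chunks):
    """Write chunks in order, hashing each one as it arrives (single pass)."""
    out = io.BytesIO()
    md5 = hashlib.md5()
    for chunk in chunks:
        out.write(chunk)
        md5.update(chunk)
    return out.getvalue(), md5.hexdigest()

def parallel_download(chunks_with_offsets, size):
    """Write chunks at their offsets in whatever order they complete,
    then re-read the finished file to hash it (second pass)."""
    out = io.BytesIO(bytes(size))  # pre-sized buffer standing in for the file
    for offset, chunk in chunks_with_offsets:
        out.seek(offset)
        out.write(chunk)
    out.seek(0)
    md5 = hashlib.md5()
    for block in iter(lambda: out.read(64 * 1024), b""):
        md5.update(block)
    return out.getvalue(), md5.hexdigest()
```

Both functions end up with the same digest; the parallel one simply pays for reading the data a second time, which is the pause seen after the transfer.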

Here are my tests downloading a 1G file from Google Drive on a datacenter-class internet connection:

drive-threads-0.log
Elapsed time:       59.4s
drive-threads-2.log
Elapsed time:       24.1s
drive-threads-3.log
Elapsed time:       23.6s
drive-threads-4.log
Elapsed time:       23.8s
drive-threads-5.log
Elapsed time:       24.2s
drive-threads-6.log
Elapsed time:       24.6s
drive-threads-7.log
Elapsed time:       24.2s
drive-threads-8.log
Elapsed time:       23.8s
drive-threads-9.log
Elapsed time:       24.3s
drive-threads-10.log
Elapsed time:       24.5s

So 2 threads are a big win over 1, but you don't win anything much above that.

If I do the test again with --ignore-checksum (+ a patch to stop it calculating the hash then ignoring it!)

drive-threads-0-ignore-checksum.log
Elapsed time:       45.6s
drive-threads-2-ignore-checksum.log
Elapsed time:       11.7s
drive-threads-3-ignore-checksum.log
Elapsed time:       11.7s
drive-threads-4-ignore-checksum.log
Elapsed time:       11.5s
drive-threads-5-ignore-checksum.log
Elapsed time:       11.5s
drive-threads-6-ignore-checksum.log
Elapsed time:       10.8s
drive-threads-7-ignore-checksum.log
Elapsed time:       11.7s
drive-threads-8-ignore-checksum.log
Elapsed time:       11.1s
drive-threads-9-ignore-checksum.log
Elapsed time:       11.2s

So 12 seconds of the transfer time is spent MD5SUMing 1GB of data, which seems a bit long...

rclone@rclone-testing:~$ time md5sum 1G
e4589c2520a47b62966a7cd86507a40d  1G

real	0m3.160s
user	0m2.734s
sys	0m0.424s
rclone@rclone-testing:~$ time rclone md5sum 1G
e4589c2520a47b62966a7cd86507a40d  1G

real	0m12.468s
user	0m12.245s
sys	0m0.212s

Hmm, I see rclone is calculating all the hashes (SHA1/MD5SUM/Dropbox/etc)... Another patch to change that...

rclone@rclone-testing:~$ time md5sum 1G
e4589c2520a47b62966a7cd86507a40d  1G

real	0m3.263s
user	0m2.770s
sys	0m0.489s
rclone@rclone-testing:~$ time ./rclone md5sum 1G
e4589c2520a47b62966a7cd86507a40d  1G

real	0m3.153s
user	0m2.629s
sys	0m0.527s
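
As a rough illustration of why hashing everything is slower, here is a hedged Python sketch. It is not rclone's implementation, and `hash_stream` and the algorithm names passed to it are illustrative only. Every block is fed to every requested hasher, so the CPU cost grows with the number of algorithms even though the data is read once.

```python
import hashlib

def hash_stream(blocks, names):
    """Feed every block to each requested hasher.

    CPU cost scales with len(names); the I/O cost does not.
    """
    hashers = {name: hashlib.new(name) for name in names}
    for block in blocks:
        for h in hashers.values():
            h.update(block)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Calling `hash_stream(blocks, ["md5"])` does a fraction of the work of `hash_stream(blocks, ["md5", "sha1", "sha256"])`, while returning the same MD5.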

Interestingly with box it seems the more threads the better!

box-threads-0.log
Elapsed time:       1m37s
box-threads-2.log
Elapsed time:       31.2s
box-threads-3.log
Elapsed time:       34.6s
box-threads-4.log
Elapsed time:       25.2s
box-threads-5.log
Elapsed time:       21.1s
box-threads-6.log
Elapsed time:         18s
box-threads-7.log
Elapsed time:         16s
box-threads-8.log
Elapsed time:       13.9s
box-threads-9.log
Elapsed time:         13s

B2 is similar

b2-threads-0.log
Elapsed time:     1m13.1s
b2-threads-2.log
Elapsed time:         40s
b2-threads-3.log
Elapsed time:       31.6s
b2-threads-4.log
Elapsed time:       31.3s
b2-threads-5.log
Elapsed time:       19.9s
b2-threads-6.log
Elapsed time:       22.3s
b2-threads-7.log
Elapsed time:       17.4s
b2-threads-8.log
Elapsed time:       17.2s
b2-threads-9.log
Elapsed time:       17.1s

And here are the results for S3, which I think show that it is network limited even with 1 stream.

s3-threads-0.log
Elapsed time:       11.1s
s3-threads-2.log
Elapsed time:        9.6s
s3-threads-3.log
Elapsed time:        9.8s
s3-threads-4.log
Elapsed time:       10.8s
s3-threads-5.log
Elapsed time:       10.7s
s3-threads-6.log
Elapsed time:        9.5s
s3-threads-7.log
Elapsed time:        9.4s
s3-threads-8.log
Elapsed time:        9.7s
s3-threads-9.log
Elapsed time:       12.1s

In my experiments with drive I found that at about 250MB, 1 and 2 threads are the same speed, whereas at 100MB 1 thread is faster than 2 threads.

So after all that testing, I propose as the defaults

  • --multi-thread-cutoff 250M
  • --multi-thread-streams 4

The former probably isn't too controversial, but the latter is quite conservative.

So size of file vs streams gives

  • < 250M 1 stream
  • < 500M 2 streams
  • < 750M 3 streams
  • >= 750M 4 streams
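
The table above can be expressed as a small function. This is a sketch of the rule the table implies, not a copy of rclone's code; `streams_for_size` is a hypothetical name, and it assumes "250M" means 250 MiB.

```python
MIB = 1024 * 1024

def streams_for_size(size, cutoff=250 * MIB, max_streams=4):
    """Return the stream count implied by the table: one extra stream per
    full multiple of the cutoff, capped at max_streams."""
    return min(max_streams, size // cutoff + 1)
```

With the proposed defaults, a 100 MiB file gets 1 stream, a 600 MiB file gets 3, and anything from 750 MiB upwards gets 4.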

Thoughts?

I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.48.

Thanks for testing :smile:

Thoughts about the defaults welcome too!

Wouldn't that depend on the internet bandwidth available?

Yes it would. It is a compromise. 250M seemed like a reasonable point to start having multiple threads. Not all services give benefits with more and more threads so I chose 4. I could be persuaded that that number is too high, but I don't want to make it higher by default.

I've been going through this wonderful thread. Out of sheer ignorance: will this work from an rclone http server, or to a webdav server?

If you use rclone to download from rclone serve http or rclone serve webdav then it will work. It won't work for uploads to either of those (http doesn't support uploads anyway).
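
For anyone wondering what the server has to support: the sketch below assumes each stream fetches its own contiguous byte range, which is how multi-stream HTTP downloads are commonly done. It is an illustration only, not rclone's implementation, and `split_ranges` is a hypothetical helper.

```python
def split_ranges(size, streams):
    """Split [0, size) into contiguous, near-equal byte ranges and return
    them as HTTP Range header values (end offsets are inclusive)."""
    if size <= 0 or streams <= 0:
        return []
    part = -(-size // streams)  # ceiling division
    ranges = []
    for start in range(0, size, part):
        end = min(start + part, size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges
```

Each value would be sent as the `Range` header of a separate request, and each response written at its own offset in the destination file.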
