Hello @ncw,
Yep, it is working great here -- and you are more than welcome, thank you for all the hard work you've put, and continue to put, into making rclone
so great a piece of software.
Cheers,
-- Durval.
I've just worked out what it is doing when it is pausing after the download - it is calculating the MD5SUM on the file it has just downloaded. I verified this by killing rclone with SIGQUIT and examining the backtraces. For a sequential download this would be done as it goes along but a parallel download can't do that so it has to read the file again and MD5SUM it.
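To illustrate why the parallel case needs a second pass over the file, here is a minimal Python sketch. It is not rclone's code: the "remote" is just an in-memory `bytes` object, and the function names `sequential_download` and `parallel_download` are made up for the example. The sequential version feeds each block to MD5 as it is written. The parallel version writes byte ranges out of order, so it can only hash once the file is complete.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # read/write block size, chosen arbitrarily for the sketch


def sequential_download(source: bytes, dest_path: str) -> str:
    """Write the data in order, updating the MD5 as each block arrives."""
    md5 = hashlib.md5()
    with open(dest_path, "wb") as out:
        for off in range(0, len(source), CHUNK):
            block = source[off:off + CHUNK]
            out.write(block)
            md5.update(block)  # hashing happens during the transfer
    return md5.hexdigest()


def parallel_download(source: bytes, dest_path: str, streams: int) -> str:
    """Write byte ranges concurrently, then re-read the file to hash it."""
    size = len(source)
    part = -(-size // streams)  # ceiling division: bytes per stream

    # Pre-size the destination so each worker can seek to its own range.
    with open(dest_path, "wb") as out:
        out.truncate(size)

    def fetch(i: int) -> None:
        start = i * part
        end = min(start + part, size)
        with open(dest_path, "r+b") as out:
            out.seek(start)
            out.write(source[start:end])

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(fetch, range(streams)))

    # Ranges landed out of order, so the hash needs a second pass over the file.
    md5 = hashlib.md5()
    with open(dest_path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            md5.update(block)
    return md5.hexdigest()
```

Both functions return the same digest for the same data; the difference is only that the parallel one pays for an extra read of the whole file after the transfer.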
Here are my tests downloading a 1G file from Google Drive on a datacenter-class internet connection:
drive-threads-0.log
Elapsed time: 59.4s
drive-threads-2.log
Elapsed time: 24.1s
drive-threads-3.log
Elapsed time: 23.6s
drive-threads-4.log
Elapsed time: 23.8s
drive-threads-5.log
Elapsed time: 24.2s
drive-threads-6.log
Elapsed time: 24.6s
drive-threads-7.log
Elapsed time: 24.2s
drive-threads-8.log
Elapsed time: 23.8s
drive-threads-9.log
Elapsed time: 24.3s
drive-threads-10.log
Elapsed time: 24.5s
So 2 threads are a big win over 1, but you don't win anything much above that.
If I do the test again with --ignore-checksum (plus a patch to stop it calculating the hash and then ignoring it!):
drive-threads-0-ignore-checksum.log
Elapsed time: 45.6s
drive-threads-2-ignore-checksum.log
Elapsed time: 11.7s
drive-threads-3-ignore-checksum.log
Elapsed time: 11.7s
drive-threads-4-ignore-checksum.log
Elapsed time: 11.5s
drive-threads-5-ignore-checksum.log
Elapsed time: 11.5s
drive-threads-6-ignore-checksum.log
Elapsed time: 10.8s
drive-threads-7-ignore-checksum.log
Elapsed time: 11.7s
drive-threads-8-ignore-checksum.log
Elapsed time: 11.1s
drive-threads-9-ignore-checksum.log
Elapsed time: 11.2s
So 12 seconds of the transfer time is spent MD5SUMing 1GB of data, which seems a bit long...
rclone@rclone-testing:~$ time md5sum 1G
e4589c2520a47b62966a7cd86507a40d 1G
real 0m3.160s
user 0m2.734s
sys 0m0.424s
rclone@rclone-testing:~$ time rclone md5sum 1G
e4589c2520a47b62966a7cd86507a40d 1G
real 0m12.468s
user 0m12.245s
sys 0m0.212s
Hmm, I see rclone is calculating all the hashes (SHA1/MD5SUM/Dropbox/etc)... Another patch to change that...
rclone@rclone-testing:~$ time md5sum 1G
e4589c2520a47b62966a7cd86507a40d 1G
real 0m3.263s
user 0m2.770s
sys 0m0.489s
rclone@rclone-testing:~$ time ./rclone md5sum 1G
e4589c2520a47b62966a7cd86507a40d 1G
real 0m3.153s
user 0m2.629s
sys 0m0.527s
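The effect of computing every hash versus only the one that is needed can be sketched in Python as below. This is an illustration only, not rclone's implementation: `hash_stream` and `timed` are hypothetical helpers, and `md5`, `sha1` and `sha256` from `hashlib` stand in for whatever set of hashes is being computed (the Dropbox hash, for instance, is not in `hashlib`).

```python
import hashlib
import time


def hash_stream(blocks, algorithms):
    """Feed every block to one hasher per requested algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for block in blocks:
        for h in hashers.values():
            h.update(block)  # cost grows with the number of algorithms
    return {name: h.hexdigest() for name, h in hashers.items()}


def timed(blocks, algorithms):
    """Return (digests, elapsed seconds) for hashing blocks with algorithms."""
    start = time.perf_counter()
    digests = hash_stream(blocks, algorithms)
    return digests, time.perf_counter() - start


if __name__ == "__main__":
    # 16 MiB of zeros in 1 MiB blocks; the size is arbitrary for the sketch.
    blocks = [bytes(1024 * 1024)] * 16
    _, t_one = timed(blocks, ["md5"])
    _, t_all = timed(blocks, ["md5", "sha1", "sha256"])
    print(f"md5 only: {t_one:.3f}s, three hashes: {t_all:.3f}s")
```

The absolute timings depend on the machine, but requesting only the hash you need avoids doing the extra work on every block.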
Interestingly, with Box it seems the more threads the better!
box-threads-0.log
Elapsed time: 1m37s
box-threads-2.log
Elapsed time: 31.2s
box-threads-3.log
Elapsed time: 34.6s
box-threads-4.log
Elapsed time: 25.2s
box-threads-5.log
Elapsed time: 21.1s
box-threads-6.log
Elapsed time: 18s
box-threads-7.log
Elapsed time: 16s
box-threads-8.log
Elapsed time: 13.9s
box-threads-9.log
Elapsed time: 13s
B2 is similar
b2-threads-0.log
Elapsed time: 1m13.1s
b2-threads-2.log
Elapsed time: 40s
b2-threads-3.log
Elapsed time: 31.6s
b2-threads-4.log
Elapsed time: 31.3s
b2-threads-5.log
Elapsed time: 19.9s
b2-threads-6.log
Elapsed time: 22.3s
b2-threads-7.log
Elapsed time: 17.4s
b2-threads-8.log
Elapsed time: 17.2s
b2-threads-9.log
Elapsed time: 17.1s
And here are the results for S3, which I think show that it is network limited even with 1 stream.
s3-threads-0.log
Elapsed time: 11.1s
s3-threads-2.log
Elapsed time: 9.6s
s3-threads-3.log
Elapsed time: 9.8s
s3-threads-4.log
Elapsed time: 10.8s
s3-threads-5.log
Elapsed time: 10.7s
s3-threads-6.log
Elapsed time: 9.5s
s3-threads-7.log
Elapsed time: 9.4s
s3-threads-8.log
Elapsed time: 9.7s
s3-threads-9.log
Elapsed time: 12.1s
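For comparing the four backends, the elapsed times above can be turned into speedups with a few lines of Python. The timings are copied from the logs listed in this thread. Treating the `threads-0` run as the single-stream baseline is an assumption based on the log names, and `to_seconds` and `best_speedup` are helper names invented for this sketch.

```python
import re


def to_seconds(text: str) -> float:
    """Parse durations such as '59.4s', '18s' or '1m13.1s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


# Elapsed times from the logs in this thread, keyed by the number in the log name.
RESULTS = {
    "drive": {0: "59.4s", 2: "24.1s", 3: "23.6s", 4: "23.8s", 5: "24.2s",
              6: "24.6s", 7: "24.2s", 8: "23.8s", 9: "24.3s", 10: "24.5s"},
    "box": {0: "1m37s", 2: "31.2s", 3: "34.6s", 4: "25.2s", 5: "21.1s",
            6: "18s", 7: "16s", 8: "13.9s", 9: "13s"},
    "b2": {0: "1m13.1s", 2: "40s", 3: "31.6s", 4: "31.3s", 5: "19.9s",
           6: "22.3s", 7: "17.4s", 8: "17.2s", 9: "17.1s"},
    "s3": {0: "11.1s", 2: "9.6s", 3: "9.8s", 4: "10.8s", 5: "10.7s",
           6: "9.5s", 7: "9.4s", 8: "9.7s", 9: "12.1s"},
}


def best_speedup(runs: dict) -> tuple:
    """Return (threads, speedup) for the fastest multi-thread run vs run 0."""
    baseline = to_seconds(runs[0])  # assumed to be the single-stream run
    multi = {n: to_seconds(t) for n, t in runs.items() if n != 0}
    threads = min(multi, key=multi.get)
    return threads, baseline / multi[threads]


if __name__ == "__main__":
    for backend, runs in RESULTS.items():
        threads, speedup = best_speedup(runs)
        print(f"{backend}: best at {threads} threads, {speedup:.1f}x faster")
```

These are single runs per setting, so small differences between neighbouring thread counts are likely to be noise.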
I found in my experiments with drive that at about 250MB, 1 and 2 threads are the same speed, whereas at 100MB, 1 thread is faster than 2 threads.
So after all that testing, I propose as the defaults a cutoff of 250M before multiple streams are used, and a maximum of 4 streams.
The former probably isn't too controversial, but the latter is quite conservative.
So size of file vs streams gives:
>= 750M 4 streams
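One way to read the size-to-streams mapping is sketched below in Python. The formula is an assumption, not taken from the rclone source: one extra stream per full cutoff's worth of data, capped at the maximum. It was chosen because it is consistent with the 250M and 4-stream figures discussed in this thread and with the 750M row. The name `stream_count` and the reading of "M" as MiB are also assumptions.

```python
MIB = 1024 * 1024


def stream_count(size: int, cutoff: int = 250 * MIB, max_streams: int = 4) -> int:
    """Assumed rule: one stream per started multiple of cutoff, capped at max_streams."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return min(max_streams, size // cutoff + 1)


if __name__ == "__main__":
    for size_mib in (100, 250, 500, 750, 10_000):
        print(f"{size_mib}M -> {stream_count(size_mib * MIB)} stream(s)")
```

Under this rule, files below the cutoff stay on a single stream, which matches the observation that one thread is faster than two for a 100MB file.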
Thoughts?
I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.48.
Thanks for testing
Thoughts about the defaults welcome too!
Wouldn't that depend on the internet bandwidth available?
Yes it would. It is a compromise. 250M seemed like a reasonable point to start having multiple threads. Not all services give benefits with more and more threads so I chose 4. I could be persuaded that that number is too high, but I don't want to make it higher by default.
Been going through this wonderful thread. Out of sheer ignorance: will this work from an rclone HTTP server or to a WebDAV server?
If you use rclone to download from rclone serve http or rclone serve webdav then it will work. It won't work for uploads to either of those (http doesn't support uploads anyway).