Multi-threaded downloads - comments and testers needed

Thanks for the info @ncw, and it’s (encrypted) GoogleDrive all the way for me, so I really look forward to putting that multithreaded code through its paces!

One question: would that multithreaded code work in rclone serve restic mode? From what I’ve seen so far about that mode, I’m guessing not (I think that, similarly to --transfers, this would not work because when rclone is serve’ing restic, it’s the latter that ‘takes the initiative’ regarding threads and I/O), but I thought it better to ask anyway.

I think multithreaded uploads should work now with serve restic where the backend supports them (e.g. B2).

Actually, I looked at the Google drive code again and it does not do multithreaded uploads - sorry :frowning: I looked at the drive upload API and I don’t think the API is capable of it. I think it needs each chunk uploaded one at a time.

@ncw is it not included here?
https://beta.rclone.org/test/testbuilds-latest/

It won’t appear there until I have merged it. There should be an Android build for it too, just not quite sure where!

thanks

Also, a question:
is this feature only active for sync/copy/move, or also with serve?

This will work for anything which calls the internal operations.Copy function. This is sync/copy/move and friends (copyto/moveto). If you are using rclone mount with --vfs-cache-mode full it will be active. It won’t be active for rclone serve though, but if you are using rclone serve http then you could use rclone to do multithreaded downloads from it.

What about serve webdav?

rclone serve webdav supports Range requests so you could use multithreaded downloads from it just fine :slight_smile:

When the downloads of the individual threads complete, there is a decent delay at the end. Is this rclone reassembling the downloads?

I ran some basic tests but I’ll do more.
rclone copy robgs:xx.dmp.gz . -vv -P --multi-thread-streams X

with 4 threads:

2019-05-02 11:17:07 DEBUG : xx.dmp.gz: Finished multi-thread copy with 4 parts of size 301.849M
2019-05-02 11:17:34 INFO  : xx.dmp.gz: Multi-thread Copied (new)

With 2 threads.

2019-05-02 11:23:57 DEBUG : xx.dmp.gz: Finished multi-thread copy with 2 parts of size 603.697M
2019-05-02 11:24:32 INFO  : xx.dmp.gz: Multi-thread Copied (new)

with 12 threads

2019-05-02 11:20:13 DEBUG : xx.dmp.gz: Finished multi-thread copy with 12 parts of size 100.616M
2019-05-02 11:21:00 INFO  : xx.dmp.gz: Multi-thread Copied (new)

I compared the multi-threaded rclone to a regular copy, and the regular copy downloaded faster each time, which I thought was interesting and unexpected.

1 stream (baseline)  1m58.8s
2 streams            2m37.7s
4 streams            2m23.2s
12 streams           2m45.0s

If you subtract out that period at the end, which I think is reassembly (?), then they all finish pretty close to each other.

I also tried a Google Compute instance; the results are much faster overall, but show a similar pattern compared to a single stream.

Google Compute instance:

1 stream (baseline)       6.3s
2 streams (20M cutoff)   15.5s
4 streams (20M cutoff)   14.1s
8 streams (20M cutoff)   14.0s
50 streams (20M cutoff)  15.3s

My observation is that the time to assemble (which is what I’m guessing it’s doing at the end) always negates the improvement.

Hi @ncw,

Too bad… :frowning: But then, there’s no cure for stupid APIs… :frowning: Google should have done it better.

Anyway, thanks for the response.

– Durval.

Hello @ncw,

just did my first test-run of the v1.47.0-028-g1de11062-fix-2252-multipart-download-beta version, happy to report it worked great!

More details:

  1. it was a ~2.5GB download from an encrypted google drive directory to a local directory.
  2. I ran it with --multi-thread-cutoff=1 --multi-thread-streams=4 and --transfers=1 to try and stress the multithreading code.
  3. it performed admirably: the -v -v output showed each file being split in four equal parts, and then each one being handled by a separate thread. The main objective was accomplished: it easily saturated the lousy 15Mbps link I was on at the time, much to the chagrin of everyone else trying to use it :wink:
  4. AND… the transfer was 100% correct! I checked it on both sides with md5sum and every single file was reported as OK.
  5. I also tried using it for an upload to that same Encrypted Google Drive: as you predicted, it did not work, but in a very smart way: the multithreaded code simply did not engage, i.e. it was like the --multi-thread-cutoff=1 --multi-thread-streams=4 were ignored, so nothing was “broken” and it simply proceeded to do the upload in the usual single-threaded manner.
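The “split in four equal parts” behaviour described in point 3 can be sketched in a few lines of Python. This is a hypothetical illustration of the range arithmetic, not rclone’s actual Go code:

```python
def multi_thread_ranges(size, streams):
    """Split `size` bytes into `streams` contiguous (start, end) byte ranges.

    The last range absorbs any remainder, so for sizes that divide evenly
    all parts come out equal, as the -v -v output above shows.
    """
    part = size // streams
    ranges = []
    for i in range(streams):
        start = i * part
        end = size if i == streams - 1 else (i + 1) * part
        ranges.append((start, end))
    return ranges

# A 100 MiB file split 4 ways:
print(multi_thread_ranges(100 * 1024 * 1024, 4))
# [(0, 26214400), (26214400, 52428800), (52428800, 78643200), (78643200, 104857600)]
```

These are the same offsets that show up in the `multi-thread copy: stream n/4 (start-end)` debug messages.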

So, congrats and thank you very much for another great rclone feature! :wink: This is one thing I will be using a LOT.

EDIT:

Just for the record, exactly the same thing happened here with downloads from an SFTP remote: the multithread options were silently ignored and the files proceeded to transfer in “normal” single-threaded mode.

Cheers,
– Durval.

Hi @calisro,

Weird.

I went through my -v -v output (see my last post above) and I’m seeing literally zero time between the last multi-thread copy: stream n/4 (x-y) size nM finished message, the Finished multi-thread copy with 4 parts of size xM message, and the Multi-thread Copied (new) message for each file (all three are logged at exactly the same second), so I guess that, at least in my case, there’s no “assembly” time…

Cheers,
– Durval.

Thanks. I’m going to retest and see if I can figure out what is happening at that time.

They don’t allow multi-threaded uploads because merging all the parts would use CPU, and Google thinks it’s not necessary for ordinary users.

Also, Google Drive has great upload speed - 50 MB/s with a good connection.

Thank you for testing :smile:

Downloads should work for an SFTP remote… It worked when I tried it

 2019/05/07 11:37:09 DEBUG : rclone: Version "v1.47.0-DEV" starting with parameters ["rclone" "copyto" "-vv" "TestSftp:100M" "/tmp/100M" "--multi-thread-cutoff" "10M"]
2019/05/07 11:37:09 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2019/05/07 11:37:10 DEBUG : 100M: Couldn't find file - need to transfer
2019/05/07 11:37:10 DEBUG : 100M: Starting multi-thread copy with 4 parts of size 25M
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 4/4 (78643200-104857600) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 1/4 (0-26214400) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 2/4 (26214400-52428800) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 3/4 (52428800-78643200) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 4/4 (78643200-104857600) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: multi-thread copy: stream 1/4 (0-26214400) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: multi-thread copy: stream 2/4 (26214400-52428800) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: multi-thread copy: stream 3/4 (52428800-78643200) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: Finished multi-thread copy with 4 parts of size 25M
2019/05/07 11:37:12 INFO  : 100M: Multi-thread Copied (new)
2019/05/07 11:37:12 INFO  : 
Transferred:   	      100M / 100 MBytes, 100%, 41.089 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:        2.4s

2019/05/07 11:37:12 DEBUG : 39 go routines active
2019/05/07 11:37:12 DEBUG : rclone: Version "v1.47.0-DEV" finishing with parameters ["rclone" "copyto" "-vv" "TestSftp:100M" "/tmp/100M" "--multi-thread-cutoff" "10M"]

There should be 0 time to assemble. Rclone preallocates the file (using OS magic which takes no time) then writes directly into the file at the correct place, so it should make nice non sparse files.
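To illustrate what “writes directly into the file at the correct place” means, here is a small Python sketch of the technique. It uses `truncate` as a portable stand-in for rclone’s OS-specific preallocation, and `os.pwrite` (POSIX) for the positional writes; it is an illustration, not rclone’s actual implementation:

```python
import os

def write_part(path, offset, data):
    """Write one downloaded part directly at its byte offset.

    Because every stream writes into its own region of the same file,
    there is no final reassembly pass when the streams finish.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)  # positional write, no seek needed
    finally:
        os.close(fd)

# Simulate 4 streams of 25 bytes each, finishing out of order:
size, path = 100, "/tmp/multithread-demo"
parts = [(i * 25, bytes([i]) * 25) for i in range(4)]

os.close(os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644))
os.truncate(path, size)  # "preallocate" the full file size up front
for offset, data in reversed(parts):
    write_part(path, offset, data)

with open(path, "rb") as f:
    assert f.read() == b"".join(d for _, d in parts)
```

The key point is that the file already has its final size before any stream starts, so the last `finished` log line really is the end of the copy.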

Hi @ncw,

I may have messed something up in this test (for example, this) . Will redo and get back to you.

Cheers,
– Durval.

Hello @ncw,

Just triple-checked and this is not the case; to avoid any ambiguities with specifying pure numbers to --multi-thread-cutoff as raised here, I also specified it with an explicit 1k (and yes, all the files are larger than 1KB).
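As context for why the explicit suffix matters: rclone size flags treat k/M/G as binary multiples, and a bare number may be interpreted in a different unit than you expect. This is a hypothetical sketch of suffix-only parsing (not rclone’s actual code), which simply rejects bare numbers the way the post above sidesteps them:

```python
# Binary multiples, as rclone size suffixes use: k = 1024, M = 1024**2, ...
MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

def parse_size(s):
    """Parse a size like '1k' or '10M' into bytes; require a suffix."""
    unit = s[-1].lower()
    if unit not in MULTIPLIERS:
        # Bare numbers are the ambiguous case - insist on an explicit unit.
        raise ValueError("ambiguous size, give an explicit suffix like 1k")
    return int(float(s[:-1]) * MULTIPLIERS[unit])

print(parse_size("1k"))   # 1024
print(parse_size("10M"))  # 10485760
```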

Here’s what I get from -v -v (plx: is my SFTP remote, and egd: the encrypted Google Drive one):

2019/05/08 20:29:22 DEBUG : rclone: Version "v1.47.0-028-g1de11062-fix-2252-multipart-download-beta" starting with parameters ["rclone" "-v" "-v" "--checkers=12" "--transfers=12" "--multi-thread-cutoff=1k" "--multi-thread-streams=4" "--low-level-retries=100" "copyto" "plx:REDACTED1" "egd:REDACTED2"]
2019/05/08 20:29:22 DEBUG : Using config file from "/home/rclone/.rclone.conf"
2019/05/08 20:29:31 ERROR : REDACTED3: Failed to copy: failed to open source object: Open: couldn't connect SSH: ssh: handshake failed: EOF
2019/05/08 20:29:31 ERROR : REDACTED4: error reading source directory: List failed: dirExists: couldn't connect SSH: ssh: handshake failed: EOF
2019/05/08 20:29:37 DEBUG : t7jdkrvoi13onj81b6csehcmd36en2fc0iie2phj85ft0ucvh9b0ocgf0e0dj6mhbu7tu9fcr220qkc8uj28hi0f9haa9dfao2rf8n0/t7jdkrvoi13onj81b6csehcmd36en2fc0iie2phj85ft0ucvh9b0ocgf0e0dj6mhbu7tu9fcr220qkc8uj28hi0f9haa9dfao2rf8n0: Sending chunk 0 length 8388608
2019/05/08 20:29:38 DEBUG : bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e/bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e: Sending chunk 0 length 8388608
2019/05/08 20:29:38 INFO  : Encrypted drive 'egd:REDACTED2': Waiting for checks to finish
2019/05/08 20:29:38 INFO  : Encrypted drive 'egd:REDACTED2': Waiting for transfers to finish
2019/05/08 20:29:40 DEBUG : 53qmbm9i6rusr24bcbl153el9b5hlr1e3e6jd5nm81uol6it41lgk5u1kof0on1iavousfe1v8l26/53qmbm9i6rusr24bcbl153el9b5hlr1e3e6jd5nm81uol6it41lgk5u1kof0on1iavousfe1v8l26: Sending chunk 0 length 8388608
2019/05/08 20:29:40 DEBUG : bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e/bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e: Sending chunk 8388608 length 8388608
2019/05/08 20:29:41 DEBUG : l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo/l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo: Sending chunk 0 length 8388608
2019/05/08 20:29:42 DEBUG : l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo/l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo: Sending chunk 8388608 length 8388608
2019/05/08 20:29:43 DEBUG : 1es1ip9tep31583l177p4hv4u6lpu80eg5aft5ovf4dkgvbualgunf7t9dhb19tmg3l6eaim8dmd21dpg87v65q52gd6dar0hmf14jeq96omd2713r8oj7ijmiskvhii/1es1ip9tep31583l177p4hv4u6lpu80eg5aft5ovf4dkgvbualgunf7t9dhb19tmg3l6eaim8dmd21dpg87v65q52gd6dar0hmf14jeq96omd2713r8oj7ijmiskvhii: Sending chunk 0 length 8388608
2019/05/08 20:29:43 DEBUG : hkltbk0pdek31hu0gb67rvsk7m8q9dcgq8uq3pu1rsea6hrdfogcrj04vpj7qfdjvkrbfhs7bjuno/hkltbk0pdek31hu0gb67rvsk7m8q9dcgq8uq3pu1rsea6hrdfogcrj04vpj7qfdjvkrbfhs7bjuno: Sending chunk 0 length 8388608

i.e., no “multi-thread copy” messages… :-/

The only thing I can see above is the Failed to copy: failed to open source object: Open: couldn't connect SSH: ssh: handshake failed: EOF message, but I’ve been getting it since forever, and it doesn’t seem to affect data transfer (as it proceeds with no issues, just single-threadedly).

What gives?

– Durval.

Interestingly enough, I retested and see no waiting at the last stage anymore. Perhaps it was a disk I/O issue or something else involved. Not really sure. Working great now. :slight_smile:

Glad to hear it’s working for you now :wink: This feature really rocks, it’s going to help a lot with my usage here where it’s impossible to saturate the link with just a single TCP connection.

Ah… Multi-thread copy will only work to the local backend for the time being. It could in theory work for some other backends but it is really complicated and I haven’t got my head round how to do it!