Multi-threaded downloads - comments and testers needed

When the downloads in the individual threads complete, there is a noticeable delay at the end. Is this rclone reassembling the downloads?

I ran some basic tests but I'll do more.
rclone copy robgs:xx.dmp.gz . -vv -P --multi-thread-streams X

With 4 threads:

2019-05-02 11:17:07 DEBUG : xx.dmp.gz: Finished multi-thread copy with 4 parts of size 301.849M
2019-05-02 11:17:34 INFO  : xx.dmp.gz: Multi-thread Copied (new)

With 2 threads:

2019-05-02 11:23:57 DEBUG : xx.dmp.gz: Finished multi-thread copy with 2 parts of size 603.697M
2019-05-02 11:24:32 INFO  : xx.dmp.gz: Multi-thread Copied (new)

With 12 threads:

2019-05-02 11:20:13 DEBUG : xx.dmp.gz: Finished multi-thread copy with 12 parts of size 100.616M
2019-05-02 11:21:00 INFO  : xx.dmp.gz: Multi-thread Copied (new)

I compared the multi-threaded rclone to a regular copy, and the regular copy downloaded faster each time, which I thought was interesting and unexpected.

1 stream (baseline): 1m58.8s
4 streams:           2m23.2s
12 streams:          2m45s
2 streams:           2m37.7s

If you subtract out that period at the end, which I think is the reassembly (?), then they all finish pretty close to each other.

I also tried a Google Compute instance; the results there are much faster overall, but still similar relative to a single stream.

Google Compute:
1 stream (baseline):     6.3s
50 streams, 20M cutoff: 15.3s
2 streams, 20M cutoff:  15.5s
8 streams, 20M cutoff:  14s
4 streams, 20M cutoff:  14.1s

My observation is that the time to assemble (which is what I'm guessing it's doing at the end) always negates the improvement.

Hi @ncw,

Too bad… :frowning: But then, there’s no cure for stupid APIs… :frowning: Google should have done it better.

Anyway, thanks for the response.

– Durval.

Hello @ncw,

Just did my first test run of the v1.47.0-028-g1de11062-fix-2252-multipart-download-beta version, and I'm happy to report it worked great!

More details:

  1. it was a ~2.5GB download from an encrypted google drive directory to a local directory.
  2. I ran it with --multi-thread-cutoff=1 --multi-thread-streams=4 and --transfers=1 to try and stress the multithreading code.
  3. it performed admirably: the -v -v output showed each file being split into four equal parts, and then each one being handled by a separate thread. The main objective was accomplished: it easily saturated the lousy 15Mbps link I was on at the time, much to the chagrin of everyone else trying to use it :wink:
  4. AND… the transfer was 100% correct! I checked it on both sides with md5sum and every single file was reported as OK.
  5. I also tried using it for an upload to that same Encrypted Google Drive: as you predicted, it did not work, but in a very smart way: the multithreaded code simply did not engage, i.e. it was as if the --multi-thread-cutoff=1 --multi-thread-streams=4 options were ignored, so nothing was "broken" and it simply proceeded to do the upload in the usual single-threaded manner.

So, congrats and thank you very much for another great rclone feature! :wink: This is one thing I will be using a LOT.

EDIT:

Just for the record, the exact same thing happened here with downloads from an SFTP remote: the multi-thread options were silently ignored and the files proceeded to transfer in "normal" single-threaded mode.

Cheers,
– Durval.

Hi @calisro,

Weird.

I went through my -v -v output (see my last post above) and I'm seeing literally zero time between the last multi-thread copy: stream n/4 (x-y) size nM finished message, the Finished multi-thread copy with 4 parts of size xM message, and the Multi-thread Copied (new) message for each file (all three are logged at exactly the same second), so I guess that, at least in my case, there's no "assembly" time…

Cheers,
– Durval.

Thanks. I’m going to retest and see if I can figure out what is happening at that time.

They don't allow multi-threaded uploads because merging all the parts uses CPU, and Google thinks it's not necessary for most users.

Also, Google Drive has a great upload speed: 50 MB/s with a good connection.

Thank you for testing :smile:

Downloads should work for an SFTP remote… It worked when I tried it:

2019/05/07 11:37:09 DEBUG : rclone: Version "v1.47.0-DEV" starting with parameters ["rclone" "copyto" "-vv" "TestSftp:100M" "/tmp/100M" "--multi-thread-cutoff" "10M"]
2019/05/07 11:37:09 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2019/05/07 11:37:10 DEBUG : 100M: Couldn't find file - need to transfer
2019/05/07 11:37:10 DEBUG : 100M: Starting multi-thread copy with 4 parts of size 25M
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 4/4 (78643200-104857600) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 1/4 (0-26214400) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 2/4 (26214400-52428800) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 3/4 (52428800-78643200) size 25M starting
2019/05/07 11:37:10 DEBUG : 100M: multi-thread copy: stream 4/4 (78643200-104857600) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: multi-thread copy: stream 1/4 (0-26214400) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: multi-thread copy: stream 2/4 (26214400-52428800) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: multi-thread copy: stream 3/4 (52428800-78643200) size 25M finished
2019/05/07 11:37:11 DEBUG : 100M: Finished multi-thread copy with 4 parts of size 25M
2019/05/07 11:37:12 INFO  : 100M: Multi-thread Copied (new)
2019/05/07 11:37:12 INFO  : 
Transferred:   	      100M / 100 MBytes, 100%, 41.089 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:            1 / 1, 100%
Elapsed time:        2.4s

2019/05/07 11:37:12 DEBUG : 39 go routines active
2019/05/07 11:37:12 DEBUG : rclone: Version "v1.47.0-DEV" finishing with parameters ["rclone" "copyto" "-vv" "TestSftp:100M" "/tmp/100M" "--multi-thread-cutoff" "10M"]

There should be 0 time to assemble. Rclone preallocates the file (using OS magic which takes no time) then writes directly into the file at the correct place, so it should make nice non-sparse files.

Hi @ncw,

I may have messed something up in this test (for example, this). Will redo and get back to you.

Cheers,
– Durval.

Hello @ncw,

Just triple-checked and this is not the case; to avoid any ambiguities with specifying pure numbers to --multi-thread-cutoff as raised here, I also specified it with an explicit 1k (and yes, all the files are larger than 1KB).

Here’s what I get from -v -v (plx: is my SFTP remote, and egd: the encrypted Google Drive one):

2019/05/08 20:29:22 DEBUG : rclone: Version "v1.47.0-028-g1de11062-fix-2252-multipart-download-beta" starting with parameters ["rclone" "-v" "-v" "--checkers=12" "--transfers=12" "--multi-thread-cutoff=1k" "--multi-thread-streams=4" "--low-level-retries=100" "copyto" "plx:REDACTED1" "egd:REDACTED2"]
2019/05/08 20:29:22 DEBUG : Using config file from "/home/rclone/.rclone.conf"
2019/05/08 20:29:31 ERROR : REDACTED3: Failed to copy: failed to open source object: Open: couldn't connect SSH: ssh: handshake failed: EOF
2019/05/08 20:29:31 ERROR : REDACTED4: error reading source directory: List failed: dirExists: couldn't connect SSH: ssh: handshake failed: EOF
2019/05/08 20:29:37 DEBUG : t7jdkrvoi13onj81b6csehcmd36en2fc0iie2phj85ft0ucvh9b0ocgf0e0dj6mhbu7tu9fcr220qkc8uj28hi0f9haa9dfao2rf8n0/t7jdkrvoi13onj81b6csehcmd36en2fc0iie2phj85ft0ucvh9b0ocgf0e0dj6mhbu7tu9fcr220qkc8uj28hi0f9haa9dfao2rf8n0: Sending chunk 0 length 8388608
2019/05/08 20:29:38 DEBUG : bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e/bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e: Sending chunk 0 length 8388608
2019/05/08 20:29:38 INFO  : Encrypted drive 'egd:REDACTED2': Waiting for checks to finish
2019/05/08 20:29:38 INFO  : Encrypted drive 'egd:REDACTED2': Waiting for transfers to finish
2019/05/08 20:29:40 DEBUG : 53qmbm9i6rusr24bcbl153el9b5hlr1e3e6jd5nm81uol6it41lgk5u1kof0on1iavousfe1v8l26/53qmbm9i6rusr24bcbl153el9b5hlr1e3e6jd5nm81uol6it41lgk5u1kof0on1iavousfe1v8l26: Sending chunk 0 length 8388608
2019/05/08 20:29:40 DEBUG : bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e/bjl5ot8093atgmeb180o217otmn80rmen7670inloc0uqg05jl69toi0kktucinqde975c6p3r11e: Sending chunk 8388608 length 8388608
2019/05/08 20:29:41 DEBUG : l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo/l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo: Sending chunk 0 length 8388608
2019/05/08 20:29:42 DEBUG : l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo/l3skaibth618p7kkladak86hfrku6e8pqvihvho9q1av2bihusasb4mfp979gjnqc6e6v6ug7orues0epfc88pnt8c5kk22nqa6d0fo: Sending chunk 8388608 length 8388608
2019/05/08 20:29:43 DEBUG : 1es1ip9tep31583l177p4hv4u6lpu80eg5aft5ovf4dkgvbualgunf7t9dhb19tmg3l6eaim8dmd21dpg87v65q52gd6dar0hmf14jeq96omd2713r8oj7ijmiskvhii/1es1ip9tep31583l177p4hv4u6lpu80eg5aft5ovf4dkgvbualgunf7t9dhb19tmg3l6eaim8dmd21dpg87v65q52gd6dar0hmf14jeq96omd2713r8oj7ijmiskvhii: Sending chunk 0 length 8388608
2019/05/08 20:29:43 DEBUG : hkltbk0pdek31hu0gb67rvsk7m8q9dcgq8uq3pu1rsea6hrdfogcrj04vpj7qfdjvkrbfhs7bjuno/hkltbk0pdek31hu0gb67rvsk7m8q9dcgq8uq3pu1rsea6hrdfogcrj04vpj7qfdjvkrbfhs7bjuno: Sending chunk 0 length 8388608

i.e., no "multi-thread copy" messages… :-/

The only thing I can see above is the Failed to copy: failed to open source object: Open: couldn't connect SSH: ssh: handshake failed: EOF message, but I've been getting that since forever, and it doesn't seem to affect the data transfer (which proceeds with no issues, just single-threaded).

What gives?

– Durval.

Interestingly enough, I retested and see no waiting at the last stage anymore. Perhaps a disk I/O issue or something else was involved; not really sure. Working great now. :slight_smile:

Glad to hear it’s working for you now :wink: This feature really rocks, it’s going to help a lot with my usage here where it’s impossible to saturate the link with just a single TCP connection.

Ah… Multi-thread copy will only work to the local backend for the time being. It could in theory work for some other backends, but it is really complicated and I haven't got my head round how to do it!

Hello @ncw,

I can confirm that: I ran the same rclone copy from SFTP that only worked single-threaded the other day, but this time I replaced the egd: part with ~/egd/ (which is where I rclone mount the exact same Encrypted Google Drive remote), and multi-threading worked like a charm:

2019/05/10 12:49:52 DEBUG : rclone: Version "v1.47.0-028-g1de11062-fix-2252-multipart-download-beta" starting with parameters ["rclone" "-v" "-v" "--checkers=12" "--transfers=12" "--multi-thread-cutoff=1k" "--multi-thread-streams=4" "--low-level-retries=100" "copyto" "plx:REDACTED1" "/home/rclone/egd/REDACTED2"]
2019/05/10 12:49:52 DEBUG : Using config file from "/home/rclone/.rclone.conf"
2019/05/10 12:50:04 DEBUG : REDACTED3: Sizes differ (src 3528838 vs dst 0)
2019/05/10 12:50:04 DEBUG : REDACTED3: Failed to pre-allocate: operation not supported
2019/05/10 12:50:04 DEBUG : REDACTED3: Starting multi-thread copy with 4 parts of size 861.533k
2019/05/10 12:50:04 DEBUG : REDACTED3: multi-thread copy: stream 4/4 (2752512-3528838) size 758.131k starting
2019/05/10 12:50:04 DEBUG : REDACTED3: multi-thread copy: stream 2/4 (917504-1835008) size 896k starting
2019/05/10 12:50:04 DEBUG : REDACTED3: multi-thread copy: stream 1/4 (0-917504) size 896k starting
2019/05/10 12:50:04 DEBUG : REDACTED3: multi-thread copy: stream 3/4 (1835008-2752512) size 896k starting 

So everything is great here; the only thing that would make it better would be encrypted Google Drive multi-threaded upload support :-/ Too bad Google had to FUBAR their API :expressionless:

PS1: so, it seems that what I did above could be used as a 'trick' to do multi-threaded downloads from supported remotes to any remote supported by rclone mount, by turning the latter into a 'pseudo-local' destination (from the point of view of the rclone copy command). How nice is that? :slight_smile:
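For the record, the 'trick' boils down to something like this (a sketch with placeholder paths; plx: and egd: are my remote names from the logs above, and the exact flags you want may differ):

```shell
# Mount the (encrypted) Drive remote so it looks like a local directory.
rclone mount egd: ~/egd --vfs-cache-mode writes &

# Copy from the SFTP remote to the mountpoint: the destination is now
# "local" from rclone's point of view, so the multi-thread download
# path engages.
rclone copyto -vv --multi-thread-streams=4 --multi-thread-cutoff=1k \
    plx:path/to/src ~/egd/path/to/dst
```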

PS2: Humrmrmrmr… watching the operation of the above command here, it sure seems that the upload into my encrypted Google Drive (through its rclone mount mountpoint) is happening as if it's multithreaded… would just this rclone mount 'trick' be enough for that? Am I missing something?

Cheers,
– Durval.

Howdy everyone,

Some anecdotal performance info:

  • On the above multi-threaded SFTP-to-Encrypted-Google-Drive-over-rclone-mount copy, I’m seeing over 9MB/s total transfer speed (as reported in the rclone once-a-minute progress message).
  • With the single-threaded SFTP-to-Encrypted-Google-Drive-direct copy I used to run, I saw at most ~6MB/s;
  • So, as nothing else has changed, I’m seeing at least a 50% speed improvement just from this multi-threaded feature alone.

Looks great! :wink:

Cheers,
– Durval.

I don't trust you :slight_smile:
Most likely it's speed fluctuations. I saw that when playing with settings in rclone-browser: with the same settings, every new upload would start at a different speed depending on which Google server it connected to.
For the best test, look at the download and upload speeds. In the best scenario they should both be at max bandwidth and almost the same; otherwise it's a sign that rclone/mount is caching files locally.

Hi @AndShy,

Man of little faith :wink: But anyway, I don't trust me either: that's why I said "anecdotal" :slight_smile:

It’s not impossible. But this +50% speed was the final number at the end of a very long 43GB transfer (which took over 1h to complete), so if it was a “fluctuation”, it was a pretty firm one :wink:

Cheers,
– Durval.

It's possible. My fluctuations were from 1 MB/s to 4 MB/s and were visible right after the upload started.
Just start a transfer with 10-15 threads and look at the difference between download and upload speeds.

That will work! I'm guessing you have --vfs-cache-mode writes though, so this will do the multi-thread download to a temporary file and then upload it.

I think what should happen is that the multithread download completes to the temporary file, then that is uploaded to google afterwards. Is that not what you are seeing?

Hello @ncw,

Indeed, it seems to be working quite nicely. And yes, without --vfs-cache-mode writes, rclone mount threw a bunch of errors, so I added it.

Sure thing! (I was not aware that, with --vfs-cache-mode writes, rclone mount would work via a temporary file. Nice trick!)

Anyway, the result was so smooth, it led me to think it was working in multithreaded mode. Say, what happens with --transfers=N? If I understand it correctly, then it would result in N temporary files being written to local storage then uploaded separately, one thread for each, correct? That would account for the 'smoothness' I was seeing here.

Cheers,
-- Durval.

Yes, that is right.

I'll write a few more docs and merge the multi-thread download feature - I think it is working well - thank you all for your testing :smile:
