When the downloads of the threads themselves complete, there is a decent delay at the end. Is this rclone reassembling the downloads?
I ran some basic tests, but I'll do more: rclone copy robgs:xx.dmp.gz . -vv -P --multi-thread-streams X
With 4 threads:
2019-05-02 11:17:07 DEBUG : xx.dmp.gz: Finished multi-thread copy with 4 parts of size 301.849M
2019-05-02 11:17:34 INFO : xx.dmp.gz: Multi-thread Copied (new)
With 2 threads:
2019-05-02 11:23:57 DEBUG : xx.dmp.gz: Finished multi-thread copy with 2 parts of size 603.697M
2019-05-02 11:24:32 INFO : xx.dmp.gz: Multi-thread Copied (new)
With 12 threads:
2019-05-02 11:20:13 DEBUG : xx.dmp.gz: Finished multi-thread copy with 12 parts of size 100.616M
2019-05-02 11:21:00 INFO : xx.dmp.gz: Multi-thread Copied (new)
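The part sizes in the logs above are consistent with rclone simply dividing the file size by the stream count. A quick sanity check (taking the file size from the 2-thread run, where 2 × 603.697M is exact):

```python
# Check that each logged part size is total_size / streams, to the
# three decimal places rclone prints. File size is implied by the
# 2-thread log line: 2 * 603.697 MiB.
total_mib = 2 * 603.697  # 1207.394 MiB

for streams, logged_part_mib in [(4, 301.849), (2, 603.697), (12, 100.616)]:
    part = total_mib / streams
    assert abs(part - logged_part_mib) < 0.001, (streams, part)
    print(f"{streams} threads -> {part:.3f} MiB per part")
```

So the splits really are equal N-way divisions of the file, matching what the -vv output shows.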
I compared the multi-threaded rclone to a regular copy, and the regular copy downloaded faster each time, which I thought was interesting and unexpected.
Just did my first test run of the v1.47.0-028-g1de11062-fix-2252-multipart-download-beta version, and I'm happy to report it worked great!
More details:
it was a ~2.5GB download from an encrypted google drive directory to a local directory.
I ran it with --multi-thread-cutoff=1 --multi-thread-streams=4 and --transfers=1 to try and stress the multithreading code.
It performed admirably: the -v -v output showed each file being split into four equal parts, each one handled by a separate thread. The main objective was accomplished: it easily saturated the lousy 15Mbps link I was on at the time, much to the chagrin of everyone else trying to use it.
AND… the transfer was 100% correct! I checked it on both sides with md5sum and every single file was reported as OK.
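That kind of end-to-end verification can also be done portably in a few lines; here's a minimal sketch (the file paths in any real use would be your own, and on the remote side rclone's own rclone md5sum command reports the same digests):

```python
import hashlib

def md5_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large downloads fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare a source file's digest with its downloaded copy's digest:
#   md5_file("source/xx.dmp.gz") == md5_file("downloaded/xx.dmp.gz")
```

Multi-thread downloads writing different ranges of the same file make this check especially worthwhile, since an offset bug would corrupt data without changing the file size.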
I also tried using it for an upload to that same Encrypted Google Drive: as you predicted, it did not work, but in a very smart way: the multithreaded code simply did not engage, i.e. it was like the --multi-thread-cutoff=1 --multi-thread-streams=4 were ignored, so nothing was “broken” and it simply proceeded to do the upload in the usual single-threaded manner.
So, congrats and thank you very much for another great rclone feature! This is one thing I will be using a LOT.
EDIT:
Just for the record, the exact same thing happened here with downloads from an SFTP remote: the multithread options were silently ignored and the files proceeded to transfer in “normal” single-threaded mode.
I went through my -v -v output (see my last post above) and I’m seeing literally zero time between the last “multi-thread copy: stream n/4 (x-y size nM) finished” message and the “Finished multi-thread copy with 4 parts of size xM” and “Multi-thread Copied (new)” messages for each file (all three are logged at exactly the same second), so I guess that, at least in my case, there’s no “assembly” time…
There should be 0 time to assemble. Rclone preallocates the file (using OS magic which takes no time) then writes directly into the file at the correct place, so it should make nice non sparse files.
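That strategy can be sketched in a few lines; this is a simplified Python model of the idea (not rclone's actual Go implementation): size the file once up front, then let each worker write its range straight into place at its own offset.

```python
import os
import tempfile

def write_parts(path, parts, total):
    """Simulate multi-thread assembly: preallocate the file once, then
    write each (offset, data) part directly into place (POSIX only)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            os.posix_fallocate(fd, 0, total)  # real preallocation -> non-sparse file
        else:
            os.ftruncate(fd, total)           # fallback: just set the size
        for offset, data in parts:            # completion order doesn't matter,
            os.pwrite(fd, data, offset)       # because the ranges never overlap
    finally:
        os.close(fd)

# Four "streams" finishing out of order still yield the right bytes:
path = os.path.join(tempfile.mkdtemp(), "out.bin")
write_parts(path, [(4, b"cc"), (0, b"aa"), (6, b"dd"), (2, b"bb")], total=8)
with open(path, "rb") as f:
    assert f.read() == b"aabbccdd"
```

Since every byte lands in its final position as it arrives, there is genuinely no separate reassembly step to wait for at the end.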
Just triple-checked and this is not the case; to avoid any ambiguities with specifying pure numbers to --multi-thread-cutoff as raised here, I also specified it with an explicit 1k (and yes, all the files are larger than 1KB).
Here’s what I get from -v -v (plx: is my SFTP remote, and egd: the encrypted Google Drive one):
The only thing I can see above is the Failed to copy: failed to open source object: Open: couldn't connect SSH: ssh: handshake failed: EOF message, but I’ve been getting it since forever, and it doesn’t seem to affect data transfer (as it proceeds with no issues, just single-threadedly).
Interestingly enough, I retested and see no waiting at the last stage anymore. Perhaps a disk I/O issue or something else was involved. Not really sure, but it's working great now.
Glad to hear it’s working for you now! This feature really rocks; it’s going to help a lot with my usage here, where it’s impossible to saturate the link with just a single TCP connection.
Ah… Multi thread copy will only work to the local backend for the time being. It could in theory work for some other backends but it is really complicated and I haven’t got my head round how to do it!
I can confirm that: I ran the same rclone copy from SFTP that only worked single-threaded the other day again, but this time I replaced the egd: part with ~/egd/ (which is where I rclone mount the exact same Encrypted Google Drive remote), and multi-threading worked like a charm:
So everything is great here, only thing that would make it better would be encrypted Google Drive multi-threaded upload support :-/ Too bad Google had to FUBAR their API
PS1: so, it seems that what I did above could be used as a ‘trick’ to do multi-threaded download from supported remotes to any remote supported by rclone mount, by turning the latter into a ‘pseudo local’ destination (from the point of view of the rclone copy command). How nice is that?
PS2: Humrmrmrmr… by watching the operation of the above command here, it sure seems that the upload into my encrypted Google Drive (through its rclone mount mountpoint) is happening as if it’s multithreaded… would just this rclone mount ‘trick’ be enough for that? Am I missing something?
On the above multi-threaded SFTP-to-Encrypted-Google-Drive-over-rclone-mount copy, I’m seeing over 9MB/s total transfer speed (as reported in the rclone once-a-minute progress message).
With the single-threaded SFTP-to-Encrypted-Google-Drive-direct copy I used to run, I saw at most ~6MB/s;
So, as nothing else has changed, I’m seeing at least a 50% speed improvement just from this multi-threaded feature alone.
I don’t trust you
Most likely it’s speed fluctuations. I saw that when I played with settings in rclone-browser: with the same settings, every new upload started at a different speed depending on which Google server it connected to.
For proper testing, look at both the download and upload speeds. In the best scenario, both should be at max bandwidth and almost the same; otherwise, it’s a sign that rclone/mount is caching files locally.
Man of little faith! But anyway, I don’t trust me either: that’s why I said “anecdotal”
It’s not impossible. But this +50% speed was the final number at the end of a very long 43GB transfer (which took over 1h to complete), so if it was a “fluctuation”, it was a pretty firm one.
It’s possible. My fluctuations ranged from 1MB/s to 4MB/s and were visible right after the upload started.
Just start a transfer with 10-15 threads and look at the difference between download and upload speed.
That will work! I’m guessing you have --vfs-cache-mode writes though so this will do the multithread download to a temporary file then upload it.
I think what should happen is that the multithread download completes to the temporary file, then that is uploaded to google afterwards. Is that not what you are seeing?
Indeed, it seems to be working quite nicely. And yes, without --vfs-cache-mode writes, rclone mount threw a bunch of errors, so I added it.
Sure thing! (I was not aware that, with --vfs-cache-mode writes, rclone mount would work with a temporary file. Nice trick!)
Anyway, the result was so smooth that it led me to think it was working in multithreaded mode. Say, what happens with --transfers=N? If I understand it correctly, that would result in N temporary files being written to local storage and then uploaded separately, one thread for each, correct? That would account for the ‘smoothness’ I was seeing here.