First a few words about what it is.
You may know by now that you can sort the order of the transfers with --order-by (size,modtime,name).
This mode is an extension of that, using the same framework to attempt to increase transfer performance.
The basic idea is that you are usually limited by 2 main factors:
- Bandwidth
- Number of new transfers that can be established per second (this varies a lot between providers, with "premium" services generally a lot less limited, but it is very relevant for services like Gdrive).
The problem is that if you transfer a mixed workload of large files + small files, you often end up maxing out just your bandwidth on the large files (while wasting your transfer-connection allowance), and the other way around when transferring your small files. Ideally we want to use as much of both resources as possible at all times to minimize the total transfer time for the whole operation - and this can be accomplished by transferring the largest files along with the smallest files.
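To make the idea concrete, here is a rough toy model of "largest along with smallest". This is only a sketch and not rclone's actual code: the function name, the per-slot speed and the per-file setup time are all made-up assumptions. Some transfer slots pull from the small end of the size-sorted queue while the rest pull from the large end.

```python
import heapq
from collections import deque


def simulate_mixed(sizes_mb, transfers=4, small_percent=50,
                   mb_per_s=3.0, setup_s=1.0):
    """Toy model of a mixed transfer order (NOT rclone's implementation).

    mb_per_s and setup_s are hypothetical per-slot figures chosen only
    for illustration. Returns a list of (start_time, slot, size_mb).
    """
    queue = deque(sorted(sizes_mb))                 # smallest on the left
    small_slots = transfers * small_percent // 100  # slots serving small files
    free_at = [(0.0, slot) for slot in range(transfers)]
    heapq.heapify(free_at)                          # next slot to become free
    started = []
    while queue:
        now, slot = heapq.heappop(free_at)
        # Low-numbered slots take the smallest file, the rest the largest.
        size = queue.popleft() if slot < small_slots else queue.pop()
        started.append((now, slot, size))
        heapq.heappush(free_at, (now + setup_s + size / mb_per_s, slot))
    return started


if __name__ == "__main__":
    for start, slot, size in simulate_mixed([100, 100, 1, 1, 1, 1]):
        print(f"t={start:6.1f}s  slot {slot}  {size} MB")
```

In this model the large-file slots stay busy on a few long transfers while the small-file slots keep opening new ones, which is the balance described above. It does not model shared bandwidth or provider rate limits, so it says nothing about real timings.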
So I have compiled some initial testing data to illustrate use. This is not necessarily a "best-case" workload, but something relatively realistic. Please note that the real benefit you see will heavily depend on what you transfer. A transfer of only large files will benefit little, if at all, and the same goes for only small files. Using this should, however, always be at least as fast as not using it.
Also, Nick is considering whether to add another option to this that would make it possible to optimize further. I may produce a second set of test-data that shows an "optimal savings" scenario at that point.
Please note this feature is currently available in the newest beta build, and current documentation can be found here:
https://tip.rclone.org/docs/#order-by-string
From my PM to NCW:
Hey Nick, here is some test data for the new feature.
Let me first say that I didn't encounter any bugs or unexpected results - so great job on the implementation.
Dataset:
100x100MB (10GB)
1000x1MB (1GB)
So this is our baseline - default.
C:\rclone>rclone copy E:\mixedtest TD1:\test1_default -P
Transferred: 10.742G / 10.742 GBytes, 100%, 12.460 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 14m42.8s
882,8sec
Here it is using mixed mode (default 50/50 split).
C:\rclone>rclone copy E:\mixedtest TD1:\test2_mixed --order-by size,mixed -P
Transferred: 10.742G / 10.742 GBytes, 100%, 14.940 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 12m16.2s
736,2sec - 19,9% faster over baseline.
Already a good improvement considering this is basically "free" performance in any environment or workload that can benefit.
This test was fairly heavily limited by the transfer connection limit (i.e. the large files finished halfway through).
Now let's test the same, but using a 25/75 split and see if we can balance the load further.
C:\rclone>rclone copy E:\mixedtest TD1:\test3_mixed75 --order-by size,mixed,75 -P
Transferred: 10.742G / 10.742 GBytes, 100%, 15.241 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 12m1.7s
721,7sec - 22,3% faster over baseline
Not a lot more improvement, but some.
This was still transfer-connection limited
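As a side note on the splits: reading them as large/small shares of the transfer slots, as the runs here do, a split is only "clean" when the percentage works out to a whole number of slots. A small sketch of that arithmetic, assuming the default of 4 transfers unless stated (the helper name is made up, and how rclone itself handles a split that does not land on whole slots is not covered here):

```python
from fractions import Fraction


def split_slots(transfers, small_percent):
    """Return (large_slots, small_slots, is_clean) for a given split.

    Hypothetical helper for illustration only: it just does the
    arithmetic and makes no claim about rclone's internal rounding.
    """
    small = Fraction(transfers * small_percent, 100)
    large = transfers - small
    is_clean = small.denominator == 1  # whole number of slots on each side
    return large, small, is_clean


if __name__ == "__main__":
    for transfers, percent in [(4, 50), (4, 75), (4, 80), (5, 80)]:
        large, small, clean = split_slots(transfers, percent)
        print(f"{transfers} transfers, {percent}% small: "
              f"{large} large / {small} small, clean={clean}")
```

With 4 slots, 50% and 75% both come out as whole slots, while 80% does not. It takes a different slot count, such as 5, to make 80% whole.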
So let's push it a little further. 5 transfers in order to get a clean 20/80 split.
C:\rclone>rclone copy E:\mixedtest TD1:\test4_mixed80 --order-by size,mixed,80 -P --transfers 5
Transferred: 10.742G / 10.742 GBytes, 100%, 19.326 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 9m29.1s
569,1sec - 55,1% faster over baseline.
Now we are cooking with gasoline!
Granted, a small piece of this improvement may come from just the extra transfer slot, but from my own experience and testing with Gdrive, that benefit is very minimal when using the old default.
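For anyone who wants to check the percentages, here is a short sketch that recomputes them from the elapsed times reported above. The "faster" figures are baseline time divided by run time, so they describe the gain in average speed rather than the share of time saved. The function names are my own.

```python
def to_seconds(minutes, seconds):
    """Convert an elapsed time such as 14m42.8s to seconds."""
    return minutes * 60 + seconds


def percent_faster(baseline_s, run_s):
    """Speed gain over baseline, in percent (ratio of elapsed times)."""
    return (baseline_s / run_s - 1) * 100


def percent_time_saved(baseline_s, run_s):
    """Reduction in wall-clock time versus baseline, in percent."""
    return (1 - run_s / baseline_s) * 100


# Elapsed times taken from the rclone output in the runs above.
runs = {
    "default": to_seconds(14, 42.8),
    "size,mixed": to_seconds(12, 16.2),
    "size,mixed,75": to_seconds(12, 1.7),
    "size,mixed,80 --transfers 5": to_seconds(9, 29.1),
}

if __name__ == "__main__":
    baseline = runs["default"]
    for name, secs in runs.items():
        print(f"{name:30s} {secs:7.1f}s  "
              f"{percent_faster(baseline, secs):5.1f}% faster  "
              f"{percent_time_saved(baseline, secs):5.1f}% less time")
```

So the last run is 55.1% faster in terms of average speed, which works out to roughly 35.5% less wall-clock time than the baseline.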