[PSA] Some initial performance testing of new mode --order-by size,mixed

First a few words about what it is.
You may by now know that you can sort the order of the transfers with --order-by (size,modtime,name).

This mode is an extension of that, and uses the same framework to try to increase transfer performance.
The basic idea is that you are usually limited by 2 main factors:
- Bandwidth
- Number of new transfers that can be established per second. (This varies a lot between providers, with "premium" services generally a lot less limited - but it is very relevant for services like Gdrive.)

The problem is that if you transfer a mixed workload of large files + small files you often end up maxing out only your bandwidth on the large files (while wasting your transfer-connection allowance), and the other way around when transferring your small files. Ideally we want to use as much of both resources as possible at all times to minimize the total transfer time for the whole operation - and this can be accomplished by transferring the largest files along with the smallest files.
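To make the idea concrete, here is a minimal Python sketch of picking from both ends of a size-sorted queue. It is only an illustration and not rclone's actual implementation: the function name `mixed_order`, the `small_workers`/`large_workers` parameters and the round-robin turn-taking are all assumptions made for the example (real transfers run concurrently and finish at different times).

```python
from collections import deque

def mixed_order(sizes, small_workers, large_workers):
    """Return (worker_kind, size) pairs in the order files get picked.

    Hypothetical model: "small" workers always take the smallest
    remaining file, "large" workers always take the largest one.
    Workers take turns in a fixed round-robin, which is a simplification.
    """
    queue = deque(sorted(sizes))  # smallest on the left, largest on the right
    kinds = ["small"] * small_workers + ["large"] * large_workers
    picked = []
    while queue:
        for kind in kinds:
            if not queue:
                break
            size = queue.popleft() if kind == "small" else queue.pop()
            picked.append((kind, size))
    return picked

if __name__ == "__main__":
    # Four 1MB files and two 100MB files, one worker of each kind.
    for kind, size in mixed_order([1, 1, 1, 1, 100, 100], 1, 1):
        print(kind, size)
```

In this toy model a large file is in flight (using bandwidth) while small files are started alongside it (using the connection allowance), which is the effect the mode is after.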

So I have compiled some initial testing data to illustrate use. This is not a "best-case" workload necessarily, but something relatively realistic. Please note that the real benefit you see will heavily depend on what you transfer. A transfer of only large files will benefit little, if at all, and neither will one of only small files. Using this should however always be at least as fast as not using it.

Also, Nick is considering whether to add another option to this that would make it possible to optimize further. I may produce a second set of test-data that shows an "optimal savings" scenario at that point.
Please note this feature is currently available in the newest beta build, and current documentation can be found here:
https://tip.rclone.org/docs/#order-by-string


From my PM to NCW:

Hey Nick, here is some test data for the new feature.
Let me first say that I didn't encounter any bugs or unexpected results - so great job on implementation :smiley:

Dataset:

100x100MB (10GB)
1000x1MB (1GB)

So this is our baseline - default.

C:\rclone>rclone copy E:\mixedtest TD1:\test1_default -P
Transferred: 10.742G / 10.742 GBytes, 100%, 12.460 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 14m42.8s

882,8sec

Here is using mixed mode (default 50/50 split)

C:\rclone>rclone copy E:\mixedtest TD1:\test2_mixed --order-by size,mixed -P
Transferred: 10.742G / 10.742 GBytes, 100%, 14.940 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 12m16.2s

736,2sec - 19,9% faster over baseline.
Already a good improvement considering this is basically "free" performance in any environment or workload that can benefit. :slight_smile:
This test was fairly heavily limited by the transfer connection limit (i.e. the large files finished halfway through).

Now let's test the same, but using a 25/75 split and see if we can balance the load further.

C:\rclone>rclone copy E:\mixedtest TD1:\test3_mixed75 --order-by size,mixed,75 -P
Transferred: 10.742G / 10.742 GBytes, 100%, 15.241 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 12m1.7s

721,7sec - 22,3% faster over baseline
Not a lot more improvement, but some.
This was still transfer-connection limited
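For anyone wondering how the percentage maps onto transfer slots, here is a small Python sketch. It follows the documented meaning of the number (the percentage of transfers that take the smallest files first) together with rclone's default of 4 transfers. The function name `split_transfers` and the round-down behaviour for uneven splits are assumptions for the example, not something taken from rclone's source.

```python
def split_transfers(transfers, small_percent=50):
    """Return (smallest_first, largest_first) slot counts.

    Assumption: uneven splits round down on the smallest-first side.
    The cases used in this thread divide evenly, so rounding never
    comes into play for them.
    """
    smallest_first = transfers * small_percent // 100
    return smallest_first, transfers - smallest_first

if __name__ == "__main__":
    print(split_transfers(4))      # default --order-by size,mixed
    print(split_transfers(4, 75))  # --order-by size,mixed,75
```

With the default 4 transfers, `size,mixed,75` therefore gives 3 slots working on the smallest files and 1 slot on the largest, which is the 25/75 split tested above.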

So let's push it a little further. This time with 5 transfers, in order to get a clean 20/80 split.

C:\rclone>rclone copy E:\mixedtest TD1:\test4_mixed80 --order-by size,mixed,80 -P --transfers 5
Transferred: 10.742G / 10.742 GBytes, 100%, 19.326 MBytes/s, ETA 0s
Transferred: 1100 / 1100, 100%
Elapsed time: 9m29.1s

569,1sec - 55,1% faster over baseline.
Now we are cooking with gasoline! :smiley:
Granted, a small piece of this improvement may come from just the extra transfer slot, but from my own experience and testing with Gdrive that benefit when using the old default is very minimal.
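The percentages quoted in these tests can be reproduced from the elapsed times. Here is a short Python sketch of the arithmetic (the helper names are made up for the example). Note that "faster" is measured as the ratio of baseline time to run time, which matches the ratio of the reported MBytes/s figures. It is not the percentage of time saved.

```python
def to_seconds(minutes, seconds):
    """Convert an elapsed time such as 14m42.8s to seconds."""
    return minutes * 60 + seconds

def percent_faster(baseline_s, run_s):
    """Throughput gain over the baseline, in percent."""
    return (baseline_s / run_s - 1) * 100

if __name__ == "__main__":
    baseline = to_seconds(14, 42.8)  # default ordering
    runs = {
        "size,mixed": to_seconds(12, 16.2),
        "size,mixed,75": to_seconds(12, 1.7),
        "size,mixed,80 --transfers 5": to_seconds(9, 29.1),
    }
    for name, elapsed in runs.items():
        print(f"{name}: {elapsed:.1f}s, {percent_faster(baseline, elapsed):.1f}% faster")
```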


Excellent work @thestigma :smiley:

If anyone wants to try --order-by size,mixed then it is in the latest beta and will be released in 1.52

The docs are here: https://tip.rclone.org/docs/#order-by-string


Very initial testing - I am uploading a year of photos to Google at a time - is that with 16 transfers and mixed sizes it does keep my local DSL uplink full while still sending lots of the smaller JPGs (over the larger RAW files). Will keep an eye out for any odd things, but as the next batch will take about 12 hours...


@thestigma This looks like a wonderful addition! Thank you

If it is always the same or better than not using it, will mixed become the default setting?

Currently - no, it does not. The default is still the same as it has been for a long time.
That default is basically "whatever order rclone lists the files in first" - and this tends to be pseudo-random, but generally folder-by-folder most of the time.

I have made the suggestion to NCW that this might indeed make a good default replacement, but he is hesitant to make any change that could in any conceivable way break backwards compatibility in core functions (and the transfer queue would affect pretty much every operation). The current default doesn't really perform a reliable function, so in theory nobody should be affected if it was changed - but as Nick says: It would not be the first time that someone out there relied on an undocumented and "haphazard" function in the program.

I think he might come around to the idea eventually - but some caution here is definitely advisable since, as I said, this is very core to the function of all of rclone. Until such a time I'm afraid you will have to manually enable it, sorry :slight_smile:

Branch: rclone-rainbow

"We don't discriminate based on size, age or gender. We admit all files on an equal basis."

:sweat_smile:

Yep. Totally get that. Flags are easy to add in my scripts. More fun things to play with.

I'd definitely be up for changing the defaults if it helps, but not for the 1.51 release.


Do you mean not for 1.52 release?

Indeed! (Too many releases!)

Seems to also do the right thing when just checking (using --checksum) en masse. I am doing a consistency check on my music collection versus a portable backup copy, and it splits the checksumming between dsf, mp3 and then flac in an even-looking way. This however only worked over ssh and not with a local mount of the server drive, but that is more likely down to how rclone traverses directories when local.
