Hi guys, what flags do you find help with speeding up a large copy? I have 30 million 16-18 MB files I need to move to an on-prem S3. What all can I disable, since this is a fresh copy? (I don't need to compare to anything already there.)
thanks for any tips! So many flags, looking for some pointers where to start.
The chunk-size settings (`--s3-chunk-size` and `--s3-upload-cutoff`) are relevant here - but these mostly benefit larger files (the default chunk size is 5M, which is not necessarily the recommended value). For files below 200M that means there isn't really much to be gained, but on files larger than 200M it would certainly help a lot to increase the chunk size to something like 64M. The benefit is more efficient utilization of upload bandwidth, as the connection will not start and stop every 5M, which results in TCP "sawtoothing" and poor utilization. 64M is a good value for most cases to get closer to optimal performance while not using too much RAM (ie. 64M of RAM x number of connections could be used).
Even though the above is not really super relevant given the file sizes here, I think it was worth mentioning for optimization generally.
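As a sketch (the remote name `s3remote:` and paths are placeholders, not from this thread), bumping the chunk size for a large-file copy might look like:

```shell
# Hypothetical example: raise the S3 multipart chunk size from the default
# 5M to 64M for better sustained upload throughput on big files.
# "s3remote:bucket/path" and "/data/bigfiles" are placeholders.
rclone copy /data/bigfiles s3remote:bucket/path --s3-chunk-size 64M
```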
For smaller files the only thing you can really do is to increase the number of transfers. The default is 4, but you can use as many threads as the backend can handle. That said - the different providers have wildly differing limits on this. Some can realistically only benefit from 4 or 5, while others can do 32 or even 64, which helps vastly for those smaller files. The limits are generally far more permissive for premium services where you pay per transaction, or at least per stored GB - like Wasabi for example, or Backblaze (although that one is not an S3).
Transfers can be set easily via `--transfers 4` (again, this is the default value, not necessarily the recommended).
I do not recommend just setting this high blindly, because a higher number than your backend can actually handle will just result in a lot of rejected requests and stalling which will be counterproductive for performance.
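As a hedged example (remote and path names are placeholders, and 16 is just an illustrative number, not a recommendation), raising the parallelism for a bulk copy of small files might look like:

```shell
# Hypothetical example: raise parallel transfers from the default 4 to 16
# and watch throughput with -P to judge whether the backend keeps up.
# "s3remote:bucket/incoming" and "/data/smallfiles" are placeholders.
rclone copy /data/smallfiles s3remote:bucket/incoming --transfers 16 -P
```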
To help suggest some reasonable numbers to use I need more info about the specific provider you use. "on prem S3" is rather vague...
Thanks for the detailed info. It is an on-premises S3 (NetApp). I've fiddled with the transfer count, and 30 seems to be the magic number - it cut the time down to a third of where I started.
Was checking about chunk size or other things that might help with my file size, your explanation is very helpful.
Ah ok, then I suppose the correct answer is "as many transfers as it can handle" hehe
You can always just experiment to find this out.
Use `-P` to get a status display on the transfer, and keep increasing the transfers until you see that it stops helping in terms of throughput. Then maybe step back by 2 or so... that should be about optimal.
A note about chunk size and cutoff:
Below the cutoff size, basically no memory is used for the transfers. The downside is that if the file is very large and hits an error, it needs to restart completely. But on a reliable network (especially an on-premise LAN) this can probably be set very high indeed...
Above the cutoff, the file gets chunked, and you need (chunk size x transfers) MB of memory to support this. Larger chunks are always better, but at the cost of memory. 64M is generally a "sweet spot"; 128M is better, but not by much. Above that you are unlikely to see much improvement, as each doubling gives diminishing returns while the RAM cost increases linearly. Do be mindful of memory usage on limited-RAM systems, as rclone will simply crash if you run out of memory.
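The memory math above can be sanity-checked with a quick shell calculation (the 64M chunk size and 30 transfers are just illustrative values taken from this thread, not recommendations):

```shell
# Rough worst-case buffer memory for chunked uploads:
# chunk size (MB) x number of parallel transfers.
chunk_mb=64
transfers=30
echo "$((chunk_mb * transfers)) MB"
```

With 64M chunks and 30 transfers that works out to 1920 MB, which is why it pays to check available RAM before raising both settings at once.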
You may also consider using --fast-list if you do large sync operations. This is up to 15x faster in listing very large collections - like syncing the entire storage space for example. It will have limited or no benefit in smaller move or copy jobs though (but it also shouldn't hurt). It uses some RAM to do this, but it's pretty trivial until you start to talk about hundreds of thousands of files.
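A hedged sketch of what that might look like on a big sync (remote and path names are placeholders):

```shell
# Hypothetical example: sync a very large tree, letting rclone build the
# remote listing with fewer, larger requests via --fast-list.
# "s3remote:bucket" and "/data/archive" are placeholders.
rclone sync /data/archive s3remote:bucket --fast-list -P
```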
That's about what you can do as far as rclone settings.
Aside from that, you could consider packaging all these small files in some way into a larger file - simple zipping, or via a backup program if that is suitable, etc. Just a few huge files will surely max out your bandwidth even on gigabit and above - but it does bring some hassles of course, in terms of making it more difficult to change the files later. It is thus best suited for longer-term storage that is not frequently updated.
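For instance, one simple way to pack a directory of small files into a single large archive before uploading (directory and remote names are placeholders):

```shell
# Pack a directory of many small files into one compressed archive,
# then upload the single large file instead of millions of tiny ones.
# "/data/smallfiles" is a placeholder source directory.
tar -czf /tmp/batch-0001.tar.gz -C /data/smallfiles .

# Then upload the archive, e.g.:
# rclone copy /tmp/batch-0001.tar.gz s3remote:bucket/archives   # placeholder remote
```

The trade-off is exactly as described above: the archive transfers at near line speed, but touching one file later means repacking.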
Good hunting, and let me know if you have any other questions
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.