B2 sync speed (not bandwidth) question

Hi there,
I'm currently working on syncing some directories to B2 with rclone and I'm making some, imho, strange observations.

I have a directory that stores 1,373,800 files and takes up 28GB. So, a lot of fairly small files (exports; some files are just indicators that a job ran successfully and are 0 bytes).

Also, there is a directory with 951,797 files that takes up 79GB; these are mostly images.

If you look at the numbers you might say: hey, that first directory will take much more time to sync. Nope, you're as wrong as one could be :slight_smile:

It takes about 4 minutes, while syncing the folder with fewer files (but a higher average file size) takes about two and a half hours.

Both syncs run on the very same VPS with the same rclone settings, which are: --transfers 64 -u sync

The destination is the same bucket, just different target directories.
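For reference, the full invocation is essentially the following (the remote name and the paths here are placeholders, not the real ones):

```
# "b2remote" and the local/target paths are hypothetical stand-ins
rclone sync -u --transfers 64 /data/exports b2remote:my-bucket/exports
```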

While there is a huge difference in the time the syncs need, there is also a big difference in Backblaze Class C transactions (https://www.backblaze.com/b2/b2-transactions-price.html).

Syncing the directory with fewer files needed 346,200 transactions, whereas the one with more files just needed around 1,500.

Here is a screenshot of the current report data for B2 transactions: https://imgur.com/a/xZMI4m8
(I'm currently syncing the directories again for testing purposes, so the numbers are already higher than mentioned above.)

Any idea why there is such a huge discrepancy between these two syncs? Also, let me know if you need more information.

Edit:

While digging a little further (and verifying the numbers above), I think I discovered the core of the problem. While the one directory has 1.3 million files, it only has 53 folders. The 950k files are split up into 365k folders.

Is there a way to significantly increase the lookup speed? I'm currently trying --tpslimit 16 --tpslimit-burst 32, which got me from 4 minutes to 3m 20s for the 53-folder one, which is nice. I will need some more time on the other folder to see the improvement :slight_smile:

On the other hand, using --fast-list seems to be limited by the other side's performance; it roughly doubled the runtime to 6m 45s.
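For anyone following along, the two variants I tried look roughly like this (remote name and paths are placeholders again):

```
# with the TPS limit: 4m -> 3m 20s on the 53-folder tree
rclone sync -u --transfers 64 --tpslimit 16 --tpslimit-burst 32 /data/exports b2remote:my-bucket/exports

# with a recursive listing instead: far fewer transactions, but 6m 45s here
rclone sync -u --transfers 64 --fast-list /data/exports b2remote:my-bucket/exports
```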

Yes, folders are the limiting factor here. You can increase the number of folders in progress at once by increasing --checkers.

--fast-list will get the number of transactions down to a minimum. It may or may not be quicker than listing with lots of --checkers.
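Something like this, for example (source and remote:bucket/path stand in for your own paths, and the numbers are just a starting point):

```
# list more directories in parallel
rclone sync --checkers 32 --transfers 64 source remote:bucket/path

# or do one recursive listing up front (minimum transactions, more memory)
rclone sync --fast-list --transfers 64 source remote:bucket/path
```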

Thanks for your reply :slight_smile:

Actually, the initial performance with 32 checkers was way worse. My SSH connection died somewhere in the middle of the run, at which point it had made 707k checks in 4h 17m.

In detail, I was running "--checkers 32 --transfers 64 --tpslimit 16 --tpslimit-burst 32", which was maybe a little short-sighted on the TPS limits, because 32 checkers are going to exceed them. I removed the tpslimit options and got down to slightly less than an hour, which is a great improvement.

Then I set "--checkers 32 --transfers 64 --tpslimit 128 --tpslimit-burst 256" as options and got the same time.
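So the combination that got the run down to just under an hour is effectively this (placeholder paths again):

```
# adding --tpslimit 128 --tpslimit-burst 256 made no difference to the runtime
rclone sync -u --checkers 32 --transfers 64 /data/images b2remote:my-bucket/images
```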

I think the options that have the most impact on general performance should be highlighted in the documentation as well as in the --help output, where you can easily miss some of the gems among 239 lines of output. Separating this output into commands, general flags and service-specific flags would actually make it much easier to read.

I try to put performance tweaks for each backend in the relevant docs, e.g.: https://rclone.org/b2/#fast-list

All the backends are different and everyone's use is different, so it is difficult to generalise. However, if you can think of something which should go in there, then please send a pull request (or just type some text here for me to insert).