Hi there,
I'm currently syncing some directories to B2 with rclone, and I'm seeing some (IMHO) strange behaviour.
I have a directory that stores 1,373,800 files and takes up 28 GB. So, a lot of fairly small files (exports, plus some 0-byte files that only indicate whether a job ran successfully).
There is also a directory with 951,797 files that takes up 79 GB; these are mostly images.
Looking at the numbers, you might say: hey, that first directory will take much longer to sync. Nope, you couldn't be more wrong.
It takes about 4 minutes, while syncing the directory with fewer files (but a higher average file size) takes about two and a half hours.
Both syncs run on the same VPS with the same rclone settings: --transfers 64 -u sync
The destination is the same bucket, though with different target directories.
Besides the huge difference in sync time, there is also a big difference in Backblaze Class C transactions (https://www.backblaze.com/b2/b2-transactions-price.html).
Syncing the directory with fewer files needed 346,200 transactions, while the one with more files needed only around 1,500.
Here is a screenshot of the current report data for B2 transactions: https://imgur.com/a/xZMI4m8
(I'm currently syncing the directories again for testing, so the numbers are already higher than those mentioned above.)
Any idea why there is such a huge discrepancy between these two syncs? Also, let me know if you need more information.
Edit:
While digging a little further (and verifying the numbers above), I think I found the core problem. The directory with 1.3 million files has only 53 subfolders, whereas the 950k files are split across 365k subfolders.
Is there a way to significantly speed up these lookups? I'm currently trying --tpslimit 16 --tpslimit-burst 32, which brought the 53-subfolder sync down from 4 minutes to 3m 20s, which is nice. I'll need some more time to see whether it improves the other directory.
On the other hand, --fast-list seems to be limited by the other side's performance: it doubled the runtime, to 6m 45s.
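A rough back-of-the-envelope check of the folder theory, as a sketch only: it assumes each directory is listed with its own request(s), that one list request covers at most 1,000 entries (PAGE_SIZE is an assumed value), and that files are spread evenly across folders (a hypothetical split, not the real layout).

```python
import math

# Assumption: one B2 list request (Class C transaction) covers at most this many entries.
PAGE_SIZE = 1000

def list_calls_per_directory(files_per_dir):
    """Estimate list requests when every directory is listed separately.

    Each directory costs at least one request, even if it is empty or holds
    only a couple of files; bigger directories are paged in PAGE_SIZE chunks.
    """
    return sum(max(1, math.ceil(n / PAGE_SIZE)) for n in files_per_dir)

# Hypothetical even split of the numbers from the post.
few_big = [1_373_800 // 53] * 53              # 53 folders, ~25,920 files each
many_small = [951_797 // 365_000] * 365_000   # 365k folders, ~2 files each

print(list_calls_per_directory(few_big))     # 1378
print(list_calls_per_directory(many_small))  # 365000
```

Under these assumptions the estimates land in the same range as the observed ~1,500 and 346,200 transactions: the request count follows the number of folders, not the number of files.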
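Some quick arithmetic on --tpslimit, as a sketch that assumes every list request counts as one HTTP transaction against the limit (uploads would only add to that, so this is a lower bound). It uses only the figures from the post: 346,200 transactions in about 2.5 hours.

```python
def min_seconds(transactions, tps_limit):
    """Lower bound on runtime if every request counts against --tpslimit."""
    return transactions / tps_limit

# Average request rate actually achieved by the slow (365k-folder) sync.
observed_rate = 346_200 / (2.5 * 3600)
print(round(observed_rate, 1))                    # 38.5

# Hours needed just for the listing requests at --tpslimit 16.
print(round(min_seconds(346_200, 16) / 3600, 1))  # 6.0
```

If these assumptions hold, --tpslimit 16 caps the image directory well below the ~38 requests per second it already reached, so it would likely make that sync slower rather than faster, even though it helped the 53-folder case.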
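For comparison, a sketch of the request count for a single recursive listing, assuming --fast-list on B2 pages through the whole target prefix in one pass at an assumed 1,000 entries per request, so folder count stops mattering.

```python
import math

# Assumption: one list request covers at most this many entries.
PAGE_SIZE = 1000

def list_calls_recursive(total_files):
    """Estimate list requests for one recursive listing of all files."""
    return max(1, math.ceil(total_files / PAGE_SIZE))

print(list_calls_recursive(1_373_800))  # 1374
print(list_calls_recursive(951_797))    # 952
```

Under this model the 53-folder directory gains almost nothing (about 1,374 requests versus about 1,378 per directory), which is consistent with --fast-list not helping there. For the image directory the estimate drops from roughly 365k requests to under 1,000, so it might still be worth testing there.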