New sync method in v1.36 and Backblaze B2 Class C Transaction Cap

The new sync method in v1.36 is creating a lot of Class C transactions in Backblaze B2 and using up the daily free allowance (2,500 transactions). Syncing just under 500,000 files creates nearly 60,000 transactions; if I apply the --old-sync-method flag, only about 400 transactions are created.

Is this expected behaviour or a bug?

Will the --old-sync-method option be deprecated in the future?

(This is rclone v1.36 on 32-bit ARM Linux, syncing to an encrypted B2 bucket.)
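
For concreteness, the two commands I’m comparing look like this (remote name and paths are placeholders for my real ones):

    rclone sync /data/files b2secret:backup                    # new sync method (v1.36 default)
    rclone sync --old-sync-method /data/files b2secret:backup  # old whole-remote listing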

The new sync method syncs each directory individually, so rclone lists each directory individually. The old sync method listed the entire remote with a series of paged API calls. The new method definitely uses more API calls, but for most uses it is actually quicker as rclone can parallelize the directory listings. Not having to load all 500k files into memory at once also uses a lot less memory, which was the main point.
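
As a rough illustration of where the difference comes from (assuming a B2 list call returns up to 1,000 entries per transaction, so these numbers are approximate):

    old method: ~500,000 files / 1,000 entries per call ≈ 500 list calls
    new method: roughly one list call per directory, so ~60,000 calls
                suggests tens of thousands of directories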

So yes this is expected. I expect you have lots of directories with not many files in - is that correct?

I would like to deprecate it in the future, yes, as rclone now has 2 sync algorithms for me to maintain! However I don’t want to leave you in the lurch, so I’m sure there is a compromise - perhaps an extra flag which would cause rclone to list all the files first (all the bucket-based Fses like S3, B2, Swift and GCS provide this API).

Can you make an issue on GitHub for me to track this, please?

As I’m using a Raspberry Pi this is helpful.

[quote=“ncw, post:2, topic:1441”]
So yes this is expected. I expect you have lots of directories with not many files in - is that correct?
[/quote]

Yes - 73,000 directories.

[quote=“ncw, post:2, topic:1441”]
I would like to deprecate it in the future yes, as rclone now has 2 sync algorithms for me to maintain!
[/quote]

That’s very reasonable.

I have raised issue #1277.

Hi, I’m having a similar issue. I have about 275 GB of data synced from Dropbox to B2 (~200,000 files in ~35,000 directories). I’d like a daily cron job to keep it up to date. However, each time I run rclone (without --old-sync-method) it uses 10,000+ transactions. Although the cost is negligible for a single day, over a year this would nearly double my B2 costs.

Keeping some kind of low-transaction method, without too high a price to pay in terms of speed and memory usage, would be ideal for this kind of use case. I understand this might be asking too much though! 🙂
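
For reference, the kind of crontab entry I’d like to run (remote names, paths and timing are placeholders):

    # nightly sync at 03:00; --old-sync-method keeps Class C usage down for now
    0 3 * * * /usr/bin/rclone sync --old-sync-method dropbox: b2:backup >> /var/log/rclone.log 2>&1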

Just a data point, but for me, using a lot of memory isn’t an issue - the server I’m running rclone on has 32 GB of memory. Speed similarly isn’t an issue for me, but I can see how, for a desktop, speed and a low memory footprint would be essential.

Totally understandable for sure. I’m not a programmer - would the “extra flag” also use a lot of memory? I can see there being a trade-off between memory usage and transactions.

I’d echo the concern about deprecating the old sync method. I have about 350 GB that I’m looking at syncing via cron. Not much of it changes, but rclone still has to list and check everything. I ran into the issue when I upgraded rclone and blew through my 15k-transaction safety cap two days in a row.

I agree that sucking up 4 GB of memory is painful, but is there some way we could find a middle ground? Or another API we could use?

For my use there is only one read/write source, so technically in my case the listing could be cached, maybe…

I do agree two algorithms is twice as hard, though.

I’ve fixed this now.

Try the latest beta with the --fast-list flag.

--fast-list

When doing anything which involves a directory listing (eg sync,
copy, ls - in fact nearly every command), rclone normally lists a
directory and processes it before using more directory lists to
process any subdirectories. This can be parallelised and works very
quickly using the least amount of memory.

However, some remotes have a way of listing all files beneath a
directory in one (or a small number) of transactions. These tend to
be the bucket based remotes (eg S3, B2, GCS, Swift, Hubic).

If you use the --fast-list flag then rclone will use this method for
listing directories. This will have the following consequences for
the listing:

  • It will use fewer transactions (important if you pay for them)
  • It will use more memory. Rclone has to load the whole listing into memory.
  • It may be faster because it uses fewer transactions
  • It may be slower because it can’t be parallelized

rclone should always give identical results with and without
--fast-list.

If you pay for transactions and can fit your entire sync listing into
memory then --fast-list is recommended. If you have a very big sync
to do then don’t use --fast-list otherwise you will run out of
memory.

If you use --fast-list on a remote which doesn’t support it, then
rclone will just ignore it.
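
For example (remote name and path are illustrative):

    rclone sync --fast-list /data remote:backup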

This doesn’t work for crypt with B2, does it? I tried just now with the latest beta and the transactions shot up, so I killed rclone.

I use it with crypt. It creates more transactions than the deprecated old sync method but not so many that it blows the daily cap.
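
For what it’s worth, my invocation looks roughly like this (names made up) - secret: is a crypt remote wrapping a B2 bucket:

    rclone sync --fast-list /data secret: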

You’re right. I let it run and it finished after 12 minutes with around 1000 transactions, which seems OK.

@ncw I am just replying to say thank you for writing the --fast-list feature. It appears to run quicker than without the flag on Backblaze B2 and uses far fewer of their Class C API calls. I am backing up quite a few directories (Seafile storage server) and this will save me some $$$!