Sync to remote only, can I cache remote file metadata only?

Hi,

I'm new to rclone. I've searched the forum and google, hoping somebody can help.

I've set up a mail server that saves each email to a file, and I want an offsite backup of the folders and email files. I have a Backblaze account and just set up another bucket on that.

So far, so good. I plan to run the sync between the email folder and the Backblaze bucket periodically (say hourly for now), but looking at the verbose output, rclone requests a list of the files on the server, which returns a lot more info than just path/file, size, and modified time, for all files.

The issue I know I'm going to run into is the sheer volume of files. I have many email accounts I am going to migrate to this server, some of which I've had for over a decade. There are easily 200,000 emails (and therefore files) which I will want to keep in sync, and listing those 200k files every hour is going to be inefficient, and will only get worse.

As this is a one-way sync (I will never touch the files in the b2 bucket, and if I did, I accept it's my own fault if something is missed!), is there a way to configure a local cache of the file info already in the bucket?

By cache, I just mean metadata, without actually having a cache of the files themselves; otherwise I would have the files, the cache of the files, and the remote files. Not what I am looking for!

I saw there were --cache-remote and also --cache-db-path, but I suspect those are part of file caching, not metadata caching?

Can anybody throw any light on whether what I am trying to achieve is possible?

Many thanks!

You want the --no-traverse flag for copying a small number of files to a place where there are many files. I think that should fix your problem.
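Something along these lines, for example (the local path and remote name here are just placeholders for whatever your setup uses):

rclone copy /path/to/mailstore b2remote:mail-backup --no-traverse -v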

If I specify --no-traverse, how will it know which of the local 200k files to copy to the destination? There will be several hundred new files per day...?

Thanks

I'm not sure that fixes this problem either (and I'm loath to disagree with Nick on anything in rclone, lol).

In a (full) sync you will inevitably have to compare all files on both sides, since any of them could potentially need processing. --no-traverse just checks files one by one rather than listing a whole directory at a time. That's much more efficient if you only need to transfer a handful of files into a huge archive, but if it has to check every file in a directory anyway, it would be (very) counter-productive I think.

--no-traverse would be great if you only considered the new files though, as it would only need to query the server for files with the same name/location as the ones you were trying to transfer.

Perhaps the best solution here is to use filtering.
If you are syncing hourly... try adding this:
--max-age 2h --no-traverse
This should make your sync only consider files newer than 2 hours - catching everything that is relevant but still keeping the comparisons you need to do pretty limited - and thus fast.
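Put together, the hourly run might look something like this - the local path and remote/bucket names are just placeholders, and I'd use copy rather than sync for a one-way push like this:

rclone copy /path/to/mailstore b2remote:mail-backup --max-age 2h --no-traverse -v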

Every 24 hours, or every 7 days, you might want to schedule a full sync - just in case anything was missed (the power might go out at some point, or the server might need maintenance).
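For example, something like this in your crontab (paths, remote name and times are purely illustrative):

# hourly: only consider files modified in the last 2 hours
0 * * * * rclone copy /path/to/mailstore b2remote:mail-backup --max-age 2h --no-traverse
# weekly full pass on Sunday at 03:00 to catch anything the hourly runs missed
0 3 * * 0 rclone copy /path/to/mailstore b2remote:mail-backup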

But also - there are ways to cache, for sure. If your cloud service supported polling for changes, that would actually be fairly easy to achieve, but I'm not sure Backblaze does? - and if not, I think the suggestion above is probably more practical.

Ah yes, that makes more sense now! Filtering by some timeframe would definitely work in this scenario.

Many thanks for taking the time to expand.

I was contemplating a local rsync based on timeframe to do something similar (rather than comparing all files), but having the facility built into rclone is even better!

There are a lot of neat filtering functions, so just go have a look through the docs. You never know when you might need to filter something else in the future, and knowing what is possible is a good idea :wink:

Also, I was doing a sync when I should have been doing a copy (which makes sense, as it's one-way):

ERROR : Ignoring --no-traverse with sync
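So the hourly command now looks something like this (with placeholder path and remote name - yours will differ):

rclone copy /path/to/mailstore b2remote:mail-backup --max-age 2h --no-traverse -v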

Sorted, many thanks.

Yes, copy would make sense here. Copy will by default just skip any identical files anyway.