Cache hashes generated on remote with hasher backend


I'd like to request a feature, or ask if this even makes sense.

I'm syncing files to a crypt backend on top of an sftp backend. This works great, except that in about 95% of my rclone sync runs, about 95% of the files are unchanged. Because rclone has to talk to the remote to check the files, though, a sync takes about 1 hour 45 minutes when checking hashes, and not much less with --size-only --ignore-checksum --no-traverse.

Since my limiting factor is bandwidth, and I trust this backend never to change unless I'm syncing to it, I would ideally like to cache every hash I can get my hands on, and never talk to the backend unless a check against the local cache shows that a file has changed.

Is this something within the scope of rclone?

I looked into the hasher backend, and I've tried it, but there are a few problems:

  • I can't seem to populate the cache for all files unless I download my entire repository - which will take a long time, given bandwidth restrictions.

  • Per the manpage,

       1. if requested hash is supported by lower level, just pass it.

    It looks like this backend doesn't even use the hash cache unless the requested hash isn't supported by the remote?

  • Again per the manpage, if it does cache:

    1. if unsupported and the size is big enough, build object fingerprint (including size, modtime if supported, first-found other hash if any).

    It looks like the cache is keyed on size and modtime, which is more information that has to be fetched from the remote. As far as I can tell (and from my tests), requesting and retrieving size and modtime still runs into the bandwidth limit.

Is the kind of local hash caching that I'm looking for something that's within scope for the hasher backend?

Is it supported today, and have I just misunderstood the docs or not looked in the right place?

Last, would this be useful for anyone else? Tyty.

You could try using a top-up sync strategy to speed up syncs enormously.

Say you run a top-up sync every hour; then you might do:

rclone copy --max-age 1h /path remote:

This only considers files that have changed within the last hour.

Then once a day (say) you run a full rclone sync which will sync deletions and anything missed.
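
To make the two schedules concrete, here is a minimal sketch of a wrapper script that could be called from cron. The SRC, DEST and RCLONE variables and the function names are placeholders of mine, not anything rclone defines; only the rclone copy --max-age and rclone sync commands come from the suggestion above.

```shell
#!/bin/sh
# Sketch of the top-up strategy. SRC, DEST and RCLONE are hypothetical
# placeholders; adjust them to your own setup.
SRC="${SRC:-/path}"
DEST="${DEST:-remote:}"
RCLONE="${RCLONE:-rclone}"

# Hourly job: copy only files modified within the window (default 1h).
# This never deletes anything on the remote.
topup() {
    "$RCLONE" copy --max-age "${1:-1h}" "$SRC" "$DEST"
}

# Daily job: full sync, which also handles deletions and anything missed.
full() {
    "$RCLONE" sync "$SRC" "$DEST"
}

# Dispatch on the first argument: "topup [window]" or "full".
case "${1:-}" in
    topup) topup "${2:-1h}" ;;
    full)  full ;;
esac
```

You would then run the script with the argument topup from an hourly cron entry and with full from a daily one. Picking a window slightly longer than the interval (say 2h for an hourly job) is one way to tolerate a missed run, at the cost of rechecking a few more files.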

That makes sense!

I think I'll have to go with this.

I regret not being able to sync deletions quickly, nor files that are freshly created with an old modification date (for example, untarred directories from an old tar file). However, it's definitely better than nothing - thanks!
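
For files that are freshly created with an old modification date, one possible workaround is to keep a small local manifest of size and modtime per file and copy only the paths whose entry is new or different since the last run. This is a sketch under assumptions, not something rclone does for you: it assumes GNU find (for -printf), file names without newlines, and the hypothetical SRC, DEST, STATE and RCLONE placeholders below. It relies on rclone's --files-from flag to restrict the copy to the listed paths. Deletions are still left to the daily full sync.

```shell
#!/bin/sh
# Sketch only: assumes GNU find (-printf) and file names without newlines.
# SRC, DEST, STATE and RCLONE are hypothetical placeholders.
SRC="${SRC:-/path}"
DEST="${DEST:-remote:}"
STATE="${STATE:-$HOME/.cache/topup-manifest}"
RCLONE="${RCLONE:-rclone}"

# Print one "size mtime relative/path" line per file, sorted for comm.
manifest() {
    find "$1" -type f -printf '%s %T@ %P\n' | LC_ALL=C sort
}

# Print the paths whose line appears only in the new manifest ($2),
# i.e. files that are new or whose size or modtime changed.
changed_paths() {
    LC_ALL=C comm -13 "$1" "$2" | cut -d' ' -f3-
}

# Copy only new or changed files, then keep the manifest for next time.
# The manifest is only replaced if the copy succeeds.
topup_from_manifest() {
    mkdir -p "$(dirname "$STATE")"
    [ -f "$STATE" ] || : > "$STATE"
    manifest "$SRC" > "$STATE.new"
    changed_paths "$STATE" "$STATE.new" > "$STATE.list"
    "$RCLONE" copy --files-from "$STATE.list" "$SRC" "$DEST" &&
        mv "$STATE.new" "$STATE"
}
```

Because a newly untarred file is a new path in the manifest, it gets picked up regardless of its modification date. On the first run the old manifest is empty, so every file is listed once.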