Could rclone support a local cache of checksums to accelerate local/remote comparison?

First of all, thanks for building this…it’s precisely the tool I’ve been looking for to back up my Linux machine to OneDrive!

I think a nice feature would be the ability to generate a cache of checksums for local files, which could be used to accelerate subsequent calls to rclone. This would make the following sequence of operations much faster by avoiding redundant checksum computations on the local side:

rclone check --checksum cloud:/path /local/path
[user thinks about why the files might be different, and decides what to do]
rclone copy some files
[repeat as necessary]
rclone check --checksum cloud:/path /local/path
[see where we stand now]

I’m thinking that this might be accomplished with a single command-line argument…something like --local-file-checksum-cache=path/to/database/file. If the file is missing, it’s created; in either case, it’s written out when rclone terminates. The checksum DB would cache each file’s size and mod-time along with its checksum, and when a checksum is needed for a local file, the cached one is used if the size and mod-time still match.
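To make the idea concrete, here’s a rough sketch in Go (rclone’s language) of the lookup logic described above. This is not rclone code; the JSON cache format, the file names, and all identifiers are invented for illustration. The key point is that a cached checksum is reused only when the file’s size and mod-time both still match, and that the cache is loaded at startup and written back out on exit:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// cacheEntry is one record in the hypothetical checksum DB:
// the file's size and mod-time, plus the checksum they validate.
type cacheEntry struct {
	Size    int64  `json:"size"`
	ModTime int64  `json:"modtime"` // Unix nanoseconds
	MD5     string `json:"md5"`
}

// checksumCache maps a local file path to its cached entry.
type checksumCache map[string]cacheEntry

// md5For returns the MD5 of path, reusing the cached value when the
// file's size and mod-time are unchanged, and recomputing it (and
// refreshing the cache entry) otherwise.
func (c checksumCache) md5For(path string) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	e, ok := c[path]
	if ok && e.Size == info.Size() && e.ModTime == info.ModTime().UnixNano() {
		return e.MD5, nil // cache hit: no need to read the file at all
	}
	// Cache miss (new file, or size/mod-time changed): hash the contents.
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	sum := hex.EncodeToString(h.Sum(nil))
	c[path] = cacheEntry{Size: info.Size(), ModTime: info.ModTime().UnixNano(), MD5: sum}
	return sum, nil
}

// loadCache reads the DB file if it exists; a missing file just means
// an empty cache, matching the "created if missing" behaviour above.
func loadCache(dbPath string) checksumCache {
	c := checksumCache{}
	if data, err := os.ReadFile(dbPath); err == nil {
		_ = json.Unmarshal(data, &c)
	}
	return c
}

// saveCache writes the DB back out, as would happen when rclone exits.
func saveCache(dbPath string, c checksumCache) error {
	data, err := json.MarshalIndent(c, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(dbPath, data, 0o644)
}

func main() {
	const dbPath = "checksum-cache.json" // stand-in for the flag's argument
	cache := loadCache(dbPath)
	sum, err := cache.md5For("somefile.bin")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	} else {
		fmt.Println(sum)
	}
	if err := saveCache(dbPath, cache); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

With something like this wired into the local backend behind the proposed flag, the first rclone check --checksum run would populate the cache, and later runs would only re-hash files whose size or mod-time had changed in between — which is exactly where the repeated check/copy/check workflow above spends its time.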


That is a good idea and clearly put.

I think there are some issues on GitHub with similar ideas, but I like the idea of making this a local file system option only - that simplifies things enormously.

Do you fancy making an option for this on GitHub and seeing if you can collect up links to all the other issues with similar ideas?


Great idea, and it would speed things up a lot.


Sure, happy to help. But I’m not sure exactly what you mean by “making an option for this on GitHub”…


D*** you autocorrect :wink: I meant…

Do you fancy making an issue for this on GitHub and seeing if you can collect up links to all the other issues with similar ideas?


I just want to chime in and say how useful this would be. I use rclone to back up a 2 TB drive; even if only one byte on that drive has changed, the backup takes over three days. It runs on a 2007 MacBook that works fine as a backup server, but calculating checksums over that much data is slow. Various other backups I run would probably also become about ten times faster.

Yes. I did this search:

is:issue is:open checksum

Reading through the results, I found these items of interest:



Those issues seem related, and I think the design jediry suggested would work well to address what has been discussed previously. There was some discussion of creating individual MD5 files, but it seems much better to put all the MD5 data in one file.

The underlying issue is that Amazon does not support modification times, which means we have to rely on checksums. But repeatedly redoing that local checksum work on large, mostly static files seems like a big waste of time. It should be easy enough to store the checksums locally for future use, as jediry suggested.

OK. I just logged issue https://github.com/ncw/rclone/issues/949
