Compare remote against the hasher cache (not the checksums file)?

What is the problem you are having with rclone?

I'd like to do the following (one or both):

  1. compare the remote against the hasher cache
  2. compare the checksums file against the hasher cache
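For reference, the comparison being asked for is read-only: two path-to-hash maps are diffed and nothing is written back. Below is a minimal Python sketch, not an rclone feature. It assumes both sides are available as md5sum-style text (`<hash>  <path>` per line, the format `rclone hashsum` writes). How to export the hasher cache into that form is left open, and `parse_sum` and `compare_sums` are hypothetical helper names.

```python
def parse_sum(text: str) -> dict[str, str]:
    """Parse md5sum-style lines ("<hash>  <path>") into {path: hash}."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, sep, path = line.partition("  ")
        if not sep:
            raise ValueError(f"not a SUM line: {line!r}")
        sums[path] = digest.lower()
    return sums


def compare_sums(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Read-only comparison of two {path: hash} maps; neither map is modified."""
    return {
        "differ": sorted(p for p in expected.keys() & actual.keys() if expected[p] != actual[p]),
        "missing_on_actual": sorted(expected.keys() - actual.keys()),
        "missing_on_expected": sorted(actual.keys() - expected.keys()),
    }


if __name__ == "__main__":
    # Hashes taken from the fulldump output later in this thread.
    checksums_txt = "d41d8cd98f00b204e9800998ecf8427e  aaa.txt\n"
    cache_snapshot = "594f803b380a41396ed63dca39503542  aaa.txt\n"
    print(compare_sums(parse_sum(checksums_txt), parse_sum(cache_snapshot)))
```

In this design, a changed file is reported as a difference, and neither input is refreshed.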

I've tried running the command:

rclone checksum md5 "checksums.txt" hasher_main: --differ "DATA/diff_modified.txt"

However, it does the following instead:

  1. it updates the hasher cache from the remote (not from the checksums file)
  2. it compares the remote (not the checksums file) against the checksums file (not the hasher cache)

I'm not sure whether this is the intended behavior or a bug. It seems strange to me for the following reasons:

  • #1 should be the responsibility of 'rclone hashsum' rather than 'rclone checksum' (at least, that is how they behave without the hasher), so this may be a bug. In any case, #1 can already be achieved by running: rclone hashsum md5 hasher_01: > NUL

  • #2 can already be achieved without the hasher by running: rclone checksum md5 "checksums.txt" "C:\Sync" --progress --differ "DATA/diff_modified.txt"

Run the command 'rclone version' and share the full output of the command.

rclone v1.66.0
- os/version: Microsoft Windows 10 Pro 22H2 (64 bit)
- os/kernel: 10.0.19045.3693 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.22.1
- go/linking: static
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

none

The command you were trying to run (eg rclone copy /tmp remote:tmp)

<create "aaa.txt" file on remote>
rclone hashsum md5 "C:\Users\zac\Desktop\Sync" --output-file "DATA/checksums.txt"
rclone backend import hasher_main: md5 "DATA/checksums.txt" -vv --log-file "DATA/log.txt"
rclone backend fulldump hasher_main:
<modify "aaa.txt" file on remote>
rclone checksum md5 "DATA/checksums.txt" hasher_main: --progress --differ "DATA/diff_modified.txt" -vv --log-file "DATA/log.txt"
rclone backend fulldump hasher_main:

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[hasher_main]
type = hasher
remote = C:\Users\zac\Desktop\Sync
hashes = md5
max_age = off

A log from the command that you were trying to run with the -vv flag

2024/03/26 22:58:18 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "backend" "import" "hasher_main:" "md5" "DATA/checksums.txt" "-vv" "--log-file" "DATA/log.txt"]
2024/03/26 22:58:18 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/26 22:58:18 INFO : Hasher is EXPERIMENTAL!
2024/03/26 22:58:18 DEBUG : Creating backend with remote "C:/Users/zac/Desktop/Sync"
2024/03/26 22:58:18 DEBUG : fs cache: renaming cache item "C:/Users/zac/Desktop/Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/26 22:58:18 DEBUG : hasher::hasher_main:: Groups by usage: cached [md5], passed [], auto [], slow [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor], supported [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor]
2024/03/26 22:58:18 DEBUG : Creating backend with remote "DATA/checksums.txt"
2024/03/26 22:58:18 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2024/03/26 22:58:18 DEBUG : local~hasher.bolt: Opened for writing in 2.9331ms
2024/03/26 22:58:18 INFO : Summary: 1 imported, 0 skipped
2024/03/26 22:58:18 DEBUG : 3 go routines active
2024/03/26 22:58:32 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "checksum" "md5" "DATA/checksums.txt" "hasher_main:" "--progress" "--differ" "DATA/diff_modified.txt" "-vv" "--log-file" "DATA/log.txt"]
2024/03/26 22:58:32 DEBUG : Creating backend with remote "DATA/checksums.txt"
2024/03/26 22:58:32 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/26 22:58:32 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2024/03/26 22:58:32 DEBUG : Creating backend with remote "hasher_main:"
2024/03/26 22:58:32 INFO : Hasher is EXPERIMENTAL!
2024/03/26 22:58:32 DEBUG : Creating backend with remote "C:/Users/zac/Desktop/Sync"
2024/03/26 22:58:32 DEBUG : fs cache: renaming cache item "C:/Users/zac/Desktop/Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/26 22:58:32 DEBUG : hasher::hasher_main:: Groups by usage: cached [md5], passed [], auto [], slow [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor], supported [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor]
2024/03/26 22:58:32 DEBUG : local~hasher.bolt: Opened for reading in 0s
2024/03/26 22:58:32 DEBUG : aaa.txt: getHash: fingerprint changed
2024/03/26 22:58:32 DEBUG : aaa.txt: slow md5
2024/03/26 22:58:32 DEBUG : local~hasher.bolt: released
2024/03/26 22:58:32 DEBUG : local~hasher.bolt: Opened for writing in 0s
2024/03/26 22:58:32 DEBUG : md5 = d41d8cd98f00b204e9800998ecf8427e (sum)
2024/03/26 22:58:32 DEBUG : aaa.txt: md5 = 594f803b380a41396ed63dca39503542 (hasher::hasher_main:)
2024/03/26 22:58:32 ERROR : aaa.txt: files differ
2024/03/26 22:58:32 NOTICE: hasher::hasher_main:: 1 differences found
2024/03/26 22:58:32 NOTICE: hasher::hasher_main:: 1 errors while checking
2024/03/26 22:58:32 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Errors: 1 (retrying may help)
Checks: 1 / 1, 100%
Elapsed time: 0.1s
2024/03/26 22:58:32 DEBUG : 3 go routines active
2024/03/26 22:58:32 DEBUG : local~hasher.bolt: released
2024/03/26 22:58:32 Failed to checksum: 1 differences found

I think what you want in this case is rclone check (not rclone checksum). For example:

rclone check C:\Users\zac\Desktop\Sync hasher_main: --differ "DATA/diff_modified.txt"

I've just tried both:

rclone check "C:\Users\zac\Desktop\Sync" hasher_01: --differ "DATA/diff_modified.txt"
rclone check hasher_01: "C:\Users\zac\Desktop\Sync" --differ "DATA/diff_modified.txt"

But unfortunately all they do is update the hasher cache. The output shows "0 differences found ... 1 matching files" and the output diff file is empty.

It appears that both 'rclone checksum' and 'rclone check' update the hasher cache... Is this the expected behavior?

If they are "slow hashes" (such as on local) and they're already cached in the database, and you're not using auto_size or --download, rclone check should be using the cached hash from the database instead of recalculating it.
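The conditions in the paragraph above can be read as a small decision table. The Python sketch below is a simplified model of where `rclone check` would get a hash when one side is a hasher remote, not rclone's actual Go code. Two branches are assumptions drawn from elsewhere: the fingerprint branch reflects the lookup steps quoted later in this thread, and the `--download` branch reflects `rclone check --download` comparing file contents rather than hashes. The names `hash_source` and `HashSource` are hypothetical.

```python
from enum import Enum


class HashSource(Enum):
    NO_HASH = "file contents compared directly (--download)"
    PASSED_THROUGH = "underlying remote supports the hash natively"
    ON_THE_FLY = "downloaded and hashed on the fly (below auto_size)"
    CACHE = "hasher database"
    RECOMPUTED = "recomputed, then stored in the database"


def hash_source(download: bool, hash_is_slow: bool, below_auto_size: bool,
                cached: bool, fingerprint_matches: bool) -> HashSource:
    """Simplified model: where does a hash used by `rclone check` come from?"""
    if download:
        return HashSource.NO_HASH  # contents are compared, hashes are not needed
    if not hash_is_slow:
        return HashSource.PASSED_THROUGH
    if below_auto_size:
        return HashSource.ON_THE_FLY
    if cached and fingerprint_matches:
        return HashSource.CACHE  # the case described in the paragraph above
    return HashSource.RECOMPUTED


if __name__ == "__main__":
    # The run in this thread: md5 is slow on local, auto_size unset, entry cached,
    # but aaa.txt was modified, so its fingerprint no longer matches.
    print(hash_source(False, True, False, True, False))
```

Under this model, the "fingerprint changed" and "slow md5" lines in the logs correspond to the `RECOMPUTED` branch.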

If that's not happening, a debug log would be helpful! (The one in your previous post shows rclone checksum, not rclone check.)

This just means there was 1 file found and the hash was the same on both sides. It doesn't necessarily indicate where that hash came from (cache vs. not). Or is your point that you know for sure that the hashes should not match?

Yes, exactly, because I simulated the following situation:

  1. create a file called "aaa.txt"
  2. execute 'rclone hashsum' to create a SUM file
  3. import the SUM file into the hasher cache
  4. use 'fulldump' to check the checksum of "aaa.txt"
  5. make a change in the "aaa.txt" file
  6. execute 'rclone check' as suggested
  7. use 'fulldump' to check the checksum of "aaa.txt" (the checksum is now different)
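The first fulldump hash, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of zero bytes. This suggests aaa.txt was empty when the SUM file was created. The minimal Python sketch below reproduces steps 1, 2 and 5 on a temporary local file using only `hashlib`. It does not call rclone. It shows that the hash recorded in the SUM file becomes stale as soon as the content changes. The helpers `md5_of` and `simulate` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hex MD5 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def simulate(new_content: bytes) -> tuple[str, str]:
    """Steps 1, 2 and 5: create an empty file, record its hash, modify it, hash it again."""
    with tempfile.TemporaryDirectory() as d:
        f = Path(d) / "aaa.txt"
        f.write_bytes(b"")          # step 1: create an (empty) aaa.txt
        before = md5_of(f)          # step 2: the hash the SUM file records
        f.write_bytes(new_content)  # step 5: modify the file
        after = md5_of(f)           # a fresh hash of the modified file
    return before, after


if __name__ == "__main__":
    before, after = simulate(b"hello\n")
    print(before)           # d41d8cd98f00b204e9800998ecf8427e
    print(before != after)  # True
```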

Running the following commands:

<create "aaa.txt" file on remote>
rclone hashsum md5 "C:\Users\zac\Desktop\Sync" --output-file "DATA/checksums.txt" -vv --log-file "DATA/log.txt"
rclone backend import hasher_main: md5 "DATA/checksums.txt" -vv --log-file "DATA/log.txt"
rclone backend fulldump hasher_main: -vv --log-file "DATA/log.txt"
<modify "aaa.txt" file on remote>
rclone check "C:\Users\zac\Desktop\Sync" hasher_main: --differ "DATA/diff_modified.txt" -vv --log-file "DATA/log.txt"
rclone backend fulldump hasher_main: -vv --log-file "DATA/log.txt"

Gives the following log:

2024/03/27 01:20:45 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "hashsum" "md5" "C:\\Users\\zac\\Desktop\\Sync" "--output-file" "DATA/checksums.txt" "-vv" "--log-file" "DATA/log.txt"]
2024/03/27 01:20:45 DEBUG : Creating backend with remote "C:\\Users\\zac\\Desktop\\Sync"
2024/03/27 01:20:45 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/27 01:20:45 DEBUG : fs cache: renaming cache item "C:\\Users\\zac\\Desktop\\Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:20:45 DEBUG : 2 go routines active

2024/03/27 01:20:49 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "backend" "import" "hasher_main:" "md5" "DATA/checksums.txt" "-vv" "--log-file" "DATA/log.txt"]
2024/03/27 01:20:49 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/27 01:20:49 INFO  : Hasher is EXPERIMENTAL!
2024/03/27 01:20:49 DEBUG : Creating backend with remote "C:/Users/zac/Desktop/Sync"
2024/03/27 01:20:49 DEBUG : fs cache: renaming cache item "C:/Users/zac/Desktop/Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:20:49 DEBUG : hasher::hasher_main:: Groups by usage: cached [md5], passed [], auto [], slow [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor], supported [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor]
2024/03/27 01:20:49 DEBUG : Creating backend with remote "DATA/checksums.txt"
2024/03/27 01:20:49 DEBUG : fs cache: adding new entry for parent of "DATA/checksums.txt", "//?/C:/Users/zac/Desktop/rclone/DATA"
2024/03/27 01:20:49 DEBUG : local~hasher.bolt: Opened for writing in 1.6791ms
2024/03/27 01:20:49 INFO  : Summary: 1 imported, 0 skipped
2024/03/27 01:20:49 DEBUG : 3 go routines active

2024/03/27 01:20:53 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "backend" "fulldump" "hasher_main:" "-vv" "--log-file" "DATA/log.txt"]
2024/03/27 01:20:54 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/27 01:20:54 INFO  : Hasher is EXPERIMENTAL!
2024/03/27 01:20:54 DEBUG : Creating backend with remote "C:/Users/zac/Desktop/Sync"
2024/03/27 01:20:54 DEBUG : fs cache: renaming cache item "C:/Users/zac/Desktop/Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:20:54 DEBUG : hasher::hasher_main:: Groups by usage: cached [md5], passed [], auto [], slow [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor], supported [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor]
2024/03/27 01:20:54 DEBUG : local~hasher.bolt: Opened for reading in 0s
2024/03/27 01:20:54 DEBUG : Creating backend with remote "C:\\Users\\zac\\Desktop\\Sync"
2024/03/27 01:20:54 DEBUG : fs cache: renaming cache item "C:\\Users\\zac\\Desktop\\Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:20:54 INFO  : C:\Users\zac\AppData\Local\rclone\kv\local~hasher.bolt: 0 records out of 1
2024/03/27 01:20:54 DEBUG : 3 go routines active

2024/03/27 01:21:06 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "check" "C:\\Users\\zac\\Desktop\\Sync" "hasher_main:" "--differ" "DATA/diff_modified.txt" "-vv" "--log-file" "DATA/log.txt"]
2024/03/27 01:21:06 DEBUG : Creating backend with remote "C:\\Users\\zac\\Desktop\\Sync"
2024/03/27 01:21:06 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/27 01:21:06 DEBUG : fs cache: renaming cache item "C:\\Users\\zac\\Desktop\\Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:21:06 DEBUG : Creating backend with remote "hasher_main:"
2024/03/27 01:21:06 INFO  : Hasher is EXPERIMENTAL!
2024/03/27 01:21:06 DEBUG : Creating backend with remote "C:/Users/zac/Desktop/Sync"
2024/03/27 01:21:06 DEBUG : fs cache: renaming cache item "C:/Users/zac/Desktop/Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:21:06 DEBUG : hasher::hasher_main:: Groups by usage: cached [md5], passed [], auto [], slow [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor], supported [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor]
2024/03/27 01:21:06 DEBUG : local~hasher.bolt: Opened for reading in 769.1µs
2024/03/27 01:21:06 INFO  : Using md5 for hash comparisons
2024/03/27 01:21:06 DEBUG : hasher::hasher_main:: Waiting for checks to finish
2024/03/27 01:21:06 DEBUG : aaa.txt: getHash: fingerprint changed
2024/03/27 01:21:06 DEBUG : aaa.txt: slow md5
2024/03/27 01:21:06 DEBUG : local~hasher.bolt: released
2024/03/27 01:21:06 DEBUG : local~hasher.bolt: Opened for writing in 0s
2024/03/27 01:21:06 DEBUG : aaa.txt: md5 = 594f803b380a41396ed63dca39503542 OK
2024/03/27 01:21:06 DEBUG : aaa.txt: OK
2024/03/27 01:21:06 NOTICE: hasher::hasher_main:: 0 differences found
2024/03/27 01:21:06 NOTICE: hasher::hasher_main:: 1 matching files
2024/03/27 01:21:06 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Elapsed time:         0.1s

2024/03/27 01:21:06 DEBUG : 4 go routines active
2024/03/27 01:21:06 DEBUG : local~hasher.bolt: released

2024/03/27 01:21:12 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "backend" "fulldump" "hasher_main:" "-vv" "--log-file" "DATA/log.txt"]
2024/03/27 01:21:12 DEBUG : Using config file from "C:\\Users\\zac\\AppData\\Roaming\\rclone\\rclone.conf"
2024/03/27 01:21:12 INFO  : Hasher is EXPERIMENTAL!
2024/03/27 01:21:12 DEBUG : Creating backend with remote "C:/Users/zac/Desktop/Sync"
2024/03/27 01:21:12 DEBUG : fs cache: renaming cache item "C:/Users/zac/Desktop/Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:21:12 DEBUG : hasher::hasher_main:: Groups by usage: cached [md5], passed [], auto [], slow [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor], supported [md5, sha1, whirlpool, crc32, sha256, dropbox, hidrive, mailru, quickxor]
2024/03/27 01:21:12 DEBUG : local~hasher.bolt: Opened for reading in 0s
2024/03/27 01:21:12 DEBUG : Creating backend with remote "C:\\Users\\zac\\Desktop\\Sync"
2024/03/27 01:21:12 DEBUG : fs cache: renaming cache item "C:\\Users\\zac\\Desktop\\Sync" to be canonical "//?/C:/Users/zac/Desktop/Sync"
2024/03/27 01:21:12 INFO  : C:\Users\zac\AppData\Local\rclone\kv\local~hasher.bolt: 0 records out of 1
2024/03/27 01:21:12 DEBUG : 3 go routines active

It doesn't appear in the log, but the first time 'fulldump' is run it prints:

ext md5:d41d8cd98f00b204e9800998ecf8427e        4s /?/C:/Users/zac/Desktop/Sync/aaa.txt

While the second time 'fulldump' is run it prints:

ext md5:594f803b380a41396ed63dca39503542        5s /?/C:/Users/zac/Desktop/Sync/aaa.txt

I see. So this line is telling:

2024/03/27 01:21:06 DEBUG : aaa.txt: getHash: fingerprint changed

It corresponds to step #5 of the hash lookup procedure in the hasher documentation:

5. if remote found but fingerprint mismatched, then purge the entry and proceed to step 6.

Basically what is happening is that hasher has detected your change by noticing that either the size or modtime has changed (or both), and because of this it no longer trusts the hash it has cached. In most cases this is what you want, because if the file changed then the old hash is no longer correct, and using an incorrect hash isn't helpful.
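The purge-and-recompute behavior can be modeled in a few lines. The Python sketch below is a toy model, not hasher's implementation. It assumes a fingerprint of just (size, modtime), whereas the real fingerprint may include more, and it stores entries in a plain dict instead of the bolt database. `ToyHashCache` is a hypothetical name. An unchanged file is served from the cache; a changed one is re-hashed and the new hash replaces the old entry, which is exactly the cache update seen in the second fulldump.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class ToyHashCache:
    """Toy model of steps 4-6: a cached hash is trusted only while the fingerprint matches."""
    entries: dict = field(default_factory=dict)  # path -> (fingerprint, md5)
    recomputed: int = 0  # how many times the slow path ran

    @staticmethod
    def fingerprint(path: str) -> tuple[int, int]:
        st = os.stat(path)
        return (st.st_size, st.st_mtime_ns)  # assumed fingerprint: size + modtime

    def md5(self, path: str) -> str:
        fp = self.fingerprint(path)
        hit = self.entries.get(path)
        if hit is not None and hit[0] == fp:
            return hit[1]  # step 4: strict match, return the stored hash
        self.entries.pop(path, None)  # step 5: purge a stale entry (no-op if absent)
        with open(path, "rb") as f:  # step 6: hash the file and store the result
            digest = hashlib.md5(f.read()).hexdigest()
        self.entries[path] = (fp, digest)
        self.recomputed += 1
        return digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "aaa.txt")
        open(p, "wb").close()  # empty file, like the original aaa.txt
        cache = ToyHashCache()
        cache.md5(p)
        cache.md5(p)  # unchanged file: served from the cache
        print(cache.recomputed)  # 1
        with open(p, "wb") as f:
            f.write(b"changed")  # size changes, so the fingerprint no longer matches
        cache.md5(p)
        print(cache.recomputed)  # 2
```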

I think that if you had run rclone check without first changing the file, it would have given you the cached hash. In other words, it still helps speed things up by avoiding the slow hash recalculation process when it has no reason to think that anything has changed.

If you really want to prove this definitively, you could probably "fool" hasher by changing the file in such a way that the file size does not change, and then retroactively rclone touch the modtime (on the remote directly -- not through hasher) back to the prior value. In other words, make a change that should change the file's hash without changing its "fingerprint", and see if hasher notices the change.
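The suggestion above is to edit the file and then `rclone touch` it back on the underlying remote. As a local-only illustration, the Python sketch below uses plain `os.stat`/`os.utime` as a stand-in for `rclone touch`. It makes a same-size edit and restores the original timestamps, so a size-plus-modtime fingerprint stays identical while the MD5 changes. Whether hasher then returns the stale cached hash is what the experiment would test; this code does not show that. `same_size_edit_keep_mtime` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def same_size_edit_keep_mtime(path: str, new_content: bytes) -> None:
    """Overwrite `path` with same-length content, then restore the old timestamps."""
    st = os.stat(path)
    if len(new_content) != st.st_size:
        raise ValueError("new content must keep the file size unchanged")
    with open(path, "wb") as f:
        f.write(new_content)
    # Local stand-in for `rclone touch` on the underlying remote: put atime/mtime back.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "aaa.txt")
        with open(p, "wb") as f:
            f.write(b"abc")
        before = os.stat(p)
        same_size_edit_keep_mtime(p, b"xyz")
        after = os.stat(p)
        # Same (size, mtime) fingerprint, different content hash.
        print((before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns))
        print(hashlib.md5(b"abc").hexdigest() != hashlib.md5(b"xyz").hexdigest())
```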

I guess I just don't understand the hasher's purpose well enough, so when I use the "regular commands" to interact with it, the behavior seems strange to me.
I was thinking of the cache as a "more advanced SUM file" and was expecting "read commands" like 'check' and 'checksum' not to update anything.
Anyway, I appreciate you taking the time to explain and help 🙂
