Hasher: import vs stickyimport (and bonus question)

What is the problem you are having with rclone?

Not a problem (yet). I am trying to understand conceptually what the difference is between import and stickyimport in the new hasher backend.

I am just experimenting right now, and I am not seeing stickyimport do anything, while import works fine. Before I call this an issue, I want to understand the difference.

When would you use one over the other? What do you lose by using sticky (assuming I can get it to work and this is just user error)? Are there gotchas?

(And eventually, could I help clarify the docs too?)

What is your rclone version (output from rclone version)

rclone v1.57.0
- os/version: darwin 10.15.7 (64 bit)
- os/kernel: 19.6.0 (x86_64)
- os/type: darwin
- os/arch: amd64
- go/version: go1.17.2
- go/linking: dynamic
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

Local, hasher, crypt

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Kind of N/A, but:

$ rclone backend stickyimport crypt: SHA1 sha1.SUM -vv

and

$ rclone backend import crypt: SHA1 sha1.SUM -vv

The rclone config contents with secrets removed.

This is just testing. The passwords are not sensitive.

[crypt_sub]
type = crypt
remote = /Users/<REDACTED>/test/crypt
filename_encryption = off
directory_name_encryption = false
password = MlM19IWiaFHY3-FZ9zmwq9oQoA
password2 = mh2a05_ltWbFVHITTHC59Mxj3w

[crypt]
type = hasher
remote = crypt_sub:

A log from the command with the -vv flag

$ rclone backend stickyimport crypt: SHA1 sha1.SUM -vv

2021/11/29 11:37:12 DEBUG : Setting --config "/Users/<REDACTED>/test/rclone.conf" from environment variable RCLONE_CONFIG="/Users/<REDACTED>/test/rclone.conf"
2021/11/29 11:37:12 DEBUG : Setting --cache-dir "/Users/<REDACTED>/test/cache" from environment variable RCLONE_CACHE_DIR="/Users/<REDACTED>/test/cache"
2021/11/29 11:37:12 DEBUG : Setting --temp-dir "/Users/<REDACTED>/test/tmp" from environment variable RCLONE_TEMP_DIR="/Users/<REDACTED>/test/tmp"
2021/11/29 11:37:12 DEBUG : Setting --password-command "rclone-pass-store echo" from environment variable RCLONE_PASSWORD_COMMAND="rclone-pass-store echo"
2021/11/29 11:37:12 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "backend" "stickyimport" "crypt:" "SHA1" "sha1.SUM" "-vv"]
2021/11/29 11:37:12 DEBUG : Using config file from "/Users/<REDACTED>/test/rclone.conf"
2021/11/29 11:37:12 INFO  : Hasher is EXPERIMENTAL!
2021/11/29 11:37:12 DEBUG : Creating backend with remote "crypt_sub:"
2021/11/29 11:37:12 DEBUG : Creating backend with remote "/Users/<REDACTED>/test/crypt"
2021/11/29 11:37:12 DEBUG : hasher::crypt:: Groups by usage: cached [md5, sha1], passed [], auto [md5, sha1], slow [], supported [md5, sha1]
2021/11/29 11:37:12 DEBUG : crypt_sub~hasher.bolt: Opened for reading in 85.157µs
2021/11/29 11:37:12 DEBUG : Creating backend with remote "sha1.SUM"
2021/11/29 11:37:12 DEBUG : fs cache: adding new entry for parent of "sha1.SUM", "/Users/<REDACTED>/test"
2021/11/29 11:37:12 DEBUG : crypt_sub~hasher.bolt: released
2021/11/29 11:37:12 DEBUG : crypt_sub~hasher.bolt: Opened for writing in 79.489µs
2021/11/29 11:37:13 DEBUG : crypt_sub~hasher.bolt: released
2021/11/29 11:37:13 DEBUG : crypt_sub~hasher.bolt: Opened for writing in 180.599µs
2021/11/29 11:37:14 INFO  : Summary: 41 checksum(s) imported
2021/11/29 11:37:14 DEBUG : 5 go routines active

and

$ rclone backend import crypt: SHA1 sha1.SUM -vv

2021/11/29 11:37:47 DEBUG : Setting --config "/Users/<REDACTED>/test/rclone.conf" from environment variable RCLONE_CONFIG="/Users/<REDACTED>/test/rclone.conf"
2021/11/29 11:37:47 DEBUG : Setting --cache-dir "/Users/<REDACTED>/test/cache" from environment variable RCLONE_CACHE_DIR="/Users/<REDACTED>/test/cache"
2021/11/29 11:37:47 DEBUG : Setting --temp-dir "/Users/<REDACTED>/test/tmp" from environment variable RCLONE_TEMP_DIR="/Users/<REDACTED>/test/tmp"
2021/11/29 11:37:47 DEBUG : Setting --password-command "rclone-pass-store echo" from environment variable RCLONE_PASSWORD_COMMAND="rclone-pass-store echo"
2021/11/29 11:37:47 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "backend" "import" "crypt:" "SHA1" "sha1.SUM" "-vv"]
2021/11/29 11:37:47 DEBUG : Using config file from "/Users/<REDACTED>/test/rclone.conf"
2021/11/29 11:37:47 INFO  : Hasher is EXPERIMENTAL!
2021/11/29 11:37:47 DEBUG : Creating backend with remote "crypt_sub:"
2021/11/29 11:37:47 DEBUG : Creating backend with remote "/Users/<REDACTED>/test/crypt"
2021/11/29 11:37:47 DEBUG : hasher::crypt:: Groups by usage: cached [md5, sha1], passed [], auto [md5, sha1], slow [], supported [md5, sha1]
2021/11/29 11:37:47 DEBUG : crypt_sub~hasher.bolt: Opened for reading in 88.554µs
2021/11/29 11:37:47 DEBUG : Creating backend with remote "sha1.SUM"
2021/11/29 11:37:47 DEBUG : fs cache: adding new entry for parent of "sha1.SUM", "/Users/<REDACTED>/test"
2021/11/29 11:37:47 DEBUG : crypt_sub~hasher.bolt: released
2021/11/29 11:37:47 DEBUG : crypt_sub~hasher.bolt: Opened for writing in 85.739µs
2021/11/29 11:37:49 DEBUG : crypt_sub~hasher.bolt: released
2021/11/29 11:37:49 DEBUG : crypt_sub~hasher.bolt: Opened for writing in 185.512µs
2021/11/29 11:37:49 INFO  : Summary: 41 imported, 0 skipped
2021/11/29 11:37:49 DEBUG : 5 go routines active

Bonus Questions

  • I saw --checkers is an option. I assume the global --fast-list flag works too?
  • Will uploading files via mount add a hash? Does the mount setting (i.e. --vfs-cache-mode) change that?
  • You say this is "one time", but I can run this anytime I want, right? Let's say I want to purge my cache dirs. Can I just hashsum to a file and reload from that (or use lsjson --hash and turn the output into a SUM file)?
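For the last bonus question, here is a minimal sketch of turning a recursive listing into a SUM file. It assumes the `rclone lsjson -R --hash` output is a JSON array whose entries carry `Path`, `IsDir` and a `Hashes` map keyed by lowercase hash name (check this against your version), and it writes the two-space "hash  path" layout that sha1sum-style tools produce. The function name is hypothetical.

```python
import json
from typing import List


def lsjson_to_sum(lsjson_text: str, hash_name: str = "sha1") -> List[str]:
    """Convert `rclone lsjson -R --hash` output into sha1sum-style lines.

    Assumes each entry has "Path", "IsDir" and a "Hashes" map keyed by
    lowercase hash name; entries without that hash are skipped.
    """
    lines = []
    for item in json.loads(lsjson_text):
        if item.get("IsDir"):
            continue  # directories have no checksum
        digest = (item.get("Hashes") or {}).get(hash_name)
        if digest:
            # Two spaces between hash and path, as in sha1sum-style output
            lines.append(f"{digest}  {item['Path']}")
    return lines
```

You would feed it the saved output of something like `rclone lsjson -R --hash crypt:` and write the returned lines to `sha1.SUM`. Files the listing reports without a SHA-1 are silently dropped, so compare the line count with the file count before importing.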

Pinging @ivandeex, who I think wrote this remote.

tl;dr import is for normal storage, stickyimport is for archives.

Every cache item can be labeled with the size/modtime of the file. When the cached checksum is requested, the labels are compared against the remote. If hasher finds changes, it invalidates the stale checksum and uses a fresh one if possible.
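A toy model of that label check may help. This is not hasher's actual code (which is Go); `CacheEntry` and `cached_checksum` are hypothetical names, and returning `None` stands for "compute a fresh checksum instead".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    checksum: str
    size: int       # label: file size when the checksum was stored
    modtime: float  # label: modification time when the checksum was stored


def cached_checksum(cache: Dict[str, CacheEntry], path: str,
                    size: int, modtime: float) -> Optional[str]:
    """Return the cached checksum if its labels still match the remote file.

    `size` and `modtime` are what the remote reports now. None means there
    is no usable entry and the caller has to compute a fresh checksum.
    """
    entry = cache.get(path)
    if entry is None:
        return None
    if (entry.size, entry.modtime) == (size, modtime):
        return entry.checksum
    del cache[path]  # labels differ: the file changed, so drop the stale checksum
    return None
```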

import quickly parses the sum file, then slowly scans the remote tree using checkers, collecting size/modtime. It fills the cache with checksums from the file and labels the items with size/modtime from the remote. The result is more precise, but the scan takes time.
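The two phases of import can be sketched roughly like this. It is an illustration under stated assumptions, not rclone's implementation: the sum file is assumed to use the "HEXHASH  path" layout with two spaces, `listing` is a plain dict standing in for the remote scan, and counting unmatched paths as "skipped" is a guess at what the summary line means.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class CacheEntry:
    checksum: str
    size: int
    modtime: float


def parse_sum_file(lines: Iterable[str]) -> Dict[str, str]:
    """Fast step: parse 'HEXHASH  path' lines into {path: checksum}."""
    sums = {}
    for line in lines:
        checksum, sep, path = line.rstrip("\n").partition("  ")
        if sep and checksum:  # ignore blank or malformed lines
            sums[path] = checksum.lower()
    return sums


def import_sums(sums: Dict[str, str],
                listing: Dict[str, Tuple[int, float]]
                ) -> Tuple[Dict[str, CacheEntry], int, int]:
    """Slow step: label each checksum with size/modtime from a remote scan.

    `listing` stands in for the scan result: {path: (size, modtime)}.
    """
    cache: Dict[str, CacheEntry] = {}
    imported = skipped = 0
    for path, checksum in sums.items():
        if path not in listing:
            skipped += 1  # assumption: nothing on the remote to label it with
            continue
        size, modtime = listing[path]
        cache[path] = CacheEntry(checksum, size, modtime)
        imported += 1
    return cache, imported, skipped
```

The parse is a single pass over a local file; the expensive part in practice is producing `listing`, which is why import is slower than stickyimport.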

stickyimport was designed for rarely or never changed remotes with slow or costly access (archives). Here the scan step is skipped. The cache is filled with checksums from the parsed file, but instead of size/modtime the items are just marked as "sticky". Even if a remote file changes, hasher will not notice and will return the cached checksum as is. The only way to invalidate such items is to delete the remote file or purge the cache.
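Extending the same toy model, a sticky entry simply bypasses the label comparison. Again the names are hypothetical and this only mirrors the behaviour described above; deleting the remote file or purging the cache are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    checksum: str
    sticky: bool = False
    size: Optional[int] = None
    modtime: Optional[float] = None


def sticky_import(cache: Dict[str, CacheEntry], sums: Dict[str, str]) -> int:
    """Store checksums straight from the parsed file: no remote scan, no labels."""
    for path, checksum in sums.items():
        cache[path] = CacheEntry(checksum, sticky=True)
    return len(sums)


def cached_checksum(cache: Dict[str, CacheEntry], path: str,
                    size: int, modtime: float) -> Optional[str]:
    entry = cache.get(path)
    if entry is None:
        return None
    if entry.sticky:
        return entry.checksum  # never compared with the remote, even if the file changed
    if (entry.size, entry.modtime) == (size, modtime):
        return entry.checksum
    del cache[path]  # labelled entry with stale labels
    return None
```

This is the trade-off in one line: sticky entries save the scan but give up change detection.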

Note: if the remote can be queried for instant file checksums, these are used to label cache items together with size/modtime. In effect this fills the gap: it associates a checksum type the remote supports with a new checksum type.
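As a hypothetical illustration, say the remote reports MD5 instantly while hasher adds SHA-1. The remote's MD5 can then sit in the label next to size/modtime. The `Label` type and the rule below are assumptions made for the sketch, not hasher's exact logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    size: int
    modtime: float
    native: Optional[str] = None  # checksum the remote can report instantly, if any


def label_matches(stored: Label, current: Label) -> bool:
    """Decide whether a cached checksum (e.g. SHA-1) is still valid for the file."""
    if (stored.size, stored.modtime) != (current.size, current.modtime):
        return False
    # When both sides have the remote's own checksum, compare it too: it catches
    # content changes that leave size and modtime untouched.
    if stored.native is not None and current.native is not None:
        return stored.native == current.native
    return True
```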

Producing a checksum requires that a file be transferred through hasher in full. VFS transfers files in chunks, so it does not affect hasher.
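Why chunked access yields no checksum can be shown with a small wrapper that hashes bytes as they stream past. This is a generic illustration, not how rclone's VFS or hasher is written: the digest is only trustworthy if every byte was read in order from offset 0 to the end, and any seek breaks that.

```python
import hashlib
from typing import Optional


class HashingReader:
    """Hashes data as it is read; yields a SHA-1 only for a full sequential read."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self._h = hashlib.sha1()
        self._sequential = True

    def seek(self, offset: int) -> None:
        if offset != self._pos:
            self._sequential = False  # a jump means some bytes never reached the hasher
        self._pos = offset

    def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        if self._sequential:
            self._h.update(chunk)
        self._pos += len(chunk)
        return chunk

    def checksum(self) -> Optional[str]:
        if self._sequential and self._pos == len(self._data):
            return self._h.hexdigest()
        return None  # partial or out-of-order read: no reliable digest
```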

I didn't specifically test whether --fast-list works. Hasher just calls the remote traverser framework to scan the remote for labels, so I guess it will use the fastest access method available.

You can rerun stickyimport with a new version of the sum file. It will blindly overwrite the cache items. That's it.
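A rough sketch of the reload workflow from the bonus question, with a plain dict standing in for the hasher database. `reload_from_sum` is a hypothetical helper; it assumes the two-space sum-file layout, and leaving paths that are absent from the new file untouched is an assumption of the sketch, not confirmed behaviour.

```python
from typing import Dict, Iterable


def reload_from_sum(cache: Dict[str, str], sum_lines: Iterable[str],
                    purge_first: bool = False) -> Dict[str, str]:
    """Blindly overwrite cached checksums from a sum file, optionally after a purge."""
    if purge_first:
        cache.clear()  # stands in for wiping the hasher database
    for line in sum_lines:
        checksum, sep, path = line.rstrip("\n").partition("  ")
        if sep and checksum:
            cache[path] = checksum.lower()  # no comparison with the old value or the remote
    return cache
```

Without the purge, old entries that the new file does not mention survive; with it, the cache ends up holding exactly what the sum file lists.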

P.S. I don't see evident bugs in your report.

Thanks for the reply.

I wasn't sure it was a bug, but when I ran stickyimport and then looked at the hashes, it didn't show me any. I am about to head to a meeting, but I can try to demonstrate that later.

The reason I am asking all of this is that the local nature of the database lessens the utility of the remote. Say I write to a crypt remote from different machines: I would like to sync their hash state, so I thought I could use hashsum and import to transfer it. Maybe not...

Anyway, I will follow up about the file listings with a better temp example, and you can tell me whether (a) I am using it wrong and/or (b) this is intended.
