When is hasher recommended or required?

gapa · October 16, 2023, 7:50am

rclone.conf:

[azure]
type = azureblob
account = xxx
key = xxx
directory_markers = true

[azure-crypt]
type = crypt
remote = azure:crypt
filename_encoding = base64
password = xxx
password2 = xxx

Test commands:

rclone hashsum md5 azure:crypt/file_crypted
# hash1 returned

rclone hashsum md5 azure:crypt/file_crypted --download
# hash1 returned

rclone hashsum md5 azure-crypt:file
2023/10/16 09:00:57 ERROR : file: hash unsupported: hash type not supported
2023/10/16 09:00:57 Failed to hashsum with 2 errors: last error was: hash unsupported: hash type not supported

rclone hashsum md5 azure-crypt:file --download
# hash2 returned

In this case the backend supports MD5, so rclone operations directly on azure can use file hashes without downloading the files.
But the crypt wrapper doesn't support hashes, so as I understand it, before files can be compared (e.g. by the sync command) they need to be downloaded from the base backend and decrypted to calculate MD5 (or am I wrong?).
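
One way to confirm which hash types each layer advertises is `rclone backend features`, whose JSON output includes a `Hashes` list. The sketch below wraps it in a helper; the name `show_features` is made up for illustration, and I would expect the list to be empty for the crypt remote, though I have not verified that against this exact setup.

```shell
# Hypothetical helper: print the feature report for a remote.
# The JSON that rclone prints includes a "Hashes" list of supported hash types.
show_features() {
    rclone backend features "$1"
}

# Usage (remote names as in the rclone.conf above):
#   show_features azure:
#   show_features azure-crypt:
```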

So to optimize operations (eliminate the need to download file contents) I need to add an additional layer like this, right?

[_azure-crypt]
type = crypt
remote = azure:crypt
...

[azure-crypt]
type = hasher
remote = _azure-crypt:

As I understand it, to optimize remote operations even more (eliminate the need for hash queries to the remote), could I also add another hasher layer over the base azure configuration, like this?

[_azure]
type = azureblob
...

[azure]
type = hasher
remote = _azure:

I'm not sure when hasher is recommended and when it may just spoil things, so I'm asking for some advice.

kapitainsky · October 16, 2023, 10:14am

By default rclone only looks at the modification time and size of files to see if they are equal. You could use the --checksum flag and then it will check hashes, but only when they are available, so nothing will be downloaded from your crypt backend when running e.g. sync.
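
As a rough sketch of that, assuming a local source directory (the path is a placeholder) and the `azure-crypt:` remote from the config above, a checksum-based sync can be previewed like this. The wrapper name `sync_by_checksum` is only for illustration.

```shell
# Hypothetical wrapper: preview a sync that compares by checksum where possible.
# --dry-run only reports what would be transferred; remove it to run the sync for real.
sync_by_checksum() {
    rclone sync --dry-run --checksum "$1" "$2"
}

# Usage (the local path is a placeholder):
#   sync_by_checksum /path/to/local/files azure-crypt:
```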

Using the hasher backend is fully optional - do it if you need hashes in a crypt backend. Another option (a sort of workaround) is to use chunker and specify hash_type for all files - more details here. I use the latter successfully for my data - the drawback is that every file gets a sidecar file with metadata.
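
A minimal sketch of the chunker variant, assuming it is layered over the `azure-crypt` remote from the question. The remote name `azure-chunker` is made up, and `md5all` is the chunker `hash_type` value that keeps MD5 for all files rather than only for chunked ones. Adjust to your own layering.

```ini
# Hypothetical remote name; wraps the existing crypt remote.
[azure-chunker]
type = chunker
remote = azure-crypt:
# Store an MD5 for every file in the metadata sidecar, not only for chunked files.
hash_type = md5all
```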

rclone also has a special cryptcheck command you can use to check a remote against an encrypted one - it will use the hashes provided by the crypt's underlying remote.
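
For example, assuming the unencrypted files live in a local directory (placeholder path) and the encrypted copy is on `azure-crypt:`, the check could be wrapped like this. The function name `check_encrypted` is hypothetical.

```shell
# Hypothetical wrapper: compare unencrypted files against their encrypted copies.
# First argument: the unencrypted source; second argument: the crypt remote.
check_encrypted() {
    rclone cryptcheck "$1" "$2"
}

# Usage (the local path is a placeholder):
#   check_encrypted /path/to/local/files azure-crypt:
```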

gapa · October 16, 2023, 12:11pm

Thank you for your reply, I forgot about this flag.
But let's say I'm using it - is my scenario correct then?
To summarize:

1. Adding a hasher layer over a crypt remote can optimize operations by avoiding downloads of files from the base remote.
2. Adding a hasher layer over a base remote (that itself supports hashes) can optimize operations by avoiding downloads of metadata (hashes).

And a third point, which I haven't mentioned yet:

3. Adding a hasher layer over a base remote (that itself DOES NOT support hashes) can optimize operations by avoiding downloads of metadata (hashes) AND file contents.

Does it sound generic (and true) enough to be taken as some kind of universal recommendation?

kapitainsky · October 16, 2023, 12:21pm

Yes - if you require file hashes on a remote that does not provide them, then you can use hasher. But it is not required for any rclone command to work. E.g. sync will work with or without hash support.

What you list in your points 1, 2 and 3 is true. But I would not say that it is a universal recommendation. rclone can work with or without hashes. And hasher stores hashes locally on your client machine - so it is your responsibility to ensure that your remote is modified only through the hasher remote.
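
If the remote does get modified outside the hasher remote, the locally cached hashes may be stale. The hasher backend has `dump` and `drop` backend commands for inspecting and discarding the cache. A sketch, with made-up helper names and assuming `azure-crypt:` is the hasher remote from the layered config in the question:

```shell
# Hypothetical helper: print the cached hash records for a hasher remote.
dump_hash_cache() {
    rclone backend dump "$1"
}

# Hypothetical helper: discard the local hash cache for a hasher remote,
# so that stale entries are no longer used.
drop_hash_cache() {
    rclone backend drop "$1"
}

# Usage:
#   dump_hash_cache azure-crypt:
#   drop_hash_cache azure-crypt:
```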
