Crypt hash questions

I have two questions about how crypt works with file hashes. This is a combination of my understanding of the documentation and my own empirical playing around.

I set up a B2 remote and then a crypt version of it. I used copyto to copy the same file to two different file names on the crypt. I copied them one at a time so that, after each one, I could do an ls on the non-crypt remote to see the resulting filename. So what I have is two nominally identical files, their encrypted names, and their hashes (via the non-crypt remote).

  1. Why does the name of the file affect the hash of the encrypted file? I thought that crypt was deterministic: encrypt the same file twice and get the same result. The hashes do not reflect that.

  2. Assuming I get question 1 figured out: I know from the documentation that I cannot get the hash of the original file, but is there a way to get the hash of the resulting encrypted file? I am not a Go programmer, so I haven’t looked at the code. I figure the concept of a remote is abstracted, so this information may be lost. But if it isn’t, can I get it? I can’t just take the hash from the non-crypt remote, since I want encrypted filenames (and the only reason I could match them up before was that I carefully uploaded the files one at a time).

Thanks!

(BTW, I am more impressed with rclone every day! I am doing a ton of background research to see if I can make it work with my personal two-way sync script.)

What are you doing to check the hash of the file?

Do you mean how? For this test case, I am using lsjson with --hash on the non-encrypted remote. Since I uploaded the files one at a time, I know which encrypted file name corresponds to which file. Just to confirm they are the same file: when I copy them back to my local computer (via the encrypted remote), they have the same hash.

Or, do you mean why? I want the hash of the file, even if it is on the encrypted version, so I can track moves and dedupe. I plan to use rclone as an interface and not its core functionality. But, that doesn’t help me if the same file gets encrypted differently. (Is there a random hash per file?)

There aren’t hashes for anything on a crypt remote:

https://rclone.org/crypt/#modified-time-and-hashes

If you want hashes for encrypted files, you should encrypt them on your end before uploading (that is actually what I do, for several reasons).

hmmm. I guess that is an option and may be my only choice if I wish to go down this path. It is not really what I was looking for though. But thanks.

Out of curiosity, what is your process for doing it?

I am aware of that. My question was (b) whether it is still possible to get the hash of the encrypted file (which is available on any remote that supports hashes) and (a) why the hashes don’t agree.

I actually wrote up a script and posted a preliminary version here, but the final(-ish…there’s a lot more I’d like to do) version is at:

There isn’t a hash to get from the crypt remote:

[felix@gemini ~]$ rclone lsjson --hash gcrypt:hosts
[
{"Path":"hosts","Name":"hosts","Size":205,"MimeType":"application/octet-stream","ModTime":"2019-04-26T18:03:35.924Z","IsDir":false}
]

You need an unencrypted remote to get a hash.

There is my hash:

[felix@gemini ~]$ rclone lsjson --hash GD:hosts
[
{"Path":"hosts","Name":"hosts","Size":205,"MimeType":"application/octet-stream","ModTime":"2019-04-12T14:53:36.298Z","IsDir":false,"Hashes":{"MD5":"82a4e10ea77d46a70b8588b24f1cfada"},"ID":"1xOaJ--9mWkxSM8xowuSyZAHHmxQZ-SUn"}
]
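
To make the difference between those two listings concrete, here is a minimal Python sketch that reads `rclone lsjson --hash` output and pulls out the `Hashes` field where there is one. The helper name `hashes_by_path` is made up for illustration, and the two listings are just the ones from this post pasted in as strings; in practice you would capture them from the command.

```python
import json


def hashes_by_path(lsjson_output):
    """Map each file path to its hash dict (empty when the remote lists none)."""
    entries = json.loads(lsjson_output)
    return {
        entry["Path"]: entry.get("Hashes", {})
        for entry in entries
        if not entry.get("IsDir", False)
    }


# Listing from the crypt remote: no "Hashes" key at all.
crypt_listing = '''[
{"Path":"hosts","Name":"hosts","Size":205,"MimeType":"application/octet-stream","ModTime":"2019-04-26T18:03:35.924Z","IsDir":false}
]'''

# Listing from the unencrypted remote: "Hashes" is present.
plain_listing = '''[
{"Path":"hosts","Name":"hosts","Size":205,"MimeType":"application/octet-stream","ModTime":"2019-04-12T14:53:36.298Z","IsDir":false,"Hashes":{"MD5":"82a4e10ea77d46a70b8588b24f1cfada"},"ID":"1xOaJ--9mWkxSM8xowuSyZAHHmxQZ-SUn"}
]'''

print(hashes_by_path(crypt_listing))
print(hashes_by_path(plain_listing))
```

A script that relies on hashes can check for the empty dict and fall back to size and modification time for files that come from a crypt remote.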

I think the second part of your question comes down to this: you can’t compare hashes of encrypted files, because if you encrypt the same file twice, the two results ‘look’ different even though the underlying file is the same. You can reproduce that with gpg or something similar and see.

[felix@gemini ~]$ gpg -c hosts
[felix@gemini ~]$ ls
go  hosts  hosts.gpg  logs  scripts  yay
[felix@gemini ~]$ mv hosts.gpg  hosts.gpg.1
[felix@gemini ~]$ gpg -c hosts
[felix@gemini ~]$ md5sum hosts.gpg
4a7de587c1ec16aaedd9e963d8637d6a  hosts.gpg
[felix@gemini ~]$ md5sum hosts.gpg.1
b4589b980df942716b6cea311748769e  hosts.gpg.1

As an example, I encrypted a hosts file twice, and you can see the md5sums are different, meaning the hash of the encrypted file changes every time.

I can second this - I actually ran into the issue myself when I was trying to do incremental uploads of encrypted files - every time I re-encrypted the same (unencrypted) file, I got a completely different hash.

So basically, if you have an unencrypted file and you encrypt and directly stream to the remote server, there is no way for you to get the hash. The only way to get the hash is to encrypt it locally and then upload it - because you still have the encrypted file, you can then compare hashes, verify integrity, all that good stuff.
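
A rough sketch of the local half of that comparison, assuming you already have the encrypted file on disk (for example from gpg). The helper `file_hash` is a hypothetical name, and the throwaway temp file only stands in for a real encrypted file. Pick the algorithm the remote reports: MD5 in the Google Drive listing earlier in the thread, SHA-1 on B2.

```python
import hashlib
import os
import tempfile


def file_hash(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo: a temporary file standing in for a locally encrypted file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for encrypted bytes")
    demo_path = tmp.name

try:
    local_md5 = file_hash(demo_path)
finally:
    os.remove(demo_path)

# Compare this value with the "Hashes" entry that lsjson --hash reports
# for the uploaded copy on the unencrypted remote.
print(local_md5)
```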

One solution for the crypt backend might be retaining a local cache of encrypted files (long enough to upload), but that might very quickly add up space-wise?

That is correct. The rclone encryption system (NaCl secretbox) has a random “nonce” (a sequence of random bytes) at the start of each file, which is required for security. If you re-use the nonce, the crypto becomes breakable. Hence there is a different nonce for each file and for each version of each file, and so the encrypted bytes (and their hash) differ every time.
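
To illustrate the effect of the nonce, here is a toy Python sketch using only the standard library. It is not secretbox and not rclone's actual file format: the "cipher" is a simple XOR against a SHAKE-256 keystream with no authentication, so please don't use it for real data. All names (`toy_encrypt`, `toy_decrypt`) are invented for the example. It only shows that a fresh random nonce per encryption makes the same plaintext produce different ciphertext, and therefore a different hash, while both copies still decrypt to the same bytes.

```python
import hashlib
import os

NONCE_SIZE = 24  # secretbox also uses 24-byte nonces; everything else here is a toy


def _keystream(key, nonce, length):
    # Derive `length` pseudo-random bytes from the key and nonce.
    return hashlib.shake_256(key + nonce).digest(length)


def toy_encrypt(key, plaintext):
    nonce = os.urandom(NONCE_SIZE)  # fresh random nonce for every call
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # nonce is stored at the start of the output


def toy_decrypt(key, blob):
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(32)
data = b"127.0.0.1 localhost\n"

first = toy_encrypt(key, data)
second = toy_encrypt(key, data)

# Two different hashes for the same plaintext and the same key.
print(hashlib.md5(first).hexdigest())
print(hashlib.md5(second).hexdigest())

# Both blobs still decrypt to the original bytes.
print(toy_decrypt(key, first) == toy_decrypt(key, second) == data)
```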
