I have two questions about how crypt works with file hashes. This is a combination of my understanding of the documentation and my own empirical playing around.
I set up a B2 remote and then a crypt remote on top of it. I used copyto to copy the same file to two different file names on the crypt. I did them one at a time so I could then do an ls on the non-crypt remote to see the resulting filenames. So what I have is two nominally identical files, their encrypted names, and their hashes (via the non-crypt remote).
1. [Why] does the name of the file affect the hash of the encrypted file? I thought that crypt was deterministic: encrypt the same file twice and get the same result. The hashes do not reflect that.
2. Assuming I get Q1 figured out, I know from the documentation that I cannot get the hash of the original file, but is there a way to get the hash of the resulting encrypted file? I am not a Go programmer so I haven’t looked at the code. I figure the concept of a remote is abstracted, so this information may be lost. But if it isn’t, can I get it? I can’t just take the hash from the non-crypt remote, since I want encrypted filenames and so cannot tell which entry is which (the only reason I could before was that I carefully uploaded one file at a time).
Thanks!
(BTW, I am more impressed with rclone every day! I am doing a ton of background research to see if I can make it work with my personal two-way sync script.)
Do you mean how? For this test case, I am using lsjson with --hash on the non-encrypted remote. Since I uploaded the files one at a time, I know which encrypted file name corresponds to which file. Just to confirm they are the same file: when I copy them back to my local computer (via the encrypted remote), they have the same hash.
Or, do you mean why? I want the hash of the file, even if it is on the encrypted version, so I can track moves and dedupe. I plan to use rclone as an interface and not its core functionality. But, that doesn’t help me if the same file gets encrypted differently. (Is there a random hash per file?)
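To make the "track moves and dedupe" idea concrete, here is a rough sketch of grouping files by hash from saved lsjson --hash output. It relies on the Path, IsDir and Hashes fields of that output; the function name group_by_hash, the sample records, and the "SHA-1" key are made up for illustration, and the hash key name may differ depending on the remote and rclone version.

```python
import json
from collections import defaultdict


def group_by_hash(lsjson_text: str, hash_name: str) -> dict:
    """Map each hash value to the list of paths that report it."""
    groups = defaultdict(list)
    for entry in json.loads(lsjson_text):
        if entry.get("IsDir"):
            continue  # directories carry no content hash
        digest = (entry.get("Hashes") or {}).get(hash_name)
        if digest:
            groups[digest].append(entry["Path"])
    return dict(groups)


# Hypothetical sample standing in for real `lsjson --hash` output.
sample = json.dumps([
    {"Path": "a.txt", "IsDir": False, "Hashes": {"SHA-1": "aaaa"}},
    {"Path": "copy/a.txt", "IsDir": False, "Hashes": {"SHA-1": "aaaa"}},
    {"Path": "b.txt", "IsDir": False, "Hashes": {"SHA-1": "bbbb"}},
    {"Path": "copy", "IsDir": True},
])

# Any hash shared by more than one path is a candidate duplicate or move.
duplicates = {h: p for h, p in group_by_hash(sample, "SHA-1").items() if len(p) > 1}
print(duplicates)
```

This only helps when identical content yields identical hashes. As observed above, the two encrypted copies of the same file report different hashes on the underlying remote, so they would end up in separate groups.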
I am aware of that. My question was (b) whether it was still possible to get the hash of the encrypted file (which is available on any remote that supports it) and (a) why the hashes don’t agree
I think the answer to your second part is that you can’t compare hashes of an encrypted file: if you encrypt the same file twice, the result ‘looks’ different each time even though it’s the same file. You can try that with gpg or something and see.
I can second this - I actually ran into the issue myself when I was trying to do incremental uploads of encrypted files - every time I re-encrypted the same (unencrypted) file, I got a completely different hash.
So basically, if you have an unencrypted file and you encrypt and directly stream to the remote server, there is no way for you to get the hash. The only way to get the hash is to encrypt it locally and then upload it - because you still have the encrypted file, you can then compare hashes, verify integrity, all that good stuff.
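For the encrypt-locally-then-upload route, a minimal sketch of hashing the local encrypted file so it can be compared with whatever the remote reports. The helper name file_sha1 is hypothetical, and SHA-1 is only an assumption here; use whichever hash type the remote actually provides.

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks to limit memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The resulting hex string can then be compared against the hash listed for the uploaded file on the non-crypt remote.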
One solution for the crypt backend might be retaining a local cache of encrypted files (long enough to upload), but that might very quickly add up space-wise?
That is correct. The rclone encryption system (NaCl secretbox) puts a random “nonce” (a sequence of random bytes) at the start of each encrypted file, which is required for security. If you re-use the nonce then the crypto becomes breakable. Hence there is a different nonce for each file and for each version of each file, so the encrypted bytes, and therefore the hash, differ every time.
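The effect is easy to reproduce with a toy example. The sketch below is NOT rclone's code and NOT NaCl secretbox; it swaps in a simple SHA-256-based keystream purely to show the principle (and should not be used for real encryption). All names in it (toy_encrypt, toy_decrypt, _keystream) are made up. The only detail borrowed from secretbox is the 24-byte nonce size.

```python
import hashlib
import os

NONCE_SIZE = 24  # NaCl secretbox nonces are 24 bytes


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || nonce || counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prefix a fresh random nonce, then XOR the plaintext with the keystream."""
    nonce = os.urandom(NONCE_SIZE)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Read the nonce back from the front of the blob and undo the XOR."""
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = b"k" * 32
data = b"the same file contents"
first = toy_encrypt(key, data)
second = toy_encrypt(key, data)

# Same key, same plaintext, but a fresh nonce each time: the hashes differ.
print(hashlib.sha1(first).hexdigest())
print(hashlib.sha1(second).hexdigest())
# Both blobs still decrypt to the original data.
print(toy_decrypt(key, first) == toy_decrypt(key, second) == data)
```

The two printed hashes differ from run to run and from each other, while both blobs decrypt back to identical data, which matches what the copy-back test earlier in the thread showed.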