crypt and how sync works

as per the docs:

> Hashes are not stored for crypt. However the data integrity is protected by an extremely strong crypto authenticator.

  1. so when i do a rclone sync, how are the files compared?

  2. is there a reason why hashes are not stored for crypt? i imagine that the hash might give away something about the encrypted file.


Modtime + size (both combined, although with a backend-determined "precision margin" for the modtime component, since backends store timestamps to different levels of precision).
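The comparison above can be sketched in a few lines of Python (a minimal illustration of the idea only, not rclone's actual Go code; the function name and 1-second default precision are my own choices for the example):

```python
from datetime import datetime, timedelta

def needs_transfer(src_size, src_mtime, dst_size, dst_mtime,
                   precision=timedelta(seconds=1)):
    """Sketch of the default sync comparison: a file counts as changed
    if the sizes differ, or if the modtimes differ by more than the
    destination backend's timestamp precision."""
    if src_size != dst_size:
        return True
    return abs(src_mtime - dst_mtime) > precision

t = datetime(2020, 1, 1, 12, 0, 0)
# 300 ms of drift is inside a 1 s precision window -> no re-transfer
print(needs_transfer(1024, t, 1024, t + timedelta(milliseconds=300)))  # False
# a size mismatch always triggers a transfer
print(needs_transfer(1024, t, 2048, t))  # True
```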

We can't control the server-side generated hash - not on most backends anyway. It is simply whatever the stored file hashes to. We could store the original hash in the file itself though (for that, see further down in the post).

If there are no compatible hashes between the two systems then it falls back to size+modtime. This is usually pretty accurate, all things considered.

Of course the core problem here is that the hash of the original file will not be the same as the hash of the encrypted file that is generated server-side. And since the server can't decrypt to look at the underlying file, we are at an impasse.

Not having comparable hashes is not purely a crypt issue per se. If you sync two encrypted volumes (directly, not through the crypt remote) then these can be hash-compared just fine. But any time you compare non-encrypted to encrypted - or two encrypted systems where files aren't necessarily always originating from one source - then you have a problem.

What further complicates things is the nonce, the "random seed" for the encryption that ensures the encrypted output is not the same each time, for security/obfuscation reasons. This means we cannot simply encrypt locally and hash that to compare with the encrypted file on the server. The two will not match even if the file inside is identical.
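The nonce effect can be demonstrated with a toy XOR stream cipher (illustration only - rclone crypt actually uses NaCl secretbox, not this construction; `toy_encrypt` and the keystream scheme are invented for the example):

```python
import hashlib, os

def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher: keystream blocks are SHA-256(key|nonce|counter).
    NOT rclone's real cipher - just enough to show the role of the nonce."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        keystream.extend(hashlib.sha256(
            key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

key = b"k" * 32
data = b"identical file contents"

# Fresh random nonces: same plaintext, but different ciphertexts/hashes.
h1 = hashlib.md5(toy_encrypt(key, os.urandom(24), data)).hexdigest()
h2 = hashlib.md5(toy_encrypt(key, os.urandom(24), data)).hexdigest()
print(h1 != h2)  # True (with overwhelming probability)

# Reusing the remote file's nonce reproduces the same ciphertext,
# so the hashes match - this is the trick cryptcheck relies on.
nonce = os.urandom(24)
h3 = hashlib.md5(toy_encrypt(key, nonce, data)).hexdigest()
h4 = hashlib.md5(toy_encrypt(key, nonce, data)).hexdigest()
print(h3 == h4)  # True
```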

What can be done is to download the nonce, encrypt locally using that same nonce, and hash the result. Then the hashes will match (if the files were identical underneath, obviously). Having a function that can automatically do this has been suggested. It actually already exists in the form of rclone cryptcheck, but there's no flag you can use to apply this technique in a copy/move/sync. This is something that probably should be added...
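For reference, the command looks like this (paths are placeholders; `remote:path` here must be a crypt remote):

```
rclone cryptcheck /path/to/local remote:path
```

It checks the unencrypted files against their encrypted counterparts using the nonce trick described above, without downloading the full files' plaintext.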

Lastly, let me inform you that I've had some chats with Nick on this already and we have come to the agreement that it would be wise to bake the original hash into the crypt format itself (and potentially other metadata as well). This would allow us to easily access the original file's hash and compare based on that. It's not going to be quite as fast as grabbing all that info from a listing, but it will surely be a worthwhile compromise in return for the ability to use --checksum, --track-renames and much more between any two remotes, regardless of encryption - even regardless of whether several different crypt keys are in play.

That hash will have to be generated locally, but because that data will reside within the crypt structure it will be inherently protected against failure by the data-integrity of the format. Thus there shouldn't be any way for the locally calculated hash to "not be true" because it got corrupted on transfer somehow.

An issue has been started on that topic here:

Hope this helped. I'm sure you have followups to this as usual :stuck_out_tongue:
Probably need to wait until tomorrow though...

i was hoping you would reply with your usual verbosity level set to HIGH.
for once, i have no followups, a credit to you.

i will say that given my paranoia level, i cannot use crypt until checksums can be checked during sync

Whaa? ... Satisfied from the first answer?
Who are you, and what have you done with the real Jojo? O_o

As I said, I think we will have a solution for this soon, but here are some workarounds you could consider in the meantime if data integrity is your utmost concern:

You could store the data you want to sync in rclone-crypt format locally, and then transfer that to the server directly (i.e. not using a crypt remote on the upload). Then you will be able to use checksums, and all functions that rely on them, to your heart's content. Although this probably requires a little reorganization, it's not really much of an issue: you can just have a crypt remote mounted into the directory where you already store the files, where they will still be normally readable as they were before.

It's not elegant, but it does work without issue.

You could make use of the chunker, as this actually stores the original hash too (to work around the fact that the generated hashes for each part of course won't correspond to the original). If that format is crypted afterwards then it will similarly be safe data-integrity-wise. The main problem I can see with this is that chunker was not designed for storing hashes but for splitting files, so it simply won't touch files under a specified size. The alternative of setting that threshold low enough to catch all files would work, but you'd end up with larger files having thousands of parts... so I don't think this is a great solution.
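A chunker remote wrapping a crypt remote might be configured along these lines (a hedged sketch - remote names and sizes are placeholders; check the chunker docs for the exact `hash_type` values your rclone version supports, e.g. `sha1all` stores a hash for every file at the cost of reading each file twice):

```
[mychunked]
type = chunker
remote = mycrypt:backups
chunk_size = 2G
hash_type = sha1all
```

You would then sync to `mychunked:` instead of `mycrypt:` directly.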

If you contacted the author, it would probably be an easy change to add a flag that didn't split but still processed all files. Only a slight tweak to the rules would be needed - no actual change in functionality - so I doubt it would be work-intensive.

Lastly, let me make a note that even though we can not currently access the original hash of a crypted remote file, it still has a hash. This hash should be very easily available to us during the upload process (it can be calculated as the data is read for transfer), and that should give us a 100% guarantee that the upload was successful and the file is healthy. I assume that this check is already performed under the hood, because I can not see any reason that it would not be... So if it is actually protection against transfer corruption you are mainly worried about, you may already have this excellent protection on any backend that supports any sort of server-side hashing.
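The "hash while uploading" idea is cheap because it costs no extra read pass - the data is hashed as it streams through. A minimal sketch (my own function names; real rclone does this in Go, and the hash algorithm depends on the backend):

```python
import hashlib, io

def upload_with_hash(src, dst, chunk_size=64 * 1024):
    """Stream data from src to dst, hashing it on the fly. The returned
    hash can be compared with the hash the server reports for the
    stored (encrypted) object to verify the transfer."""
    h = hashlib.md5()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
        dst.write(chunk)
    return h.hexdigest()

src = io.BytesIO(b"encrypted payload bytes")
dst = io.BytesIO()
local_hash = upload_with_hash(src, dst)
# Stand-in for the hash the backend computes server-side:
server_hash = hashlib.md5(dst.getvalue()).hexdigest()
print(local_hash == server_hash)  # True -> transfer verified healthy
```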

This does not solve the problem of being unable to compare-on-hash later or use functions like --track-renames, but that's really a different issue altogether, not related to data-integrity but to functionality and convenience.

@ncw Could you perhaps verify this for the record and for Jojo's sanity? :smiley:

Yes we check hashes of the encrypted data on both upload and download.



@ncw <3
I knew you were far too good to miss something like that :smiley:

@asdffdsa Happy Jojo? :smiley:
So that means it's basically mathematically impossible for an undetected corruption to happen on transfers to a backend that supports server-side hashing (as Wasabi does). The worst that can happen is rclone needs to re-transfer the file. I assume that is probably your main point of worry.

For the rest - i.e. the missing functionality of not being able to checksum-compare against the unencrypted hash - you have to either wait a bit for "crypt V3", or see if one of the suggested workarounds is acceptable for you as a temporary solution.

thanks for the clarification.

i will wait for crypt V3.2

i use rclone mostly to copy veeam backup files and 7zip files to the cloud

i need rclone to checksum the local files and compare to the cloud, as there is a chance the local file got corrupted, and to catch that in the log file when rclone re-uploads the file.

i thought i might use the crypt for other smaller files but my script can use 7zip to encrypt the files and filenames.

thanks much,

Sorry, i think I meant v2, not v3. AFAIK there hasn't been a major revision of it before.
Actually, this isn't really a major revision we are talking about either; it is just a new, improved header.
It may potentially even be partially backwards compatible. That will depend on the implementation specifics.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.