Does MOVE checksum files that are already on remote crypt?

rclone v1.51.0
os/arch: linux/amd64
go version: go1.13.7
remote: Google Drive / G Suite
command: rclone -P move local remote-crypt

I have two hard drives with almost exactly the same files on them. One had already been uploaded to Google Drive using 'rclone copy'. I then ran 'rclone move' on the second drive expecting it to move files that weren't already on Google Drive, and leave the rest. What it did instead (which I now know is the expected behavior) is move the files that weren't already on Google Drive, and delete the rest.

My remote on Google Drive is a crypt, so my question is: in what way did 'rclone move' check the files that were already on my remote before deleting them from my local drive? Did it just do a modtime + size check, or did it do some form of checksum? As it got through almost 4TB of files in about 14 minutes, I'm guessing it's the former, but I would like that confirmed.

Thanks! :slight_smile:

to see what is going on,
you need to add a log file and enable debug output

https://rclone.org/docs/#log-file-file
and
https://rclone.org/docs/#log-level-level
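
for example, something like this (the remote name and paths here are just placeholders for your own):

    rclone move /path/to/local remote-crypt: --log-level DEBUG --log-file=rclone.log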

You can't checksum on crypted remotes.

It does size + mod time basically.

The log file with -vv would confirm why something was copied.

It's too late for me to make a log because the files are deleted (so I can't easily run the command again). I guess I could try to recreate a similar situation?

until you are sure about what a command will do, especially move and sync, which will delete files,
use the flag --dry-run and use a log file with debug output.

and perhaps, instead of a move, do:

  1. copy
  2. manual delete (as sketched below)
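
for example, a safer sequence could look something like this (paths and remote name are placeholders; check the logs before deleting anything yourself):

    rclone copy /path/to/local remote-crypt: --dry-run -vv --log-file=dryrun.log
    rclone copy /path/to/local remote-crypt: -vv --log-file=copy.log
    # then delete the local files manually, once the log shows everything arrived

that way nothing gets deleted until you decide it should be.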

ncw has said that 'rclone move' does checksum files during uploads: "Make --checksum work with crypt remotes"

Can it not do this to check files that are already on the remote?

to compare checksums of files already in the crypted remote, you would need to use
rclone cryptcheck
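
for example (remote name and path are placeholders):

    rclone cryptcheck /path/to/local remote-crypt:

it compares the local files to the crypted remote without needing to download the full file contents.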

That's not what he says. That post says there is a feature request to add a checksum flag called cryptsum, and the issue is still open.

You can follow that issue for updates.

As it stands now, you cannot use checksums on a crypt remote.

You can use cryptcheck if you want to validate a file matches.

I'm confused...

ncw (in response to "But after a second thought: why don’t we encrypt the files locally for the only purpose of computing a check sum, and compare this to the remote checksum?") says the following:

Cryptcheck does exactly this.

The upload process does too - so it computes a hash of the data that is being read and at the end of the upload compares it to the hash produced by the remote end thus making a very strong integrity check.

Are you saying the upload process doesn't do this?

yes, this can be confusing.
but both @ncw and @Animosity022 are correct.

i know, let's have @thestigma use his high level of verbosity and clear the confusion.

Let's remember that in the case of encrypted files there are 2 hashes...

1 is the hash of the actual (encrypted) file that the server has on its hard drive. This is very easy (pretty trivial CPU use) to calculate as part of the read-in on upload. I believe rclone does this on basically everything that actually gets transferred (not the same as --checksum, which uses hashes to compare files before even deciding which need uploading). The server also automatically calculates a hash - so it is not hard to just compare the two and conclude that the transfer must have been successful before removing the local file. (You probably have a better chance of winning the lottery every day for a whole week than of getting a false positive on most hash types.)

2 is the hash of the user's file (the one "inside" the encrypted file). The server cannot know this hash (unless we stored it in the crypt format, which is actually something that is planned as an enhancement later on).

To further complicate things there is a "nonce" used in encryption (something like a random seed number) that ensures that 2 identical files, once encrypted, will not produce files with the same name or hash (to prevent anyone "guessing" the contents of a file this way). This means that to compare a local unencrypted file to an encrypted remote file (like cryptcheck does) we actually have to access each remote file, download the original secure nonce from it, and then encrypt-and-hash the local file with that nonce (rather than generating a new one as normal). This makes the hashes for the encrypted files match if the files "inside" are identical.

rclone currently does not do this to "solve" --checksum for unencrypted->crypted hash comparisons. Why? Well...

  • It would be quite slow
  • It simply hasn't been prioritized for implementation yet, as I understand it - Nick suggested as much the last time we talked about these things.
  • The same problem would be fixed by the "enhanced" crypt format that is in planning anyway.

I believe hash comparisons are made on any files that are transferred to ensure the file arrives safely. This seems to be what you are asking about. This is pretty straightforward, as we expect to see the same hash on the server as the one we calculated when we read the file in for upload.

But Animosity is also correct when he says you cannot use checksums on a crypt remote, because --checksum refers to comparisons of files (as the user sees them), and between an unencrypted location and an encrypted location the hashes of the files are different due to encryption.

I hope this made some kind of sense :slight_smile:
If nothing else, I think you can take away from this that it is basically "impossible" for a file to be randomly corrupted between you and the server in a transfer.
As always with such complicated things, take what I say with a grain of salt. This is just my best current understanding of the topic.

Actually @ncw, feel free to just skim this and call me out if there are any major mistakes here that need correcting. I really don't want to mis-teach others when it comes to very important data-integrity features.

Not all backends support checksums but for those that do that is an accurate description!

It would involve downloading the file, decrypting it, then making the hash, so yes, very slow!

I think you did very well :star:

Thank you guys. That was very well explained, and I understand it all now.

Thank you for this great software. I've been using it almost daily for about two years. :slight_smile:
