rclone v1.51.0
os/arch: linux/amd64
go version: go1.13.7
remote: Google Drive / Google Suite
command: rclone -P move localremote-crypt
I have two hard drives with almost exactly the same files on them. One had already been uploaded to Google Drive using 'rclone copy'. I then ran 'rclone move' on the second drive expecting it to move files that weren't already on Google Drive, and leave the rest. What it did instead (which I now know is the expected behavior) is move the files that weren't already on Google Drive, and delete the rest.
My remote on Google Drive is a crypt, so my question is: in what way did 'rclone move' check the files that were already on my remote before deleting them from my local? Did it just do a modtime + size check, or did it do some form of checksum? As it did almost 4TB of files in about 14 minutes, I'm guessing it's the former, but would like that confirmed.
It's too late for me to make a log because the files are deleted (so I can't easily run the command again). I guess I could try to recreate a similar situation?
Until you are sure about what a command will do, especially move and sync, which will delete files,
use the --dry-run flag and write a log file with debug level.
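For example (the local path and remote name here are placeholders for your own):

```shell
# Preview what would be moved/deleted, without touching anything,
# and capture a full debug log for later inspection.
rclone move /path/to/local remote-crypt: --dry-run --log-file=rclone.log --log-level DEBUG -P
```

Once the dry-run output and the log look right, drop --dry-run and run it for real.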
ncw (in response to "But after a second thought: why don’t we encrypt the files locally for the only purpose of computing a check sum, and compare this to the remote checksum?") says the following:
Cryptcheck does exactly this.
The upload process does too - so it computes a hash of the data that is being read and at the end of the upload compares it to the hash produced by the remote end thus making a very strong integrity check.
Are you saying the upload process doesn't do this?
Let's remember that in the case of encrypted files there are 2 hashes...
1 is the hash of the actual (encrypted) file that the server has on its hard drive. This is very easy (pretty trivial CPU use) to calculate as part of the read-in on upload. I believe rclone does this on basically everything that actually gets transferred (not the same as --checksum, which uses hashes to compare files before even deciding which ones need uploading). The server also automatically calculates a hash, so it is not hard to just compare the two and conclude that the transfer must have been successful before removing the local file. (You probably have a better chance of winning the lottery every day for a whole week than of getting a false positive on most hash types.)
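To illustrate the idea (this is a toy analogy, not rclone itself): hash the data stream while it is being written out, then compare that against a hash of what actually landed at the destination. Here tee plays the role of the upload:

```shell
# Hypothetical illustration of hash-while-streaming integrity checking.
mkdir -p /tmp/rclone-demo
printf 'example payload\n' > /tmp/rclone-demo/src
# tee writes the data to the "remote" while sha1sum hashes the same stream in one pass
tee /tmp/rclone-demo/dst < /tmp/rclone-demo/src | sha1sum | awk '{print $1}' > /tmp/rclone-demo/sent.sha1
sha1sum /tmp/rclone-demo/dst | awk '{print $1}' > /tmp/rclone-demo/stored.sha1
cmp -s /tmp/rclone-demo/sent.sha1 /tmp/rclone-demo/stored.sha1 && echo "transfer verified"
```

If the two hashes disagree, the data was corrupted somewhere between read and write.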
2 is the hash of the user's file (the one "inside" the encrypted file). The server cannot know this hash (unless we stored it in the crypt format, which is actually something that is planned as an enhancement later on).
To further complicate things there is a "nonce" used in encryption (something like a random seed number) that ensures that 2 identical files encrypted will not produce files with the same name or hash (to prevent anyone "guessing" the contents of the file this way). This means that to compare a local unencrypted file to an encrypted remote file (like cryptcheck does) we actually have to access each file, download the original secure nonce from it, and then encrypt-and-hash the original file with that nonce (rather than generating a new one as normal). This will make the hashes for the encrypted files match if the files "inside" are identical.
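The nonce effect can be seen with any cipher that salts or randomizes its output. This is only an analogy: rclone's crypt uses NaCl secretbox (XSalsa20-Poly1305) with a random nonce, not openssl's AES, but the consequence is the same, identical plaintexts produce different ciphertexts:

```shell
# Encrypt the same plaintext twice with the same password; the random
# salt (standing in for rclone's nonce) makes the outputs differ.
printf 'same plaintext\n' > /tmp/pt
openssl enc -aes-256-cbc -pbkdf2 -pass pass:secret -in /tmp/pt -out /tmp/ct1
openssl enc -aes-256-cbc -pbkdf2 -pass pass:secret -in /tmp/pt -out /tmp/ct2
cmp -s /tmp/ct1 /tmp/ct2 || echo "ciphertexts differ"
```

This is why the remote's encrypted-file hash alone can never be predicted from the plaintext: you first need the nonce that was used.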
rclone currently does not do this to "solve" --checksum for unencrypted->encrypted hash comparisons. Why? Well...
It would be quite slow
It simply hasn't been prioritized for implementation yet, as I understand it; Nick suggested as much last time we talked about these things.
The same problem would be fixed by the "enhanced" crypt format that is in planning anyway.
I believe hash comparisons are made on any files that are transferred, to ensure the file arrives safely. This seems to be what you are asking about. It is pretty straightforward, as we expect to see the same hash on the server as we calculated when we read the file in for upload.
But Animosity is also correct when he says
because --checksum refers to comparisons of files (as the user sees them), and between an unencrypted location and an encrypted location the hashes of the files are different due to encryption.
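This gap is what cryptcheck exists to fill: it compares an unencrypted source against a crypt remote by re-encrypting each file with the nonce stored on the remote and comparing hashes. A sketch, with placeholder names for the local path and remote:

```shell
# Verify local plaintext files against their encrypted copies on a crypt remote.
# "remote-crypt:" stands for your own configured crypt remote.
rclone cryptcheck /path/to/local remote-crypt: -P
```

Note that this downloads a small header from each remote file to get its nonce, so it is slower than a plain size/modtime comparison.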
I hope this made some kind of sense
If nothing else I think you can take away from this that it is basically "impossible" for a file to randomly be corrupted between you and the server in a transfer.
As always with such complicated things, take what I say with a grain of salt. This is just my best current understanding of the topic
Actually @ncw , feel free to just skim this and call me out if there are any major mistakes here that need correcting. I really don't want to mis-teach others when it comes to very important data-integrity features.