Encryption + --checksum + koofr does not work

I am trying to sync a directory to koofr using encryption and --checksum.
It seems to work without encryption but not with encryption.

using v1.49.5

using Windows 10 for Workstations 64

using koofr

rclone sync ..\cwrsync_5.7.2_x86_free koofrCrypt:cwrsyncCrypt -vv --checksum

output:

2019/10/17 14:17:43 DEBUG : rclone: Version "v1.49.5" starting with parameters ["rclone" "sync" "..\cwrsync_5.7.2_x86_free" "koofrCrypt:cwrsyncCrypt" "-vv" "--checksum"]
2019/10/17 14:17:43 DEBUG : Using config file from "C:\Users\jilic\.config\rclone\rclone.conf"
2019/10/17 14:17:45 NOTICE: Encrypted drive 'koofrCrypt:cwrsyncCrypt': --checksum is in use but the source and destination have no hashes in common; falling back to --size-only
2019/10/17 14:17:45 DEBUG : README.cwrsync.txt: Size of src and dst objects identical
2019/10/17 14:17:45 DEBUG : README.cwrsync.txt: Unchanged skipping
2019/10/17 14:17:45 DEBUG : README.rsync.txt: Size of src and dst objects identical

more of the same messages....................................

This is the problem...

The crypt backend doesn't support checksums unfortunately.

Koofr people told me that they use rclone and provide md5 hash for all files on their storage and that it might work with rclone. rclone does work with --checksum if not encrypted, see below.

$ rclone sync ..\cwrsync_5.7.2_x86_free koofr:cwrsync --checksum -vv
2019/10/17 15:19:11 DEBUG : rclone: Version "v1.49.5" starting with parameters ["rclone" "sync" "..\cwrsync_5.7.2_x86_free" "koofr:cwrsync" "--checksum" "-vv"]
2019/10/17 15:19:11 DEBUG : Using config file from "C:\Users\jilic\.config\rclone\rclone.conf"
2019/10/17 15:19:12 DEBUG : README.cwrsync.txt: MD5 = 7a7aa4e38f21790c19f74f6a30e9273d OK
2019/10/17 15:19:12 DEBUG : README.cwrsync.txt: Size and MD5 of src and dst objects identical
2019/10/17 15:19:12 DEBUG : README.cwrsync.txt: Unchanged skipping
2019/10/17 15:19:12 DEBUG : README.rsync.txt: MD5 = 43c5583be00f8aaed32345776ff6241f OK

The problem with crypt, as far as checksums go, is that when you upload an encrypted file, the checksum on the server is for the encrypted file. The server can't calculate the hash of the original file inside the encrypted one, so the local and remote hashes just aren't something you can compare.

That is what NCW means when he says crypt doesn't support checksum.
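To make the mismatch concrete, here is a minimal Python sketch. It uses a toy stream cipher built from SHA-256 purely for illustration; it is not the cipher rclone's crypt backend uses, and the key, nonce size and file contents are made-up placeholders. It only shows that the hash a provider computes over the stored (encrypted) bytes has no relation to the hash of the local plaintext.

```python
import hashlib
import os


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT rclone's real cipher."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        # Derive keystream blocks from key + nonce + block counter.
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # the nonce is stored in front of the ciphertext


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


key = b"hypothetical-crypt-key"          # placeholder key
local_file = b"contents of a local file"  # placeholder contents

# The provider only ever sees (and hashes) the encrypted bytes.
uploaded = toy_encrypt(key, os.urandom(16), local_file)
print("local MD5 :", md5_hex(local_file))
print("remote MD5:", md5_hex(uploaded))

# A second upload of the same file uses a fresh random nonce, so even the
# two ciphertext hashes differ from each other.
uploaded_again = toy_encrypt(key, os.urandom(16), local_file)
print("second upload MD5:", md5_hex(uploaded_again))
```

The three printed hashes all differ, which is why a plain --checksum comparison has nothing in common to compare against.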

Buuut... you can work around this. If you encrypt the data before you upload (assuming same crypt key) then suddenly you have matching hashes on both sides again and --checksum will work fine then.

So if you regularly sync a lot of local files and need checksum, it might be advantageous to store them encrypted locally. Kinda hard to do that when it's user-files, but there are other use-cases where it would be not much of a bother. But to be fair - checksum is usually not necessary. size/modtime tends to do a pretty good job.

I am hoping that eventually we could bake in this functionality so we could crypt+hash local files to compare to a crypted remote's hashes whenever this is required. In theory this should be possible to do on-the-fly I think. It would require that you read the contents of all files, so it would use significantly more HDD activity and a few more CPU cycles for the hashing, but it would be nice to have this option for when you need exact checks.

@ncw What do you think?

The reason why I wanted to use checksum is because I noticed that something, somehow, somewhere causes occasional time stamp difference between some source and destination files and files are marked as changed.

Encrypting everything at the source and then syncing was my idea before I discovered rclone and its encryption capabilities.

rclone seems to be comparing MD5 of unencrypted local files and synced encrypted files. Of course they are not going to check as the same files. When synced files are not encrypted, as I showed, --checksum works as expected. Three ways to make this work from a script come to my mind:

  1. Store hash values for all encrypted files locally and compare them with hash values of remote, encrypted files. Not a good idea.

  2. Use rclone to download (unencrypted) files and compare hashes. Not a good idea.

  3. Use rclone to encrypt each file locally, calculate the hash and compare it with the hash of the remote (encrypted) file. This could work but it should be part of rclone. The remote site cannot decrypt files and provide hash values.

I'd like to hear from others if --checksum works with encryption and some other online storage providers. If it does, I don't see why it could not work with koofr as well.

If you want to check checksums in a crypted remote then you need to use rclone cryptcheck

This is essentially your option 3)

It is quite an expensive operation as it has to download the nonce from the encrypted file then use that to encrypt the local file to produce the hash.

I am not an encryption expert but I was afraid of something like that. I don't have time to play with this right now, I'll have to live with size+date for now.

Just one more question: the docs mention that --checksum can be used with services that support it. Does this mean that there is a service which supports it with encryption? From what you explained to me, it does not seem possible to do it in an efficient way, right?

Thanks everybody for the responses.

The encrypted remote does not support checksum.

If you have a file and encrypt it, that changes what the file looks like so you can't match it with an md5sum.

You'll not find an answer to that other than decrypting the file and comparing it to the original.

Well, yes. It is obvious that encrypted and non-encrypted files will not pass checksum. The question is/was how do we get around it in an efficient way, without downloading the entire file, decrypting it and then running the checksum. From what everybody contributed, it seems that there might not be such a way that is efficient in terms of communications, storage, processing power and time.

Thanks everybody. We can put this to rest for now.

rclone cryptcheck only downloads the nonce which is 16 bytes from each file. It uses this to encrypt the local file and hash the encrypted file to check the checksums.
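As a rough illustration of that idea (not rclone's actual code or file format), the Python sketch below reuses the nonce found at the start of the remote object to re-encrypt the local file, then compares the hash of the result with the hash the provider reports. The cipher is a toy stream cipher, and `NONCE_SIZE`, `remote_nonce` and `crypt_check` are hypothetical names invented for this example.

```python
import hashlib
import os

NONCE_SIZE = 16  # toy value; the real header layout is not modeled here


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT rclone's real cipher."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def remote_nonce(remote_blob: bytes) -> bytes:
    """Stand-in for a small ranged read of the start of the remote object."""
    return remote_blob[:NONCE_SIZE]


def crypt_check(key: bytes, local_plain: bytes, nonce: bytes, remote_md5: str) -> bool:
    """Re-encrypt the local file with the remote's nonce and compare hashes."""
    reencrypted = toy_encrypt(key, nonce, local_plain)
    return hashlib.md5(reencrypted).hexdigest() == remote_md5


key = b"hypothetical-crypt-key"
local_file = b"contents of a local file"

# Simulate what sits on the provider and the MD5 the provider would report.
remote_blob = toy_encrypt(key, os.urandom(NONCE_SIZE), local_file)
remote_md5 = hashlib.md5(remote_blob).hexdigest()

nonce = remote_nonce(remote_blob)
print(crypt_check(key, local_file, nonce, remote_md5))         # True
print(crypt_check(key, local_file + b"!", nonce, remote_md5))  # False
```

Only the nonce and the provider's hash cross the network; the cost is on the local side, where every file has to be read and encrypted in full.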

It would be possible to do this with (let's say) a --cryptchecksum option which you could use in syncing...

If you wrap remote <- crypt <- chunker and use the md5all or sha1all options in the chunker then this will store hashes for each file as additional metadata which is stored as a small file.

I haven't tried this yet myself but it should work!

You really can't without downloading it, which is the crux of it.

The encryption is being done by rclone.

The provider on the other side has no idea about the encryption so it can't do anything.

The only way to get around that is to download the file, decrypt it and run the checksum on it if you want to be sure that the file is the same.

The provider would have to offer the encryption and be integrated into rclone for something along the chain to work.

That sounds like a great idea, and very much along the lines of what I was thinking of (except that I forgot that we needed to download the nonce). Cryptcheck is nice, but you can't use it in move/copy/sync, so it is way too much hassle to use to guarantee a faultless transfer. You'd have to use several commands and manually track and re-upload any failures, which is obviously too impractical.

I would definitely support such a feature and have use for it myself. Let me know if you want me to sum up the idea in an issue. @jojo Or would you like to?

There may be an issue for it already... But yes, if one of you could make an issue that would be great :grin:

Go ahead and sum it up. You have much more experience with rclone than me. Thanks.

Ok, I'll put it on my to-do and get it done within the next few days.

When I ran into rclone I was impressed by its list of features and its flexibility. After I used it a little bit, and I mean a little bit, I started finding missing features that would be nice to have. For example, I hard link duplicates (deduplicate) on my server, and when I sync it to a remote rclone seems to copy them as regular files. I understand that adding an option to recognize hard links and do something about them besides blindly copying them is a lot to ask, but it would surely be nice to have. Actually, when I think about it, full deduplication would have to be implemented, since source files could be on different networked mounts and inodes are then no longer usable to detect hard links.

Although I hate commenting on other developers' design decisions, I'll do it here. Maybe many of these feature requests and lots of flags could be avoided by allowing for hook procedures, something like plugins, dynamic libs, interprocess communication, etc. Maybe some of the public domain deduplication packages could be made part of rclone that way.
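On the hard link point: for a single local filesystem, detecting hard-linked duplicates can be sketched in a few lines of Python by grouping regular files on their (device, inode) pair. This is only an illustration of the idea, not something rclone does; `find_hard_link_groups` is a made-up helper name, and as noted above the approach stops being reliable once the source spans several networked mounts.

```python
import os
import stat
from collections import defaultdict


def find_hard_link_groups(root):
    """Return lists of paths under root that share one (device, inode) pair."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            # Only regular files with more than one link can be hard-linked
            # duplicates of each other.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only groups where at least two of the links live under root.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_hard_link_groups` on the source directory before a sync would give the sets of paths that could be uploaded once and referenced several times, if the destination had a way to express that.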

There is some stuff in the "local filesystem" remote relating how to treat links:
https://rclone.org/local/

But I believe it only talks about symlinks, so I'm not sure how hard links are included in that, if at all.
I would first check with @Animosity022 if he knows if rclone has any support for this already, because I know he at least uses some hard-linking in his system with mergerFS.

But assuming that there is no existing support, you might make an issue for it since you seem to have a grasp on the basics of what would be needed:

Lastly, if you are very motivated and know how to code in Go (or care to learn - it's similar to Java and C#), then the code is open and anyone can make contributions. Nick is always very glad for the assistance and will guide you as needed (because he's just a swell guy like that :wink: )