Fascinating info, thank you! It makes me think that perhaps I ought to be using the --download option in check more than I currently do...
I was actually looking at this recently for something related to bisync. One thing that troubles me a bit is the way it silently ignores blank hashes, even if it doesn't expect them to be blank (for example, here and here). I think the intent was to allow comparison with something like Google Docs, where the lack of hash is expected on a remote that otherwise supports them -- but it looks to me like it's letting unexpected blanks through too. This came up because while I was testing the --compare PR, I noticed that Google Drive in particular will often (but not every time) return a blank MD5 for a recently uploaded file. My guess is this is because it is pending in some server-side async queue for processing for a short while after uploading (just a guess -- could be wrong). If I'm right, it seems like there's a possible (but unlikely) scenario where a file is corrupted on transfer but not detected immediately because the hash is blank. A subsequent cryptcheck would probably spot this, as I've not yet seen a hash that stays blank forever (but that does require keeping a copy of the original file after uploading). It would also probably be spotted on download (but what if that's 10 years from now...). It seems to me that there probably ought to at least be an INFO log (if not ERROR) for unexpectedly blank checksums... it's actually yet another project I started tinkering with but then decided to spare you from for the time being (as I know the last thing you need right now is more PRs from me!)
Also, to be clear -- this has never actually happened to me, it's purely tinfoil-hat speculation on my part
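To make the blank-hash idea a bit more concrete, here's a rough sketch of the behavior I have in mind. This is NOT rclone's actual code: HashStatus, compareHashes, describe, and the blankExpected flag are all names I made up for illustration, and it assumes both hashes arrive as lowercase hex strings.

```go
package main

import "fmt"

// HashStatus is a hypothetical result type for this sketch (not an rclone type).
type HashStatus int

const (
	HashMatch             HashStatus = iota // both hashes present and equal
	HashMismatch                            // both hashes present and different
	HashSkippedExpected                     // blank hash, and a blank was expected
	HashSkippedUnexpected                   // blank hash, but one should have existed
)

// compareHashes compares two hex hashes. blankExpected is an assumed flag
// saying the caller knows this object legitimately has no hash
// (for example, a Google Doc).
func compareHashes(src, dst string, blankExpected bool) HashStatus {
	if src == "" || dst == "" {
		if blankExpected {
			return HashSkippedExpected
		}
		return HashSkippedUnexpected
	}
	if src == dst {
		return HashMatch
	}
	return HashMismatch
}

// describe renders the log line this sketch would emit for each status.
func describe(name string, s HashStatus) string {
	switch s {
	case HashMatch:
		return fmt.Sprintf("DEBUG: %s: hashes match", name)
	case HashMismatch:
		return fmt.Sprintf("ERROR: %s: hashes differ", name)
	case HashSkippedExpected:
		return fmt.Sprintf("DEBUG: %s: no hash available, skipping check", name)
	default:
		return fmt.Sprintf("INFO: %s: hash unexpectedly blank, transfer not verified", name)
	}
}

func main() {
	fmt.Println(describe("notes.gdoc", compareHashes("", "", true)))
	fmt.Println(describe("photo.jpg", compareHashes("0cc175b9c0f1b6a8", "", false)))
	fmt.Println(describe("video.mp4", compareHashes("abc123", "abc123", false)))
}
```

The only real change from "silently skip" is the fourth status: the comparison still can't fail on a blank hash, but the unexpected case gets surfaced instead of looking identical to the Google Docs case.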
This was also a fascinating read. FWIW, I'd vote for putting the metadata in the file header (not a sidecar), and including BOTH versions of the hash (the decrypted original and the encrypted one used by cryptcheck.*) That way rclone can easily read the original back when needed, while also raising an error if the hash reported by the remote does not match our expected value for the other one. This assumes that nothing but rclone can edit the file -- but since this is crypt, that is kind of already the case.
(*I realize one potential problem with this is that the file would need to somehow contain its own checksum...I'm not sure how possible that is, at least without crazy amounts of computing power!)
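Here's a toy illustration of that footnote, in case it helps. It is purely a sketch and has nothing to do with crypt's real file format: the one-line "hash=" header and the helpers sumHex, withHeader, and splitHeader are invented for the example, and MD5 is used only because it's the hash mentioned above. The point is that as soon as a hash is written into the file, the whole-file hash (which is what a remote would report) changes, so a header can only cover the bytes that come after it.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// sumHex returns the hex-encoded MD5 of data.
func sumHex(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// withHeader prepends a hypothetical one-line header carrying a hash.
func withHeader(hash string, body []byte) []byte {
	return append([]byte("hash="+hash+"\n"), body...)
}

// splitHeader separates the hypothetical header line from the body.
func splitHeader(file []byte) (storedHash string, body []byte, ok bool) {
	i := bytes.IndexByte(file, '\n')
	if i < 0 || !bytes.HasPrefix(file, []byte("hash=")) {
		return "", nil, false
	}
	return string(file[len("hash="):i]), file[i+1:], true
}

func main() {
	body := []byte("encrypted payload")
	file := withHeader(sumHex(body), body)

	// Embedding the hash changed the file, so the whole-file hash
	// no longer equals the value stored in the header.
	fmt.Println("stored hash equals whole-file hash:", sumHex(file) == sumHex(body))

	// A hash that covers only the bytes after the header still verifies.
	stored, got, ok := splitHeader(file)
	fmt.Println("body verifies against header:", ok && sumHex(got) == stored)
}
```

So the decrypted-original hash fits in a header without trouble, but the "encrypted" hash stored there could only ever describe the payload after the header, not the exact value the remote reports for the whole file.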
The unsupported metadata use case is also quite interesting... it would help with the length limit issue I ran into recently on the xattrs ticket! Is this something you're looking for help with? I'd potentially be interested in working on it.