I'm told that after a transfer completes, rclone automatically checks the resulting file to make sure the transfer was successful. However, some storage systems don't support checksums, in which case rclone only checks the size, and it does not currently warn the user when that happens. In other cases, the source and target storage systems aren't compatible enough for rclone to perform a checksum comparison on the resulting file.
In these cases, it would be good to emit a warning to let the user know that the file may not be identical.
I was told that checksum verification isn't possible with some storage systems, hence the suggestion to emit a warning when rclone cannot do it.
To explain how I got to the point of suggesting this, I do know that rclone runs checks on existing files when starting a transfer. So, looking at this, I thought to myself, "does it also do this same check operation after a transfer is completed, to make sure that the file does match exactly, or do I need to run a separate command after this finishes to make sure it all copied over correctly?"
Hey @ncw thanks, yeah I think just at INFO or DEBUG would be perfect.
I think that in the future even crypt would benefit from this info, because it's very useful to know that the data is fully verified with checksums, or at least to be aware when a particular combination or incompatibility doesn't provide the safety net of checksums.
It would be something in the same category as the warning issued by some software asking for confirmation before doing something potentially dangerous.
The number of supported remotes is growing, and with it the risk of confusion or of losing track of which ones support checksums.
Moving files while relying only on file size is dangerous, because corruption can go undetected. A courtesy warning can go a long way toward making people aware of this, and may also prompt them to switch to a remote that supports checksums.
A slightly-related suggestion: if a remote doesn't support checksums, and the user has supplied the --checksum flag, the fallback should be to the default of modtime and size, not just size. My rationale: if a user has requested a higher degree of accuracy than the default, it seems odd to give them a lower one instead. The name --checksum implies that it's more about the "I care about the integrity of this data" part, and less about the "I don't want to check modtimes" part. (There's already a --size-only flag for exactly that.)
Similarly, perhaps rclone check should fall back to rclone cryptcheck for crypt remotes, instead of rclone check --size-only.
Just my two cents! (I realize it would be a breaking change, and that this thread is more about checking for corruption post-transfer than comparing files pre-transfer.)
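To make the suggested fallback concrete, here is a minimal Python sketch of the decision as described in this comment. It is only a model of the behaviour being discussed, not rclone's actual code: the function names, parameters, and mode labels are all hypothetical.

```python
def described_mode(checksum_flag, hashes_in_common):
    """Comparison mode as described in this thread (hypothetical model)."""
    if not checksum_flag:
        return "modtime+size"   # the default comparison
    if hashes_in_common:
        return "checksum+size"  # --checksum works as requested
    return "size"               # fallback described above: size only


def proposed_mode(checksum_flag, hashes_in_common):
    """Comparison mode under the suggestion in this comment (hypothetical)."""
    if checksum_flag and hashes_in_common:
        return "checksum+size"
    # Without a usable hash, fall back to the default rather than size only.
    return "modtime+size"


if __name__ == "__main__":
    # Print both models side by side for every flag/hash combination.
    for flag in (False, True):
        for common in (False, True):
            print(flag, common, described_mode(flag, common), proposed_mode(flag, common))
```

The two functions differ only in the case where --checksum is given but no hash is available, which is exactly the case this suggestion is about.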
I'm not sure if I'd go that far... there is something nice about being able to just append -c to every command in the hopes of a "best available" option, without having to think each time about whether the backend supports it. (Similarly, I usually also include -M, even though very few backends support metadata.)
A "best available" mode also makes it easier to script automations and reuse the same commands across different remotes, some of which may support hashes and some not. You might otherwise need to script an additional call to rclone backend features first and try to parse the output -- possible, but a lot of effort for the casual user.
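For anyone who does want to script that check, a rough Python sketch follows. It assumes that rclone backend features prints a JSON object with a top-level "Hashes" list; please verify that against your rclone version. The helper names and the sample JSON strings are made up for illustration and are not real rclone output.

```python
import json
import subprocess


def fetch_features(remote):
    """Run `rclone backend features <remote>` and return its raw output."""
    result = subprocess.run(
        ["rclone", "backend", "features", remote],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def supported_hashes(features_json):
    """Extract hash names; assumes a top-level "Hashes" list in the JSON."""
    info = json.loads(features_json)
    # A missing or null "Hashes" entry is treated as "no hashes supported".
    return set(info.get("Hashes") or [])


def common_hashes(src_json, dst_json):
    """Hash types that both remotes report, i.e. usable for comparison."""
    return supported_hashes(src_json) & supported_hashes(dst_json)


def checksum_warning(src_json, dst_json):
    """Return a warning string if no hash is shared, else None."""
    if common_hashes(src_json, dst_json):
        return None
    return "warning: no common hash type, transfers can only be verified by size"


if __name__ == "__main__":
    # Illustrative sample data only, not captured from a real remote.
    src = '{"Name": "src", "Hashes": ["md5", "sha1"]}'
    dst = '{"Name": "dst", "Hashes": []}'
    print(checksum_warning(src, dst))
```

In a real script you would replace the sample strings with fetch_features("src:") and fetch_features("dst:"), which is the extra step a "best available" mode would save the casual user.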