Emit warning if target file system does not support checksums, or

eagerbeaver · September 17, 2023, 6:33pm

Following Jojo's suggestion I'm making a new thread for this feature request.

I'm told that after a transfer completes, rclone automatically checks the resulting file to make sure the transfer was successful. However, some storage systems don't support checksums, in which case rclone only checks the size, and it does not currently warn the user when that happens. Sometimes, too, a source storage system and a target storage system don't interact well enough to perform a checksum operation on the resulting file.

In these cases, it would be good to emit a warning to let the user know that the file may not be identical.

Animosity022 · September 17, 2023, 6:34pm

I was going to reply in the other thread, but rclone already does checking on transfers.

I'm not sure why you think/feel it doesn't.

Depending on the backend, it'll use checksums and fall back to other methods if the backend doesn't support a checksum.

Do you have a specific backend/example or something that isn't working like you'd expect? If so, please share.

eagerbeaver · September 17, 2023, 6:35pm

I was told that this isn't possible with some storage systems, hence the suggestion to emit a warning if it cannot.

To explain how I got to the point of suggesting this, I do know that rclone runs checks on existing files when starting a transfer. So, looking at this, I thought to myself, "does it also do this same check operation after a transfer is completed, to make sure that the file does match exactly, or do I need to run a separate command after this finishes to make sure it all copied over correctly?"

eagerbeaver · September 17, 2023, 6:43pm

I've updated the OP to try and clarify what I meant by this feature request.

Animosity022 · September 17, 2023, 6:45pm

Again, any examples would be great as I think what you are asking is already there.

eagerbeaver · September 17, 2023, 6:53pm

Jojo seemed to imply that it wasn't and wanted me to make a thread for it, sorry.

I guess you can just delete this.

dia3olik · September 17, 2023, 7:20pm

I think he meant having a clearer warning when the local/remote combo doesn't support checksums and rclone falls back on size comparison only, so the user is aware of the potential risk of undetected bitrot/corruption.

I'd add that, in that case, it would also be very handy to have a report at the end of the rclone job like this:

"WARNING: xx files of xx were NOT verified using checksum."

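A minimal sketch of how such an end-of-job report could be tallied. This is not rclone code: the `TransferStats` class and its method names are hypothetical, and only the wording of the message follows the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class TransferStats:
    """Hypothetical per-job counters; not part of rclone."""
    total: int = 0
    unverified: int = 0

    def record(self, verified_by_hash: bool) -> None:
        # Count every transferred file, and separately those that
        # could only be checked by size (no usable checksum).
        self.total += 1
        if not verified_by_hash:
            self.unverified += 1

    def summary(self) -> str:
        # Stay silent when every file was verified with a checksum.
        if self.unverified == 0:
            return ""
        return (f"WARNING: {self.unverified} files of {self.total} "
                "were NOT verified using checksum.")


if __name__ == "__main__":
    stats = TransferStats()
    for verified in (True, False, False):
        stats.record(verified)
    print(stats.summary())
```

Returning an empty string when nothing was skipped keeps the report quiet for jobs where every file was hash-verified.
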
Animosity022 · September 17, 2023, 7:23pm

Can you share an example of that scenario?

ncw · September 18, 2023, 9:19am

This would happen when transferring to/from any of the backends listed in the overview which don't have hashes

Name	Hash
Enterprise File Fabric	-
FTP	-
Google Photos	-
HDFS	-
HTTP	-
Mega	-
premiumize.me	-
Quatrix by Maytech	-
Seafile	-
SFTP	MD5, SHA1 ²
Sia	-
SMB	-
SugarSync	-
Storj	-
Uptobox	-
Zoho WorkDrive	-

Note that not all SFTP servers support hashes so I left that on the list.

It would also happen when transferring between systems with incompatible hashes, say from S3 to OneDrive. S3 supports MD5 but OneDrive supports QuickXorHash.

The easiest thing to do would be to write a single warning, something like

NOTICE Note that transfers from X to Y are not verified with a hash

For each combination of X and Y that rclone comes across.
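As a rough sketch of that idea (not rclone's actual implementation): a transfer can only be hash-verified when source and destination share at least one hash type, and the notice is emitted once per (X, Y) pair. The `BACKEND_HASHES` table is a hypothetical, hand-written subset based on the examples in this thread, and the function names are made up for illustration. Python's `logging` has no NOTICE level, so the sketch uses `warning`.

```python
import logging

# Hypothetical, hand-written subset of hash support per backend,
# based only on the examples mentioned in this thread.
BACKEND_HASHES = {
    "s3": {"md5"},
    "onedrive": {"quickxor"},
    "ftp": set(),
    "sftp": {"md5", "sha1"},  # assumes a server that supports hashes
}

# (source, destination) pairs that have already produced a notice.
_already_noticed = set()


def common_hashes(src, dst):
    """Hash types supported by both ends of the transfer."""
    return BACKEND_HASHES[src] & BACKEND_HASHES[dst]


def notice_if_unverified(src, dst, log=logging.getLogger("transfer")):
    """Log one notice per (src, dst) pair; return True if one was emitted."""
    if common_hashes(src, dst):
        return False  # a shared hash exists, so transfers can be verified
    if (src, dst) in _already_noticed:
        return False  # already told the user about this combination
    _already_noticed.add((src, dst))
    log.warning(
        "NOTICE Note that transfers from %s to %s are not verified with a hash",
        src, dst,
    )
    return True
```

With this table, S3 to OneDrive produces a notice on the first call only, while SFTP to S3 produces none because both sides offer MD5.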

For crypt this is slightly complicated as the backend itself verifies the checksums if it can so that would need a bit of special logic.

Would this be too noisy? I guess we could do it at a lower log level, INFO or DEBUG.

dia3olik · September 18, 2023, 3:06pm

Hey @ncw thanks, yeah I think just at INFO or DEBUG would be perfect.

I think that in the future even crypt would benefit from this info because it's very useful to be sure the data is 100% checked with checksums or at least being aware if a particular combo or incompatibility doesn't provide the required safety net of checksums.

It would be a single warning but VERY useful imho

Animosity022 · September 18, 2023, 3:17pm

Notice is probably the best, but not sure anyone would do anything with it anyway.

What would you do with said message though as a result? Still trying to figure out the use.

dia3olik · September 18, 2023, 4:04pm

It would be something in the same category as the warning issued by some software asking for confirmation before doing something potentially dangerous.

The number of supported remotes is growing, and with it the potential for confusion, or for losing track of which ones support checksums.

Moving files while relying only on file size is dangerous and can lead to undetected data corruption. A courtesy warning can go a long way in making people aware of this, and could also prompt them to switch to a remote that supports checksums.

Animosity022 · September 18, 2023, 4:21pm

I think you misunderstood/missed my question.

The warning pops up - what do you do?

I'd question the wording too, as nothing is 'dangerous' about copying to a remote that doesn't have a checksum, since other checks still happen. The checksum is a nice added benefit on top of things.

I'm trying to get a real life example of a combination that could occur, warning pops up, what do you do?

nielash · September 18, 2023, 10:36pm

A slightly-related suggestion: if a remote doesn't support checksums, and the user has supplied the --checksum flag, the fallback should be to the default of modtime and size, not just size. My rationale: if a user has requested a higher degree of accuracy than the default, it seems odd to give them a lower one instead. The name --checksum implies that it's more about the "I care about the integrity of this data" part, and less about the "I don't want to check modtimes" part. (There's already a --size-only flag for exactly that.)
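To make the difference concrete, here is a small sketch of the decision being discussed. It is not rclone's code: `choose_comparison` and its `fallback` parameter are hypothetical, and the "size only" fallback reflects the behaviour as described in this thread.

```python
def choose_comparison(checksum_flag, size_only_flag, common_hash_available,
                      fallback="size"):
    """Return the file attributes that would be compared.

    fallback="size"    models the behaviour described in this thread.
    fallback="modtime" models the proposed change.
    All names here are hypothetical, for illustration only.
    """
    if size_only_flag:
        return ("size",)
    if checksum_flag:
        if common_hash_available:
            return ("size", "hash")
        if fallback == "modtime":
            return ("size", "modtime")  # proposed: same as the default
        return ("size",)                # described: weaker than the default
    return ("size", "modtime")          # default without any flag


if __name__ == "__main__":
    # --checksum on a pair of remotes with no hash in common
    print(choose_comparison(True, False, False))
    print(choose_comparison(True, False, False, fallback="modtime"))
```

The two calls differ only in the fallback: the first compares size alone, the second compares size and modtime, which matches what a user would get without `--checksum`.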

Similarly, perhaps rclone check should fall back to rclone cryptcheck for crypt remotes, instead of rclone check --size-only.

Just my two cents! (I realize it would be a breaking change, and that this thread is more about checking for corruption post-transfer than comparing files pre-transfer.)

dia3olik · September 18, 2023, 10:51pm

Nice idea @nielash !

But probably the best approach would be for rclone to just stop and exit issuing an error if the --checksum option is used but not supported by remotes, no?

nielash · September 18, 2023, 11:21pm

I'm not sure if I'd go that far... there is something nice about being able to just append -c to every command in the hopes of a "best available" option, without having to think each time about whether the backend supports it. (Similarly, I usually also include -M, even though very few backends support metadata.)

A "best available" mode also makes it easier to script automations and reuse the same commands across different remotes, some of which may support hashes and some not. You might otherwise need to script an additional call to rclone backend features first and try to parse the output -- possible, but a lot of effort for the casual user.
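For reference, that extra scripting step might look roughly like the sketch below. It assumes `rclone backend features remote:` prints a JSON object containing a `Hashes` list; that output shape is an assumption here and should be checked against your rclone version. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Set


def parse_hashes(features_json: str) -> Set[str]:
    """Extract the hash names from a features JSON document.

    Assumes a top-level "Hashes" key holding a list of names;
    a missing or null key is treated as "no hashes".
    """
    data = json.loads(features_json)
    return {name.lower() for name in (data.get("Hashes") or [])}


def remote_features_json(remote: str) -> str:
    """Run rclone and return its raw output (requires rclone on PATH)."""
    result = subprocess.run(
        ["rclone", "backend", "features", remote],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def can_verify(src_json: str, dst_json: str) -> bool:
    """True if both remotes report at least one hash type in common."""
    return bool(parse_hashes(src_json) & parse_hashes(dst_json))
```

A script could call `can_verify(remote_features_json("src:"), remote_features_json("dst:"))` before deciding whether to add `-c`, where `src:` and `dst:` are placeholder remote names.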

system · November 17, 2023, 11:22pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.