But you don't use Azure, do you? Large files are handled differently per backend...
And also, I don't think regular transfer chunking is the same as parting (splitting done to overcome inherent backend limitations).
Rclone already does transfer chunking but I think that is basically like a "resume" that writes to the same destination. In that case the server should be able to make a hash.
I think "parting" (but also called chunking depending on where you read) means you actually save to multiple files to overcome a max-size limit. In that case the server often/always(?) can't make a hash for the entire thing.
This new backend does that I believe: https://tip.rclone.org/chunker/
Besides, even if you can't get a hash of the entire file when it needs to be split, you can still hash each part and make sure each one arrives unmolested. That should equally guarantee that the full file is correct. Whether that logic is implemented in that backend I don't know, but it would make sense if it were; I don't see any big obstacle to doing it.
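Just to illustrate the logic (a purely conceptual Python sketch, not actual chunker-backend code): if every part verifies against its own hash, then reassembling the verified parts in order has to reproduce the original file exactly.

```python
import hashlib

def split_and_hash(data: bytes, part_size: int):
    """Split data into fixed-size parts and record an MD5 for each part."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    return [(part, hashlib.md5(part).hexdigest()) for part in parts]

def verify_parts(parts_with_hashes) -> bool:
    """If every part matches its recorded hash, the whole must be intact too."""
    return all(hashlib.md5(part).hexdigest() == expected
               for part, expected in parts_with_hashes)

data = b"pretend this is a huge file " * 1000
parts = split_and_hash(data, part_size=4096)
assert verify_parts(parts)
assert b"".join(part for part, _ in parts) == data  # reassembly gives back the original bytes
```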
Take all of that with a grain of salt, and you may have to get NCW to verify. My understanding is not complete on this topic.
Finally, don't be so paranoid. The transport layer has basic error detection already. The odds of in-flight corruption that can't be detected except by a full hash are pretty darn low from a mathematical/statistical perspective. Hashes have many uses, but I wouldn't say they are required for stable and error-free transfers, not unless it's truly mission-critical data.
I'm pretty sure this is what normally happens for all "normal" files, i.e. the ones that do not need special handling due to size limitations. It's the special cases where hashing becomes a little trickier.
On Gdrive, which I use, the max file size is very large, so all those hashes should be generated by the server (presumably calculated in a rolling fashion as the data is received). This is generally how filesystems that include hashing metadata work, and that's basically what cloud backends use. (It's not common in end-user systems, but you could run one even on a Windows system if you were willing to use a different filesystem.)
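Conceptually something like this (an illustrative Python sketch of what I mean by "rolling", not anything Gdrive actually runs): the receiving side feeds each chunk into the hash as it arrives, so it never needs the whole file at once to produce the final MD5.

```python
import hashlib

def md5_of_stream(chunks) -> str:
    """Update the MD5 incrementally as each chunk arrives."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def fake_upload(data: bytes, chunk_size: int = 64 * 1024):
    """Simulate a file arriving over the wire in small chunks."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

data = b"x" * (1 << 20)
assert md5_of_stream(fake_upload(data)) == hashlib.md5(data).hexdigest()
```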
Once the file is uploaded, the other side calculates an MD5 of the file and stores it.
It then compares the local MD5 to the remote MD5 on the provider, so if they match, it's the same file.
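The comparison itself is trivial; in rough Python terms it amounts to this (remote_md5 stands in for whatever hash string the provider reports, and the function names are made up for illustration):

```python
import hashlib

def local_md5(path: str) -> str:
    """Hash a local file in chunks so large files never have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def same_file(path: str, remote_md5: str) -> bool:
    """If our locally computed hash equals the server's hash, the copies are bit-identical."""
    return local_md5(path) == remote_md5.lower()
```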
If you read a bit further down, you can use --download and compare the file data all the way through.
If you supply the --download flag, it will download the data from both remotes and check them against each other on the fly. This can be useful for remotes that don’t support hashes or if you really want to check all the data.
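That on-the-fly check is conceptually just reading both copies side by side and comparing as you go, something like this (illustrative only; it assumes plain file-like objects where read() returns full chunks until EOF):

```python
import io

def streams_identical(a, b, chunk_size: int = 1 << 20) -> bool:
    """Read two streams in lockstep and compare them chunk by chunk."""
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a != chunk_b:
            return False
        if not chunk_a:  # both streams ended at the same point
            return True

assert streams_identical(io.BytesIO(b"abc" * 1000), io.BytesIO(b"abc" * 1000))
assert not streams_identical(io.BytesIO(b"abc"), io.BytesIO(b"abd"))
```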
But as per this: "MD5 sums are only uploaded with chunked files if the source has an MD5 sum. This will always be the case for a local to azure copy."
To me, that means rclone takes the MD5 of the local file and stores it as metadata on the remote file.
We could imagine a bug in rclone that miscalculates the MD5 of the local file and stores that as the MD5 metadata of the remote file; in that case, we have no way to know the MD5 of the real remote file.
But this is a specific exception for Azure, as I noted, not the general rule. It's specifically listed under the "limitations" section for the Azure backend. You're on Wasabi, aren't you, so is this even relevant to you?
As both Ani and I have already said, the normal way is that the hash for the transferred file is generated on the server side. You then compare it to the local-side hash. If they match, the files must be bit-identical.
EDIT: If the question is specific to Azure (which hasn't really been clear), then I can't really tell you any more than what the documentation states. The phrasing would seem to indicate that it does copy the metadata from the local side, presumably due to some technical limitation of Azure.
Any other backend that does not specifically list it as a limitation... i.e. most of them.
The limitation may be due to it being blob-type storage. Do we have any more of those in the backend list? It may be worth checking whether that is a pattern.
And of course, Azure will do this too for non-chunked files. The limitation is specific to chunking, according to the docs.
This. AKA server-side.
I'm not sure why you refuse to trust me on this: this is how it's normally done UNLESS there are specific limitations and workarounds needed (typically to work around some maximum file-size restriction in either the filesystem or the backend servers).
I guess you have to either trust that the backend implementers will note such limitations when they exist, or do some research of your own and verify that the systems a given candidate provider uses can support a true max file size (i.e. without parting) above what you realistically require. If they do, they should never need any workarounds related to this.