How do we really know that an uploaded file is the same as the local file?

I never thought such a basic and important question would not have a clear answer.

I am not refusing to trust you, and I mean no disrespect, but you have not given me anything to trust.
And you did write:
Take all of that with a grain of salt and you may have to get NCW to verify. My understanding is not complete on this topic.

I would just like to know when rclone is simply copying the local MD5 into remote metadata and when the cloud provider is calculating the checksum itself.

Thanks

That was mostly in relation to the chunker backend and the technical difference between multipart upload parts and chunking as part of a transfer.

For the main (repeated) question here, I don't know how to be clearer or more explicit. It seems like you won't be satisfied unless you get hold of whoever coded the Azure backend. In that case, I can't help you further :frowning:

@ncw,

Is there a way for rclone to tell us whether it is copying the local MD5 into remote metadata or relying on the cloud provider to calculate the MD5 after upload?

When running rclone check, does rclone always use the metadata it copied from the local file, or does it use the MD5 calculated by the provider AFTER the file has been uploaded?

Thanks

I am sorry, but I think I am asking a really basic question about how rclone works.

I think the question here is: what answer are you willing to accept?
Does it need to be formal documentation? If so, I'd just go and look at the code. Everything is there on GitHub. If rclone never sets the hash, then it isn't transferring one either.

This depends on the provider. I think cloud providers generally compute checksums on upload and store them alongside the data. There are also periodic processes that re-verify those checksums (I'm reasonably sure I read this about S3).
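The post above is only "reasonably sure" about such background verification, so here is a toy model of the idea rather than any provider's real implementation: the checksum computed at write time is kept next to the data, and a periodic scrub re-hashes the stored bytes and flags anything that no longer matches. The names `storedObject`, `put` and `scrub` are hypothetical, and MD5 is used only because it is the hash this thread is about; real providers' internal checksums and schedules are not described here.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
)

// storedObject is a hypothetical model of what a provider keeps per object:
// the data plus the checksum it computed when the object was written.
type storedObject struct {
	data []byte
	md5  [md5.Size]byte
}

// put stores data and records its checksum at write time.
func put(store map[string]storedObject, key string, data []byte) {
	store[key] = storedObject{data: data, md5: md5.Sum(data)}
}

// scrub re-hashes every stored object and returns, in sorted order, the keys
// whose data no longer matches the checksum recorded at upload.
func scrub(store map[string]storedObject) []string {
	var bad []string
	for key, obj := range store {
		if md5.Sum(obj.data) != obj.md5 {
			bad = append(bad, key)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	store := map[string]storedObject{}
	put(store, "a.txt", []byte("alpha"))
	put(store, "b.txt", []byte("beta"))

	// Simulate silent corruption (bit rot) of b.txt after it was stored.
	store["b.txt"].data[0] ^= 0xff

	fmt.Println("corrupted objects:", scrub(store))
}
```

The key point is that the stored checksum is only useful because something independently recomputes it from the data later.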

For example, azureblob provides MD5 sums, calculated by Azure itself, for all non-chunked files. For chunked files it doesn't provide an MD5 sum, so we have to upload one as metadata. However, you might be comforted by this note from the source:

	// Compute the Content-MD5 of the file, for multiparts uploads it
	// will be set in PutBlockList API call using the 'x-ms-blob-content-md5' header
	// Note: If multipart, a MD5 checksum will also be computed for each uploaded block
	// in order to validate its integrity during transport

If you want to know more, check out the Azure Blob Storage docs and look for the Content-MD5 information.

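Data integrity is really important to rclone. rclone check does a pretty good job, but if you are paranoid about your data, then rclone check --download is the best check: it downloads the remote files and compares their contents with the local ones instead of relying on stored hashes.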

If you upload a file with a Content-MD5, Azure will check that the MD5 is correct when the file arrives.

Azure then keeps that Content-MD5 and returns it to you when you ask. If you don't supply one, Azure calculates it during upload.

I believe all the providers that support checksums either calculate them themselves or verify them during upload; the checksum isn't just stored as metadata.

There are a few exceptions I can think of. For large S3 files, the MD5 is stored as metadata; however, the file chunks are all SHA-256 protected, so if rclone gets a successful upload, we can be confident that the data arrived intact.
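Here is a sketch of why per-chunk SHA-256 protection is enough, assuming each part is sent with a SHA-256 of its payload (as AWS Signature Version 4 does with the x-amz-content-sha256 header) and the server rejects any part whose received bytes hash differently. It is a simulation, not rclone's or S3's code: `uploadPart`, `uploadMultipart`, the `transmit` hook that simulates the network, the part size and the metadata key name are all hypothetical.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
)

// uploadPart models the server accepting one part: it re-hashes the bytes it
// received and rejects the part if they differ from the declared SHA-256.
func uploadPart(received []byte, declaredSHA256 string) error {
	sum := sha256.Sum256(received)
	if hex.EncodeToString(sum[:]) != declaredSHA256 {
		return errors.New("part rejected: payload SHA-256 mismatch")
	}
	return nil
}

// uploadMultipart sends data in parts through transmit, which simulates the
// network and may corrupt bytes. It returns nil only if every part verified.
func uploadMultipart(data []byte, partSize int, transmit func([]byte) []byte) error {
	for start := 0; start < len(data); start += partSize {
		end := start + partSize
		if end > len(data) {
			end = len(data)
		}
		part := data[start:end]
		sum := sha256.Sum256(part) // hash computed on the sending side
		if err := uploadPart(transmit(part), hex.EncodeToString(sum[:])); err != nil {
			return fmt.Errorf("part %d: %w", start/partSize, err)
		}
	}
	return nil
}

func main() {
	data := []byte("pretend this is a multi-gigabyte file")

	noCorruption := func(b []byte) []byte { return b }
	if err := uploadMultipart(data, 8, noCorruption); err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	// Only after every part verified is the whole-file MD5 recorded as
	// metadata (under a hypothetical key name).
	sum := md5.Sum(data)
	meta := map[string]string{"whole-file-md5": base64.StdEncoding.EncodeToString(sum[:])}
	fmt.Println("upload ok, metadata:", meta)

	flipFirstByte := func(b []byte) []byte {
		c := append([]byte(nil), b...)
		c[0] ^= 1
		return c
	}
	fmt.Println("corrupted upload:", uploadMultipart(data, 8, flipFirstByte))
}
```

So even though the whole-file MD5 is "just metadata" here, it was computed from data whose every part was verified on arrival.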

Thank you very much.
