Check with Md5chksum

fktssm · June 7, 2024, 3:57pm

What is the problem you are having with rclone?

When I run rclone check sour:bucket dest:bucket, some files are not being checked.

The method I use to connect: s3

After investigating, I found that the source returns the MD5 hash in the ETag, while the destination returns it in the X-Amz-Meta-Md5chksum header, base64-encoded. Searching the forum, I found that many people have had the same problem (2020-2023).

I am on rclone 1.66.0. Apparently no release yet handles this case automatically: when the ETag ends in a suffix such as "-30", read the MD5 from X-Amz-Meta-Md5chksum instead and convert it from base64 to hex.
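To make the mismatch concrete: a single-part ETag carries the MD5 as 32 hex characters, while X-Amz-Meta-Md5chksum carries the same 16 bytes base64-encoded, so the two only compare equal after conversion. A minimal Python sketch of that conversion, plus a check for the multipart-style "-N" suffix, is below; the function names are hypothetical helpers, not part of rclone.

```python
import base64
import re

# Multipart ETags look like "<32 hex chars>-<part count>", e.g. "...-30".
# Quotes are optional because S3 listings usually wrap ETags in them.
MULTIPART_ETAG = re.compile(r'^"?[0-9a-fA-F]{32}-\d+"?$')


def md5chksum_to_hex(b64_value: str) -> str:
    """Convert a base64 X-Amz-Meta-Md5chksum value to a hex MD5 string."""
    raw = base64.b64decode(b64_value, validate=True)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes for an MD5, got {len(raw)}")
    return raw.hex()


def hex_to_md5chksum(hex_value: str) -> str:
    """Convert a hex MD5 (e.g. a single-part ETag) to the base64 form."""
    return base64.b64encode(bytes.fromhex(hex_value.strip('"'))).decode("ascii")


def is_multipart_etag(etag: str) -> bool:
    """True if the ETag has the multipart '-N' suffix and is not a plain MD5."""
    return bool(MULTIPART_ETAG.match(etag))


if __name__ == "__main__":
    # MD5 of empty input, shown in both encodings
    print(md5chksum_to_hex("1B2M2Y8AsgTpgAmY7PhCfg=="))  # d41d8cd98f00b204e9800998ecf8427e
    print(is_multipart_etag('"0123456789abcdef0123456789abcdef-30"'))  # True
```

In other words, an ETag that matches the multipart pattern should not be compared directly with an MD5 at all; the base64 metadata value is the one that carries the whole-file hash.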

Is there perhaps a parameter that solves this?
I first contacted the support teams of both the source and the destination providers, but they could not help. I'm afraid this forum is my last hope.

asdffdsa · June 7, 2024, 6:46pm

welcome to the forum,

just some random ideas.
on the dest:

maybe try adding use_multipart_etag to the dest remote config
maybe try taking the output of rclone check --download and feeding that to another rclone command, such as rclone copy
maybe try putting a hasher remote in the middle.

fktssm · June 10, 2024, 9:07am

Thanks for the advice.
use_multipart_etag = true didn't help; "dest" still returns the wrong hash.
rclone check --download is not suitable: it needs too many resources, because the buckets store large objects.
hasher, unfortunately, also returns the wrong hash.

To analyze the MD5, I use the command rclone cat dest:path/file.txt --head 1 -vv --dump headers 2>&1 | grep -v "Excluded". In its output, "X-Amz-Meta-Md5chksum" holds the MD5 in base64 format.
But when I run rclone md5sum dest:path/file.txt --base64 -v 2>&1 | grep -v "Excluded", I get a completely different MD5 (exactly the same one that rclone check uses).

What other commands could I use to track down the problem? I do not understand where this completely different MD5 comes from (I suspect it is computed from a chunk).
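One plausible explanation for the "completely different MD5" is the S3 multipart ETag. For objects uploaded in parts, S3 does not put the whole-file MD5 in the ETag: it takes the MD5 of each part, concatenates the binary digests, hashes that, and appends "-<number of parts>". The Python sketch below reproduces that scheme so you can compare it with what the listing reports. It assumes you know the part size the uploader used (for rclone's S3 backend this follows its chunk size setting, which rclone may raise for very large files), and the function names are hypothetical.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Reproduce an S3-style multipart ETag for data uploaded in part_size chunks.

    Scheme: MD5(MD5(part_1) || MD5(part_2) || ... || MD5(part_n)) + "-n",
    where || is concatenation of the raw 16-byte digests.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def whole_file_md5(data: bytes) -> str:
    """The plain MD5 that rclone check would need to compare against."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    payload = bytes(range(256)) * 10           # 2560 bytes of sample data
    print(multipart_etag(payload, 1000))       # ends in "-3": parts of 1000, 1000, 560 bytes
    print(whole_file_md5(payload))             # a different value entirely
```

Note that even a one-part multipart upload ("-1") yields a value different from the plain MD5, because the digest is hashed a second time.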

fktssm · June 10, 2024, 11:15am

I tried running rclone cat with the --dump bodies option and searching for my file in the output.
It turned out that the <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> response contains an MD5 completely different from the real one.
As far as I understand, this is exactly the value rclone uses when checking, because it matches the output of rclone md5sum and rclone check.
The copying itself was done with rclone copy.

How can I solve this problem?
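When digging through --dump bodies output, it can help to pull the ETags out of the ListBucketResult XML programmatically instead of searching by eye. A minimal Python sketch using the standard library's xml.etree.ElementTree is below; it assumes you have already copied one ListBucketResult body out of the dump into a string, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace used by S3 ListBucketResult responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def etags_from_listing(xml_text: str) -> Dict[str, str]:
    """Map each object key in a ListBucketResult body to its ETag (quotes stripped)."""
    root = ET.fromstring(xml_text)
    result: Dict[str, str] = {}
    for contents in root.findall("s3:Contents", S3_NS):
        key = contents.findtext("s3:Key", default="", namespaces=S3_NS)
        etag = contents.findtext("s3:ETag", default="", namespaces=S3_NS)
        # &quot; entities are decoded by the parser, so strip the literal quotes
        result[key] = etag.strip('"')
    return result


def multipart_keys(listing: Dict[str, str]) -> List[str]:
    """Keys whose ETag carries a '-N' suffix, i.e. not a plain MD5."""
    return sorted(key for key, etag in listing.items() if "-" in etag)
```

Any key returned by multipart_keys is one whose listing ETag cannot be compared directly with a source MD5.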

asdffdsa · June 10, 2024, 1:20pm

--- post the output of rclone config redacted
--- pick a single file that has the issue and post complete debug log for
rclone check sour:bucket/path dest:bucket/path --dump=bodies --include=file.txt

fktssm · June 19, 2024, 11:05am

It seems I found the cause of the error.

According to the rclone documentation page "Amazon S3", when running rclone check on large objects rclone has to read the object headers in an extra step, because the MD5 is not returned in ListBucketResult:

Note that reading this from the object takes an additional HEAD request as the metadata isn't returned in object listings.

I don't see any option that makes rclone check do this (query the "X-Amz-Meta-Md5chksum" header and compare the MD5 after converting it from base64 to hex).

Is it possible to automate this?
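As far as this thread shows, rclone check has no switch for this, but the comparison can be scripted outside rclone. A minimal Python sketch follows, under these assumptions: the source hashes are already available as hex MD5s (for example parsed from rclone md5sum sour:bucket output), the destination ETags come from a bucket listing, and head_metadata is a hypothetical callable you supply that performs the HEAD request and returns the user metadata with the x-amz-meta- prefix stripped (boto3's head_object(...)["Metadata"] has that shape, but verify the md5chksum key name against your endpoint). Each multipart object costs one extra HEAD request, matching the note from the rclone documentation quoted above.

```python
import base64
from typing import Callable, Dict, Optional

# Hypothetical signature: key -> user metadata dict (prefix stripped, lowercase keys)
HeadMetadata = Callable[[str], Dict[str, str]]


def effective_md5(key: str, etag: str, head_metadata: HeadMetadata) -> Optional[str]:
    """Return a hex MD5 for key, or None if it cannot be determined.

    Single-part ETags are used directly; multipart ETags ("...-N") trigger one
    HEAD-style lookup of the base64 md5chksum metadata value.
    """
    etag = etag.strip('"')
    if "-" not in etag:
        return etag.lower()
    b64 = head_metadata(key).get("md5chksum")
    if not b64:
        return None
    return base64.b64decode(b64).hex()


def compare(src_md5: Dict[str, str], dst_etags: Dict[str, str],
            head_metadata: HeadMetadata) -> Dict[str, str]:
    """Return {key: status}, status being 'ok', 'differ', 'missing' or 'unknown'."""
    report: Dict[str, str] = {}
    for key, want in src_md5.items():
        if key not in dst_etags:
            report[key] = "missing"
            continue
        got = effective_md5(key, dst_etags[key], head_metadata)
        if got is None:
            report[key] = "unknown"  # multipart ETag and no md5chksum metadata
        else:
            report[key] = "ok" if got == want.lower() else "differ"
    return report
```

The "unknown" status matters: an object uploaded in parts without md5chksum metadata genuinely has no whole-file MD5 stored, so only a download-based check could verify it.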

system · July 19, 2024, 11:05am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.