Minio to s3 MD5 Mismatch [minio compression]

Minio server running on Windows.


I’m trying to clone a Minio source to an AWS S3 destination:

rclone sync my_minio_source: my_aws_destination:

This was working perfectly until the minio server I am trying to copy from turned on compression.
I have reproduced the error on my own minio server to confirm that the compression is what triggered this error.

The error seems to be caused by AWS calculating an MD5 checksum that differs from the one rclone tells it to expect.

What is the best way to get around this?
I’ve tried the --s3-disable-checksum flag, which made no difference.

Minio compression is a bit funny in that the files are compressed on disk but when you download them you get the uncompressed version. My suspicion is that this is where the mismatch comes from.

ERROR : site-1v2-phase-a0-traffic-movements.geojson: Failed to copy: s3 upload: 400 Bad Request:

<?xml version="1.0" encoding="UTF-8"?>
  <Message>The Content-MD5 you specified did not match what we received.</Message> 

I have also run this to confirm the MD5 checksum, where “himas-test-bucket” contains only the uncompressed file:
rclone md5sum local-minio:/himas-test-bucket/

which returned

c6a9a32896b1b0708b76df1823d21ae3 site-1v2-phase-a0-traffic-movements.geojson

This checksum, c6a9a32896b1b0708b76df1823d21ae3, is xqmjKJaxsHCLdt8YI9Ia4w== when you convert it from hex to base64.
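For anyone who wants to repeat the hex to base64 conversion, here is a minimal Python sketch. The function name hex_md5_to_content_md5 is made up for this illustration and is not part of rclone or any S3 SDK; it only re-encodes the 16 raw digest bytes.

```python
import base64
import binascii


def hex_md5_to_content_md5(hex_digest: str) -> str:
    """Re-encode a hex MD5 digest (as printed by md5sum-style tools) as base64."""
    raw = binascii.unhexlify(hex_digest)  # 32 hex characters -> 16 raw bytes
    if len(raw) != 16:
        raise ValueError("an MD5 digest is exactly 16 bytes")
    return base64.b64encode(raw).decode("ascii")


print(hex_md5_to_content_md5("c6a9a32896b1b0708b76df1823d21ae3"))
# prints xqmjKJaxsHCLdt8YI9Ia4w==
```

Both strings encode the same 16 bytes, so the conversion says nothing about which version of the file (compressed or uncompressed) was hashed.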

This is minio returning the checksum of the compressed file with Content-Encoding: gzip or similar. There are some issues about this sort of thing, but I haven’t found a satisfactory solution.

What is happening here is that S3 needs a hash before doing the upload, so rclone asks the source to provide one. If the source can provide one, rclone uses it rather than calculating it again.
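The mismatch described above can be reproduced in miniature. This is only a sketch under stated assumptions: zlib stands in for whatever compression the server applies, and destination_accepts is a hypothetical stand-in for the destination comparing the declared hash against the body it received. It is not rclone's or S3's actual code.

```python
import base64
import hashlib
import zlib


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 of a byte string."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def destination_accepts(body: bytes, claimed_content_md5: str) -> bool:
    """Hypothetical destination check: hash the received body, compare to the claim."""
    return content_md5(body) == claimed_content_md5


# The object as the client sees it when downloading (uncompressed).
original = b'{"type": "FeatureCollection", "features": []}' * 100

# Assumption: the source stores compressed bytes and reports their hash.
stored_on_disk = zlib.compress(original)
hash_reported_by_source = content_md5(stored_on_disk)

# The uploader reuses the reported hash but sends the uncompressed body.
print(destination_accepts(original, hash_reported_by_source))  # False
# Hashing the bytes that are actually sent gives a matching claim.
print(destination_accepts(original, content_md5(original)))  # True
```

The first call fails for the same reason as the 400 Bad Request in the log: the declared hash describes different bytes from the ones that arrive.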

You could probably fix this by setting --s3-upload-cutoff 0, which makes all files be uploaded with multipart uploads, which don’t need the hash in advance.

Hey ncw, thanks for looking into this.

That flag does seem to work for some files but not others, which now fail with a similar but different error:

2019/05/08 09:24:51 ERROR : site-1v2-shape-all (3).geojson: corrupted on transfer: MD5 hash differ "b8ee42ceb74a740554613e3f5201564e" vs "2366cc1c6cba7429ca07a54d21588b32"

(this is a different file)

I did however find that minio provides a client executable that can do mirroring between minio and s3 which I might just have to use instead.

mc.exe mirror minio/ s3/

Just an update on this: when I use --s3-upload-cutoff 0, the files that fail seem to be those smaller than some size x. When I use --s3-upload-cutoff 0 --ignore-checksum, all files are copied. I don’t know whether this has side effects I am unaware of, but it seems to have done the trick.
Again, thanks for your help ncw

The original error was S3 checking the checksums; this one is rclone checking the checksums! Using --ignore-checksum will fix this, as you’ve discovered.

Glad you got it working.

I hit the same when trying to move SSE objects from AWS S3 to IBM COS. The solution works, but makes me nervous about data integrity since the flags seem to drop the checking functions. Reading the issues, I see a possible fix is roadmapped, but not getting traction. One comment worried about the extra overhead of the HEAD call, but if it solved data integrity, that overhead is worth it. If I understand it correctly, this solution is also extra overhead as now all objects are multipart.
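On the integrity worry: with the transfer-time check disabled, one option is to verify afterwards by hashing the bytes actually read back from each side, rather than trusting the hash either server reports. A minimal Python sketch follows, assuming you can open both objects as binary streams; the in-memory streams are placeholders, and streaming_md5 and same_content are illustrative names, not rclone functions.

```python
import hashlib
import io
from typing import BinaryIO


def streaming_md5(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hex MD5 of a binary stream, read in chunks so large objects fit in memory."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def same_content(source: BinaryIO, destination: BinaryIO) -> bool:
    """True when both streams yield byte-for-byte identical content."""
    return streaming_md5(source) == streaming_md5(destination)


# Placeholder streams standing in for downloads from the source and destination.
source = io.BytesIO(b"example object body")
copy = io.BytesIO(b"example object body")
print(same_content(source, copy))  # True
```

This costs a full read of every object on both sides, so it trades bandwidth for an end-to-end check.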

Which issue is that? Would you like to help with it?

Uploading everything with multipart adds another request or two of overhead, so it isn't massive, but it could be noticeable.

I'd love to have some help with this - there aren't enough hours in the day to keep up with everything at the moment!
