Yandex.Cloud (S3 API), unexpected MD5 hash mismatch (talk with tech support)

Thank you, @ncw, for sharing your thoughts and ideas on this complicated issue.

Your broad view of the data integrity question gave me a good starting point for creating a bullet-proof test case and finding out which of the two guns is still smoking.

This was the very first step I took to check whether my backup data was corrupted: the hashes of the local and downloaded files turned out to be the same. That was a clear sign that, most likely, only the metadata in the cloud (the ETag) contains an error.

Below is a brief description of the test scenario I carried out.


Prerequisites

  • Focus on a single file, VTS_01_1.VOB, stored on a FreeBSD NAS.
  • Create a twin copy, VTS_01_1dp.VOB, and keep it on an MS Windows PC only.

Sequence of steps

1. Calculate local MD5

1st env: FreeBSD NAS
tools: openssl dgst -MD5, rclone md5sum local:
file: VTS_01_1.VOB

2nd env: MS Windows PC
tool: HashTab
file: VTS_01_1dp.VOB

Step result
Identical for both files, in all environments and with all tools.

27ba7feaa6eaffda8b4c51be0375333d  VTS_01_1.VOB
27ba7feaa6eaffda8b4c51be0375333d  VTS_01_1dp.VOB
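For reference, the value that openssl, rclone md5sum and HashTab agree on is a plain MD5 over the whole file content, so it can also be reproduced with Python's standard hashlib module as a fourth independent tool. A minimal sketch; the script name md5file.py, the function name md5_of_file and the 1 MiB chunk size are my own hypothetical choices:

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks (1 MiB by default, an arbitrary choice) keeps memory
    use flat even for multi-gigabyte VOB files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print in the same "<hash>  <name>" layout as md5sum / rclone md5sum.
    for name in sys.argv[1:]:
        print(f"{md5_of_file(name)}  {name}")
```

Usage: python md5file.py VTS_01_1.VOB should print the same line as the other tools.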

2. Upload to Yandex.Cloud

1st env: FreeBSD NAS
tool: rclone copy local: remote:
file: VTS_01_1.VOB

2nd env: MS Windows PC
tool: WinSCP copy
file: VTS_01_1dp.VOB

Step result
Both files were sent to the cloud from independent sources using independent tools.

3. Get cloud MD5 (ETag object property)

1st env: FreeBSD NAS
tool: rclone md5sum remote:
files: VTS_01_1.VOB, VTS_01_1dp.VOB

2nd env: MS Windows PC
tool: aws s3api list-objects
files: VTS_01_1.VOB, VTS_01_1dp.VOB

Step result
Identical for both files, in both environments and with both tools.

36bd5ce679ce97325b9973c6a850a6ac  VTS_01_1.VOB
36bd5ce679ce97325b9973c6a850a6ac  VTS_01_1dp.VOB
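The aws s3api list-objects output is JSON, and each object's ETag comes wrapped in literal double quotes, so the quotes have to be stripped before comparing it with a local MD5. A minimal sketch, assuming Python 3.9+ and that the listing was first saved to a file (the file name listing.json, the script name etags.py and the function name etags_from_list_objects are hypothetical):

```python
import json
import sys


def etags_from_list_objects(output: str) -> dict[str, str]:
    """Map object key -> ETag from `aws s3api list-objects` JSON output.

    The ETag is returned wrapped in literal double quotes, so they are
    stripped. An empty listing has no "Contents" field at all.
    """
    data = json.loads(output)
    return {obj["Key"]: obj["ETag"].strip('"') for obj in data.get("Contents", [])}


if __name__ == "__main__":
    # Each argument is a file holding saved list-objects JSON output.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            etags = etags_from_list_objects(f.read())
        for key, etag in sorted(etags.items()):
            print(f"{etag}  {key}")
```

Usage: aws s3api list-objects --bucket <bucket> --endpoint-url <endpoint> > listing.json, then python etags.py listing.json prints the ETags in the same layout as rclone md5sum.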

Overall result

Files with identical hashes, uploaded to Yandex.Cloud via independent paths with independent tools, end up with an identical but invalid ETag (MD5) object property.

36bd5ce679ce97325b9973c6a850a6ac  invalid cloud ETag
27ba7feaa6eaffda8b4c51be0375333d  correct MD5
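To repeat this check over a whole backup rather than eyeballing two columns, the comparison can be scripted. A minimal sketch, assuming Python 3.9+ and that the local MD5s and cloud ETags have already been collected into dictionaries keyed by file name (the function name mismatched_keys is hypothetical; the demo values are the ones from this test). Note that it only flags a difference; deciding which side is wrong still needs the download check described at the top.

```python
def mismatched_keys(local: dict[str, str], cloud: dict[str, str]) -> list[str]:
    """Return the keys whose cloud ETag differs from the local MD5.

    Hex digests are compared case-insensitively; a key missing from
    `cloud` is reported as a mismatch too.
    """
    return sorted(
        key for key, md5 in local.items()
        if cloud.get(key, "").lower() != md5.lower()
    )


if __name__ == "__main__":
    # Values observed in the test above.
    local = {
        "VTS_01_1.VOB": "27ba7feaa6eaffda8b4c51be0375333d",
        "VTS_01_1dp.VOB": "27ba7feaa6eaffda8b4c51be0375333d",
    }
    cloud = {
        "VTS_01_1.VOB": "36bd5ce679ce97325b9973c6a850a6ac",
        "VTS_01_1dp.VOB": "36bd5ce679ce97325b9973c6a850a6ac",
    }
    for key in mismatched_keys(local, cloud):
        print(f"MISMATCH  {key}: local {local[key]} != cloud {cloud.get(key)}")
```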

Solution

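If I understand the test results correctly, there is no way to use rclone's --checksum option with Yandex.Cloud, since rclone would compare the correct local MD5 against the invalid ETag (the same value rclone md5sum remote: reports) and treat every such file as changed.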

A corrupted hash has resulted in corrupted trust in this particular cloud service.

Anyway, my dialogue with the Yandex tech support guys is not yet complete, and I will keep the rclone community informed.