`rclone hashsum` only shows partial checksums

Hi,

When I run `rclone hashsum md5 REMOTE:PATH` on an S3 storage (provider is Other), not every object gets a checksum. I understand that the checksum comes from the storage backend via the ETag, but I don't see why it is missing for some files. Could someone explain it?

I have found a workaround in which the checksum is calculated locally for each object that has none. This means the object has to be downloaded first, which is of course more costly than taking the checksum straight from the storage.
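For illustration, the local calculation could look roughly like the Python sketch below. It assumes the object has already been downloaded to a local path; the function name `local_md5` and the chunk size are made up for this example and are not part of rclone.

```python
import hashlib


def local_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a local file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks matters here because the objects that lack a checksum tend to be the large ones.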

rclone version is

rclone v1.72.1
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-164-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.25.5
- go/linking: static
- go/tags: none

Regards
samthu

welcome to the forum,

please post:

  • rclone config redacted
  • debug output from the rclone hashsum command

Hi,

I have already provided the relevant config parameters. I cannot share the debug log because it contains sensitive information.

Nevertheless, I noticed another odd thing when I analyzed the XML returned with --dump=bodies. The interesting part of the structure looks like this:

<Contents>
<Key>FILENAME</Key>
<LastModified>DATE_TIME_TIMEZONE</LastModified>
<ETag>"HASH"</ETag>
<Size>NUMBERS</Size>
<StorageClass>STANDARD</StorageClass>
<Owner>
<ID>XXX</ID>
<DisplayName>XXX</DisplayName>
</Owner>
</Contents>

The XML contains as many <Contents> blocks as there are objects, which is what I expected. Each block has an <ETag> field with non-empty content. Fine.

Some ETags are just hashes, like "ef9a764df8dc7b05d4d71b29305b5ddd", but some ETags consist of a hash, a dash and a number, like "ef9a764df8dc7b05d4d71b29305b5ddd-14". The <Contents> blocks whose <ETag> contains a dash have the same <Key> as the filenames that appear without a hash in the rclone hashsum output.
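To check this correlation on a dumped listing, something like the following Python sketch might help. It is only an illustration: the function names `classify_etag` and `keys_without_md5` are invented here, and it assumes the dumped body is a single well-formed XML document with <Contents> blocks shaped like the one above. Tags are compared by local name, so it should work whether or not the listing carries an XML namespace.

```python
import re
import xml.etree.ElementTree as ET

PLAIN_MD5 = re.compile(r"^[0-9a-f]{32}$")
WITH_PART_COUNT = re.compile(r"^[0-9a-f]{32}-(\d+)$")


def classify_etag(etag):
    """Return ("md5", hex), ("multipart", part_count) or ("unknown", value)."""
    value = etag.strip().strip('"').lower()
    if PLAIN_MD5.match(value):
        return ("md5", value)
    match = WITH_PART_COUNT.match(value)
    if match:
        return ("multipart", int(match.group(1)))
    return ("unknown", value)


def local_name(tag):
    """Drop a leading "{namespace}" from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def keys_without_md5(listing_xml):
    """List the <Key> of every <Contents> block whose ETag is not a plain MD5."""
    root = ET.fromstring(listing_xml)
    missing = []
    for element in root.iter():
        if local_name(element.tag) != "Contents":
            continue
        fields = {local_name(child.tag): (child.text or "") for child in element}
        kind, _ = classify_etag(fields.get("ETag", ""))
        if kind != "md5":
            missing.append(fields.get("Key", ""))
    return missing
```

If the keys it returns match the filenames that have no hash in the rclone hashsum output, that supports the idea that the dash is what makes the difference.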

With an example, it seems as if

<Contents>
<Key>my-file-without-hash.odt</Key>
<LastModified>DATE_TIME_TIMEZONE</LastModified>
<ETag>"ef9a764df8dc7b05d4d71b29305b5ddd-14"</ETag>
<Size>NUMBERS</Size>
<StorageClass>STANDARD</StorageClass>
<Owner>
<ID>XXX</ID>
<DisplayName>XXX</DisplayName>
</Owner>
</Contents>

would lead to a rclone hashsum ...-output of

                                  my-file-without-hash.odt

whereas

<Contents>
<Key>my-file-with-hash.odt</Key>
<LastModified>DATE_TIME_TIMEZONE</LastModified>
<ETag>"ef9a764df8dc7b05d4d71b29305b5ddd"</ETag>
<Size>NUMBERS</Size>
<StorageClass>STANDARD</StorageClass>
<Owner>
<ID>XXX</ID>
<DisplayName>XXX</DisplayName>
</Owner>
</Contents>

would lead to a rclone hashsum ...-output of

ef9a764df8dc7b05d4d71b29305b5ddd  my-file-with-hash.odt

So, is an ETag with a dash invalid? Or is it perhaps an edge case that rclone does not handle well?

It possibly corresponds with these lines: rclone/backend/s3/s3.go at 4fd5a3d0a2440693d6d8bd2c036948b52cb7724b · rclone/rclone · GitHub

An ETag with a dash and a number generally indicates a multipart upload. In that case the ETag is no longer the MD5 of the complete file, so it cannot be relied upon as a hash of it.
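To show why the value differs, here is a minimal Python sketch of the scheme commonly observed on AWS S3: the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. Treat this as an assumption rather than a guarantee. An S3-compatible provider configured as Other may compute it differently, and the function name `multipart_etag` and the in-memory `data` argument are only for illustration.

```python
import hashlib


def multipart_etag(data, part_size):
    """Sketch of a multipart-style ETag: MD5 over the per-part MD5 digests, plus "-<parts>"."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Binary (not hex) digest of each part, in upload order
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Because the result depends on the part size chosen by the uploading tool, the same file can end up with different ETags, and none of them equals the MD5 of the whole file.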

In such cases, rclone adds a custom metadata attribute, X-Amz-Meta-Md5chksum, to store the hash of the complete file. Obviously this only happens if the files were uploaded by rclone itself and not by another tool.
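If you want to inspect that attribute yourself, the Python sketch below converts it to the hex form that rclone hashsum prints. It assumes the metadata value is the base64 encoding of the raw 16-byte MD5 digest, which is my understanding of what rclone stores but is worth verifying against your own objects. The helper name `md5chksum_to_hex` is made up for this example.

```python
import base64


def md5chksum_to_hex(meta_value):
    """Convert a base64-encoded MD5 digest (assumed format) into a hex string."""
    raw = base64.b64decode(meta_value.strip(), validate=True)
    if len(raw) != 16:
        # An MD5 digest is always 16 bytes; anything else is not what we assumed
        raise ValueError(f"expected a 16-byte MD5 digest, got {len(raw)} bytes")
    return raw.hex()
```

Objects uploaded in multiple parts by other tools will not carry this attribute, which would explain the empty hash column for them.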

Further explanation here: Amazon S3