Are MD5 checksums fetched in a batch?

Pavel_Dudko · April 12, 2024, 6:50pm

Hello guys! RClone is a great piece of software. I've been playing with it for several days and reading the docs, but there is a conceptual point I haven't understood yet.

I want to minimize the number of requests (as I use S3 Glacier Deep Archive storage).
I want to figure out whether the MD5 checksum is stored in metadata (which means one additional request per file) or is returned in a batch, which gets even more efficient when using --fast-list.

Reading the docs, I see contradictory statements. Here

It is stated

reading this from the object takes an additional HEAD request as the metadata isn't returned in object listings.

For objects smaller than --s3-upload-cutoff the ETag is used (as stated there); I'm not sure whether that means these checksums are fetched in a batch. But for larger objects the MD5 is definitely stored in metadata.
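To make the ETag vs. metadata distinction concrete, here is a small shell sketch. It assumes standard S3 behaviour: an object uploaded in a single part (and not encrypted with SSE-KMS or SSE-C) has an ETag equal to the hex MD5 of its content, while a multipart ETag ends in `-<number of parts>` and is not an MD5, so the MD5 has to come from stored metadata via a HEAD request. The helpers `etag_is_md5` and `md5_source` are hypothetical; this illustrates the idea and is not rclone's code.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helpers, not rclone's implementation).
# Assumption: a single-part, non-SSE-KMS/SSE-C upload has ETag = hex MD5;
# a multipart ETag looks like "<hex>-<parts>" and is not an MD5.

# etag_is_md5 ETAG: succeeds if ETAG looks like a plain 32-char hex MD5.
etag_is_md5() {
  local etag=${1//\"/}   # S3 returns ETags wrapped in double quotes; strip them
  [[ $etag =~ ^[0-9a-f]{32}$ ]]
}

# md5_source ETAG: prints where the MD5 would have to come from.
md5_source() {
  if etag_is_md5 "$1"; then
    echo "listing"   # usable straight from the object listing, no extra request
  else
    echo "head"      # needs a HEAD request to read the MD5 stored in metadata
  fi
}
```

For example, `md5_source '"9b2cf535f27731c974343645a3985328-3"'` prints `head`, because the `-3` suffix marks a three-part multipart upload.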

A little further down, it says that --checksum

Uses no extra transactions

How is that possible?

Also, in the Amazon S3 bugs section, it says

uploading multipart files via the S3 gateway causes them to lose their metadata

and, as a workaround for the missing modtime, it suggests:

This can be worked around with --checksum

Again, the same contradiction. Please help me clear up this confusion.

Also, I wonder what the command rclone md5sum --fast-list ... does on an S3 (Deep Archive) backend. Does it retrieve all checksums in one go?

Thanks for reading this

asdffdsa · April 13, 2024, 2:22pm

welcome to the forum,

with --fast-list, rclone lists up to 1,000 files per single api call
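As a back-of-envelope check, here is a rough request estimate for `rclone md5sum --fast-list` under two assumptions: each listing call returns up to 1,000 objects, and every object whose MD5 is not in its ETag (typically multipart uploads above `--s3-upload-cutoff`) costs one extra HEAD to read the stored MD5. The function names are hypothetical and the model ignores retries.

```shell
#!/usr/bin/env bash
# Hypothetical back-of-envelope model, not measured rclone behaviour:
#   requests = ceil(objects / 1000) listing calls + one HEAD per multipart object

# list_calls N: listing pages needed for N objects (an empty listing still costs 1).
list_calls() {
  local n=$1 page=1000
  echo $(( n == 0 ? 1 : (n + page - 1) / page ))
}

# estimate_requests TOTAL MULTIPART: listing calls plus HEAD calls.
estimate_requests() {
  local total=$1 multipart=$2
  echo $(( $(list_calls "$total") + multipart ))
}
```

For example, `estimate_requests 2500 400` prints 403: 3 listing calls plus 400 HEADs. If all objects were single-part, the same 2,500 files would cost only the 3 listing calls.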

to confirm for yourself,
rclone md5sum remote: --dump=responses --fast-list -vv
rclone lsf remote: --format=psh --dump=responses --fast-list -vv
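To turn that dump into numbers, you could count the HTTP methods in the log. This sketch assumes each dumped request includes its request line (e.g. `HEAD /bucket/key HTTP/1.1`); the exact log layout is an assumption, so compare it with your own output, and try `--dump headers` if request lines are missing. `count_methods` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# count_methods: reads an rclone debug log on stdin and prints "METHOD count"
# per HTTP method, sorted. Assumes request lines appear somewhere in the log
# as "<METHOD> /<path> HTTP/<version>" (an assumption about the dump format).
count_methods() {
  grep -oE '(GET|HEAD|PUT|POST|DELETE) /[^ ]* HTTP/[0-9.]+' |
    awk '{ n[$1]++ } END { for (m in n) print m, n[m] }' |
    sort
}
```

rclone writes its log to stderr, so redirect it: `rclone md5sum remote: --dump=responses --fast-list -vv 2>&1 | count_methods`. If every MD5 comes from the listing you would expect only GET lines; a HEAD count roughly equal to your number of multipart objects points to per-object metadata reads.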