Checking MD5 sums of files uploaded to S3

What is the problem you are having with rclone?

It is a (pair of) question(s) rather than a problem.

  1. Does rclone, by default, verify the MD5 sum of the data received by S3 against what was uploaded? I believe there is a mechanism for this in the S3 API, but I'm not sure whether rclone uses it. I haven't looked at the source code; I could, but I have never read it before, so I expect someone else can answer this much more quickly than I can. I have searched the web and the forum for an answer but not found anything.

  2. When I run rclone md5sum remote_name:bucket_name/ the output appears too quickly for rclone to have downloaded the files and calculated their MD5 sums, but I'm not sure how else this could be done reliably. Can anyone explain how it is done? I guess something must be cached (either the MD5 sum and version, or at least the version, whether the file was uploaded by rclone on the same machine, and whether it was a single-part upload). I know that the ETag of a file uploaded in a single part is its MD5 sum, but I am not aware of any other way to get the MD5 sum of an object in an S3 bucket using the API. I'm not that familiar with the API TBH though, so maybe I am missing something.

Basically I want to verify uploads, but it isn't obvious to me whether just using rclone md5sum is enough for what I want. I guess it probably is, but I can't really tell without knowing more about it. Thanks for any help in advance.

What is your rclone version (output from rclone version)

I'm currently using

rclone v1.44
- os/arch: linux/amd64
- go version: go1.10.3

on Gentoo Linux, but my question is more general.

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux AMD64.

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone md5sum AE_S3:arcolaenergy-backup-files-uploaded-with-rclone/

The rclone config contents with secrets removed.

[AE_G_Drive]
type = drive
client_id = 351885574745-g3m83hq41hm28en4p1p32opf3srhtrpd.apps.googleusercontent.com
client_secret = XXXREDACTEDXXX
scope = drive
acknowledge_abuse = true
root_folder_id = XXXREDACTEDXXX
token = {"access_token":"XXXREDACTEDXXX","token_type":"Bearer","refresh_token":"XXXREDACTEDXXX","expiry":"2021-03-11T16:41:24.513242227Z"}

[AE_S3]
type = s3
provider = AWS
env_auth = false
access_key_id = XXXREDACTEDXXX
secret_access_key = XXXREDACTEDXXX
region = eu-west-2
location_constraint = eu-west-2
acl = bucket-owner-full-control
server_side_encryption = AES256

A log from the command with the -vv flag

2021/05/01 19:44:14 DEBUG : Using config file from "/home/sipos/.config/rclone/rclone.conf"
2021/05/01 19:44:15 DEBUG : pacer: Reducing sleep to 0s
f240b4c7ecc7e892e8432174970b7978  nadjas_old_laptop.img.gz.gpg
a858ec8baf7fd3daa29bea3bb507d849  test.txt
2021/05/01 19:44:15 DEBUG : 4 go routines active
2021/05/01 19:44:15 DEBUG : rclone: Version "v1.44" finishing with parameters ["rclone" "md5sum" "-vv" "AE_S3:arcolaenergy-backup-files-uploaded-with-rclone/"]

Your version (v1.44) is dinosaur-age old. You'll want to update that.

Yes, rclone checks checksums when it uploads.

rclone md5sum doesn't calculate the hash; it is pulled from the metadata on the object.

Rclone checks it, and so does S3, since rclone provides the MD5 sum on upload.
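To illustrate the mechanism, here is a minimal Python sketch of the two forms the same MD5 takes on a single-part upload. S3's PutObject accepts a Content-MD5 header (the base64 encoding of the raw 16-byte digest) and rejects the upload if the body doesn't match, and for a single-part object without SSE-KMS the ETag it returns is the hex form of that digest. The function names are made up for illustration; this is not rclone's code.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def single_part_etag(data: bytes) -> str:
    """Hex MD5, which S3 reports as the ETag of a single-part object
    (assuming the object is not encrypted with SSE-KMS)."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    body = b"example upload body\n"
    print("Content-MD5 header:", content_md5(body))
    print("Expected ETag:     ", single_part_etag(body))
```

Both values encode the same 16 bytes, so an uploader can send one and compare the other against the ETag in the response.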

Note that there are different rules for large files (bigger than --s3-upload-cutoff). These have an MD5 sum for the whole file, supplied by rclone, which S3 stores; each individual chunk, however, is protected by a SHA-256 checksum.

rclone md5sum reads the pre-calculated hash. This is either the ETag for non-multipart files or the hash rclone stored in the metadata for large (multipart) files.
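As a rough sketch of that lookup in Python: the function below takes an object's ETag and user metadata and returns a hex MD5 if one can be recovered. It assumes the metadata key is md5chksum and that its value is the base64-encoded MD5, which is my understanding of how rclone's S3 backend stores it; treat both as assumptions and check the docs for your version. The function name is hypothetical.

```python
import base64
import binascii
from typing import Mapping, Optional

HEX_DIGITS = set("0123456789abcdef")


def md5_from_object(etag: str, metadata: Mapping[str, str]) -> Optional[str]:
    """Return the hex MD5 for an S3 object, or None if none is available."""
    etag = etag.strip('"').lower()
    # Single-part upload: the ETag is a plain 32-character hex MD5.
    if len(etag) == 32 and set(etag) <= HEX_DIGITS:
        return etag
    # Multipart ETags look like "<hex>-<parts>" and are not the file's MD5,
    # so fall back to the hash stored in the metadata (assumed key name).
    for key, value in metadata.items():
        if key.lower() == "md5chksum":
            try:
                raw = base64.b64decode(value, validate=True)
            except (binascii.Error, ValueError):
                return None
            return raw.hex() if len(raw) == 16 else None
    return None
```

Neither path downloads the object body, which is why the listing comes back so quickly.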

rclone check is the canonical way of verifying an upload - it checks the md5sums of all the files in the source and the dest.

You can also do rclone check --download which will download all the files and check they are all identical to the source if you are feeling super paranoid.
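If you want to see what such a comparison amounts to, here is a small Python sketch that parses saved rclone md5sum output (the "hash  name" lines shown in the log above) and compares it with MD5 sums computed from a local directory. It is only an illustration with made-up helper names, not a replacement for rclone check.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

HEX_DIGITS = set("0123456789abcdef")


def parse_md5sum_listing(text: str) -> Dict[str, str]:
    """Map file name -> hex MD5 from 'hash  name' lines, skipping log lines."""
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        digest, sep, name = line.partition("  ")
        digest = digest.lower()
        if sep and name and len(digest) == 32 and set(digest) <= HEX_DIGITS:
            sums[name] = digest
    return sums


def local_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a local file, read in chunks to keep memory use low."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatches(local_root: Path, remote_sums: Dict[str, str]) -> List[str]:
    """Names whose local file is missing or whose MD5 differs from the remote."""
    bad: List[str] = []
    for name, digest in sorted(remote_sums.items()):
        path = local_root / name
        if not path.is_file() or local_md5(path) != digest:
            bad.append(name)
    return bad
```

An empty list from mismatches means every file named in the listing exists locally with the same MD5; it says nothing about local files that are absent from the remote.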