Not checking MD5 on S3 backend in copy or sync?

Hi,

I've been running into some issues with copy and sync regarding MD5 checksums on S3 backends. As far as my tests go, these MD5s are not being checked when using rclone copy or sync, but they are checked when using rclone check.

My tests go like this:

  1. Create files file1 and file2 with the same size and the same timestamp but different content. Use md5sum to verify the MD5 of each file.
  2. rclone -P copy folder-with-testfiles/ my-s3-backend:mybucket/myfolder/
  3. In the AWS S3 console, check that the files were uploaded correctly, with each ETag matching the MD5 of the corresponding file. The AWS CLI gives the same results (note that the only metadata is mtime, which is identical for both files, as expected).
  4. Swap the two files (which, as mentioned, have different content but the same timestamp and size).
  5. Run rclone copy again.
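The local side of steps 1 and 4 can be reproduced with a short script. This is a minimal Python sketch, assuming a filesystem where os.utime sets modification times and rename leaves a file's mtime untouched; the contents, the fixed mtime and the helper names (md5_of, make_test_files, swap_files) are hypothetical. It does not call rclone or S3; it only shows that after the swap each file keeps the same size and mtime while its MD5 changes.

```python
import hashlib
import os
import tempfile
from typing import Tuple


def md5_of(path: str) -> str:
    """Hex MD5 of a file, the same digest md5sum prints."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_test_files(folder: str, mtime: float = 1_600_000_000.0) -> Tuple[str, str]:
    """Step 1: file1 and file2 with equal size and mtime but different content."""
    paths = []
    for name, content in (("file1", b"AAAA"), ("file2", b"BBBB")):
        path = os.path.join(folder, name)
        with open(path, "wb") as f:
            f.write(content)
        os.utime(path, (mtime, mtime))  # set atime and mtime to the same fixed value
        paths.append(path)
    return paths[0], paths[1]


def swap_files(a: str, b: str) -> None:
    """Step 4: swap the two files via a temporary name; rename keeps each file's mtime."""
    tmp = a + ".swap"
    os.rename(a, tmp)
    os.rename(b, a)
    os.rename(tmp, b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        f1, f2 = make_test_files(d)
        before = md5_of(f1)
        swap_files(f1, f2)
        st = os.stat(f1)
        # Size and mtime are unchanged, so a size+modtime comparison sees no difference.
        print(st.st_size, st.st_mtime, md5_of(f1) != before)
```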

I expected the files to be uploaded to S3 again, but they aren't. So it seems only the timestamps are being checked.

I tested with the DEEP_ARCHIVE and STANDARD storage classes, with AES encryption and without encryption, with rclone versions 1.48.0 and 1.49.2, and with both the copy and sync commands.

However, when I run rclone check, I get what I expected all along:
file1: MD5 differ
file2: MD5 differ
S3 bucket my-bucket/my-path: 2 differences found

On the other hand, if I run similar tests but, instead of swapping the files, just touch them to change the timestamp, they do get uploaded.

Am I missing something? Is there any way to enable MD5 checking when using rclone for copy or sync?

Regards.

Well, I just found out that with the -c flag I get the behavior I was expecting.


The default behaviour is to check size + modtime to see whether a file has changed (this is also how rsync works). It is cheap, easy and pretty reliable. However, you can use -c / --checksum if you want to check size + checksum instead.
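The two comparison modes can be sketched in a few lines. This is a minimal Python sketch under simplifying assumptions: ObjectInfo, needs_transfer and modify_window are hypothetical names (modify_window only loosely mirrors rclone's --modify-window flag), and real rclone handles more cases, for example it can just update the modtime instead of re-uploading when modtimes differ but the hashes match. It shows why the swapped files from the question are skipped by default but copied with --checksum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical minimal description of a file or object."""
    size: int      # bytes
    mtime: float   # modification time, seconds since the epoch
    md5: str       # hex MD5 digest (on S3 this would typically come from the ETag)


def needs_transfer(src: ObjectInfo, dst: Optional[ObjectInfo],
                   checksum: bool = False, modify_window: float = 0.0) -> bool:
    """Return True if src should be copied over dst (simplified)."""
    if dst is None:
        return True  # nothing at the destination yet
    if src.size != dst.size:
        return True  # different sizes always mean the file changed
    if checksum:
        # -c / --checksum: compare size + checksum
        return src.md5 != dst.md5
    # Default: compare size + modtime only; the content is never looked at.
    return abs(src.mtime - dst.mtime) > modify_window


if __name__ == "__main__":
    t = 1_600_000_000.0
    local = ObjectInfo(size=4, mtime=t, md5="md5-of-BBBB")   # after the swap
    remote = ObjectInfo(size=4, mtime=t, md5="md5-of-AAAA")  # uploaded before the swap
    print(needs_transfer(local, remote))                 # False: skipped
    print(needs_transfer(local, remote, checksum=True))  # True: copied
```

Touching a file changes only its mtime, so in the default mode needs_transfer returns True for it, which matches the behaviour seen when the files were touched instead of swapped.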

Note that reading modtimes on S3 takes an extra HEAD request for each object, so with --checksum you win there. However, the checksums take longer to calculate on the local disk, so you'll use more local CPU.
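To make that cost asymmetry concrete, here is a minimal Python sketch, assuming single-part uploads: for those, S3 reports the object's MD5 as its ETag (this does not hold for multipart uploads or SSE-KMS/SSE-C encrypted objects), so the remote side is just string parsing, while the local MD5 requires reading and hashing every byte of the file. md5_from_etag and local_md5 are hypothetical helpers, not rclone functions.

```python
import hashlib
import os
import re
import tempfile
from typing import Optional

# A plain single-part ETag is 32 hex digits, usually wrapped in double quotes.
_PLAIN_MD5_ETAG = re.compile(r'^"?([0-9a-fA-F]{32})"?$')


def md5_from_etag(etag: str) -> Optional[str]:
    """Return the MD5 embedded in a single-part S3 ETag, else None.

    Assumption: multipart ETags look like '"<hex>-<parts>"' and are not the
    file's MD5, so they are rejected here.
    """
    m = _PLAIN_MD5_ETAG.match(etag.strip())
    return m.group(1).lower() if m else None


def local_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a local file in chunks; disk reads and CPU time grow with file size."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Demo: compare a local file's MD5 with a made-up single-part ETag.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file1")
        with open(path, "wb") as f:
            f.write(b"hello")
        remote = md5_from_etag('"5d41402abc4b2a76b9719d911017c592"')
        print("match" if remote == local_md5(path) else "differ")
```

Running the file directly prints match, since 5d41402abc4b2a76b9719d911017c592 is the MD5 of the bytes hello.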
