Adding MD5SUM on S3

What is the problem you are having with rclone?

Some files (either because of their size or because they were uploaded with another utility) do not have checksums in their S3 metadata (actually Wasabi)

What is your rclone version (output from rclone version)

rclone v1.57.0

  • os/version: darwin 10.15.7 (64 bit)

  • os/kernel: 19.6.0 (x86_64)

  • os/type: darwin

  • os/arch: amd64

  • go/version: go1.17.2

  • go/linking: dynamic

  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Wasabi

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone check wasabi:drix08/OECI_DriX_Data_Public/DRIX/logs/mission_logs wasabi:drix08/OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs

The rclone config contents with secrets removed.

[wasabi]

type = s3

provider = Wasabi

access_key_id = XXXXX

secret_access_key = XXXXX

region = us-east-2

endpoint = s3.us-east-2.wasabisys.com

A log from the command with the -vv flag

rclone check wasabi:drix08/OECI_DriX_Data_Public/DRIX/logs/mission_logs wasabi:drix08/OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs -P -vv
...
2021-11-16 11:52:28 DEBUG : 31-03-2021-12-57-59.293_IdleBoot/2021-03-31-14-37-15_2.bag: OK - could not check hash

2021-11-16 11:52:28 DEBUG : 31-03-2021-12-57-59.293_IdleBoot/2021-03-31-12-58-11_0.bag: OK - could not check hash

2021-11-16 11:52:28 DEBUG : 31-03-2021-09-51-51.105_IdleBoot/2021-03-31-11-37-59_2.bag: OK - could not check hash

2021-11-16 11:52:28 DEBUG : 31-03-2021-09-51-51.105_IdleBoot/2021-03-31-10-45-54_1.bag: OK - could not check hash

2021-11-16 11:52:28 DEBUG : 31-03-2021-09-51-51.105_IdleBoot/2021-03-31-09-52-04_0.bag: OK - could not check hash

2021-11-16 11:52:28 DEBUG : 31-03-2021-15-46-33.081_Idle/2021-03-31-15-46-36_0.bag: OK - could not check hash

2021-11-16 11:52:28 NOTICE: S3 bucket drix08 path OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs: 0 differences found

2021-11-16 11:52:28 NOTICE: S3 bucket drix08 path OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs: 347 hashes could not be checked

2021-11-16 11:52:28 NOTICE: S3 bucket drix08 path OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs: 1242 matching files

I apologize, I could not find an answer to this question - but I fear I've missed it and am wasting everyone's time.

I'm trying to compare the contents of two directories on a Wasabi S3 bucket using the "check" command, to ensure I have made no blunders in my re-organization of data. However, for many files I receive "could not check hash". From reading other posts on this forum I've learned that the client calculates the hash on upload and stores it in the file's metadata, that not all clients calculate the hash (so some may be missing), and that for some large files hashes are not calculated even by rclone. Also on this forum I learned how to use rclone md5sum to check which files are missing hashes, and I have verified that this is indeed the problem: some of my files were initially uploaded with another utility (Cyberduck). Here are my questions:

  1. Is there a way to use rclone to calculate the hashes and populate the metadata for files already in the s3 bucket so that a proper rclone check can be performed without error?
  2. Is there a way to force rclone to calculate them, even for large files when uploading?
  3. What is the file size limit above which rclone will not calculate the hash automatically?

Thanks.

hello and welcome to the forum,

i also use wasabi.

  1. as far as i know, it is not possible to change metadata once a file is uploaded, except with this caveat from add-object-metadata:
    "This action creates a copy of the object with updated settings"
    note 1: i have tested this; editing the metadata will trigger a copy operation and create a new version, if versioning is enabled.
    note 2: even if you could edit the metadata without triggering a copy, rclone treats x-amz-meta-md5chksum as read-only.
  2. one way or another, the hash is saved when the file is uploaded.
    hashes
  3. as far as i know, there is no limit.

You could probably do it by using the hasher backend to read the checksums and then using server side copies to add the hashes... Maybe!

The S3 protocol allows a server side copy to adjust the metadata so in theory it is possible.
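
At the HTTP level that would just be a CopyObject where source and destination are the same key and the metadata directive is set to replace - very roughly like this (bucket, key and value are placeholders for illustration only):

PUT /backupdirtest/file.txt HTTP/1.1
Host: s3.us-east-2.wasabisys.com
x-amz-copy-source: /backupdirtest/file.txt
x-amz-metadata-directive: REPLACE
x-amz-meta-md5chksum: xMpCOKC5I4INzFCab3WEmw==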

Rclone will do its best to calculate them but if the source backend can't give it an MD5 then it can't. For example the local backend can produce any kind of hash, whereas the crypt backend can't produce any hashes.

It might be possible to wrap the source in the hasher backend and use its facilities to generate all the MD5sums before the transfer.
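
For example, a minimal hasher setup might look like this (the remote name and path are placeholders; the options are the ones documented for the hasher backend, so check the docs before relying on this):

[hasher]
type = hasher
remote = wasabi:drix08
hashes = md5
max_age = off

rclone md5sum --download hasher:OECI_DriX_Data_Public/DRIX/logs/mission_logs

The --download flag makes rclone read each file and hash it locally, and hasher should cache the results, so a subsequent rclone check run against hasher: has MD5s to compare.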

All files below --s3-upload-cutoff should have an MD5SUM generated by S3 itself. So you could just set --s3-upload-cutoff 5G for all files below 5G to have MD5SUMs. Note that 5G is the maximum size possible here. This will make the uploads slower but may be what you want.
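
For example (paths are just illustrative):

rclone copy /local/mission_logs wasabi:drix08/OECI_DriX_Data_Public/DRIX/logs/mission_logs --s3-upload-cutoff 5G

Anything over 5G still goes up as a multipart upload, so it won't get the S3-generated MD5 this way.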

keep in mind that wasabi has a 90 day retention period, tho in my case it is just 30 days.

if the file is server-side copied and
--- versioning is disabled
--- the filenames stay the same
then there will be two copies of the file:

  1. the original file which is now deleted but still subject to the retention period.
  2. the copied file.


In general server side copies in S3 are quite expensive operations. If you are keeping revisions doubly so!

I just remembered why we don't MD5 the file as it is being uploaded and add it to the metadata of the object afterwards - avoiding that expensive server side copy is precisely the reason.

i did some testing with wasabi, using s3browser

i changed the value of x-amz-meta-md5chksum and it did NOT trigger a server-side copy.

as i understand it, this is not the same behavior as aws s3 so i contacted wasabi and they responded.
"Thanks for confirming that you are using S3 browser. If you have Versioning enabled, metadata modifications will create a new version of the object resulting in additional charges. But if Versioning is not enabled, metadata changes will not create an additional version. I tested this behavior using S3 Browser. So the charges will be based on what you have stored with us."

i did more testing and found that sometimes a server-side copy was triggered,
tho i think that is a quirk of s3browser.

  • If the file size is smaller than the value that triggers a multipart upload, then the header is updated WITHOUT triggering a server-side copy.
  • If the file size is larger than the value that triggers a multipart upload, then the header is updated and DOES trigger a server-side copy.

so i contacted wasabi again and got
"Thanks David for the additional testing from your end. I have reached out internally & will get back to you once I have the confirmation on the behavior"

So it's likely that rclone will have the same behavior for files less than --s3-copy-cutoff

  --s3-copy-cutoff SizeSuffix     Cutoff for switching to multipart copy (default 4.656Gi)

(That should really be 5 GiB but backblaze interpreted it as 5GB which is only 4.656GiB :man_shrugging: )
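
(The arithmetic: 5 GB is 5,000,000,000 bytes, and 5,000,000,000 / 1024³ ≈ 4.656 GiB, hence the default above.)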

i could not find a way to get rclone to update/add metadata for an existing file.

for example, with these commands rclone responds
"don't need to copy/move file.txt, it is already at target location"

rclone copyto wasabi01:backupdirtest/file.txt wasabi01:backupdirtest/file.txt -vv --header="x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmw==" 
rclone copyto wasabi01:backupdirtest/file.txt wasabi01:backupdirtest/file.txt -vv --header="x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" 
rclone copyto wasabi01:backupdirtest/file.txt wasabi01:backupdirtest/file.txt -vv --header-upload="x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmw==" 
rclone copyto wasabi01:backupdirtest/file.txt wasabi01:backupdirtest/file.txt -vv --header-upload="x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" 

i wrote a python script to update the metadata.

not sure how to know if the change triggered a server side copy.
each copy creates a new `x-amz-id-2` value.
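
roughly along these lines - a sketch only, not the exact script (boto3, with placeholder credentials and names; the md5chksum value is the base64 of the raw md5 digest, which is the format rclone uses):

# sketch: rewrite an object's user metadata in place via a server-side copy
# (copy_object with MetadataDirective='REPLACE'); endpoint/keys/bucket are placeholders
import base64
import hashlib

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-east-2.wasabisys.com",
    aws_access_key_id="XXXXX",
    aws_secret_access_key="XXXXX",
)

bucket, key = "backupdirtest", "file.txt"

# compute the md5 locally and base64-encode the raw digest, as rclone stores it
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode()

# copy the object onto itself, replacing its user metadata
# (boto3 adds the x-amz-meta- prefix to the Metadata keys)
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    Metadata={"md5chksum": md5_b64},
    MetadataDirective="REPLACE",
)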

You probably want

  --no-check-dest   Don't check the destination, copy regardless

that might work :crossed_fingers:

hmm,

  1. --no-check-dest did not work for server-side copy - see debug.log.01
  2. --no-check-dest --s3-no-head - same result, see debug.log.02
  3. --no-check-dest --s3-no-head --s3-no-head-object - same result, but strange output, see debug.log.03
    rclone would not output the --dump=headers info, just the normal debug info.

--- debug.log.01

2021/11/19 12:42:28 DEBUG : rclone: Version "v1.57.0" starting with parameters ["c:\\data\\rclone\\scripts\\rclone.exe" "copyto" "wasabi01:backupdirtest/file.txt" "wasabi01:backupdirtest/file.txt" "--no-check-dest" "--header-upload=x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmwxxx==" "--header-upload=x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" "--dump=headers" "--retries=1" "--low-level-retries=1" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/11/19 12:42:28 DEBUG : Creating backend with remote "wasabi01:backupdirtest/file.txt"
2021/11/19 12:42:28 DEBUG : Using config file from "C:\\data\\rclone\\scripts\\rclone.conf"
2021/11/19 12:42:28 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:42:28 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:42:28 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:42:28 DEBUG : HTTP REQUEST (req 0xc0000cfb00)
2021/11/19 12:42:28 DEBUG : HEAD /backupdirtest/file.txt HTTP/1.1
Host: s3.us-east-2.wasabisys.com
User-Agent: rclone/v1.57.0
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20211119T174228Z

2021/11/19 12:42:28 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:42:28 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:42:28 DEBUG : HTTP RESPONSE (req 0xc0000cfb00)
2021/11/19 12:42:28 DEBUG : HTTP/1.1 200 OK
Content-Length: 1
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8
Date: Fri, 19 Nov 2021 17:42:28 GMT
Etag: "c4ca4238a0b923820dcc509a6f75849b"
Last-Modified: Fri, 19 Nov 2021 17:34:44 GMT
Server: WasabiS3/7.1.262-2021-11-09-1bb0faf (head6)
X-Amz-Id-2: R1syvAMvZIA7P5yWmORECn5T7kQ5aQ9XSb4a/9ORjC8+vs+SOrK1cHxJyq2OJ4GsigGh9ibAzrmV
X-Amz-Meta-Mtime: 1637097029.6586656
X-Amz-Request-Id: 6E71D55EFBC6ED3D
X-Amz-Version-Id: null

2021/11/19 12:42:28 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:42:28 DEBUG : fs cache: adding new entry for parent of "wasabi01:backupdirtest/file.txt", "wasabi01:backupdirtest"
2021/11/19 12:42:28 DEBUG : Creating backend with remote "wasabi01:backupdirtest/"
2021/11/19 12:42:28 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:42:28 DEBUG : fs cache: renaming cache item "wasabi01:backupdirtest/" to be canonical "wasabi01:backupdirtest"
2021/11/19 12:42:28 DEBUG : S3 bucket backupdirtest: don't need to copy/move file.txt, it is already at target location
2021/11/19 12:42:28 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:         0.1s

2021/11/19 12:42:28 DEBUG : 4 go routines active

--- debug.log.02

2021/11/19 12:43:59 DEBUG : rclone: Version "v1.57.0" starting with parameters ["c:\\data\\rclone\\scripts\\rclone.exe" "copyto" "wasabi01:backupdirtest/file.txt" "wasabi01:backupdirtest/file.txt" "--no-check-dest" "--s3-no-head" "--header-upload=x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmwxxx==" "--header-upload=x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" "--dump=headers" "--retries=1" "--low-level-retries=1" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/11/19 12:43:59 DEBUG : Creating backend with remote "wasabi01:backupdirtest/file.txt"
2021/11/19 12:43:59 DEBUG : Using config file from "C:\\data\\rclone\\scripts\\rclone.conf"
2021/11/19 12:43:59 DEBUG : wasabi01: detected overridden config - adding "{NBTUO}" suffix to name
2021/11/19 12:43:59 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:43:59 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:43:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:43:59 DEBUG : HTTP REQUEST (req 0xc000331600)
2021/11/19 12:43:59 DEBUG : HEAD /backupdirtest/file.txt HTTP/1.1
Host: s3.us-east-2.wasabisys.com
User-Agent: rclone/v1.57.0
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20211119T174359Z

2021/11/19 12:43:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:43:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:43:59 DEBUG : HTTP RESPONSE (req 0xc000331600)
2021/11/19 12:43:59 DEBUG : HTTP/1.1 200 OK
Content-Length: 1
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8
Date: Fri, 19 Nov 2021 17:43:58 GMT
Etag: "c4ca4238a0b923820dcc509a6f75849b"
Last-Modified: Fri, 19 Nov 2021 17:34:44 GMT
Server: WasabiS3/7.1.262-2021-11-09-1bb0faf (head3)
X-Amz-Id-2: 6nCenBzxsCOkHwwzfebxrPUwKqKm4cm8T5sg1o+rHAo089xwZYgX9EJMGjjfrtdQ/v5HHHxTDlm7
X-Amz-Meta-Mtime: 1637097029.6586656
X-Amz-Request-Id: 0C73D76D7405CD50
X-Amz-Version-Id: null

2021/11/19 12:43:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:43:59 DEBUG : fs cache: adding new entry for parent of "wasabi01:backupdirtest/file.txt", "wasabi01{NBTUO}:backupdirtest"
2021/11/19 12:43:59 DEBUG : Creating backend with remote "wasabi01:backupdirtest/"
2021/11/19 12:43:59 DEBUG : wasabi01: detected overridden config - adding "{NBTUO}" suffix to name
2021/11/19 12:43:59 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:43:59 DEBUG : fs cache: renaming cache item "wasabi01:backupdirtest/" to be canonical "wasabi01{NBTUO}:backupdirtest"
2021/11/19 12:43:59 DEBUG : S3 bucket backupdirtest: don't need to copy/move file.txt, it is already at target location
2021/11/19 12:43:59 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:         0.1s

2021/11/19 12:43:59 DEBUG : 4 go routines active

--- debug.log.03

2021/11/19 12:52:56 DEBUG : rclone: Version "v1.57.0" starting with parameters ["c:\\data\\rclone\\scripts\\rclone.exe" "copyto" "wasabi01:backupdirtest/file.txt" "wasabi01:backupdirtest/file.txt" "--no-check-dest" "--s3-no-head" "--s3-no-head-object" "--header-upload=x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmwxxx==" "--header-upload=x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" "--dump=headers" "--retries=1" "--low-level-retries=1" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/11/19 12:52:56 DEBUG : Creating backend with remote "wasabi01:backupdirtest/file.txt"
2021/11/19 12:52:56 DEBUG : Using config file from "C:\\data\\rclone\\scripts\\rclone.conf"
2021/11/19 12:52:56 DEBUG : wasabi01: detected overridden config - adding "{ZcSut}" suffix to name
2021/11/19 12:52:56 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:52:56 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:52:56 DEBUG : fs cache: renaming cache item "wasabi01:backupdirtest/file.txt" to be canonical "wasabi01{ZcSut}:backupdirtest/file.txt"
2021/11/19 12:52:56 DEBUG : fs cache: switching user supplied name "wasabi01:backupdirtest/file.txt" for canonical name "wasabi01{ZcSut}:backupdirtest/file.txt"
2021/11/19 12:52:56 ERROR : S3 bucket backupdirtest path file.txt: Nothing to do as source and destination are the same
2021/11/19 12:52:56 INFO  : 
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:         0.0s

2021/11/19 12:52:56 DEBUG : 2 go routines active

You might get a different result if you use rclone copy from a directory and to a directory rather than using copyto or copy with the source pointing to a file.

If it is different then it is a bug in copyto/copy when the source is a file.
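
For example, something like this (the same flags as the copyto tests above, just with a directory source and destination - untested, so it may well do the same thing):

rclone copy wasabi01:backupdirtest wasabi01:backupdirtest --no-check-dest --header-upload="x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmw==" -vv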
