Some files do not have checksums in their S3 metadata (the storage is actually Wasabi), either because of their size or because they were uploaded with another utility.
What is your rclone version (output from rclone version)
rclone v1.57.0
os/version: darwin 10.15.7 (64 bit)
os/kernel: 19.6.0 (x86_64)
os/type: darwin
os/arch: amd64
go/version: go1.17.2
go/linking: dynamic
go/tags: none
Which cloud storage system are you using? (eg Google Drive)
Wasabi
The command you were trying to run (eg rclone copy /tmp remote:tmp)
[wasabi]
type = s3
provider = Wasabi
access_key_id = XXXXX
secret_access_key = XXXXX
region = us-east-2
endpoint = s3.us-east-2.wasabisys.com
A log from the command with the -vv flag
rclone check wasabi:drix08/OECI_DriX_Data_Public/DRIX/logs/mission_logs wasabi:drix08/OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs -P -vv
...
2021-11-16 11:52:28 DEBUG : 31-03-2021-12-57-59.293_IdleBoot/2021-03-31-14-37-15_2.bag: OK - could not check hash
2021-11-16 11:52:28 DEBUG : 31-03-2021-12-57-59.293_IdleBoot/2021-03-31-12-58-11_0.bag: OK - could not check hash
2021-11-16 11:52:28 DEBUG : 31-03-2021-09-51-51.105_IdleBoot/2021-03-31-11-37-59_2.bag: OK - could not check hash
2021-11-16 11:52:28 DEBUG : 31-03-2021-09-51-51.105_IdleBoot/2021-03-31-10-45-54_1.bag: OK - could not check hash
2021-11-16 11:52:28 DEBUG : 31-03-2021-09-51-51.105_IdleBoot/2021-03-31-09-52-04_0.bag: OK - could not check hash
2021-11-16 11:52:28 DEBUG : 31-03-2021-15-46-33.081_Idle/2021-03-31-15-46-36_0.bag: OK - could not check hash
2021-11-16 11:52:28 NOTICE: S3 bucket drix08 path OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs: 0 differences found
2021-11-16 11:52:28 NOTICE: S3 bucket drix08 path OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs: 347 hashes could not be checked
2021-11-16 11:52:28 NOTICE: S3 bucket drix08 path OECI_DriX_Data_Public/DX082101/drix08/02-raw/drix/mission_logs: 1242 matching files
I apologize if this has been answered already; I searched but could not find an answer, and I fear I've missed it and am wasting everyone's time.
I'm trying to check the contents of two directories on a Wasabi S3 bucket using the "check" command, to ensure I have made no blunders in my re-organization of data. However, for many files I receive "could not check hash". From other posts on this forum I've learned that the client calculates the hash on upload and stores it in the file's metadata, that not all clients calculate the hash (so some may be missing), and that for some large files hashes are not calculated even by rclone. I also learned here how to use rclone md5sum to check which files are missing hashes, and I have verified that this is indeed the problem: some of my files were initially uploaded with another utility (Cyberduck). Here are my questions:
Is there a way to use rclone to calculate the hashes and populate the metadata for files already in the s3 bucket so that a proper rclone check can be performed without error?
Is there a way to force rclone to calculate them, even for large files when uploading?
What is the file size limit above which rclone will not calculate the hash automatically?
Thanks.
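For reference, the md5sum step looked like this (a sketch; this is the same bucket path as in my check command, and as I understand it, objects without a stored checksum print with an empty hash field):

```shell
# List MD5 hashes for every object under the path; entries whose hash
# column is blank have no checksum in their S3 metadata.
rclone md5sum wasabi:drix08/OECI_DriX_Data_Public/DRIX/logs/mission_logs
```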
As far as I know, it is not possible to change metadata once a file is uploaded, except with the caveat noted in add-object-metadata:
"This action creates a copy of the object with updated settings"
note 1: I have tested this; editing metadata will trigger a copy operation and create a new version if versioning is enabled.
note 2: even if you could edit the metadata without triggering a copy, rclone sets x-amz-meta-md5chksum as read-only.
One way or another, the hash is saved when the file is uploaded; see the docs on hashes.
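For context on that metadata key: the value rclone stores in x-amz-meta-md5chksum is the base64 encoding of the raw MD5 digest, not the hex string. For example, for a file containing just the byte "1" (the test file in the logs further down, whose hex MD5 appears in the Etag header):

```shell
# MD5 of the one-byte file "1" in hex: c4ca4238a0b923820dcc509a6f75849b
# (this hex form is what appears in the object's Etag header)
printf '1' | openssl md5
# The same digest as raw bytes, base64-encoded - the form rclone stores
# in x-amz-meta-md5chksum:
printf '1' | openssl md5 -binary | base64
# → xMpCOKC5I4INzFCab3WEmw==
```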
You could probably do it with the hasher backend to read the checksums, then use server-side copies to add the hashes... Maybe!
The S3 protocol allows a server-side copy to adjust the metadata, so in theory it is possible.
Rclone will do its best to calculate them, but if the source backend can't give it an MD5 then it can't. For example, the local backend can produce any kind of hash, whereas the crypt backend can't produce any hashes.
It might be possible to wrap the source in the hasher backend and use its facilities to generate all the MD5sums before the transfer.
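A minimal sketch of such a wrapper in rclone.conf, assuming the [wasabi] remote from the config above (the remote name and the max_age choice are illustrative, based on my reading of the hasher docs, not prescriptive):

```
[hasher-wasabi]
type = hasher
remote = wasabi:
hashes = md5
max_age = off
```

You would then point commands at hasher-wasabi:drix08/... instead of wasabi:drix08/...; as I understand it, the hasher backend keeps computed sums in a local cache and can supply them even when the underlying S3 object has no md5chksum metadata.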
All files below --s3-upload-cutoff should have an MD5SUM generated by S3 itself, so you could set --s3-upload-cutoff 5G to get MD5SUMs for all files below 5G. Note that 5G is the maximum size possible here. This will make uploads slower, but may be what you want.
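Concretely, something like this (a sketch; the source and destination paths are placeholders):

```shell
# Upload everything up to 5 GiB as a single part so the S3 ETag is a
# plain MD5; files above the cutoff still use multipart upload.
rclone copy /local/data wasabi:drix08/some/path --s3-upload-cutoff 5G
```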
In general, server-side copies in S3 are quite expensive operations. If you are keeping revisions, doubly so!
I just remembered why rclone doesn't MD5 the file as it is being uploaded and then add the hash to the object's metadata: avoiding that expensive server-side copy is precisely the reason.
I changed the value of x-amz-meta-md5chksum and it did NOT trigger a server-side copy.
As I understand it, this is not the same behavior as AWS S3, so I contacted Wasabi and they responded:
"Thanks for confirming that you are using S3 browser. If you have Versioning enabled, metadata modifications will create a new version of the object resulting in additional charges. But if Versioning is not enabled, metadata changes will not create an additional version. I tested this behavior using S3 Browser. So the charges will be based on what you have stored with us."
I did more testing and found that sometimes a server-side copy was triggered, though I think that is a quirk of S3 Browser:
If the file size is smaller than the multipart-upload threshold, the header is updated WITHOUT triggering a server-side copy.
If the file size is larger than the multipart-upload threshold, the header is updated WITH a server-side copy.
So I contacted Wasabi again and got:
"Thanks David for the additional testing from your end. I have reached out internally & will get back to you once I have the confirmation on the behavior"
--no-check-dest did not work for server-side copy - see debug.log.01
--no-check-dest --s3-no-head same result - see debug.log.02
--no-check-dest --s3-no-head --s3-no-head-object same result, but strange output - see debug.log.03
rclone would not output the --dump=headers info, just the normal debug info.
--- debug.log.01
2021/11/19 12:42:28 DEBUG : rclone: Version "v1.57.0" starting with parameters ["c:\\data\\rclone\\scripts\\rclone.exe" "copyto" "wasabi01:backupdirtest/file.txt" "wasabi01:backupdirtest/file.txt" "--no-check-dest" "--header-upload=x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmwxxx==" "--header-upload=x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" "--dump=headers" "--retries=1" "--low-level-retries=1" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/11/19 12:42:28 DEBUG : Creating backend with remote "wasabi01:backupdirtest/file.txt"
2021/11/19 12:42:28 DEBUG : Using config file from "C:\\data\\rclone\\scripts\\rclone.conf"
2021/11/19 12:42:28 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:42:28 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:42:28 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:42:28 DEBUG : HTTP REQUEST (req 0xc0000cfb00)
2021/11/19 12:42:28 DEBUG : HEAD /backupdirtest/file.txt HTTP/1.1
Host: s3.us-east-2.wasabisys.com
User-Agent: rclone/v1.57.0
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20211119T174228Z
2021/11/19 12:42:28 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:42:28 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:42:28 DEBUG : HTTP RESPONSE (req 0xc0000cfb00)
2021/11/19 12:42:28 DEBUG : HTTP/1.1 200 OK
Content-Length: 1
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8
Date: Fri, 19 Nov 2021 17:42:28 GMT
Etag: "c4ca4238a0b923820dcc509a6f75849b"
Last-Modified: Fri, 19 Nov 2021 17:34:44 GMT
Server: WasabiS3/7.1.262-2021-11-09-1bb0faf (head6)
X-Amz-Id-2: R1syvAMvZIA7P5yWmORECn5T7kQ5aQ9XSb4a/9ORjC8+vs+SOrK1cHxJyq2OJ4GsigGh9ibAzrmV
X-Amz-Meta-Mtime: 1637097029.6586656
X-Amz-Request-Id: 6E71D55EFBC6ED3D
X-Amz-Version-Id: null
2021/11/19 12:42:28 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:42:28 DEBUG : fs cache: adding new entry for parent of "wasabi01:backupdirtest/file.txt", "wasabi01:backupdirtest"
2021/11/19 12:42:28 DEBUG : Creating backend with remote "wasabi01:backupdirtest/"
2021/11/19 12:42:28 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:42:28 DEBUG : fs cache: renaming cache item "wasabi01:backupdirtest/" to be canonical "wasabi01:backupdirtest"
2021/11/19 12:42:28 DEBUG : S3 bucket backupdirtest: don't need to copy/move file.txt, it is already at target location
2021/11/19 12:42:28 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 0.1s
2021/11/19 12:42:28 DEBUG : 4 go routines active
--- debug.log.02
2021/11/19 12:43:59 DEBUG : rclone: Version "v1.57.0" starting with parameters ["c:\\data\\rclone\\scripts\\rclone.exe" "copyto" "wasabi01:backupdirtest/file.txt" "wasabi01:backupdirtest/file.txt" "--no-check-dest" "--s3-no-head" "--header-upload=x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmwxxx==" "--header-upload=x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" "--dump=headers" "--retries=1" "--low-level-retries=1" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/11/19 12:43:59 DEBUG : Creating backend with remote "wasabi01:backupdirtest/file.txt"
2021/11/19 12:43:59 DEBUG : Using config file from "C:\\data\\rclone\\scripts\\rclone.conf"
2021/11/19 12:43:59 DEBUG : wasabi01: detected overridden config - adding "{NBTUO}" suffix to name
2021/11/19 12:43:59 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:43:59 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:43:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:43:59 DEBUG : HTTP REQUEST (req 0xc000331600)
2021/11/19 12:43:59 DEBUG : HEAD /backupdirtest/file.txt HTTP/1.1
Host: s3.us-east-2.wasabisys.com
User-Agent: rclone/v1.57.0
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20211119T174359Z
2021/11/19 12:43:59 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/11/19 12:43:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:43:59 DEBUG : HTTP RESPONSE (req 0xc000331600)
2021/11/19 12:43:59 DEBUG : HTTP/1.1 200 OK
Content-Length: 1
Accept-Ranges: bytes
Content-Type: text/plain; charset=utf-8
Date: Fri, 19 Nov 2021 17:43:58 GMT
Etag: "c4ca4238a0b923820dcc509a6f75849b"
Last-Modified: Fri, 19 Nov 2021 17:34:44 GMT
Server: WasabiS3/7.1.262-2021-11-09-1bb0faf (head3)
X-Amz-Id-2: 6nCenBzxsCOkHwwzfebxrPUwKqKm4cm8T5sg1o+rHAo089xwZYgX9EJMGjjfrtdQ/v5HHHxTDlm7
X-Amz-Meta-Mtime: 1637097029.6586656
X-Amz-Request-Id: 0C73D76D7405CD50
X-Amz-Version-Id: null
2021/11/19 12:43:59 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/11/19 12:43:59 DEBUG : fs cache: adding new entry for parent of "wasabi01:backupdirtest/file.txt", "wasabi01{NBTUO}:backupdirtest"
2021/11/19 12:43:59 DEBUG : Creating backend with remote "wasabi01:backupdirtest/"
2021/11/19 12:43:59 DEBUG : wasabi01: detected overridden config - adding "{NBTUO}" suffix to name
2021/11/19 12:43:59 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:43:59 DEBUG : fs cache: renaming cache item "wasabi01:backupdirtest/" to be canonical "wasabi01{NBTUO}:backupdirtest"
2021/11/19 12:43:59 DEBUG : S3 bucket backupdirtest: don't need to copy/move file.txt, it is already at target location
2021/11/19 12:43:59 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 0.1s
2021/11/19 12:43:59 DEBUG : 4 go routines active
--- debug.log.03
2021/11/19 12:52:56 DEBUG : rclone: Version "v1.57.0" starting with parameters ["c:\\data\\rclone\\scripts\\rclone.exe" "copyto" "wasabi01:backupdirtest/file.txt" "wasabi01:backupdirtest/file.txt" "--no-check-dest" "--s3-no-head" "--s3-no-head-object" "--header-upload=x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmwxxx==" "--header-upload=x-amz-meta-md5chksumtest1: xMpCOKC5I4INzFCab3WEmw==" "--dump=headers" "--retries=1" "--low-level-retries=1" "--log-level=DEBUG" "--log-file=rclone.log"]
2021/11/19 12:52:56 DEBUG : Creating backend with remote "wasabi01:backupdirtest/file.txt"
2021/11/19 12:52:56 DEBUG : Using config file from "C:\\data\\rclone\\scripts\\rclone.conf"
2021/11/19 12:52:56 DEBUG : wasabi01: detected overridden config - adding "{ZcSut}" suffix to name
2021/11/19 12:52:56 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:52:56 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/11/19 12:52:56 DEBUG : fs cache: renaming cache item "wasabi01:backupdirtest/file.txt" to be canonical "wasabi01{ZcSut}:backupdirtest/file.txt"
2021/11/19 12:52:56 DEBUG : fs cache: switching user supplied name "wasabi01:backupdirtest/file.txt" for canonical name "wasabi01{ZcSut}:backupdirtest/file.txt"
2021/11/19 12:52:56 ERROR : S3 bucket backupdirtest path file.txt: Nothing to do as source and destination are the same
2021/11/19 12:52:56 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Elapsed time: 0.0s
2021/11/19 12:52:56 DEBUG : 2 go routines active
You might get a different result if you use rclone copy from a directory to a directory, rather than copyto or copy with the source pointing to a single file.
If the result is different, then it is a bug in copyto/copy when the source is a file.
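In other words, something along these lines (a sketch reusing the flags from the logs above; the md5chksumtest header name is the experimental key from those tests, not a real rclone metadata key):

```shell
# Directory-to-directory copy onto itself, forcing a re-copy with the
# extra header, rather than copyto with a single-file source.
rclone copy wasabi01:backupdirtest wasabi01:backupdirtest \
  --no-check-dest \
  --header-upload "x-amz-meta-md5chksumtest: xMpCOKC5I4INzFCab3WEmw==" \
  --dump headers --retries 1 --low-level-retries 1 -vv
```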