Hashsum: two values sometimes, one value other times

What is the problem you are having with rclone?

I'm perplexed by the output format of "rclone hashsum": I don't understand why it's inconsistent.

Run the command 'rclone version' and share the full output of the command.

rclone v1.67.0

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 6.5.0-44-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.22.4
  • go/linking: static
  • go/tags: none

Are you on the latest version of rclone? You can validate by checking the version listed here: Rclone downloads
I wasn't, but I installed it for this report. It's giving the same result.

Which cloud storage system are you using? (eg Google Drive)

MinIO (S3-compatible) in a Docker container in this case. We use several others, though.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

$ rclone hashsum MD5 dev-env-s3:<bucket>/bs1/blah_pl1/data/33/06/
                                  3306c357d2cd2e91534b300c9bf4ca05
$ rclone hashsum MD5 dev-env-s3:<bucket>/bs1/blah_pl1/data/31/b0/
284b2c9addcad99c23f636a7dbb1c315  31b070bcdbf701a6ceefac670faad2ce

Why are there many leading spaces on one and not the other?

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[dev-env-s3]
type = s3
provider = Minio
env_auth = false
access_key_id = XXX
secret_access_key = XXX
endpoint = http://localhost:9000/

A log from the command that you were trying to run with the -vv flag

2024/08/07 06:55:29 DEBUG : rclone: Version "v1.67.0" starting with parameters ["rclone" "hashsum" "-vv" "MD5" "dev-env-s3:/bs1/blah_pl1/data/33/06/"]
2024/08/07 06:55:29 DEBUG : Creating backend with remote "dev-env-s3:/bs1/blah_pl1/data/33/06/"
2024/08/07 06:55:29 DEBUG : Using config file from "/data/home/dstromberg/.config/rclone/rclone.conf"
2024/08/07 06:55:29 DEBUG : Resolving service "s3" region "us-east-1"
2024/08/07 06:55:29 DEBUG : fs cache: renaming cache item "dev-env-s3:/bs1/blah_pl1/data/33/06/" to be canonical "dev-env-s3:/bs1/blah_pl1/data/33/06"
3306c357d2cd2e91534b300c9bf4ca05
2024/08/07 06:55:29 DEBUG : 6 go routines active

Why are there leading spaces on the output sometimes and not others?

Thanks!

When rclone uploads a file, it adds a header with the MD5 sum; otherwise rclone uses the ETag, which is generally not an MD5 sum for objects uploaded in multiple parts.
Was that file uploaded by rclone?

https://rclone.org/s3/#hashes

For a deeper look, you can use --dump; check the docs.

No, my data was not uploaded by rclone. It was uploaded by an internal-only application.

Assuming you mean "--dump headers", I've included the output here:
2024/08/07 09:52:41 NOTICE: Automatically setting -vv as --dump is enabled
2024/08/07 09:52:41 DEBUG : rclone: Version "v1.67.0" starting with parameters ["rclone" "hashsum" "MD5" "dev-env-s3:/bs1/blah_pl1/data/33/06/" "--dump" "headers"]
2024/08/07 09:52:41 DEBUG : Creating backend with remote "dev-env-s3:/bs1/blah_pl1/data/33/06/"
2024/08/07 09:52:41 DEBUG : Using config file from "/data/home/dstromberg/.config/rclone/rclone.conf"
2024/08/07 09:52:41 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/08/07 09:52:41 DEBUG : Resolving service "s3" region "us-east-1"
2024/08/07 09:52:41 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/08/07 09:52:41 DEBUG : fs cache: renaming cache item "dev-env-s3:/bs1/blah_pl1/data/33/06/" to be canonical "dev-env-s3:/bs1/blah_pl1/data/33/06"
2024/08/07 09:52:41 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/08/07 09:52:41 DEBUG : HTTP REQUEST (req 0xc00023f7a0)
2024/08/07 09:52:41 DEBUG : GET /?delimiter=&encoding-type=url&list-type=2&max-keys=1000&prefix=bs1%2Fblah_pl1%2Fdata%2F33%2F06%2F HTTP/1.1
Host: localhost:9000
User-Agent: rclone/v1.67.0
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240807T165241Z
Accept-Encoding: gzip

2024/08/07 09:52:41 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/08/07 09:52:41 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/08/07 09:52:41 DEBUG : HTTP RESPONSE (req 0xc00023f7a0)
2024/08/07 09:52:41 DEBUG : HTTP/1.1 200 OK
Content-Length: 702
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Date: Wed, 07 Aug 2024 16:52:41 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Request-Id: 17E98072722A4998
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

2024/08/07 09:52:41 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/08/07 09:52:41 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/08/07 09:52:41 DEBUG : HTTP REQUEST (req 0xc00080e000)
2024/08/07 09:52:41 DEBUG : HEAD //bs1/blah_pl1/data/33/06/3306c357d2cd2e91534b300c9bf4ca05 HTTP/1.1
Host: localhost:9000
User-Agent: rclone/v1.67.0
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240807T165241Z

2024/08/07 09:52:41 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/08/07 09:52:41 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/08/07 09:52:41 DEBUG : HTTP RESPONSE (req 0xc00080e000)
2024/08/07 09:52:41 DEBUG : HTTP/1.1 200 OK
Content-Length: 805647372
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Content-Type: binary/octet-stream
Date: Wed, 07 Aug 2024 16:52:41 GMT
Etag: "7e0432080a6bfbc7b8065d403cd1038b-97"
Last-Modified: Tue, 06 Aug 2024 21:24:44 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Request-Id: 17E98072723FF164
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

2024/08/07 09:52:41 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
3306c357d2cd2e91534b300c9bf4ca05
2024/08/07 09:52:41 DEBUG : 6 go routines active
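The HEAD response above shows why no MD5 is reported for this object: its Etag ends in `-97`, and there is no X-Amz-Meta-Md5chksum header (the metadata rclone's S3 docs describe for this case). As a quick check, here is a small shell sketch (the function name `etag_kind` is hypothetical) that classifies an ETag, assuming the common S3 convention that a bare 32-hex-digit ETag is the object's MD5, while `<hex>-N` marks an N-part multipart upload whose ETag is not the MD5 of the whole object.

```shell
# Hypothetical helper: classify an S3 ETag as copied from a --dump headers log.
# Assumes the common convention: 32 hex digits = MD5 of the object,
# 32 hex digits + "-N" = multipart upload with N parts (not a whole-object MD5).
etag_kind() {
  # Drop the surrounding double quotes, as in: Etag: "7e04...-97"
  etag=$(printf '%s' "$1" | tr -d '"')
  if printf '%s\n' "$etag" | grep -Eq '^[0-9a-fA-F]{32}-[0-9]+$'; then
    # Everything after the last "-" is the part count
    echo "multipart ${etag##*-}"
  elif printf '%s\n' "$etag" | grep -Eq '^[0-9a-fA-F]{32}$'; then
    echo "md5"
  else
    echo "unknown"
  fi
}
```

For the Etag in the dump, `etag_kind '"7e0432080a6bfbc7b8065d403cd1038b-97"'` prints `multipart 97`.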

Do I need to upload an MD5 hash header along with the data in my app, to make it so rclone can verify MD5 hashes more quickly? And is the possible lack of that header likely to be why "rclone hashsum MD5" is giving inconsistent output?

Yes, that could work.
You could also, or instead, write a script that adds the header to files that are already uploaded.
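If your app sets the header itself, the value needs to be in the form rclone expects. A minimal sketch, assuming (per rclone's S3 hash docs) that the user metadata key is `md5chksum` and that its value is the base64 encoding of the raw 16-byte MD5 digest, not the hex string that `rclone hashsum` prints. `md5chksum_value` is a hypothetical name, and `openssl` and `base64` must be installed.

```shell
# Hypothetical helper: print the base64-encoded raw MD5 digest of a file,
# i.e. the assumed format of the "md5chksum" metadata (X-Amz-Meta-Md5chksum).
md5chksum_value() {
  # -binary emits the 16 raw digest bytes instead of hex text
  openssl dgst -md5 -binary "$1" | base64
}
```

For a file containing exactly `hello`, this prints `XUFAKrxLKna5cZ2REBfFkg==`, the base64 form of the hex digest `5d41402abc4b2a76b9719d911017c592`. For objects that are already uploaded, one possibility (untested here) is a server-side copy of each object onto itself with the new metadata, for example via the AWS CLI's `--metadata` and `--metadata-directive REPLACE` options on `aws s3 cp`; REPLACE also replaces the object's other user metadata, so check that nothing you rely on is lost.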

Another option might be to use the hasher remote.
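A hasher remote wraps another remote and keeps a cache of checksums. Below is a sketch of an rclone.conf section in the same format as the config above. The name `dev-env-hashed` is hypothetical, and `<bucket>` is left as the placeholder used earlier. Whether the cached MD5 fills in the blanks for these particular objects, and how the cache gets populated, is something to confirm in the hasher docs.

```ini
# Hypothetical hasher remote layered over the existing S3 remote
[dev-env-hashed]
type = hasher
remote = dev-env-s3:<bucket>
hashes = md5
# keep cached checksums indefinitely
max_age = off
```

You would then run `rclone hashsum MD5 dev-env-hashed:bs1/blah_pl1/` against the wrapper instead of the bare S3 remote.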

Is there a good reason for the leading blanks in the output? This output format seems to make sorting by filename more complicated. E.g.:

$ rclone hashsum MD5 dev-env-s3:<bucket>/bs1/blah_pl1/ 2>&1 | sed -e 's#^   #N/A#' -e 's/   */|/' | sort -k 2,2 -t'|' | tr '|' '\t' | expand-nicely
below cmd output started 2024 Wed Aug 07 12:17:18 PM PDT
1c9fdc28ce112935e86c66185b37e761  config/1c/9f/1c9fdc28ce112935e86c66185b37e761
390f3f5057e25af05ab9c8478b54f843  config/39/0f/390f3f5057e25af05ab9c8478b54f843
3ccc560c00d5fac966695b4f6faa5020  config/3c/cc/3ccc560c00d5fac966695b4f6faa5020
4ebb0c8ba1edd4c232ce7ad04da84a16  config/4e/bb/4ebb0c8ba1edd4c232ce7ad04da84a16
93d00a715ce7b2f214c29cab9d0d5d40  config/93/d0/93d00a715ce7b2f214c29cab9d0d5d40
bbaf9263fefd934caad1cee028607f9a  config/bb/af/bbaf9263fefd934caad1cee028607f9a
0d44e8a4220591149326d53714b0bc06  data/27/53/27539304ff8733860f429ce8550b055b
284b2c9addcad99c23f636a7dbb1c315  data/31/b0/31b070bcdbf701a6ceefac670faad2ce
N/A                               data/33/06/3306c357d2cd2e91534b300c9bf4ca05
4e787ed2862caf49fbe9390baca2a2c3  data/5e/4a/5e4a8b98979e495b7f92b8e1b1a30a33
4dcf085b02231c9874b462aecc2972e4  data/78/34/78343b28e472badbf30141db204862d1
b13127acbda65e5cf6bfce28220444f7  data/8d/c6/8dc627fdea0248e634a706b291528504.index
above cmd output done    2024 Wed Aug 07 12:17:19 PM PDT

...where "expand-nicely" is from svn - Revision 11329: /expand-nicely/trunk
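Because the hash column appears to have a fixed width (32 characters for MD5, padded with spaces when the hash is missing, followed by two spaces, as the alignment of the `N/A` line above suggests), the path should always start at column 35. That means each line can be split by position instead of by runs of spaces. A sketch, assuming that layout; `hashsum_by_path` is a hypothetical name.

```shell
# Hypothetical helper: turn `rclone hashsum MD5` output into "path<TAB>hash",
# sorted by path, printing N/A when the hash field is blank.
# Assumes a 32-character hash field (all spaces if missing), two spaces, then the path.
hashsum_by_path() {
  awk '{
    hash = substr($0, 1, 32)   # fixed-width hash column
    path = substr($0, 35)      # skip the two separator spaces
    if (hash ~ /^ *$/) hash = "N/A"
    print path "\t" hash
  }' | LC_ALL=C sort          # path is the first field, so this sorts by path
}
```

Usage: `rclone hashsum MD5 dev-env-s3:<bucket>/bs1/blah_pl1/ | hashsum_by_path`. Leaving out `2>&1` keeps any log lines on stderr out of the sorted result.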

It looks like the only file with the different format is also the only one that's fairly large:

$ rclone lsl dev-env-s3:<bucket>/bs1/blah_pl1/
below cmd output started 2024 Wed Aug 07 12:23:11 PM PDT
      498 2024-08-06 15:24:54.182000000 config/1c/9f/1c9fdc28ce112935e86c66185b37e761
      498 2024-08-06 15:24:48.737000000 config/39/0f/390f3f5057e25af05ab9c8478b54f843
      498 2024-08-06 15:25:07.679000000 config/3c/cc/3ccc560c00d5fac966695b4f6faa5020
      498 2024-08-06 14:24:38.087000000 config/4e/bb/4ebb0c8ba1edd4c232ce7ad04da84a16
      498 2024-08-06 15:25:00.531000000 config/93/d0/93d00a715ce7b2f214c29cab9d0d5d40
       91 2024-08-06 14:24:37.959000000 config/bb/af/bbaf9263fefd934caad1cee028607f9a
      137 2024-08-06 14:24:37.935000000 data/27/53/27539304ff8733860f429ce8550b055b
     8551 2024-08-06 14:24:37.991000000 data/31/b0/31b070bcdbf701a6ceefac670faad2ce
805647372 2024-08-06 14:24:44.752000000 data/33/06/3306c357d2cd2e91534b300c9bf4ca05
      121 2024-08-06 14:24:38.047000000 data/5e/4a/5e4a8b98979e495b7f92b8e1b1a30a33
      132 2024-08-06 14:24:38.039000000 data/78/34/78343b28e472badbf30141db204862d1
     4148 2024-08-06 14:24:37.931000000 data/8d/c6/8dc627fdea0248e634a706b291528504.index
above cmd output done    2024 Wed Aug 07 12:23:11 PM PDT

...which is kind of the opposite of what I was (slightly) anticipating. I was thinking maybe large files would have more than one hash because of chunking that isn't needed for small files.
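Chunking does show up, just not as extra hashes: the dump earlier shows an Etag ending in `-97` for this 805647372-byte object. The part count is ceil(size / part size). The uploader's real part size is unknown, but 8 MiB (8388608 bytes) is one value that fits: 96 full parts cover 805306368 bytes, leaving 341004 bytes for a 97th part. `parts_for` is a hypothetical helper that does this with integer arithmetic.

```shell
# Hypothetical helper: number of parts needed to upload `size` bytes
# with a fixed part size, i.e. ceil(size / part_size) in integer arithmetic.
parts_for() {
  size=$1 part_size=$2
  echo $(( (size + part_size - 1) / part_size ))
}
```

`parts_for 805647372 8388608` prints `97`, while `parts_for 805647372 5242880` (5 MiB parts) prints `154`.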

?

There is only one hash per file.

That said, internally rclone does hash each chunk, as an additional form of file verification.
DEBUG : VeeamAgentUser3bef6c4c-360f-11b2-a85c-9070b4e81f40/ABP_EN10_CDRIVE_-_EN10/ABP_EN10_C2024-08-03T105418.vbk: multipart upload wrote chunk 10 with 268435456 bytes and etag "f1f8048a8817b980d29dfe7b6cefd972"
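The per-chunk etag in that log line is normally the MD5 of one chunk. For multipart uploads, S3 and many S3-compatible servers commonly build the object's ETag from those per-part digests: the MD5 of the concatenated raw part digests, followed by `-<part count>`. This is a widely observed convention rather than a documented guarantee, and reproducing it requires the uploader's exact part size. A sketch, assuming `split`, `openssl` and `md5sum` are available and the file is non-empty; `multipart_etag` is a hypothetical name.

```shell
# Hypothetical helper: compute the conventional multipart-style ETag of a local file:
# md5(concat(raw md5 of each part)) + "-" + number of parts.
multipart_etag() {
  file=$1 part_size=$2
  tmp=$(mktemp -d) || return 1
  # Cut the file into part_size-byte pieces named part.aaaa, part.aaab, ...
  split -b "$part_size" -a 4 "$file" "$tmp/part."
  n=0
  for p in "$tmp"/part.*; do
    openssl dgst -md5 -binary "$p"   # raw 16-byte digest of this part
    n=$((n + 1))
  done > "$tmp/digests"
  # Hex MD5 of the concatenated raw digests
  sum=$(md5sum < "$tmp/digests" | cut -d ' ' -f 1)
  rm -rf "$tmp"
  echo "$sum-$n"
}
```

Run against a local copy of `3306c357d2cd2e91534b300c9bf4ca05` with a part size of 8388608, the result would end in `-97`, and would match the ETag in the dump only if the uploader really used 8 MiB parts, which is an assumption.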

Maybe it's zero or one hash per file?

I'm realizing the "second hash" was because the files are named by an md5 hash!

But that doesn't explain why sometimes there's no hash reported, apart from the filename.

S3 objects can be missing a hash, in which case you'll get blank spaces in place of the hash before the file name.

The fact that your file names look like md5sums is further confusing the issue!

You can also use rclone lsjson or rclone lsf to read this info, and it may be more obvious in those formats, e.g. rclone lsf --csv -Fph remote:path
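To list only the objects that have no hash, the CSV form is easier to filter than the padded `hashsum` columns. A sketch, assuming that `-Fph` yields the path followed by the hash, that a missing hash appears as an empty field, and that no path contains a comma or quote (lsf quotes such fields in CSV mode). `paths_missing_hash` is a hypothetical name.

```shell
# Hypothetical helper: read "path,hash" CSV lines (as from rclone lsf --csv -Fph)
# and print the paths whose hash field is empty.
# Assumes no commas or quotes inside paths.
paths_missing_hash() {
  awk -F ',' '$2 == "" { print $1 }'
}
```

For example: `rclone lsf -R --csv -Fph dev-env-s3:<bucket>/bs1/blah_pl1/ | paths_missing_hash`. The `-R` flag makes lsf recurse, since by default it lists only one level.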
