Summary (and TL;DR)
Priority: LOW
The behavior of rclone when --checksum is set but a checksum is missing is undocumented (it falls back to --size-only) and arguably unexpected.
Proposed Fixes:
- Update the documentation (official docs and code comments) to describe the current behavior (i.e. fall back to --size-only). I am happy to do this once we have clarity.
- Change the behavior so that if --checksum is used but the hash is missing, rclone falls back to the default comparison unless another flag is also set. (I wouldn't want it to do --size-only since that is confusing, but that may still be the right answer.)
What is the problem you are having with rclone?
The empirical behavior of rclone when a hash is missing (e.g. on a bucket remote in a scenario where it doesn't get set) is officially undocumented and inconsistent with the source code comments.
Run the command 'rclone version' and share the full output of the command.
rclone v1.62.2
- os/version: debian 11.7 (64 bit)
- os/kernel: 5.10.0-23-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.2
- go/linking: static
- go/tags: none
Which cloud storage system are you using? (eg Google Drive)
B2
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Setup
To test this, I created two files on B2, as shown in the listing below. One is small and has a hash; the other is large and was stream uploaded, so rclone did not set a hash.
$ rclone lsjson --fast-list --files-only -R --hash b2:jgwtestingrclone/copies/
[
{"Path":"onegb.bin","Name":"onegb.bin","Size":1073741824,"MimeType":"application/octet-stream","ModTime":"2023-06-21T21:05:48.775Z","IsDir":false,"ID":"4_z37bfa7d759395daa6aad0913_f208632bdf84de16e_d20230621_m210610_c000_v0001076_t0031_u01687381570463"},
{"Path":"ten.bin","Name":"ten.bin","Size":10,"MimeType":"application/octet-stream","ModTime":"2023-06-21T21:03:19.043Z","IsDir":false,"Hashes":{"sha1":"fda2742273ae22087862a5590cfe83a885b43c71"},"ID":"4_z37bfa7d759395daa6aad0913_f1040a26bc9eec501_d20230621_m210328_c000_v0001058_t0001_u01687381408332"}
]
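To make it easier to spot which objects will hit the fallback, here is a minimal sketch (not part of rclone) that parses `rclone lsjson --hash` output and lists the files with no hash. The helper name `paths_missing_hash` and the abbreviated JSON are assumptions for illustration; only the `Path`, `IsDir` and `Hashes` fields seen in the listing above are relied on.

```python
import json

# Abbreviated copy of the lsjson listing above (unrelated fields dropped).
LSJSON_OUTPUT = """
[
{"Path":"onegb.bin","Name":"onegb.bin","Size":1073741824,"IsDir":false},
{"Path":"ten.bin","Name":"ten.bin","Size":10,"IsDir":false,
 "Hashes":{"sha1":"fda2742273ae22087862a5590cfe83a885b43c71"}}
]
"""


def paths_missing_hash(lsjson_text, hash_name="sha1"):
    """Return the paths of files whose listing has no value for hash_name.

    Hypothetical helper: it only inspects the JSON text it is given.
    """
    entries = json.loads(lsjson_text)
    return [
        entry["Path"]
        for entry in entries
        if not entry.get("IsDir", False)
        and not entry.get("Hashes", {}).get(hash_name)
    ]


print(paths_missing_hash(LSJSON_OUTPUT))  # ['onegb.bin']
```

In practice the text would come from piping the real `rclone lsjson` output into the script rather than from a hard-coded string.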
I then did a vanilla rclone copy to my "local" machine (really a VPS). I then changed the first 10 bytes of each file (in Python):
import os

# Overwrite the first 10 bytes of each file in place; the size is unchanged.
with open('tmp/onegb.bin', 'r+b') as fp:
    fp.write(os.urandom(10))
with open('tmp/ten.bin', 'r+b') as fp:
    fp.write(os.urandom(10))
This leaves both files the same size but with a different ModTime and a different checksum.
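For anyone who wants to confirm that precondition without touching real data, the following is a small self-contained sketch. It applies the same in-place overwrite to a throwaway file and compares size and SHA-1 before and after. The helper names (`sha1_of`, `overwrite_prefix`) and the 1 KiB demo file are assumptions for illustration, not part of my actual test.

```python
import hashlib
import os
import tempfile


def sha1_of(path):
    """Return the hex SHA-1 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def overwrite_prefix(path, n=10):
    """Overwrite the first n bytes in place; return (size, sha1) before and after."""
    before = (os.path.getsize(path), sha1_of(path))
    with open(path, "r+b") as fp:
        fp.write(os.urandom(n))
    after = (os.path.getsize(path), sha1_of(path))
    return before, after


with tempfile.TemporaryDirectory() as tmpdir:
    demo = os.path.join(tmpdir, "demo.bin")
    with open(demo, "wb") as fp:
        fp.write(bytes(1024))  # 1 KiB of zeros as stand-in content
    before, after = overwrite_prefix(demo)
    print("same size:", before[0] == after[0])
    print("same sha1:", before[1] == after[1])
```

The size stays the same because `r+b` opens the file for update without truncating it, which is exactly why a size-only comparison cannot see the change.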
The Test
Without --checksum, it compares ModTime, sees that it differs, and does a transfer:
$ rclone copy -vv b2:jgwtestingrclone/copies/ tmp/ -n
2023/06/21 21:45:20 DEBUG : Setting --password-command "rclone-pass-store echo" from environment variable RCLONE_PASSWORD_COMMAND="rclone-pass-store echo"
2023/06/21 21:45:20 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "-vv" "b2:jgwtestingrclone/copies/" "tmp/" "-n"]
2023/06/21 21:45:20 DEBUG : Creating backend with remote "b2:jgwtestingrclone/copies/"
2023/06/21 21:45:20 DEBUG : Using config file from "/home/jwinokur/.config/rclone/rclone.conf"
2023/06/21 21:45:21 DEBUG : Couldn't decode error response: EOF
2023/06/21 21:45:21 DEBUG : fs cache: renaming cache item "b2:jgwtestingrclone/copies/" to be canonical "b2:jgwtestingrclone/copies"
2023/06/21 21:45:21 DEBUG : Creating backend with remote "tmp/"
2023/06/21 21:45:21 DEBUG : fs cache: renaming cache item "tmp/" to be canonical "/home/jwinokur/tmp"
2023/06/21 21:45:21 DEBUG : onegb.bin: Modification times differ by 36m12.509589187s: 2023-06-21 21:05:48.775 +0000 UTC, 2023-06-21 21:42:01.284589187 +0000 UTC
2023/06/21 21:45:21 DEBUG : ten.bin: Modification times differ by 41m18.508929363s: 2023-06-21 21:03:19.043 +0000 UTC, 2023-06-21 21:44:37.551929363 +0000 UTC
2023/06/21 21:45:21 DEBUG : onegb.bin: Src hash empty - aborting Dst hash check
2023/06/21 21:45:21 DEBUG : Local file system at /home/jwinokur/tmp: Waiting for checks to finish
2023/06/21 21:45:21 DEBUG : ten.bin: sha1 = fda2742273ae22087862a5590cfe83a885b43c71 (B2 bucket jgwtestingrclone path copies)
2023/06/21 21:45:21 DEBUG : ten.bin: sha1 = 1bad39537bcc6f0e60046fcedef1afdb8f0cf76f (Local file system at /home/jwinokur/tmp)
2023/06/21 21:45:21 DEBUG : ten.bin: sha1 differ
2023/06/21 21:45:21 NOTICE: ten.bin: Skipped copy as --dry-run is set (size 10)
2023/06/21 21:45:21 DEBUG : Local file system at /home/jwinokur/tmp: Waiting for transfers to finish
2023/06/21 21:45:21 NOTICE: onegb.bin: Skipped copy as --dry-run is set (size 1Gi)
2023/06/21 21:45:21 NOTICE:
Transferred: 1.000 GiB / 1.000 GiB, 100%, 0 B/s, ETA -
Checks: 2 / 2, 100%
Transferred: 2 / 2, 100%
Elapsed time: 0.7s
2023/06/21 21:45:21 DEBUG : 9 go routines active
(There is also the oddity that it computed the hash of my small local file???)
Now, with the --checksum flag:
$ rclone copy -vv --checksum b2:jgwtestingrclone/copies/ tmp/ -n
2023/06/21 21:46:51 DEBUG : Setting --password-command "rclone-pass-store echo" from environment variable RCLONE_PASSWORD_COMMAND="rclone-pass-store echo"
2023/06/21 21:46:51 DEBUG : rclone: Version "v1.62.2" starting with parameters ["rclone" "copy" "-vv" "--checksum" "b2:jgwtestingrclone/copies/" "tmp/" "-n"]
2023/06/21 21:46:51 DEBUG : Creating backend with remote "b2:jgwtestingrclone/copies/"
2023/06/21 21:46:51 DEBUG : Using config file from "/home/jwinokur/.config/rclone/rclone.conf"
2023/06/21 21:46:51 DEBUG : Couldn't decode error response: EOF
2023/06/21 21:46:51 DEBUG : fs cache: renaming cache item "b2:jgwtestingrclone/copies/" to be canonical "b2:jgwtestingrclone/copies"
2023/06/21 21:46:51 DEBUG : Creating backend with remote "tmp/"
2023/06/21 21:46:51 DEBUG : fs cache: renaming cache item "tmp/" to be canonical "/home/jwinokur/tmp"
2023/06/21 21:46:51 DEBUG : Local file system at /home/jwinokur/tmp: Waiting for checks to finish
2023/06/21 21:46:51 DEBUG : onegb.bin: Src hash empty - aborting Dst hash check
2023/06/21 21:46:51 DEBUG : ten.bin: sha1 = fda2742273ae22087862a5590cfe83a885b43c71 (B2 bucket jgwtestingrclone path copies)
2023/06/21 21:46:51 DEBUG : ten.bin: sha1 = 1bad39537bcc6f0e60046fcedef1afdb8f0cf76f (Local file system at /home/jwinokur/tmp)
2023/06/21 21:46:51 DEBUG : ten.bin: sha1 differ
2023/06/21 21:46:51 NOTICE: ten.bin: Skipped copy as --dry-run is set (size 10)
2023/06/21 21:46:51 DEBUG : onegb.bin: Size of src and dst objects identical
2023/06/21 21:46:51 DEBUG : onegb.bin: Unchanged skipping
2023/06/21 21:46:51 DEBUG : Local file system at /home/jwinokur/tmp: Waiting for transfers to finish
2023/06/21 21:46:51 NOTICE:
Transferred: 10 B / 10 B, 100%, 0 B/s, ETA -
Checks: 2 / 2, 100%
Transferred: 1 / 1, 100%
Elapsed time: 0.6s
2023/06/21 21:46:51 DEBUG : 9 go routines active
Now you see it incorrectly doesn't want to copy the large file. It aborts the hash check and just uses size.
First of all, as far as I can tell, this behavior is not described in the documentation. The code comments offer some more hints (permalink):
// Equal checks to see if the src and dst objects are equal by looking at
// size, mtime and hash
//
// If the src and dst size are different then it is considered to be
// not equal. If --size-only is in effect then this is the only check
// that is done. If --ignore-size is in effect then this check is
// skipped and the files are considered the same size.
//
// If the size is the same and the mtime is the same then it is
// considered to be equal. This check is skipped if using --checksum.
//
// If the size is the same and mtime is different, unreadable or
// --checksum is set and the hash is the same then the file is
// considered to be equal. In this case the mtime on the dst is
// updated if --checksum is not set.
//
// Otherwise the file is considered to be not equal including if there
// were errors reading info.
In particular, that last comment implies that a failed hash check should result in a transfer.
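To make the mismatch concrete, here is a simplified Python model of the two decision rules. This is a sketch under stated assumptions, not rclone's actual implementation: `equal_observed` encodes only what the two logs above show, `equal_per_comment` encodes my reading of the comment (treating a missing hash as a failed check), and the `Obj` type, the mtime values and the destination hash for onegb.bin are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    size: int
    mtime: float          # seconds; values below are placeholders
    sha1: Optional[str]   # None models a missing hash


def _hashes_usable(src, dst):
    return src.sha1 is not None and dst.sha1 is not None


def equal_observed(src, dst, checksum):
    """Model of the behavior seen in the logs (assumption, not rclone code)."""
    if src.size != dst.size:
        return False
    if not checksum and src.mtime == dst.mtime:
        return True
    if _hashes_usable(src, dst):
        return src.sha1 == dst.sha1
    # Hash check aborted. With --checksum the size match was accepted
    # ("Unchanged skipping"); without it the file was transferred.
    return checksum


def equal_per_comment(src, dst, checksum):
    """One reading of the comment: an unusable hash means 'not equal'."""
    if src.size != dst.size:
        return False
    if not checksum and src.mtime == dst.mtime:
        return True
    return _hashes_usable(src, dst) and src.sha1 == dst.sha1


GIB = 1073741824
onegb_src = Obj(GIB, 100.0, None)                 # B2 object, no hash set
onegb_dst = Obj(GIB, 200.0, "hypothetical-sha1")  # modified local copy
ten_src = Obj(10, 100.0, "fda2742273ae22087862a5590cfe83a885b43c71")
ten_dst = Obj(10, 200.0, "1bad39537bcc6f0e60046fcedef1afdb8f0cf76f")

print(equal_observed(onegb_src, onegb_dst, checksum=True))     # True: skipped
print(equal_per_comment(onegb_src, onegb_dst, checksum=True))  # False: transfer
print(equal_observed(ten_src, ten_dst, checksum=True))         # False: transfer
```

The only case where the two models disagree is --checksum with a missing hash, which is the case this issue is about.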
A log from the command with the -vv flag
Inline above.
The rclone config contents with secrets removed.
Shouldn't matter but...
[b2]
type = b2
account = REDACTED
key = REDACTED