Thank you for the clear explanation.
Going back to your original post, you wrote:
"I guess I'd like to think about why rclone does that."
The --checksum flag was introduced following rsync's usage, which I've reproduced here:
-c, --checksum
This changes the way rsync checks if the files have been changed
and are in need of a transfer. Without this option, rsync uses
a "quick check" that (by default) checks if each file’s size and
time of last modification match between the sender and receiver.
This option changes this to compare a 128-bit checksum for each
file that has a matching size. Generating the checksums means
that both sides will expend a lot of disk I/O reading all the
data in the files in the transfer (and this is prior to any
reading that will be done to transfer changed files), so this
can slow things down significantly.

The sending side generates its checksums while it is doing the
file-system scan that builds the list of the available files.
The receiver generates its checksums when it is scanning for
changed files, and will checksum any file that has the same size
as the corresponding sender’s file: files with either a changed
size or a changed checksum are selected for transfer.

Note that rsync always verifies that each transferred file was
correctly reconstructed on the receiving side by checking a
whole-file checksum that is generated as the file is transferred,
but that automatic after-the-transfer verification has
nothing to do with this option’s before-the-transfer "Does this
file need to be updated?" check.

For protocol 30 and beyond (first supported in 3.0.0), the
checksum used is MD5. For older protocols, the checksum used is
MD4.
Note that rsync doesn't make any mention of modification times here, and if I try rsync I find it does update the modification time with the --checksum flag.
Originally --checksum was introduced for remotes which didn't support modification times, hence the limitation.
So, to retain rsync compatibility, we should update the modtime when using the --checksum flag.
However, to retain backwards compatibility with previous versions of rclone, we don't want to update the modtime. Imagine someone has set up a large s3 to s3 sync with --checksum, which is the most efficient way of doing it. Making rclone set modtimes might cause every file to have its modtime set in the destination, which would cause lots of expensive COPY operations on s3 (S3 can only change an object's metadata by copying the object onto itself).
We could use a flag as you suggested, or we could use an existing flag:
--no-update-modtime   Don't update destination mod-time if files identical.
We would then make setting the modtime the default behaviour and call out the change in the release notes. This would make syncing consistent across the modtime, size and checksum modes, and consistent with rsync.
That would at least avoid having to add another flag, which would certainly confuse users:
--update-modtime   Update the modtime if it is incorrect when using --checksum
Thoughts, @nickgaya and @ivandeex?