Data Integrity for Copy Command

What is your rclone version (output from rclone version)?

v1.52.2 / v1.53.1

Which OS are you using, and how many bits (e.g. Windows 7, 64-bit)?

Windows 10, CentOS 7

Which cloud storage system are you using (e.g. Google Drive)?

AWS S3, S3-compatible storage, local filesystem

I am looking for clarity on which data integrity checks are performed, and how they are performed, for the remote control operations/copyfile command (or any copy command).

As a test, I copied a ~1 GB file to a local filesystem. While the copy was in progress, I replaced the destination file with a different file of the same name containing different data. When rclone finished, the destination file contained the different data followed by the data from the source file.

The rclone log is below. It indicates that an MD5 check was performed, and that it passed.

2020/09/23 19:22:52 DEBUG : rclone: Version "v1.52.2-DEV" starting with parameters ["C:\\workspace\\redacted\\src\\redacted.Core\\bin\\Debug\\netcoreapp2.1\\Dependencies\\rclone.exe" "rcd" "--log-level" "DEBUG" "--rc-no-auth" "--rc-addr=127.0.0.1:5815" "--log-file=C:\\ProgramData\\redacted\\redacted\\Logs\\remote-control-output\\173E26BF25E9E140_redacted\\output.log" "--copy-links" "--local-no-set-modtime"]
2020/09/23 19:22:52 NOTICE: Serving remote control on http://127.0.0.1:5815/
2020/09/23 19:22:52 DEBUG : rc: "rc/noop": with parameters map[]
2020/09/23 19:22:52 DEBUG : rc: "rc/noop": reply map[]: <nil>
2020/09/23 19:23:08 DEBUG : rc: "operations/copyfile": with parameters map[_group:C:\Helpers\redacted\run-folders\run-folder-test\created-file-big.txt dstFs:C:\Helpers\redacted\output\redacted-filecopy dstRemote:__created-file-big.tmp.txt srcFs:C:\Helpers\redacted\run-folders\run-folder-test srcRemote:created-file-big.txt]
2020/09/23 19:23:08 DEBUG : rc: "operations/copyfile": reply map[jobid:2]: <nil>
2020/09/23 19:23:08 DEBUG : Using config file from "C:\\Users\\redacted\\.config\\rclone\\rclone.conf"
2020/09/23 19:23:08 DEBUG : fs cache: renaming cache item "C:\\Helpers\\redacted\\run-folders\\run-folder-test" to be canonical "//?/C:/Helpers/redacted/run-folders/run-folder-test"
2020/09/23 19:23:08 DEBUG : fs cache: renaming cache item "C:\\Helpers\\redacted\\output\\redacted-filecopy" to be canonical "//?/C:/Helpers/redacted/output/redacted-filecopy"
2020/09/23 19:23:08 DEBUG : created-file-big.txt: Need to transfer - File not found at Destination
2020/09/23 19:23:10 DEBUG : created-file-big.txt: MD5 = e37115d4da0e187130ab645dee4f14ed OK
2020/09/23 19:23:10 INFO  : created-file-big.txt: Copied (new)
2020/09/23 19:23:11 DEBUG : rc: "job/status": with parameters map[jobid:2]
2020/09/23 19:23:11 DEBUG : rc: "job/status": reply map[duration:1.8909105 endTime:2020-09-23T19:23:10.2444003-07:00 error: finished:true group:C:\Helpers\redacted\run-folders\run-folder-test\created-file-big.txt id:2 output:map[] startTime:2020-09-23T19:23:08.3534898-07:00 success:true]: <nil>
2020/09/23 19:26:17 DEBUG : rc: "core/quit": with parameters map[]
2020/09/23 19:26:17 DEBUG : rc: "core/quit": reply map[]: <nil>

A few questions here:

  • Are any special flags needed for rclone to perform data integrity checks for the copy command? Does this differ by remote type? We plan to copy to AWS S3, S3-compatible storage, and a local filesystem.
    The --ignore-checksum flag (https://rclone.org/docs/#ignore-checksum) seems to indicate that rclone verifies data integrity by default. However, the --checksum flag (https://rclone.org/docs/#c-checksum) seems to indicate that it is needed for data integrity.

  • How is the data integrity check performed?
    Is it done on a buffer-by-buffer basis, or is the MD5 of the entire file contents compared?

Depending on the source and destination remotes, rclone uses checksums, when both sides support a common hash type, to verify data integrity. If you want an additional check after the copy, you can also run rclone check.

You can see which hash types each provider supports in the overview table:

https://rclone.org/overview/

Hello Animosity,
Does that mean rclone will automatically validate checksums (if the remote provides them) without any extra flags for the copy or rcd commands? According to the linked table, the remotes we use all support hashes.

Do you have any insight into how rclone verifies the checksum: per buffer, or over the entire file?

In the debug log you shared, you can see that it performs the MD5 check. Checks can be turned off with flags, so without seeing the whole command it's hard to say with 100% certainty.

I normally just use rclone copy with only a few flags.
