Our destination (a standard Linux XFS file system) contains many trees with files that are identical (in name and content) to those in other trees. Once rclone sync has run, we use the hardlink command to consolidate all duplicates into a single physical file that is hard-linked across all trees. This is required and saves a HUGE amount of space. After hard-linking, the time stamps and permissions of these files all become identical (the last one "wins") and need to be considered irrelevant in the context of another sync from the (sftp) source. The files on the source are all physically separate (but still identical); however, their time stamps can all differ. This causes them to be copied unnecessarily when the next sync is performed. How do we prevent this?
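To illustrate why the time stamps collapse after consolidation, here is a minimal Python sketch using local temporary files. The helper names `consolidate` and `demo` and the file names are hypothetical, and this is not how the hardlink command itself is implemented. It only shows that once two names point at one inode, they necessarily share a single modification time. In this sketch the kept file's metadata survives; which one "wins" in practice depends on the tool.

```python
import os
import tempfile


def consolidate(keep: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `keep` (hypothetical helper)."""
    tmp = duplicate + ".tmp-link"
    os.link(keep, tmp)          # new directory entry pointing at keep's inode
    os.replace(tmp, duplicate)  # atomically swap it in for the duplicate


def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "tree_a.bin")
        b = os.path.join(root, "tree_b.bin")
        # Same content, deliberately different modification times.
        for path, mtime in ((a, 1_000_000_000), (b, 1_600_000_000)):
            with open(path, "wb") as f:
                f.write(b"identical payload")
            os.utime(path, (mtime, mtime))
        before = (os.stat(a).st_mtime, os.stat(b).st_mtime)

        consolidate(a, b)
        sa, sb = os.stat(a), os.stat(b)
        return {
            "mtimes_differed_before": before[0] != before[1],
            "same_inode": sa.st_ino == sb.st_ino,
            "same_mtime": sa.st_mtime == sb.st_mtime,
            "link_count": sa.st_nlink,
        }


if __name__ == "__main__":
    print(demo())
```

Because the metadata lives in the inode rather than in the directory entry, no per-tree time stamp can be preserved once the files are linked.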
If the sizes of the source and destination files are different, then obviously the file needs to be copied. But if they are the same, is there an efficient way to tell if their contents are identical and thus avoid a copy operation? Some of the files are very large, and the unnecessary copy wastes a lot of time and bandwidth.
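The decision I have in mind can be sketched as follows. This is only an illustration in Python with both files on a local file system; `needs_copy` and `file_digest` are hypothetical names, not rclone functions. Time stamps are deliberately ignored. For an SFTP source the digest would have to be computed on the remote side to save anything, since reading the file over the network costs as much as copying it, and I am assuming that is not available with the config shown below (md5sum_command and sha1sum_command are set to none).

```python
import hashlib
import os

CHUNK = 1024 * 1024  # read in 1 MiB pieces so large files never sit fully in memory


def file_digest(path: str, algo: str = "sha1") -> str:
    """Return the hex digest of a file, streamed in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def needs_copy(src: str, dst: str) -> bool:
    """Decide whether src must be copied over dst, ignoring time stamps."""
    if not os.path.exists(dst):
        return True                      # nothing at the destination yet
    if os.path.getsize(src) != os.path.getsize(dst):
        return True                      # different sizes: content must differ
    # Same size: only now pay for the more expensive content comparison.
    return file_digest(src) != file_digest(dst)
```

The size check comes first because it is cheap and settles most changed files; the digest is only computed for same-size pairs.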
rclone v1.57.0-DEV
- os/version: redhat 8.6
- os/kernel: 4.18.0-372.52.1.el8_6.ppc64le (ppc64le)
- os/type: linux
- os/arch: ppc64le
- go/version: go1.16.12
- go/linking: dynamic
- go/tags: none
No cloud storage. The source is Red Hat's business partner Linux system accessed via SFTP.
rclone sync --log-file=logfile --log-level=INFO redhat:our_hashed_partner_dir /our/local/destination
[redhat]
type = sftp
host = sftp.connect.redhat.com
user = our_user_name
key_file = /root/.ssh/rh_ecdsa
# Work around the fact that the sftp backend doesn't (yet) support --links
# Hopefully https://github.com/rclone/rclone/issues/5011 will fix this problem.
skip_links = true
md5sum_command = none
sha1sum_command = none
I don't have a -vv log available, but the log I do have contains a large number of entries of the form:
2023/07/12 15:37:10 INFO : path/to/file: Copied (replaced existing)
and a tail-end summary that typically looks like:
Transferred: 38.816 GiB / 38.816 GiB, 100%, 1.642 MiB/s, ETA 0s
Checks: 1186055 / 1186055, 100%
Transferred: 14551 / 14551, 100%
Elapsed time: 37m39.3s
As you can see, a very large amount of data is transferred when very few files on the source (if any) have been changed. I might be able to report back later with a -vv log run against a subset of the trees if needed.