What is the problem you are having with rclone?
Our destination (a standard Linux XFS file system) contains many trees with files that are identical (in name and content) to those in other trees. After rclone sync has run, we use the hardlink command to consolidate all duplicates into a single physical file that is hard-linked across all trees. This is required and saves a HUGE amount of space. After hard-linking, the timestamps and permissions of these files all become identical (the last one "wins"), so they need to be treated as irrelevant the next time we sync from the (SFTP) source. The files on the source are all physically separate (but still identical), but their timestamps can all differ. As a result, these files are copied unnecessarily when the next sync runs. How do we prevent this?
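To illustrate why the timestamps and permissions collapse to one value: a hard link is just another directory entry for the same inode, and mtime and permission bits are stored on the inode, not on the name. The minimal sketch below uses only the Python standard library and throwaway temporary files; the file names are hypothetical and the sketch does not use the hardlink tool itself.

```python
import os
import tempfile

# Two names for one inode: changing metadata through either name changes it for both.
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "tree1_file")  # hypothetical names standing in for two trees
    b = os.path.join(d, "tree2_file")
    with open(a, "wb") as f:
        f.write(b"identical content")
    os.link(a, b)  # b is now a second directory entry for a's inode

    os.utime(b, (1_600_000_000, 1_600_000_000))  # set atime/mtime through b only
    os.chmod(b, 0o640)                           # set permissions through b only

    sa, sb = os.stat(a), os.stat(b)
    same_inode = sa.st_ino == sb.st_ino
    same_mtime = sa.st_mtime == sb.st_mtime
    same_mode = sa.st_mode == sb.st_mode
    print(same_inode, same_mtime, same_mode)  # True True True on Linux/XFS
```

Whichever source file was linked last effectively sets the shared metadata, which is why the destination's timestamps no longer match most of the source copies.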
If the sizes of the source and destination files differ, then obviously the file needs to be copied. But if they are the same, is there an efficient way to tell whether their contents are identical and thus avoid a copy? Some of the files are very large, and these unnecessary copies waste a lot of time and bandwidth.
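To make the question concrete, here is a sketch of the decision being asked about: compare sizes first (cheap), and only when they match compare content hashes, ignoring modification times entirely. It runs on two local paths and is not a description of how rclone compares files; the function names are hypothetical. In the real setup the source is reached over SFTP, so hashing the source side would mean reading each file over the network unless the server computes the hash itself.

```python
import hashlib
import os
import tempfile

CHUNK = 1 << 20  # read 1 MiB at a time so very large files are never loaded whole


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def needs_copy(src: str, dst: str) -> bool:
    """Size first; content hash only when sizes match. Timestamps are never consulted."""
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True
    return sha256_of(src) != sha256_of(dst)


# Usage on throwaway files: same bytes but different mtimes -> no copy needed.
with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src.bin"), os.path.join(d, "dst.bin")
    for p in (src, dst):
        with open(p, "wb") as f:
            f.write(b"same bytes" * 1000)
    os.utime(dst, (0, 0))  # a different timestamp does not affect the result
    identical = not needs_copy(src, dst)
    with open(dst, "r+b") as f:
        f.write(b"S")  # same size, first byte changed
    changed = needs_copy(src, dst)
print(identical, changed)  # True True
```

The trade-off is that a matching size always forces a full read of both files, which is only cheaper than copying when reading is much cheaper than transferring.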
Run the command 'rclone version' and share the full output of the command.
rclone v1.57.0-DEV
- os/version: redhat 8.6
- os/kernel: 4.18.0-372.52.1.el8_6.ppc64le (ppc64le)
- os/type: linux
- os/arch: ppc64le
- go/version: go1.16.12
- go/linking: dynamic
- go/tags: none
Which cloud storage system are you using? (eg Google Drive)
No cloud storage. The source is Red Hat's business-partner Linux system, accessed via SFTP.
The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone sync --log-file=logfile --log-level=INFO redhat:our_hashed_partner_dir /our/local/destination
The rclone config contents with secrets removed.
[redhat]
type = sftp
host = sftp.connect.redhat.com
user = our_user_name
key_file = /root/.ssh/rh_ecdsa
# Work around the fact that the sftp backend doesn't (yet) support --links
# Hopefully https://github.com/rclone/rclone/issues/5011 will fix this problem.
skip_links = true
md5sum_command = none
sha1sum_command = none
A log from the command with the -vv flag
I don't have such a log available, but the log I do have contains a large number of entries of the form:
2023/07/12 15:37:10 INFO : path/to/file: Copied (replaced existing)
and a tail-end summary that typically looks like:
Transferred: 38.816 GiB / 38.816 GiB, 100%, 1.642 MiB/s, ETA 0s
Checks: 1186055 / 1186055, 100%
Transferred: 14551 / 14551, 100%
Elapsed time: 37m39.3s
As you can see, a very large amount of data is transferred even though very few files on the source (if any) have changed. If needed, I might be able to report back later with a -vv log from a run against a subset of the trees.
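Putting the summary into proportion with a little arithmetic (numbers copied from the log above; the variable names are just for illustration). The average rate here simply divides bytes by total elapsed time, which also includes the ~1.19 million checks, so it is a rough figure and not the same thing as the 1.642 MiB/s rclone reported.

```python
# Figures copied from the rclone summary above.
transferred_gib = 38.816
files_transferred = 14551
files_checked = 1186055
elapsed_s = 37 * 60 + 39.3  # "37m39.3s" in seconds

mib = transferred_gib * 1024  # GiB -> MiB
share = files_transferred / files_checked
avg_rate = mib / elapsed_s
avg_size = mib / files_transferred

print(f"share of checked files re-copied: {share:.2%}")        # 1.23%
print(f"average rate over the run: {avg_rate:.1f} MiB/s")       # 17.6 MiB/s
print(f"average size per copied file: {avg_size:.2f} MiB")      # 2.73 MiB
```

So roughly 1.2% of the checked files account for nearly 39 GiB of transfer, which is the unnecessary re-copying described above.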