I'm looking for a way to convince rclone to re-create a hard-linked target file so the new file is freshly written and other linked instances are not affected.
We have an environment populated by rclone sync where the target file system contains a huge number of identical copies of identically-named files. To manage the amount of space consumed, after all syncing is performed we run a scheduled process to hard-link all identical files so that only one physical copy of the data is stored. Any hard-linking on the source system that might exist is ignored; we make no attempt to replicate hard-links during the sync.
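To make the effect of that hard-linking step concrete, here is a minimal sketch of what happens to one pair of identical files. It is not our actual scheduled process (that uses the hardlink utility); the scratch directory and the cmp/ln -f approach are assumptions for illustration only, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Illustration only: a hand-rolled stand-in for one pass of a dedup tool.
set -eu
work=$(mktemp -d)    # hypothetical scratch directory
mkdir -p "$work/tree1" "$work/tree2"
echo identical > "$work/tree1/identical"
echo identical > "$work/tree2/identical"

# If the two copies match byte for byte, replace the second copy
# with a hard link to the first so only one inode holds the data.
if cmp -s "$work/tree1/identical" "$work/tree2/identical"; then
    ln -f "$work/tree1/identical" "$work/tree2/identical"
fi

inode1=$(stat -c %i "$work/tree1/identical")
inode2=$(stat -c %i "$work/tree2/identical")
links=$(stat -c %h "$work/tree1/identical")
echo "inodes: $inode1 $inode2, link count: $links"
```

After this runs, both names report the same inode and a link count of 2, which is the state our target tree is in before the next sync.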
The problem arises when any such file is updated on the source to have differing content. When rclone sync sees that an update is required, the source file is copied to the target in a manner that causes any existing hard link to be retained rather than discarded, causing all other instances of that file to be updated as well. I recognize that this is probably desirable in some cases and that underlying open() calls might be designed to operate this way, but in our case the behavior is very undesirable. My guess is that an underlying unlink() prior to the open() would "fix" the problem for us, but that is of course dependent on the back-end being used. In our case, the target back-end is always the local file system.
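The difference I have in mind between overwriting in place and unlinking first can be shown without rclone at all. This is only a sketch of the file system behavior, not of what rclone does internally; the file names a, b, c, d and the scratch directory are hypothetical.

```shell
#!/bin/sh
# Illustration only: plain shell redirection stands in for open() with truncation.
set -eu
demo=$(mktemp -d)    # hypothetical scratch directory

# Case 1: overwrite in place.
echo identical > "$demo/a"
ln "$demo/a" "$demo/b"            # a and b now share one inode
echo different > "$demo/a"        # truncates and rewrites the shared inode
inplace_b=$(cat "$demo/b")        # b sees the new content

# Case 2: unlink first, then create a fresh file.
echo identical > "$demo/c"
ln "$demo/c" "$demo/d"
rm "$demo/c"                      # removes only the name c; d keeps the old inode
echo different > "$demo/c"        # creates a brand-new inode for c
unlinked_d=$(cat "$demo/d")       # d still holds the old content

echo "in-place overwrite: b contains '$inplace_b'"
echo "unlink then write:  d contains '$unlinked_d'"
```

In the first case b ends up containing "different"; in the second case d still contains "identical". The second case is the behavior we are hoping to get from the sync.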
The following scenario duplicates the problem:
    # Create the source tree and display the results
    mkdir -p source/tree1 source/tree2
    echo identical > source/tree1/identical
    echo identical > source/tree2/identical
    stat --printf="Name: %n, inode: %i\n" source/*/*
    cat source/*/identical

    # Sync the source tree to the target and display the results
    rclone -vv sync source target
    stat --printf="Name: %n, inode: %i\n" target/*/*
    cat target/*/identical

    # Hard-link the target tree to save space, and display the results
    hardlink -c target
    stat --printf="Name: %n, inode: %i\n" target/*/*
    cat target/*/identical

    # Modify a file on the source and display the results
    echo different > source/tree1/identical
    stat --printf="Name: %n, inode: %i\n" source/*/*
    cat source/*/identical

    # Sync the source tree to the target and display the results
    rclone -vv sync source target
    stat --printf="Name: %n, inode: %i\n" target/*/*
    cat target/*/identical
Notice in the final step that only source/tree1/identical is copied to target/tree1/identical, but both target files remain hard-linked, so target/tree2/identical is modified as well. We need a way for rclone to prevent this from happening.
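For anyone who wants to confirm which target files still share an inode after a sync, a listing along these lines may help. It is a sketch under assumptions: the scratch tree below is a hypothetical stand-in for the real target directory, and the -printf option assumes GNU find.

```shell
#!/bin/sh
# Illustration only: list files that have more than one name (hard links).
set -eu
target=$(mktemp -d)    # hypothetical stand-in for the real target tree
mkdir -p "$target/tree1" "$target/tree2" "$target/tree3"
echo identical > "$target/tree1/identical"
ln "$target/tree1/identical" "$target/tree2/identical"
echo unrelated > "$target/tree3/single"

# Print inode, link count and path for every regular file with a link
# count above 1; sorting by inode puts names that share storage together.
linked=$(find "$target" -type f -links +1 -printf '%i %n %p\n' | sort -n)
echo "$linked"
linked_count=$(printf '%s\n' "$linked" | grep -c .)
```

Here the two linked names are listed with the same inode number, and tree3/single is omitted because its link count is 1.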
rclone v1.57.0-DEV
- os/version: redhat 8.9
- os/kernel: 4.18.0-477.27.1.el8_8.ppc64le (ppc64le)
- os/type: linux
- os/arch: ppc64le
- go/version: go1.16.12
- go/linking: dynamic
- go/tags: none
rclone sync source target
Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.
We run the standard RHEL8 version of rclone, which does not have a redacted subcommand. No remotes or config file are defined for this test case.
Posting the output from rclone config show instead:
2023/11/20 14:24:11 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
; empty config
2023/11/20 14:22:42 DEBUG : rclone: Version "v1.57.0-DEV" starting with parameters ["rclone" "-vv" "sync" "source" "target"]
2023/11/20 14:22:42 DEBUG : Creating backend with remote "source"
2023/11/20 14:22:42 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
2023/11/20 14:22:42 DEBUG : fs cache: renaming cache item "source" to be canonical "/root/source"
2023/11/20 14:22:42 DEBUG : Creating backend with remote "target"
2023/11/20 14:22:42 DEBUG : fs cache: renaming cache item "target" to be canonical "/root/target"
2023/11/20 14:22:42 DEBUG : tree1/identical: Modification times differ by -119.99524ms: 2023-11-20 14:22:42.452213728 -0600 CST, 2023-11-20 14:22:42.332218488 -0600 CST
2023/11/20 14:22:42 DEBUG : tree2/identical: Size and modification time the same (differ by 0s, within tolerance 1ns)
2023/11/20 14:22:42 DEBUG : tree2/identical: Unchanged skipping
2023/11/20 14:22:42 DEBUG : Local file system at /root/target: Waiting for checks to finish
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 = e77d9b8dcb84d1fcd21187b03eac74f1 (Local file system at /root/source)
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 = e7576c27844afc0a30690ae46a264bf2 (Local file system at /root/target)
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 differ
2023/11/20 14:22:42 DEBUG : Local file system at /root/target: Waiting for transfers to finish
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 = e77d9b8dcb84d1fcd21187b03eac74f1 OK
2023/11/20 14:22:42 INFO : tree1/identical: Copied (replaced existing)
2023/11/20 14:22:42 DEBUG : Waiting for deletions to finish
2023/11/20 14:22:42 INFO :
Transferred: 10 B / 10 B, 100%, 0 B/s, ETA -
Checks: 2 / 2, 100%
Transferred: 1 / 1, 100%
Elapsed time: 0.0s
2023/11/20 14:22:42 DEBUG : 3 go routines active