Hard-linked target files not being unlinked before copy

What is the problem you are having with rclone?

I'm looking for a way to convince rclone to re-create a hard-linked target file so the new file is freshly written and other linked instances are not affected.

We have an environment populated by rclone sync where the target file system contains a huge number of identical copies of identically-named files. To manage the amount of space consumed, after all syncing is performed we run a scheduled process to hard-link all identical files so that only one physical copy of the data is stored. Any hard-linking on the source system that might exist is ignored; we make no attempt to replicate hard-links during the sync.

The problem arises when any such file is updated on the source to have differing content. When rclone sync sees that an update is required, it copies the source file to the target in a manner that retains the existing hard link rather than breaking it, so every other linked instance of that file is updated as well. I recognize that this is probably desirable in some cases and that the underlying open() calls might be designed to operate this way, but in our case the behavior is very undesirable. My guess is that an underlying unlink() prior to the open() would "fix" the problem for us, but that of course depends on the back-end being used. In our case, the target back-end is always the local file system.
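
The difference between the two behaviors can be sketched without rclone at all. This is only an illustration of the guess above, assuming a POSIX shell on a local file system. The file names `a` and `b` are made up for the example and this is not what rclone itself runs:

```shell
#!/bin/sh
# Illustration only: overwrite a hard-linked file in place, then unlink it first.
work=$(mktemp -d)
echo identical > "$work/a"
ln "$work/a" "$work/b"          # a and b now share one inode

# Case 1: in-place overwrite. The redirection truncates and rewrites the
# existing inode, so b changes together with a.
echo different > "$work/a"
inplace_b=$(cat "$work/b")

# Reset both names to the original content (they still share the inode).
echo identical > "$work/a"

# Case 2: unlink first. rm removes only the directory entry "a"; b keeps
# the old inode, and the next write creates a brand new file.
rm -f "$work/a"
echo different > "$work/a"
unlinked_b=$(cat "$work/b")

echo "in-place overwrite: b contains '$inplace_b'"
echo "unlink then write:  b contains '$unlinked_b'"
rm -rf "$work"
```

In the first case `b` follows the change; in the second it keeps the original content.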

The following scenario duplicates the problem:

# Create the source tree and display the results
mkdir -p source/tree1 source/tree2
echo identical > source/tree1/identical
echo identical > source/tree2/identical
stat --printf="Name: %n, inode: %i\n"  source/*/*
cat source/*/identical

# Sync the source tree to the target and display the results
rclone -vv sync source target
stat --printf="Name: %n, inode: %i\n"  target/*/*
cat target/*/identical

# Hard-link the target tree to save space, and display the results
hardlink -c target
stat --printf="Name: %n, inode: %i\n"  target/*/*
cat target/*/identical

# Modify a file on the source and display the results
echo different > source/tree1/identical
stat --printf="Name: %n, inode: %i\n"  source/*/*
cat source/*/identical

# Sync the source tree to the target and display the results
rclone -vv sync source target
stat --printf="Name: %n, inode: %i\n"  target/*/*
cat target/*/identical

Notice in the final step that only source/tree1/identical is copied to target/tree1/identical, but that both files remain hard-linked, and target/tree2/identical is modified accordingly. We need a way for rclone to prevent this from happening.

Run the command 'rclone version' and share the full output of the command.

rclone v1.57.0-DEV
- os/version: redhat 8.9
- os/kernel: 4.18.0-477.27.1.el8_8.ppc64le (ppc64le)
- os/type: linux
- os/arch: ppc64le
- go/version: go1.16.12
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

None.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync source target

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

We run the standard RHEL8 version of rclone, which does not have a redacted subcommand. No remotes or config file are defined for this test case.

Posting the output from rclone config show instead:

2023/11/20 14:24:11 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
; empty config

A log from the command that you were trying to run with the -vv flag

2023/11/20 14:22:42 DEBUG : rclone: Version "v1.57.0-DEV" starting with parameters ["rclone" "-vv" "sync" "source" "target"]
2023/11/20 14:22:42 DEBUG : Creating backend with remote "source"
2023/11/20 14:22:42 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
2023/11/20 14:22:42 DEBUG : fs cache: renaming cache item "source" to be canonical "/root/source"
2023/11/20 14:22:42 DEBUG : Creating backend with remote "target"
2023/11/20 14:22:42 DEBUG : fs cache: renaming cache item "target" to be canonical "/root/target"
2023/11/20 14:22:42 DEBUG : tree1/identical: Modification times differ by -119.99524ms: 2023-11-20 14:22:42.452213728 -0600 CST, 2023-11-20 14:22:42.332218488 -0600 CST
2023/11/20 14:22:42 DEBUG : tree2/identical: Size and modification time the same (differ by 0s, within tolerance 1ns)
2023/11/20 14:22:42 DEBUG : tree2/identical: Unchanged skipping
2023/11/20 14:22:42 DEBUG : Local file system at /root/target: Waiting for checks to finish
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 = e77d9b8dcb84d1fcd21187b03eac74f1 (Local file system at /root/source)
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 = e7576c27844afc0a30690ae46a264bf2 (Local file system at /root/target)
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 differ
2023/11/20 14:22:42 DEBUG : Local file system at /root/target: Waiting for transfers to finish
2023/11/20 14:22:42 DEBUG : tree1/identical: md5 = e77d9b8dcb84d1fcd21187b03eac74f1 OK
2023/11/20 14:22:42 INFO  : tree1/identical: Copied (replaced existing)
2023/11/20 14:22:42 DEBUG : Waiting for deletions to finish
2023/11/20 14:22:42 INFO  : 
Transferred:             10 B / 10 B, 100%, 0 B/s, ETA -
Checks:                 2 / 2, 100%
Transferred:            1 / 1, 100%
Elapsed time:         0.0s

2023/11/20 14:22:42 DEBUG : 3 go routines active

What you need is a more recent rclone...

Since v1.63, rclone copies files to the local file system under a temporary name and then renames them over the destination once they have been verified. This will break the hard links as you require.
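
The effect of that copy-then-rename approach can be reproduced by hand. The following is a rough sketch in plain shell, not rclone's actual code: the temporary name `a.tmp` is hypothetical, and `cmp` stands in for rclone's own verification step.

```shell
#!/bin/sh
# Illustration only: replace a hard-linked file via a temporary name and rename.
work=$(mktemp -d)
echo identical > "$work/a"
ln "$work/a" "$work/b"          # a and b share one inode
echo different > "$work/src"    # the updated source file

tmp="$work/a.tmp"               # hypothetical temporary name
cp "$work/src" "$tmp"           # written to a new inode, a and b untouched

# Rename over the destination only if the copy matches the source.
# Within one file system, mv swaps the directory entry "a" to the new
# inode and leaves the old inode (still referenced by b) unchanged.
if cmp -s "$work/src" "$tmp"; then
    mv -f "$tmp" "$work/a"
fi

content_a=$(cat "$work/a")
content_b=$(cat "$work/b")
inode_a=$(ls -i "$work/a" | awk '{print $1}')
inode_b=$(ls -i "$work/b" | awk '{print $1}')

echo "a: inode $inode_a, content '$content_a'"
echo "b: inode $inode_b, content '$content_b'"
rm -rf "$work"
```

After the rename, `a` and `b` report different inode numbers and only `a` carries the new content.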

So please give the latest release of rclone a try!

Thank you, but there are no ppc64le builds on the download site. Do I need to build it myself?

Yes, that is correct.

No problem, easily done. In fact, I'm very impressed at how fast and easy it is to build rclone. My compliments to the development team.

The new version does appear to fix the problem, at least for the test case I posted. Time to try it out on the larger test bed.
