Flag --track-renames doesn't compare files properly

FirePower · December 6, 2023, 1:27pm

What is the problem you are having with rclone?

To explain it in more detail, please read the steps to reproduce. I honestly don't know how to explain it without these steps:

create an empty source and destination directory
create two files in the source directory: file1.txt and file2.txt
sync the source with the destination using the command rclone sync -v --track-renames source/ destination/
now in the source directory delete file1.txt and rename file2.txt to file1.txt
run the same sync command, and the output will be similar to this:

file1.txt: Copied (replaced existing)
file2.txt: Deleted

As far as I can tell, this isn't efficient: rclone is re-uploading a file that could simply be renamed. I don't know if I'm just missing a different flag or if this is a bug. In this scenario it should first delete file1.txt in the destination and then rename file2.txt to file1.txt. The output could be:

file1.txt: Deleted
file1.txt: Renamed from file "file2.txt"
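The reproduction steps above can be scripted. The following is only a rough sketch: the file contents ("1\n" and "2\n") and the helper names `run_sync` and `reproduce` are assumptions made for illustration, and the sync step is skipped if no rclone binary is found on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_sync(source: Path, destination: Path):
    """Run the sync command from the post, or return None if rclone is missing."""
    if shutil.which("rclone") is None:
        return None
    result = subprocess.run(
        ["rclone", "sync", "-v", "--track-renames", f"{source}/", f"{destination}/"],
        capture_output=True,
        text=True,
    )
    # rclone writes its log output to stderr by default.
    return result.stderr


def reproduce(root: Path):
    """Recreate the directory layout from the steps and run both syncs."""
    source = root / "source"
    destination = root / "destination"
    source.mkdir()
    destination.mkdir()

    # Two files with different (assumed) contents.
    (source / "file1.txt").write_text("1\n")
    (source / "file2.txt").write_text("2\n")
    first_log = run_sync(source, destination)

    # Delete file1.txt, then rename file2.txt to file1.txt.
    (source / "file1.txt").unlink()
    (source / "file2.txt").rename(source / "file1.txt")
    second_log = run_sync(source, destination)
    return source, destination, first_log, second_log


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        _, _, _, log = reproduce(Path(tmp))
        print(log if log is not None else "rclone not found; only the files were prepared")
```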

Run the command 'rclone version' and share the full output of the command.

I'm pretty sure this applies to all operating systems

rclone 1.65.0-termux
- os/version: unknown
- os/kernel: .4.210-qgki-g49ab1df3c394 (aarch64)
- os/type: android
- os/arch: arm64 (ARMv8 compatible)
- go/version: go1.21.4
- go/linking: dynamic
- go/tags: noselfupdate

Which cloud storage system are you using? (eg Google Drive)

None. This even works (or doesn't work) locally.

The command you were trying to run (eg `rclone copy /tmp remote:tmp`)

rclone sync -v --track-renames source/ destination/

The rclone config contents with secrets removed.

Because there's no cloud storage involved, this information should be unimportant.

A log from the command with the `-vv` flag

As stated before, the log should be similar to:

file1.txt: Copied (replaced existing)
file2.txt: Deleted

asdffdsa · December 6, 2023, 1:58pm

hi, hard to know exactly what is going on, as no debug log was posted?

about those two files:
same hash or different hash, same modtime or different modtime?

FirePower · December 6, 2023, 4:05pm

Thank you for your response. I thought the sync command creates a 1:1 copy of the files. That's why I ran the command twice in those steps. The first run copies the files, so that hash and modtime are identical on both sides. Then I rename file2.txt to file1.txt in the source (a name that already exists in the destination folder), which means source/file1.txt is identical to destination/file2.txt. There is now also a redundant destination/file1.txt, which causes the problem. Then I run the sync command again. In this example, file1.txt and file2.txt get deleted from the destination folder, and file1.txt is uploaded/synced to the destination a second time. I thought this might be inefficient, because the file is already there and could be renamed, even when there's a file already in its place.
As for the debug log, I don't think it will help that much, because even with just the -v flag it gives the output I need. Here's the debug log for the second sync, if you still need it:

Also, if file1.txt isn't in the destination folder, file2.txt will be renamed.

asdffdsa · December 6, 2023, 4:49pm

from the debug log file1.txt: md5 differ
if the files are not the same, then rclone has to copy it.

if source/file1.txt and source/file2.txt are identical

file1.txt;2023-12-06 12:05:01;6f8f57715090da2632453988d9a1501b
file2.txt;2023-12-06 12:05:01;6f8f57715090da2632453988d9a1501b

then the output from rclone sync --track-renames

file1.txt: Unchanged skipping
file2.txt: Deleted

if source/file1.txt and source/file2.txt are not identical

file1.txt;2023-12-06 08:39:13;c4ca4238a0b923820dcc509a6f75849b
file2.txt;2023-12-06 08:51:49;c81e728d9d4c2f636f067f89cc14862c

then the output from rclone sync --track-renames

file1.txt: md5 differ
file1.txt: Copied (replaced existing)
file2.txt: Deleted

FirePower · December 6, 2023, 6:46pm

Sorry for not making myself clear, but the log file says nothing about the md5 of destination/file2.txt. rclone just deletes it, whereas if it had checked that file's md5, it would have found it identical to source/file1.txt. I hope this doesn't look terrible, but here's a simple tree to demonstrate which files are where, and which hashes they have:

/data/data/com.termux/files/home/rclone-test/
|
|--src
|   |--file1.txt;26ab0db90d72e28ad0ba1e22ee510510 
|
---dst
    |--file1.txt;b026324c6904b2a9cb4b88d6d61c81d1 
    ---file2.txt;26ab0db90d72e28ad0ba1e22ee510510

Edit: The tree probably looks terrible. It doesn't seem to keep the spaces I put in front of the lines. Here's a better example:

asdffdsa · December 6, 2023, 10:32pm

ok, now, i get your point.
i am not an expert with --track-renames, but now i think we have enough basic info for others to comment...

you can enclose text with three backticks
```
rclone tree ./zork
/
├── destination
│   └── file1.txt
├── doit.sh
├── org
│   ├── match
│   │   ├── file1.txt
│   │   └── file2.txt
│   └── nomatch
│       ├── file1.txt
│       └── file2.txt
├── org.md5
└── source
    └── file1.txt
```

kapitainsky · December 7, 2023, 8:44am

Your observations are correct and I can replicate this behaviour. IMO it is the consequence of the algorithm used to implement it.

--track-renames logic is only applied after the sync (with --delete-after active), and only to source-only and destination-only objects.

So in your example (which is a bit of an edge case IMO, but definitely valid) file1.txt is synced, because that is what the normal sync logic requires, and there is nothing left to do for the --track-renames part.

Maybe it would be possible to improve this algorithm. Any suggestions are welcome.

ncw · December 7, 2023, 11:14am

An rclone sync with --track-renames runs like a normal sync, but keeps track of objects which exist in the destination but not in the source (which would normally be deleted), and which objects exist in the source but not the destination (which would normally be transferred). These objects are then candidates for renaming.

After the sync, rclone matches up the source only and destination only objects using the --track-renames-strategy and either renames the destination object or transfers the source and deletes the destination object.

The actual implementation works like this - you can compare this with the source code:

When --track-renames is enabled the sync is done as normal, but

- if an object is only in the source, it is added to the renameCheck list and not transferred
- if an object is only in the destination, it is added to the dstFiles map keyed by path and not deleted

At the end of the sync rclone then

- creates the renameMap from the dstFiles using --track-renames-strategy to define the map key
- matches all of the files in renameCheck against renameMap and if they match, renames them, and if they don't, transfers them
- After this, any remaining files in dstFiles are deleted.

So you can see why this case is not being renamed - it is because there is a matching src and dst and a normal sync is done.
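To make the steps above concrete, here is a minimal Python sketch of the described flow. It is not rclone's actual code: directories are modelled as plain dicts mapping path to hash, the function name `sync_track_renames` and the operation tuples are made up for illustration, and the rename strategy is assumed to be hash-only.

```python
def sync_track_renames(src, dst):
    """Simulate the described flow; src and dst map path -> content hash."""
    ops = []
    rename_check = []  # source-only paths, transfer deferred
    dst_files = {}     # destination-only paths, deletion deferred

    # Normal sync pass over every path seen on either side.
    for path in sorted(src.keys() | dst.keys()):
        if path in src and path in dst:
            if src[path] != dst[path]:
                ops.append(("copy", path))  # "Copied (replaced existing)"
        elif path in src:
            rename_check.append(path)
        else:
            dst_files[path] = dst[path]

    # End of sync: build the rename map, keyed here by hash (assumed strategy).
    rename_map = {}
    for path, digest in dst_files.items():
        rename_map.setdefault(digest, []).append(path)

    # Match source-only files against the map: rename on a hit, copy otherwise.
    for path in rename_check:
        candidates = rename_map.get(src[path])
        if candidates:
            old = candidates.pop(0)
            del dst_files[old]
            ops.append(("rename", old, path))
        else:
            ops.append(("copy", path))

    # Anything left in dst_files is deleted.
    for path in dst_files:
        ops.append(("delete", path))
    return ops


h1 = "b026324c6904b2a9cb4b88d6d61c81d1"
h2 = "26ab0db90d72e28ad0ba1e22ee510510"

# The case from this thread: file1.txt exists on both sides, so it is copied.
print(sync_track_renames({"file1.txt": h2}, {"file1.txt": h1, "file2.txt": h2}))
# [('copy', 'file1.txt'), ('delete', 'file2.txt')]

# Without file1.txt in the destination, the rename is detected.
print(sync_track_renames({"file1.txt": h2}, {"file2.txt": h2}))
# [('rename', 'file2.txt', 'file1.txt')]
```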

We could potentially modify the first part of the algorithm to this

If src and dst match then skip, otherwise
- Add src objects to the renameCheck list
- Add dst objects to the dstFiles map keyed by path

This would then store all of the transfers to be done at the end which would bulk up the renameCheck list to include all the transfers. The algorithm would proceed as above.

I think this would work. It has the disadvantage of not doing any syncing until it has looked at all the files and it stores the whole transfer in memory before starting which will definitely use more ram.
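A rough Python sketch of the proposed variant, under the same assumptions as before (dicts of path to hash, hash-only strategy, made-up names). One detail is not covered by the description above and is purely an assumption of this sketch: when the rename target is still occupied by a differing destination file, that file is deleted first and dropped from the candidate map.

```python
def sync_deferred(src, dst):
    """Simulate the proposed flow; src and dst map path -> content hash."""
    ops = []
    rename_check = []  # every source path that needs a transfer or rename
    dst_files = {}     # every destination path that does not match the source

    # First pass: skip matching pairs, defer everything else.
    for path in sorted(src.keys() | dst.keys()):
        if path in src and path in dst and src[path] == dst[path]:
            continue
        if path in src:
            rename_check.append(path)
        if path in dst:
            dst_files[path] = dst[path]

    # Rename map keyed by hash (assumed strategy).
    rename_map = {}
    for path, digest in dst_files.items():
        rename_map.setdefault(digest, []).append(path)

    def drop_pending(path):
        # Forget a destination file that is about to be overwritten or deleted.
        digest = dst_files.pop(path)
        rename_map[digest].remove(path)

    for path in rename_check:
        candidates = rename_map.get(src[path])
        if candidates:
            old = candidates.pop(0)
            del dst_files[old]
            if path in dst_files:
                # Assumption of this sketch: clear the occupied target first.
                drop_pending(path)
                ops.append(("delete", path))
            ops.append(("rename", old, path))
        else:
            if path in dst_files:
                drop_pending(path)  # the copy replaces the existing file
            ops.append(("copy", path))

    # Anything still pending in dst_files is deleted.
    for path in dst_files:
        ops.append(("delete", path))
    return ops


h1 = "b026324c6904b2a9cb4b88d6d61c81d1"
h2 = "26ab0db90d72e28ad0ba1e22ee510510"
print(sync_deferred({"file1.txt": h2}, {"file1.txt": h1, "file2.txt": h2}))
# [('delete', 'file1.txt'), ('rename', 'file2.txt', 'file1.txt')]
```

Note that both lists are fully populated before the first operation runs, which is the memory and latency cost mentioned above. This simple ordering can also miss renames in swap-like cases, where it falls back to a copy.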

Thoughts?

kapitainsky · December 7, 2023, 3:17pm

IMO in general --track-renames should do exactly what it says on the tin.

And it does not atm... The example @FirePower provided is trivial, but if the files were 100s of GB in size it would make a very substantial difference.

If modifying the algorithm does not require rewriting 100s of lines of code, IMO we should improve it.

I think this is only a question of clearly documenting how --track-renames works. My understanding is that the overall run time won't be different; it is only a different order of operations. And I do not think that RAM usage should justify not getting it right. Obviously, if I want extra functionality and want to avoid costly transfers, it comes with some cost.

FirePower · December 8, 2023, 7:43pm

Thank you for looking into it. I'm not that technical, but just thought I should mention it. I'm probably not somebody who understands how file transfer works. I honestly don't even know what's faster in this scenario. But if the flag is called --track-renames, I just thought I should mention that it doesn't always work. I'll leave it up to you to decide what is faster and what is not.

envoy510 · December 17, 2023, 7:33pm

When I first started using rclone I assumed that's how it would work and was disappointed that it didn't seem to do that. After that, I assumed it wouldn't do a very good job.

So, I'm very much in favor of this being the new behavior of this option.
