Feature question: how does RClone track renames and moves?

ClementNerma · September 9, 2022, 10:21am

Hi there!

I've been using RClone with success for quite some time and it works extremely well.

I've got one question though: how does RClone track renames?

Let's say I have a file named a and I rename it to b, or even move it to another directory. How is it able to tell that b is a renamed a? Does it just check all files and assume that, if a file has been removed and a new file with exactly the same size and modification time has appeared, it's a rename?

I'm just curious about how this works internally

Ole · September 12, 2022, 8:48am

Hi Clément,

I guess you already read the docs:
https://rclone.org/docs/#track-renames
https://rclone.org/docs/#track-renames-strategy-hash-modtime-leaf-size

I can't give a better explanation, but I see you are a developer, so I suggest you just skim the code by doing a free-text search for "trackRenames" in rclone/sync.go at master · rclone/rclone · GitHub

ClementNerma · September 12, 2022, 9:00am

Thanks, I was hoping there was a deep dive in the docs, but it seems there isn't. I'll go check the source code then.

Ole · September 12, 2022, 9:09am

I remember @ncw explaining it somewhere on the forum or on GitHub, but I can't find it at the moment. You may have better luck.

ncw · September 12, 2022, 9:47am

We should probably put an explanation for this in the docs...

Conceptually, what happens is this:

An rclone sync with --track-renames runs like a normal sync, but keeps track of objects that exist in the destination but not in the source (which would normally be deleted), and of objects that exist in the source but not in the destination (which would normally be transferred). These objects are then candidates for renaming.

After the sync, rclone matches up the source-only and destination-only objects using the --track-renames-strategy and either renames the destination object or transfers the source object and deletes the destination object.
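The conceptual split can be sketched in Go (rclone's own language) as a partition of two listings. This is a minimal illustration, not rclone's code: the `Object` struct and the `partition` function are hypothetical, and both listings are held in memory for simplicity.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a hypothetical, simplified listing entry (not rclone's fs.Object).
type Object struct {
	Path    string
	Size    int64
	ModTime int64 // Unix seconds, for simplicity
}

// partition splits two listings into paths present on both sides,
// source-only objects (normally transferred) and destination-only objects
// (normally deleted). The last two groups are the rename candidates.
func partition(src, dst []Object) (both []string, srcOnly, dstOnly []Object) {
	dstByPath := make(map[string]Object, len(dst))
	for _, o := range dst {
		dstByPath[o.Path] = o
	}
	for _, o := range src {
		if _, ok := dstByPath[o.Path]; ok {
			both = append(both, o.Path)
			delete(dstByPath, o.Path) // whatever remains is destination-only
		} else {
			srcOnly = append(srcOnly, o)
		}
	}
	for _, o := range dstByPath {
		dstOnly = append(dstOnly, o)
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Slice(dstOnly, func(i, j int) bool { return dstOnly[i].Path < dstOnly[j].Path })
	return both, srcOnly, dstOnly
}

func main() {
	src := []Object{{"docs/b.txt", 42, 1000}, {"keep.txt", 7, 500}}
	dst := []Object{{"a.txt", 42, 1000}, {"keep.txt", 7, 500}}
	both, srcOnly, dstOnly := partition(src, dst)
	fmt.Println("both:", both, "source-only:", srcOnly, "destination-only:", dstOnly)
}
```

Here `keep.txt` is left alone, while `docs/b.txt` (source-only) and `a.txt` (destination-only) become the rename candidates.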

The actual implementation works like this - you can compare this with the source code:

When --track-renames is enabled, the sync is done as normal, but:

if an object is only in the source, it is added to the renameCheck list and not transferred
if an object is only in the destination, it is added to the dstFiles map (keyed by path) and not deleted

At the end of the sync, rclone then:

creates the renameMap from dstFiles, using --track-renames-strategy to define the map key
matches all of the files in renameCheck against renameMap: if they match, it renames them, and if they don't, it transfers them
After this, any remaining files in dstFiles are deleted.
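A sketch of that end-of-sync stage in Go, assuming a size+modtime strategy. The names `renameCheck`, `dstFiles` and `renameMap` follow the description above; everything else (`Object`, `Action`, `resolveRenames`, `sizeModTimeKey`, the key format) is hypothetical and far simpler than rclone's sync.go. When several destination objects share a key, it takes whichever candidate comes first in its bucket, without checking uniqueness.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a simplified, hypothetical listing entry (not rclone's fs.Object).
type Object struct {
	Path    string
	Size    int64
	ModTime int64 // Unix seconds
}

// Action is a hypothetical record of what the sync decided to do.
type Action struct {
	Op       string // "rename", "transfer" or "delete"
	From, To string
}

// sizeModTimeKey stands in for --track-renames-strategy: it maps an object
// to the key used for matching (size+modtime here, as an assumption).
func sizeModTimeKey(o Object) string { return fmt.Sprintf("%d,%d", o.Size, o.ModTime) }

// resolveRenames is the end-of-sync stage. renameCheck holds source-only
// objects that were not transferred; dstFiles holds destination-only objects,
// keyed by path, that were not deleted. dstFiles is modified in place.
func resolveRenames(renameCheck []Object, dstFiles map[string]Object, key func(Object) string) []Action {
	// 1. Build renameMap from dstFiles: match key -> destination candidates.
	renameMap := make(map[string][]Object)
	for _, o := range dstFiles {
		k := key(o)
		renameMap[k] = append(renameMap[k], o)
	}
	var actions []Action
	// 2. Match each renameCheck entry: rename on a hit, transfer otherwise.
	for _, src := range renameCheck {
		k := key(src)
		if cands := renameMap[k]; len(cands) > 0 {
			dst := cands[0] // picks one candidate; uniqueness is not checked
			renameMap[k] = cands[1:]
			delete(dstFiles, dst.Path) // claimed, so it must not be deleted later
			actions = append(actions, Action{Op: "rename", From: dst.Path, To: src.Path})
		} else {
			actions = append(actions, Action{Op: "transfer", To: src.Path})
		}
	}
	// 3. Anything left in dstFiles was not claimed by a rename: delete it.
	leftover := make([]string, 0, len(dstFiles))
	for p := range dstFiles {
		leftover = append(leftover, p)
	}
	sort.Strings(leftover) // map order is random; sort for stable output
	for _, p := range leftover {
		actions = append(actions, Action{Op: "delete", From: p})
	}
	return actions
}

func main() {
	renameCheck := []Object{{"docs/b.txt", 42, 1000}, {"new.txt", 3, 2000}}
	dstFiles := map[string]Object{
		"a.txt":   {"a.txt", 42, 1000},
		"old.txt": {"old.txt", 9, 300},
	}
	for _, a := range resolveRenames(renameCheck, dstFiles, sizeModTimeKey) {
		fmt.Printf("%-8s %q -> %q\n", a.Op, a.From, a.To)
	}
}
```

With the example data, `a.txt` is renamed to `docs/b.txt`, `new.txt` is transferred and `old.txt` is deleted.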

ClementNerma · September 12, 2022, 10:31am

Ok, so if I understand correctly, RClone is stateless? It doesn't store information about the source objects anywhere? It just makes a list of source-only and destination-only files and checks whether a file has the same size and modification time to see if it's a renamed file?

ncw · September 12, 2022, 10:36am

That is correct.

Yes, that is it.

ClementNerma · September 12, 2022, 1:46pm

I see, thanks for your explanation.

jwink3101 · September 12, 2022, 3:58pm

Just to be pedantic for the sake of clarity, most of rclone is stateless. Some notable exceptions are bisync and the hasher remote.

Also, just a warning about the modtime strategy: it will not check for a unique match. So if you have two files with identical sizes (to the byte) and ModTimes (to the resolution of the system), it can make a false match. Not super likely, but it can happen. I opened a ticket or a forum post about it and eventually plan to muddle my way through golang to add an option to enforce uniqueness, but that has been OBE (overtaken by events).

If you're interested, I investigated my own file system. The few false matches are from:

Small files in a directory where something like touch * has been executed
macOS Sparsebundle Disk Images, where blocks of exactly 8388608 bytes (8 MiB) are created

ClementNerma · September 12, 2022, 4:26pm

That's exactly what I thought. I'm currently building a synchronisation tool myself (a kind of Rclone for more specialized use cases) and that's the problem I had as well.

I don't see how to ensure correctness while staying stateless. FFS (FreeFileSync) achieves reliable rename detection by storing each file's node ID in a state file and reusing it on each run. But that requires keeping state, and it does not work on every filesystem.

And I'm not sure whether it's possible to actually achieve stateless and reliable renaming tracking
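The stateful approach described for FreeFileSync boils down to remembering a stable file ID per path and comparing snapshots between runs. A minimal Go sketch, assuming such IDs exist and are never reused; `FileID`, `detectRenames` and the in-memory state maps are hypothetical and say nothing about FreeFileSync's actual state format.

```go
package main

import (
	"fmt"
	"sort"
)

// FileID is a hypothetical stable file identifier, such as a (device, inode)
// pair on Unix. Many filesystems and most cloud remotes offer nothing like it.
type FileID struct {
	Dev, Ino uint64
}

// Rename records a path change detected between two runs.
type Rename struct{ From, To string }

// detectRenames compares the state saved after the previous sync (path -> ID)
// with a fresh scan (path -> ID). The same ID under a new path is treated as
// a rename, regardless of size or modtime.
func detectRenames(saved, current map[string]FileID) []Rename {
	oldPath := make(map[FileID]string, len(saved))
	for p, id := range saved {
		oldPath[id] = p
	}
	var renames []Rename
	for p, id := range current {
		if prev, ok := oldPath[id]; ok && prev != p {
			renames = append(renames, Rename{From: prev, To: p})
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Slice(renames, func(i, j int) bool { return renames[i].To < renames[j].To })
	return renames
}

func main() {
	saved := map[string]FileID{"a": {1, 10}, "keep": {1, 11}}
	current := map[string]FileID{"dir/b": {1, 10}, "keep": {1, 11}, "new": {1, 12}}
	fmt.Println(detectRenames(saved, current))
}
```

Here `a` is reported as renamed to `dir/b`. The catch is the one raised above: it needs saved state and a filesystem that exposes stable IDs. Inode numbers, for example, can be reused after a file is deleted, so a real tool would likely cross-check size or modtime as well.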

jwink3101 · September 12, 2022, 5:32pm

> I'm not sure whether it's possible to actually achieve stateless and reliable renaming tracking

I've given a lot of thought to this when I developed syncrclone (a competitor to the built-in bisync with some notable pros and cons). But, alas, syncrclone is stateful.

It is possible to be reliable if you have hashes. If you don't have hashes, what I do in syncrclone is only allow the move if it can uniquely match. Still a (small) risk but that's worth taking. And the thing is, the safe answer is to just transfer again.

Remember, from @ncw's answer, it only compares the files that would otherwise be deleted. So having more than one file with the same size and ModTime is only an issue if more than one of them has been moved.
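The unique-match rule described for syncrclone can be sketched as follows. This is a hypothetical Go illustration under a size+modtime key, not syncrclone's actual code or anything in rclone: a rename happens only when a key occurs exactly once among the source-only objects and exactly once among the destination-only objects; anything else is transferred again.

```go
package main

import "fmt"

// Object is a hypothetical, simplified rename candidate.
type Object struct {
	Path    string
	Size    int64
	ModTime int64 // Unix seconds
}

// key is an assumed size+modtime match key.
func key(o Object) string { return fmt.Sprintf("%d,%d", o.Size, o.ModTime) }

// uniqueMatches pairs source-only and destination-only objects, but only when
// a key occurs exactly once on each side. Everything else falls back to a
// plain transfer, which is always safe. Only rename candidates are passed in,
// so identical files that were not moved never cause ambiguity.
func uniqueMatches(srcOnly, dstOnly []Object) (renames map[string]string, transfers []string) {
	srcCount := make(map[string]int)
	for _, s := range srcOnly {
		srcCount[key(s)]++
	}
	dstByKey := make(map[string][]Object)
	for _, d := range dstOnly {
		dstByKey[key(d)] = append(dstByKey[key(d)], d)
	}
	renames = make(map[string]string) // destination path -> new source path
	for _, s := range srcOnly {
		k := key(s)
		if srcCount[k] == 1 && len(dstByKey[k]) == 1 {
			renames[dstByKey[k][0].Path] = s.Path
		} else {
			transfers = append(transfers, s.Path)
		}
	}
	return renames, transfers
}

func main() {
	srcOnly := []Object{{"b.txt", 42, 1000}, {"x1", 8, 1000}, {"x2", 8, 1000}}
	dstOnly := []Object{{"a.txt", 42, 1000}, {"y1", 8, 1000}, {"y2", 8, 1000}}
	renames, transfers := uniqueMatches(srcOnly, dstOnly)
	fmt.Println("renames:", renames, "transfers:", transfers)
}
```

`a.txt` is renamed to `b.txt`; `x1` and `x2` are transferred because their key is shared, and the unmatched `y1` and `y2` would then be deleted as usual.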

ncw · September 12, 2022, 7:21pm

Interesting point. Rclone just picks one according to the source, but maybe it should be giving a warning or an error if there are duplicates?

jwink3101 · September 12, 2022, 9:17pm

I think so. We talked about it in the past and you pointed me to where in the code to start. I just haven't had the time.

I think it should be a flag, though, since (a) otherwise you break backward compatibility and (b) you can no longer short-circuit the loop, so it could in theory be slower (I mainly use Python, where loops are very inefficient; that's less of an issue with Golang).
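The trade-off in (b) shows up in a linear scan. In this hypothetical Go sketch, `requireUnique` stands in for the proposed opt-in flag (no such rclone flag is named in this thread): without it, the first hit ends the loop; with it, the loop has to keep going until it either sees a second hit or reaches the end.

```go
package main

import "fmt"

// Object is a hypothetical, simplified rename candidate.
type Object struct {
	Path    string
	Size    int64
	ModTime int64
}

// MatchResult describes the outcome of a candidate search.
type MatchResult int

const (
	NoMatch MatchResult = iota
	Matched
	Ambiguous
)

func (r MatchResult) String() string {
	return [...]string{"no match", "matched", "ambiguous"}[r]
}

// sameKey is an assumed size+modtime comparison.
func sameKey(a, b Object) bool { return a.Size == b.Size && a.ModTime == b.ModTime }

// findMatch returns the index of a candidate matching src. With
// requireUnique=false the first hit wins and the loop exits early. With
// requireUnique=true a second hit returns Ambiguous, and zero or one hit
// means scanning every candidate.
func findMatch(src Object, cands []Object, requireUnique bool) (int, MatchResult) {
	found := -1
	for i, c := range cands {
		if !sameKey(src, c) {
			continue
		}
		if !requireUnique {
			return i, Matched // legacy behaviour: short-circuit on first hit
		}
		if found >= 0 {
			return -1, Ambiguous // second hit: refuse to guess
		}
		found = i
	}
	if found >= 0 {
		return found, Matched
	}
	return -1, NoMatch
}

func main() {
	src := Object{"b", 8, 1000}
	cands := []Object{{"x", 3, 1000}, {"y1", 8, 1000}, {"y2", 8, 1000}}
	fmt.Println(findMatch(src, cands, false)) // first hit wins
	fmt.Println(findMatch(src, cands, true))  // second hit makes it ambiguous
}
```

If the candidates are already grouped in a map by key, the uniqueness check is just the length of the bucket, so the extra cost mainly affects scan-based implementations.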

Engelbrecht · September 12, 2022, 9:23pm

Different question, but it fits the topic, @ncw:
According to the docs, this doesn't currently work on encrypted destinations. Are there any plans to implement this feature, or is there a technical limitation for this use case?

Animosity022 · September 12, 2022, 9:24pm

See:

rclone cryptcheck

Already there.

jwink3101 · September 12, 2022, 9:26pm

You can use ModTimes (and some of the other strategies) if the remote supports them. Maybe even wrap the encrypted remote in the hasher remote, if you wanted.

ncw · September 13, 2022, 2:26pm

--track-renames-strategy modtime is a possibility.

It would be possible in theory to use the same methodology that rclone cryptcheck does to calculate the encrypted checksums and use those, but that is a lot of work!

system · September 16, 2022, 2:26pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.