I've been using RClone with success for quite some time and it works extremely well.
I've got one question though: how does RClone track renames?
Let's say I have a file named a and I rename it to b, or even move it to another directory. How is it able to tell that b is a renamed a? Does it just check all files and, if a file has been removed but there is a new file with the exact same size and modification time, conclude that it's a rename?
We should probably put an explanation for this in the docs...
Conceptually, what happens is this:
An rclone sync with --track-renames runs like a normal sync, but keeps track of objects which exist in the destination but not in the source (which would normally be deleted), and which objects exist in the source but not the destination (which would normally be transferred). These objects are then candidates for renaming.
After the sync, rclone matches up the source only and destination only objects using the --track-renames-strategy and either renames the destination object or transfers the source and deletes the destination object.
The actual implementation works like this - you can compare this with the source code:
When --track-renames is enabled the sync is done as normal, but
if an object is only in the source, it is added to the renameCheck list and not transferred
if an object is only in the destination it is added to the dstFiles map keyed by path and not deleted
At the end of the sync rclone then
creates the renameMap from the dstFiles using --track-renames-strategy to define the map key
matches all of the files in renameCheck against renameMap and, if they match, renames them; if they don't, it transfers them
After this, any remaining files in dstFiles are deleted.
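For anyone who prefers to read this as code, here is a minimal Python sketch of the steps above. It is not rclone's actual Go implementation: the names Obj, rename_key and plan_sync are made up for illustration, modtimes are plain integers, and including the size in every key is an assumption of this sketch. Objects present on both sides are handled by the normal sync and are ignored here.

```python
from collections import defaultdict
from typing import NamedTuple


class Obj(NamedTuple):
    """Hypothetical stand-in for a remote object (not an rclone type)."""
    size: int
    modtime: int  # whole seconds, to keep the sketch simple
    hash: str


def rename_key(obj, strategy):
    """Build a map key from a comma-separated strategy such as "hash" or "modtime"."""
    parts = [obj.size]  # assumption: size is always part of the key
    for name in strategy.split(","):
        if name == "hash":
            parts.append(obj.hash)
        elif name == "modtime":
            parts.append(obj.modtime)
    return tuple(parts)


def plan_sync(src, dst, strategy="hash"):
    """Return (renames, transfers, deletes) for a one-way sync from src to dst."""
    # Source-only objects: held back instead of being transferred right away.
    rename_check = [p for p in sorted(src) if p not in dst]
    # Destination-only objects: held back instead of being deleted right away.
    dst_files = {p: o for p, o in dst.items() if p not in src}

    # Index the destination-only objects by the strategy key.
    rename_map = defaultdict(list)
    for path in sorted(dst_files):
        rename_map[rename_key(dst_files[path], strategy)].append(path)

    renames, transfers = [], []
    for path in rename_check:
        candidates = rename_map.get(rename_key(src[path], strategy))
        if candidates:
            old = candidates.pop(0)  # first candidate wins, no uniqueness check
            del dst_files[old]
            renames.append((old, path))
        else:
            transfers.append(path)

    # Whatever is still destination-only was not a rename, so it gets deleted.
    deletes = sorted(dst_files)
    return renames, transfers, deletes


src = {"b.txt": Obj(10, 100, "aa"), "new.txt": Obj(5, 200, "bb"), "same.txt": Obj(1, 1, "cc")}
dst = {"a.txt": Obj(10, 100, "aa"), "old.txt": Obj(7, 50, "dd"), "same.txt": Obj(1, 1, "cc")}
renames, transfers, deletes = plan_sync(src, dst)
print(renames, transfers, deletes)
```

In this example a.txt is recognised as the old name of b.txt, new.txt is transferred and old.txt is deleted. Note that nothing is stored between runs; the plan is derived purely from the two listings.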
Ok so if I understand correctly RClone is stateless? It doesn't store information about the source objects anywhere? It's just making a list of source-only and destination-only files and checking whether a file has the same size and modification time to see if it's a renamed file?
Just to be pedantic for the sake of clarity, most of rclone is stateless. Some notable exceptions are bisync and the hasher remote.
Also, just a warning on the modtime strategy: it will not check for a unique match. So if you have two files with identical sizes (to the byte) and ModTimes (to the resolution of the system), it can produce a false match. Not super likely, but it can happen. I opened a ticket or a forum post about it and, eventually, plan to muddle my way through golang to add the option to enforce uniqueness, but it has been OBE.
If you're interested, I investigated my own file system. The few false matches are from:
Small files in a directory where something like touch * has been executed
macOS Sparsebundle Disk Images where exactly 8388608 byte blocks are created
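If you want to run the same kind of check on your own files, something like the following Python sketch would do it. It is only an illustration: find_collisions is a made-up helper, and truncating the ModTime to whole seconds is an assumption about the resolution, which in practice depends on the file system or remote.

```python
import os
from collections import defaultdict


def find_collisions(root, resolution=1.0):
    """Group files under root that share the same (size, modtime) pair.

    resolution is the assumed modtime granularity in seconds.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path, follow_symlinks=False)
            key = (st.st_size, int(st.st_mtime / resolution))
            groups[key].append(path)
    # Only groups with more than one member could produce a false match.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Calling find_collisions(".") returns a dict mapping each shared (size, modtime) pair to the paths that have it. A group only matters for rename tracking if more than one of its members is moved in the same sync.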
That's exactly what I thought. I'm currently building a synchronisation tool myself (a kind of Rclone for more specialized usages) and that's the problem I had as well.
I'm not sure whether it's possible to actually achieve stateless and reliable rename tracking.
I've given a lot of thought to this when I developed syncrclone (a competitor to the built-in bisync with some notable pros and cons). But, alas, syncrclone is stateful.
It is possible to be reliable if you have hashes. If you don't have hashes, what I do in syncrclone is only allow the move if it can uniquely match. Still a (small) risk but that's worth taking. And the thing is, the safe answer is to just transfer again.
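To make the "only if it can uniquely match" idea concrete, here is a rough Python sketch. This is not the actual syncrclone code; match_unique and the (size, modtime) tuples are just for illustration. A move is accepted only when exactly one source-only file and exactly one destination-only file share a key; everything else falls back to a plain transfer.

```python
from collections import defaultdict


def match_unique(src_only, dst_only):
    """src_only and dst_only map path -> (size, modtime).

    Returns (moves, transfers), where moves is a list of (old_path, new_path).
    """
    by_key_src = defaultdict(list)
    by_key_dst = defaultdict(list)
    for path, key in src_only.items():
        by_key_src[key].append(path)
    for path, key in dst_only.items():
        by_key_dst[key].append(path)

    moves, transfers = [], []
    for key, src_paths in by_key_src.items():
        dst_paths = by_key_dst.get(key, [])
        if len(src_paths) == 1 and len(dst_paths) == 1:
            # Unambiguous: one candidate on each side.
            moves.append((dst_paths[0], src_paths[0]))
        else:
            # No candidate or an ambiguous one: the safe answer is to transfer again.
            transfers.extend(src_paths)
    return sorted(moves), sorted(transfers)


src_only = {"x/b": (10, 100), "x/c": (8388608, 200), "x/d": (8388608, 200)}
dst_only = {"a": (10, 100), "y/c": (8388608, 200), "y/d": (8388608, 200)}
moves, transfers = match_unique(src_only, dst_only)
print(moves, transfers)
```

Here a is moved to x/b, while the two identically sized and timestamped blocks are transferred again rather than guessed at. Destination-only files that were not matched would then be deleted as in a normal sync.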
Remember, from @ncw's answer, it only compares deleted files. So having more than one file with the same size and ModTime is only an issue if more than one of them has been moved.
I think so. We talked about it in the past and you pointed me to where in the code to start. I just haven't had the time.
I think it should be a flag though, since (a) you otherwise break backward compatibility and (b) you no longer short-circuit the loop, so it can be slower in theory (I mainly use Python, where loops are very inefficient; it's less of an issue with Golang).
Different question - but fits to the topic @ncw :
According to the docs, this doesn't currently work on encrypted destinations. Are there any plans to implement this feature, or is there a technical limitation for this use case?
--track-renames-strategy modtime is a possibility.
It would be possible in theory to use the same methodology that rclone cryptcheck does to calculate the encrypted checksums and use those, but that is a lot of work!