As far as I can tell, this isn't efficient. rclone is re-uploading a file which could be renamed instead. I don't know if I'm just missing a flag or if this is a bug. In this scenario it should first delete file1.txt and then rename file2.txt to file1.txt. The output could be:
file1.txt: Deleted
file1.txt: Renamed from file "file2.txt"
Run the command 'rclone version' and share the full output of the command.
I'm pretty sure this applies to all operating systems
Thank you for your response. I thought the sync command creates a 1:1 copy of the file. That's why I ran the command twice in those steps. The first run copies the file so that the hash and modtime are identical. Then I rename file2.txt to file1.txt (a name that already exists in the destination folder), which means source/file1.txt is identical to destination/file2.txt. There's now also a redundant destination/file1.txt, which causes the problem. Then I run the sync command again. In this example file1.txt and file2.txt get deleted from the destination folder, and file1.txt is uploaded/synced to the destination a second time. I thought this might be inefficient, because the file is already there and should be renamed, even when there's a file already in its place.
As for the debug log, I don't think it will help that much, because even with just the -v flag it gives the output I need. Here's the debug log for the second sync, if you still need it:
Also, if file1.txt isn't in the destination folder, file2.txt will be renamed.
Sorry for not making myself clear, but the log file says nothing about the md5 of destination/file2.txt. It just deletes it, whereas if it had checked that file's md5, it would have found it identical to source/file1.txt. I hope this doesn't look terrible, but here's a simple tree to demonstrate which files are where, and which hashes they have:
Your observations are correct and I can replicate this behaviour. IMO it is the consequence of the algorithm used to implement it.
--track-renames logic is only applied after the sync (with --delete-after active), and only to source-only and destination-only objects.
So in your example (which is a bit of an edge case IMO, but definitely valid) file1.txt is synced, because that is what normal sync logic requires, and there is nothing left for the --track-renames part to do.
Maybe it would be possible to improve this algorithm. Any suggestions are welcome.
An rclone sync with --track-renames runs like a normal sync, but keeps track of objects which exist in the destination but not in the source (which would normally be deleted), and which objects exist in the source but not the destination (which would normally be transferred). These objects are then candidates for renaming.
After the sync, rclone matches up the source only and destination only objects using the --track-renames-strategy and either renames the destination object or transfers the source and deletes the destination object.
The actual implementation works like this - you can compare this with the source code:
When --track-renames is enabled the sync is done as normal, but
if an object is only in the source, it is added to the renameCheck list and not transferred
if an object is only in the destination it is added to the dstFiles map keyed by path and not deleted
At the end of the sync rclone then
creates the renameMap from the dstFiles using --track-renames-strategy to define the map key
matches all of the files in renameCheck against renameMap and, if they match, renames them; if they don't, transfers them
After this, any remaining files in dstFiles are deleted.
So you can see why this case is not being renamed - it is because there is a matching src and dst and a normal sync is done.
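The steps above can be sketched roughly as follows. This is only a simplified model, not rclone's actual code: `sync_track_renames` is a hypothetical name, files are represented as a plain path-to-hash mapping, the normal sync check is reduced to a hash comparison, and the rename map is assumed to be keyed by hash.

```python
def sync_track_renames(src, dst):
    """Model of the current behaviour. src and dst map path -> content hash.

    Hypothetical helper for illustration; returns the planned actions.
    """
    actions = []
    rename_check = []  # source-only paths: not transferred yet
    dst_files = {}     # destination-only paths: not deleted yet

    for path in sorted(set(src) | set(dst)):
        if path in src and path in dst:
            # Present on both sides: handled by the normal sync logic.
            if src[path] != dst[path]:
                actions.append(("transfer", path))
        elif path in src:
            rename_check.append(path)
        else:
            dst_files[path] = dst[path]

    # Build the rename map from the destination-only files (keyed by hash here).
    rename_map = {}
    for path, digest in dst_files.items():
        rename_map.setdefault(digest, []).append(path)

    # Match source-only files against the rename map.
    for path in rename_check:
        candidates = rename_map.get(src[path])
        if candidates:
            old = candidates.pop()
            del dst_files[old]
            actions.append(("rename", old, path))
        else:
            actions.append(("transfer", path))

    # Whatever is left in dst_files is deleted.
    actions.extend(("delete", path) for path in dst_files)
    return actions


# The scenario from this thread: source/file1.txt has the same hash as
# destination/file2.txt, but a stale destination/file1.txt already exists.
print(sync_track_renames({"file1.txt": "B"},
                         {"file1.txt": "A", "file2.txt": "B"}))
# [('transfer', 'file1.txt'), ('delete', 'file2.txt')]
```

Because file1.txt exists on both sides, it never reaches the renameCheck list, so file2.txt has nothing to be matched against and is simply deleted.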
We could potentially modify the first part of the algorithm to this
If src and dst match then skip, otherwise
Add src objects to the renameCheck list
Add dst objects to the dstFiles map keyed by path
This would then store all of the transfers to be done at the end which would bulk up the renameCheck list to include all the transfers. The algorithm would proceed as above.
I think this would work. It has the disadvantage of not doing any syncing until it has looked at all the files, and it stores the whole transfer list in memory before starting, which will definitely use more RAM.
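A rough sketch of how the modified first part might behave, under the same simplifying assumptions (path-to-hash mappings, rename map keyed by hash). `plan_sync_deferred` is a hypothetical name, and deleting the stale object at the target path before renaming is one possible way to handle the collision, not something settled in this thread.

```python
def plan_sync_deferred(src, dst):
    """Model of the proposed behaviour. src and dst map path -> content hash.

    Hypothetical helper for illustration; returns the planned actions.
    """
    rename_check = []  # every source path that needs some action
    dst_files = {}     # every destination path that does not match the source

    for path in sorted(set(src) | set(dst)):
        if path in src and path in dst and src[path] == dst[path]:
            continue  # src and dst match: skip
        if path in src:
            rename_check.append(path)
        if path in dst:
            dst_files[path] = dst[path]

    rename_map = {}
    for path, digest in dst_files.items():
        rename_map.setdefault(digest, []).append(path)

    actions = []
    for path in rename_check:
        # A stale object at the target path will be replaced either way,
        # so it can no longer serve as a rename candidate.
        stale = dst_files.pop(path, None)
        if stale is not None:
            rename_map[stale].remove(path)

        candidates = rename_map.get(src[path])
        if candidates:
            old = candidates.pop()
            del dst_files[old]
            if stale is not None:
                actions.append(("delete", path))
            actions.append(("rename", old, path))
        else:
            actions.append(("transfer", path))  # overwrites any stale object

    actions.extend(("delete", path) for path in dst_files)
    return actions


print(plan_sync_deferred({"file1.txt": "B"},
                         {"file1.txt": "A", "file2.txt": "B"}))
# [('delete', 'file1.txt'), ('rename', 'file2.txt', 'file1.txt')]
```

In this model nothing is transferred until every path has been examined, which is where the extra memory use comes from. It also processes paths in a fixed order, so a stale object that is replaced early cannot be reused later as a rename source (for example when two files swap names, one of them is still transferred).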
IMO in general --track-renames should do exactly what it says on the tin.
And it does not atm... The example @FirePower provided is trivial, but if the files were 100s of GB in size it would make a very substantial difference.
If modifying the algorithm does not require rewriting 100s of lines of code, IMO we should improve it.
I think this is only a question of clearly documenting how --track-renames works. My understanding is that the overall run time won't be different; only the order of operations changes. And I do not think that RAM usage should justify not getting it right. Obviously, if I want extra functionality that avoids costly transfers, it comes with some cost.
Thank you for looking into it. I'm not that technical, but just thought I should mention it. I'm probably not somebody who understands how file transfer works. I honestly don't even know what's faster in this scenario. But if the flag is called --track-renames, I just thought I should mention that it doesn't always work. I'll leave it up to you to decide what is faster and what is not.
When I first started using rclone I assumed that's how it would work and was disappointed that it didn't seem to do that. After that, I assumed it wouldn't do a very good job.
So, I'm very much in favor of this being the new behavior of this option.