Rclone dedupe files of same size but different names

What is the problem you are having with rclone?


What is your rclone version (output from rclone version)


Which OS you are using and how many bits (eg Windows 7, 64 bit)

Win, Linux, Mac

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone dedupe

I searched prior questions but couldn't find an answer, although I did see a feature request discussed earlier this year that might address it.

Occasionally there are files in Google Drive with the same size/hash but different names. It could be a photo with an auto-generated name like 20190301.jpg sitting alongside birthday.jpg, or something like file.jpg and "copy of file.jpg".

Is there a simple command or script that would run through folders on a remote and delete exact dupes with different names? Could be automated, keeping only the newest file. Or interactive.

Apologies if this has already been addressed and I missed it.

As far as I know, dedupe only works on duplicates in the strict sense. Files in the same place with identical names and identical hashes can be handled automatically, while identical names with differing hashes usually require some user input, as the "correct answer" is not always obvious.

You could automate what you ask for, but it would have to be used with caution. It's actually not that unusual for an identical file to exist in more than one place. Blindly removing the older of the two could, for example, break a program when you download it again because some support file (that another program also used) is missing, and trying to figure out what happened could be annoying to say the least. For that reason it should not be a default behavior, because it's not very "safe".

But specifically for photos and the like it would be ideal - I agree with that.

I don't think dedupe can currently do what you ask. You'd have to make a feature request (or upvote an existing one if you find it).

Meanwhile, it should be possible with some fairly basic scripting to have rclone output a list with hashes and dates, and then sort and delete with an external script. But if you are willing to write that in the first place, why not just make a pull request and implement it directly in rclone? Anyone is free to make code suggestions :smiley:
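To illustrate the external-script idea, here is a minimal Python sketch. It assumes input lines in the form `path;hash`, as produced by something like `rclone lsf -R --files-only --format ph remote:` with the default `;` separator (verify the exact output format on your rclone version before relying on it):

```python
from collections import defaultdict
from pathlib import PurePosixPath

def find_dupes(lines):
    """Group paths by (parent folder, hash); return only groups with >1 file."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so ';' inside filenames doesn't break the hash field
        path, file_hash = line.rsplit(";", 1)
        parent = str(PurePosixPath(path).parent)
        groups[(parent, file_hash)].append(path)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

You could pipe the rclone output into a script built around this, review the groups it prints, and only then delete anything.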

If you are not up to scripting anything, then my last suggestion would be to try some photo-deduping software; I'm sure several good free programs exist for this. The benefit is a ready-made solution that can work through a normal mount. The downside is that it would be slower if you have a ton of pictures, because it wouldn't have access to hashes via the mount. To be sure (and not just rely on size and date), it would have to actually download each file whose size matches another and either do some analysis on the picture or prompt the user. I think the latter is more common, and even though it requires some interaction it's at least pretty fast, as you typically get the candidates side by side and click a button to decide. At least that's how I remember it working the last time I used such software (sorry, I can't remember a specific name).

Thank you stigma for the thoughtful reply. :+1:

Understood, re the potential dangers of deleting identical files that are in different locations. I am trying to find a way to delete hash-identical files that are in the same lowest-level folder but have different names.

When you say...

I assume you are talking about files that have identical hashes and names? If not, then that is exactly what I am looking for (delete duplicates in the same folder with the same hash but a different name).

There are various ways to script around it, as you mention. We could potentially cycle through folders, extract hashes from rclone lsjson, then select and delete. But before working through a script like that, I wanted to be sure the functionality isn't hidden somewhere.
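The lsjson route could be sketched like this. It assumes output from `rclone lsjson -R --hash --files-only remote:`, where each entry carries `Path`, `ModTime`, and a `Hashes` dict; the `"MD5"` key name is an assumption (check what your remote actually reports):

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath

def dupes_from_lsjson(text, hash_name="MD5"):
    """Group lsjson entries by (folder, hash); return duplicate groups,
    each sorted newest-first by ModTime."""
    groups = defaultdict(list)
    for entry in json.loads(text):
        h = entry.get("Hashes", {}).get(hash_name)
        if not h:
            continue  # skip entries the remote reported no hash for
        folder = str(PurePosixPath(entry["Path"]).parent)
        groups[(folder, h)].append(entry)
    result = {}
    for key, entries in groups.items():
        if len(entries) > 1:
            # Assumes consistent UTC ISO 8601 timestamps, which sort as strings
            entries.sort(key=lambda e: e["ModTime"], reverse=True)
            result[key] = [e["Path"] for e in entries]
    return result
```

Everything after the first path in each group would be a deletion candidate under a "keep newest" policy.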

It's not super high priority. Just one of those "hmm, want to do this, wonder if it's already there" questions.

Yes, sorry - I think rclone's automatic fixes only work on duplicate folders, and on duplicate files with the same name and same hash.

The thing is that rclone's dedupe is not really meant for "traditional" file-organizing deduplication, but rather as a tool to handle the fact that some cloud drives (like Google Drive) allow, and sometimes accidentally create, identical copies of files. Most operating systems don't allow this or know how to display it, so rclone needs a way to resolve these cases. Fixing those problems is what rclone dedupe is for. That's why rclone dedupe is pretty limited in what it does - it only fixes the specific types of issues that can happen on cloud drives.

Yes - you could either extract it from the json or just use something like
rclone lsf MyGdrive: --format ph
Getting a script to read through that shouldn't be too hard.

And chances are you can probably find a fully featured command-line deduping tool that you can pipe that output into directly if you look around a bit. That might be preferable to spending a lot of time making a fully custom script. Worst case, you might have to re-format the output a little to make it acceptable input for the other program, but that's a fairly small problem. I also assume that NCW probably stuck to some standard ways of formatting the output, so it may not be needed at all.

@thestigma Thank you for the reply. I like the way you explain why rclone works as it does and offer a few options.

There are some great utilities that do exactly what I am looking for - dupeGuru, CCleaner (as a feature) and many more - but rclone is generally so much more efficient that I always try to use it when possible. rclone dedupe only deletes files that are in the same folder, ignoring dupes in different folders, which also fits my use case perfectly.

rclone lsf gdrive: --format ph is a great idea. Add t to get the timestamp and I'd have all the info I need for a 'keep latest' solution.
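A "keep latest" pass over that output could look roughly like this. It assumes `rclone lsf -R --files-only --format pht gdrive:` emits `path;hash;modtime` lines with the default `;` separator (the field order follows the format string, but double-check on your version, and review the commands before running any of them):

```python
from collections import defaultdict
from pathlib import PurePosixPath

def older_duplicates(lines):
    """From path;hash;modtime lines, return the paths of all but the newest
    copy in each (folder, hash) group."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, file_hash, modtime = line.rsplit(";", 2)
        groups[(str(PurePosixPath(path).parent), file_hash)].append((modtime, path))
    to_delete = []
    for entries in groups.values():
        if len(entries) > 1:
            entries.sort(reverse=True)  # newest first, assuming uniform timestamps
            to_delete.extend(path for _, path in entries[1:])
    return to_delete

def delete_commands(paths, remote="gdrive:"):
    """Emit rclone commands for review rather than deleting directly."""
    return [f"rclone deletefile '{remote}{p}'" for p in paths]
```

Printing the commands (or using rclone's --dry-run first) keeps a human in the loop, which seems wise given the earlier warning about blind deletion.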

Again, thank you.


Thank you for the praise, and you are very welcome :slight_smile:
