Rclone dedupe still limited by files in same path?

I was reading this:

dedupe considers files to be identical if they have the same file path and the same hash

Does rclone still have this limitation?

If so, is there currently a way to find duplicates that have the same name but are in different paths?

Ideally one that doesn't rely on hashes, because some files have the same name but different hashes.

It was never an rclone limitation as such, just the default mode of operation.

If you want to find duplicates across different paths, use the --by-hash flag.

I also always recommend testing before making any changes by using the --dry-run option.
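
As a minimal sketch of the two suggestions above combined: the wrapper below runs dedupe in hash mode with --dry-run, so nothing is deleted or renamed. The function name preview_dedupe_by_hash and the remote:path argument are placeholders for illustration, not anything rclone provides.

```shell
# Preview what dedupe would do when matching files by hash rather than by path.
# --dry-run means rclone only reports what it would do; no files are changed.
# The function name and "remote:path" are placeholders for illustration.
preview_dedupe_by_hash() {
    rclone dedupe --by-hash --dry-run "$1"
}

# Usage:
#   preview_dedupe_by_hash remote:path
```

Once the reported actions look right, the same command without --dry-run would apply them.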

And if you find the documentation not entirely clear, you can always suggest an edit. It is a never-ending work in progress.

--by-hash does not work on remotes that have no hashes, like crypt remotes. It will also fail to find files that have the same filename but different content across different paths.

Correct. This is why maintaining hashes is extremely useful. I use chunker for it myself. The hasher remote can also be used.

Also correct, and logical IMO. I cannot see why you would want files with different content to be considered identical.

If they have the same filename but different content, I'd still want to be able to find that out, and find out why.

This is not something you choose...

This is not possible using rclone dedupe today.

I would list all content to a file (rclone ls) and use some scripting voodoo to find duplicate names.

This prints out the paths of duplicated file names in pairs:

rclone lsf -R remote:path --files-only | awk '{ n=split ($1,a,/\//); leaf=a[n]; if (leaves[leaf]) {print $i, leaves[leaf]} leaves[leaf]=$1; }'

Wow, apparently I have 109 files with duplicated filenames in different folders.

Gotta clean this up! Thanks ncw, as always.

Thank you for this example.

For real-life usage it needs some ironing out, though. As it stands, it produces wrong results when there are spaces in filenames. But definitely a good start :)

And a note for macOS users - this script requires GNU awk - brew install gawk

Silly mistake; it should have been like this, replacing $1 (the first space-separated item in the line) with $_ (the whole line):

rclone lsf -R remote:path --files-only | awk '{ n=split ($_,a,/\//); leaf=a[n]; if (leaves[leaf]) {print $_, leaves[leaf]} leaves[leaf]=$_; }'
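
For anyone who wants all copies of a name listed together rather than in pairs, here is a hedged variation on the one-liner above. It is only a sketch: duplicate_names is a made-up function name, it uses "/" as the awk field separator so spaces in file names are left alone, and the order in which names are printed is whatever awk happens to choose.

```shell
# Reads one path per line on stdin and prints every file name that occurs
# more than once, followed by the indented list of paths that carry it.
# The function name is hypothetical; it is not part of rclone.
duplicate_names() {
    awk -F/ '
        {
            leaf = $NF                          # last "/"-separated field is the file name
            count[leaf]++
            paths[leaf] = paths[leaf] "  " $0 "\n"   # $0 is the whole line, spaces included
        }
        END {
            for (leaf in count)
                if (count[leaf] > 1)
                    printf "%s\n%s", leaf, paths[leaf]
        }
    '
}

# Usage (remote:path is a placeholder):
#   rclone lsf -R remote:path --files-only | duplicate_names
```

Because it only reads stdin, it can be tried on a saved listing before pointing it at a real remote.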

Thanks @kapitainsky

I don't use awk very often but it is very cool for one liners like this.
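
On the earlier question of same-named files whose content differs, a similar sketch can compare hashes per name. It assumes the listing comes from rclone lsf with --format hp, which prints the hash, then the default ";" separator, then the path. classify_same_names is a hypothetical name, and on remotes that report no hash the hash column is empty, so those names are marked UNKNOWN rather than guessed at.

```shell
# Reads "hash;path" lines on stdin. For each file name that occurs more than
# once, prints IDENTICAL, DIFFERENT or UNKNOWN (some hash missing) and the name.
# The function name is hypothetical; it is not part of rclone.
classify_same_names() {
    awk '
        {
            sep  = index($0, ";")               # the first ";" ends the hash; paths may contain more
            hash = substr($0, 1, sep - 1)
            path = substr($0, sep + 1)
            n    = split(path, parts, "/")
            leaf = parts[n]                     # file name without its directory
            if (hash == "") unknown[leaf] = 1
            if (count[leaf]++ == 0) first[leaf] = hash
            else if (hash != first[leaf]) differs[leaf] = 1
        }
        END {
            for (leaf in count) {
                if (count[leaf] < 2) continue
                status = "IDENTICAL"
                if (leaf in unknown) status = "UNKNOWN"
                else if (leaf in differs) status = "DIFFERENT"
                print status "\t" leaf
            }
        }
    '
}

# Usage (remote:path is a placeholder):
#   rclone lsf -R --files-only --format hp remote:path | classify_same_names
```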

