Is there any way to find duplicate files within a tree, based not on their names, but ONLY their size & md5 checksum?
I have lots of duplicates that do not have the same name. As I understand it,
dedupe matches on names first, then compares checksums to find "identical duplicates".
Ha, I realize this was already asked last year:
For anyone else who wants this, @ncw suggested on GitHub using this to find dupes based on md5:
rclone md5sum remote:path | sort | uniq -c | sort -n
Hm, actually this only works for files with duplicate names AND md5. I need to look at JUST the checksum, i.e. only the first 32 characters of each line.
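One way to do that (an untested sketch, with `remote:path` as a placeholder) is to strip everything but the hash field before counting, since `rclone md5sum` prints the checksum followed by the path:

```shell
# Count duplicate checksums regardless of file name:
# md5sum output is "<32-char md5>  <path>", so keep only field 1.
rclone md5sum remote:path | awk '{print $1}' | sort | uniq -c | sort -n
```

Any line with a count greater than 1 at the bottom of the output is a checksum shared by more than one file.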
You can use lsf for this…
rclone lsf -R --format hs --files-only remote:path | sort | uniq -c | grep -v '^ *1'
This will show you all files with duplicate hash/size pairs.
You’ll then need to look up which files are duplicated by grepping for the hash in the output of
rclone md5sum remote:path.
That could all be wrapped up in a little bash script, which I haven’t got time to write just now, but it shouldn’t be too hard!