Is there any way to find duplicate files within a tree, based not on their names, but ONLY their size & md5 checksum?
I have lots of duplicates that do not have the same name. As I understand it,
dedupe matches on names first, then compares checksums to find "identical duplicates".
Ha, I realize this was already asked last year:
For anyone else who wants this, @ncw suggested on GitHub using this to find dupes based on md5:
rclone md5sum remote:path | sort | uniq -c | sort -n
Hm, actually this only works for files with duplicate names AND md5. I need to look at JUST the checksum, i.e. only the first 32 characters of each line.
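One way to do that (an untested sketch, with `remote:path` as a placeholder) is to strip everything but the hash field before counting, since `rclone md5sum` prints the checksum followed by the path:

```shell
# Count duplicate checksums regardless of file name:
# md5sum output is "<32-char md5>  <path>", so keep only field 1.
rclone md5sum remote:path | awk '{print $1}' | sort | uniq -c | sort -n
```

Any line with a count greater than 1 at the bottom of the output is a checksum shared by more than one file.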
You can use lsf for this…
rclone lsf -R --format hs --files-only remote:path | sort | uniq -c | grep -v '^ *1'
This will show you all files with duplicate hash/size pairs.
You’ll then need to look up which files are duplicated by grepping for the hash in the output of
rclone md5sum remote:path.
That could all be wrapped up in a little bash script, which I haven’t got time to write just now, but it shouldn’t be too hard!