ZeroG
June 2, 2018, 2:36pm
1
Is there any way to find duplicate files within a tree, based not on their names, but ONLY their size & md5 checksum?
I have lots of duplicates that do not have the same name. As I understand, dedupe uses names first, then compares checksums to find ‘identical duplicates’.
ZeroG
June 2, 2018, 2:48pm
2
Ha, I realize this was asked last year:
Using Google Drive folder with multiple files named differently but they are the same file. Can I use dedupe and ignore the names and only compare based on the checksums?
Any news?
For anyone else who wants this, @ncw suggested on GitHub using this to find dupes based on md5:
rclone md5sum remote:path | sort | uniq -c | sort -n
Hm, actually this only works for duplicate names AND md5. Need to look at JUST the checksum. (Need to look at only the first 32 chars of each line.)
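For the record, one way (not from the thread itself, just a sketch) to group on the checksum alone is to cut the hash field out before counting. Written as a small function here so it works on any "hash  name" listing, such as the output of rclone md5sum:

```shell
# count_hashes: read "md5  filename" lines on stdin (the format produced
# by `rclone md5sum remote:path`) and count occurrences of each checksum,
# ignoring the file names entirely. Lines with a count above 1 are dupes.
count_hashes() {
  awk '{print $1}' | sort | uniq -c | sort -n
}

# Usage (remote:path is a placeholder for your own remote):
#   rclone md5sum remote:path | count_hashes
```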
ncw
(Nick Craig-Wood)
June 2, 2018, 10:37pm
3
ZeroG:
For anyone else who wants this, @ncw suggested on GitHub using this to find dupes based on md5:
rclone md5sum remote:path | sort | uniq -c | sort -n
Hm, actually this only works for duplicate names AND md5. Need to look at JUST the checksum. (Need to look at only the first 32 chars of each line.)
You can use lsf for this…
rclone lsf -R --format hs --files-only remote:path | sort | uniq -c | grep -v '^ *1'
This will show you all the duplicate hashes/size files
You’ll then need to look up which files are duplicated by grepping for the hash in the output of rclone md5sum remote:path.
That could all be in a little bash script which I haven’t got time to write just now but it shouldn’t be too hard!
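A minimal sketch of such a script (my own wiring-together of the two steps above, not Nick's): fetch the hash listing once, find the checksums that appear more than once, then print the file names behind each one.

```shell
# find_dupes: read "md5  filename" lines on stdin (as produced by
# `rclone md5sum remote:path`) and, for every checksum that appears
# more than once, print the checksum and the files that share it.
find_dupes() {
  input=$(cat)
  # uniq -d keeps only checksums that occur at least twice.
  printf '%s\n' "$input" | awk '{print $1}' | sort | uniq -d |
  while read -r h; do
    echo "Duplicates for $h:"
    # The trailing space stops a shorter hash matching a longer one.
    printf '%s\n' "$input" | grep "^$h "
  done
}

# Usage (remote:path is a placeholder; the listing is fetched only once):
#   rclone md5sum remote:path | find_dupes
```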