Deleting duplicate files

I have a significant number of duplicate files (i.e. files with the same MD5 hash) in different directories and with different names in my Google Drive. rclone dedupe identified and deleted some of them, but it doesn't seem to work across directories. Is there anything I can do to keep only one instance of each file with a given MD5 hash across my Google Drive?

Rclone dedupe only dedupes files with the same name in the same directory (which can't happen on a normal file system).

You can use rclone md5sum to get a list of all the hashes and find all the duplicates like this:

rclone md5sum drive: | sort | uniq -w32 -dD > duplicates

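The sort puts identical hashes on adjacent lines, and uniq -w32 -dD (these are GNU uniq options) then prints every line whose first 32 characters, i.e. the MD5 hash, appear more than once. You could then look through that list by hand and work out which files need to be deleted.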
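If you'd rather not pick by hand, here is a rough sketch of automating it. The hypothetical shell function pick_extra_copies reads the duplicates file and prints every path except the first one in each group of identical hashes. It assumes rclone md5sum prints the 32-character hash, two spaces, then the path (the same layout as md5sum), and it simply keeps whichever copy comes first in the sorted list, so check the result before deleting anything.

```shell
# Hypothetical helper: read "hash  path" lines (rclone md5sum output) on
# stdin and print the path of every file whose hash has already been seen,
# i.e. every copy except the first one in each group of identical hashes.
pick_extra_copies() {
  awk '
    {
      hash = substr($0, 1, 32)       # the MD5 is the first 32 characters
      path = substr($0, 35)          # skip the two separating spaces
      if (hash ~ /^ *$/) next        # blank hash (e.g. a Google Doc): skip
      if (seen[hash]++) print path   # 2nd, 3rd, ... copy: candidate to delete
    }
  '
}
```

Usage, writing the candidates to a file you can review:

pick_extra_copies < duplicates > files-to-delete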

Thank you very much for the prompt reply! I have generated a list of MD5 hashes, but some files lack them, e.g.:

b26acc0ceafdddadd9e344a63e9359b1 Armenian/Mandakuni-Jarer.pdf
Armenian/Light Creator of light.docx
d27109593f6306658bbe2fa954ead6a3 Armenian/Cross of prayer.odt

The second file exists, but no MD5 hash is shown for it.

Is this a bug, or am I missing something?

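That's very likely a Google Doc; these don't have MD5 hashes. (rclone lists a Google Doc with the extension of the format it would export it as, which is why it shows up as .docx.)

You can skip the Google Docs with --drive-skip-gdocs.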
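Putting that flag into the earlier pipeline gives something like the following minimal sketch, wrapped in a hypothetical list_duplicates function that takes the remote as its argument. LC_ALL=C is optional; it only makes the sort order independent of your locale.

```shell
# Hypothetical wrapper: list every file on the remote given as $1 whose MD5
# occurs more than once, leaving out Google Docs (they have no MD5).
# uniq -w32 -dD is GNU uniq: compare only the 32-character hash and print
# all lines of each repeated group.
list_duplicates() {
  rclone md5sum --drive-skip-gdocs "$1" | LC_ALL=C sort | uniq -w32 -dD
}
```

Usage: list_duplicates drive: > duplicates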


Thank you very much!! Now I have a list of files to delete, but they all have spaces in their file names. How do I escape them in a shell loop so that the spaces don't confuse the shell or rclone? Can I use %20 instead of ' ' in file names with rclone?

If you put them all in a file, one path per line, you don't need to escape anything; you can use

rclone delete --dry-run --files-from files-to-delete drive:path

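Remove --dry-run when you're happy with what it would delete! The paths in the file are relative to drive:path, so if the list came from rclone md5sum drive:, use plain drive: there.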
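If you do still want a shell loop (it is slower, since it starts rclone once per file), the important parts are reading each line verbatim and double-quoting the variable; %20 isn't needed. A minimal sketch, using a hypothetical delete_listed function that calls rclone deletefile (which removes a single file) for each listed path:

```shell
# Hypothetical helper: run "rclone deletefile --dry-run" on every path
# listed (one per line) in a file, relative to the given remote root.
delete_listed() {
  remote=$1   # e.g. "drive:" or, for a subdirectory, "drive:path/"
  list=$2     # e.g. "files-to-delete"
  # IFS= keeps leading/trailing spaces, -r keeps backslashes literal, and
  # the [ -n ] test still processes a last line without a trailing newline.
  while IFS= read -r file || [ -n "$file" ]; do
    # The double quotes pass a name containing spaces as one argument.
    rclone deletefile --dry-run "$remote$file"
  done < "$list"
}
```

Usage: delete_listed drive: files-to-delete (and, as above, drop --dry-run once the output looks right).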