I have a significant number of duplicate files (i.e. with the same MD5 hash) in different directories and with different names in my Google Drive. rclone dedupe identified and deleted some of them, but it doesn't seem to work across directories. Is there anything I can do to leave only one instance of each file with a given MD5 hash across Google Drive?
Rclone dedupe only dedupes files with the same name (which can't happen on a normal file system).
You can use rclone md5sum to get a list of all the hashes and find all the duplicates like this:
rclone md5sum drive: | sort | uniq -w32 -dD > duplicates
Then you can look through that list by hand and work out which files need to be deleted.
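If the list is too long to go through by hand, a first pass can be scripted. The following is only a rough sketch, not an rclone feature: it assumes each line of the duplicates file is a 32-character hash, then spaces, then the path (the format rclone md5sum prints), and it keeps whichever copy happens to come first in the sorted list. The function name list_extra_copies is made up for this example.

```shell
# Print every path except the first one seen for each MD5 hash.
# Assumed input format: "<32-char hash><spaces><path>", one file per line.
list_extra_copies() {
    awk '
        length($0) < 34 { next }          # skip blank or hash-less lines
        {
            hash = substr($0, 1, 32)      # the MD5 hash
            path = substr($0, 33)         # the rest of the line
            sub(/^ +/, "", path)          # drop the separator spaces
            if (seen[hash]++) print path  # first copy is kept, later ones listed
        }
    ' "$1"
}

# Example use (assumes the "duplicates" file from the command above):
# list_extra_copies duplicates > files-to-delete
```

The choice of which copy survives is arbitrary here, so it is still worth reading through the output before deleting anything.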
Thank you very much for a prompt reply! I have generated a list of md5 hashes but some files lack them, eg:
Armenian/Light Creator of light.docx
d27109593f6306658bbe2fa954ead6a3 Armenian/Cross of prayer.odt
The second file exists but no md5 hash is shown for it.
Is this a bug or am I missing something?
That's very likely a google doc - these don't have MD5 hashes.
You can skip the google docs with the --drive-skip-gdocs flag.
Thank you very much!! Now I have a list of files to delete but they all have spaces in the filenames - how do I escape them to use in a shell loop so that spaces don't confuse shell or rclone? Can I use %20 instead of ' ' in file names with rclone?
If you get them all in a file, one path per line, then you don't need to escape anything (spaces are fine as they are) and you can use
rclone delete --dry-run --files-from files-to-delete drive:path
Remove the --dry-run when happy!
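If you do want a shell loop for some other per-file step, the usual way to survive spaces is to read the list line by line and quote the variable, rather than trying %20. This is a generic shell sketch, not anything rclone-specific; the helper name for_each_path is invented for the example.

```shell
# Run a command once per line of a list file.
# Each line is passed to the command as a single argument, spaces included.
for_each_path() {
    list=$1
    shift
    while IFS= read -r path; do
        [ -n "$path" ] || continue   # skip blank lines
        "$@" "$path"                 # the quotes keep the path in one piece
    done < "$list"
}

# Example: show each path in brackets to check nothing got split.
# show() { printf '[%s]\n' "$1"; }
# for_each_path files-to-delete show
```

For the actual deletion the single --files-from command is simpler and makes one pass over the remote, so the loop is only worth it when each file needs different handling.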