Dedupe encrypted files (via hash) on Google Drive

What is the problem you are having with rclone?

Can I dedupe (via hash) encrypted files with standard file name encryption currently sitting on Google Drive? Or do I need to re-upload?

What is your rclone version (output from rclone version)

Latest

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Catalina

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Paste command here

The rclone config contents with secrets removed.

Paste config here

A log from the command with the -vv flag

Paste log here

Not sure I'm following all the way.

dedupe takes two files that are the same and allows you to remove one of them.

Is that what you are asking?

You can dedupe by hash as it's written up here:

https://rclone.org/commands/rclone_dedupe/
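For example, something roughly like this (the remote and path names are just placeholders for your own):

    rclone dedupe --by-hash remote:photos

By default dedupe runs interactively, so it will ask what to do with each set of duplicates before it deletes anything.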

Sorry, English is my second language. I’ll be more specific.

I have a folder structure like this:

photos (parent folder)

  • Folder A (sub folder)
  • Folder B (sub folder)
  • Folder C (sub folder)

My configuration is an encrypted remote (gcrypt) on a Google Drive backend, with standard file name encryption.

Between folders A, B, and C there is a high likelihood of duplicate photos.

My question: can I run rclone dedupe (hash deduping) on the above parent folder? Or will I need to download and re-upload?

Dedupe works like this, as it's written up on the linked page I shared:

dedupe considers files to be identical if they have the same file path and the same hash. If the backend does not support hashes (e.g. crypt wrapping Google Drive) then they will never be found to be identical. If you use the --size-only flag then files will be considered identical if they have the same size (any hash will be ignored). This can be useful on crypt backends which do not support hashes.

So a duplicate would be two copies of the same file in A.

If the same file was in A and in B, that's not a duplicate.
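If all you want is to clean up same-name duplicates within a single folder on the crypt remote, the --size-only route from that quote would look roughly like this (assuming your crypt remote is called gcrypt:):

    rclone dedupe --size-only gcrypt:photos

That still only matches files sharing the same path, and matching on size alone is a much weaker check than a real hash.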

One possible workaround is to use rclone mount, which makes the cloud storage appear as a local folder.
Then use any dedupe tool you would use locally on that mount.
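As a rough sketch (the remote name and mount point are placeholders, and on macOS rclone mount needs macFUSE or a similar FUSE layer installed):

    mkdir -p ~/gcrypt-mount
    rclone mount gcrypt: ~/gcrypt-mount

You could add --read-only for a first pass that only scans and reports, then do the actual deleting with rclone afterwards.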

If you are worried about having to download the files to calculate the hashes, you can use a free/cheap virtual machine from Google Cloud.
For Google Cloud Compute virtual machines, there are no ingress/egress fees to access Google Drive.
https://cloud.google.com/vpc/network-pricing#egress-to%20service


Thank you for your response. I like your mount + dedupe idea. Do you have any dedupe tools that you’d recommend?

Glad to help.
Sorry, no suggestions for your platform.

https://www.macworld.co.uk/how-to/find-delete-duplicate-files-mac-3679414/

I have access to a Windows PC and a Linux machine as well, so I’m not tied to Mac. If you have no recommendations or experience with dedupe tools, that’s fine too.

https://www.techspot.com/article/1648-delete-duplicate-files/
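If you end up using the Linux machine, even a plain shell pipeline over the mount can surface duplicates by hash (the mount path is a placeholder, and note this reads every file through the mount, i.e. downloads the data):

    find ~/gcrypt-mount -type f -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate

It prints groups of files whose MD5 hashes match, separated by blank lines, and changes nothing on its own.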
