Is there a decent way to globally deduplicate Google Drive?

What is the problem you are having with rclone?

I would like to use rclone to deduplicate files on my Google Drive that are duplicated across different folders. I am unsure of the best way to approach the problem. I backed up endless garbage for years to an unlimited drive, which is now limited. I would like to keep one copy of everything without making a mess or accidentally deleting something I might care about. I don't hate puppies.

Run the command 'rclone version' and share the full output of the command.

╰─λ rclone version

rclone v1.59.1
- os/version: darwin 13.5 (64 bit)
- os/kernel: 22.6.0 (arm64)
- os/type: darwin
- os/arch: arm64
- go/version: go1.18.5
- go/linking: dynamic
- go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone dedupe gdrive: --dedupe-mode newest
rclone hashsum md5 gdrive: > data.txt
sort -k1,1 data.txt > sorted_data.txt
uniq -d -w 32 sorted_data.txt > duplicated_hashes.txt
grep -vf duplicated_hashes.txt sorted_data.txt > final_result.txt
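
For context, the idea there was: dedupe same-named files in place, then list every file's MD5 and use the sorted listing to find hashes that occur more than once. A shorter sketch of that listing step (not what I originally ran; it assumes rclone hashsum's hash-then-path output, with the 32-character MD5 first) would keep only the second and later occurrence of each hash:

# list md5 + path for every file on the remote
rclone hashsum md5 gdrive: > data.txt
# print every line whose 32-character MD5 has already appeared once,
# i.e. the redundant copies; the first copy of each hash stays out of the list
awk 'seen[substr($0, 1, 32)]++' data.txt > extra_copies.txt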

The rclone config contents with secrets removed.

[gdrive]
type = drive
scope = drive
root_folder_id = ABCDEFG            
token = {"access_token":"HIJKLMNOP"}
team_drive =
client_id = 12345
client_secret = 67890

A log from the command with the -vv flag

I already ran rclone dedupe without -vv, so the files are gone; I can't exactly replicate the procedure at this point.
Also,
 ╰─λ wc -l final_result.txt
  800373 final_result.txt
Eight hundred thousand lines for duplicated hashes.

Any thoughts on a decent way to deduplicate a Google Drive where the duplicate files are not in the same place?
For years I backed up the Downloads folder of each of my devices to a different location on the drive. If I downloaded a file on only one device, I want to keep that single valuable copy.
If I downloaded the same file on 20 different devices, I now have 20 copies spread out everywhere, each using up my no-longer-unlimited Google Drive storage.

As it looks like you know how to deduplicate files when they are in one location... then put all the files in one location :)

Use e.g. the combine remote to achieve it without copying a single file.
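
For example, something like this in rclone.conf (the upstream names and folders here are made up; point them at wherever your Downloads backups actually live):

[downloads]
type = combine
upstreams = laptop=gdrive:Backups/Laptop/Downloads desktop=gdrive:Backups/Desktop/Downloads phone=gdrive:Backups/Phone/Downloads

rclone lsf downloads: would then show laptop/, desktop/ and phone/ as subdirectories of a single remote, with no data copied.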

Wow, this looks great! Thank you, I was unaware of the combine remote. How does this differ from a Union remote?

I'm assuming that if I suspect I have a file in potentially 20 different directories, I'll need all 20 of those directories in my combine remote for this to work relatively well. Yes? Am I misunderstanding how combine works?

There is no real upper limit on the number of directories, other than common sense.

Combine all your sources into one remote and then dedupe it using rclone.

Test with --dry-run before.
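
For instance, using the example combine remote name from above (--by-hash makes dedupe group files by identical hash across the whole remote, rather than by identical names within a single directory; inspect the output before dropping --dry-run):

rclone dedupe --by-hash --dedupe-mode newest downloads: --dry-run -vv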
