Will dedupe work like this?

I did not fill out the help form since this is a general question.

But I am running Windows with rclone v1.65.2.

So I have 2 main folders inside my Google TD:

Clean
Spaces

Inside those folders are thousands of folders, all exactly 1 level deep, meaning there are no nested folders.

So like this:
Clean\Folder1\Files
Clean\Folder2\Files

Spaces\Folder1\Files
Spaces\Folder2\Files

My goal is to KEEP all files inside Clean and ONLY remove duplicate files inside the Spaces folders where the MD5 matches a file inside Clean.

I am trying to clarify whether rclone will go through the folders in alphabetical order, the way I see them.

I notice that when I do moves with rclone it can skip around, which is not an issue when moving, but my concern is that dedupe might do the same.

Is this the right rclone command for what I want it to do?

rclone -vvP --fast-list --tpslimit=100 --drive-use-trash=false dedupe --by-hash Remote:/ --dedupe-mode first

Meaning, would it iterate all folders and files inside Clean\ and then only delete the ones with a matching MD5 in Spaces?

As per the docs:

dedupe considers files to be identical if they have the same file path and the same hash.

So you cannot dedupe across different paths.

Crap, that is not what I thought; I totally missed that!

I guess the only option then is doing this in rclone:

rclone lsf --recursive --format hi --separator "," Remote:/Clean > .Remote-Clean.txt
rclone lsf --recursive --format hi --separator "," Remote:/Spaces > .Remote-Spaces.txt

At least then I can compare the Clean hashes to the Spaces hashes, then use the Google API to purge the file IDs of the duplicate hashes.

A bit more work, but it will do it.

I only use the hi format (hash and ID) since I do not care about file names, just removing the Spaces duplicates whose hash already exists in Clean.

A lot more manual work, and more tools to do it.

Unless you know a better solution.

I would go a similar path, I think. And if it is a repetitive task, I would spend some time to automate it as much as possible.

Having a full list of files with hashes, it should be relatively easy to process with a script and list the dupes. Some work the first time, pressing Enter only the next.
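A minimal sketch of such a script, assuming the two `lsf --format hi --separator ","` listings from earlier in the thread (the `.Remote-Clean.txt` and `.Remote-Spaces.txt` file names are just the ones used above; each line is `md5,fileid`):

```python
# Compare two rclone lsf "hash,id" listings and print the Drive file IDs
# in Spaces whose MD5 already exists in Clean. Sketch only -- verify the
# output before purging anything.
import csv

def load_hashes(path):
    """Return {md5: [fileid, ...]} from a 'hash,id' CSV listing."""
    out = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) >= 2:
                out.setdefault(row[0], []).append(row[1])
    return out

def find_dupes(clean_path, spaces_path):
    """File IDs in the Spaces listing whose hash also appears in Clean."""
    clean = load_hashes(clean_path)
    spaces = load_hashes(spaces_path)
    return [fid for md5, ids in spaces.items() if md5 in clean for fid in ids]

if __name__ == "__main__":
    for fid in find_dupes(".Remote-Clean.txt", ".Remote-Spaces.txt"):
        print(fid)
```

That prints one file ID per line, which can then be fed to whatever purges by ID.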

My solution is that I have all my remotes inside Excel, which auto-generates my rclone command lines so I can simply copy and paste.

To dedupe, I use EmEditor, as it will compare the 2 files based on MD5 and only give a report where there are matches on the MD5 value.

Then it is a matter of copying the GDrive IDs from the results and pasting them into a tool that calls the Google API to purge the file IDs.

It is the best automation I have come up with.

I am not aware of rclone having the ability to purge by file ID; I wish it could, as that would actually make this a little easier.
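For the purge step, a hedged sketch of deleting by file ID with google-api-python-client (the Drive v3 `files().delete` call is real; the key-file path, scope setup, and `dupe-ids.txt` input file are assumptions for illustration, and on a Team Drive you may also need `supportsAllDrives=True`):

```python
# Permanently delete Google Drive files by ID (bypasses the trash,
# matching --drive-use-trash=false). Sketch only -- test on throwaway
# files first; deletion is not recoverable.

def purge_file_ids(service, file_ids):
    """Delete each Drive file ID via the Drive v3 API."""
    for fid in file_ids:
        service.files().delete(fileId=fid).execute()

if __name__ == "__main__":
    # Assumed credential setup: a service-account key file and full
    # Drive scope. Adapt to however you authenticate.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",  # placeholder path
        scopes=["https://www.googleapis.com/auth/drive"],
    )
    service = build("drive", "v3", credentials=creds)
    with open("dupe-ids.txt") as f:  # one file ID per line
        purge_file_ids(service, [line.strip() for line in f if line.strip()])
```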


Might be easier to use rclone mount and then run whatever dedupe tool you want.
