Is there any way to feed rclone dedupe two different directories and have it treat those two remote locations as a single location, without actually moving or merging them first?
This would be useful, for example, to dedupe two backups that are similar but that I do not wish to fully merge. I had assumed this was a feature rclone already had, but I cannot find any documentation on it.
I suppose I could create a temporary directory and then move both of the directories I wish to dedupe into it, BUT this won't work, as dedupe requires the file paths to be identical rather than just matching on name/size/hash…
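One approach that might do what the first post asks is the union backend, which presents several upstreams as a single remote that dedupe can then scan. A minimal sketch, assuming two hypothetical remotes `backupA:` and `backupB:` (note that union merges directory trees, so files still keep their original subfolder paths):

```
# ~/.config/rclone/rclone.conf -- remote and path names are hypothetical
[combined]
type = union
upstreams = backupA:photos backupB:photos
```

```sh
# List-only pass first; --by-hash matches on hash rather than name,
# which needs upstreams that support a common hash. I'm not certain
# whether it matches across directories, so check the list output first.
rclone dedupe --by-hash --dedupe-mode list combined:
```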
Actually, for my purposes the requirement that file paths match ruins things entirely, even with a mount. I just realized that these two remote backups contain similar files but have totally different subfolder organization.
So, is there a way to tell dedupe to ignore file paths and match on name/size alone? That is actually what I need. No amount of mounting will fix the fact that these two backup destinations have totally different subfolder structures.
Actually, what I really want is to fully flatten a remote directory and dedupe the flattened list of files. By flattened I mean treating all the files as if they were in the same directory, even though they are not. In other words, take all the files and run dedupe on them as if they all had the exact same path. This would catch all the files with the same name/size that I had foolishly organized into different and nonsensical folder structures.
That is a mistake I have already made, with no real way to fix other than running a fully flattened dedupe like the one I am describing in this thread, or spending hundreds of hours checking each file by hand.
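For what it's worth, a flattened name+size pass can be approximated outside of dedupe by listing everything and keying on basename plus size. A rough sketch, assuming a hypothetical remote called `backup:` (the same idea works against a local mount path with plain `find`):

```sh
# List every file as "path;size", key on basename + size, and print any
# collision. Read-only: review the output before deleting anything.
rclone lsf -R --files-only --format "ps" backup: | awk -F';' '
{
    n = split($1, parts, "/")      # flatten: keep only the file name
    key = parts[n] ";" $2          # name + size is the duplicate key
    if (key in seen) print seen[key] " <-> " $1
    else seen[key] = $1
}'
```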
dupeguru is using 12 GB of RAM. This is clearly not a mere filename/size dedupe, even though that is what I told it to do; I told it not to use hashes. The full lsl of these mounts would only be around 100-200 MB, so using 12 GB of RAM clearly means it is computing hashes.
Can anyone suggest a dupeguru alternative?
If I ever did this again with slightly more files, I'd run out of RAM. (Thankfully, at this point dupeguru claims to be 97% done.)
dupeguru went from 97% done to 98% done and is now using 28 GB of RAM.
I hit cancel to try to get at least partial results. Now it is using 31 GB of RAM and my system is lagging.
Is dupeguru hashing these files? I didn't tell it to, but what else could it be? Maybe the fact that it is a GUI displaying too much?
EDIT: tried dupeguru again. It ran for 2 hours straight, quickly hit 22,000 dupes and 98% complete while using 28 GB of RAM, then quit on me.
Can anyone recommend a local dedupe tool that can handle 400,000 files across many folders, using rclone mount to expose that data locally?
dupeguru tries, but I run out of RAM. I even closed dupeguru and reopened it, and somehow it is now using 38 GB of RAM. This is just not going to work.
Mounting the folders worked great once I found a proper dedupe tool to run locally on my machine. There are a LOT of bad ones out there, but I can at least recommend czkawka-based tools.
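For anyone landing here later, the mount I mean is just an ordinary rclone mount; remote name and mount point below are placeholders (on Windows the mount point would be a drive letter):

```sh
# Read-only is safest for the scanning pass; drop --read-only only when
# you actually want the dedupe tool to delete through the mount.
rclone mount backup: /mnt/backup --read-only --vfs-cache-mode minimal
```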
Deleting files from the mounted drive is much, much slower than rclone delete; it will take hours, not minutes. I wonder if this is because I chose --network-mode, or is it always like this for large changes to an rclone mount?
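If the slow part is the deletes themselves, one workaround might be to let the local tool only produce the list of duplicates and then hand that list back to rclone. A sketch, assuming the tool exported the duplicate paths (relative to the remote root) to a hypothetical dupes.txt:

```sh
# Preview first; remove --dry-run to actually delete the listed files
rclone delete backup: --files-from dupes.txt --dry-run -v
```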