So I have a Google Drive folder called "clients" that I use for archival purposes (old projects, etc.). These projects are incredibly large, as I work in video production. I recently started using Rclone to back them up, and I noticed the other day that some of them seem to have duplicate files appearing in them. Since this is video, we're talking about 250 GB projects turning into 500 GB projects, so this could quickly become an issue if I don't manage it.
So the obvious solution is to use `rclone dedupe`, but I want to be incredibly careful about setting a destructive process loose on my client archive!
What I'm wondering is how exactly dedupe works when dealing with very large directory trees. Will it take into consideration dupes it finds anywhere within that structure? There are a few projects where the same files were used intentionally, so deleting those would be a serious problem. At the same time, I need a solution that doesn't involve regularly managing the dedupe checks by hand; I want to set it and forget it with a cron job.
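To be concrete, this is roughly what I'd like to end up with. The remote name `gdrive:` and the paths are placeholders, and I'm going off the flags I found in the docs (`--dry-run`, `--dedupe-mode`, `--log-file`), so treat this as a sketch rather than something I've actually run against the archive:

```shell
# Preview first: show what dedupe would do without deleting anything
rclone dedupe --dry-run gdrive:clients

# The eventual crontab entry, e.g. weekly on Sunday at 3am.
# The --dedupe-mode value is just a guess at a sensible choice;
# the default interactive mode obviously won't work unattended.
0 3 * * 0  rclone dedupe --dedupe-mode newest gdrive:clients --log-file=/home/me/rclone-dedupe.log
```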
I have a bad feeling I may be stuck between a rock and a hard place here: if I go set-and-forget, I'll lose important files, and if I carefully manage the deduping, it will be quite time-consuming.
I’d love to be proved wrong right now
I'm also wondering what exactly is compared to decide that two files are identical. Is it name and size? The help file mentions that they need to have the same md5sum, but I'm not quite sure what that constitutes. It's quite possible I'll have folders with camera files that have identical names but different file sizes, so I want to make sure those aren't deleted.
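If it helps show what I'm trying to verify: I gather `rclone md5sum` can list content hashes for a remote, so I could spot-check for true content duplicates myself with standard text tools. The `gdrive:clients` path is a placeholder, and the here-doc below is simulated output just to illustrate the pipeline:

```shell
# The real command would be something like:
#   rclone md5sum gdrive:clients > checksums.txt
# Simulated output in the same "<md5>  <path>" format:
cat > checksums.txt <<'EOF'
0cc175b9c0f1b6a831c399e269772661  projectA/MVI_0001.MP4
0cc175b9c0f1b6a831c399e269772661  projectB/MVI_0001.MP4
92eb5ffee6ae2fec3ad71c777531578f  projectC/MVI_0001.MP4
EOF

# A hash appearing more than once means byte-identical content,
# even when same-named files (like camera clips) merely look alike:
awk '{print $1}' checksums.txt | sort | uniq -d
# prints 0cc175b9c0f1b6a831c399e269772661
```

That would at least tell me whether the "duplicates" I'm seeing are byte-identical copies or just same-named camera files with different content.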