Migrating from Google Photos - delete dupes from Google

What is the problem you are having with rclone?

I'm migrating my photo archive from Google Photos to an S3 service, and I'm using rclone to manage the files. I'm not transferring from Google Photos using rclone, but I've downloaded a Google Takeout with my originals or "high quality" media, and I also have a lot of originals locally.

Now that I have uploaded a lot of files to the S3 service, I want to remove duplicates such that files existing on the S3 service can safely be removed from Google Photos. The dedupe feature looked promising, but it only appears to delete files within a single backend.

What would be the best way to approach this? Looking at rclone's functionality, there are virtual backends called combine and union. Can I use combine to create a single backend that encompasses both my S3 and the Google Drive, and then run a dedupe by name on that, so that I remove files from Google Photos if the same file name also exists on the S3 service? I understand that Google Photos can't match by hash, and my originals might differ in size and hash from the files stored on Google Photos.

Run the command 'rclone version' and share the full output of the command.

$ rclone version
rclone v1.66.0
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-101-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.1
- go/linking: static
- go/tags: none

welcome to the forum,

something i did once with two different remotes that needed to be deduped:

  1. create two rclone mounts, one for gdrive, one for s3. now the remotes appear as local directories.
  2. use any dedupe tool that works on local.
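
As a rough illustration of step 2, here is a minimal Python sketch that lists files under one mounted directory whose file name also exists anywhere under the other. It is not an rclone feature, just one possible local tool. The function names and the two mount points passed on the command line are hypothetical, and matching on lowercase file name only (ignoring path, size and hash) is an assumption based on the question above. It only prints candidates; it does not delete anything.

```python
import os
import sys
from collections import defaultdict


def index_by_name(root):
    """Map lowercase file name -> list of full paths found under root."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name.lower()].append(os.path.join(dirpath, name))
    return index


def find_name_duplicates(keep_root, candidate_root):
    """Return sorted paths under candidate_root whose file name
    also exists somewhere under keep_root (directories are ignored)."""
    kept_names = index_by_name(keep_root)
    matches = []
    for name, paths in index_by_name(candidate_root).items():
        if name in kept_names:
            matches.extend(paths)
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical usage: python find_dupes.py /mnt/s3 /mnt/gdrive
    # First argument: the mount to keep; second: the mount to clean up.
    for path in find_name_duplicates(sys.argv[1], sys.argv[2]):
        print(path)
```

Reviewing the printed list before removing anything is probably wise, since two different photos can share a name such as IMG_0001.jpg.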

Thanks. Yes, it looks like this might be the way to go. dedupe requires a full path match, so my combine idea won't find any matches by name. But with your suggestion I can automate it the way I prefer.

make sure to use --vfs-refresh

It's not obvious from the docs why that is an advantage. What's the drawback if I forget it?

      --vfs-refresh                            Refreshes the directory cache recursively in the background on start

before rclone mount is live, rclone will pre-scan all the files quickly using the same technique as rclone ls.