How to sync all files from Google Drive even if duplicate paths?

What is the problem you are having with rclone?

I'm looking to sync all files from Google Drive to a local directory even if objects share a path (i.e. the files rclone reports as "Duplicate object found in source - ignoring").

Is there an option to just append the hash of the file itself to the path in the destination directory? That way future syncs are possible and changes are observable. If all 2..N files are identical, they would result in just a single output file in the target directory.

E.g.

gdrive_source
  /file1.txt
  /file1.txt
target
  /file1.txt.md5hash1
  /file1.txt.md5hash2

#### Run the command 'rclone version' and share the full output of the command.

rclone version
rclone v1.66.0

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 5.15.0-100-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.22.1
  • go/linking: static
  • go/tags: none

#### Which cloud storage system are you using? (eg Google Drive)

Google Drive is my source.

#### The command you were trying to run (eg `rclone copy /tmp remote:tmp`)  


rclone --max-size 1G --config /home/userconfig/rclone/rclone.conf sync gdrive:/ . --progress



#### Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

rclone --config /home/user/.config/rclone/rclone.conf config redacted
[gdrive]
type = drive
client_id = XXX
client_secret = XXX
scope = drive.readonly
token = XXX
team_drive =





#### A log from the command that you were trying to run with the `-vv` flag  

N/A?

You'd have to dedupe it first:

rclone dedupe

There are a few deduplication options to choose from, depending on your needs.
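For example, something like this (a sketch; rename keeps every copy and just makes the names unique, and --dry-run lets you preview it first):

# preview: rename duplicates so every path becomes unique (keeps all copies)
rclone dedupe rename gdrive: --dry-run

# then run it for real
# note: the config above uses scope = drive.readonly, so dedupe needs write access to the remote
rclone dedupe rename gdrive: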


Welcome to the forum.

I think that would need scripting (rough sketch at the end of this post). You could use --suffix and --suffix-keep-extension, and instead of a hash you might use a datetime, something like:

rclone copy source: dest: --suffix=`date +%Y%m%d.%I%M%S` --suffix-keep-extension
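If you really want the hash-suffix layout from your first post, here is a rough, untested sketch of that script. It assumes jq is installed, that lsjson reports the drive file IDs and MD5 hashes, and that the drive backend copyid command accepts a local destination path (I believe it hands the path to copyto internally), so test it on a small folder first:

#!/usr/bin/env bash
# untested sketch: copy every file (including duplicate names) from gdrive:
# into ./target, appending its md5 to the filename so the paths stay unique
rclone lsjson -R --files-only --hash gdrive: |
  jq -r '.[] | [.ID, .Path, (.Hashes.md5 // .Hashes.MD5 // "nohash")] | @tsv' |
  while IFS=$'\t' read -r id path md5; do
    rclone backend copyid gdrive: "$id" "./target/$path.$md5"
  done

Identical duplicates end up with the same name and hash, so they collapse into a single file in target, which matches what you described.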

You are a hero for continuing to reply to dense folks like myself with "just use dedupe like the Lord intended" - I've seen your replies in other threads.

This modifies the source, though. Is there no way to modify only the output at the target?

Isn't dedupe the most obvious solution? This is more or less exactly the reason it was implemented.

There might be myriad other ways to achieve it, like your idea of attaching hashes, but IMO dedupe fulfils the requirements of most users in this situation. Sure, there might be some special, niche cases where another approach would be more appropriate, but then we are talking about a very custom implementation benefiting very few users, maybe only you - for sure doable. But most likely it would have to come from you in the form of a PR or sponsorship.

It is the right tool for the job.

Not that I'm aware of.

You get a random file though if you have duplicates in the source. I would avoid that.
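If it helps, one way to see which paths are duplicated in the source, and whether the copies actually differ, is something like this (sketch only; assumes jq is installed and that lsjson reports MD5 hashes for drive):

# list every path that appears more than once, with size and md5,
# so you can tell whether the duplicates are actually identical
rclone lsjson -R --files-only --hash gdrive: |
  jq -r 'group_by(.Path)[] | select(length > 1) | .[] | [.Path, .Size, (.Hashes.md5 // .Hashes.MD5 // "")] | @tsv'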

Another flag that might be helpful:

rclone sync /home/user01/source target:full --backup-dir=target:incrementals/`date +%Y%m%d.%I%M%S` -vv --dry-run

If you have dupes, you will get randomish results as it depends on how the API gives you the 'first' file back.

Running any command with dupes will have some chaos in there so you really, really should dedupe your data.

texter@Earls-Mac-mini test % rm hosts.txt
texter@Earls-Mac-mini test % rclone copy GD:dupes /Users/texter/Downloads/test
2024/03/26 11:49:11 NOTICE: hosts.txt: Duplicate object found in source - ignoring
texter@Earls-Mac-mini test % cat hosts.txt
second file%
texter@Earls-Mac-mini test % rclone ls GD:dupes
       11 hosts.txt
       10 hosts.txt
texter@Earls-Mac-mini test % rclone lsl GD:dupes
       11 2024-03-26 11:47:36.170000000 hosts.txt
       10 2024-03-26 11:47:20.751000000 hosts.txt

I seem to always get the 'newest' file. Is that the right one though? I ran that command 50 times and got the 2nd file every time; the first file would be 'gone/lost'.
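If the newest copy is the one you actually want, dedupe can make that choice explicit instead of relying on whatever order the API returns them in, e.g. (sketch; preview with --dry-run first, since the older copies get deleted):

# keep only the newest copy of each duplicated name
rclone dedupe newest GD:dupes --dry-run
rclone dedupe newest GD:dupes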

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.