How to sync all files from Google Drive even if duplicate paths?

What is the problem you are having with rclone?

I'm looking to sync all files from Google Drive to a local directory even if objects share a path (i.e. the files rclone reports as "Duplicate object found in source - ignoring").

Is there an option to just append the hash of the file itself to the path in the destination directory? That way future syncs are possible and changes are observable. If all 2..N files are identical, they would result in just a single output file in the target directory.

E.g.

gdrive_source
  /file1.txt
  /file1.txt
target
  /file1.txt.md5hash1
  /file1.txt.md5hash2

#### Run the command 'rclone version' and share the full output of the command.

rclone version
rclone v1.66.0

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 5.15.0-100-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.22.1
  • go/linking: static
  • go/tags: none

#### Which cloud storage system are you using? (eg Google Drive)

Google Drive is my source.

#### The command you were trying to run (eg `rclone copy /tmp remote:tmp`)  


rclone --max-size 1G --config /home/userconfig/rclone/rclone.conf sync gdrive:/ . --progress



#### Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

rclone --config /home/user/.config/rclone/rclone.conf config redacted
[gdrive]
type = drive
client_id = XXX
client_secret = XXX
scope = drive.readonly
token = XXX
team_drive =





#### A log from the command that you were trying to run with the `-vv` flag  

N/A?

You'd have to dedupe it first:

rclone dedupe

There are a few deduplication options to choose from, depending on your needs.
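For example, something like this (a sketch; rename keeps every copy and just makes the names unique, and --dry-run lets you preview it first):

# preview: rename duplicates so every path becomes unique (keeps all copies)
rclone dedupe rename gdrive: --dry-run

# then run it for real
# note: the config above uses scope = drive.readonly, so dedupe needs write access to the remote
rclone dedupe rename gdrive: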


Welcome to the forum.

I think that would need scripting (rough sketch at the end of this post). You could use --suffix and --suffix-keep-extension, and instead of a hash you might use a datetime, something like:

rclone copy source: dest: --suffix=`date +%Y%m%d.%I%M%S` --suffix-keep-extension
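If you really want the hash-suffix layout from your first post, here is a rough, untested sketch of that script. It assumes jq is installed, that lsjson reports the drive file IDs and MD5 hashes, and that the drive backend copyid command accepts a local destination path (I believe it hands the path to copyto internally), so test it on a small folder first:

#!/usr/bin/env bash
# untested sketch: copy every file (including duplicate names) from gdrive:
# into ./target, appending its md5 to the filename so the paths stay unique
rclone lsjson -R --files-only --hash gdrive: |
  jq -r '.[] | [.ID, .Path, (.Hashes.md5 // .Hashes.MD5 // "nohash")] | @tsv' |
  while IFS=$'\t' read -r id path md5; do
    rclone backend copyid gdrive: "$id" "./target/$path.$md5"
  done

Identical duplicates end up with the same name and hash, so they collapse into a single file in target, which matches what you described.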

You are a hero for continuing to reply to dense folks like myself with "just use dedupe like the Lord intended" - I've seen your replies in other threads.

This modifies the source, though. Is there no way to modify only the output at the target?

Isn't dedupe the most obvious solution? This is more or less exactly the reason it was implemented.

There might be myriad other ways to achieve it, like your idea of attaching hashes, but IMO dedupe fulfils the requirements of most users in this situation. Sure, there might be some special, niche cases where another approach would be more appropriate, but then we are talking about a very custom implementation benefiting very few users, maybe only you - for sure doable. But most likely it would have to come from you in the form of a PR or sponsorship.

It is the right tool for the job.

Not that I'm aware of.

You get a random file though if you have duplicates in the source. I would avoid that.
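If it helps, one way to see which paths are duplicated in the source, and whether the copies actually differ, is something like this (sketch only; assumes jq is installed and that lsjson reports MD5 hashes for drive):

# list every path that appears more than once, with size and md5,
# so you can tell whether the duplicates are actually identical
rclone lsjson -R --files-only --hash gdrive: |
  jq -r 'group_by(.Path)[] | select(length > 1) | .[] | [.Path, .Size, (.Hashes.md5 // .Hashes.MD5 // "")] | @tsv'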

Another flag that might be helpful:

rclone sync /home/user01/source target:full --backup-dir=target:incrementals/`date +%Y%m%d.%I%M%S` -vv --dry-run

If you have dupes, you will get randomish results as it depends on how the API gives you the 'first' file back.

Running any command with dupes will have some chaos in there so you really, really should dedupe your data.

texter@Earls-Mac-mini test % rm hosts.txt
texter@Earls-Mac-mini test % rclone copy GD:dupes /Users/texter/Downloads/test
2024/03/26 11:49:11 NOTICE: hosts.txt: Duplicate object found in source - ignoring
texter@Earls-Mac-mini test % cat hosts.txt
second file%
texter@Earls-Mac-mini test % rclone ls GD:dupes
       11 hosts.txt
       10 hosts.txt
texter@Earls-Mac-mini test % rclone lsl GD:dupes
       11 2024-03-26 11:47:36.170000000 hosts.txt
       10 2024-03-26 11:47:20.751000000 hosts.txt

I seem to always get the 'newest' file. Is that the right one though? I ran that command 50 times and got the 2nd file every time; the first file would be 'gone/lost'.
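If the newest copy is the one you actually want, dedupe can make that choice explicit instead of relying on whatever order the API returns them in, e.g. (sketch; preview with --dry-run first, since the older copies get deleted):

# keep only the newest copy of each duplicated name
rclone dedupe newest GD:dupes --dry-run
rclone dedupe newest GD:dupes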

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.