How does copy handle the same file in different folders in a Google Drive?

Someone asked a related question ("How does rclone determine duplicate files") a while back and one of the responses they got was:

A duplicate file would be the same size/modification time.

To me this suggests that if I had a 10kB file called, say, "A1 Response" in subfolder "A1" and exactly the same file in subfolder "Submitted Assignments", rclone would refuse to download both of them??

I am desperately short of time to back up a 200GB Google Drive that my university will delete due to a change in policy (from "unlimited in perpetuity for free" to, well, not), and I have an extremely nested Google Drive where I often have the same file in multiple different subfolders. This was a very deliberate, and time consuming, archival strategy, so if rclone does download (via copy) only one of the "A1 Response" files, it's going to be a problem for me.

I did a little test where I created two new subfolders and put the same image in both of them, uploaded one at a time, and then told rclone to copy the subfolder. The result was that I got both (identical) files, which is what I want. However, I go back to that earlier question and I think "did I get both files because I couldn't upload the file to each subfolder simultaneously (bad) or because the files were in different folders (good)?"

I guess a part of my question is about how precise the judgement of time is. When I uploaded everything to the Google Drive in the first place I sort of did it en masse... arranging everything in File Explorer and then dragging and dropping the folders. In principle, then, the same file in different subfolders could've been uploaded within the same minute or second as each other. I also tested this by putting my arbitrary image in "Submain 1" and "Submain 2" and then putting those folders in "Main" and uploading it to a Test folder in the Google Drive. I then used rclone to download Test, where I again successfully had both/all four (identical) image files.

rclone will download both of them
after running rclone sync, the dest will be a mirror of the source, every file, folder by folder

for example
rclone sync gdrive: /path/to/local/dir -v --dry-run

and that link, was about uploading, and you are downloading.
really, not comparable to your situation.

There is a bit of context needed for that answer to make sense: Google Drive allows duplicate files (same name) in the same folder, which a normal OS / mount generally does not.

and something like:

etexter@seraphim ~ % rclone ls GD:test
       -1 testsheet.xlsx
       -1 testsheet.xlsx
etexter@seraphim ~ % rclone copy GD:test t -vv
2022/10/03 15:20:13 DEBUG : rclone: Version "v1.59.1" starting with parameters ["rclone" "copy" "GD:test" "t" "-vv"]
2022/10/03 15:20:13 DEBUG : Creating backend with remote "GD:test"
2022/10/03 15:20:13 DEBUG : Using config file from "/Users/etexter/.config/rclone/rclone.conf"
2022/10/03 15:20:14 DEBUG : Google drive root 'test': 'root_folder_id = 0AGoj85v3xeadUk9PVA' - save this in the config to speed up startup
2022/10/03 15:20:14 DEBUG : Creating backend with remote "t"
2022/10/03 15:20:14 DEBUG : fs cache: renaming cache item "t" to be canonical "/Users/etexter/t"
2022/10/03 15:20:14 NOTICE: testsheet.xlsx: Duplicate object found in source - ignoring
2022/10/03 15:20:14 DEBUG : Local file system at /Users/etexter/t: Waiting for checks to finish
2022/10/03 15:20:14 DEBUG : Local file system at /Users/etexter/t: Waiting for transfers to finish
2022/10/03 15:20:15 DEBUG : Local file system at /Users/etexter/t: File to upload is small (4691 bytes), uploading instead of streaming
2022/10/03 15:20:15 DEBUG : testsheet.xlsx: md5 = 620c55f93bc6c17de6849c7d64385d07 OK
2022/10/03 15:20:15 INFO  : testsheet.xlsx: Copied (new)
2022/10/03 15:20:15 DEBUG : testsheet.xlsx: Updating size of doc after download to 4691
2022/10/03 15:20:15 DEBUG : testsheet.xlsx: Src hash empty - aborting Dst hash check
2022/10/03 15:20:15 INFO  : testsheet.xlsx: Copied (Rcat, new)
2022/10/03 15:20:15 INFO  :
Transferred:   	    9.162 KiB / 9.162 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         1.3s

2022/10/03 15:20:15 DEBUG : 11 go routines active

That's a duplicate in the context I was speaking about. The same file in two different folders might be a 'duplicate', but not the kind of duplicate I was referring to.
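To check ahead of time whether a remote has this kind of same-folder duplicate, one option is to list every file path recursively and look for paths that occur more than once. This is only a rough sketch: the helper name find_dup_paths is made up for illustration, and the remote name GD:test is taken from the log above.

```shell
# Hypothetical helper: reads one file path per line on stdin and prints
# each path that occurs more than once (same name in the same folder).
find_dup_paths() {
    sort | uniq -d
}

# Assumed usage against a real remote (not run here):
#   rclone lsf -R --files-only GD:test | find_dup_paths
```

The same file name in two different folders yields two different paths, so it would not be reported; only entries like the two testsheet.xlsx files above would be.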

if that is correct, then my advice is to run something like this sooner rather than later
rclone sync gdrive: /path/to/dest --retries=1 --log-level=DEBUG --log-file=/path/to/log.txt

if there are no duplicate files in the same dir, then the rclone sync should work the first time.

if there are some duplicates to deal with, they will show up in the log file.
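As a sketch of how to pull those entries out of the log afterwards, the following searches for the NOTICE text that appears in the debug output earlier in this thread. The helper name list_ignored_duplicates is hypothetical, and the message wording is assumed to match your rclone version.

```shell
# Hypothetical helper: print the log lines where rclone reported a
# same-folder duplicate. The message text is copied from the log above.
list_ignored_duplicates() {
    grep -F 'Duplicate object found in source' "$1"
}

# Assumed usage, with the log path from the sync command above:
#   list_ignored_duplicates /path/to/log.txt
```

If this prints nothing, no same-folder duplicates were skipped during that run.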

Why sync instead of copy?

Okay so just to make sure I understand you and the other things I've read, you're saying that you're only referring to duplicated files within the same folder, e.g. your two testsheet files?

And, per other places, if I have this problem then this is why and when I would want to run dedupe, right?

after running rclone sync, the dest will be a mirror of the source, every file, folder by folder

Won't that mean if I put everything on an external hard drive and I delete everything in the Google Drive, that I'll lose everything on the drive too?

Sorry, as you've probably guessed from the below, I'm a bit in over my head here...

and that link, was about uploading, and you are downloading.

Wait, really? How did I miss that? Derp.

no, as rclone sync does not modify the source.
Make source and dest identical, modifying destination only.

If you

  1. sync or copy from Google Drive to an empty external drive
  2. delete everything in Google Drive
  3. do nothing else

then your external drive will be an exact copy of your initial Google Drive!

If you

  1. sync or copy from Google Drive to an empty external drive
  2. delete everything in Google Drive
  3. sync from Google Drive to external drive

then your external drive will be an exact copy of your (emptied) Google Drive, which is empty!

If you

  1. sync or copy from Google Drive to an empty external drive
  2. delete everything in Google Drive
  3. copy from Google Drive to external drive

then your external drive will be an exact copy of your initial Google Drive!

Sounds like you want to use rclone copy going forward; copy is the safe choice if in doubt.
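If you want to see the difference for yourself before touching the real drive, the second and third scenarios can be tried on throwaway local folders, since rclone treats plain paths as the local filesystem. This is a minimal sketch, not a recommended backup procedure; the function name and folder names are made up for illustration.

```shell
# Hypothetical demo: compare rclone copy and rclone sync after the source
# has been emptied. Everything happens inside a fresh temporary directory.
copy_vs_sync_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/A1" "$work/by_copy" "$work/by_sync"
    echo "essay" > "$work/src/A1/response.txt"

    # Step 1: initial backup, once with copy and once with sync.
    rclone copy "$work/src" "$work/by_copy"
    rclone sync "$work/src" "$work/by_sync"

    # Step 2: empty the source, as if the Drive had been wiped.
    rm -r "$work/src/A1"

    # Step 3: run each command again against the now-empty source.
    rclone copy "$work/src" "$work/by_copy"   # nothing to copy, dest kept
    rclone sync "$work/src" "$work/by_sync"   # dest files deleted to match

    # Print the working directory so the results can be inspected.
    echo "$work"
}
```

Calling copy_vs_sync_demo and then listing the printed directory (for example with find) should show response.txt still present under by_copy and gone from by_sync.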

you stated that you are "desperately short of time to back up"
only rclone sync can be trusted for backups.

you should not try to use rclone copy for backups.
i could offer some long-winded examples and edge cases to prove that,
but given the small amount of data, your hard limit on time, and that this is your first post in the forum:

be safe and use rclone sync for backups, and
as mentioned above, test first using --dry-run

Are you saying that copy might leave some files behind?

tl;dr, use rclone sync

no, i am not saying that.
rclone copy will not leave files behind.

if the dest is empty and you run rclone copy, the dest will be an exact mirror/backup of the source, the same as with sync

if the dest is NOT empty and you run rclone copy, then that is no longer guaranteed
for example, if you delete/move a source file, then rclone copy will not delete the corresponding dest file.