Moving files from one Google Workspace Shared Drive to another. Possible bug?

What is the problem you are having with rclone?

Hi everyone! I am trying to copy a huge number of files from one Google Workspace Shared Drive to another.

I have done this in the past and it normally works. However, I am seeing strange behavior with Google Sheets, Docs, Slides, etc. files.

Sorry for redacting some info; I had to, as it is confidential.

This is the view from the web browser of a given folder:

Notice the two "Stargardt_Summary" files at the very bottom: one with the Excel icon, one with the Google Sheets icon. Notice that one has "-" as its file size.

This is what I see using rclone lsl:

Notice both are named identically, and one of them has a size of "-1".

I don't understand what this is. Does anybody know?

When I try to copy the whole folder with rclone copy, this is what I get:

Notice the "NOTICE" rclone message. It says it is a duplicate, but I am not so sure.

The problem I am having is that I have to move a huge amount of data (maybe 100k files+folders) from one Shared Drive to another, and I cannot micromanage this. I am afraid there might be some inconsistencies.

Can anyone please help?
Thanks in advance!!

Run the command 'rclone version' and share the full output of the command.

rclone v1.61.1

  • os/version: darwin 11.6.7 (64 bit)
  • os/kernel: 20.6.0 (x86_64)
  • os/type: darwin
  • os/arch: amd64
  • go/version: go1.19.4
  • go/linking: dynamic
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy ClinicalTSD:"1_Projects/REDACTED NAME" . --verbose

The rclone config contents with secrets removed.

[ClinicalTSD]
type = drive
client_id = {REDACTED}
client_secret = {REDACTED}
scope = drive
token = {"access_token":"{REDACTED}}
team_drive = {REDACTED}
root_folder_id =

A log from the command with the -vv flag

Sorry, I need to keep this redacted, hence an image.

I tried also this:

laptop:/Users/user/Downloads/test $ rclone md5sum ClinicalTSD:"1_Projects/{REDACTED}/Stargardt_Summary .xlsx"
                                  Stargardt_Summary .xlsx
cd541abce34fdd84eb66464d6f070f17  Stargardt_Summary .xlsx

Notice how, even though I am providing only one file name on the command line, rclone lists two, and it can generate an md5sum for only one of them.

More information, as I keep coming up with new debugging ideas: when I open the two files in the web browser, they are different. They have different content, different URLs, and different edit dates.
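
Something like this should confirm it from rclone too; lsf's format string "ip" prints each object's Drive file ID and its path (the path here is illustrative):

laptop:/Users/user $ rclone lsf --format "ip" ClinicalTSD:"1_Projects/{REDACTED}"

Two entries with the same name but different IDs would mean two distinct objects.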

--

The content of the files is also different, so they are not duplicates. I think the NOTICE message is not right.

Might this be a bug in rclone?

Hi Marcus,

You are right, Google Drive does indeed allow two files with the same name but different content. Also, the file listed with size "-1" is the native Google Sheets document; rclone doesn't know the size (or md5) of a native Google document until it is exported, which is why rclone md5sum only printed a hash for the plain .xlsx file.

You can find more info here:
https://rclone.org/drive/#duplicated-files
https://rclone.org/drive/#rclone-appears-to-be-re-copying-files-it-shouldn-t

and a command to fix it here:
https://rclone.org/commands/rclone_dedupe/
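
For example, something like this should preview the fixes without changing anything (path illustrative, adjust to your folder):

rclone dedupe --dedupe-mode rename --dry-run ClinicalTSD:"1_Projects"

Drop --dry-run once you are happy with the proposed renames.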


Hello Ole.

Thank you so much for your answer. I am checking the links. I don't think it is 100% accurate, though. Let me explain; I tried to make this reproducible and as simple as possible:

  • Upload a .xlsx file into a Shared Drive through the drive.google.com web interface
  • Double-click on it; it opens in Google Sheets. Then use the menu File -> Save as Google Sheets
  • Edit one of them to make the content different
  • Now the Shared Drive has two files named identically but with different content. This is the rclone output:
laptop:/Users/user $ rclone lsl testSD:/test2
       -1 2023-01-20 16:14:15.447000000 printers.xlsx
     9606 2020-07-02 09:46:02.000000000 printers.xlsx
laptop:/Users/user $ rclone md5sum testSD:/test2
                                  printers.xlsx
9b8924563cf1272cee36dabc388eb84b  printers.xlsx

If I then try to copy the whole test2 folder to my local drive, it only copies one of the two.

Dedupe seems partially useful, as it allows renaming rather than deletion. However, all my users will go crazy if I rename hundreds of their work files. This is a dangerous approach; the file names may even be listed or hardcoded in others of their work files. It seems very inelegant to me.

Edit: I am using --dedupe-mode list to find out how many cases of this there are. Maybe it is not hundreds but just a few. Sorry for the alarm.
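
For reference, the list mode only reports duplicates and changes nothing, e.g. against the test folder above:

laptop:/Users/user $ rclone dedupe --dedupe-mode list testSD:/test2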

I think this is a mistake/bug in rclone's output messages for "copy". The files are not duplicated; they just have duplicate names. This can cause confusion and potential data loss.

2023/01/20 12:01:39 NOTICE: Stargardt_Summary .xlsx: Duplicate object found in source - ignoring

It is not a duplicate object; it is a duplicate name for two different files.

rclone's output messages for dedupe seem to get it right:

2023/01/20 16:42:47 NOTICE: {REDACTED}/Stargardt_Summary .xlsx: Found 2 files with duplicate names

Most modern OSes don't allow duplicate file names.

Some providers (Google) do allow duplicate file names.

If you want to use rclone, you can't have duplicate file names.

You'd have to decide whether to dedupe them and use rclone, or not dedupe them, in which case you can't really use rclone, as you'll get a mixed bag of results with the duplicates.

Hello Animosity

rclone should be able to copy from one Google Shared Drive to another. Why does it matter what OS I am using? I don't want to copy to my local disk.

I think you are missing the point.

rclone doesn't allow duplicates.

Google Drive does.

If you want to use rclone, you have to dedupe. That's it.

I don't get you, sorry.

I thought rclone usually adapts to the particularities of each provider and finds a workaround for them. It is a Swiss Army knife.

And besides that, there is another point I find problematic: the messages can confuse users. If rclone claims something is a duplicate when it is not, that can lead to data loss.

If you have two files in the same directory with the same name, that's a duplicate.

Rclone can't handle duplicates, so you have to run rclone dedupe.

If your use case requires duplicate file names, you can't use rclone.

What about a feature request to allow this between remotes that support duplicate file names with different content? That would be very useful.

And if I have two files in the same directory with different names but the same content/hash? Is that also a duplicate? These are clearly two different things.

Doubtful, but feel free to submit it as a suggestion.

A duplicate is the same name; the content is irrelevant. You can dedupe based on hashes as another form of de-duping, but in terms of the discussion we're having about Google, a duplicate is the same file name.
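
For instance, both flavors in list mode (which only reports and changes nothing):

[felix@gemini test]$ rclone dedupe --dedupe-mode list GD:Dupes
[felix@gemini test]$ rclone dedupe --by-hash --dedupe-mode list GD:Dupes

The first groups files sharing a name; --by-hash groups files sharing a hash instead.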

[felix@gemini test]$ rclone ls GD:Dupes
      601 hosts
      601 hosts

and

[felix@gemini test]$ rclone copy GD:Dupes /home/felix/test -vv
2023/01/20 11:26:25 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2023/01/20 11:26:25 DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "copy" "GD:Dupes" "/home/felix/test" "-vv"]
2023/01/20 11:26:25 DEBUG : Creating backend with remote "GD:Dupes"
2023/01/20 11:26:25 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2023/01/20 11:26:25 DEBUG : Creating backend with remote "/home/felix/test"
2023/01/20 11:26:25 NOTICE: hosts: Duplicate object found in source - ignoring
2023/01/20 11:26:25 DEBUG : Local file system at /home/felix/test: Waiting for checks to finish
2023/01/20 11:26:25 DEBUG : Local file system at /home/felix/test: Waiting for transfers to finish
2023/01/20 11:26:25 DEBUG : hosts: md5 = 8d955837212e82c38afc5b39b341d7c4 OK
2023/01/20 11:26:25 INFO  : hosts: Copied (new)
2023/01/20 11:26:25 INFO  :
Transferred:   	        601 B / 601 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.7s

2023/01/20 11:26:25 DEBUG : 5 go routines active

I haven't checked in detail, but it seems like you have a valid point with respect to the wording of the NOTICE/WARNING.

Do you know of any tools that handle this and the entire copy process more elegantly?

You can always make a feature request here in the forum, but I doubt it will get implemented unless you back it as a sponsor or developer.

How can it be irrelevant if it can lead to data loss? If you blindly trust the rclone copy messages saying a file is a duplicate and has been skipped, when it is not really duplicate content but just a duplicate name, you are going to lose data the day you delete the original files you copied from.

All I am saying is that the message can cause confusion. I believe 99% of people care more about the content of their files than about their file names.

But hey, I don't want to make a huge deal out of it. This is just how I see it.

Thanks anyway for the help, really :slight_smile: The dedupe option is going to be very helpful to me.

There's no 'data loss', as the file doesn't get copied or moved; it remains in the source.

[felix@gemini ~]$ rclone copy GD:Dupes /home/felix/test
2023/01/20 11:29:31 NOTICE: hosts: Duplicate object found in source - ignoring

Probably should be an ERROR.

Sorry, I didn't want to sound rude. And no, I don't know of any.


Agreed, without having checked the code or thought through all the details/exceptions. Something like:

ERROR: Stargardt_Summary .xlsx: Duplicate folder/file name. Use rclone dedupe to find and fix duplicate names. 

What happens if you do a move? Will it delete one or two files?

@Marcus1 Would that have been more helpful? Suggestions to improve the wording?

It skips the dupes / ignores them.

[felix@gemini ~]$ rclone move GD:Dupes /home/felix/test -vv
2023/01/20 11:49:43 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2023/01/20 11:49:43 DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "move" "GD:Dupes" "/home/felix/test" "-vv"]
2023/01/20 11:49:43 DEBUG : Creating backend with remote "GD:Dupes"
2023/01/20 11:49:43 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2023/01/20 11:49:44 DEBUG : Creating backend with remote "/home/felix/test"
2023/01/20 11:49:44 NOTICE: hosts: Duplicate object found in source - ignoring
2023/01/20 11:49:44 DEBUG : Local file system at /home/felix/test: Waiting for checks to finish
2023/01/20 11:49:44 DEBUG : hosts: Size and modification time the same (differ by 0s, within tolerance 1ms)
2023/01/20 11:49:44 DEBUG : hosts: Unchanged skipping
2023/01/20 11:49:44 INFO  : hosts: Deleted
2023/01/20 11:49:44 DEBUG : Local file system at /home/felix/test: Waiting for transfers to finish
2023/01/20 11:49:44 INFO  : There was nothing to transfer
2023/01/20 11:49:44 INFO  :
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 2 / 2, 100%
Deleted:                1 (files), 0 (dirs)
Elapsed time:         0.9s

2023/01/20 11:49:44 DEBUG : 4 go routines active
[felix@gemini ~]$ rclone ls GD:Dupes
      601 hosts

Makes good sense, thanks!