Moving files from one Google Workspace Shared Drive to another. Possible bug?

Doubtful, but feel free to submit it as a suggestion.

A duplicate is the same name; the content is irrelevant. You can dedupe based on hashes as another form of de-duping, but for the purposes of this Google Drive discussion, a duplicate is the same file name.

[felix@gemini test]$ rclone ls GD:Dupes
      601 hosts
      601 hosts

and

[felix@gemini test]$ rclone copy GD:Dupes /home/felix/test -vv
2023/01/20 11:26:25 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2023/01/20 11:26:25 DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "copy" "GD:Dupes" "/home/felix/test" "-vv"]
2023/01/20 11:26:25 DEBUG : Creating backend with remote "GD:Dupes"
2023/01/20 11:26:25 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2023/01/20 11:26:25 DEBUG : Creating backend with remote "/home/felix/test"
2023/01/20 11:26:25 NOTICE: hosts: Duplicate object found in source - ignoring
2023/01/20 11:26:25 DEBUG : Local file system at /home/felix/test: Waiting for checks to finish
2023/01/20 11:26:25 DEBUG : Local file system at /home/felix/test: Waiting for transfers to finish
2023/01/20 11:26:25 DEBUG : hosts: md5 = 8d955837212e82c38afc5b39b341d7c4 OK
2023/01/20 11:26:25 INFO  : hosts: Copied (new)
2023/01/20 11:26:25 INFO  :
Transferred:   	        601 B / 601 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.7s

2023/01/20 11:26:25 DEBUG : 5 go routines active
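
Side note, hedged since the output isn't shown here: something like rclone lsjson GD:Dupes should list each same-named entry with its own Drive ID, confirming they really are two distinct objects behind one name.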

I haven't checked in detail, but it seems you have a valid point with respect to the wording of the NOTICE/WARNING.

Do you know of any tools that handle this, and the entire copy process, more elegantly?

You can always make a feature request here in the forum, but I doubt it will get implemented unless you back it as a sponsor or developer.

How can it be irrelevant if it can lead to data loss? If you blindly trust rclone copy messages saying a file is a duplicate and has been skipped, when it is not really duplicate content but just a duplicate name, you are going to lose data the day you delete the original files you copied from.

All I am saying is that the message can cause confusion. I believe 99% of people care more about the content of their files than about their file names.

But hey, don't want to make a huge deal out of it. This is just how I see it.

Thanks anyway for the help, really :slight_smile: The dedupe option is going to be very helpful to me.
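
For anyone who finds this later: the documented rename mode of rclone dedupe keeps every copy by renaming the later ones instead of deleting anything, and the global --dry-run flag previews what it would do. Something like this (not run here, so treat it as a sketch):

rclone dedupe --dry-run rename GD:Dupes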

There's no 'data loss', as the file doesn't get copied or moved; it remains in the source.

[felix@gemini ~]$ rclone copy GD:Dupes /home/felix/test
2023/01/20 11:29:31 NOTICE: hosts: Duplicate object found in source - ignoring

Probably should be an ERROR.

Sorry, didn't want to sound rude. And no, I don't know any.


Agreed, though I haven't checked the code or thought through all the details/exceptions. Something like:

ERROR: Stargardt_Summary .xlsx: Duplicate folder/file name. Use rclone dedupe to find and fix duplicate names. 

What happens if you do a move? Will it delete 1 or 2 files?

@Marcus1 Would that have been more helpful? Suggestions to improve the wording?

It skips / ignores the dupe.

[felix@gemini ~]$ rclone move GD:Dupes /home/felix/test -vv
2023/01/20 11:49:43 DEBUG : Setting --config "/opt/rclone/rclone.conf" from environment variable RCLONE_CONFIG="/opt/rclone/rclone.conf"
2023/01/20 11:49:43 DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "move" "GD:Dupes" "/home/felix/test" "-vv"]
2023/01/20 11:49:43 DEBUG : Creating backend with remote "GD:Dupes"
2023/01/20 11:49:43 DEBUG : Using config file from "/opt/rclone/rclone.conf"
2023/01/20 11:49:44 DEBUG : Creating backend with remote "/home/felix/test"
2023/01/20 11:49:44 NOTICE: hosts: Duplicate object found in source - ignoring
2023/01/20 11:49:44 DEBUG : Local file system at /home/felix/test: Waiting for checks to finish
2023/01/20 11:49:44 DEBUG : hosts: Size and modification time the same (differ by 0s, within tolerance 1ms)
2023/01/20 11:49:44 DEBUG : hosts: Unchanged skipping
2023/01/20 11:49:44 INFO  : hosts: Deleted
2023/01/20 11:49:44 DEBUG : Local file system at /home/felix/test: Waiting for transfers to finish
2023/01/20 11:49:44 INFO  : There was nothing to transfer
2023/01/20 11:49:44 INFO  :
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 2 / 2, 100%
Deleted:                1 (files), 0 (dirs)
Elapsed time:         0.9s

2023/01/20 11:49:44 DEBUG : 4 go routines active
[felix@gemini ~]$ rclone ls GD:Dupes
      601 hosts

Makes good sense, thanks!

Although if you do a few cycles, you'd lose a file.

If I move again, the duplicate condition isn't there anymore and the remaining copy overwrites my destination, so the original file is gone, I suppose.

My use case is always via an OS, so I never hit dupes even though they are allowed. To get a dupe, you'd generally have to create one deliberately via the WebUI.

Ideally, rclone would detect if it is a duplicate name or name+content and word the error accordingly.

I don't think there is any way to do that right now. As you can see in my examples, for some files rclone cannot produce an md5sum, nor does the file have a size (it is -1). So rclone, as it stands, has no way to know.
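
To make the limitation concrete, here is a minimal sketch of the classification rclone would need, with made-up types rather than rclone's internals; the point is that a backend returning no hash and size -1 (e.g. a Google Doc) leaves the content question unanswerable:

package main

import "fmt"

// obj is a stand-in for a remote object's metadata.
type obj struct {
	md5  string // "" when the backend cannot provide a hash
	size int64  // -1 when the size is unknown
}

// classifyDupes reports what can be said about two same-named objects.
func classifyDupes(a, b obj) string {
	switch {
	case a.md5 != "" && b.md5 != "":
		if a.md5 == b.md5 {
			return "duplicate name and content"
		}
		return "duplicate name, content differs"
	case a.size >= 0 && b.size >= 0 && a.size != b.size:
		return "duplicate name, content differs (sizes differ)"
	default:
		return "duplicate name, content unknown"
	}
}

func main() {
	googleDoc := obj{md5: "", size: -1} // hash and size unavailable
	fmt.Println(classifyDupes(googleDoc, googleDoc))
	// prints: duplicate name, content unknown
}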

So something like:

ERROR: Stargardt_Summary .xlsx: Duplicate folder/file name. Content might differ. Use rclone dedupe to find and fix duplicate names. 

It might not be the perfect wording, but it certainly catches attention and forces the rclone user to dig a bit deeper and understand what is really going on.

My 2 cents.

Thanks, I like your addition; it makes it clearer. I will see if it can be easily changed in the coming days.


I can also try to suggest improved wording for:
https://rclone.org/drive/#duplicated-files

I mean, it took me a full day to figure all this out, so why not give it back to the community.

Would this be the right place?

Yeah, for any wording or things like this, we can generally make those changes. I've seen the duplicate file thing for a few years now, so it's all very familiar to me and the current wording makes sense to me.

If you have any updates or whatnot, you can share them here, or there are a number of ways to submit the change and update it directly.

That's the best part of open source, as everyone can help with it too. I'm not a developer at all, so I try to help/ask/answer since I've done IT for too many years.

That would be great!

I suggest you create a new topic of type Feature with a reference to this thread, that will give it the right attention.

One way to do this with minimal overhead is to simply append a GUID to each duplicated file. So if Rclone finds three copies of sample.txt, it would copy them like this:

  • sample.txt
  • sample_$GUID1.txt
  • sample_$GUID2.txt

That behavior is probably not for everyone, so it would be desirable to put it behind a flag, but it would eliminate duplicates as a problem in scenarios like mine, where running dedupe is not feasible.
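
A minimal sketch of that renaming scheme, assuming a hypothetical opt-in flag; this is stdlib-only Go for illustration, not rclone code:

package main

import (
	"crypto/rand"
	"fmt"
	"path/filepath"
	"strings"
)

// newGUID returns a random 128-bit identifier in UUID-style hex.
func newGUID() string {
	b := make([]byte, 16)
	rand.Read(b) // crypto/rand; error ignored for brevity in this sketch
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// uniqueName returns name unchanged the first time it is seen, and
// base_<GUID>.ext for every later occurrence of the same name.
func uniqueName(seen map[string]bool, name string) string {
	if !seen[name] {
		seen[name] = true
		return name
	}
	ext := filepath.Ext(name)
	return strings.TrimSuffix(name, ext) + "_" + newGUID() + ext
}

func main() {
	seen := map[string]bool{}
	for i := 0; i < 3; i++ {
		fmt.Println(uniqueName(seen, "sample.txt"))
	}
	// prints sample.txt, then sample_<GUID1>.txt and sample_<GUID2>.txt
}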


I like that idea.

I don't think the naming is the primary challenge in handling duplicates.

It is because rclone, like 99.99% of all programs, is built on the core assumption that path plus filename uniquely identifies a file. Think about a command like this:

fileHandle = openFile("hello.txt")

Most programmers would expect it to open a single file and return the corresponding fileHandle. It would require quite some reengineering in the core of rclone to make it treat fileHandle as an array of fileHandles. The next questions after doing that are how to compare (when some backends don't support hashes, etc.), how to list, and so on.
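
To illustrate (the names here are invented for the example, not rclone's actual API): on a backend that allows duplicate names, "open by name" is really "list by name" and can match several independent objects, which breaks the single-handle assumption:

package main

import "fmt"

// fileID stands in for a backend object ID (e.g. a Drive file ID).
type fileID string

// openByName must return a slice: zero, one, or many objects can share
// the same path and name on a backend that allows duplicates.
func openByName(index map[string][]fileID, name string) []fileID {
	return index[name]
}

func main() {
	index := map[string][]fileID{
		"hello.txt": {"drive-id-aaa", "drive-id-bbb"}, // two objects, one name
	}
	handles := openByName(index, "hello.txt")
	fmt.Printf("openFile(%q) would need %d handles: %v\n", "hello.txt", len(handles), handles)
}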

This GitHub issue illustrates how apparently simple and quick fixes, like renaming in the backend, suddenly open up a lot of other questions and difficulties:

I am not saying it is impossible, just that it would require substantial sponsorship or development contribution.

It may be less demanding to build a dedicated Google-to-Google copy/migration tool from the ground up.


I have made a feature proposal here:
https://forum.rclone.org/t/improved-messages-on-duplicate-files-during-copy-sync-move/35696

Saw it. That is great! Let's see what the others think about it. It is going to be interesting :slight_smile:
