Duplicate folders on Google Drive root

What is the problem you are having with rclone?

Rclone produces a lot of duplicate folders at the Google Drive root.

I'm trying to sync folders from Box to Google Drive (a Google Workspace shared drive).
The sync works fine, but at some point it starts creating a lot of duplicate folders at the shared drive root.

I know this is a known issue, as the Google Drive API allows duplicate file names.
The documentation recommends using rclone dedupe to deal with it.
Unfortunately it does not suit my needs, as I have not been able to find a dedupe strategy that keeps the original folder and, in particular, the original folder ID. My folder IDs are tracked in remote systems and used by APIs.
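For reference, this is the kind of invocation I tried; a sketch only, using --dry-run to preview what each strategy would do:

rclone dedupe --dedupe-mode newest 'MySharedDrive:' --dry-run --config=rclone.conf
rclone dedupe --dedupe-mode rename 'MySharedDrive:' --dry-run --config=rclone.conf

As far as I can tell, duplicate directories get merged regardless of the mode, with no guarantee about which folder ID survives the merge.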

I've tried nesting the folders deeper, as some comments on the rclone forum suggested.
The folder at the root of the shared drive still gets duplicated.

Example (the flat layout, then the deeper-nested attempt):

MySharedDrive:
    /folder1
        fileA
        fileB
    /folder1
        fileA
        fileB
MySharedDrive:
    /folder1
        /folder2
            fileA
            fileB
    /folder1
        /folder2
            fileA
            fileB

I have also tried to:

  • use copy instead of sync
  • play with different options: --check-first, --checksum, creation-date options, etc.

None of these seems to fix it.

Last hope: I will try to use the root_folder_id option with different nesting levels to see if it makes any difference.
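The idea would be something like this in rclone.conf, where FOLDER_ID is a placeholder for the Drive ID of the folder to treat as the remote's root (I'm not sure yet how it interacts with team_drive):

[SHARED_DRIVE_NAME]
type = drive
service_account_file = .secrets/gcp_secret_key.json
team_drive = TEAM_DRIVE_ID
root_folder_id = FOLDER_ID
impersonate = some.google.workplace.domaine.user@google.fr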

I'm willing to invest some time to provide a fix, if possible.
As I'm new to the rclone code base, could someone give me hints on where to start investigating?

The Google Drive v3 API states that the ID of a file is writable.
Could we use that to generate a consistent ID and avoid duplicate folders?
Could this be a listing issue at the drive's root?
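For context, the flow I have in mind is roughly this against the REST API (an untested sketch; $TOKEN and GENERATED_ID are placeholders):

# pre-generate an ID; the response contains {"ids": ["..."]}
curl -H "Authorization: Bearer $TOKEN" \
  "https://www.googleapis.com/drive/v3/files/generateIds?count=1"
# supply that ID when creating the folder; a second create with the
# same ID should fail instead of silently adding a duplicate
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"id": "GENERATED_ID", "name": "folder1", "mimeType": "application/vnd.google-apps.folder"}' \
  "https://www.googleapis.com/drive/v3/files"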

Any help is greatly appreciated.
Best regards

Run the command 'rclone version' and share the full output of the command

In production:

rclone v1.61.1
- os/version: debian 11.6 (64 bit)
- os/kernel: 6.1.13 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

  • Google Drive with Google Workspace enterprise and some Shared drives
  • Box

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync 'box:/FOLDER_A' 'SHARED_DRIVE_NAME:/FOLDER_A' --config=rclone.conf --use-json-log --log-format="date,time,longfile"  --tpslimit 10 --tpslimit-burst 15 --retries 5 --retries-sleep 1s --create-empty-src-dirs --fast-list --header="as-user: 123456789"

The as-user header is specific to Box authentication.

The rclone config contents with secrets removed.

[SHARED_DRIVE_NAME]
type = drive
service_account_file = .secrets/gcp_secret_key.json
team_drive = TEAM_DRIVE_ID
impersonate = some.google.workplace.domaine.user@google.fr
[box]
type = box
box_config_file = .secrets/box_secret_key.json
token = {"access_token":"ACCESS_TOKEN","token_type":"bearer","expiry":"2023-03-08T14:58:44.980Z"}
box_sub_type = enterprise

A log from the command with the -vv flag

I have not been able to identify the part of the log where it starts creating duplicate folders.
If you have any insight into what I should be looking for, please let me know.

I had to redact all the remote, drive, and file names as they are personal information.

rclone sync 'box:/FOLDER_A' 'SHARED_DRIVE:/' --config=rclone.conf --tpslimit 10 --tpslimit-burst 15 --retries 5 --retries-sleep 1s --create-empty-src-dirs --fast-list --header="as-user: 14492321437" -vv
2023/03/07 19:02:13 INFO  : Starting transaction limiter: max 10 transactions/s with burst 15
2023/03/07 19:02:13 DEBUG : rclone: Version "1.60.1" starting with parameters ["/nix/store/sy0dc99a1gshkpp96sqcabcy74gbgz4r-rclone-1.61.1/bin/rclone" "sync" "box:/FOLDER_A" "SHARED_DRIVE:/" "--config=rclone.conf" "--tpslimit" "10" "--tpslimit-burst" "15" "--retries" "5" "--retries-sleep" "1s" "--create-empty-src-dirs" "--fast-list" "--header=as-user: 123456789" "-vv"]
2023/03/07 19:02:13 DEBUG : Creating backend with remote "box:/FOLDER_A"
2023/03/07 19:02:13 DEBUG : Using config file from "/home/USER/.../rclone.conf"
2023/03/07 19:02:14 DEBUG : fs cache: renaming cache item "box:/FOLDER_A" to be canonical "box:FOLDER_A"
2023/03/07 19:02:14 DEBUG : Creating backend with remote "SHARED_DRIVE:/"
2023/03/07 19:02:15 DEBUG : fs cache: renaming cache item "SHARED_DRIVE:/" to be canonical "SHARED_DRIVE:"
2023/03/07 19:02:26 DEBUG : 05_FILE_NAME.pdf: Size and modification time the same (differ by 0s, within tolerance 1s)
2023/03/07 19:02:26 DEBUG : 05_FILE_NAME.pdf: Unchanged skipping
2023/03/07 19:02:32 INFO  : FOLDER_B/04_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:32 INFO  : FOLDER_B/0_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:33 INFO  : FOLDER_B/01_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:33 INFO  : FOLDER_B/03_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:36 INFO  : FOLDER_B/05_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:37 INFO  : FOLDER_B/05_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:38 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:38 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:42 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:42 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:43 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:43 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:46 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:48 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:48 INFO  : FOLDER_B/06_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:49 INFO  : FOLDER_B/07_FILE_NAME.pdf: Copied (new)
2023/03/07 19:02:52 INFO  : FOLDER_B/08_FILE_NAME.pdf: Copied (new)

[...]

I'm not seeing any duplicates there.

In Google terms, a duplicate is the same file/folder name in the same directory.

So folder1 and folder2's fileAs aren't duplicates in Google's mind.

If you have two fileAs in folder1, that's a duplicate.

etexter@Earls-Mac-mini ~ % rclone ls GD:Dupes
      601 hosts
      601 hosts
etexter@Earls-Mac-mini ~ %
etexter@Earls-Mac-mini ~ %
etexter@Earls-Mac-mini ~ % rclone copy GD:Dupes GD:TestCopy -vv
2023/03/07 17:15:48 DEBUG : rclone: Version "v1.61.1" starting with parameters ["rclone" "copy" "GD:Dupes" "GD:TestCopy" "-vv"]
2023/03/07 17:15:48 DEBUG : Creating backend with remote "GD:Dupes"
2023/03/07 17:15:48 DEBUG : Using config file from "/Users/etexter/.config/rclone/rclone.conf"
2023/03/07 17:15:48 DEBUG : Creating backend with remote "GD:TestCopy"
2023/03/07 17:15:49 NOTICE: hosts: Duplicate object found in source - ignoring
2023/03/07 17:15:49 DEBUG : Google drive root 'TestCopy': Waiting for checks to finish
2023/03/07 17:15:49 DEBUG : Google drive root 'TestCopy': Waiting for transfers to finish
2023/03/07 17:15:51 DEBUG : hosts: md5 = 8d955837212e82c38afc5b39b341d7c4 OK
2023/03/07 17:15:51 INFO  : hosts: Copied (server-side copy)
2023/03/07 17:15:51 INFO  :
Transferred:   	        601 B / 601 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         2.9s

2023/03/07 17:15:51 DEBUG : 9 go routines active

So that would be the error.

I don't know what you mean by a duplicate folder.

Hello @Animosity022,
Thank you for your quick answer.

I do have the same folder created multiple times with the same name in the same directory, but only at the root level, it seems.

On Box I have:

BOX:
    - Folder_A
        - Folder_AA
            - File AAA
            - File AAB

It is synchronized as below in my Google Drive (shared drive):

GDRIVE:
    - Folder_A
        - Folder_AA
            - File AAA
            - File AAB
    - Folder_A
        - Folder_AA
            - File AAA
            - File AAB

Sometimes there are differences in the files, like so:

GDRIVE:
    - Folder_A
        - Folder_AA
            - File AAA
            - File AAB
    - Folder_A
        - Folder_AA
            - File AAB
            - File AAC

I hope this makes it clearer.

Do you consider these duplicates?
If not, what would you call them?

Any hints on what might be causing this, or leads on where I should investigate?

Thank you for your time!

This can be caused by using --drive-shared-with-me. This flag applies to both the source and the destination, so if you do something like

rclone copy drive:stuff drive: --drive-shared-with-me

then because --drive-shared-with-me applies to the destination as well, rclone doesn't see any of the existing objects in the destination and creates a new folder every time.

The way to avoid this is to apply the shared_with_me flag to the source drive only:

rclone copy drive,shared_with_me:stuff drive:

Now I don't think this applies exactly to what you are doing, but maybe the principle does: the destination needs to be able to see the existing directories, otherwise it will create new ones each time.

The other thing that can cause this is running multiple concurrent rclones.
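If the syncs are scheduled (e.g. from cron), one way to guard against overlapping runs is a lock around the command, e.g. with util-linux flock; a sketch, with the lock file path chosen arbitrarily:

# -n makes flock exit immediately if a previous sync still holds the lock
flock -n /var/lock/rclone-box-sync.lock rclone sync 'box:/FOLDER_A' 'SHARED_DRIVE_NAME:/FOLDER_A' --config=rclone.conf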

Where do you see the OP is using shared-with-me?

I don't - I just wanted to give an illustration of how this sort of thing happens. Hopefully this will give the OP an idea!


Hello!

I think I've found the root cause of my duplicated folders: Google Drive lags a lot.

If I stop a sync before it has completed, I can see folders still "arriving" on Google Drive for more than an hour.
If they are not yet listed by the UI or the API, I guess it is normal for rclone to sync them again when I launch a new sync.
It's probably not an rclone bug then!
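In the meantime, once Drive has caught up, I can at least detect leftover duplicates without changing anything, e.g. with the non-destructive list mode of dedupe (assuming a version that has it):

rclone dedupe --dedupe-mode list 'SHARED_DRIVE_NAME:' --config=rclone.conf

and only re-run the sync once the listing is stable.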

Thank you guys for your help and your amazing work.

Ah that would explain it.

Are you using a shared drive? I've seen that behaviour there before.

Yes, we are using shared drives, and on the first sync I send thousands of files and folders.


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.