Rclone making dupes on Google Drive

What is the problem you are having with rclone?

I tried to sync a folder ABC from my G Suite drive to my G Suite team drive.
I'm running this via a .bat file, so each command runs for some time, stops, then moves on to the next one.

I had to dedupe the folder, which left some folders in a
"mystuff (1)"-like state, so if I run any other sync utility it says the content doesn't match because the path/folder names are different. (Some files ended up the same way too.)

This means the entire sync effort was moot.

Later I found the subfolders were all over the place in multiple copies.

What is your rclone version (output from rclone version)

rclone v1.49.2

  • os/arch: windows/amd64
  • go version: go1.12.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows Server 2016

Which cloud storage system are you using? (eg Google Drive)

Google Drive - my own G Suite.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync sa1:ABC upload15: --fast-list --transfers=8 --checkers=12 --drive-chunk-size=512M --ignore-existing --progress --quiet --max-transfer 700g --disable copy

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Dupes can happen on Gdrive. Rarely under normal circumstances - but very commonly if you run several operations on the same destination at the same time. I suspect that is what happened here. That should be avoidable by making sure that jobs don't run concurrently to the same location. Either split them up by folders so that there is no overlap in the sync paths, or (much easier) run them one after another so that one can finish before the next starts.
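
For example, a minimal .bat sketch of the run-them-one-by-one approach could look like this (remote names taken from your command; the subfolder names are just placeholders):

rem jobs run back to back - each sync finishes before the next one starts,
rem so nothing ever writes to the same destination at the same time
rem (subfolder1/2/3 are placeholders - substitute your real paths)
rclone sync sa1:ABC/subfolder1 upload15:subfolder1 --progress
rclone sync sa1:ABC/subfolder2 upload15:subfolder2 --progress
rclone sync sa1:ABC/subfolder3 upload15:subfolder3 --progress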

If that still gives you some dupes, you may consider not using --fast-list for these sorts of tasks, as fast-list can be a few minutes out of date in picking up some changes. It apparently has to do with some cache system on Google's backend. Regular listing will avoid that.
The other alternative is to just wait a bit between operations. It usually doesn't take more than a minute or two for fast-list to be up to date at worst - but I bet this is the sort of thing that can vary by server load.
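
In your case that would just mean dropping --fast-list from the command you posted, e.g.:

rem same command as before, just without --fast-list
rclone sync sa1:ABC upload15: --transfers=8 --checkers=12 --drive-chunk-size=512M --ignore-existing --progress --quiet --max-transfer 700g --disable copy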

Thankfully there exists a tool in rclone to fix these problems though - so you won't have to resync everything.

rclone dedupe yourremote:
See documentation:
https://rclone.org/commands/rclone_dedupe/

There are several modes you can use, but by default it will auto-merge when it can (which will be most of them) and then ask you what to do for any ambiguous conflicts.
If saving re-upload time is essential, then try to choose smartly.
If you just want to get it done fast, then you're probably fine doing "rename all" on conflicts and then running a second resync to fix up any of those conflicts. That will save you manual interaction at the cost of having to re-transfer a small number of files.
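
Roughly like this, for example (remote and paths taken from your command - point dedupe at whichever folder actually has the dupes):

rem "rename" mode resolves conflicts by renaming instead of asking interactively
rclone dedupe rename upload15: -v
rem then a second sync picks up anything that no longer matches the source
rclone sync sa1:ABC upload15: --progress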

I thought so too about

very commonly if you run several operations on the same destination at the same time.
But what I did was simply:

  1. Set up a job that has multiple commands in the same .bat.
  2. Ran them on 3 different consoles side by side, but on different folders:

abc/f1
abc/f2
abc/f3

So it was
rclone abc/f1 >> abc/f1
rclone abc/f2 >> abc/f2
rclone abc/f3 >> abc/f3
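
i.e. each console was running its own full sync along these lines (the paths here are only to show the shape, and the flags were the same as in my first command):

rclone sync sa1:ABC/f1 upload15:ABC/f1 ...
rclone sync sa1:ABC/f2 upload15:ABC/f2 ...
rclone sync sa1:ABC/f3 upload15:ABC/f3 ...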

Yet I ended up with dupes. :frowning:

Thankfully there exists a tool in rclone to fix these problems though - so you won't have to resync everything.

dedupe didn't help this one, as the source folder was

abc/f1 on left side.

and on the right side the target folder (after dedupe) became
abc/f1 (1)

This was also the case for some subfolders inside.

Am I missing something here?

What would be the most stable command to copy a Drive folder from A to B,
1. where it already exists on Drive, and
2. where I want to switch the config after X GB of transfer but want to avoid dupes like this?

Huh, strange - I have never seen folder dupes being renamed to (1) like that. That shouldn't be what Google does, and it almost sounds like it's happening at some other level. Google dupes should be "true" dupes, i.e. multiple items with the exact same name. The dedupe tool is specifically designed to fix those types of dupes (it's not an all-purpose deduping tool, in other words).

You are not copying all this through a mount or some other abstraction-layer are you? Because the (1) renaming behavior sounds more like something an OS would do to prevent a collision.

The most stable transfer should be a simple sequential series of move/copy/sync operations - while not using --fast-list, and going directly to the remote, i.e. not via a mount. This should not create any dupes (or at least only very rarely).

One file here and there may seemingly randomly be affected unrelated to this, but it should happen rarely, and that's what the dedupe is for. Nick hasn't been able to fully track why this happens and we may never completely eliminate it for Gdrive, but it should be a very minor and easily rectified issue under normal circumstances. Your scenario is definitely not this, but something else.

You are not copying all this through a mount or some other abstraction-layer are you?
Nope, simply a Drive-to-Drive copy or a local-to-Drive copy, as shown in the initial command I shared.

I think I should do my parallel syncs by adding a root folder flag as well,
or simply, as you suggested, try a cycled sync but run the commands one by one:

command1
command2
command3

I'll try both ways and report back what gives.

Just remember not to use --fast-list with this if you want to be extra safe (or else, add a delay of a minute or so in between). This may only be the case for Gdrive specifically - I have no idea if other services face the same --fast-list caching situation.
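
In a .bat that's as simple as putting a pause between the jobs, something like this (paths are placeholders again):

rclone sync sa1:ABC/subfolder1 upload15:subfolder1 --progress
rem give Gdrive's listing a minute or two to catch up before the next job
timeout /t 120 /nobreak
rclone sync sa1:ABC/subfolder2 upload15:subfolder2 --progress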
