Fast-List overwriting and endless sync with gsuite/drive

#2

Did you use --fast-list with this? --fast-list does a complete scan and then does the filtering afterwards, which might explain these results.

Or not rooting your --include might explain them too - see below.

rclone should work identically with or without --fast-list. I suspect this might be caused by duplicate directories (as in two directories with the same name in the same folder). I suggest you run rclone dedupe to see if that is the case.
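
A minimal sketch of that check (remote:Directory1 is a placeholder for your remote and path) - the rclone docs recommend a --dry-run pass first, since dedupe can modify data:

# Report what dedupe would do without changing anything
rclone dedupe --dry-run remote:Directory1
# Then run it for real - interactive mode prompts before merging or renaming
rclone dedupe remote:Directory1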

Are you using your own client_id? If not then getting your own will help with the transfer speed.

You want to root your --include with a leading /, assuming those folders are in the root; otherwise rclone will look for folders called QXXXX in any directory (including all the other Q folders).
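
For illustration (Q1234 stands in for one of the QXXXX folders):

--include "/Q1234/**"   # rooted: matches only the Q1234 folder at the top of the remote
--include "Q1234/**"    # unrooted: matches a folder named Q1234 at any depth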


#3

Adding the leading / doesn’t seem to have had any effect - same delay afterwards. It makes sense for it to be there, though.

Running a dedupe now; it’s just been sitting at a blank cursor for 15 minutes. I’ll let it run overnight and see what happens.
I don’t see any duplicate folders in either the Google Drive web UI or in Drive Stream. Even with the leading /, only the --fast-list version tries to upload files in sections I’ve fully uploaded.

I tested the same 94k folder as before: it “checked” 1125 files and then started re-uploading. If I remove --fast-list it checks the full 94000 and then just counts up the timer at 100% completed.

I hadn’t tried my own ID; I hadn’t considered it since I’m not getting any rate-limit errors at all. Will try tomorrow.


#4

Well, the dedupe finished overnight with no status message of any kind. I gather that means it didn’t find any duplicates.

I’ve got my own ID in there as well; no noticeable change in performance. At this point I’m pretty sure it’s just my bandwidth limitations.

There is, however, a bit of a difference with --fast-list now. On the exact same command it now checks 90774 files and then tries to upload 3248. Is it possible this is related to time-stamp accuracy somehow? Does the recursive list return one less decimal place or something like that?
If that’s the case I’ll just re-upload those last few and use --fast-list for everything from now on. The performance benefit seems to be worth the time to re-upload this once.

I went and double-checked some of the files that it’s trying to re-upload and they are definitely already there.


#5

--fast-list does the same thing when it reaches the end - it just keeps counting up at 100% with an ETA of 0s:

Transferred: 102.101M / 102.101 MBytes, 100%, 18.268 kBytes/s, ETA 0s
Errors: 0
Checks: 32683 / 32683, 100%
Transferred: 1156 / 1156, 100%
Elapsed time: 1h35m23.2s

And re-running that exact same attempt after it finished, it’s already trying to re-upload 802 files:

Transferred: 1.428M / 79.695 MBytes, 2%, 13.698 kBytes/s, ETA 1h37m30s
Errors: 0
Checks: 33037 / 33037, 100%
Transferred: 7 / 802, 1%
Elapsed time: 1m46.7s


#6

Did you run it with -v? It should have printed stuff about duplicate directories if that was the problem.

I’d like to see a log of what you are doing with -vv, with and without --fast-list - can you post them somewhere? Or alternatively email them to me at nick@craig-wood.com - please include a link to this page.


#7

Edit - dedupe with -v finished with only a single entry: “Google drive root ‘Directory1’: Looking for duplicates using interactive mode.”

I’ll see if I can figure out a way to capture the output of sync; in about a second I had several thousand lines go by with -vv.
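
One way to capture it (paths and remote are placeholders) is rclone’s --log-file flag, which sends the log to a file instead of the terminal:

rclone sync /local/path remote:Directory1 -vv --log-file rclone-sync.log

rclone writes its log to stderr, so a shell redirect with 2> rclone-sync.log works as well.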


#8

Alright, logged and emailed.


#9

Thanks for the logs.

Here are some things I observed…

In each run the total of checks + transferred is 33839 (32683 + 1156 = 33839 and 33037 + 802 = 33839), so at least that is consistent.

What it looks like is that --fast-list is missing some of the files. We did find a bug which could cause this which was fixed in 1.46 (released on Saturday) so it would be worth trying that - see the latest release.
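
For reference, this checks which build is in use before and after upgrading:

rclone version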

I think the creation of all the empty directories each time is probably this bug: https://github.com/ncw/rclone/issues/2869 - rclone creating all the directories that contain excluded files. Does that look correct?


#10

The directories being created at the end are ones I would expect to be there (they’re empty directories inside the ones I match using --include). I wonder if maybe Google Drive isn’t reporting them in the list because they are empty? I’d be okay with them not being copied if that’s what it comes down to (whether by default or via an option). With that in mind, it might be related, but it’s not creating any directories it’s not supposed to (I’m also not currently using --exclude).

I’ll give that update a try and see what it does.


#11

Looks better with 1.46 but still re-uploading some.

14xxx-15999 worked fine twice in a row, so I tested the next batch in order. On 16000-17999 it re-uploaded 373 files, then 175 and 175 on the successive attempts (the same files on the last two attempts too, which was odd; most of them were from J16610). So it’s possible something about those specific files is causing it.


#12

OK…

I think it is probably doing more work than it needs to.

The directories should be reported regardless of whether they are empty or not.

I just noticed you are using a Team Drive. In my testing when I originally did Team Drive support, I noticed exactly this sort of problem. I put it down to eventual consistency on Team Drives - the uploads taking a while to appear in the listings. Do you think this is the problem now?

We can test this though…

If you run

rclone lsf -R --include '...'  remote:...  | sort > list1

with your --include list, then try that with --fast-list to a different file. Try it a couple of times with --fast-list. I suspect you are going to see different results.
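
Spelled out, the comparison might look like this (the include pattern and remote are placeholders):

rclone lsf -R --include '/J16*/**' remote:dir | sort > list1
rclone lsf -R --fast-list --include '/J16*/**' remote:dir | sort > list2
diff list1 list2   # lines only in list1 are files the --fast-list run missed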

If you’ve got a specific directory which always shows the problem then can you email me the results of rclone lsf -R --fast-list remote:dir -vv --dump bodies which will probably be quite big!


#13

It’s gotta be that file delay that’s causing it.

I re-ran both of the ones that acted up yesterday and they’re 100% now without any re-uploading.

So I guess with --fast-list enabled I just need to compensate for that by not running it back to back (that shouldn’t be a problem once my initial upload is completed and I switch to a scheduled service).
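
As a hypothetical sketch of that scheduled service (schedule, paths, remote, and pattern are all placeholders), a cron entry with runs spaced well apart would sidestep the listing lag:

# Run once a day at 03:00, leaving plenty of time for the Team Drive listing to catch up
0 3 * * * rclone sync /local/data remote:backup --fast-list --include '/Q1234/**' --log-file /var/log/rclone-sync.log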

I suppose Team Drives are caching their full lists for some period of time (I would guess as much as an hour).

I guess that means the only thing that’s really an issue is that it tries to re-create the blank directories at the end. If they already exist on the drive the process is quick, but if not, it appears to be doing nothing on the progress display while it does so (around 1s per folder in my situation).


#14

Strange that it should be the --fast-list that gets cached and not the other list.

The directories it creates - are they in the root of the --include?

Any directories that rclone lists should be in the directory cache already so mkdir should be instant.

Ah, I wonder if --fast-list isn’t adding them to the directory cache.

Do you see this long pause creating directories if you don’t use --fast-list?


#15

The directories it creates - are they in the root of the --include?

Subdirectories of the ones matched with --include, e.g.

J16123\folder

Do you see this long pause creating directories if you don’t use --fast-list?

The pause only appears the first time these directories are actually created. The second time through it’s able to do 1000+ in under a second (although the log still says it’s creating them).

Edit - to clarify, no. It seems to be the same duration with and without --fast-list, purely dependent on whether it’s the first run or not.


#16

Ah OK, so it sounds like it is working as intended, creating the empty directories on the first run only.

This is perhaps a bit misleading and I think it could probably be fixed.

I made an issue about it: https://github.com/ncw/rclone/issues/2977


#17

Sounds good. I would propose the following for the inverse scenario then (what brought this up in the first place):

Adding an entry to -P to show folders that still need to be created, or counting folders in the same tally as files (so it doesn’t look like it has stalled at 100% progress when it’s still working).

Also maybe a warning when using --fast-list with Team Drives.
Something like this?

Warning: Rclone has detected that you’re using --fast-list with sync/copy/move to or from a Google Team Drive remote. Due to the way Team Drives currently cache the full directory listing, there can be situations where Rclone is unable to see recently created files on the drive, causing the command to miss or re-copy files. Running the command without --fast-list should allow Rclone to properly detect all files on the remote. Waiting a currently unknown amount of time after a file has been uploaded (estimated to be less than an hour) should also be effective.


#18

Maybe I could count these as Other or something like that… Normally rclone creates directories as part of the sync. Maybe doing that would be best rather than waiting until the end.

Fancy sending a PR for that? The only file that needs patching is drive.md.


#19

Yup, I can do that. Will take a read over it and likely submit later this week.


#20

For the benefit of anyone else who comes across this.

The behavior of Team Drives and --fast-list appears to be worse than I initially thought.

I had to move 200k files between two Team Drives and adjust my scripts to match. It’s been over 2 days now and there are still files not showing up when using --fast-list on the new location. It seems to be indexing files at approximately 1500-2000/hour. I don’t know if this speed is per account or per drive; I’ve got 8 such drives all currently in this state. I suspect a background operation in Google Drive that can only process the files so fast before adding them to the main list.

Non --fast-list still operates normally, and assuming nothing goes wrong I should be able to start using --fast-list again once Google catches up.


#21

I wonder if there is something we could do to help with this.

The fact that without --fast-list you get the correct listings means that there must be a bug somewhere…

Can you please make a new issue on GitHub about this so we can investigate further.
