How to speed up opening folders/traversing directories?

Thanks a lot for your reply and for the script you published earlier. However, I still don't understand: if one has millions of files and directories, which seem to take hours to scan/vfs-refresh, why not save the listing to disk and only poll for changes next time? Can rclone mount handle such big directories, or is that not its "use case"? Google Drive on Windows seems to handle it just fine...

I really appreciate the work you've put into developing rclone, but at the moment it freezes my system and I can't see how one can use it reliably. I'd welcome your advice on this.

Should it take this long to scan millions of files? I tried adding the --fast-list flag, but it doesn't seem to make any difference.

It may be worth noting that running the ls command from a terminal is fast for me as well, but navigating the mount from a file manager (Dolphin, Nautilus) freezes and crashes frequently.

It's actually faster to run a Windows VM, use the Google Drive client there, and share it via SMB/Samba :joy:

The fact that ls works fine means the issue is not rclone but the file manager.
You need to tweak the file manager's settings so it does not create thumbnails, does not create previews, etc.
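
As a concrete sketch for Nautilus (assuming a GNOME setup; these gsettings keys can vary between versions, and Dolphin has an equivalent switch under Settings > General > Previews):

# tell Nautilus never to generate thumbnails or count items inside folders
gsettings set org.gnome.nautilus.preferences show-image-thumbnails 'never'
gsettings set org.gnome.nautilus.preferences show-directory-item-counts 'never'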

Same on Windows: using Windows Explorer can be painful, but it can be tweaked to perform well.

Post your systemd file so we can look at it.

Also, perhaps update rclone...

If you want to code it up, submit a pull request, as that's a feature that has already been requested :slight_smile:

Not everything is quite that easy to implement, and Google Drive in that example doesn't work with dozens of cloud providers, nor does it provide much of rclone's functionality overall.

I don't use Windows, nor am I a developer. I just help out how I can with my time :slight_smile:

It's already using fast-list behind the scenes so adding the flag wouldn't change anything.

I have ~160TB of data with 5k directories and 54k files and it takes about 1 minute for me to run it:

 time /usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572
{
	"result": {
		"": "OK"
	}
}

real	0m53.836s
user	0m0.094s
sys	0m0.031s

I don't use any GUI-based products as I'm all terminal. To see what's going on, you'd have to run with a debug log, and you'd see what is slow and why.
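
If it helps, getting that debug log is just a matter of adding two flags to the existing mount command, roughly like this (the log path is only an example; the mount point is the one from your log):

rclone mount gdrive: /home/user1/gdrive-rclone --log-level DEBUG --log-file /tmp/rclone-debug.log [your other flags]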

Use what's best for your particular use case. Don't use something that doesn't work. My setup works flawlessly for my use case.

If it can take only a minute or so to scan such a large directory, then it would make sense. In my case rclone's log output suggested that my API access may be throttled, presumably because of an excessive number of requests. That would explain why it's so slow (Insync's sync is quite slow as well).

Unknown as you have not shared any logs so I can't comment.

2021/06/17 21:20:07 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)
2021/06/17 21:20:07 DEBUG : pacer: Rate limited, increasing sleep to 1.015327569s
2021/06/17 21:20:07 DEBUG : /: Attr: 
2021/06/17 21:20:07 DEBUG : /: >Attr: attr=valid=8700h0m0s ino=0 size=0 mode=drwxrwxr-x, err=<nil>
2021/06/17 21:20:07 DEBUG : /: ReadDirAll: 
2021/06/17 21:20:07 DEBUG : /: >ReadDirAll: item=15, err=<nil>
2021/06/17 21:20:07 DEBUG : vfs cache RemoveNotInUse (maxAge=31536000000000000, emptyOnly=false): item path/to/a/file not removed, freed 0 bytes
2021/06/17 21:20:07 NOTICE: Serving remote control on http://[::]:5573/
2021/06/17 21:20:07 DEBUG : Creating backend with remote "gdrive:"
2021/06/17 21:20:07 DEBUG : vfs cache: root is "/tonk/cache-user1/gdrive-rclone/vfs/gdrive"
2021/06/17 21:20:07 DEBUG : vfs cache: metadata root is "/tonk/cache-user1/gdrive-rclone/vfs/gdrive"
2021/06/17 21:20:07 DEBUG : Creating backend with remote "/tonk/cache-user1/gdrive-rclone/vfs/gdrive"
2021/06/17 21:20:07 DEBUG : Google drive root '': Mounting on "/home/user1/gdrive-rclone"

If you could let me know whether you think this is a problem with the Google API rather than rclone itself, I would really appreciate it. I'm clueless at the moment as to why this vfs/refresh scan is so slow.

What does the full log look like?
You are on an old version as well, so you'd want to update.
Are you using your own client ID/secret? Can you see API hits on that in the Google Admin Console?

Do you think not doing this could cause the problem?
https://rclone.org/drive/#making-your-own-client-id

Or does this, from the config file, mean the OP already did that?

client_id = 
client_secret =

I can't really upload the entire log because some of the files listed there are private, but the above entries mostly repeat over and over. I'll try to upload the full log with the file names taken out in a moment.

Can you see API hits on that in the Google Admin Console

I will take a look if it says anything.

Thanks a lot for looking into this :slight_smile:

I hadn't generated that API client ID and secret - this may be the reason why it's so slow... I'm trying it now: I added them to the rclone config for this Google Drive, reloaded and restarted the systemd service (with the ExecStartPost line commented out), and am running the vfs/refresh scan in the terminal with the --progress flag.
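
For reference, the relevant part of the rclone config now looks roughly like this (placeholder values, not my real credentials):

[gdrive]
type = drive
client_id = 123456789-xxxxxxxx.apps.googleusercontent.com
client_secret = xxxxxxxxxxxxxxxx
scope = drive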

It's been 10 minutes now and it's still scanning, but at least the previous

2021/06/17 21:20:07 DEBUG : pacer: low level retry 1/10 (error googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded)

no longer appears in the log output.

Yes, having your own client ID is a big deal with gdrive.

The scan time can be affected by your internet connection.
From reading in the forum, @Animosity022 and I both have 1Gbps Verizon fiber optic internet connections.

About the computer: is it virtual or physical?
Is it using a WiFi or LAN connection, or what?

By not using your own client ID/secret, you are using the default rclone one, which is oversubscribed, so you get rate limited to slow you down, and that does make things take longer.

What does this show:

rclone about gdrive:

And then, on your mount, what do you get if you run these commands against the mountpoint?

find . -type d /YOURMOUNTLOCATION | wc -l
find . -type f /YOURMOUNTLOCATION | wc -l

That will show you the number of directories and files, respectively. @VBB has something crazy, and that refresh takes like 10-15 minutes or something along those lines.


I've been following this thread, of course :wink:

Here's what I run a refresh on once a day:

[screenshot omitted]

It takes about 4 minutes on average with this command:

rclone rc vfs/refresh recursive=true --drive-pacer-min-sleep 10ms --timeout 30m --user-agent *******


Oh, that would explain why it was that fast. Here's mine, well, like this:

Elapsed time:     23m10.5s
{
        "result": {
                "": "OK"
        }
}

But it did finally finish, so I went into Dolphin and Nautilus (Linux file managers) to test, and voila, it now opens folders in less than a second, even the most obscure, deeply nested ones.

However, both the client ID/secret and the vfs/refresh were needed for this to work. I tried to open some folders after reconfiguring the rclone mount with my own Google client ID and secret, but before the rclone rc vfs/refresh ... command had finished, and it was still sluggish/freezing.

Now, after the full scan, it's butter-smooth. Thanks a lot for this suggestion and for bearing with my questions. The rclone community is awesome.
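
For anyone who finds this later, the working setup boils down to a mount that uses your own client ID plus a recursive vfs/refresh kicked off once the mount is up. A minimal systemd sketch of that idea (the paths, port, and flag set here are illustrative guesses based on this thread, not my actual unit file):

[Unit]
Description=rclone mount for gdrive
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount gdrive: /home/user1/gdrive-rclone --rc --rc-addr 127.0.0.1:5572
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true
ExecStop=/bin/fusermount -uz /home/user1/gdrive-rclone

[Install]
WantedBy=default.target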

Great!

I remember that @VBB was going to write a wiki about this topic.
He seems to be lurking in the shadows recently, as outlaws are wont to do.


Wow, that's impressive. Could you let me know what your internet speed is?
Also, could you let me know what the --user-agent flag is doing here?
The pacer tends to take up a big section of my rclone log at some point, with entries like:

pacer: Reducing sleep to 896.435684ms

I suppose your --drive-pacer-min-sleep 10ms option addresses that?

Yes, the lack of my own client ID/secret was the culprit. Thanks a lot for your help!

Just to be sure, shouldn't it be

find ./YOURMOUNTLOCATION -type d | wc -l

? I've run it on my Google Drive, which yielded 78,149 directories and 1,234,506 files. "Only" 9.67 TB.

It still seems to take about 20 minutes to do this scan (on average WiFi, 14.93 Mbps down and 6.41 Mbps up). Surely caching to disk would come in handy if it's technically possible, but it's no longer a life-or-death situation.

My internet speed is also a gig up and down (dedicated server with OVH).

As for the pacer sleep time, see here: Google drive. @Animosity022 recently made the change from 100ms down to 10ms, after it was discovered that Google had increased the default API quota from 1,000 to 10,000 calls per 100 seconds. That works out to 100 requests per second, i.e. one every 10ms, which is where the new minimum sleep comes from.

Speaking of, @Animosity022, would it also make sense to increase --drive-pacer-burst?

EDIT: I forgot to mention that I recently added the user agent to all of my commands just as a precaution. I don't believe it makes any difference; it can be set to anything you want. Considering the number of files you have, twenty minutes is probably not bad at all.


Yep, sorry, I had a typo there. You want the full path in the command, like:

find /GD -type d | wc -l
5265

A ./GD would be relative to your working directory and wouldn't work, so that's my typo.

If my math serves me, I have a 10,000 quota per 100 seconds, which comes to 100 per second, and a 10ms minimum sleep lets me do 100 in a second, which matches. Since the quota went up tenfold, I'd imagine burst should actually go from the default 100 to 1000 as well, as that would make sense to me (assuming I can do some basic math while half asleep :slight_smile: )

I'll test that change out in my setup and see how it works, as that's a great catch.
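
For anyone wanting to try the same tweak, the burst flag sits alongside the existing pacer setting on the mount command, something like this (the 1000 value is just the figure discussed above, not a tested recommendation):

rclone mount gdrive: /home/user1/gdrive-rclone --drive-pacer-min-sleep 10ms --drive-pacer-burst 1000 [your other flags]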

I recall that a year or so ago Google throttled rclone's user agent for a day or something, and I've kept the user agent set ever since. Not sure it does any harm, so why not.
