Slow scans after library expanded

I have about 6 libraries and they are all quite large, the largest being the one with 10k movies.
After checking the logs, I noticed that all libraries scan relatively fast, but when it reaches the 10k-movie one, it starts off fast and then, after a few minutes, slows down to about 1 movie per second.

Where is the bottleneck?
- rclone config?
- Plex itself?
- Windows filesystem? (running Windows Server 2019)

Rclone - The only potential rclone config bottlenecks I see would be info_age and chunk_total_size, but I'd like some confirmation, or to hear whether anyone has run into this problem before.

Plex - What makes me think it's Plex itself is that the problem only started when the library got really large (10k+ folders inside Movies); when it had about 5k, it would scan all 5k movies fast.

Windows - This is what I doubt the most but you never know...

Also, is there an easy way to find out how long my scans last? Checking Plex Media Scanner.log isn't very helpful, because the scans are so large I can't find where one started.

Here's my rclone config below:

[rcache]
type = cache
remote = gdrive:crypt
plex_url = http://127.0.0.1:32400
plex_username = *****
plex_password = *****
chunk_size = 10M
info_age = 3d
chunk_total_size = 300G
db_path = Q:\temp\rcache
chunk_path = Q:\temp\rcache
workers = 10
plex_token = *****

[mount]
--allow-other
--read-only
--allow-non-empty
--log-level INFO

It's a Windows thing, unfortunately. I've had to deal with this myself for quite a while now, having about 11,000 folders inside the Movies folder. My scans last about an hour and a half when the mount is already primed (if the content has been pre-read by doing a "dir /s"), but can be much longer if it's a fresh mount.

What I've been working on, slowly, is creating folders "A" through "Z" and "#" inside the Movies folder, and then moving the individual movie folders into those. That is really the only way to make things significantly faster in Windows.
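The A-to-Z idea described above can be sketched in plain shell. This is just an illustration on a throwaway local tree with made-up folder names; on the real setup you'd issue the equivalent rclone move commands against the remote instead.

```shell
# Bucket movie folders into "A".."Z" (and "#" for names that don't
# start with a letter). Local demo tree; names are made up.
MOVIES=$(mktemp -d)
mkdir -p "$MOVIES/Alien (1979)" "$MOVIES/Avatar (2009)" \
         "$MOVIES/Blade Runner (1982)" "$MOVIES/300 (2006)"

for dir in "$MOVIES"/*/; do
  name=$(basename "$dir")
  # First character, uppercased
  first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
  case "$first" in
    [A-Z]) bucket="$first" ;;
    *)     bucket="#" ;;     # digits, symbols, etc.
  esac
  mkdir -p "$MOVIES/$bucket"
  mv "$dir" "$MOVIES/$bucket/"
done
```

The glob is expanded once before the loop starts, so the bucket folders created along the way are not themselves re-processed.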

As far as the length of your scans, wouldn't you be able to look at Plex when you start it and then again when the scan ended to see the time stamp? I'm assuming you're doing manual scans only, since you're on Windows.

Thank you!

Have you tried mounting the crypt folder with writes enabled, then selecting all movies starting with the letter A and moving them into the A folder? I might try this, but if you're working on it slowly, it might be because you've tried many things before, and I'd like to know what they were.

Or use the rclone move command with something like "rclone move drive:movies/A* drive:moviesA2Z/A/". Will that work?
(edit: it does not :frowning: )

I'm not doing manual scans; it scans automatically every 6 hours. No, it takes way too many hours to dig through the logs or to keep the Alerts panel open the whole time.

I stopped doing automatic scans once I fully converted over to G Drive. I now scan once a day after I add my daily uploads.

At first, I tried moving stuff with a write-enabled mount, but it turned out too slow. For "A" alone I had to move about 800 folders, and with each folder taking about 5 seconds, I didn't want to wait that long. So I asked Nick and others here for the best way to do this via command line and ended up using that solution. It's a lot faster, but I haven't done it in a while. So far, I've only moved "#", "A", and "B". Main reason is that Plex has to scan the moved content in as "new", even if you don't empty the trash. The movies don't show up as recently added, which is good, but they need to be re-analyzed, unfortunately. And that takes a lot of time. Pains me to think I'll also have to do this with more than 2,000 TV folders :laughing:

Just noticed you edited your last post while I was writing. Manual move does work. I'm using this:

rclone move -v --include "/A**" Google_Drive_Crypt:Videos/Misc/Movies Google_Drive_Crypt:Videos/Misc/UPLOADS/Temp/A

G Drive is case-sensitive; keep that in mind. The above command moves every folder inside the Movies folder starting with a capital A into a folder called A inside a folder called Temp. Later I would move the folder called A back over to the Movies folder (I realize I could make this easier on myself).
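The "move it back later" step described here would presumably be the reverse of the same command; this is a sketch using the paths from the command above, not a command taken from the thread:

```shell
# Move the finished "A" bucket from Temp back under Movies
# (paths reused from the example above).
rclone move -v Google_Drive_Crypt:Videos/Misc/UPLOADS/Temp/A Google_Drive_Crypt:Videos/Misc/Movies/A
```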

Just noticed you edited your last post while I was writing. Manual move does work. I'm using this:

rclone move -v --include "/A**" Google_Drive_Crypt:Videos/Misc/Movies Google_Drive_Crypt:Videos/Misc/UPLOADS/Temp/A

Thank you so much, this is working perfectly. I'm moving the movies at about 100/minute.

I don't see myself doing this with TV shows anytime soon. I only have about 250, and they are split between Sonarr and the completed/ended ones.
Might do it to Music though.

Update:

TLDR: Bad idea using A through Z folders.
1- Limited to 26 folders
2- Unbalanced
3- The "T" folder alone makes up 40% of the whole library (movies starting with "The...")

So far, I've moved everything into A-Z folders, realized it was a bad idea, created a separate 27th folder called "The", and then started sending new movies to a different folder structure, separated by year ranges (80s, 90s, 2000s, 2010s, etc.).

I came to the conclusion that I want the folder structure to be organized, so I'm now moving everything (again), separating all movies by year, and we'll see how it goes.
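The by-year split can be sketched the same way as the by-letter one, assuming the common "Title (YYYY)" naming convention. Again, this is a local demo with made-up names; the real reorganization would use rclone move against the remote.

```shell
# Bucket movie folders by the release year in their "(YYYY)" suffix.
# Local demo tree; names are made up.
MOVIES=$(mktemp -d)
mkdir -p "$MOVIES/The Matrix (1999)" "$MOVIES/The Thing (1982)" "$MOVIES/Up (2009)"

for dir in "$MOVIES"/*/; do
  name=$(basename "$dir")
  # Extract the 4-digit year from the trailing "(YYYY)"
  year=$(printf '%s' "$name" | sed -n 's/.*(\([0-9]\{4\}\)).*/\1/p')
  [ -n "$year" ] || continue   # skip folders without a year in the name
  mkdir -p "$MOVIES/$year"
  mv "$dir" "$MOVIES/$year/"
done
```

A nice property versus A-Z: the "The ..." problem disappears, since the bucket key ignores the title entirely.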

I can't speak for what, if anything, may go wrong inside Plex (due to lack of experience), but it does sound a bit like the problem may be listings. Any info it can fetch from the cache would be very fast, but up to a second per individually listed folder may not be abnormal if it has to fetch them one by one when they aren't in the cache.

Do you use the --rc on your remote?
If so you might try to send a

rclone rc vfs/refresh recursive=true

to the RC to refresh the listings for the entire drive (should use --fast-list automatically when available, making this take a reasonable amount of time). I think this listing info should then stick in your cache as the answer comes back - hopefully fixing the issue. At least until
info_age = 3d
expires.

You may consider increasing this timeout. The only downside should be that if you upload files outside of the cache, they won't be registered and may take a long time to show up, but the benefit is you don't have to keep re-listing so much every 3 days. You can always do a manual refresh if this happened and you needed to correct it. Anything going through the cache on the way up should have its info added to the cache automatically, though.
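Put together, the suggestion might look something like this. It reuses the "rcache" remote name from the config earlier in the thread; the drive letter and port are assumptions, not something specified here.

```shell
# Sketch: start the mount with the RC server enabled
# ("rcache" is the remote from the config above; X: and the port are assumed).
rclone mount rcache: X: --read-only --rc --rc-addr=127.0.0.1:5572

# Then, from a second shell, refresh the directory listings for the whole drive:
rclone rc vfs/refresh recursive=true --rc-addr=127.0.0.1:5572
```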

The only downside should be that if you upload files outside of the cache then they won't be registered and may take a long time to show up

Not really. When I upload things outside the cache (and even outside the computer), the rclone mount receives a cache-expiry notification (I don't know how), and the cache will refresh that one folder you just uploaded the next time it's accessed.

You may consider increasing this timeout.

I am considering it. I noticed that the whole refresh was slow after the 3 days.

Do you use the --rc on your remote?

What do you mean by "remote"? As far as I remember, rc is used for remotely accessing rclone. All the rclone commands I run are local to the server.

That is probably due to polling. The mount can ask Gdrive to send a list of all changes to the drive since [last-timestamp]. You are right that this would probably pick up the changes pretty quickly. I think the default polling interval is either 1 or 5 minutes. It can be adjusted, if needed, as a flag on the mount command.
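For reference, the flag in question is --poll-interval; this is a sketch (remote name and drive letter assumed as before):

```shell
# Poll the remote for changes every 30 seconds instead of the default
# (rclone's default --poll-interval is 1 minute).
rclone mount rcache: X: --read-only --poll-interval 30s
```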

RC (enabled by the --rc flag in the mount command) can be used to, well... remote-control rclone from somewhere else, obviously, but it has other benefits too.

  • Being able to give rclone instructions (or change settings) while a mount is running on that instance (the mount would normally block further input)
  • Being able to bypass the limitations of the mount; in this case we would bypass its inability to use --fast-list natively, a consequence of OS compatibility.

Running the command I suggested leverages both of these. Personally, I use this to pre-cache my drive on launch to make it fast and snappy (I do not use the cache backend, so it's not stored persistently).

Otherwise you'd have to do it "the dumb way", which would be to "dir /s" the whole location from the OS, but that would not use fast-list and would thus be obnoxiously slow for the kind of archive you have.

My (probably much smaller) archive of 85K files takes 60 seconds to list with fast-list, and 13-15 minutes with the default listing method.
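If you want to time the two listing methods on your own remote, a simple way is to run a full-tree operation with and without the flag (remote name is an assumption here):

```shell
# Full recursive listing with bundled requests (far fewer API calls):
rclone size gdrive:crypt --fast-list

# Same operation with the default method, walking folder by folder:
rclone size gdrive:crypt
```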

This problem of slow Plex scans with huge libraries exists only on Windows. Refreshing the listing, caching, etc. - none of that makes any significant difference in scan time when Plex has to go through thousands of individual folders.

@brvnogit - Did you not notice a faster scan after you moved everything into A-Z folders? I'm curious, because I haven't gotten any further than "C" :laughing: Of course the letters "A" and "T" will have considerably more content than the rest of the letters, but it should still be much faster.

@brvnogit - Did you not notice a faster scan after you moved everything into A-Z folders? I'm curious, because I haven't gotten any further than "C" :laughing: Of course the letters "A" and "T" will have considerably more content than the rest of the letters, but it should still be much faster.

@VBB Yes, I did notice a (much) faster scan after I moved everything (and also after Plex deleted and re-entered the movies). All libraries would refresh in less than 30 minutes. However, I noticed only one slow scan, after the 3rd day. I can't confirm whether it happened again on the 6th day, because I wasn't paying attention to Plex's activity, but it might have.

I might change my cache timeout from 3 days to 7.

@VBB Please consider separating movies by year instead of A-Z before it's too late. In my case of 10,000 movies, 4,000 started with "T", and in a year or two it might hit the slow-scan problem again. Also, I find it easier to make a script that filters by year instead of by first letter.

RC (enabled by the --rc flag in the mount command) can be used to, well... remote-control rclone from somewhere else, obviously, but it has other benefits too.

I see that as a potential additional threat/risk, and I don't see much benefit from it. But I will definitely study the RC flag and play with it.

Being able to bypass the limitations of the mount; in this case we would bypass its inability to use --fast-list natively, a consequence of OS compatibility.

Are you also using Windows? What kind of limitations? Are you using vfs cache or the cache config?

@thestigma Also, what configs are you using?

My (probably much smaller) archive of 85K files takes 60 seconds to list with fast-list, and 13-15 minutes with the default listing method.

What is your file structure like? Are those 85K files in just one folder? Do you have any folder with around or more than 10K files?

Sounds like the slow scan was caused by the cache timeout. Is there a reason why you use the cache? Either way, once the cache/listing times out, a Plex scan would take longer again, yes. That is why I now use a much higher "--dir-cache-time", so that has become a non-issue.

You can set a password for it, so it should be more than secure enough. It wouldn't normally be accessible from the outside unless you forwarded a port for it anyway.

I am on Windows 10.

The mount has some limitations that aren't rclone-specific but rather OS-specific. The mount has to interface with the OS, so it can't do things the OS doesn't know how to ask for. --fast-list is one of these (the OS always iterates through). Displaying metadata hashes is another, since NTFS, which is emulated for Windows compatibility, doesn't use hashes. Stuff like that.

I use only the VFS cache. Animosity does the same on Linux (using Plex). We both had our testing phase with the cache backend but ultimately decided against it due to certain bugs, limitations, and inefficiencies, as well as the fact that the cache backend is pretty dead in terms of development (the author disappeared) and the VFS is slowly taking over its roles more efficiently as it improves. I don't use Plex myself, but Animosity does and says the cache should be unnecessary for smooth playback, at least if you disable some of the more in-depth automatic scans. You'd have to ask him for the Plex-specific details on that. For my Windows use, I have smooth playback of 4K video just using VLC for media.

Nothing like yours. My drive is general storage for all sorts of files, like you'd find on any nerd's local hard drive, not a pure media archive. Lots of fairly deep folder structures, but not many huge folders of several thousand files, mainly because that would be really impractical for direct human interaction (which is my primary use case).

Theoretically, listings should be more efficient the fewer folders there are, as one folder = one list request. --fast-list bundles together many such requests and will thus have the greatest savings when there are many folders and subfolders to go through (like in my use case). A single folder of a gazillion files should still be only one list request (albeit probably one that has to be transferred in several chunks simply due to its size).

The mount has some limitations that aren't rclone-specific but rather OS-specific. The mount has to interface with the OS, so it can't do things the OS doesn't know how to ask for. --fast-list is one of these (the OS always iterates through). Displaying metadata hashes is another, since NTFS, which is emulated for Windows compatibility, doesn't use hashes. Stuff like that.

Does that mean the VFS is a completely different filesystem, independent of Windows? Does that mean if I use the VFS cache, I won't get a slow Plex scan on a 10k folder?

I use only the VFS cache. Animosity does the same on Linux (using Plex). We both had our testing phase with the cache backend but ultimately decided against it due to certain bugs, limitations, and inefficiencies, as well as the fact that the cache backend is pretty dead in terms of development (the author disappeared) and the VFS is slowly taking over its roles more efficiently as it improves. I don't use Plex myself, but Animosity does and says the cache should be unnecessary for smooth playback, at least if you disable some of the more in-depth automatic scans. You'd have to ask him for the Plex-specific details on that. For my Windows use, I have smooth playback of 4K video just using VLC for media.

My problem with the VFS is that I want to cache the dirs on disk, so I can stop and start the mount without purging the cache.
The second (big) issue is the crypt problem. Everything on the gdrive is encrypted. When using the VFS cache, the file has to be downloaded fully for video playback.

The VFS (Virtual File System) is a layer that rclone uses to present a cloud storage location as a file system. WinFsp then presents that virtual file system in a format that allows the OS to interact with it natively (emulating NTFS on Windows, for example). The VFS is independent of the OS, yes; it runs within rclone. If you are running a mount, you are already using it to some extent (though the default settings are very minimal, such as not caching anything).

The VFS has some added functionality such as caching (writes only so far) as that was needed to create full compatibility for the OS. It has since been elaborated upon to have more useful functions in general that aren't just strictly needed for the mount to work (which was the main reason for it to exist in the beginning).

The VFS is fast, so it won't limit you in terms of performance - but I can't say if it will solve your "slow 10K scan" problem either - because that depends entirely on what the root cause is (which I don't know for sure). If the problem turns out to be in Plex then it's unlikely that the VFS will just magically fix that somehow.

Currently the VFS does not cache persistently - only to memory as long as the rclone instance runs. However, NCW has indicated that this functionality is on his list of things to get done.

I don't see why you would need to purge the cache, though. You'd need to pre-cache it again on a restart if you wanted it pre-cached, but that is optional and not much of a hassle to do with a fast-list via the RC. I just have a script that does this for me whenever I launch my mount.

That's not true at all. Decryption happens on the fly. You can decrypt a stream as it is accessed (this is the way it happens by default). I can open and watch a 100GB large 4K movie in about 2-3 seconds even though it is stored encrypted on my Gdrive (like all my files are). Obviously this would be impossible if it had to download it first to start playback...

Note that if you set the --vfs-cache-mode full then you DO have to download files completely first, but cache mode full is neither required, nor recommended for most use-cases. I generally recommend cache mode writes (for unrelated reasons), but to stream you don't need to have the VFS cache enabled at all.
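As a concrete illustration of the distinction being drawn here, a streaming-friendly mount might look like this (the remote name "gcrypt" and drive letter are placeholders, not from this thread):

```shell
# "writes" caches only files being uploaded; reads are streamed
# (and decrypted) on the fly, so playback starts almost immediately.
rclone mount gcrypt: X: --vfs-cache-mode writes

# By contrast, --vfs-cache-mode full downloads files in their entirety
# before they can be read, which is what caused the slow playback above.
```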

From what @VBB said, it's Windows.

Note that if you set the --vfs-cache-mode full then you DO have to download files completely first, but cache mode full is neither required, nor recommended for most use-cases. I generally recommend cache mode writes (for unrelated reasons), but to stream you don't need to have the VFS cache enabled at all.

That was it, then. I used vfs-cache-mode full in read-only, because I thought it would cache the dirs to disk (don't judge me for trying :sweat_smile:).

So I guess I need to start using the VFS. I still want to mount it read-only, because I want to limit my upload speeds. Will the cache still receive expiry notifications with a VFS mount?

What are your configs?
I'm so confused and skeptical about how the RC mode works. Remote-controlling an rclone mount doesn't make any sense to me right now. By HTTP, do you mean an actual HTTP page in the browser?

Also, I want the RC to be controllable ONLY by the localhost machine (no access from the private network). Should I use --rc-addr=127.0.0.1? Or set a password just in case?

I still don't get it. When launching and doing a fast-list, how is it not slow on a cold start? Isn't rclone requesting the listing of all files and making API calls?

I think you misunderstood him slightly.

The VFS is an independent thing from the OS, but Windows is ultimately the thing that reads it (via WinFsp), so it's perfectly possible that it's Windows itself handling extremely large collections of files in a single folder poorly. I don't know if that is the case, but it's entirely plausible.

I did handle a folder with 17K files in it recently for testing purposes (just generally in Windows Explorer). That was somewhat slow, but it still got listed and accessible in maybe 5-10 seconds, which I deem within reason and not abnormal in itself. Outside of testing, I rarely use such large single folders, though.

Yes, the VFS cache updates (adds and removes) based on polling info.

Just a Google drive remote, layered with a Crypt remote on top.
The rest of the settings are in my mount command.
Do you want excruciating detail? If so I suppose I can share my whole config and script upon request.

The RC is just an alternate route to talking to rclone as it runs. Nothing more fancy than that.

I'm not sure what you are referring to here when you say HTTP. Do you mean the web GUI? The RC can (aside from text-based commands) also serve a web GUI (newly added) that lets you monitor the performance of rclone and do other basic tasks (with more advanced functionality being added over time).

The RC server just lives on a network port (default 5572) and listens there (and/or serves the web GUI there, which would be accessible via HTTP if you want it to be). The RC also has an old and very basic HTTP mode you could download files from if you chose to enable it with --rc-serve.
That's all extra functionality, though. At the most basic level, the RC just listens for commands to execute.

localhost (127.0.0.1) is where it is served by default. If you wanted it to be accessible from the outside, you would have to open port 5572 through your firewall, and if you have a NAT router you would also need to forward the port. It's not really accessible to others normally, unless you don't trust other machines on your LAN and don't use any firewall at all (including the default Windows firewall).
There may be a setting to ONLY accept requests from the local machine, but I'm not sure off the top of my head; you'd have to check the documentation. I'd probably just set a password for it if you were worried about it.
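For what it's worth, binding to loopback and setting credentials can be combined; this is a sketch with placeholder values, reusing the "rcache" remote name from earlier in the thread:

```shell
# Bind the RC server to loopback only and require a username/password
# (values here are placeholders, not real credentials).
rclone mount rcache: X: --read-only --rc --rc-addr=127.0.0.1:5572 --rc-user=me --rc-pass=changeme
```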

EDIT: heading to bed for now, but I will answer tomorrow if there is anything more :slight_smile:

Windows is ultimately the thing that reads it (via WinFsp)

This answers it. Thanks.

heading to bed for now, but I will answer tomorrow if there is anything more :slight_smile:

Thank you so much for the time and patience. Really appreciate it. I'll go on and test vfs and rc a little bit now. I'll definitely post here, again, if I get into a problem.