Vfs cache performance problems

Inuakio · December 25, 2021, 6:43pm

What is the problem you are having with rclone?

So I'm using rclone in vfs cache mode full so I can build a cache for my server and all the other things scanning it. From what I understand, with the cache mode on full it should cache, in chunks, the most used parts of the files (like for plex and emby scans). It seems to be doing this, but I'm having a couple of performance issues that have me confused. I assumed that since the cache is on local disk and my dir cache time is high, I should get basically local storage list times on the mounted directory. But when I do ls -R on the mounted directory it takes quite a long time. However, if I do the same thing on the directory where the cache is stored, it takes almost no time at all.
Shouldn't these times actually be a lot closer?
rclone should be serving from the stored data in the cache and using the dir cache for this before hitting the remote, right?
I would think that because these are stored locally and shouldn't need to be requested from the remote, the times should be just slightly slower than listing the actual directory where the cache is.

Any and all help would be greatly appreciated.
Also, I'm using the vfs cache for very specific reasons and can't go without it. The data from scans and the most frequently played parts of items really need to be cached locally.
Happy Holidays and thank you for your time!

What is your rclone version (output from `rclone version`)

rclone v1.57.0

os/version: ubuntu 18.04 (64 bit)
os/kernel: 4.15.0-163-generic (x86_64)
os/type: linux
os/arch: amd64
go/version: go1.17.2
go/linking: static
go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg `rclone copy /tmp remote:tmp`)

ExecStart=/usr/bin/rclone mount \
  --config=/root/.config/rclone/rclone.conf \
  --allow-other \
  --allow-non-empty \
  --rc \
  --rc-no-auth \
  --rc-addr=localhost:5576 \
  --drive-skip-gdocs \
  --vfs-read-chunk-size=64M \
  --vfs-read-chunk-size-limit=1G \
  --buffer-size=16M \
  --vfs-read-ahead=32M \
  --poll-interval=5m \
  --dir-cache-time=999h \
  --timeout=10m \
  --umask=002 \
  --read-only \
  --cache-dir=/vfscache10 \
  --vfs-cache-mode=full \
  --vfs-cache-max-size=6500G \
  --vfs-cache-max-age=999h \
  --transfers=32 \
  newmovies: /mnt/GMovies
ExecStop=/bin/fusermount -uz /mnt/GMovies
Restart=on-abort
RestartSec=5
StartLimitInterval=60s
StartLimitBurst=3

The rclone config contents with secrets removed.

type = drive
client_id = dummyid
client_secret = dummysecret
scope = drive
token = dummytoken
team_drive = Dummyname 
root_folder_id =
service_account_file = /home/sa/162.json

A log from the command with the `-vv` flag

https://gist.github.com/Kal1210/45f1b9bae53bb9f7b21f2118c471afe3

asdffdsa · December 25, 2021, 6:56pm

as you might know, rclone has two caches.
one cache, vfs file cache, to store chunks of files.
one cache, vfs dir cache, to store the dir/file names

this might be a helpful summary
https://forum.rclone.org/t/status-about-using-rclone-for-music-storage-playback-in-2021-access-times-improved/27648/34

you can prime or pre-load the vfs dir cache, which can be useful, before a media server scan.
rclone rc vfs/refresh recursive=true
so if you do a ls -R on the mount, should be much faster.
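
A rough way to compare the two listings, as a minimal sketch. The helper name `time_listing` is hypothetical; the paths in the usage comments come from the mount command in the first post. It relies on the split described above: the dir cache holds names for the mount, while the cache directory on disk only holds data for files that were actually read, so the two listings are not expected to do the same work.

```shell
#!/bin/sh
# Hypothetical helper: run a recursive listing and report how many lines it
# produced and how long it took, in whole seconds.
time_listing() (
    dir="$1"
    start=$(date +%s)
    count=$(ls -R "$dir" 2>/dev/null | wc -l | tr -d ' ')
    end=$(date +%s)
    echo "$dir: $count lines in $((end - start))s"
)

# Usage, with paths from the first post:
#   time_listing /mnt/GMovies    # answered by rclone (dir cache, otherwise the remote)
#   time_listing /vfscache10     # plain local disk, only files that were read
```

Running it on the mount before and after a vfs/refresh should show whether the dir cache is what makes the difference.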

Inuakio · December 25, 2021, 7:04pm

so this should just have to be run once per reboot?
am i able to just add this to my mount command?

asdffdsa · December 25, 2021, 7:11pm

yes, run it on reboot.
since you are using systemd, can add
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5576 _async=true
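
One way to add that line without editing the main unit file is a systemd drop-in. This is only a sketch: the helper name `write_refresh_dropin`, the file name `refresh.conf`, and the unit name in the usage comment are assumptions, and the ExecStartPost line is copied from above.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that primes the dir cache once
# the mount service has been started. The target directory is an argument so
# the sketch can be tried in a scratch directory first.
write_refresh_dropin() (
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || exit 1
    cat > "$dropin_dir/refresh.conf" <<'EOF'
[Service]
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5576 _async=true
EOF
)

# Usage (the unit name is an assumption, use your own):
#   write_refresh_dropin /etc/systemd/system/rclone-gmovies.service.d
#   systemctl daemon-reload
```

Depending on the service type, ExecStartPost may run before the rc server is listening, in which case the refresh call would fail. Prefixing the command with `-` (so `ExecStartPost=-/usr/bin/rclone ...`) tells systemd to ignore that failure; whether that is acceptable depends on the setup.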

also, i run it on demand, for example,
if i add media files to my rclone mount and now i need emby to scan it.
first i prime the vfs dir cache and then run the emby scan

Inuakio · December 25, 2021, 7:21pm

When trying to run this command I get this output:
2021/12/25 20:13:02 DEBUG : rclone: Version "v1.57.0" starting with parameters ["rclone" "rc" "vfs/refresh" "recursive=true" "--fast-list" "--rc-addr=:5576" "-vv"]
2021/12/25 20:18:02 DEBUG : 4 go routines active
2021/12/25 20:18:02 Failed to rc: connection failed: Post "http://localhost:5576/vfs/refresh": net/http: timeout awaiting response headers

asdffdsa · December 25, 2021, 7:28pm

try rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5576 --timeout=10m
change 10m as needed.
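
The same call wrapped as a small helper, as a sketch only. The function name `prime_dir_cache` and the `DRY_RUN` and `RCLONE` variables are hypothetical conveniences, not rclone features; `vfs/refresh`, `--rc-addr` and `--timeout` are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: refresh the dir cache of a running mount over rc.
# Arguments: rc address (default 127.0.0.1:5576), timeout (default 10m).
prime_dir_cache() (
    rc_addr="${1:-127.0.0.1:5576}"
    timeout="${2:-10m}"
    rclone_bin="${RCLONE:-rclone}"
    set -- "$rclone_bin" rc vfs/refresh recursive=true \
        --rc-addr "$rc_addr" --timeout "$timeout"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
)

# Usage (needs the mount running with --rc on that address):
#   prime_dir_cache 127.0.0.1:5576 10m
#   DRY_RUN=1 prime_dir_cache        # only show what would be run
```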

Inuakio · December 25, 2021, 11:42pm

This command did work and I do indeed get better listing times. With Plex, however, it hasn't seemed to make a difference. The scan time when using vfs with no cache on the same library is much faster and completes in minutes rather than hours. This just doesn't seem right, because I can see that chunks have been stored on my local disk but I'm still not getting great times on the scan. I would think scanning off the cache, with the data stored on local disk, would be much better than having to download from the Gdrive first.
Am i doing something wrong with my set up?

Animosity022 · December 25, 2021, 11:55pm

Can you post a log longer than a few seconds?

Cache or no cache, plex has to analyze files. If you change paths/locations, it has to analyze them and update them.

Are you using encryption or just mounting directly? You only posted a snippet of your rclone.conf, so I can't tell.

How many items are you scanning? What's the size of the drive? In the Linux world with Plex, analyzed media scans are pretty much instant as it takes a few seconds to scan my library.

You have a plethora of flags too.

--allow-non-empty allows for over mounting and can cause issues. You should only use it if you have a very specific use case requiring it.

Just remove these and use the defaults.

Are you mounting it read only for a reason?

Any reason for these? If not, I'd remove them and use the defaults.

This goes back to the read only question, as --transfers is only for writing files. If you are read only, why are you setting it?

Are you using a service account? If so, you can remove your client ID/client secret and token, as they're not needed with a service account. Those values are only needed if you are not using one. A service account uses its own client ID/secret.

Use this command not

The async true runs it in the background, and fast-list is redundant as you already have recursive there. That way you don't have to worry about the timeout and it'll just finish on its own.
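
A sketch of how a background refresh could be started and then checked on. With `_async=true` rclone answers straight away with a small JSON object containing a `jobid`, and `job/status` reports on that job. The helper name `extract_jobid` is hypothetical, and the sed-based parsing is an assumption about the output shape rather than a robust JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read rc JSON on stdin and print the numeric job id,
# e.g. from a reply that contains a line like:  "jobid": 3
extract_jobid() {
    sed -n 's/.*"jobid"[^0-9]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (needs the mount running with --rc on this address):
#   id=$(rclone rc vfs/refresh recursive=true _async=true \
#          --rc-addr 127.0.0.1:5576 | extract_jobid)
#   rclone rc job/status jobid="$id" --rc-addr 127.0.0.1:5576
```

The job/status reply includes a `finished` field, so it can be polled before kicking off a media scan.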

Inuakio · December 26, 2021, 12:37am

The paths haven't been changed. I have a plex server on a similar system set up with regular vfs and no cache, and another server that I'm using to test with the cache.
The cache has been populated by a few scans. I'm just mounting directly with no encryption.

The library has 43,000 movies. The drive is very large; I'm doing an rclone size now. The stored size of chunks can't be too big or else I won't have enough storage. It needs to be at least big enough for all the scan data and some of the most played files, which it currently is, but doubling the chunk size could be problematic. Also a big data waste.
What are the defaults for buffer size and chunk size limit?
also the defaults for vfs read ahead?
The goal is to find a sweet spot here. There are going to be several constant streams from my friends and family, who are all major binge watchers. Between that and all the scanning from plex, emby, jellyfin, sonarr, radarr, etc. all hitting this, it will be a bit busy all the time.
I ask for the defaults because I don't want the system to get stressed; it should be able to handle 15 streams with multiple things scanning without issues, as it does for the most part with regular vfs.

Also, shouldn't all these options only matter after the locally cached data has been checked, once it's there? The data in the cache should be accessed before rclone starts downloading chunks from the remote, right?

i can remove allow non empty and the transfers flags because indeed they are not needed with the current set up.

I can get you a log, but I was having trouble getting it pasted anywhere if it was longer than what I sent.

Inuakio · December 26, 2021, 12:55am

Also will changing the chunk value invalidate the cache like before with the old cache back end?

Animosity022 · December 26, 2021, 1:08am

No. the chunk size has really nothing to do with the full mode as it's just the range on the request being made. The defaults are generally fine.

That's really not that bad. My library has about 60k items in it and based on folks here, I'm rather small.

There's no need to run rclone size, just do rclone about remote: as it'll tell you the size used unless it's a team drive.

--buffer-size SizeSuffix                 In memory buffer size when reading files for each --transfer (default 16Mi)
--vfs-read-chunk-size SizeSuffix         Read the source objects in chunks (default 128M)
--vfs-read-chunk-size-limit SizeSuffix   Max chunk doubling size (default off)
--vfs-read-ahead SizeSuffix              Extra read ahead over --buffer-size when using cache-mode full
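
To check these defaults against the installed version rather than a pasted copy, the help output can be filtered. A minimal sketch; the helper name `pick_flag_help` is hypothetical, and it assumes the help text keeps one flag per line as in the lines above.

```shell
#!/bin/sh
# Hypothetical helper: keep only the help lines for the four flags discussed
# here. Reads help text on stdin, so it does not care where the text came from.
pick_flag_help() {
    grep -E -- '--(buffer-size|vfs-read-chunk-size|vfs-read-chunk-size-limit|vfs-read-ahead) '
}

# Usage:
#   rclone mount --help | pick_flag_help
#   rclone help flags | pick_flag_help
```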

All those items are on:

or

What is a stressed system? Out of what?

Yes, that's all easily seen in the debug logs.

Not sure what that means. Once something is scanned and analyzed, it isn't reanalyzed again. Your mount is read only so I am not sure how you are adding media or what that process is.

Inuakio · December 26, 2021, 1:37am

OK, so if the cache is already populated with data from a few scans, then I shouldn't need to worry about the chunk size and the other flags at the moment, because they shouldn't come into play when plex scans again and there is no new content.
It should just be hitting the cache, which should be much faster than a similar mount with no cache. But that isn't happening. The regular vfs mount with no cache, downloading from the remote directly, is outperforming the mount that reads from the already populated cache.
One scans in a few minutes and the other in 2 to 3 hours.

Animosity022 · December 26, 2021, 2:03am

Cache mode is faster.

You'd have to share logs and be specific, as it's playing from local disk.

The cache mode has no impact in comparing a scan as a scan of analyzed media is just comparing file stamps/sizes.
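
To illustrate what a metadata-only comparison looks like (this is not how Plex implements its scanner, just a sketch of the idea), the following records size and modification time for every file without opening any of them. The helper name `snapshot_metadata` is hypothetical and it assumes GNU find, as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: print "size mtime path" for every file under a
# directory, sorted by path. Only metadata is read; file contents are not.
snapshot_metadata() (
    dir="$1"
    find "$dir" -type f -printf '%s %T@ %p\n' | sort -k 3
)

# Usage: take two snapshots and compare them; an empty diff means nothing
# a timestamp/size check would notice has changed.
#   snapshot_metadata /mnt/GMovies > /tmp/before.txt
#   snapshot_metadata /mnt/GMovies > /tmp/after.txt
#   diff /tmp/before.txt /tmp/after.txt
```

On an rclone mount a walk like this is answered from directory listings, so the file cache should not come into play.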

Inuakio · December 26, 2021, 3:24am

i will get some logs so you can see. how long do you want? i assume just create a new log and start a scan? the whole scan would be a lot for the log no?
do you think attribute cache time might help?

Animosity022 · December 26, 2021, 3:32am

As long as you feel is enough to reproduce your issue.

The more complete the log, the more chance of seeing if there is an issue related to rclone.

Nope.

I think your main issue is that your media isn't analyzed and paths/things change so it has to reanalyze, or you have some extra attributes turned on in Plex that require deep analysis.

Inuakio · December 26, 2021, 3:37am

The path isn't changing though... Nothing is being uploaded to this drive; everything is being uploaded to another drive. So I mount it to the same spot every time and nothing is being moved or uploaded to it.
I also already have all the analysis options turned off in plex...

It's already gone through the scan multiple times now and the paths aren't changing, but it's still performing worse than vfs with no cache by a large margin.

Animosity022 · December 26, 2021, 8:25am

Again, that's not how it works so without a log file, I really can't offer much more.

When things are analyzed, no downloading happens, so having a cache or not having one isn't relevant.

If you want to share a log file, we can progress forward.

asdffdsa · December 26, 2021, 4:06pm

please post both rclone mount commands, one with vfs file cache, one without the vfs file cache.

are you running rclone vfs/refresh before each plex scan?

fwiw, i do not use the vfs file cache, equivalent to --vfs-cache-mode=off

for the emby media scan, to minimize the amount of data downloaded,
i have configured emby to not create thumbnails, previews and so on.
otherwise, for each new media file, emby would have to download the entire media file.

VBB · December 26, 2021, 7:51pm

Just to give you an idea, I have "Upgrade media analysis during maintenance" checked in Plex' Scheduled Tasks, and every night from 1AM to 8AM it analyzes roughly 5,000 items. I believe Plex counts one media file as three separate items for this purpose, so you do the math

Like @asdffdsa, I do not use the cache. A full library scan of ~250,000 media files takes about 4 minutes.

system · January 25, 2022, 7:51pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.