Rclone Mount --vfs-cache-mode full slower than none, why?

  • Google Drive
  • rclone v1.61.1
    • os/version: ubuntu 22.04 (64 bit)
    • os/kernel: 5.15.0-60-generic (x86_64)
    • os/type: linux
    • os/arch: amd64
    • go/version: go1.19.4
    • go/linking: static
    • go/tags: none
  • 1Gbps Down/500Mbps Up Connection

Why is it --vfs-cache-mode=full seems to be slower to start a video on Plex than --vfs-cache-mode=none? It is my experience that, at times, running a cache mode results in a >30sec start time for streaming videos. While none results in a <5sec delay.

In this case, the server is a dual SSD setup, using Linux software raid 1, so I can't imagine the drives would be much of a bottleneck. Below is a copy of my service file, note that rclone-mount is just a symlink to rclone so I can tell the difference in the processes easily. The lines prefixed with a '##' are the cache options I have tried, originally based on @Animosity022 Github repo.

[Unit]
Description=Google Cloud Rclone Daemon
After=network-online.target

[Service]
Type=simple
Environment=RCLONE_CONFIG=/usr/local/etc/rclone.conf
ExecStart=/usr/local/bin/rclone-mount mount googlecloud-decrypt-remote: /mnt/cloud \
#   --read-only \
   --allow-other \
   --devname GoogleDrive \
   --drive-chunk-size 128M \
   --dir-cache-time 9999h \
   --log-level NOTICE \
#   --log-level DEBUG \
   --log-file /var/log/rclone.log \
   --poll-interval 10s \
   --umask 007 \
   --rc \
   --rc-addr 127.0.0.1:5572 \
   --rc-no-auth \
   --user-agent backupmount \
##   --cache-dir=/tmp/rclone-cache \
##   --vfs-cache-mode full \
##   --vfs-cache-max-size 500G \
##   --vfs-cache-max-age 9999h \
##   --vfs-cache-poll-interval 5m \
##   --vfs-fast-fingerprint \
##   --vfs-read-chunk-size 128M \
##   --tpslimit 12 \
##   --tpslimit-burst 0 \
##   --disable-http2 \
   --uid=0 \
   --gid=998

ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true
ExecStartPost=echo ONLINE > /mnt/cloud/status
ExecStop=/bin/fusermount -uz /mnt/cloud
Restart=always
RestartSec=5

[Install]
WantedBy=default.target

Rclone.conf looks like this

[googlecloud]
type = drive

[googlecloud-decrypt-remote]
type = crypt
remote = googlecloud:secure
filename_encryption = standard

Logs:
Log 0001

Without a log file, it's really just a guessing game.

Is your cache disk slow? IO Bound? that's about it.

1 Like

@Animosity022 Cheers mate, I've attached a log of a single file being played. One think that looks a bit odd is this file was partially watched and I simply did a resume. However, the log looks like it downloaded the entire file. To be fair the cache was empty, but why would it redownload the entire file? Am I mistaken in thinking that rclone was seeking to the part required and just playing from there?

As for Disk IO, the drives are SSD and have a good SMART status. Additionally, the server only is used for Plex and as such the only IO I'm seeing is related to Plex and Netstat, so pretty quiet really. I'll try to grab some more meaningful metrics on that in a bit.

The snip in the beginning of the log missing as I usually like to see that as well as it'll confirm the settings/versions/etc.

The log seems pretty normal in terms of reading through the files as lots of 'present true' which means it's just reading for the local cache.

I've been running the new cache for quite some time and my starts are generally 2-4 seconds at most as I am direct playing everything on my main device and no transcoding.

Are the server/client in the same location or is one remote? Transcoding or Direct Playing?

I'll grab you a fresh log from start to finish. The machine is remote, but the playback is Direct.

Out of curiosity, you're saying rclone scanning from the beginning of the file to the point where the playback starts is normal? I ask as the file starts playing as soon as the scan for pieces in the cache gets to the resume point. It seems to me that this might be what is causing my frustration as it is trying to read up to that point instead of just seeking to resume point and playing.

Does that make sense or is my understanding of rclone incorrect?

Rclone doesn't pick or decide what it does for file access.

Plex decides what it does and what it's asking for.

Normally, it'll analyze the file and do a few opens when you start playback.

If it's mostly coming from cache, there isn't much rclone is doing.

If you ffprobe a file or use mediainfo, that's what Plex does for analyzing a file.

I get generally 2s-4s for that if it's not in cache.

real	0m2.264s
user	0m0.060s
sys	0m0.000s

Cached is instant:

real	0m0.059s
user	0m0.050s
sys	0m0.008s

and then playback starts after that usually.

If you have a very quiet system and just play 1 thing, it be semi easy to pick out and line up when you hit play and line up the click on play to when it starts.

I did that painful analysis some time back testing out a plethora of various mount options.

Cheers for the information as usual mate! I'll play around with it and see what I get.

Ok, let me ask something on this same topic. Let us assume I have two different copies of a file, one at 720p and another at 1080p. Let us also assume Plex "decides" to use the 720p version when I first play the video and then I stop the video halfway through. Then I come back and play the video again, note the 720p should still be in the cache at this point, however, Plex "decides" to stream the 1080p file. Would this result in the delay that I'm seeing upon resuming said video? Could it be the cache is downloading the new file and not starting until it has read up to the desired position?

Not really. It would seek the position in the file and resume play.

You'd see that level of detail in the debug log file as well.

Ok thanks, I'm gonna do some probing and see what I can find. Just as an update, I tried mediainfo and it did indeed respond quickly in either case I'm also seeing this delay when not using the vfs option. I'm at a bit of a loss and am going to try and capture a few clean logs and hope I can capture the issue happening as well.

The oddest thing is it seems like either Plex or Rclone is reading the file from the beginning to the point of the resume. Again this is not every time just randomly, however, it does seem to happen more often if a resume is initiated after some time.

Just to make sure we mean the same thing.

Rclone doesn't do anything on its own. An application (Plex/mediainfo/ffrprobe/whatever) does the request.

For something like mediainfo, it does a bunch of a few open and closes and you'll see a seek:

ChunkedReader.openRange at 8953835520 length 134217728

In the debug log, which is an openRange request as it jumps around the file.

A resume does the same thing in Plex.

Indeed, I only mention Plex and Rclone as they are currently black boxes to me, so I have no way to know where exactly the issue lies. That being said, it does seem a little odd that a filesystem overlay, if we can call rclone that, would go off on its own and request a reread of a file from its beginning up to the point of seeking when Plex is asking to seek to a specific point. Then again it seems equally absurd that Plex would request a re-read of all the chunks up to the point of seeking instead of just jumping in at the correct spot.

Not trying to be dogmatic, just trying to explain what I'm observing. :smiley: All things being equal the only culprits I can think of are rclone, plex, or some other linux sybsystem like fuse going off the rails. To that end, I'm going to do some more poking and prodding to see what I can find; see if I can find a little order in the apparent chaos.

Cheers as always for the information!

The best bet would be to kick everyone off, turn the mount in debug, timestamp / write down when you are doing the testing and share the log / times and it can all be somewhat correlated in the logs.

Plex does some things on its own but generally, it aligns with the rclone logs.

If something is going wrong (which it very well can), the logs are the best as that'll give us the data to analyze and look at.

It's like you were reading my mind, exactly what I have on my to-do list. :smiley:

One thing you could try is using --buffer-size=0. @Ole recommends this and it may work for you. I think there is probably a bug in the VFS if this is the solution, but I haven't figured out where yet.

Worth a try, I'll push that in and see what I get. Have not had a chance to get those clean logs yet, life being life and all just refuses to give me time to make a cup of coffee, let alone debug a server. :stuck_out_tongue:

@ncw Is there a post for this recommendation you could link?

UPDATE: I've also noticed that if I use dd to test the read speed, the read from the mount vs the vfs is slower. Is this to be expected? I've outlined what I'm seeing below the three commands read a chunk of a file, one that has not been cached. The first one is longer due to it being not in the cache, the second is a read from the mount point after it being cached, the third is a read directly from the cache directory.

FILE="file.mp4"

# time dd skip=300000 count=10000 if="/mnt/cloud/$FILE" of=/dev/null bs=4k
10000+0 records in
10000+0 records out
40960000 bytes (41 MB, 39 MiB) copied, 1.81329 s, 22.6 MB/s

real    0m1.816s
user    0m0.009s
sys     0m0.026s

# time dd skip=300000 count=10000 if="/mnt/cloud/$FILE" of=/dev/null bs=4k
10000+0 records in
10000+0 records out
40960000 bytes (41 MB, 39 MiB) copied, 0.0835121 s, 490 MB/s

real    0m0.091s
user    0m0.004s
sys     0m0.025s

# time dd skip=300000 count=10000 if="/tmp/rclone-cache/vfs/cloud/$FILE" of=/dev/null bs=4k
10000+0 records in
10000+0 records out
40960000 bytes (41 MB, 39 MiB) copied, 0.0138731 s, 3.0 GB/s

real    0m0.015s
user    0m0.011s
sys     0m0.004s

He he!

Check out

FUSE file systems are slower than normal file systems as the data has to make more trips in and out of the kernel.

The third test is likely coming straight out of the kernel cache as rclone will have just written it.

Try (as root)

sync; echo 3 > /proc/sys/vm/drop_caches 

Before doing the 3rd test and that will drop the file out of the cache.

That said there is probably optimization we could be doing in rclone to make reading cached files faster!

Ok, I was able to run a playback of a file, one that had never been played before and here is what I see. Note, this is not technically a clean log as I'm having a hard time getting that at the moment, but I did filter it down and at this point, once you take a look, I think you will agree there is some oddness going on here. I thought that this might have something to do with seeking, however, I think this pretty much puts a nail in that.

The above is a single play click on plex for a video file that I had not played before. As you can see it does some initial probing the proceeds to read the ENTIRE file once through and then plays the file at the end. At this point I'm thinking it is an issue with the Android version of the Plex app as I have Vizio and Roku devices that do not display this issue in their native players, yet the android app on my firetv, kindle devices, and phone all have the issue.

Any concurrence on this?

Can you please share the full log? The snippet at the top is really the key part.

What part? This may seem silly, but I use logrotate to rotate the logs and as such the files don't really have a start per se. This mount has the vfs disabled in the config as I wanted to rule out vfs as the issue, hence it is missing from the first line.

2023/03/09 23:52:49 DEBUG : rclone: Version "v1.61.1" starting with parameters ["/usr/local/bin/rclone-mount" "mount" "googlecloud-decrypt-remote:" "/mnt/cloud" "--allow-other" "--devname" "google_drive" "--dir-cache-time" "9999h" "--log-level" "DEBUG" "--log-file" "/var/log/rclone/rclone.log" "--poll-interval" "10s" "--umask" "007" "--rc" "--rc-addr" "127.0.0.1:5572" "--rc-no-auth" "--user-agent" "main-storage" "--uid=0" "--gid=998"]
2023/03/09 23:52:49 NOTICE: Serving remote control on http://127.0.0.1:5572/
2023/03/09 23:52:49 DEBUG : Creating backend with remote "googlecloud-decrypt-remote:"
2023/03/09 23:52:49 DEBUG : Using config file from "/usr/local/etc/rclone.conf"
2023/03/09 23:52:49 DEBUG : Creating backend with remote "googlecloud:secure"
2023/03/09 23:52:49 DEBUG : Google drive root 'secure': 'root_folder_id = 0AKThML1A4E2XUk9PVA' - save this in the config to speed up startup
2023/03/09 23:52:50 DEBUG : Encrypted drive 'googlecloud-decrypt-remote:': Mounting on "/mnt/cloud"
2023/03/09 23:52:50 DEBUG : : Root: 
2023/03/09 23:52:50 DEBUG : : >Root: node=/, err=<nil>