Rclone slow read on RPi

Hi,

I'm getting slow read speed from my clients when playing video files from my rclone mount which is shared via SMB to the LAN.

Apart from videos buffering, directories take a long time to load, especially calculating their directory sizes (is there a way to cache them, maybe?).

I'm using Raspberry Pi 3B+ running Raspbian Buster.

rclone v1.48.0
- os/arch: linux/arm
- go version: go1.12.6
  • Does the problem only appear if you access it via the share? (is access on the drive directly better?)

  • What cloud service do you connect to? There are definitely some good ways of pre-caching the listing of the entire drive and thus making navigating feel just as snappy as a local disk - but what options you have depends a lot on the backend remote and what functions it supports.

  • In terms of the video buffering, how are they being accessed? Are you just opening them directly in a media player, or do they go via software like Plex or similar?

  • It always helps a lot if you can share your mount command + the remote config (just remember to redact any passwords or secrets before posting)

RPi runs headless at the moment.

Google Drive - Shared/Team Drive.

Not sure what you mean by backend remote. I'm not using cache.
Also I'm open to ways to cache directory structures.

Both. On the RPi itself, Plex runs and accesses rclone's mounts.

On my Mac, I tend to use QuickTime or IINA... so yes, a media player opens them from Finder.

Sure. Here you go.

Mount command
[Unit]
Description=gdrive-team_media_mount
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount \
        --config=/home/pi/.config/rclone/rclone.conf \
        --allow-other \
        --cache-dir=/disks/1TB/rclone/cache-dir \
        --drive-chunk-size 128M \
        --vfs-cache-mode writes \
        --fast-list \
        media:Movies/ /media/movies
Restart=always
RestartSec=10
ExecStop=/bin/fusermount -u /media/movies
User=pi
Group=pi

[Install]
WantedBy=multi-user.target

Remote config
[media]
type = drive
client_id = xxxxx
client_secret = xxxxx
scope = drive
token = xxxxx
team_drive = xxxxx

Thank you.

Sorry for the late reply. It's a bit hard to diagnose if we can't exclude the share itself being a problem. I really don't have enough experience with the finer details of SMB shares to know if this is even a likely issue. I don't see anything in your config that worries me at all.

As for pre-caching directories, there are a couple of ways:

  • the cache backend can do it (it creates a local database for the job), but unless you also want the other features of the cache backend (and its downsides), this may not be the best solution

  • The VFS cache can be set to something like:
    --attr-timeout 8700h
    --dir-cache-time 8760h
    --poll-interval 10s (not all cloud services support this, but Gdrive does)
    This will remember directories as it goes, but if you do a full listing once when the drive starts, it will have pre-cached everything for as long as it runs. You can trigger a full listing most elegantly and efficiently by using --rc to enable the remote control and then, as part of your mounting script, running a cache warmup script that sends the rc a command to list the whole remote to pre-cache (see the example ExecStart after this list).
    NOTE: This is pretty safe, but not 100% safe. In the rare case where you try to edit a file that was uploaded by another user (not via this mount) to the same drive, AND the 10-second polling interval has not picked up the change yet, then under some circumstances it could result in file corruption. That is pretty unlikely to happen in practice, and polling can be set even faster if you wish, but be aware of it if you are dealing with mission-critical data or something.
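
To make that concrete, here is a rough sketch of what the ExecStart line from the unit posted earlier might look like with those flags added. Treat it as an illustration only: the timeout values simply mirror the ones above, and the extra --rc flag is my own addition, only needed if you also want the rc-based warmup described further down.

ExecStart=/usr/bin/rclone mount \
        --config=/home/pi/.config/rclone/rclone.conf \
        --allow-other \
        --cache-dir=/disks/1TB/rclone/cache-dir \
        --drive-chunk-size 128M \
        --vfs-cache-mode writes \
        --fast-list \
        --attr-timeout 8700h \
        --dir-cache-time 8760h \
        --poll-interval 10s \
        --rc \
        media:Movies/ /media/movies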

Here is the cache warmup script I currently use. Note that it is for Windows, so minor syntax changes are required for Linux (a rough Linux sketch follows after the script). I'm sure Animosity has an example of the one he uses on Linux in his "my recommended settings" thread also. I'm not an experienced shell scripter by any means, so this might not be elegant, but it works fine and reliably:

@echo off
:: If window is not minimized, restart it as minimized
if not DEFINED IS_MINIMIZED set IS_MINIMIZED=1 && start "" /min "%~dpnx0" %* && exit

title VFSCACHEWARMUP
echo Checking drive is mounted and ready...

:: Check that the folder is valid, otherwise wait until it is
:LOOP1
vol X: >nul 2>nul
if errorlevel 1 (
    echo Drive not yet ready. Retrying...
    timeout /t 2 > nul
    goto LOOP1
) else (
    echo Drive ready, continuing...
)

echo Warming up cache...
echo This may take a few minutes to complete
echo You may use the cloud-drive normally while this runs
echo This window will close automatically when done
::alternative slower method that does not use the remote control
::x:
::dir /s > nul
rclone rc vfs/refresh -v --fast-list recursive=true
exit

The key command in all of that is:
rclone rc vfs/refresh -v --fast-list recursive=true
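
Since the script above is for Windows, a rough Linux equivalent might look something like the sketch below. It assumes the mount was started with --rc (the remote control is off by default) and uses the /media/movies path from the unit posted earlier; adjust both to your own setup.

#!/bin/bash
# Rough Linux equivalent of the warmup script above (sketch only).
# Assumes the rclone mount was started with --rc enabled.

MOUNTPOINT="/media/movies"

echo "Checking drive is mounted and ready..."
until mountpoint -q "$MOUNTPOINT"; do
    echo "Drive not yet ready. Retrying..."
    sleep 2
done
echo "Drive ready, continuing..."

echo "Warming up cache..."
echo "This may take a few minutes to complete"
rclone rc vfs/refresh -v --fast-list recursive=true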

There is one caveat to all this (the VFS approach). I think the way the VFS caches directories and file attributes is technically in the kernel and not really a part of the cache or the mount itself. I am therefore not sure if other OSes accessing it via a share can benefit from it (without maybe running their own cache warmup locally?). That is too low-level for me to really answer. I think you just have to try and see.


You really do not want to set that value to anything other than the default, as that can lead to corruption.

It's documented here:

https://rclone.org/commands/rclone_mount/#attribute-caching

I wouldn't change this, as there is really no benefit other than avoiding some extra API calls.

Well, yes and no. There is a (small) risk as I already noted, but according to NCW this should be pretty safe on remotes that support polling (it would not be safe at all on ones that don't). I will quote what he said to me when I asked him specifically about this.

The format of the question was in statements that he could confirm or deny:

The relatively short polling interval is specifically intended to narrow the time-frame in which any files are at risk. I agree that would be wasteful unless you used a longer refresh for file-attributes like I do here. This should use 10 API calls in 100 seconds, i.e. 1% of the default user quota of 1,000 queries per 100 seconds. I find that to be an acceptable amount.
If you wanted to avoid this risk-factor entirely you could just skip caching the file-attributes as you suggest, but that also does make the pre-caching less powerful.

If this info is incorrect/incomplete or I have just misunderstood something I definitely want to know about it as I use this myself.

What you want to set the polling interval to depends very much on your usage patterns.

Possible danger comes where

  • the local mount has info about the file in the VFS directory cache
  • it is changed in the remote drive not via the mount
  • the poll interval hasn't come round yet
  • you use that file in the local mount

So what can happen? If the file is cached (vfs or cache backend) then you'll read that file instead of the new one. If the file isn't cached then the directory cache has the wrong size, so you'll read the old size of the file rather than the new one. This is something rclone is quite likely to notice when transferring from the file.

Writing to the file will be no problem, you'll just upload the new file.

What are the downsides of using cache?
Also, a cache wraps around a remote, right? So if the remote is named "media" and the cache is named "media_cache", the mounting script should use media_cache:/ [MOUNTING_DIR], right?

You can call the cache remote anything you want, so the mounting script doesn't necessarily need to look different (but it makes sense to call it something like GdriveCache or similar so you don't forget what it is). The important bit is where you point the cache remote: it should be pointed at whatever the remote name of your Gdrive is. That's basically how it gets wrapped, so communication goes like:
OS --> mount (if used) --> cache --> Gdrive (and the same back in reverse for incoming data).
Each link in this chain is generally not aware of the others. The OS thinks the mount is just a hard drive, the mount thinks it's talking directly to a cloud remote, etc.
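
As a rough illustration (the [media-cache] name and the values below are just placeholders I picked, not recommendations), wrapping the remote from earlier in this thread could look something like this in rclone.conf:

[media]
type = drive
client_id = xxxxx
client_secret = xxxxx
scope = drive
token = xxxxx
team_drive = xxxxx

[media-cache]
type = cache
remote = media:Movies
chunk_size = 10M
info_age = 1d
chunk_total_size = 10G

The mount would then point at media-cache: instead of media:Movies/ (e.g. rclone mount media-cache: /media/movies ...), and all reads go through the cache layer on their way to and from Gdrive.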

To name a few downsides to cache that spring to mind:

  • You can't read any data before a full chunk has been fetched, unlike with the VFS. Therefore the cache is far more reliant on small chunks to stay responsive, which also impacts your API calls (although this is usually not such a big problem for most use-cases)
  • There is generally more overhead involved due to more steps in fetching data
  • More modules mean more things can go wrong. You have more settings that might cause issues if not set appropriately (outside of defaults) and more potential for bugs the more lines of code are involved.
  • Speaking of bugs, I've noticed some things that don't work well with the cache at all. Specifically, the tmp-upload feature has a lot of wonky behavior, and cache-writes (although perhaps not technically a bug) probably does not do what you think it does (it does not retain any written data once uploaded). I therefore generally recommend avoiding both. The read-caching portion does seem to work well and without issues though.
  • Lastly, the cache doesn't have anyone specifically dedicated to maintaining it. The original author has gone silent and it's unknown if he will return. It is hard for NCW to make big changes in another person's code. All in all this means the cache is unlikely to see a lot of development going forward.

My understanding of the future plans is that the aim is to integrate some of the missing features (like read caching) into the VFS, which NCW has a deeper understanding of, rather than him spending a lot of time deep-diving into unfamiliar code. This solution would also generally be far more effective, as it wouldn't have a lot of the inherent limitations that come from trying to do caching operations in two different modules which can't communicate.

I'm not trying to nay-say the cache backend. Don't misunderstand me. It has its uses, and its benefits may outweigh the downsides for some use-cases. I used it initially, but eventually settled on only using the VFS, at the cost of losing the read-caching (which was the major thing I wanted), because I felt it wasn't quite worth the trouble and also doesn't seem like the way forward to a better system in the long term. Use it if it makes sense for you - but have some idea of what the pros and cons are and don't just add it blindly because it exists and sounds like it would be useful :stuck_out_tongue:

Hope this was informative. Let me know if you would like any clarifications :slight_smile:

