Question about directory cache time

From what I understand, this is about files in directories that never change.
The thing is, the files themselves will never change and I will never need updated versions of them, but new files within those directories are being uploaded and/or deleted every day.
Since the files themselves never change, shouldn't my directory cache time be forever? If so, can I unmount now, change the value, then remount without having to rebuild the entire cache by doing a cache DB purge?

You may want to rephrase your question as I'm not sure what you are really asking for here.

Are you specifically referring to this VFS flag? --dir-cache-time
If so, what this does is determine how long the cache considers its directory listings up to date. Once they expire, it will ask the cloud drive again as needed for fresh info.

Setting this high will save a couple of list requests, but those aren't too slow. The biggest benefit would probably be the mount feeling snappier. The problem is that if you set it very high, the cache will be unable to see any changes that happened outside of its control - for example changes from other users, PCs, or even other rclone instances. If none of these apply to you and it's a purely single-user environment where all interactions with the cloud are performed via the mount, then a higher value should in theory be safe to use.
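As a minimal sketch, that could look like the line below - the remote name, mount point and the 72h value are placeholders for illustration, not a recommendation:

rclone mount remote: /mnt/remote --dir-cache-time 72h   # directory listings stay cached in memory for up to 72h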

I'm not sure how any of this is really relevant to storing long-term files that won't change. Perhaps you have misunderstood what this parameter is for?

The documentation has this to say about the flag:
Using the --dir-cache-time flag, you can set how long a directory should be considered up to date and not refreshed from the backend. Changes made locally in the mount may appear immediately or invalidate the cache. However, changes done on the remote will only be picked up once the cache expires.
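To put that in practical terms (a hypothetical remote named remote: and an arbitrary 1h value, purely for illustration):

rclone mount remote: /mnt/remote --dir-cache-time 1h --daemon
touch /mnt/remote/new-file   # created through the mount: shows up in listings immediately
# a file uploaded from outside the mount (e.g. a web UI) will only show up once the
# 1h directory cache expires - or sooner if the backend supports polling, see further down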

Yes, I misunderstood, but yes - "If none of these apply to you and it's a purely single-user environment where all interactions with the cloud are performed via the mount, then a higher value should in theory be safe to use" - this is the case. The only thing reading these files is Plex, and I just set it to 360h.
Btw, do I have to use --cache-db-purge for that change to take effect? (Something else I may be misunderstanding.)

Hmm, so you also use the cache backend? In that case there are some things you need to be aware of.

In that case the cache backend will be the first point of contact with the cloud, and it will be the main deciding factor for what is considered up to date. Both the cache backend and the VFS cache info about directories and file attributes. It makes no sense (and causes problems) if your expiry timers on the cache are lower than those on your mount.

In a mount --> cache setup I would recommend just leaving all such parameters at the VFS default and letting the cache handle it. The cache can have a large timer if you want that. If the VFS needs to update its info, it just asks the next link in the chain (the cache), which has its own local database - so that request will be practically instant.

In short, make sure that --dir-cache-time (a VFS setting) is lower than --cache-info-age (a cache backend setting). Feel free to leave --cache-info-age high though. That will both make the drive snappy to navigate and save a lot of time on syncs, since you always have most of the file and folder metadata locally.
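To illustrate the relationship, a rough sketch - the 48h number is just an example, and --cache-info-age can equally be written as info_age in the [cache] section of rclone.conf:

--dir-cache-time 5m      # VFS flag on the mount, left at (or near) the default
--cache-info-age 48h     # cache backend flag, kept much higher so the cache answers most requests locally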

PS: purging the cache DB will not affect the VFS in any way. These two are completely separate modules and they don't talk to each other, so don't be confused about which settings apply to which. This is a good reason why it's smart to leave the VFS settings at their defaults, or at least not excessive.

Hope that helps :slight_smile:


This is my setup:

[Unit]
Description=drive
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
Environment=RCLONE_CONFIG=/opt/rclone/rclone.conf

ExecStart=/usr/bin/rclone mount cache: /mnt/drive --allow-other --gid=1000 --uid=1000 --bind xxx.xx.xx.xxx --dir-cache-time 360h --fast-list --cache-chunk-path=/opt/rclone/rclone-cache --cache-db-path=/opt/rclone/rclone-cache --log-file /opt/rclone/logs/rclone.log --umask 002 --log-level INFO --user-agent rclone
ExecStop=/bin/fusermount -uz /mnt/drive
Restart=on-failure

[Install]
WantedBy=multi-user.target

[cache]
type = cache
remote = drive:
plex_username = xxx
plex_password = xxx
chunk_size = 16M
plex_token = xxx
db_path = /opt/rclone/rclone-cache
chunk_path = /opt/rclone/rclone-cache
info_age = 2d
chunk_total_size = 20G
db_purge = true

[drive]
type = drive
client_id = xxx
client_secret = xxx
scope = drive
token = xxx
use_trash = false
chunk_size = 16M

Ok, then I think I understood it correctly the first time. The stuff I posted was under the assumption of a Drive <-- Cache <-- Mount config.

This seems ok except for:
dir-cache-time must at the very least be set LESS than info-age. The other way around makes no sense, and you will inevitably run into problems where the mount shows you outdated information. The exact value of dir-cache-time is not very important, since the cache makes it somewhat redundant.

Also, I note that you run a cache purge on each start. There is nothing wrong with this, but it is kind of counterproductive to the idea of using long expiry timers in the first place. If you only change files via the cache you can keep the DB "forever" and it will persist through restarts, so the whole cache DB won't need to rebuild every time. Note that if you ever need to do an occasional manual reset, you can just delete the entire cache folder after closing rclone and then restart - it will rebuild automatically without any problems. Sometimes that can be nice to do for troubleshooting, changing the cache chunk size, or experimenting with settings.
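As a sketch with the config you posted, that would mean dropping db_purge = true from the [cache] section, and then on the rare occasion you want a manual reset (the service name below is an assumption - use whatever your unit file is actually called):

systemctl stop rclone-drive.service     # assumed service name; stop the mount first
rm -rf /opt/rclone/rclone-cache         # the db_path/chunk_path from your config
systemctl start rclone-drive.service    # the cache DB rebuilds itself automatically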


The way dir-cache-time works depends on the backend being used.

For example, Google Drive supports polling, so you can set dir-cache-time as high and as long as you want. Polling picks up changes every 1 minute regardless of whether you upload from another PC or another device. So in general, for any backend that supports polling it's a good idea to use a very long dir-cache-time, because that keeps the directory and file structure in memory; you make fewer API calls and things are faster since you are going to memory rather than the API.

The cache backend sits on top of the default VFS, so you need to take that into consideration: the cache has its own database/directory cache, which is independent of the dir-cache you set up before. That's the reason you want dir-cache-time to be less than the info age, so things do not become out of sync.

It's best after you make a lot of changes just to purge the cache database to ensure everything is the same.

Assuming you are on Google Drive, you really do not want to leave things at the defaults, since it supports polling; you would get much better performance by setting dir-cache-time and info_age to large values such as a week, or you could even go larger.
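For instance, week-scale values might look like the lines below - the numbers are only illustrative, with info_age kept longer than dir-cache-time as discussed above, and polling (the default --poll-interval is 1m) still picks up remote changes:

--dir-cache-time 168h    # VFS flag on the mount command line
info_age = 336h          # cache backend option in the [cache] section of rclone.conf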

The great strength and weakness of rclone is its massive support for backends, so having the documentation apply to every situation is unfortunately a bit hard; each backend's page has information relating to that backend.

Thanks for that extra info about polling, Animosity. Nice to know.
I think most of what I wrote should still apply, but a deeper understanding is always good.

I don't think it would matter if you kept dir-cache-time at the default as long as you use the cache backend with a long timer. The VFS should just end up asking the cache locally, making it a fairly trivial operation.

It does matter as you'd have to make an API call if it wasn't in the cache.

The cache is on top of VFS not the other way around.

I know that's the order.

But if the VFS asks the cache frequently, then it doesn't matter as long as the cache knows the answer (from using a long timer). That query is so trivial that minimizing it by also setting a long dir-cache-time is kind of pointless. It wouldn't hurt, I guess, but it also wouldn't do much to benefit you...

The VFS never asks the cache as the cache is on top.

Cache -> VFS, which is why the info_age should be longer than dir-cache-time, as the cache goes down a layer and checks the "VFS" in your statement.

It's explained here:

https://rclone.org/cache/#mount-and-dir-cache-time

I agree that info-age (cache) should be longer than dir-cache-time (VFS).
From my understanding that is exactly because the mount passes the request up (or down, if you prefer) the chain leading to the cloud drive. As long as the cache has the answer, it won't really matter how frequently the mount asks, since that request is local and thus pretty trivial.

Maybe we are miscommunicating here based on what is "up" or "down". That is ultimately an arbitrary distinction based on conventions (and I may not be using the "right" convention).

So to illustrate, let me draw it out:
Google drive <-- Cache backend <-- Crypt <-- Mount VFS <-- Local OS

The way I read your statement, I feel like you are somehow running your cache to the "right" (in this diagram) of the mount, in which case that setup is alien to me. I guess it is possible, but I don't see why you would stack it this way or what the benefit would be.

Not trying to "prove you wrong" here, just trying to make sure we are talking about the same thing so that either you or I can learn from this...
