VFS Cache purge - Prioritise file size instead of file age

Hi, for my particular use case of rclone with VFS, I'd prefer it to cache small files for as long as possible, while larger files I'd only need for a minimal amount of time.

As far as I can tell from playing with the settings and reading all the VFS parameters, this option isn't quite possible yet. If it is, can someone let me know; if not, could this be added as a feature, please?

I've tried messing around with Cloudflare (introduces intermittent problems with Chromecast) and Nginx caching (introduces problems where certain videos cache corrupted and refuse to play).

Using this with Emby, I keep all my movie posters and other accessory files in the same folder as the media file (because keeping them local takes up too much space, makes migrating take way too long, and they're easier to manage directly), e.g.:

[gdrive storage]
Movie folder/poster.jpg
Movie folder/Movie.mp4

This of course all gets cached into the VFS when it's needed. The first load takes a second while it pulls from gdrive into the local file system, obviously. Loading the home page, which is nothing but poster files, with an empty cache takes forever.

But then the next time it needs that poster, it's already there in the VFS. Great. Also useful for having that movie file there so rewinding or exiting and reopening on another device is speedy.

However, currently when the local storage fills up, the VFS purges the oldest files first. That's not a concern for big movie files; they're unlikely to be needed again by any user in the near future. But the poster files that get loaded regularly for the home page and browsing I would really like to stay more or less indefinitely (until they inevitably fill up storage themselves).


Long story short: can there be a parameter like:

-vfs-cache-purge-priority [ age | size | access ]

When cache storage fills, delete: [age] oldest files first, [size] biggest files first, [access] least recently used files first.

The VFS should already be purging the least recently used files; I think that is what you meant by oldest.

When the cache fills up, rclone makes a list of all the files, sorts them by last access time, then deletes the least recently used until there is enough storage space.
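A minimal sketch of that purge loop, assuming GNU find/coreutils and using a temp directory as a stand-in for the VFS cache (the quota, file names and sizes are made up for the demo; this is my reading of the behaviour, not rclone's actual code):

```shell
#!/bin/sh
# Sketch of an LRU purge: sort cache files by last access time and
# delete the least recently used until total usage fits under a quota.
set -e
cache=$(mktemp -d)          # stand-in for the VFS cache directory
quota=3000                  # target: keep total bytes under this

# three 2000-byte demo files with different access times
for f in old mid new; do
    head -c 2000 /dev/zero > "$cache/$f"
done
touch -a -t 202001010000 "$cache/old"   # accessed long ago
touch -a -t 202306010000 "$cache/mid"   # accessed more recently
                                        # "new" keeps its current atime

# total bytes currently used by cached files
used() { find "$cache" -type f -printf '%s\n' | awk '{s+=$1} END {print s+0}'; }

# %A@ = last-access epoch, %p = path; oldest access first,
# stop deleting as soon as usage fits
find "$cache" -type f -printf '%A@\t%p\n' | sort -n | cut -f2- |
while read -r path; do
    [ "$(used)" -le "$quota" ] && break
    rm -f "$path"
done

ls "$cache"    # only the most recently accessed file survives
```

With a 3000-byte quota, "old" and "mid" get deleted and "new" survives, which is exactly the behaviour that wipes out old-but-popular poster files.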

I don't think a pure sort-by-size-then-delete-the-biggest-first will do what you want, as it will delete the largest movie file first, which might be the one you just watched or are in the middle of watching.

I think what is needed is a heuristic, something like: round the last usage of every file to the nearest hour (or day), then within that bucket of files start by deleting the largest first.
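That heuristic can be sketched in shell (again assuming GNU find; the bucket width, demo file names, sizes and access times are all made up for illustration):

```shell
#!/bin/sh
# Sketch of the bucketed heuristic: bucket each file by its last-access
# hour, then order purge candidates oldest bucket first and, within a
# bucket, biggest file first.
set -e
cache=$(mktemp -d)

head -c 5000 /dev/zero > "$cache/movie.mp4"     # big
head -c 100  /dev/zero > "$cache/poster.jpg"    # small
head -c 3000 /dev/zero > "$cache/episode.mkv"   # medium, but stale

# movie.mp4 and poster.jpg share an access hour; episode.mkv is older
touch -a -t 202401011030 "$cache/movie.mp4"
touch -a -t 202401011040 "$cache/poster.jpg"
touch -a -t 202401010915 "$cache/episode.mkv"

# purge order: access hour ascending, then size descending
# (%A@ = last-access epoch, truncated to hours for the bucket)
find "$cache" -type f -printf '%A@\t%s\t%p\n' |
awk -F'\t' '{printf "%d\t%s\t%s\n", $1/3600, $2, $3}' |
sort -k1,1n -k2,2nr |
cut -f3
```

The stale episode goes first; within the fresher bucket the big movie goes before the small poster, so posters are purged last, which is the behaviour asked for above.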

There is a purely pragmatic way of doing this: you could write a script to read 1 byte from all the files you want to keep every day...
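A minimal version of that trick, using a temp directory as a stand-in for the mount (file names are placeholders): read a byte from every file worth keeping so the cache counts them as recently used.

```shell
#!/bin/sh
# Keep-warm pass sketch: reading 1 byte of each small file refreshes its
# last-access time, which an access-based purge would key on.
set -e
mount=$(mktemp -d)                  # stand-in for the rclone mount
printf poster > "$mount/a.jpg"
printf movie  > "$mount/b.mp4"

# touch only the small files worth keeping (*.jpg here)
find "$mount" -type f -name '*.jpg' -exec head -c1 {} \;
```

Run from cron once a day, with output redirected to /dev/null, this keeps the posters "fresh" without re-downloading them.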

PS I found a paper relevant to this - I'll read it and see if it gives me any ideas!

Yeah, I was thinking it would have rules similar to the current purging rules about not deleting in-use files. Some sort of hierarchy for safety would be a given.

I thought it was purely the age of the file, but if it's last accessed I might give that a go. Would it still work if the file being read was the one inside the rclone cache, or does it have to read the source file on the mount?

I do foresee a possible problem that only affects me. My library is so big that keeping 100% of the poster files on hand in the cache at all times simply won't fit, let alone leave enough space for video files being played. Increasing the drive size on the VM adds cost, which I'm hesitant to do for obvious reasons.

I'm not sure if this would be more complex to program; it feels a bit more specific to my case than a simple "delete big files first". How about different [--vfs-cache-max-age]s based on a file size threshold? Something like this:

--vfs-cache-max-age >10mb 1h
--vfs-cache-max-age <10mb 30d

Also thanks heaps for considering it.

It would have to read the source file on the mount.

I guess it would be possible to make an rc command to do this, i.e. to stop a file from being purged from the cache.

That makes sense.

That is a nice simple idea.

I've put this paper on my reading list: https://ieeexplore.ieee.org/document/884690 which explores using time of last access, size and popularity as ways of tuning which things get removed.

Least recently used for variable sized objects is definitely wrong!
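One size-aware scoring idea in that spirit (my illustration only; not the paper's exact algorithm and not anything rclone implements) is to add a size-inverse bonus to each file's last-access time and evict the lowest score first, so small files effectively look "fresher" than big ones accessed at the same moment:

```shell
#!/bin/sh
# Sketch: score = last-access epoch + K/size; evict lowest score first.
# K, the file names and sizes are made-up demo values.
set -e
cache=$(mktemp -d)

head -c 5000000 /dev/zero > "$cache/movie.mp4"   # 5 MB
head -c 50000   /dev/zero > "$cache/poster.jpg"  # 50 kB
now=$(date +%s)
touch -a -d "@$now" "$cache/movie.mp4" "$cache/poster.jpg"  # same atime

# K tunes how many "seconds of freshness" a small size is worth
find "$cache" -type f -printf '%A@\t%s\t%p\n' |
awk -F'\t' -v K=500000000 '{printf "%.0f\t%s\n", $1 + K/$2, $3}' |
sort -n | cut -f2
```

With both files accessed at the same time, the movie scores lowest and is evicted first; the poster's small size buys it extra time in the cache.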

Plus 1 for -vfs-cache-purge-priority.
I have some very small config files from /etc/ that should go well before GBs of VM image dumps.
For backup purposes this would be a big plus.

You mean the small files should be kept in the cache longer than the big ones? Or shorter?

Hey, sorry for the confusion. I must admit I should have read the thread more carefully.
I was talking about how the cache gets sent to the actual store, while this thread is talking about how the cache is being emptied/purged.
I confused "emptied" with "offloaded" to the actual backend store.

The scenario:
Let's say I have 30 GB of cache. My backup does its nightly routine and dumps 30 GB of files into it: 100 MB of small files and important config data, and a 29.9 GB 8K movie.
With bandwidth being the limiter, this is going to take quite a while to sync to the remote. If the movie has been written to the cache shortly before the small files, then the small stuff isn't going to show up in the backend storage until the full 29.9 GB movie has been transferred.

So to make it short: I wish I could tell the cache to prioritise sending the smaller files to the backend first, so I could have quicker access to them on the remote storage.

I understand this is Offtopic now. - my bad

At the risk of derailing my own thread.

Plus 1 for also being able to control the upload priority of VFS. My upload bandwidth sucks so having smaller files go up first would be super useful.

Hmm, so what you want is the --order-by flag to work on an rclone mount.

That doesn't sound impossible to me... Can you please make a new issue on GitHub about that?

Aren't folder.jpg files frequently accessed, hence kept fresh in the cache already, while large movies are removed when they age out? That's how I understand Emby and the VFS cache to work. If your cache storage isn't large enough for all your metadata then you probably need to prune content, or add some additional storage.

Not frequently enough. A quick example of how I've been using it today:

Load up the app > all the posters on the home screen download. Takes a little while because they're not in the cache yet.

I'm rewatching SG1 in HD at the moment, so I put on an episode. That fills up 5GB. > The next episode plays, another 5GB. etc. >

By the time I'm nearing the end of the season, the cache has purged the old poster files and is filled with a bunch of episodes I have no plan to rewatch anytime soon. > I click back to the home screen to watch a different show, and all those posters have to be downloaded fresh again.

And that's just me. Another user's home-screen posters get wiped out too while they're away. If any one of my users, including me, watches a bunch of shows or movies, it makes Emby load slowly for everyone else because all their posters are now gone.

If rclone can prioritise purging larger files that haven't been used in the last hour and leave the smaller poster files for as long as possible, I'd get a speedier Emby because those posters would survive the purges and still be in the cache. That's my main goal here.

Yes, I have a lot of media which I refuse to prune. And I'd prefer to avoid adding extra storage, as it gets costly real quick when you're dealing with cloud storage. Besides, it would only help a little; what would help a LOT is if rclone had more choices about what it prefers to purge.

Interesting, thanks for the additional details. In my Emby program folder I have a cache folder which stores local copies of resized/reformatted JPGs from my remote file store. I think the local Emby cache may be preventing the VFS cache from being hit and updated accordingly. Maybe you can configure Emby not to use its local cache and to hit the remote backend instead, keeping the images cached?

My only concern is that this is still an edge case introduced by your refusal to prune your dataset and your preference not to add additional storage for your specific data needs, which burdens rclone's developers, who could spend time working on features that would benefit a larger proportion of the community. These kinds of edge cases contribute to code bloat, adding complexity and requiring additional testing, maintenance, etc. I'm not saying they should not add it, or that it may not have benefits to others, but the ask isn't 'free' for everyone else.

The VFS cache is definitely being hit. I've played around with this a lot to try and reduce loading times. I had the most luck with an extra layer of caching using Nginx, since I can control it a bit more, but unfortunately there's a weird bug where about a third of my video files just completely break it. I might have to take ncw's advice in the meantime and write a script to find the jpgs in the cache folder and then read the corresponding files on the remote every now and then. And maybe skip running it once a week to clear out bloat.

I feel like refusing to delete most of my library, or to add several hundred gigs' worth of local storage (many $ per month extra), isn't too unreasonable.

I don't expect this to be coded any time soon. But I feel like giving the user a little more control over how the cache acts could benefit others too. Maybe that [--order-by] parameter could also be a way.

Edit: I wrote that script to read all the jpgs every 30 minutes to keep them fresh, and it seems to be working well for now. Hopefully it will self-regulate by clearing out jpgs at random when the cache starts to fill up and someone watches a movie. Here it is below for anyone else to use or modify.

Step 1 > Change max age of VFS back to 1 hour.
Step 2 > Make the below script run every 30 minutes via a crontab - 1,31 * * * *
Script > Replace USER (or whole path) and PATH with your setup.

#!/bin/sh
# Rclone VFS cache folder:
folder=/home/USER/.cache/rclone/vfs

# Rclone mount path:
mount=/PATH

# Read every cached jpg back through the mount so its access time stays fresh
cd "$folder" || exit 1
find . -type f -name '*.jpg' | cut -c 3- | while read -r filename
do
tail "${mount}/${filename}" > /dev/null 2>&1
done