Hello, I've been using rclone for many years, but this is my first time on the forums. I apologize if this has already been requested.
Problem: Gdrive API limits hamper mass transfers of many small files. The limit is not noticeable on large files, as they take a while to transfer anyway.
I am hoping for a flag to set the maximum file size that the VFS caches long-term, or to set different cache times for files of different sizes. That way the cache can be dedicated solely to small files, creating an overall better experience.
For example, if I can limit the VFS to caching only files below 10MB, my cache will fill up much more slowly without noticeably impacting performance on large files.
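To make the idea concrete, here is a rough sketch in Go (rclone's language) of the kind of size check such a flag would imply. The shouldCache helper, its parameters and the "negative means no limit" convention are made up for illustration; this is not rclone's actual VFS code.

```go
package main

import "fmt"

// shouldCache reports whether a file of the given size would be kept in the
// long-term VFS cache under proposed size limits. Hypothetical sketch only;
// the helper name and the negative-means-no-limit convention are invented.
func shouldCache(size, minSize, maxSize int64) bool {
	if minSize >= 0 && size < minSize {
		return false // below the minimum size, don't keep it
	}
	if maxSize >= 0 && size > maxSize {
		return false // above the maximum size, e.g. the 10MB cap in the example
	}
	return true
}

func main() {
	const maxSize = 10 << 20 // hypothetical 10MB cap, no minimum
	fmt.Println(shouldCache(2<<20, -1, maxSize))   // small photo: true
	fmt.Println(shouldCache(600<<20, -1, maxSize)) // large video: false
}
```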
There is also another thread, which I can't find now, about using a different caching strategy based on parameters other than just least recently used.
I have a nice paper somewhere about a better strategy, but there is an overview of lots of different policies here: Cache replacement policies - Wikipedia. Currently rclone uses LRU.
I think he and I definitely had the same purpose, except he stated it a lot more eloquently than I did.
The proposed --vfs-cache-file-max-size and --vfs-cache-file-min-size flags sound like a great idea to me.
I think different caching algorithms are a lot more complex than this, though (unless there is already a term for this specific purpose). LRU is fine; I am just trying to solve the issue with Gdrive API limits on many small files.
I hope others see the potential of this; unfortunately I am not much of a programmer, or I would solve it myself. If there is some sort of bounty, I would definitely contribute financially towards such a feature.
No apologies necessary! Any topic is fair game on the forum
I found the paper I was talking about: "LRU-SP: A Size-Adjusted and Popularity-Aware LRU Replacement Algorithm for Web Caching"
This paper presents LRU-SP, a size-adjusted and popularity-aware extension to Least Recently Used (LRU) for caching web objects. The standard LRU, focusing on recently used and equal sized objects, is not suitable for the web context because web objects vary dramatically in size and the recently accessed objects may possibly differ from popular ones. LRU-SP is built on two LRU extensions, namely Size-Adjusted LRU and Segmented LRU. As LRU-SP differentiates object size and access frequency, it can achieve higher hit rates and byte hit rates. Furthermore, an efficient implementation scheme is developed and trace-driven simulations are performed to compare LRU-SP against Size-Adjusted LRU, Segmented LRU and LRV caching algorithms.
The basic idea is to give each object in the cache a score
nref / (size * delta_t)
where nref is the number of times the object has been referenced, size is its size in bytes and delta_t is the time since it was last accessed.
Imagine calculating that for all objects in the cache, then throwing away the one with the smallest value (nearest to 0). This means that older and larger files will be discarded in preference to newer and smaller ones, and files with fewer references will be discarded in preference to files with lots of references.
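As a rough sketch of that scoring and eviction choice in Go (not the actual rclone implementation; the cacheItem struct and its fields are made up for illustration):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// cacheItem is a stand-in for a VFS cache entry, just to illustrate LRU-SP.
type cacheItem struct {
	name  string
	size  int64     // size in bytes
	nref  int64     // number of times the item has been accessed
	atime time.Time // time of last access
}

// score returns the LRU-SP value nref / (size * delta_t).
// The item with the smallest score is evicted first.
func (c cacheItem) score(now time.Time) float64 {
	dt := now.Sub(c.atime).Seconds()
	if dt < 1 {
		dt = 1 // avoid a zero or tiny denominator for just-accessed items
	}
	return float64(c.nref) / (float64(c.size) * dt)
}

func main() {
	now := time.Now()
	items := []cacheItem{
		{"photo1.jpg", 2 << 20, 5, now.Add(-2 * time.Hour)},
		{"photo2.jpg", 3 << 20, 2, now.Add(-30 * time.Minute)},
		{"video.mkv", 600 << 20, 1, now.Add(-10 * time.Minute)},
	}
	// Sort ascending by score: the first entry is the eviction candidate.
	sort.Slice(items, func(i, j int) bool {
		return items[i].score(now) < items[j].score(now)
	})
	fmt.Println("evict first:", items[0].name)
}
```

With these made-up numbers the large, rarely accessed video gets the smallest score, so it is the first to go while the small photos stay cached.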
I think this does exactly what you want without having to have knobs to tune on the cache.
This turned out not to be very hard to implement, so I've given it a go. Have a try with this - it implements LRU-SP, so it will prioritise keeping smaller files and more frequently accessed files.
I did some initial testing with a small cache. I accessed some pictures first, and then a video over half the size of the total cache (1G). When I exceeded the cache, it correctly removed the video and kept the photos I had accessed earlier.
I will continue to play around with it, of course, and test it at a much larger scale over the next couple of days, but this is already extremely promising.