Hello, I've been using rclone for many years, but this is my first time on the forums. I apologize if this has already been requested.
Problem: Gdrive API limits hamper mass transfers of many small files. The limit is not noticeable on large files, as they take a while to transfer anyway.
I am hoping for a flag to set the maximum file size that the VFS caches long-term, or to set different cache times for files of different sizes. That way the cache can be dedicated solely to small files, creating an overall better experience.
For example, if I can limit the VFS to caching only files below 10MB, my cache will fill up much more slowly without noticeably impacting performance on large files.
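To make the idea concrete, here is a rough sketch in Go (rclone's language) of the kind of size check such a flag would imply. The shouldCache helper, its parameters and the "negative means no limit" convention are made up for illustration; this is not rclone's actual VFS code.

```go
package main

import "fmt"

// shouldCache reports whether a file of the given size would be kept in the
// long-term VFS cache under proposed size limits. Hypothetical sketch only;
// the helper name and the negative-means-no-limit convention are invented.
func shouldCache(size, minSize, maxSize int64) bool {
	if minSize >= 0 && size < minSize {
		return false // below the minimum size, don't keep it
	}
	if maxSize >= 0 && size > maxSize {
		return false // above the maximum size, e.g. the 10MB cap in the example
	}
	return true
}

func main() {
	const maxSize = 10 << 20 // hypothetical 10MB cap, no minimum
	fmt.Println(shouldCache(2<<20, -1, maxSize))   // small photo: true
	fmt.Println(shouldCache(600<<20, -1, maxSize)) // large video: false
}
```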
There is also another thread, which I can't find now, about using a different caching strategy based on parameters other than just least recently used.
I have a nice paper somewhere about a better strategy, but there is an overview of lots of different policies here: Cache replacement policies - Wikipedia. Currently rclone uses LRU.
I think he and I definitely had the same purpose, except he stated it a lot more eloquently than I did.
The proposed --vfs-cache-file-max-size and --vfs-cache-file-min-size flags sound like a great idea to me.
I think different caching algorithms are a lot more complex than this, though (unless there is already a term for this specific purpose). LRU is fine; I am just trying to solve the issue with Gdrive API limits on many small files.
I hope others see the potential of this; unfortunately I am not much of a programmer, or I would solve it myself. If there is some sort of bounty, I would definitely contribute financially towards such a feature.
No apologies necessary! Any topic is fair game on the forum
I found the paper I was talking about: "LRU-SP: A Size-Adjusted and Popularity-Aware LRU Replacement Algorithm for Web Caching"
This paper presents LRU-SP, a size-adjusted and popularity-aware extension to Least Recently Used (LRU) for caching web objects. The standard LRU, focusing on recently used and equal sized objects, is not suitable for the web context because web objects vary dramatically in size and the recently accessed objects may possibly differ from popular ones. LRU-SP is built on two LRU extensions, namely Size-Adjusted LRU and Segmented LRU. As LRU-SP differentiates object size and access frequency, it can achieve higher hit rates and byte hit rates. Furthermore, an efficient implementation scheme is developed and trace-driven simulations are performed to compare LRU-SP against Size-Adjusted LRU, Segmented LRU and LRV caching algorithms.
The basic idea is to give each object in the cache a score
nref / (size * delta_t)
where nref is the number of times the object has been referenced, size is its size in bytes and delta_t is the time since it was last accessed.
Imagine calculating that for all objects in the cache, then throwing away the one with the smallest value (nearest to 0). This means that older and larger files will be discarded in preference to newer and smaller ones, and files with fewer references will be discarded in preference to files with lots of references.
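As a rough sketch of that scoring and eviction choice in Go (not the actual rclone implementation; the cacheItem struct and its fields are made up for illustration):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// cacheItem is a stand-in for a VFS cache entry, just to illustrate LRU-SP.
type cacheItem struct {
	name  string
	size  int64     // size in bytes
	nref  int64     // number of times the item has been accessed
	atime time.Time // time of last access
}

// score returns the LRU-SP value nref / (size * delta_t).
// The item with the smallest score is evicted first.
func (c cacheItem) score(now time.Time) float64 {
	dt := now.Sub(c.atime).Seconds()
	if dt < 1 {
		dt = 1 // avoid a zero or tiny denominator for just-accessed items
	}
	return float64(c.nref) / (float64(c.size) * dt)
}

func main() {
	now := time.Now()
	items := []cacheItem{
		{"photo1.jpg", 2 << 20, 5, now.Add(-2 * time.Hour)},
		{"photo2.jpg", 3 << 20, 2, now.Add(-30 * time.Minute)},
		{"video.mkv", 600 << 20, 1, now.Add(-10 * time.Minute)},
	}
	// Sort ascending by score: the first entry is the eviction candidate.
	sort.Slice(items, func(i, j int) bool {
		return items[i].score(now) < items[j].score(now)
	})
	fmt.Println("evict first:", items[0].name)
}
```

With these made-up numbers the large, rarely accessed video gets the smallest score, so it is the first to go while the small photos stay cached.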
I think this does exactly what you want without having to have knobs to tune on the cache.
This turned out not to be very hard to implement, so I've given it a go. Have a try with this - it implements LRU-SP, so it will prioritise keeping smaller files and more frequently accessed files.
I did some initial testing with a small cache. I accessed some pictures first, and then a video over half the size of the total cache (1G). When I exceeded the cache, it correctly removed the video and kept the photos I had accessed earlier.
I will continue to play around with it, of course, and test it at a much larger scale over the next couple of days, but this is already extremely promising.