Hmm, well the two caches work a bit differently here.
The cache-backend fetches and stores chunks, so it can cache partial files.
The VFScache does not chunk, so it only operates on whole files, at least so far. I wouldn't be surprised if we get chunking for partial-file caching there too at some point, but I'm not aware of it being one of the "near future" priorities. It doesn't make sense as long as the VFScache only caches writes anyway, since you would never want to partially upload anything. So there would need to be read support before this feature makes sense. I will suggest to NCW that we maybe do that right off the bat when the time comes.
This specific question is pretty straightforward though. As long as you do not use VFS mode full (--vfs-cache-mode full), you only download the parts of a file that you request. This is actually most efficient without any cache-backend, because you can fetch exactly what you need, while the cache-backend grabs arbitrarily sized chunks and is probably always going to download more than strictly needed in a partial read.
As an example - if I have a huge RAR archive with quick-open information, I can open it via the remote and list the contents inside very quickly within a few seconds. I don't need to download the whole thing, just the index inside the archive (at the start of the file). If I then pick a single file from the archive to extract I can similarly only download that part of the data.
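To make the partial-read idea concrete, here is a minimal Python sketch of reading only a byte range from a file. The helper name read_range and the temporary file standing in for a path on an rclone mount are my own assumptions for illustration. On a real mount without VFS mode full, a seek followed by a short read like this should only pull the requested part, as described above.

```python
import os
import tempfile


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`, leaving the rest of the file untouched."""
    with open(path, "rb") as f:
        f.seek(offset)          # jump to the part we care about
        return f.read(length)   # read only that part


# Local stand-in for a large file on a mount (hypothetical layout:
# a small index at the start, padding, then one member we want).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"\0" * 1024 + b"PAYLOAD")
    demo_path = tmp.name

header = read_range(demo_path, 0, 6)           # the "index" at the start
payload = read_range(demo_path, 6 + 1024, 7)   # one member further in
os.remove(demo_path)
```

The same two calls against a path on the mount would correspond to listing the archive and then extracting one member.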
What is the specific use-case this is tailored for? Or is it an attempt at general optimization? It matters what you intend to achieve, so please describe this in as much detail as possible so I can suggest how best to optimize.
I can think of two ways to do more aggressive caching right off the bat that would be fairly practical and have some benefits, though they differ slightly in what they achieve. Before I explain them, let me quickly summarize the limitations we are trying to overcome (assuming you use Gdrive):
- Opening files is fairly slow.
- Seeking in files is pretty quick once the file is open (i.e. jumping around to grab different bits of it, sort of like jumping ahead in a media file). A seek does require an API call, however.
- You can only open about 2-3 new files per second (on writes; I think reads are a little more forgiving). This is a backend limitation and not related to the API quota. It is the primary reason why more than 4-5 --transfers on Gdrive is rather pointless.
- I rarely consider the API quota to be much of a problem outside of malfunctioning or badly behaving programs, because you are almost certainly going to run into the above limitation long before you max out the API quota of 1,000 requests per 100 seconds.
Thus the strategy should be not primarily to limit API calls, but instead to limit the amount of file-open operations we need to do. Small files are your enemy!!
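A rough back-of-the-envelope sketch of why file opens dominate, in Python. The function names are made up for illustration, 2.5 opens per second is just the midpoint of the 2-3 figure above, and the model is deliberately simplified (it ignores latency and any overlap between opening and transferring). The file count, total size and connection speed are the ones from my own setup mentioned elsewhere in this post.

```python
def open_limited_seconds(file_count, opens_per_second=2.5):
    """Time spent if the only limit were how fast new files can be opened."""
    return file_count / opens_per_second


def bandwidth_limited_seconds(total_bytes, megabits_per_second=160):
    """Time spent if the only limit were raw connection speed."""
    return total_bytes * 8 / (megabits_per_second * 1_000_000)


# API quota expressed as a rate: 1,000 requests per 100 seconds.
quota_requests_per_second = 1000 / 100

# About 70,000 small files totalling about 150 GB on a 160 Mbit line.
open_seconds = open_limited_seconds(70_000)
transfer_seconds = bandwidth_limited_seconds(150 * 10**9)

print(f"open-limited: {open_seconds:.0f} s, bandwidth-limited: {transfer_seconds:.0f} s")
```

Under these assumptions the open limit costs several times more than the transfer itself, and the quota of 10 requests per second sits well above the 2-3 opens per second you can actually achieve.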
Using the cache-backend:
You can use the cache/fetch function in the RC to pre-fetch the first X chunks of all files, see the documentation for that here:
This would of course make sure that all the tiny files (those that fit inside a single chunk, for example) are already in the cache and do not need to be opened from the cloud at all, assuming they haven't changed. You also get quick access to the data in the first part of larger files. This often includes various metadata, so it could be very beneficial for searches and scans that do not limit themselves to basic attributes (size, name, modtime, createdtime).
To automate this you can just have a script that runs daily to refresh the cache and re-download the pieces that had changes in them since yesterday.
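A minimal sketch of how such a script could assemble the cache/fetch call, assuming a running rclone instance with remote control enabled and a cache-backend remote. The helper name and the example paths are hypothetical, and the file, file2, ... key naming reflects my understanding of the rc documentation, so check it against your rclone version before relying on it.

```python
import subprocess


def build_cache_fetch_cmd(files, chunks="0"):
    """Build an `rclone rc cache/fetch` command line for the given files.

    `chunks` is passed through as-is (for example "0" for the first chunk).
    Each file gets its own key: file, file2, file3, ...
    """
    cmd = ["rclone", "rc", "cache/fetch", f"chunks={chunks}"]
    for i, name in enumerate(files, start=1):
        key = "file" if i == 1 else f"file{i}"
        cmd.append(f"{key}={name}")
    return cmd


def run(cmd):
    """Run the command; raises CalledProcessError if rclone reports failure."""
    subprocess.run(cmd, check=True)


# Hypothetical paths, relative to the cache remote.
example_cmd = build_cache_fetch_cmd(["hello", "home/goodbye"])
print(" ".join(example_cmd))
```

Building the list separately from running it makes it easy to print and inspect the command before letting a scheduled job call run() on it.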
Using the VFScache:
This is what I use myself currently. I do a simple rclone sync Gdrive: C:\RcloneVFSCache --max-size 10M --fast-list (Windows example). Just because the VFScache is a write-cache doesn't mean we can't add files to it as we see fit. It just doesn't happen automatically. This ensures that all the small files (that kill my performance) are pre-cached to the VFScache (if they changed since last time). For my use this takes up about 150GB of storage, but covers more than 70,000 out of 90,000 files. I probably have a lot more small files than most...
Basically this ensures that all files I have to fetch from the cloud because they are not in the cache are large enough to get good speed on, and they usually max out my 160Mbit connection. Since I have a good connection, downloading these larger files is fast enough that I don't really care. In my experience this + pre-listing makes a world of difference and eliminates almost all of the slowdown that comes from Gdrive's poor performance on small files. It effectively works more like I was using a premium cloud without limits (like Wasabi or Backblaze).
Depending on your needs, how much local storage you can afford to use for cache, and your ratio of large to small files, your ideal size may of course differ from mine. Some manual tuning may be required to find a number that caches many files without ending up with an impractically large cache. Using rclone size Gdrive: --max-size XXM --fast-list allows you to check this pretty easily.
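The tuning can be sketched in Python along these lines. It assumes rclone size is run with --json so the count and byte totals can be parsed. The helper names are hypothetical, and all sample numbers except the 10M row (which is roughly my own result) are placeholders, not measurements.

```python
import json


def build_size_cmd(remote, max_size):
    """Command that reports how many files (and bytes) fall under `max_size`."""
    return ["rclone", "size", remote, "--max-size", max_size, "--fast-list", "--json"]


def parse_size_json(text):
    """Extract (count, bytes) from the JSON printed by `rclone size --json`."""
    data = json.loads(text)
    return data["count"], data["bytes"]


def largest_threshold_within(results, budget_bytes):
    """Pick the last threshold whose total size fits the budget.

    `results` is a list of (max_size, count, total_bytes), ordered from the
    smallest threshold to the largest. Returns None if nothing fits.
    """
    best = None
    for max_size, _count, total_bytes in results:
        if total_bytes <= budget_bytes:
            best = max_size
    return best


# Sample output in the shape this sketch expects from rclone.
count, total = parse_size_json('{"count": 70000, "bytes": 150000000000}')

# Placeholder results for three thresholds; only the 10M row is from my setup.
results = [
    ("1M", 40_000, 10 * 10**9),
    ("10M", count, total),
    ("50M", 80_000, 600 * 10**9),
]
choice = largest_threshold_within(results, budget_bytes=200 * 10**9)
print(choice)
```

In practice you would run build_size_cmd() once per candidate threshold, feed each output through parse_size_json(), and then pick the threshold that fits the disk space you are willing to spend.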
Like with the above example, you can run this operation daily to keep the cache fresh and ready.
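A small sketch of what the daily job could look like, using the same remote, folder and size limit as my example above. The helper names are hypothetical. Since sync makes the destination match the source, the sketch includes an optional --dry-run so you can see what would be copied or deleted before running it for real.

```python
import subprocess


def build_precache_cmd(remote="Gdrive:", cache_dir=r"C:\RcloneVFSCache",
                       max_size="10M", dry_run=False):
    """Build the sync command that pre-caches all files up to `max_size`."""
    cmd = ["rclone", "sync", remote, cache_dir, "--max-size", max_size, "--fast-list"]
    if dry_run:
        cmd.append("--dry-run")  # only report what would change
    return cmd


def run(cmd):
    """Run the command; raises CalledProcessError if rclone reports failure."""
    subprocess.run(cmd, check=True)


preview_cmd = build_precache_cmd(dry_run=True)
real_cmd = build_precache_cmd()
print(" ".join(preview_cmd))
```

Calling run(real_cmd) from a scheduled task (Task Scheduler on Windows, cron elsewhere) once a day would keep the cache fresh.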
Sorry for the answer being so long, but I think this includes a lot of good info relevant to your question.
I recommend you answer the "your intended use-case" question above if you want further advice. I have deliberately not covered implementation details here to keep this from getting even longer. Ask if needed.