How to cache only metadata on Cache remote?

Reading through this thread, I found that the rclone cache backend can cache only metadata.

[zcache]
type = cache
remote = sdrive200:
chunk_size = 0
info_age = 1h0m0s

Is this correct?

And what kind of database does rclone use for the cache backend? At first I thought it was SQLite, but it looks like it isn't. Can I read the .db file outside of rclone?

Yes, I believe so - if you turn off the chunk total size. If you do this, you should also turn off the dir-cache, otherwise you're double caching.

A bolt database

https://pkg.go.dev/go.etcd.io/bbolt/?tab=doc

Yes, it's just a file. I'm not sure how many tools are out there to read it, but you can write some Go code. You could try this - not sure.
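Something like this minimal sketch should do it, using the same bbolt package rclone uses. The path is just an example (by default the cache backend keeps its DB under ~/.cache/rclone/cache-backend/, named after the remote - adjust to your setup), and it only lists buckets and key sizes rather than decoding rclone's internal entries:

package main

import (
	"fmt"
	"log"
	"os"
	"time"

	bolt "go.etcd.io/bbolt"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s /path/to/cache-backend/zcache.db", os.Args[0])
	}
	// Open read-only so we don't interfere with a running rclone; the
	// Timeout stops us blocking forever if rclone still holds the file lock.
	db, err := bolt.Open(os.Args[1], 0600, &bolt.Options{ReadOnly: true, Timeout: time.Second})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Walk every top-level bucket and print its keys and value sizes.
	err = db.View(func(tx *bolt.Tx) error {
		return tx.ForEach(func(name []byte, b *bolt.Bucket) error {
			fmt.Printf("bucket %q\n", name)
			return b.ForEach(func(k, v []byte) error {
				if v == nil {
					fmt.Printf("  %q (nested bucket)\n", k)
				} else {
					fmt.Printf("  %q (%d bytes)\n", k, len(v))
				}
				return nil
			})
		})
	})
	if err != nil {
		log.Fatal(err)
	}
}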


@Harry Let me know what your results are if you try this. I had completely forgotten about this and never got to test it out.

IMO I am not super impressed with the cache backend as a whole (inefficiencies and several bugs that are unlikely to get fixed now), but if it is possible to just use the metadata part of it (which seems to be stable and working well) - that might be pretty beneficial.

The VFS already mostly stores the same metadata. Are you just looking for it to persist across reboots? I'm not sure I understand the "why" of wanting to use it for metadata.

The VFS's caching with a cache-warmup script is what I use currently.

The reason why you'd want the metadata cached is obviously the massive performance benefit in searching and the generally snappy response when browsing, as fetching the data on demand has a lot of latency and searches are otherwise just painful... especially via a mount, which cannot leverage --fast-list.
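For what it's worth, the warm-up itself can be scripted against the rc API - a rough sketch, assuming the mount is running with --rc on the default localhost:5572 (and that your rc auth settings allow the call); it's the same as running rclone rc vfs/refresh recursive=true:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// vfs/refresh walks the remote and pre-fills the in-memory directory
	// cache, so later listings and searches on the mount respond instantly.
	resp, err := http.Post("http://localhost:5572/vfs/refresh?recursive=true", "application/json", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %s\n", resp.Status, body)
}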

This solution works fine, but it does have 2 non-trivial downsides:

  • Does not persist between sessions, so you'll have to warm the cache each time. This may take anywhere from under a minute to several minutes, depending on how complex the folder structure is.
  • The VFS currently stores all this info in RAM. For me this is not much of an issue, as my listing fits in about 150MB, but I have seen some huge collections that actually need a GB or more to keep all that data. At that point it's no longer trivial... and a database on an SSD is probably more than fast enough anyway in terms of performance. I expect the software can't leverage the full speed of the RAM by a long shot.

Nick has indicated that he has several large changes coming down the road for the VFS - which (according to my understanding) will include persistence for some things like the "upload-list" etc. Using the same framework, it's probably quite possible to dump the VFS cache data and re-load it upon restart. We've at least aired that idea before. That would at least solve one problem (having to re-cache each session).

This is my 3-point plan.

Goals, in order:

  1. Enable async writeback in a secure way
  2. Enable --vfs-cache-mode full not to download the entire file immediately
  3. Persistent metadata caching

I'm working on 1&2 at the moment which are related. 3 needs a bit more work but it isn't nearly as tricky!

Oh wow, so we are really getting a proper read-cache in the VFS? Fantastic!
Been waiting a long time for that :smiley:

A request in that regard while you are working on it:
See if you can have a method that lets you load up the cache with files while live.
For example:
"each day, sync all remote files less than 2MB to the cache"

Basically if we would use the normal rclone filtering and flags to choose the input that would be excellent.

I know you can basically do that manually now with the existing VFS cache, but when trying to do it live the VFS seems to lose track of what is in the cache, and it's not practical to shut down the system frequently to do these things.

If we get smart chunking for it sometime later down the road, then similarly it would be extremely useful to be able to specify "sync the first chunk of all files to cache". OK, OK... I will try to calm myself now :stuck_out_tongue:


chunk_size = 0

seems to be working perfectly fine - it only caches metadata.


I did have one issue with the DB you should keep an eye open for...

I noticed, the last time I did an extensive test of the cache, that the DB kept getting written to constantly, and at a rate that, compared to its size, would indicate it was basically doing a total rewrite multiple times a second (even when very little was actually happening).

Granted, this was not with a metadata-only setup, and it could have been due to other things (and it was a long time ago now), but it's worth checking that in resource manager. I'm not fond of constant write access, simply for wear reasons.


Mine crashes when using chunk_size = 0

rclone version
rclone v1.50.2
- os/arch: linux/amd64
- go version: go1.13.4



2020/03/11 11:30:45 http: panic serving 127.0.0.1:39124: runtime error: integer divide by zero
goroutine 191 [running]:
net/http.(*conn).serve.func1(0xc001ec6fa0)
        /opt/hostedtoolcache/go/1.13.4/x64/src/net/http/server.go:1767 +0x139
panic(0x143d700, 0x23ca7b0)
        /opt/hostedtoolcache/go/1.13.4/x64/src/runtime/panic.go:679 +0x1b2
github.com/rclone/rclone/backend/cache.(*Handle).Seek(0xc000a34000, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/runner/work/rclone/src/github.com/rclone/rclone/backend/cache/handle.go:328 +0x62a
github.com/rclone/rclone/backend/cache.(*Object).Open(0xc001ec6960, 0x1979760, 0xc0000c0028, 0xc00033ac00, 0x1, 0x1, 0x0, 0x1, 0x203000, 0x203000)
        /home/runner/work/rclone/src/github.com/rclone/rclone/backend/cache/object.go:232 +0x170
github.com/rclone/rclone/backend/crypt.(*Object).Open.func1(0x1979760, 0xc0000c0028, 0x0, 0x3203220, 0xc001e10e98, 0xa32543, 0x18, 0x28)
        /home/runner/work/rclone/src/github.com/rclone/rclone/backend/crypt/crypt.go:775 +0x115
github.com/rclone/rclone/backend/crypt.(*cipher).newDecrypterSeek(0xc000172b00, 0x1979760, 0xc0000c0028, 0xc000a15fb0, 0x0, 0x3200000, 0x14d0ae0, 0x40c301, 0xc000a15fb0)
        /home/runner/work/rclone/src/github.com/rclone/rclone/backend/crypt/cipher.go:751 +0xf9
github.com/rclone/rclone/backend/crypt.(*cipher).DecryptDataSeek(0xc000172b00, 0x1979760, 0xc0000c0028, 0xc000a15fb0, 0x0, 0x3200000, 0x4ebb24, 0xc00092ad00, 0xc0002e5500, 0x7fd5d3e876d0)
        /home/runner/work/rclone/src/github.com/rclone/rclone/backend/crypt/cipher.go:1029 +0x64
github.com/rclone/rclone/backend/crypt.(*Object).Open(0xc001d9dc60, 0x1979760, 0xc0000c0028, 0xc00033abe0, 0x1, 0x1, 0x2, 0x2, 0x203000, 0x203000)

I was using the latest version.

The cache itself hasn't been updated in forever though... so I'm not sure if the general version of rclone would matter much (although it's not impossible for sure).

Yep, no maintainer for cache... I'm not really worried about it. Figured I'd give it a go. It's definitely caused by the chunk_size parameter in my case. I'm also using it with an https/webdav serve rather than a mount. I don't see why that would matter, but who knows.

Actually, that could be a bug, so maybe @ncw should know about it and say if it's worth reporting.

Can you post your relevant cache entry in the rclone.conf and the mount command you used? I want to cross-reference.

Oops, I wasn't using mount. Just plain cache.

[zcache]
type = cache
remote = sdrive200:
chunk_size = 0
info_age = 1h0m0s

Yes, that is exactly what the backtrace is saying too - it is trying to divide by a 0 chunk size. I guess a 0 chunk size isn't supported!
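Purely as an illustration (this is not rclone's actual code): a chunked reader typically maps a seek offset to a chunk with integer division, and in Go that panics at runtime when the divisor is zero, which is exactly the "integer divide by zero" in the trace above:

package main

import "fmt"

// chunkIndex is an illustrative stand-in for the kind of arithmetic a
// chunked reader does when seeking; with chunkSize == 0 it panics.
func chunkIndex(offset, chunkSize int64) int64 {
	return offset / chunkSize
}

func main() {
	fmt.Println(chunkIndex(0x3200000, 0)) // runtime error: integer divide by zero
}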

Then why is it working for me? (Not using mount.)

The first step should be data being cached (1 & 2 above). Basically, --vfs-cache-mode full will work as people expect.

Files or file metadata? Assuming file data, you could probably do it with something like rclone cat --max-size 2MB /path/to/mount --discard, which would freshen all the metadata and fetch all the files of 2MB or less.

I'm planning on implementing chunking with sparse files and keeping a note of which chunks have been downloaded. You could do that with cat too, using the --head flag.

The crash is in Seek, which the mount uses a lot but which isn't used much outside the mount.