Testing for new vfs cache mode features

This looks great, awesome work as always. I'm trying to use rclone for nginx cache chunk storage with drive, so this is perfect: I'm already using full mode, since nginx does operations on the files (creating temp files and then renaming them) that simply work better on disk.

I've got a couple of questions about this:

  1. Will it wait for max-age to remove the files from disk? Based on my preliminary testing it looks like it does, as it has uploaded chunks but the files remain on disk.

  2. If max-age and max-size are both set, which takes precedence? I believe with the old method max-age was checked every cache-poll-interval; is that still the case? Will it check for stale chunks every minute by default and evict files older than max-age from the local cache regardless of cache size?

Edit: never mind q3; I found out I was running rclone as root, so the chunks are in root's home dir. I've removed the question as it's no longer relevant.

Thanks!

I noticed that if a file is written to Google Drive outside of the VFS mount, I never receive a notice via inotifywait that a new file was created/moved_in/etc...

I do however get these notices when local operations happen on the mount.

I'm not sure about the polling features.

Nick, could you shed some light on this?

Ideally, when a new file is created or updated, the appropriate notices would be emitted on the local filesystem.

I'm testing the VFS branch for my personal Google Drive at this point so happy to help get to the bottom of this.

If you have a local file and upload it to your Google Drive, rclone's Google Drive polling would pick it up based on your polling interval. Is that not happening?

Do you mean you have a file that was uploaded via the cache and then replaced on your Google Drive?

I can't figure out how a local file system and inotify come into play.

I think we are saying the same thing.

If I monitor an rclone mount via inotify and make changes via local access (cp, rm, mkdir, etc) the appropriate notifications seem to be emitted. If however a file is uploaded to the backend by means other than the local mount, no events are emitted. Without "polling" I would not necessarily expect them to be emitted, but if "polling" is happening, it should probably go ahead and emit the events.
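For reference, this is roughly how I'm watching the mount (the mount point and event list here are illustrative, not my exact setup):

```shell
# Watch an rclone mount recursively and print filesystem events.
# Local operations (cp, rm, mkdir) on the mount emit events; changes
# made directly on the backend do not, even when polling notices them.
inotifywait -m -r \
  -e create -e moved_to -e moved_from -e delete -e modify \
  --format '%T %w%f %e' --timefmt '%F %T' \
  /mnt/remote
```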

So the real questions come down to:

  1. What is "polling" actually doing in the case of a VFS mount (since it is mostly an on-demand mechanism at this point)?

  2. What could "polling" be doing in the case of a VFS mount?

  3. What does "polling" look like for something other than a mount or is it only for a mount?

  4. Who implemented "polling", and would they be willing to add some functionality?

  5. Could the idea of "polling" be implemented via something like a "sync --continuous" feature that continually syncs a source remote to a destination (basically what I'm doing between two filesystems using inotify)?

  6. Can "polling" actually use a smarter wait mechanism per remote to get delta updates rather than rescanning an entire remote (if it isn't already doing this)?
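The "sync --continuous" idea in point 5 is basically what I'm doing between two local filesystems today; a crude version is just a shell loop (paths are hypothetical, and there's no debouncing, so bursts of changes trigger back-to-back syncs):

```shell
# Block until something changes under /data/src, then sync it up.
# inotifywait exits after the first matching event, so the loop
# re-arms the watch on every iteration.
while inotifywait -r -e create -e modify -e move -e delete /data/src; do
    rclone sync /data/src remote:backup
done
```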

Polling is only for picking up changes on remotes. There is no 'polling' that happens locally on an rclone mount; it only picks up changes made on the remote.

I think you are moving out of topic for the cache mode and it's best to make a new topic and we can discuss that new feature you are talking about.

Maybe a configurable --vfs-retry-count? Does it use any sort of backoff timing?

Polling is only for picking up changes on remotes. There is no 'polling' that happens locally on an rclone mount; it only picks up changes made on the remote.

I'm not sure what words I typed to make you think I meant anything other than that.

I think you are moving out of topic for the cache mode and it's best to make a new topic and we can discuss that new feature you are talking about.

I'm certainly not, but granted, I'm trying to expand the dialogue slightly. Having spent the last 15 years architecting and building complex data-related software, my experience says these types of questions could dramatically help whoever is planning and implementing the functionality. But hey, I could be wrong; please confirm this isn't the type of dialogue you want.

More toward your moderation style:

  1. On my VFS beta mount I see nothing related to polling happening. What should I be expecting here? (New files are not presented, metadata isn't being brought locally, etc.)

  2. I have a pretty broad range of mounts I deal with. Right now I rely on the legacy cache backend heavily. Could "polling" be used to help with the metadata challenges we experience? (The answer is of course yes, but I pose the question to help spur the dialogue.)

2 - 5) Per your moderation style I will not pursue.

I suggested you create a new post as it's a good dialogue and I wanted to make sure it got attention and didn't get lost in this feature thread as it had merit.

In my mind, I would think that would come with the next thing @ncw was talking about which was a 'database' of metadata locally, which is what the current cache backend does now. I don't believe that was in the scope for this part though.

I've also just hit the "Failed to download: vfs reader: failed to write to cache file: open file failed: googleapi: Error 403: The download quota for this file has been exceeded., downloadQuotaExceeded" error, which got stuck trying to read that file for hours overnight, though the rest of the mount kept working. I had to restart the mount to clear it. Trying to read the file again right now produces the error again. I can kill the process that was trying to read the file and the error continues, so there doesn't seem to be any retry limit right now, or any attempt to stop reading the file when it is closed.

On the topic of files missing while uploading...

I haven't been able to replicate this with a small, simple test case, but I have observed that it doesn't actually seem to matter whether the cache is full or not. So I'm going to attempt to find a smaller test case with the python2 move method and see if that's what's actually triggering this.

Is that a normal error for you? Do you get it often? The regular mount or this one would have issues if that popped up, as I don't think there is a good way to handle it; it might be one file or the whole thing.

Nope. Checking logs from my old cache backend mount that didn't use VFS, I don't have any 403 errors in the last month of use. So I can't speak to whether this is an issue with the previous VFS code, or how the cache backend would have handled a 403 error; all I can say is that I haven't experienced this error at all before under the cache backend.

Hmm, that's really strange. Can you check your API console and look for drive.gets and see how that looks before / after?

Should look something like this but the scale is probably different for you:

I'm curious to see if that ramped up as you really should not see those errors.

I switched to the VFS beta on the 18th, so no major differences in traffic. But you can clearly see where the get requests kept happening until I restarted the mount.

Yeah, that's really strange :frowning: I am not sure offhand what would cause that download quota issue.

@ncw there appears to be a minor bug in logging. If a file is renamed before it gets uploaded, the upload line displays the original name. Does not seem to affect functionality though, when checking the mount and file only the new name exists.

Jun 22 04:49:34 vps-8d4f8d8b rclone[3012]: 2/e0/205f88adbe0d682abb94066e7ab30e02.0000006640: Renamed in cache to "2/e0/205f88adbe0d682abb94066e7ab30e02"
Jun 22 04:49:41 vps-8d4f8d8b rclone[3012]: 2/e0/205f88adbe0d682abb94066e7ab30e02: Copied (new)
Jun 22 04:49:41 vps-8d4f8d8b rclone[3012]: 2/e0/205f88adbe0d682abb94066e7ab30e02.0000006640: vfs cache: upload succeeded try #1

I have a semi-related question with regards to file and directory structure caching. I've always had trouble in shared mounts with the folder cache expiring and making traversal of the directory structure very slow. Anecdotally, I've traced it to the poll interval cache-busting more things than are supposed to be busted. I've somewhat worked around it by running vfs/refresh via rc on a timer.
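For anyone wanting to do the same, the rc-based workaround looks roughly like this (the remote name and mount point are placeholders, and the mount needs the remote control API enabled with --rc):

```shell
# Mount with the remote control API enabled (listens on :5572 by default)
rclone mount remote:media /mnt/media --rc --dir-cache-time 168h &

# From a cron job or systemd timer: re-walk the tree and refresh
# the in-memory directory cache so listings stay warm
rclone rc vfs/refresh recursive=true
```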

For my current use case (nginx chunks) I do not have any other process running rclone in any other machine, or uploading content into that team drive outside of the mount (all writes happen from a single machine and always through the mount). Can I disable polling in this case? My plan is to use:

polling 0
dir-cache-time 168h
vfs-cache-max-size 120G
vfs-cache-max-age 720h

So basically, a 30 day TTL on a 120 gig maximum cache (I've verified that LRU kicks in when the cache size goes over quota which fits my use case) while trying to cache file and directory attributes as much as possible.

Any concerns with this setup? I basically want to avoid drive.list as much as possible, as it slows things down considerably (and this is a close-to-user cache, so time matters). I see no harm in caching the structure forever, as any new content written via the mount should cache itself in addition to what was already cached. Right? I'll still run vfs/refresh once a day just in case.
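Spelled out as actual rclone flags, the plan would look something like this (remote name and mount point are placeholders; --poll-interval 0 is what disables change polling):

```shell
rclone mount remote:nginx-cache /var/cache/nginx-chunks \
  --vfs-cache-mode full \
  --poll-interval 0 \
  --dir-cache-time 168h \
  --vfs-cache-max-size 120G \
  --vfs-cache-max-age 720h
```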

Thoughts?

It will wait for max-age or max-size to remove the files from the disk.

They are both used so whichever comes first!

Yes

Yes it will, but it evicts whole files, not chunks.
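The combined policy can be sketched like this (a toy Python model to illustrate the behaviour, not the actual rclone code): evict everything older than max-age, then evict least-recently-used files until the cache fits under max-size.

```python
import time

def evict(cache, max_age_s, max_size_bytes, now=None):
    """Pick cache entries to remove.

    cache: dict of name -> (size_bytes, last_access_epoch_seconds).
    Files older than max_age_s are always evicted; after that,
    least-recently-used files go until the total fits max_size_bytes.
    Returns the set of names to evict.
    """
    if now is None:
        now = time.time()
    # Age-based pass: anything not touched within max_age_s goes
    evicted = {name for name, (_, atime) in cache.items()
               if now - atime > max_age_s}
    # Size-based pass: LRU order over the survivors
    survivors = sorted(
        ((name, size, atime) for name, (size, atime) in cache.items()
         if name not in evicted),
        key=lambda entry: entry[2],  # oldest access time first
    )
    total = sum(size for _, size, _ in survivors)
    for name, size, _ in survivors:
        if total <= max_size_bytes:
            break
        evicted.add(name)
        total -= size
    return evicted
```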

FUSE doesn't seem to support creating the inotify events, or at least I haven't figured out how to do it. The mount will certainly know about the new files but there isn't an API for me to call to say - look OS a new file has arrived.

Assuming we are talking about change notify polling - rclone pings Google Drive every --poll-interval and says - hey - any file changes? If there are, then rclone adjusts any metadata caches it has in RAM.
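In pseudocode terms, the effect of each poll is roughly this (an illustrative Python model, not the actual implementation): take the list of changed paths the backend reports and drop the affected directories from the in-RAM cache, so the next listing re-fetches them.

```python
def apply_change_notify(dir_cache, changed_paths):
    """Invalidate cached directory listings touched by remote changes.

    dir_cache: dict of directory path -> cached listing.
    changed_paths: file paths the backend reported as changed.
    The parent directory of each changed path is dropped from the
    cache; "" stands for the root directory.
    """
    for path in changed_paths:
        parent = path.rsplit("/", 1)[0] if "/" in path else ""
        dir_cache.pop(parent, None)
    return dir_cache
```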

The polling works for the VFS layer so for mount and serve.

Yes it could do that and that is what google intended it for. At the moment rclone just uses it for invalidating its caches so when you list the directory you see the new files.

I can see that if you ever got this error, rclone would not give up reading the file. I don't think the new cache should make that error more likely, but I could be wrong about that.

I've made rclone give up after 10 tries if it gets errors reading the file. That isn't configurable yet. Everything is happening a bit asynchronously so maybe an error count isn't the right solution - maybe it should be a time with no successful reads or something like that.

The retries will happen about every 5 seconds.

This logic may not be correct - would appreciate testing and feedback!
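As a model of the policy (an illustrative sketch, not the actual code): count consecutive failures, reset on any success, and give up once the limit is hit, with retries spaced roughly 5 seconds apart.

```python
class RetryGate:
    """Allow retries until max_errors consecutive failures occur.

    A successful read resets the counter, so only sustained failure
    (e.g. a persistent 403) trips the gate. Callers are expected to
    sleep between attempts (about 5s in the description above).
    """

    def __init__(self, max_errors=10):
        self.max_errors = max_errors
        self.errors = 0

    def record(self, ok):
        # Any success wipes out the failure streak
        if ok:
            self.errors = 0
        else:
            self.errors += 1

    def should_retry(self):
        return self.errors < self.max_errors
```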

Here is a new beta with that in

https://beta.rclone.org/branch/v1.52.1-140-g5e0d8a03-vfs-beta/

I've been testing the beta very hard and I've managed to get it to pass 1000 rounds of testing without error which is quite a milestone!

I've written unit tests for everything. The downloader ones are a bit sketchy but the downloader is tested very well as part of higher level tests.

Great - I haven't tried to reproduce that problem yet - I've been concentrating on getting the tests 100% reliable. It is possible that has fixed your problem too (ever hopeful!)


Err, yes. It is a consequence of a design choice I made to reduce locking and hence the chance of deadlocks. I can fix it if it really bothers you!

I think that should work fine especially since you are running vfs/refresh daily anyway.

Yes it will.

Though I did spot one bug I need to fix - if you stop the mount while an upload is in progress and restart it, the upload will recommence which is great, but the file isn't in the directory listings which it really should be.


No worries about the logging bug; that's purely cosmetic, and you can trace the previous logs to figure out what the file became, so no biggie. That last one with the mount stopping sounds worse though :slight_smile: What would happen with those files not in the listing? Does that mean they would have to wait for polling (or in my case vfs/refresh) to show up? If that's the case, I guess I can do more frequent refreshes for now until that one gets squashed. Thanks for all the hard work. This is one of the most amazing and resilient pieces of software I've ever seen, and I've worked in IT for 20 years.