Testing for new vfs cache mode features

I'm running in full mode now, and might have hit another corner case, but it will take some work for me to replicate again. During a post-processing operation, a "No such file or directory" error was reported right after moving over the file, before the upload started.

I think the key piece here may be that the cache wouldn't accommodate the moved file without some purging happening. The limit is set to 40GB. Here's the log, including the post-processing entries where the move is started, rclone purging some files, the post-processor failing to read the file it just moved, and the eventual completion of the file copy.

2020/06/19 12:29:11 INFO  : Cleaned the cache: objects 107 (was 107) in use 1, to upload 0, uploading 0, total size 39.037G (was 39.020G)
2020/06/19 12:30:11 INFO  : Cleaned the cache: objects 107 (was 107) in use 1, to upload 0, uploading 0, total size 39.055G (was 39.037G)

Jun 19 12:30:52 HOST CouchPotato.py[359]: 06-19 12:30:52 INFO [tato.core.plugins.renamer] Moving "/data/download/process/REDACTED.mkv" to "/data/media/REDACTED.mkv"

2020/06/19 12:31:11 INFO  : REDACTED2.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:31:11 INFO  : REDACTED2.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:31:11 INFO  : Cleaned the cache: objects 107 (was 108) in use 2, to upload 0, uploading 0, total size 39.496G (was 39.055G)
2020/06/19 12:32:11 INFO  : REDACTED3.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:32:11 INFO  : REDACTED3.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:32:11 INFO  : REDACTED4.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:32:11 INFO  : REDACTED4.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:32:11 INFO  : REDACTED5.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:32:11 INFO  : REDACTED5.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:32:11 INFO  : REDACTED6.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:32:11 INFO  : REDACTED6.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:32:11 INFO  : Cleaned the cache: objects 103 (was 107) in use 2, to upload 0, uploading 0, total size 39.714G (was 39.496G)

Jun 19 12:32:38 HOST CouchPotato.py[359]: 06-19 12:32:38 ERROR [tato.core.plugins.renamer] Couldn't move file "/data/download/process/REDACTED.mkv" to "/data/media/REDACTED.mkv": Traceback (most recent call last):
Jun 19 12:32:38 HOST CouchPotato.py[359]: OSError: [Errno 2] No such file or directory: '/data/media/REDACTED.mkv'

2020/06/19 12:33:11 INFO  : REDACTED7.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:33:11 INFO  : REDACTED7.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:33:11 INFO  : REDACTED8.mp4: vfs cache: removed cache file as over quota
2020/06/19 12:33:11 INFO  : REDACTED8.mp4: vfs cache: removed metadata from cache as over quota
2020/06/19 12:33:11 INFO  : Cleaned the cache: objects 101 (was 103) in use 1, to upload 0, uploading 1, total size 39.803G (was 39.714G)
2020/06/19 12:34:11 INFO  : Cleaned the cache: objects 101 (was 101) in use 1, to upload 0, uploading 1, total size 39.803G (was 39.803G)
2020/06/19 12:35:11 INFO  : Cleaned the cache: objects 101 (was 101) in use 1, to upload 0, uploading 1, total size 39.803G (was 39.803G)
2020/06/19 12:36:11 INFO  : Cleaned the cache: objects 101 (was 101) in use 1, to upload 0, uploading 1, total size 39.803G (was 39.803G)
2020/06/19 12:37:11 INFO  : Cleaned the cache: objects 101 (was 101) in use 1, to upload 0, uploading 1, total size 39.803G (was 39.803G)
2020/06/19 12:38:11 INFO  : Cleaned the cache: objects 101 (was 101) in use 1, to upload 0, uploading 1, total size 39.803G (was 39.803G)
2020/06/19 12:38:16 INFO  : REDACTED.mkv: Copied (new)
2020/06/19 12:38:16 INFO  : REDACTED.mkv: vfs cache: upload succeeded try #1

This happened 3 times actually, with all files in question being > 10GB, so the extra time for file move/copy operations may have factored into this. Is that enough to go on, @ncw? I wasn't running with debug logging on at the time, and there were no errors at the INFO level. This was with build v1.52.1-120-gfcc32a46-vfs-beta and the following mount options:

  --attr-timeout 0 \
  --dir-cache-time 24h \
  --poll-interval 15s \
  --vfs-cache-mode full \
  --vfs-cache-max-age 6h \
  --vfs-cache-max-size 40G \
  --fast-list \
  --drive-skip-gdocs \
  --drive-use-trash=false \
  --drive-disable-http2=false \
  --allow-other \
  --umask 002 \
  --gid 1001 \
  --uid 1000 \
  --rc \
  --use-mmap \
  --log-level INFO

The sequence of events is this:

  • start a copy into the cache
  • a cache expire runs
  • you get a file not found error

Is that right?

How do you move files into the cache? Is there a rename involved?

The cache didn't delete the file you were moving in, did it?

I'm playing with this now, and I noticed funny behavior during a Google "download quota exceeded" error. Reads just hang, and core/stats shows this:

{
        "bytes": 0,
        "checks": 0,
        "deletes": 0,
        "elapsedTime": 25.558658748,
        "errors": 661,
        "fatalError": false,
        "lastError": "vfs reader: failed to write to cache file: open file failed: googleapi: Error 403: The download quota for this file has been exceeded., downloadQuotaExceeded",
        "renames": 0,
        "retryError": true,
        "speed": 0,
        "transferring": [
                {
                        "bytes": 0,
                        "eta": null,
                        "group": "global_stats",
                        "name": "<redacted>",
                        "percentage": 0,
                        "size": 4468654463,
                        "speed": 0,
                        "speedAvg": 0
                }
        ],
        "transfers": 0
} 

I guess I would have expected this to return an I/O error, the same as if the VFS cache wasn't enabled. I'm not sure if this is new, as I wasn't actually using the VFS cache before trying the new full mode.

Edit: And I have an ffprobe instance hung waiting on disk now.

Edit 2: Now unbanned. ffprobe completed after hanging for the couple-hour wait.

I love sparse files :slight_smile:

% du -hs ~/.cache/rclone/vfs
17G     /home/frank/.cache/rclone/vfs
% du -hs --apparent-size ~/.cache/rclone/vfs
23G     /home/frank/.cache/rclone/vfs
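
The same comparison can be done per file with a stat call, if anyone wants to see which cache files are sparse. A minimal sketch (the path is just a placeholder):

  import os

  path = "/home/frank/.cache/rclone/vfs/example.mkv"  # placeholder cache file

  st = os.stat(path)
  apparent = st.st_size           # logical length, what --apparent-size reports
  allocated = st.st_blocks * 512  # st_blocks is counted in 512-byte units on Linux
  print("apparent %d, allocated %d, sparse: %s"
        % (apparent, allocated, allocated < apparent))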

The cache expiry could certainly be the culprit here; the file is large enough that it's almost a given that the expire task will run while it's still moving.

This is being done by a Python 2.7 app using shutil.move(), which, looking at the docs, will actually do a copy when moving across filesystems, followed by an attempt to set the same file attributes as the source file. So that could be exacerbating things. But I don't believe any rename is done once the file is on the destination (see the sketch below).
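
For reference, this is roughly what shutil.move() boils down to for a cross-filesystem move, per the docs (a sketch only, not the actual library source; the paths are placeholders):

  import os
  import shutil

  src = "/data/download/process/EXAMPLE.mkv"  # placeholder paths
  dst = "/data/media/EXAMPLE.mkv"

  # Cross-device move: the destination exists and keeps growing for the
  # whole copy, so a cache expiry pass can easily run during that window.
  shutil.copyfile(src, dst)  # bulk data copy
  shutil.copystat(src, dst)  # then copy permission bits and times
  os.unlink(src)             # source is removed only after the copy succeeds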

Finally, I don't believe this file was reaped from the cache, as there would have been plenty of files with an older access time, and later in the logs I do see this file being removed from the cache.

I'll try to replicate this with a different cache path, a small max cache size, and a large enough file, with debugging on, and see what I can find.


What do you think rclone should do here? At the moment it will keep trying forever, and it did succeed eventually! Maybe it should error out after a while? After so many consecutive errors...?

I always thought once you hit that on a mount, pretty much life was over.

Not sure if others have different experiences, but I've never seen that error come out.

I guess it should error out as it won't work until quotas reset.

This looks great, awesome work as always. I'm trying to use rclone for nginx cache chunk storage with Drive, so this is perfect. I'm already using full mode, since nginx does operations on the files (creates a temp file and then renames it) that simply work better on disk.
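
(For context, the temp-then-rename dance nginx does is the usual atomic-write idiom, something like the following sketch; the paths are made up:)

  import os

  final_path = "/var/cache/nginx/abc123"  # placeholder chunk path
  tmp_path = final_path + ".tmp"

  # Write the chunk to a temp file first...
  with open(tmp_path, "wb") as f:
      f.write(b"chunk data")
      f.flush()
      os.fsync(f.fileno())

  # ...then rename it into place, which is why having real files on a
  # local disk (vfs-cache-mode full) works so much better here.
  os.rename(tmp_path, final_path)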

I've got a couple of questions about this:

  1. Will it wait for max-age before removing the files from disk? Based on my preliminary testing it looks like it does, as it has uploaded chunks but the files remain on disk.

  2. If max-age and max-size are both set, which takes precedence? I believe with the old method max-age was checked every cache-poll-interval; is that still the case? Will it check for stale chunks every minute by default and evict files older than max-age from the local cache regardless of cache size?

Edit: never mind q3; I found out I was running rclone as root, so the chunks are in root's home dir. I've removed the question as it's no longer relevant.

Thanks!

I noticed that if a file is written to Google Drive outside of the VFS mount, I never receive a notice via inotifywait that a new file was created/moved_in/etc...

I do however get these notices when local operations happen on the mount.

I'm not sure about the polling features.

Nick, could you shed some light on this?

Ideally, when a new file is created / updated / etc., the appropriate notices would be emitted on the local filesystem.

I'm testing the VFS branch for my personal Google Drive at this point, so I'm happy to help get to the bottom of this.

If you have a local file and upload it to your Google Drive, rclone's Google Drive polling would pick it up based on your polling time. Is that not happening?

Do you mean you have a file that was uploaded via the cache and then replaced on your Google Drive?

I can't figure out how a local file system and inotify come into play.

I think we are saying the same thing.

If I monitor an rclone mount via inotify and make changes via local access (cp, rm, mkdir, etc.), the appropriate notifications seem to be emitted. If, however, a file is uploaded to the backend by means other than the local mount, no events are emitted. Without "polling" I would not necessarily expect them to be emitted, but if "polling" is happening, it should probably go ahead and emit the events.

So the real questions come down to:

  1. What is "polling" actually doing in the case of a VFS mount (since it is mostly an on-demand mechanism at this point.

  2. What could "polling" be doing in the case of a VFS mount?

  3. What does "polling" look like for something other than a mount or is it only for a mount?

  4. Who implemented "polling" and would they be willing to add some functionality.

  5. Could the idea of "polling" be implemented via something like a "sync --continuous" feature that continually syncs a source remote with a destination (basically what I'm doing between two filesystems using inotify).

  6. Can "polling" actually use a smarter wait mechanism per remote to get delta updates rather than rescanning an entire remote (if it isn't already doing this).

Polling is only for picking up changes on remotes. There is no 'polling' that happens locally on an rclone mount; it only picks up changes on remotes.

I think you are moving off topic for the cache mode, and it's best to make a new topic where we can discuss that new feature you are talking about.

Maybe a configurable --vfs-retry-count? Does it use any sort of backoff timing?
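
To illustrate the shape of what I'm suggesting (not how rclone is actually implemented internally), something like a bounded retry with exponential backoff:

  import time

  def read_with_retries(do_read, max_retries=5, base_delay=1.0):
      """Retry a read with exponential backoff, then give up instead of hanging forever."""
      for attempt in range(max_retries):
          try:
              return do_read()
          except IOError:
              if attempt == max_retries - 1:
                  raise  # surface the error to the caller after the last attempt
              time.sleep(base_delay * (2 ** attempt))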

Polling is only for picking up changes on remotes. There is no 'polling' that happens locally on an rclone mount; it only picks up changes on remotes.

I'm not sure what words I typed to make you think I meant anything other than that.

I think you are moving off topic for the cache mode, and it's best to make a new topic where we can discuss that new feature you are talking about.

I'm certainly not, but granted, I'm trying to expand the dialog slightly. Having spent the last 15 years architecting and building complex data-related software, my experience says these types of questions could dramatically help whoever is planning and implementing the functionality -- but hey, I could be wrong -- please confirm this isn't the type of dialog you want.

More toward your moderation style:

  1. On my VFS Beta mount I see nothing related to polling happening -- what should I be expecting here? (New files are not presented, metadata isn't being brought locally, etc.)

  2. I have a pretty broad range of mounts I deal with. Right now I rely on the legacy Cache backend heavily. Could "polling" be used to help with the metadata challenges we experience? (The answer is of course yes -- but I pose the question to help spur the dialog.)

2 - 5) Per your moderation style, I will not pursue.

I suggested you create a new post because it's a good dialogue with merit, and I wanted to make sure it got attention and didn't get lost in this feature thread.

In my mind, that would come with the next thing @ncw was talking about: a local 'database' of metadata, which is what the current cache backend does now. I don't believe that was in scope for this part, though.

I've also just hit the "Failed to download: vfs reader: failed to write to cache file: open file failed: googleapi: Error 403: The download quota for this file has been exceeded., downloadQuotaExceeded" error, which actually got stuck trying to read that file for hours overnight, though the rest of the mount kept working. I had to restart the mount to clear it. Trying to read the file again right now produces the error again. I can kill the process that was trying to read the file and the error continues, so there doesn't seem to be any retry limit right now, or any attempt to stop reading once the file is closed.

On the topic of files missing while uploading...

I haven't been able to replicate this with a small, simple test case, but I have observed that it doesn't actually seem to matter whether the cache is full or not. So I'm going to attempt to find a smaller test case using the python2 move method and see if that's what's actually triggering this (roughly the sketch below).
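
Roughly what I have in mind for that test case, a minimal sketch assuming a pre-created large file and a test mount point (both paths are placeholders):

  import os
  import shutil

  src = "/tmp/bigfile.bin"              # placeholder: large file created beforehand
  dst = "/mnt/rclone-test/bigfile.bin"  # placeholder: path on the rclone mount

  shutil.move(src, dst)  # same call the post-processor's renamer ends up making

  # Immediately poke the destination the way the post-processor does;
  # the failure shows up as ENOENT here while the upload is still pending.
  st = os.stat(dst)
  print("size after move: %d" % st.st_size)
  with open(dst, "rb") as f:
      f.read(1024)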

Is that a normal error for you? Do you get it often? The regular mount or this one would have issues if that popped up, as I don't think there is a good way to handle it; it might be one file or the whole thing.

Nope. Checking logs from my old cache backend mount that didn't use VFS, I don't have any 403 errors in the last month of use. So I can't speak to whether this is an issue with the previous VFS code, or how the cache backend would have handled a 403 error; all I can say is that I haven't experienced this error at all before under the cache backend.

Hmm, that's really strange. Can you check your API console and look for drive.gets and see how that looks before / after?

It should look something like this, but the scale is probably different for you:

I'm curious to see if that ramped up, as you really should not see those errors.

I switched to the VFS beta on the 18th, so there are no major differences in traffic. But you can clearly see where the get requests kept happening until I restarted the mount:
[API console screenshot]

Yeah, that's really strange :frowning: I am not sure offhand what would cause that download quota issue.