Removing/recreating cache file upon ENOSPC

I am experimenting with a solution for when an rclone mount runs out of cache space: remove cached files that are not dirty (i.e., no writes have been done to them). At the moment the code doing the removal (and a new item.open()) runs synchronously as part of a retry upon an ENOSPC error in item.ReadAt(). It seems to work, and I get performance similar to when I had enough cache space to avoid the ENOSPC error in the first place.
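
For reference, here is a rough sketch of the retry idea in isolation (readAt, reset, and dirty are illustrative placeholders, not rclone's actual internals, and errors.Is needs go1.13, which comes up again later in the thread):

// readWithENOSPCRetry: if a read into the cache file fails with ENOSPC and
// the cached file is clean (no unsaved writes), remove/recreate the cache
// file via reset and retry the read once. Imports: "errors", "syscall".
func readWithENOSPCRetry(readAt func(b []byte, off int64) (int, error), reset func() error, dirty bool, b []byte, off int64) (int, error) {
        n, err := readAt(b, off)
        if !dirty && errors.Is(err, syscall.ENOSPC) {
                // The cache disk is full, but this file has no unsaved writes,
                // so it is safe to throw away the cache file, re-create it,
                // and try the read again.
                if resetErr := reset(); resetErr == nil {
                        n, err = readAt(b, off)
                }
        }
        return n, err
}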

@ncw Can we include a solution like this as a stop-gap before chunk-level cache replacement is added? It would let us use full cache mode for files that are cached only for reads without requiring the cache space (a tmpfs ramdisk in our case) to be larger than the file size, which can be many TBs.

While working on this solution I found the item.MetaDirty flag being set, and it's not clear to me how it is used or what action needs to be taken before removing/resetting the cache file when MetaDirty is true. The flag is set to true even when the file is only read (i.e., not changed).

Great! Are you going to send a pull request for this?

I think it sounds like a great idea. Chucking stuff which hasn't been modified is fine with me.

When MetaDirty is set the meta file will be written back to the disk. This is set when we update the Atime which is how we know which files to delete first. This happens even when files are opened read only.

Thanks! Yes, I intend to send a pull request.

Is the write-back of Atime needed for the backend (S3 in our case)? Which function performs this metadata write-back? I guess it goes to a meta file inside the cache dir?

:smiley:

Yes. It is how we find the least recently used file in the cache.

Not at a computer! These end up in the vfsMeta hierarchy in the cache dir.

I can't seem to find the code that sends the Atime info to S3. You mean writing the Atime back to the vfsMeta hierarchy on the local disk in a per-file meta file, right?

Although the experimental solution of retrying in item.ReadAt after removing and recreating the cache file seems to work now, I hope to improve it so the fix leverages the cleaner thread to examine all clean items for removal. The current fix only removes the item that got the ENOSPC. Upon ENOSPC, the timer-driven cleaner thread would be triggered synchronously (assuming that's feasible) if it is not already running, and it would remove clean open files once the out-of-space flag is set. What do you think of that approach?

When do you estimate 1.53 will be released?

That is correct. We don't send it back to S3.

Great!

So give the cache cleaner a kick if we receive ENOSPC? That sounds like a good idea.

In this bit of code in vfs/vfscache/cache.go:

func (c *Cache) cleaner(ctx context.Context) {
        if c.opt.CachePollInterval <= 0 {
                fs.Debugf(nil, "vfs cache: cleaning thread disabled because poll interval <= 0")
                return
        }
        // Start cleaning the cache immediately
        c.clean()
        // Then every interval specified
        timer := time.NewTicker(c.opt.CachePollInterval)
        defer timer.Stop()
        for {
                select {
                case <-timer.C:
                        c.clean()
                case <-ctx.Done():
                        fs.Debugf(nil, "vfs cache: cleaner exiting")
                        return
                }
        }
}

You could add another channel - say kick - which causes the cache to clean now. This could then be sent a message when the ENOSPC is received.
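
A minimal sketch of what that might look like (the kick field and the non-blocking send are assumptions, not final code):

// Sketch only: assumes a new field on Cache, e.g.
//
//      kick chan struct{} // initialised with make(chan struct{}, 1)
func (c *Cache) cleaner(ctx context.Context) {
        if c.opt.CachePollInterval <= 0 {
                fs.Debugf(nil, "vfs cache: cleaning thread disabled because poll interval <= 0")
                return
        }
        // Start cleaning the cache immediately
        c.clean()
        // Then every interval specified, or whenever we are kicked
        timer := time.NewTicker(c.opt.CachePollInterval)
        defer timer.Stop()
        for {
                select {
                case <-timer.C:
                        c.clean()
                case <-c.kick: // e.g. sent when ENOSPC is received
                        c.clean()
                case <-ctx.Done():
                        fs.Debugf(nil, "vfs cache: cleaner exiting")
                        return
                }
        }
}

// kickCleaner is a hypothetical helper the ENOSPC path could call. The
// non-blocking send drops the kick if a clean is already pending.
func (c *Cache) kickCleaner() {
        select {
        case c.kick <- struct{}{}:
        default:
        }
}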

Yes, I have the kick channel for resetting the cache of clean open files implemented. It seems to work well with my use case now. I will go through the pull request and test instructions next.

Great - well done :smiley:

@ncw I pulled from master today in preparation for my vfs-clean-cache-reset branch. It looks like the recent removal of the vendor directory caused the following errors. What step(s) might I be missing?

(a partial list of the errors during go build)
backend/azureblob/azureblob.go:23:2: module github.com/Azure/azure-storage-blob-go@latest found (v0.10.0), but does not contain package github.com/Azure/azure-storage-blob-go/azblob

fs/sync/pipe.go:10:2: module github.com/aalpar/deheap@latest found (v0.0.0-20200318053559-9a0c2883bd56), but does not contain package github.com/aalpar/deheap

cmd/mount2/file.go:11:2: module github.com/hanwen/go-fuse/v2@latest found (v2.0.3), but does not contain package github.com/hanwen/go-fuse/v2/fs

My OS and Go versions:
"Ubuntu 18.04.4 LTS"
go version go1.14.2 linux/amd64

What does go env say?

You'll want

GO111MODULE=""

Is the vendor directory removed in your checkout? If not make sure you remove it completely.

Looks like I have the correct GO111MODULE value and the vendor directory is removed in my checkout.

leo@tapsrv01:~/go/src/github.com/leoluan1/rclone$ go env |grep GO111
GO111MODULE=""
leo@tapsrv01:~/go/src/github.com/leoluan1/rclone$ ls
backend  cmd	  CONTRIBUTING.md  Dockerfile  fs      go.mod  graphics  MAINTAINERS.md  MANUAL.html  MANUAL.txt  rclone.1   README.md	 VERSION
bin	 contrib  COPYING	   docs        fstest  go.sum  lib	 Makefile	 MANUAL.md    notes.txt   rclone.go  RELEASE.md  vfs

You might want to blow the module cache away. Here is a blog post from someone who had a similar problem: https://lecstor.com/go-clear-cache/

Thanks! This step fixed the go build of rclone on Ubuntu. But on macOS, I got the following errors when building rclone after cleaning the cache. Might this be related to the recent osxfuse change?

../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/error_std.go:26:20: undefined: errNoXattr
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:171:12: undefined: mount
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:236:24: undefined: maxWrite
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:425:32: undefined: maxWrite
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:810:12: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:828:18: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:845:18: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:868:18: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:991:12: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1413:20: undefined: maxWrite
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1413:20: too many errors

Odd... Did you update bazil/fuse? The latest version doesn't work with macOS, but the version in go.mod should work.

All the CI is passing on macOS so I guess you must be picking up the wrong version somehow.

Sorry for all the mysterious problems! Go normally doesn't do this.

The modcache cleaning step did appear to download bazil.org/fuse.

bash-3.2# go clean -cache -modcache -i -r
go: downloading google.golang.org/api v0.29.0
go: downloading github.com/dropbox/dropbox-sdk-go-unofficial v5.6.0+incompatible
go: downloading golang.org/x/sync v0.0.0-20200625203802-6e8e738ad208
go: downloading golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d
go: downloading github.com/mitchellh/go-homedir v1.1.0
go: downloading github.com/spf13/cobra v1.0.0
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/pkg/sftp v1.11.0
go: downloading github.com/jzelinskie/whirlpool v0.0.0-20170603002051-c19460b8caa6
go: downloading golang.org/x/text v0.3.3
go: downloading github.com/atotto/clipboard v0.1.2
go: downloading github.com/hanwen/go-fuse v1.0.0
go: downloading github.com/prometheus/client_golang v1.7.1
go: downloading github.com/zeebo/errs v1.2.2
go: downloading bazil.org/fuse v0.0.0-20200524192727-fb710f7dfd05
...

Does your go.mod differ from the go.mod in the master branch?

I was building master, but master had changed and my go.mod was different. I guess that's probably the result of my running go mod tidy when I ran into the cache cleaning issue. After a git reset, it's building now. Thanks!

Great, glad you are running again!

@ncw Are the changes supposed to pass the go1.11 and go1.12 tests? I am using github.com/pkg/errors in vfs/read_write.go (errors.Is()) to detect the ENOSPC error, and the go1.11 and go1.12 tests fail, complaining about the inclusion of the errors package. But this package seems to be used elsewhere in rclone too. Also, syscall.ENOSPC is accepted on all platforms except other_os. Any suggestions?

errors.Is is part of the standard library and was only introduced in go1.13, so you can't use it for the moment unless we conditionally compile the code for go1.13 and above.

pkg/errors just calls that, and it looks like it isn't compiled in for < go1.13.

I think we will have to write our own IsErrNoSpace() function...

See fserrors.Cause for an error unwrapper that should work for all Go versions.

I suggest you define an IsErrNoSpace() function in fs/fserrors which uses fserrors.Cause and compares the result with syscall.ENOSPC.

I would suspect (can you link the build results?) that it is only plan9 which doesn't define this, so you could conditionally compile the plan9 IsErrNoSpace to always return false.
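
A rough sketch of that layout (file names are hypothetical, and the direct comparison should really go through fserrors.Cause to unwrap e.g. an *os.PathError first):

// fs/fserrors/enospc_error.go (hypothetical file name) -- built everywhere
// except plan9, which does not define syscall.ENOSPC.
// +build !plan9

package fserrors

import "syscall"

// IsErrNoSpace reports whether err is an out-of-space error. A real
// implementation should unwrap err first (e.g. via fserrors.Cause), since
// ENOSPC usually arrives wrapped in an *os.PathError.
func IsErrNoSpace(err error) bool {
        return err == syscall.ENOSPC
}

// fs/fserrors/enospc_error_notsupported.go (hypothetical file name)
// +build plan9

package fserrors

// IsErrNoSpace always returns false on plan9, which has no ENOSPC.
func IsErrNoSpace(err error) bool {
        return false
}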