Removing/recreating cache file upon ENOSPC

I am experimenting with a solution for when an rclone mount runs out of cache space: remove cached files that are not dirty (i.e., no writes have been done to them). At the moment the code doing the removal (and a new item.open()) runs synchronously as part of a retry upon an ENOSPC error in item.ReadAt(). It seems to work, and I get performance similar to when I had enough cache space to avoid the ENOSPC error in the first place.
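
For reference, here is a rough sketch of the retry idea in isolation (readAt, reset, and dirty are illustrative placeholders, not rclone's actual internals, and errors.Is needs go1.13, which comes up again later in the thread):

// readWithENOSPCRetry: if a read into the cache file fails with ENOSPC and
// the cached file is clean (no unsaved writes), remove/recreate the cache
// file via reset and retry the read once. Imports: "errors", "syscall".
func readWithENOSPCRetry(readAt func(b []byte, off int64) (int, error), reset func() error, dirty bool, b []byte, off int64) (int, error) {
        n, err := readAt(b, off)
        if !dirty && errors.Is(err, syscall.ENOSPC) {
                // The cache disk is full, but this file has no unsaved writes,
                // so it is safe to throw away the cache file, re-create it,
                // and try the read again.
                if resetErr := reset(); resetErr == nil {
                        n, err = readAt(b, off)
                }
        }
        return n, err
}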

@ncw Can we include a solution like this as a stop-gap before chunk-level cache replacement is added? It would let us use full cache mode for files that are cached only for reads without requiring the cache space (a tmpfs ramdisk in our case) to be larger than the file size, which can be many TBs.

While working on this solution I found the item.MetaDirty flag being set, and it's not clear to me how it is used or what action needs to be taken before removing/resetting the cache file when MetaDirty is true. The flag is set to true even when the file is only read (i.e., not changed).

Great! Are you going to send a pull request for this?

I think it sounds like a great idea. Chucking stuff which hasn't been modified is fine with me.

When MetaDirty is set the meta file will be written back to the disk. This is set when we update the Atime which is how we know which files to delete first. This happens even when files are opened read only.

Thanks! Yes, I intend to send a pull request.

Is the write-back of Atime needed for the backend (S3 in our case)? Which function performs this metadata write-back? I guess it goes to a meta file inside the cache dir?

:smiley:

Yes. It is how we find the least recently used file in the cache.

Not at a computer! These end up in the vfsMeta hierarchy in the cache dir.

I can't seem to find the code that sends the Atime info to S3. You mean writing the Atime back to the vfsMeta hierarchy on the local disk in a per-file meta file, right?

Although the experimental solution of retrying in item.ReadAt after removing and recreating the cache file seems to work now, I hope to improve it so the fix leverages the cleaner thread to examine all clean items for removal. The current fix only removes the item that got the ENOSPC. Upon ENOSPC, the timer-driven cleaner thread would be triggered synchronously (assuming that's feasible) if it is not already running, and it would remove clean open files once the out-of-space flag is set. What do you think of that approach?

When do you estimate 1.53 will be released?

That is correct. We don't send it back to S3.

Great!

So give the cache cleaner a kick if we receive ENOSPC? That sounds like a good idea.

In this bit of code in vfs/vfscache/cache.go:

func (c *Cache) cleaner(ctx context.Context) {
        if c.opt.CachePollInterval <= 0 {
                fs.Debugf(nil, "vfs cache: cleaning thread disabled because poll interval <= 0")
                return
        }
        // Start cleaning the cache immediately
        c.clean()
        // Then every interval specified
        timer := time.NewTicker(c.opt.CachePollInterval)
        defer timer.Stop()
        for {
                select {
                case <-timer.C:
                        c.clean()
                case <-ctx.Done():
                        fs.Debugf(nil, "vfs cache: cleaner exiting")
                        return
                }
        }
}

You could add another channel - say kick - which causes the cache to clean now. This could then be sent a message when the ENOSPC is received.
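
A minimal sketch of what that might look like (the kick field and the non-blocking send are assumptions, not final code):

// Sketch only: assumes a new field on Cache, e.g.
//
//      kick chan struct{} // initialised with make(chan struct{}, 1)
func (c *Cache) cleaner(ctx context.Context) {
        if c.opt.CachePollInterval <= 0 {
                fs.Debugf(nil, "vfs cache: cleaning thread disabled because poll interval <= 0")
                return
        }
        // Start cleaning the cache immediately
        c.clean()
        // Then every interval specified, or whenever we are kicked
        timer := time.NewTicker(c.opt.CachePollInterval)
        defer timer.Stop()
        for {
                select {
                case <-timer.C:
                        c.clean()
                case <-c.kick: // e.g. sent when ENOSPC is received
                        c.clean()
                case <-ctx.Done():
                        fs.Debugf(nil, "vfs cache: cleaner exiting")
                        return
                }
        }
}

// kickCleaner is a hypothetical helper the ENOSPC path could call. The
// non-blocking send drops the kick if a clean is already pending.
func (c *Cache) kickCleaner() {
        select {
        case c.kick <- struct{}{}:
        default:
        }
}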

Yes, I have the kick channel for resetting the cache of clean open files implemented. It seems to work well with my use case now. I will go through the pull request and test instructions next.

Great - well done :smiley:

@ncw I pulled from master today in preparation for my vfs-clean-cache-reset branch. It looks like the recent removal of the vendor directory caused the following errors. What step(s) might I be missing?

(a partial list of the errors during go build)
backend/azureblob/azureblob.go:23:2: module github.com/Azure/azure-storage-blob-go@latest found (v0.10.0), but does not contain package github.com/Azure/azure-storage-blob-go/azblob

fs/sync/pipe.go:10:2: module github.com/aalpar/deheap@latest found (v0.0.0-20200318053559-9a0c2883bd56), but does not contain package github.com/aalpar/deheap

cmd/mount2/file.go:11:2: module github.com/hanwen/go-fuse/v2@latest found (v2.0.3), but does not contain package github.com/hanwen/go-fuse/v2/fs

My OS and Go versions:
"Ubuntu 18.04.4 LTS"
go version go1.14.2 linux/amd64

What does go env say?

You'll want

GO111MODULE=""

Is the vendor directory removed in your checkout? If not make sure you remove it completely.

Looks like I have the correct GO111MODULE value and the vendor directory is removed in my checkout.

leo@tapsrv01:~/go/src/github.com/leoluan1/rclone$ go env |grep GO111
GO111MODULE=""
leo@tapsrv01:~/go/src/github.com/leoluan1/rclone$ ls
backend  cmd	  CONTRIBUTING.md  Dockerfile  fs      go.mod  graphics  MAINTAINERS.md  MANUAL.html  MANUAL.txt  rclone.1   README.md	 VERSION
bin	 contrib  COPYING	   docs        fstest  go.sum  lib	 Makefile	 MANUAL.md    notes.txt   rclone.go  RELEASE.md  vfs

You might want to blow the module cache away. Here is a blog post from someone who had a similar problem: https://lecstor.com/go-clear-cache/

Thanks! This step fixed the go build of rclone on Ubuntu. But on macOS, I got the following errors when building rclone after cleaning the cache. Might this be related to the recent osxfuse change?

../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/error_std.go:26:20: undefined: errNoXattr
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:171:12: undefined: mount
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:236:24: undefined: maxWrite
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:425:32: undefined: maxWrite
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:810:12: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:828:18: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:845:18: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:868:18: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:991:12: undefined: openFlags
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1413:20: undefined: maxWrite
../../../../pkg/mod/bazil.org/fuse@v0.0.0-20200524192727-fb710f7dfd05/fuse.go:1413:20: too many errors

Odd... Did you update bazil/fuse? The latest version doesn't work with macOS, but the version in go.mod should work.

All the CI is passing on macOS so I guess you must be picking up the wrong version somehow.

Sorry for all the mysterious problems! Go normally doesn't do this.

The modcache cleaning step did appear to download bazil.org/fuse.

bash-3.2# go clean -cache -modcache -i -r
go: downloading google.golang.org/api v0.29.0
go: downloading github.com/dropbox/dropbox-sdk-go-unofficial v5.6.0+incompatible
go: downloading golang.org/x/sync v0.0.0-20200625203802-6e8e738ad208
go: downloading golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d
go: downloading github.com/mitchellh/go-homedir v1.1.0
go: downloading github.com/spf13/cobra v1.0.0
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/pkg/sftp v1.11.0
go: downloading github.com/jzelinskie/whirlpool v0.0.0-20170603002051-c19460b8caa6
go: downloading golang.org/x/text v0.3.3
go: downloading github.com/atotto/clipboard v0.1.2
go: downloading github.com/hanwen/go-fuse v1.0.0
go: downloading github.com/prometheus/client_golang v1.7.1
go: downloading github.com/zeebo/errs v1.2.2
go: downloading bazil.org/fuse v0.0.0-20200524192727-fb710f7dfd05
...

Does your go.mod differ from the go.mod in the master branch?

I was building master, but master had changed and my go.mod was different. I guess that's probably the result of my running go mod tidy when I ran into the cache cleaning issue. After a git reset, it's building now. Thanks!

Great, glad you are running again!

@ncw Are the changes supposed to pass the go1.11 and go1.12 tests? I am using github.com/pkg/errors in vfs/read_write.go (errors.Is()) to detect the ENOSPC error, and the go1.11 and go1.12 tests fail, complaining about the inclusion of the errors package. But this package seems to be used elsewhere in rclone too. Also, syscall.ENOSPC is accepted on all platforms except other_os. Any suggestions?

errors.Is is part of the standard library and was only introduced in go1.13, so you can't use it for the moment unless we conditionally compile the code for go1.13 and above.

pkg/errors just calls that, and it looks like it isn't compiled in for < go1.13.

I think we will have to write our own IsErrNoSpace() function...

See fserrors.Cause for an error unwrapper that should work for all Go versions.

I suggest you define an IsErrNoSpace() function in fs/fserrors which uses fserrors.Cause and compares the result with syscall.ENOSPC.

I would suspect (can you link the build results?) that it is only plan9 which doesn't define this, so you could conditionally compile the plan9 IsErrNoSpace to always return false.
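
A rough sketch of that layout (file names are hypothetical, and the direct comparison should really go through fserrors.Cause to unwrap e.g. an *os.PathError first):

// fs/fserrors/enospc_error.go (hypothetical file name) -- built everywhere
// except plan9, which does not define syscall.ENOSPC.
// +build !plan9

package fserrors

import "syscall"

// IsErrNoSpace reports whether err is an out-of-space error. A real
// implementation should unwrap err first (e.g. via fserrors.Cause), since
// ENOSPC usually arrives wrapped in an *os.PathError.
func IsErrNoSpace(err error) bool {
        return err == syscall.ENOSPC
}

// fs/fserrors/enospc_error_notsupported.go (hypothetical file name)
// +build plan9

package fserrors

// IsErrNoSpace always returns false on plan9, which has no ENOSPC.
func IsErrNoSpace(err error) bool {
        return false
}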