Excessive FspFileSystemRemoveMountPoint when using rclone mount with cache on Windows

What is the problem you are having with rclone?

We currently use ArcMap (ESRI ArcGIS) on Windows and need to run some operations on raster files. We use rclone mount to access those files, which are stored on our Ceph S3 storage cluster.

The problem is not the cluster or the networking: simply copying the data from the mountpoint to the local disk takes only about 4 minutes. The folder we are working with contains ~14000 files with a total size of ~27GB.

But when accessing the files through the mountpoint, our ArcMap job takes ages to run.

Here are some timings for running the same job:

  • 1h30s when files stored locally on host
  • 6m30s when files stored on a remote share over CIFS
  • 45 minutes or more when using rclone mount

Looking at the cache folder content, it is filling up as expected, and the rclone logs show that files are found in the cache. Our software reads the same files multiple times, which should be very fast with a cache, but that is not the case.

While digging further with Process Monitor from Sysinternals, we see rclone using a fair amount of CPU time, which may be normal. But when monitoring threads, we see a lot of cycles spent in winfsp-x64.dll!FspFileSystemRemoveMountPoint+0xa0.


We know that this filesystem emulation layer has a cost, but with an effective cache the penalty should be in the single-digit-percent range.
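For reference, a quick way to compare a cold read with a warm (cached) read through the mount, assuming the v: drive from the command below and one of the files that appears in the log (run it twice; the second run should be served from c:\rclone-cache):

powershell -Command "Measure-Command { Get-Content -Raw 'v:\DATA\ulaval\Rasters\bdtq_20k\tif\22a10101.tif.ovr' | Out-Null }"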

Any ideas? Bad parameters? An rclone or WinFSP problem?

Thanks

What is your rclone version (output from rclone version)

rclone v1.52.3

  • os/arch: windows/amd64
  • go version: go1.14.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows Server 2019

Which cloud storage system are you using? (eg Google Drive)

S3 Compatible Storage - Ceph

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount PROJECT:poc v: --fuse-flag --VolumePrefix=\s3.domain.com\geoindex -vv --read-only --attr-timeout 15m --dir-cache-time 15m --cache-dir c:\rclone-cache --vfs-cache-mode full --vfs-cache-max-age 1h --vfs-cache-poll-interval 5m --vfs-cache-max-size 50g

The rclone config contents with secrets removed.

[PROJECT]
type = s3
provider = Ceph
env_auth = false
access_key_id = <REDACTED>
secret_access_key = <REDACTED>
endpoint = s3.domain.com

A log from the command with the -vv flag

It outputs a lot of logs, but we can see that the cache is working:

2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Getattr: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: Releasedir: fh=0x12
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Releasedir: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/31h15202.PASS: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml(0xc001c92840): Opened existing cached copy with flags=O_RDONLY
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/31h15202.PASS: >Getattr: errc=-2
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml: >Read: n=119
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Getattr: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Getattr: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/HDR02.txt: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: Opendir:
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: OpenFile: flags=O_RDONLY, perm=-rwxrwxrwx
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml: Flush: fh=0x4A
2020/08/28 13:57:40 DEBUG : DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml(0xc001c92840): RWFileHandle.Flush ignoring flush on unwritten handle
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml: >Flush: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml: Release: fh=0x4A
2020/08/28 13:57:40 DEBUG : DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml(0xc001c92840): RWFileHandle.Release closing
2020/08/28 13:57:40 DEBUG : DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml(0xc001c92840): close:
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.ovr: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.ovr: >Getattr: errc=0
2020/08/28 13:57:40 DEBUG : DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml(0xc001c92840): >close: err=<nil>
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.aux.xml: >Release: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >OpenFile: fd=DATA/ulaval/Rasters/bdtq_20k/tif/ (r), err=<nil>
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Opendir: errc=0, fh=0x12
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Getattr: errc=0
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/HDR02.txt: >Getattr: errc=-2
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.ovr: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: Opendir:
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: OpenFile: flags=O_RDONLY, perm=-rwxrwxrwx
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >OpenFile: fd=DATA/ulaval/Rasters/bdtq_20k/tif/ (r), err=<nil>
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif: >Opendir: errc=0, fh=0x4A
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/21l16101.tfw: Getattr: fh=0xFFFFFFFFFFFFFFFF
2020/08/28 13:57:40 DEBUG : /DATA/ulaval/Rasters/bdtq_20k/tif/22a10101.tif.ovr: >Getattr: errc=0

hi,
apps with lots of random access do not perform well over mount.
are those stats correct, that local takes 10 times longer than the remote share?

have you tried the beta? it has a new vfs cache.
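if it helps, something like this should fetch and unpack it (assuming the usual beta download layout; the folder name inside the zip changes with each build, so run rclone.exe version from the extracted folder to confirm it reports a beta):

powershell -Command "Invoke-WebRequest https://beta.rclone.org/rclone-beta-latest-windows-amd64.zip -OutFile rclone-beta.zip"
powershell -Command "Expand-Archive rclone-beta.zip -DestinationPath c:\rclone-beta"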

I don't know if this software does a lot of seeks within files, but since everything is cached locally I would not expect it to take that long.

Regarding the rclone cache backend, I saw in the official docs that it is due to be phased out and that it is not that stable and not actively maintained.

Since we're read-only, I think we can try it with more confidence. But as far as I can see, it will be phased out and replaced by the VFS cache layer... Is that cache layer the one currently used by rclone mount? If so, should we dig deeper on that side for the poor performance we are currently seeing?

I suspect that WinFSP is doing something wrong. Would the rclone developers be able to pinpoint the real problem?

Thanks in advance
Bruno

the answer is no, you need to use the rclone beta as per the link i shared.

the cache is a backend and the vfs cache is a set of flags for the mount command.
two different things, two different sets of flags and tweaks.

the new vfs cache is the best of both: just a set of vfs flags, no cache backend needed.
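roughly, the cache backend would be an extra remote in the config, something like this (the remote name here is just an example):

[PROJECT-cache]
type = cache
remote = PROJECT:poc

rclone mount PROJECT-cache: v: --read-only

whereas the new vfs cache is just flags on the existing remote:

rclone mount PROJECT:poc v: --read-only --vfs-cache-mode full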

Ok, to be clear, should I try «rclone mount» with vfs cache OR «rclone cache» with the latest beta?

thanks

perhaps neither option is needed.

given the small amount of data and the read-only use, i would not bother with winfsp and rclone mount.

  1. rclone sync PROJECT:poc c:\path\to\local\folder
  2. run ArcMap on c:\path\to\local\folder
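as a rough sketch of step 1, with some flags that tend to help when listing ~14000 objects on s3 (the local folder name is just an example):

rclone sync PROJECT:poc c:\geoindex-local --fast-list --transfers 16 --checkers 16 -P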

I know it may seem stupid, but the sync or copy approach is not an option for our end users.

I'll try rclone mount from the latest beta release at https://beta.rclone.org

did not know you had end users, but i am curious as to why sync is not an option?

what version of winfsp?

i run my read-only mounts as system user, not local user.
you might try that.
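for example, a scheduled task can start the mount as SYSTEM at boot; something like this (the paths and task name here are just an example):

schtasks /Create /TN "rclone-geoindex" /TR "c:\rclone\rclone.exe mount PROJECT:poc v: --read-only --vfs-cache-mode full --cache-dir c:\rclone-cache" /SC ONSTART /RU SYSTEM

the task can then be kicked off right away with schtasks /Run /TN "rclone-geoindex" instead of waiting for a reboot.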

i have never used --fuse-flag --VolumePrefix=\s3.domain.com\geoindex
not sure it would make a difference, but you might remove it and test.

i also do not mount as a drive letter but to a mount point:
rclone mount PROJECT:poc b:\mount\geoindex
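as an example, with the parent folder already in place (if i remember the docs right, the final geoindex folder itself should not exist before mounting):

mkdir b:\mount
rclone mount PROJECT:poc b:\mount\geoindex --read-only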

Hi,
We use the latest version of WinFSP and the latest beta; there is no difference when using rclone mount to a local directory without --fuse-flag.

We also tried the «cache» backend and saw a little bit of improvement, but it is still orders of magnitude worse than locally or CIFS-stored files.

Something is inefficient somewhere.

i would update to the latest stable, v1.53.1

what version of winfsp are you using?

Currently WinFSP 2020.1 (1.7.20172)
I'll try again with latest stable.

1.53.1 is not available at https://downloads.rclone.org/

the website has a typo
in fact, the latest is v1.53.0.

how are you running the mount command: from the command line, task scheduler?
what user are you running the mount as: system user or something else?

have you read this and tried the various options and suggestions?
i would try with and without each and every flag.
https://rclone.org/commands/rclone_mount/#vfs-performance
"S3 and Swift benefit hugely from the --no-modtime flag"

https://rclone.org/commands/rclone_mount/#mount-as-a-network-drive
i would not use that --fuse-flag; yes, i know you wrote that you tested with and without,
but as per the docs, you should run without it unless you have a very specific reason for using it.
"In network drives, many I/O operations are optimized, as the high latency and low reliability (compared to a normal drive) of a network is expected."

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.