`--vfs-cache-mode writes` upload expected behaviour?

On Windows Server 2012:

  • Mount = Gdrive->Crypt (X: drive letter)
  • Using --vfs-cache-mode writes

When I copy a large file to the mount (using either the OS file Explorer or cp command, not rclone copy):

  • The file is first copied into the local folder defined by --cache-dir
  • Then I can see my nic’s upload bandwidth being used as the file begins uploading to Gdrive
  • Being a large file this upload could take (say) an hour.

During this (say) hour, the X: drive/mount is locked and inaccessible for both reading and writing (in both the file Explorer as well as on the cmd line).

As soon as the upload is finished, X: drive is accessible again.

Is this expected behaviour?

Probably this?


Hmm, could be, but probably not I’d say.

Can you access any directories in the mount? Say one which doesn’t have the file in?

So what it looks like is happening is that because a transaction is in progress (rclone won’t return success to WinFSP until the file is uploaded) that is blocking other transactions from happening.

I’m under the impression that WinFSP drivers are multi-threaded so the fact that this one call doesn’t return for an hour shouldn’t cause a problem, but maybe I’m wrong - @billziss-gh - do you have any thoughts?

No, cannot access any directories in the mount, including those without the file.

Hmm, that sounds like the whole file system is locked up rather than rclone’s locking…

If it is easy for you, can you try this on mac or linux and see if it has the same effect?

I don’t have a linux box handy but I just tested on osx Mojave and no problem there.

I tested on Mojave using both the 1.43 and 1.45 versions of rclone, both without issue.

Pretty sure this is Windows specific.

In case it is relevant, back on Windows, whilst the rclone drive (X:) is locked up, other local drives (eg. C:, F:) are all still accessible.

Hmm, so testing some more today I have found this to be intermittent.

Whilst copying multiple files from the local drive to the mount:

  • Most of the time, during the upload to gdrive, the mount is locked (as reported above);
  • But for short periods it is unlocked, then locks up again.

I havn’t been able to detect a pattern to this. For example, it doesn’t seem to match the brief period between when one file finishes uploading and the next begins.

I will keep looking for a pattern. Meanwhile, any thoughts on what I might test/look for?

If you could have a go with the latest beta which has a couple of deadlock fixes and see if that helps.

Also if you could try with the latest WinFSP also

It might be useful if you have the rclone log scrolling in one window you keep visible so you can see the pauses in the rclone log - that might give you a clue.

OK, so the latest beta improves things:

  • I can access the mount in both ms-dos and other programs (yay!)
  • But Windows Exporer still chokes, yet not as badly as before. Previously it would freeze and I could not even close the Explorer window until after the upload had completed. Now, Explorer starts spinning unable to view any drives but it’s window can be closed. If I open it again, still spinning. Have to go into Task Manager and restart the Explorer process - but after doing this, Explorer can also see the mount (all the while with a large file uploading).

So definitely progress!

I am currently using the last stable Winfsp release. Now will try the latest beta Winfsp release and see if that changes anything.

EDIT: This was intermittent again, see next post

With latest Winfsp beta and latest Rclone beta (same setup as this) I get the same outcome as the post above.

Ie. It all works but Windows Explorer flakes out if I try to access the mount during an upload and I then need to restart the Explorer process.

I can live with that, but it would be nice to know why it is happening :slight_smile: . All the logs are as previously reported and nothing in the Windows Event Log.

Nope, it is back again. Mount is inaccessible during an upload, but was fine during an earlier upload. I will keep monitoring the logs and see if I can detect a pattern.

I worked out what is going on. My simple command line tests were missing the mv/rename step that my software was executing:

  1. Copy a large file into the VFS mount copy test.png X:\subdir\test.png
  2. File gets copied to the local vfs cache folder
  3. File starts uploading to Gdrive
  4. Whilst still uploading rename X:\subdir\test.png X:\subdir\test-renamed.png

Step 4 spins it’s wheels waiting for the mount to complete the upload transaction. After the upload finishes, the rename occurs.

I can see why this particular file would be locked during the upload, obviously we can’t change the original file in the vfs local cache until it has finished uploading.

But shouldn’t rclone/winfsp lock just the file being uploaded, not the whole mount?

Because it is definitely locking the whole mount until the upload transaction is finished.

Is it perhaps related to this?

 * User mode file system locking strategy.
 * Two concurrency models are provided:
 * 1. A fine-grained concurrency model where file system NAMESPACE accesses
 * are guarded using an exclusive-shared (read-write) lock. File I/O is not
 * guarded and concurrent reads/writes/etc. are possible. [Note that the FSD
 * will still apply an exclusive-shared lock PER INDIVIDUAL FILE, but it will
 * not limit I/O operations for different files.]
 * The fine-grained concurrency model applies the exclusive-shared lock as
 * follows:
 * <ul>
 * <li>EXCL: SetVolumeLabel, Create, Cleanup(Delete), SetInformation(Rename)</li>
 * <li>SHRD: GetVolumeInfo, Open, SetInformation(Disposition), ReadDirectory</li>
 * <li>NONE: all other operations</li>
 * </ul>
 * 2. A coarse-grained concurrency model where all file system accesses are
 * guarded by a mutually exclusive lock.
 * @see FspFileSystemSetOperationGuardStrategy


One more piece of info:

The lock is placed on the whole mount even if we change the order to be:

  1. Copy a large file into the VFS mount: copy test.png X:\subdir\test.png
  2. File gets copied to the local vfs cache folder
    Immediately and before the file starts uploading to Gdrive: rename X:\subdir\test.png X:\subdir\test-renamed.png
  3. Then I see the nic bandwidth going to egress as the file starts uploading to Gdrive.

But at (2), the the whole mount is locked until (3) finishes.

Excellent debugging :smile:

What should happen is that rclone queues the rename until the file is uploaded. That works perfectly with rclone mount but with rclone cmount (which is rclone mount on Windows) it doesn’t.

I managed to replicate this on linux using cmount but not with mount, so there is some kind of extra locking going on.

mount works like this on linux

vfs -> mount -> FUSE

cmount works like this on linux

vfs -> cmount -> cgofuse -> FUSE

and this on windows

vfs -> cmount -> cgofuse -> WinFSP FUSE compatiblity layer -> WinFSP

There is no locking at all in the cmount layer and no locking that is in play in the cgofuse layer.

So the suspects are

  • locking in the VFS layer
  • some FUSE mount option

Hmm, I think what is happening is that the directory the file is in is being locked rather than the whole mount. For example I can still use df to read the size of the disk.

What we need to do is find out exactly what is happening when the rename call goes in.

I got a trace of the running goroutines when it locks up and there is nothing locked in a mutex and nothing out of the ordinary (no fuse transactions in progress) which means that it probably isn’t locking in the vfs layer causing this. and it more likely to be locking in FUSE/WinFSP FUSE layer.

So I think what we need to do next is experiment with FUSE options / WinFSP options.

I’ve got to dash now - will experiment more later!

I got a bit of definitive debug from FUSE

unique: 20, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 14311
getattr /100Ma
2018/12/28 17:24:32 DEBUG : /100Ma: Getattr: fh=0xFFFFFFFFFFFFFFFF
2018/12/28 17:24:32 DEBUG : /100Ma: >Getattr: errc=-2
   unique: 20, error: -2 (No such file or directory), outsize: 16
unique: 21, opcode: LOOKUP (1), nodeid: 1, insize: 45, pid: 14311
getattr /100M
2018/12/28 17:24:32 DEBUG : /100M: Getattr: fh=0xFFFFFFFFFFFFFFFF
2018/12/28 17:24:32 DEBUG : /100M: >Getattr: errc=0
   NODEID: 3
   unique: 21, success, outsize: 144
unique: 22, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 14311
getattr /100Ma
2018/12/28 17:24:32 DEBUG : /100Ma: Getattr: fh=0xFFFFFFFFFFFFFFFF
2018/12/28 17:24:32 DEBUG : /100Ma: >Getattr: errc=-2
   unique: 22, error: -2 (No such file or directory), outsize: 16
unique: 23, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 14311
getattr /100Ma
2018/12/28 17:24:32 DEBUG : /100Ma: Getattr: fh=0xFFFFFFFFFFFFFFFF
2018/12/28 17:24:32 DEBUG : /100Ma: >Getattr: errc=-2
   unique: 23, error: -2 (No such file or directory), outsize: 16
unique: 24, opcode: RENAME (12), nodeid: 1, insize: 59, pid: 14311
      PATH2 1

The last is the rename which blocks - and by the look of it it is queued - which means we are on the right track hopefully!

Awesome, so it really does look like the Fuse options / Winfsp options then.

I was looking for docs on the winfsp site to see if I could test mounting with various options, but I got lost in there! I remember there was a comment within the winfsp github code about the 2 different locking strategies available ~ lock the whole drive or just the file. Relevant?

I couldn’t actually figure out which one of those modes rclone is using! Rclone doesn’t use WinFSP directly it uses it’s version of FUSE via cgofuse. That symbol isn’t mentioned in cgofuse - I didn’t check in the WinFSP FUSE layer yet though. It might well be relevant though!

I’ve had a look through the libfuse source code and yes, that is what it seems to do. I couldn’t find an option to turn it off either.

What is happening is because rclone hasn’t finished copying the file it doesn’t return from theRelease call that FUSE has called.

When the rename comes along FUSE checks to see whether there are any outstanding actions on that file and if there are then blocks until they have finished.

The reason that this works in rclone mount is that it doesn’t use libfuse it calls the kernel FUSE API directly so it doesn’t implement that waiting algorithm which is implemented entirely in libfuse and not in the kernel…

Here is an experiment which does the Release asynchronously. This should fix the problem for you (I only tried it on linux though). I’m not 100% sure of the consequences of this - I need to think a bit more!

https://beta.rclone.org/branch/v1.45-053-g372f2c9b-vfs-async-release-beta/ (uploaded in 15-30 mins)

Sorry, that beta url is returning a 404 error for me.

Yeah @ncw I just checked from my PC (was remote before) and https://beta.rclone.org/branch/ doesn’t have the release linked above in there. The latest release is v1.45-053-gc6b64d55-onedrive-sharedfolders-pr-beta. Should I use that?