When I copy a large file to the mount (using either the OS file Explorer or cp command, not rclone copy):
The file is first copied into the local folder defined by --cache-dir
Then I can see my nic’s upload bandwidth being used as the file begins uploading to Gdrive
Being a large file this upload could take (say) an hour.
During this (say) hour, the X: drive/mount is locked and inaccessible for both reading and writing (in both the file Explorer as well as on the cmd line).
As soon as the upload is finished, X: drive is accessible again.
Can you access any directories in the mount? Say one which doesn’t have the file in?
So what it looks like is happening is that because a transaction is in progress (rclone won’t return success to WinFSP until the file is uploaded) that is blocking other transactions from happening.
I’m under the impression that WinFSP drivers are multi-threaded so the fact that this one call doesn’t return for an hour shouldn’t cause a problem, but maybe I’m wrong - @billziss-gh - do you have any thoughts?
Hmm, so testing some more today I have found this to be intermittent.
Whilst copying multiple files from the local drive to the mount:
Most of the time, during the upload to gdrive, the mount is locked (as reported above);
But for short periods it is unlocked, then locks up again.
I havn’t been able to detect a pattern to this. For example, it doesn’t seem to match the brief period between when one file finishes uploading and the next begins.
I will keep looking for a pattern. Meanwhile, any thoughts on what I might test/look for?
If you could have a go with the latest beta which has a couple of deadlock fixes and see if that helps.
Also if you could try with the latest WinFSP also
It might be useful if you have the rclone log scrolling in one window you keep visible so you can see the pauses in the rclone log - that might give you a clue.
I can access the mount in both ms-dos and other programs (yay!)
But Windows Exporer still chokes, yet not as badly as before. Previously it would freeze and I could not even close the Explorer window until after the upload had completed. Now, Explorer starts spinning unable to view any drives but it’s window can be closed. If I open it again, still spinning. Have to go into Task Manager and restart the Explorer process - but after doing this, Explorer can also see the mount (all the while with a large file uploading).
So definitely progress!
I am currently using the last stable Winfsp release. Now will try the latest beta Winfsp release and see if that changes anything.
Nope, it is back again. Mount is inaccessible during an upload, but was fine during an earlier upload. I will keep monitoring the logs and see if I can detect a pattern.
I worked out what is going on. My simple command line tests were missing the mv/rename step that my software was executing:
Copy a large file into the VFS mount copy test.png X:\subdir\test.png
File gets copied to the local vfs cache folder
File starts uploading to Gdrive
Whilst still uploading rename X:\subdir\test.png X:\subdir\test-renamed.png
Step 4 spins it’s wheels waiting for the mount to complete the upload transaction. After the upload finishes, the rename occurs.
I can see why this particular file would be locked during the upload, obviously we can’t change the original file in the vfs local cache until it has finished uploading.
But shouldn’t rclone/winfsp lock just the file being uploaded, not the whole mount?
Because it is definitely locking the whole mount until the upload transaction is finished.
* User mode file system locking strategy.
*
* Two concurrency models are provided:
*
* 1. A fine-grained concurrency model where file system NAMESPACE accesses
* are guarded using an exclusive-shared (read-write) lock. File I/O is not
* guarded and concurrent reads/writes/etc. are possible. [Note that the FSD
* will still apply an exclusive-shared lock PER INDIVIDUAL FILE, but it will
* not limit I/O operations for different files.]
*
* The fine-grained concurrency model applies the exclusive-shared lock as
* follows:
* <ul>
* <li>EXCL: SetVolumeLabel, Create, Cleanup(Delete), SetInformation(Rename)</li>
* <li>SHRD: GetVolumeInfo, Open, SetInformation(Disposition), ReadDirectory</li>
* <li>NONE: all other operations</li>
* </ul>
*
* 2. A coarse-grained concurrency model where all file system accesses are
* guarded by a mutually exclusive lock.
*
* @see FspFileSystemSetOperationGuardStrategy
Ie. does rclone/winsfp use FSP_FILE_SYSTEM_OPERATION_GUARD_STRATEGY_FINE or FSP_FILE_SYSTEM_OPERATION_GUARD_STRATEGY_COARSE?
The lock is placed on the whole mount even if we change the order to be:
Copy a large file into the VFS mount: copy test.png X:\subdir\test.png
File gets copied to the local vfs cache folder
Immediately and before the file starts uploading to Gdrive: rename X:\subdir\test.png X:\subdir\test-renamed.png
Then I see the nic bandwidth going to egress as the file starts uploading to Gdrive.
But at (2), the the whole mount is locked until (3) finishes.
What should happen is that rclone queues the rename until the file is uploaded. That works perfectly with rclone mount but with rclone cmount (which is rclone mount on Windows) it doesn’t.
I managed to replicate this on linux using cmount but not with mount, so there is some kind of extra locking going on.
There is no locking at all in the cmount layer and no locking that is in play in the cgofuse layer.
So the suspects are
locking in the VFS layer
some FUSE mount option
Hmm, I think what is happening is that the directory the file is in is being locked rather than the whole mount. For example I can still use df to read the size of the disk.
What we need to do is find out exactly what is happening when the rename call goes in.
I got a trace of the running goroutines when it locks up and there is nothing locked in a mutex and nothing out of the ordinary (no fuse transactions in progress) which means that it probably isn’t locking in the vfs layer causing this. and it more likely to be locking in FUSE/WinFSP FUSE layer.
So I think what we need to do next is experiment with FUSE options / WinFSP options.
I’ve got to dash now - will experiment more later!
Awesome, so it really does look like the Fuse options / Winfsp options then.
I was looking for docs on the winfsp site to see if I could test mounting with various options, but I got lost in there! I remember there was a comment within the winfsp github code about the 2 different locking strategies available ~ lock the whole drive or just the file. Relevant?
I couldn’t actually figure out which one of those modes rclone is using! Rclone doesn’t use WinFSP directly it uses it’s version of FUSE via cgofuse. That symbol isn’t mentioned in cgofuse - I didn’t check in the WinFSP FUSE layer yet though. It might well be relevant though!
…
I’ve had a look through the libfuse source code and yes, that is what it seems to do. I couldn’t find an option to turn it off either.
What is happening is because rclone hasn’t finished copying the file it doesn’t return from theRelease call that FUSE has called.
When the rename comes along FUSE checks to see whether there are any outstanding actions on that file and if there are then blocks until they have finished.
The reason that this works in rclone mount is that it doesn’t use libfuse it calls the kernel FUSE API directly so it doesn’t implement that waiting algorithm which is implemented entirely in libfuse and not in the kernel…
Here is an experiment which does the Release asynchronously. This should fix the problem for you (I only tried it on linux though). I’m not 100% sure of the consequences of this - I need to think a bit more!
Yeah @ncw I just checked from my PC (was remote before) and https://beta.rclone.org/branch/ doesn’t have the release linked above in there. The latest release is v1.45-053-gc6b64d55-onedrive-sharedfolders-pr-beta. Should I use that?