Rclone mount deadlock when --dir-cache-time strikes

What is the problem you are having with rclone?

The remote B2 storage that I've mounted with rclone mount works great for a while, but then hits a deadlock where any process interacting with that filesystem goes into an uninterruptible state (e.g., if I try to cp a file to it, the command never completes, and Ctrl-C will not kill the process).

This deadlock is absolutely tied to the --dir-cache-time option. When running with the default (5m), the deadlock occurs about 5 minutes after I start my sync process. If I crank it up (e.g., to 13 minutes), it takes exactly 13 minutes for the deadlock to occur.

To recover, I must Ctrl-C the actual rclone mount command (I've tried waiting hours to no avail), which brings any processes waiting on that filesystem back to life. One thing I find interesting is that rclone mount still outputs statistics on the defined interval, but nothing else appears to function -- the writeback cache does not write back, and the (read?) cache doesn't age objects out. After killing and remounting, however, the cache cleaning operations complete successfully.
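(For anyone hitting the same symptoms: a minimal way to confirm the hang on a Linux host like mine is to look for processes stuck in uninterruptible sleep (D state) on the mount. The fusermount call below is a hypothetical alternative to Ctrl-C-ing rclone, not something I've verified against this bug.)

# list processes in uninterruptible sleep, typically the ones stuck on the FUSE mount
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
# a lazy unmount may release them without Ctrl-C-ing the rclone process itself
fusermount -uz /mnt/backblaze_fbs-photos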

== More (probably useless) backstory ==

What I'm ultimately trying to accomplish is to copy about 1.5 billion relatively small files (images) from on-premises storage to B2. I would use rclone copy (or sync), except we're changing the folder structure of the images between the source and the destination, and I was unable to come up with a reliable and efficient way to do that with a copy/sync.

Run the command 'rclone version' and share the full output of the command.

- os/version: centos 7.9.2009 (64 bit)
- os/kernel: 3.10.0-1160.62.1.el7.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.6
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Backblaze B2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount \
 --dir-cache-time 13m \
 --no-modtime --no-seek \
 --transfers=50 \
 --vfs-cache-mode writes \
 --vfs-fast-fingerprint \
 --vfs-write-back 10s \
 --vfs-cache-max-age 30s \
 --vfs-cache-max-size 5g \
 --stats-one-line -P --stats=30s \
 --log-file /var/log/rclone_mount.log --log-level DEBUG \
 backblaze-fbs-photos:fbs-photos /mnt/backblaze_fbs-photos

The rclone config contents with secrets removed.

[backblaze-fbs-photos]
type = b2
account = <redacted>
key = <redacted>

A log from the command with the -vv flag

This is an abridged copy of the log file, with only the notable events included and the repetitive output removed.

2022/10/12 09:36:05 DEBUG : rclone: Version "v1.59.2" starting with parameters ["rclone" "mount" "--no-modtime" "--no-seek" "--transfers=50" "--vfs-cache-mode" "writes" "--vfs-fast-fingerprint" "--vfs-write-back" "10s" "--vfs-cache-max-age" "30s" "--vfs-cache-max-size" "5g" "-P" "--stats-one-line" "--stats=30s" "backblaze-fbs-photos:fbs-photos" "--log-file" "/var/log/rclone_mount.log" "--log-level" "DEBUG" "/mnt/backblaze_fbs-photos"]
2022/10/12 09:36:05 DEBUG : Creating backend with remote "backblaze-fbs-photos:fbs-photos"
2022/10/12 09:36:05 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/10/12 09:36:06 INFO  : B2 bucket fbs-photos: poll-interval is not supported by this remote
2022/10/12 09:36:06 DEBUG : vfs cache: root is "/root/.cache/rclone"
2022/10/12 09:36:06 DEBUG : vfs cache: data root is "/root/.cache/rclone/vfs/backblaze-fbs-photos/fbs-photos"
2022/10/12 09:36:06 DEBUG : vfs cache: metadata root is "/root/.cache/rclone/vfsMeta/backblaze-fbs-photos/fbs-photos"
2022/10/12 09:36:06 DEBUG : Creating backend with remote "/root/.cache/rclone/vfs/backblaze-fbs-photos/fbs-photos"
2022/10/12 09:36:06 DEBUG : Creating backend with remote "/root/.cache/rclone/vfsMeta/backblaze-fbs-photos/fbs-photos"
2022/10/12 09:36:07 DEBUG : Couldn't decode error response: EOF
2022/10/12 09:36:07 DEBUG : lbr/20160610200426187490000000-o.jpg: vfs cache: truncate to size=657169
2022/10/12 09:36:07 DEBUG : lbr/20160610200426187490000000-o.jpg: vfs cache: setting modification time to 2022-10-11 15:44:26.605852561 -0500 CDT
2022/10/12 09:36:07 INFO  : lbr/20160610200426187490000000-o.jpg: vfs cache: queuing for upload in 10s
2022/10/12 09:36:08 DEBUG : lbr: Added virtual directory entry vAddFile: "20160610200426187490000000-o.jpg"
2022/10/12 09:36:08 DEBUG : Couldn't decode error response: EOF
2022/10/12 09:36:08 DEBUG : lbr/20160610200426187490000000-t.jpg: vfs cache: truncate to size=13841
2022/10/12 09:36:08 DEBUG : lbr/20160610200426187490000000-t.jpg: vfs cache: setting modification time to 2022-10-11 15:44:26.628189065 -0500 CDT
2022/10/12 09:36:08 INFO  : lbr/20160610200426187490000000-t.jpg: vfs cache: queuing for upload in 10s
2022/10/12 09:36:08 DEBUG : lbr: Added virtual directory entry vAddFile: "20160610200426187490000000-t.jpg"
2022/10/12 09:36:08 DEBUG : Couldn't decode error response: EOF
2022/10/12 09:36:08 DEBUG : lbr/20160610200426187490000000.jpg: vfs cache: truncate to size=29994
2022/10/12 09:36:08 DEBUG : lbr/20160610200426187490000000.jpg: vfs cache: setting modification time to 2022-10-11 15:44:26.552869127 -0500 CDT
2022/10/12 09:36:08 INFO  : lbr/20160610200426187490000000.jpg: vfs cache: queuing for upload in 10s
2022/10/12 09:36:08 DEBUG : lbr: Added virtual directory entry vAddFile: "20160610200426187490000000.jpg"
2022/10/12 09:36:08 DEBUG : Couldn't decode error response: EOF

[... lots of debugging ...]

2022/10/12 09:36:29 DEBUG : rva/20220324195016759334000000-o.jpg: sha1 = 7f94597de2a1a73d760a9fb9eafe17f42c5d997d OK
2022/10/12 09:36:29 INFO  : rva/20220324195016759334000000-o.jpg: Copied (new)
2022/10/12 09:36:29 DEBUG : rva/20220324195016759334000000-o.jpg: vfs cache: fingerprint now "2512935,2022-10-11 20:44:30.157 +0000 UTC,7f94597de2a1a73d760a9fb9eafe17f42c5d997d"
2022/10/12 09:36:29 INFO  : rva/20220324195016759334000000-o.jpg: vfs cache: upload succeeded try #1
2022/10/12 09:36:29 DEBUG : rva/20221004175321567200000000.jpg: sha1 = 4249d7fe7b0d183a0824f30db983ef5a0c0f2aa7 OK
2022/10/12 09:36:29 INFO  : rva/20221004175321567200000000.jpg: Copied (new)
2022/10/12 09:36:29 DEBUG : rva/20221004175321567200000000.jpg: vfs cache: fingerprint now "15202,2022-10-11 20:44:31.242 +0000 UTC,4249d7fe7b0d183a0824f30db983ef5a0c0f2aa7"
2022/10/12 09:36:29 INFO  : rva/20221004175321567200000000.jpg: vfs cache: upload succeeded try #1
2022/10/12 09:36:30 DEBUG : rva/20171106185743246969000000-t.jpg: sha1 = 87e2b152c801ff9d985929335b7e7b5381684fdb OK
2022/10/12 09:36:30 INFO  : rva/20171106185743246969000000-t.jpg: Copied (new)
2022/10/12 09:36:30 DEBUG : rva/20171106185743246969000000-t.jpg: vfs cache: fingerprint now "28457,2022-10-11 20:44:27.125 +0000 UTC,87e2b152c801ff9d985929335b7e7b5381684fdb"
2022/10/12 09:36:30 INFO  : rva/20171106185743246969000000-t.jpg: vfs cache: upload succeeded try #1
2022/10/12 09:36:35 INFO  :    24.235 MiB / 24.235 MiB, 100%, 1.331 MiB/s, ETA 0s

[... lots of debugging ...]

2022/10/12 09:36:50 DEBUG : /: Attr: 
2022/10/12 09:36:50 DEBUG : /: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/10/12 09:36:51 DEBUG : /: ReadDirAll: 
2022/10/12 09:36:51 DEBUG : /: >ReadDirAll: item=5, err=<nil>
2022/10/12 09:36:51 DEBUG : /: Lookup: name="fcb"
2022/10/12 09:36:51 DEBUG : /: >Lookup: node=fcb/, err=<nil>
2022/10/12 09:36:51 DEBUG : fcb/: Attr: 
2022/10/12 09:36:51 DEBUG : fcb/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/10/12 09:36:51 DEBUG : /: Lookup: name="lbr"
2022/10/12 09:36:51 DEBUG : /: >Lookup: node=lbr/, err=<nil>
2022/10/12 09:36:51 DEBUG : lbr/: Attr: 
2022/10/12 09:36:51 DEBUG : lbr/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/10/12 09:36:51 DEBUG : /: Lookup: name="rva"
2022/10/12 09:36:51 DEBUG : /: >Lookup: node=rva/, err=<nil>
2022/10/12 09:36:51 DEBUG : rva/: Attr: 
2022/10/12 09:36:51 DEBUG : rva/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/10/12 09:37:05 INFO  :    24.235 MiB / 24.235 MiB, 100%, 196.638 KiB/s, ETA 0s
2022/10/12 09:37:05 DEBUG : /: Lookup: name="rva"
2022/10/12 09:37:05 DEBUG : /: >Lookup: node=rva/, err=<nil>
2022/10/12 09:37:05 DEBUG : rva/: Attr: 
2022/10/12 09:37:05 DEBUG : rva/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/10/12 09:37:05 DEBUG : rva/: Lookup: name="20200526164253237442000000.jpg"
2022/10/12 09:37:19 INFO  : rva/20080605193453522332000000-t.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20080605193453522332000000-t.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20080605193453522332000000-t.jpg was removed, freed 995 bytes
2022/10/12 09:37:19 INFO  : rva/20180505140139971851000000.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20180505140139971851000000.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20180505140139971851000000.jpg was removed, freed 54946 bytes
2022/10/12 09:37:19 INFO  : rva/20170508043340803652000000-o.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20170508043340803652000000-o.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20170508043340803652000000-o.jpg was removed, freed 632120 bytes
2022/10/12 09:37:19 INFO  : rva/20180829143827173616000000-o.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20180829143827173616000000-o.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20180829143827173616000000-o.jpg was removed, freed 103118 bytes
2022/10/12 09:37:19 INFO  : rva/20160407134413830587000000-t.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20160407134413830587000000-t.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20160407134413830587000000-t.jpg was removed, freed 2113 bytes
2022/10/12 09:37:19 INFO  : rva/20100528142058853552000000-o.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20100528142058853552000000-o.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20100528142058853552000000-o.jpg was removed, freed 11772 bytes
2022/10/12 09:37:19 INFO  : rva/20131101124146682705000000-o.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20131101124146682705000000-o.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20131101124146682705000000-o.jpg was removed, freed 308608 bytes

[... lots of debugging ...]

2022/10/12 09:37:19 INFO  : rva/20100413145233013372000000.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20100413145233013372000000.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20100413145233013372000000.jpg was removed, freed 4756 bytes
2022/10/12 09:37:19 INFO  : rva/20170508043340803652000000.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20170508043340803652000000.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20170508043340803652000000.jpg was removed, freed 17990 bytes
2022/10/12 09:37:19 INFO  : rva/20110315173442599061000000-o.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:37:19 DEBUG : rva/20110315173442599061000000-o.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:37:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20110315173442599061000000-o.jpg was removed, freed 26234 bytes
2022/10/12 09:37:19 INFO  : vfs cache: cleaned: objects 0 (was 172) in use 0, to upload 0, uploading 0, total size 0 (was 24.235Mi)
2022/10/12 09:37:35 INFO  :    24.235 MiB / 24.235 MiB, 100%, 84.976 KiB/s, ETA 0s
2022/10/12 09:38:05 INFO  :    24.235 MiB / 24.235 MiB, 100%, 84.976 KiB/s, ETA 0s
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20060311021156926783000000-o.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20060311021156926783000000-t.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20060311021156926783000000.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20060619204904205499000000-o.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20060619204904205499000000-t.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20060619204904205499000000.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20080605193453522332000000-o.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20080605193453522332000000-t.jpg"
2022/10/12 09:38:06 DEBUG : rva: Removed virtual directory entry vAddFile: "20080605193453522332000000.jpg"

[... lots of debugging ...]

2022/10/12 09:42:08 DEBUG : rva/20190204161540046985000000-t.jpg: sha1 = 2fde80202889a52005a13197530d3af255129737 OK
2022/10/12 09:42:08 INFO  : rva/20190204161540046985000000-t.jpg: Copied (new)
2022/10/12 09:42:08 DEBUG : rva/20190204161540046985000000-t.jpg: vfs cache: fingerprint now "12883,2022-10-12 14:41:48.642 +0000 UTC,2fde80202889a52005a13197530d3af255129737"
2022/10/12 09:42:08 DEBUG : rva/20190204161540046985000000-t.jpg: vfs cache: writeback object to VFS layer
2022/10/12 09:42:08 DEBUG : rva/20180427142348909961000000-o.jpg: sha1 = a7261611014b6a2d29d27981d69a2a8f6e2eddbd OK
2022/10/12 09:42:08 INFO  : rva/20180427142348909961000000-o.jpg: Copied (new)
2022/10/12 09:42:08 DEBUG : rva/20180427142348909961000000-o.jpg: vfs cache: fingerprint now "982191,2022-10-12 14:41:58.131 +0000 UTC,a7261611014b6a2d29d27981d69a2a8f6e2eddbd"
2022/10/12 09:42:08 DEBUG : rva/20180427142348909961000000-o.jpg: vfs cache: writeback object to VFS layer
2022/10/12 09:42:09 DEBUG : rva/20200723121018433702000000.jpg: sha1 = d3f9231607116563e6bc6cba447ad6acb7fbb71a OK
2022/10/12 09:42:09 INFO  : rva/20200723121018433702000000.jpg: Copied (new)
2022/10/12 09:42:09 DEBUG : rva/20200723121018433702000000.jpg: vfs cache: fingerprint now "14983,2022-10-12 14:41:55.185 +0000 UTC,d3f9231607116563e6bc6cba447ad6acb7fbb71a"
2022/10/12 09:42:09 DEBUG : rva/20200723121018433702000000.jpg: vfs cache: writeback object to VFS layer
2022/10/12 09:42:09 DEBUG : rva/20150317153826634067000000-o.jpg: sha1 = 5033e31e2d4e60f8ae08f3fe9ef607d8286c3613 OK
2022/10/12 09:42:09 INFO  : rva/20150317153826634067000000-o.jpg: Copied (new)
2022/10/12 09:42:09 DEBUG : rva/20150317153826634067000000-o.jpg: vfs cache: fingerprint now "5770432,2022-10-12 14:41:57.329 +0000 UTC,5033e31e2d4e60f8ae08f3fe9ef607d8286c3613"
2022/10/12 09:42:09 DEBUG : rva/20150317153826634067000000-o.jpg: vfs cache: writeback object to VFS layer
2022/10/12 09:42:19 DEBUG : rva/20180427142348909961000000-t.jpg: vfs cache: delaying writeback as --transfers exceeded
2022/10/12 09:42:19 INFO  : rva/20090601212417920141000000-t.jpg: vfs cache: removed cache file as Removing old cache file not in use
2022/10/12 09:42:19 DEBUG : rva/20090601212417920141000000-t.jpg: vfs cache: removed metadata from cache as Removing old cache file not in use
2022/10/12 09:42:19 INFO  : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item rva/20090601212417920141000000-t.jpg was removed, freed 1365 bytes
2022/10/12 09:42:19 DEBUG : rva/20180427142348909961000000-t.jpg: vfs cache: delaying writeback as --transfers exceeded


2022/10/12 09:42:19 DEBUG : rva/20190204161540046985000000-t.jpg: vfs cache: cancelling upload
2022/10/12 09:42:35 INFO  :   688.679 MiB / 688.679 MiB, 100%, 84.976 KiB/s, ETA 0s

hello and welcome to the forum,

i might not understand your case, but
if the flow is from source to dest, then you might set --dir-cache-time to something like 99999h.
for each file rclone copies to B2, rclone will update the vfs dir cache.
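(For reference, a sketch of that workaround against the mount command posted above -- only --dir-cache-time changes, everything else stays as in the original command:)

rclone mount \
 --dir-cache-time 99999h \
 --no-modtime --no-seek \
 --transfers=50 \
 --vfs-cache-mode writes \
 --vfs-fast-fingerprint \
 --vfs-write-back 10s \
 --vfs-cache-max-age 30s \
 --vfs-cache-max-size 5g \
 --stats-one-line -P --stats=30s \
 --log-file /var/log/rclone_mount.log --log-level DEBUG \
 backblaze-fbs-photos:fbs-photos /mnt/backblaze_fbs-photos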

Thanks for the "welcome" and the quick response. Cranking up --dir-cache-time is what I've done as a workaround -- I actually didn't stumble upon the fact that the deadlock follows the dir-cache-time setting until I was just about done with this post :slight_smile:. This workaround is working just fine for me right now, so I hope it continues!

I'm not sure why the dir cache timing out causes it to end up in such a weird state. Something doesn't seem quite right there. I feel like there's a bug somewhere, but I'm not sure whether it's in rclone or in FUSE.

Hi walkeran,

Just a quick guess:

Your mount may become unresponsive due to B2 overload or throttling of your requests. You are potentially doing 50 concurrent transfers while asking for a refresh of one or more folders that potentially contain a lot of files to be listed (you have 1.5M files in total in how many folders?). This causes a lot of requests and may delay the directory listing so much that it cannot complete before the next refresh is due, and then everything related to the directory probably freezes while the transfers and stats continue.

So it may just be you overloading B2 (or your internet connection), not a bug.

So it may just be you overloading B2

That's what I thought initially, too, but I don't believe that to be the case.

When the deadlock happens, I can let it sit for as long as I want and it will never recover. Last night I let it sit overnight, with no transfer activity, just hoping that it would snap out of it or finish what it was doing, but it remained stuck. However, if I bail out of the rclone mount and start it up again (regardless of how long I'd waited), it will spin up and function just fine.

Also, now that I have --dir-cache-time cranked way up, I'm not seeing this issue anymore. I've even doubled --transfers from 50 to 100 (which Backblaze suggested) and quadrupled the number of threads that I have throwing files at it.

One could argue that the request flow is slightly different now -- it no longer has to pull the directory listing to refresh the cache every 5 minutes, which could affect how Backblaze handles the traffic. But it's still the case that I can shut off the mount, purge the writeback cache to start over clean, spin the mount back up, and it will start working fine.

kill -SIGQUIT <pid of rclone> will dump the process stack trace, which should indicate whether any thread is deadlocked or not.

This does indeed sound like a deadlock. There have been quite a few deadlocks in the VFS - so many in fact that I made a conference talk on how to find, fix and avoid them!

As @darthShadow mentioned above, the first step is to kill -QUIT <pid of rclone> which will give a very long backtrace. If you can capture that and attach it here (or stick it on pastebin/github gists) then I'll analyse it and work out what is going wrong.

If you could do this with the latest beta that would be most helpful.
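(In case it helps anyone else reproducing this, a minimal sketch of capturing that goroutine dump; the dump goes to rclone's stderr, so with --log-file set as in the mount command above it should land at the end of that log file:)

# find the stuck rclone mount process
pidof rclone
# ask the Go runtime for a full goroutine dump (same as kill -SIGQUIT)
kill -QUIT <pid of rclone>
# if stderr wasn't captured via --log-file, redirect it when starting the mount, e.g.
#   rclone mount ... 2> /tmp/rclone_backtrace.txt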

@ncw, thanks for the pointers! As soon as I got this message yesterday, I set out to get you the backtrace. Sure enough, just as you'd expect, as soon as I wanted it to fail, it wouldn't.

Finally, this morning, though, I got one of my instances to lock up. Sadly, this one was running the same version as before, not the latest beta. But I did get the backtrace.

I still have 1.4 billion files to copy, so there's definitely still a chance that I will be able to get you a backtrace from a beta instance.

Thanks again!

Thank you, that is very helpful

Here are the edited highlights of the condensed backtrace (thanks again panicparse!)

1: semacquire [16 minutes] [Created by fs.(*Server).Serve @ serve.go:510]
    sync      sema.go:71                  runtime_SemacquireMutex(0, 0x24, #66)
    sync      mutex.go:162                (*Mutex).lockSlow(#26)
    sync      mutex.go:81                 (*Mutex).Lock(...)
    vfscache  cache.go:302                (*Cache).InUse(#25, {#117, #196})    <---- BLOCKED on acquiring vfscache.Cache lock
    vfs       dir.go:530                  (*Dir)._purgeVirtual(#27)
    vfs       dir.go:552                  (*Dir)._newManageVirtuals(...)
    vfs       dir.go:614                  (*Dir)._readDirFromEntries(#27, {#147, 0x3df8b, #31}, 0, {#18, 3, 0})
    vfs       dir.go:461                  (*Dir)._readDir(#27)
    vfs       dir.go:702                  (*Dir).stat(#27, {#63, 0x20}) <---- ACQUIRED vfs.Dir lock
    vfs       dir.go:799                  (*Dir).Stat(#1, {#63, #2})
    mount     dir.go:77                   (*Dir).Lookup(#24, {#4, #24}, #305, #75)
    fs        serve.go:1187               (*Server).handleRequest(#29, {#7, #103}, {#4, #24}, #19, {#6, #305}, #32)
    fs        serve.go:1015               (*Server).serve(#29, {#6, #305})
    fs        serve.go:512                (*Server).Serve.func1()
50: semacquire [17 minutes] [Created by writeback.(*WriteBack).processItems @ writeback.go:450]
    sync      sema.go:71                  runtime_SemacquireMutex(*, 0, *)
    sync      mutex.go:162                (*Mutex).lockSlow(#28)
    sync      mutex.go:81                 (*Mutex).Lock(...)
    sync      rwmutex.go:139              (*RWMutex).Lock(*)
    vfs       dir.go:373                  (*Dir).addObject(#27, {#9, *}) <---- BLOCKED on acquiring vfs.Dir lock
    vfs       file.go:462                 (*File).setObject(*, {#8, *})
    vfscache  item.go:604                 (*Item)._store(*, {#7, *}, *)
    vfscache  item.go:615                 (*Item).store(0x1a, {#7, *}, 0)
    vfscache  item.go:714                 (*Item).Close.func2({#7, *})
    writeback writeback.go:354            (*WriteBack).upload(#17, {#7, *}, *)
1: chan receive [16 minutes] [Created by vfscache.New @ cache.go:144] <--- this is the go routine doing the --vfs-cache-max-size cleaning
    writeback writeback.go:403            (*WriteBack)._cancelUpload(#17, #273)  <--- waiting for upload to be cancelled
    writeback writeback.go:290            (*WriteBack)._remove(#17, 0x17f)
    writeback writeback.go:308            (*WriteBack).Remove(#33, 0)
    vfscache  item.go:862                 (*Item)._remove(#124, {#3, 0x22})
    vfscache  item.go:910                 (*Item).RemoveNotInUse(#124, #13, 0)
    vfscache  cache.go:573                (*Cache).removeNotInUse(#25, #124, 0x38, 0)
    vfscache  cache.go:662                (*Cache).purgeOld(#25, 0)    <---- ACQUIRED vfscache.Cache lock
    vfscache  cache.go:745                (*Cache).clean(#25, 0)
    vfscache  cache.go:811                (*Cache).cleaner(#25, {#7, #23})
1: syscall [Created by mount.mount @ mount.go:98]
    syscall   asm_linux_amd64.s:20        Syscall(0, 0xe, #34, 0x21000)
    syscall   zsyscall_linux_amd64.go:696 read(#10, {#34, 0x20, 0})
    syscall   syscall_unix.go:188         Read(...)
    fuse      fuse.go:580                 (*Conn).ReadRequest(#20)
    fs        serve.go:501                (*Server).Serve(#29, {#5, #22})
    mount     mount.go:99                 mount.func2()
1: select [Created by cmd.StartStats @ cmd.go:366]
    cmd       cmd.go:370                  StartStats.func2()
1: chan receive [84 minutes] [Created by atexit.Register.func1 @ atexit.go:44]
    atexit    atexit.go:45                Register.func1.1()
1: select [Created by view.init.0 @ worker.go:34]
    view      worker.go:276               (*worker).start(#15)

The thing that appears to have jammed everything else up is the fact that rclone has asked for an upload to be cancelled, but the cancel request has never finished.

This means that you should find in your log, probably 18 minutes before the end, a log line like

	fs.Debugf(wbItem.name, "vfs cache: cancelling upload")

Any logs to do with whatever that item is would be very useful.

Why rclone is deciding to remove a file that is being uploaded is interesting. That shouldn't happen!

I don't immediately see the link with --dir-cache-time but I'm prepared to believe there is one!
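To make that cycle concrete, here is a minimal Go sketch (not rclone's actual code -- the names simply mirror the locks in the backtrace above): the Lookup goroutine holds the Dir lock and wants the Cache lock, the cleaner holds the Cache lock and waits on a channel, and the uploader that would close that channel is itself waiting on the Dir lock, so none of the three can make progress.

package main

import (
    "fmt"
    "sync"
    "time"
)

var (
    dirMu    sync.Mutex            // stands in for the vfs.Dir lock
    cacheMu  sync.Mutex            // stands in for the vfscache.Cache lock
    uploaded = make(chan struct{}) // stands in for the writeback completion/cancel signal
)

func lookup() {
    dirMu.Lock() // ACQUIRED vfs.Dir lock (Dir.stat)
    defer dirMu.Unlock()
    cacheMu.Lock() // BLOCKED: the cleaner holds the Cache lock (Cache.InUse)
    defer cacheMu.Unlock()
}

func uploader() {
    defer close(uploaded) // would let the cleaner's cancel complete
    dirMu.Lock()          // BLOCKED: lookup holds the Dir lock (Dir.addObject)
    defer dirMu.Unlock()
}

func cleaner() {
    cacheMu.Lock() // ACQUIRED vfscache.Cache lock (Cache.purgeOld)
    defer cacheMu.Unlock()
    <-uploaded // BLOCKED: waits for an upload that can never finish (_cancelUpload)
}

func main() {
    go cleaner()
    time.Sleep(100 * time.Millisecond) // let the cleaner take the Cache lock
    go lookup()
    time.Sleep(100 * time.Millisecond) // let lookup take the Dir lock
    go uploader()
    time.Sleep(2 * time.Second)
    fmt.Println("all three goroutines are still blocked -- a lock-ordering cycle")
}

Run it and the final print fires with none of the three goroutines having moved; send the process SIGQUIT and you get the same shape of backtrace as above, just smaller.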

Did your disk get over the quota of 5GB in the cache?

We fixed a bug in the beta which could conceivably be the same problem if your disk is nudging over quota (which seems likely from the backtrace).

Nick,

Found it! In fact it was quite easy to find -- literally the last DEBUG line before the deadlock. Here are all of the log lines that pertain to that file:

2022/10/14 08:36:12 DEBUG : sha/: Lookup: name="20190531203216780282000000.jpg"
2022/10/14 08:36:12 DEBUG : sha/: Create: name="20190531203216780282000000.jpg"
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: Open: flags=O_WRONLY|O_CREATE|O_EXCL
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: newRWFileHandle:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): openPending:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: truncate to size=0
2022/10/14 08:36:12 DEBUG : sha: Added virtual directory entry vAddFile: "20190531203216780282000000.jpg"
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): >openPending: err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >newRWFileHandle: err=<nil>
2022/10/14 08:36:12 DEBUG : sha: Added virtual directory entry vAddFile: "20190531203216780282000000.jpg"
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >Open: fd=sha/20190531203216780282000000.jpg (rw), err=<nil>
2022/10/14 08:36:12 DEBUG : sha/: >Create: node=sha/20190531203216780282000000.jpg, handle=&{sha/20190531203216780282000000.jpg (rw)}, err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: Attr:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >Attr: a=valid=1s ino=0 size=0 mode=-rw-r--r--, err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: Write: len=52106, offset=0
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): _writeAt: size=52106, off=0
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): >_writeAt: n=52106, err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: >Write: written=52106, err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: Attr:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >Attr: a=valid=1s ino=0 size=52106 mode=-rw-r--r--, err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: Open: flags=OpenReadOnly
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: Open: flags=O_RDONLY
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: newRWFileHandle:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >newRWFileHandle: err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >Open: fd=sha/20190531203216780282000000.jpg (rw), err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: >Open: fh=&{sha/20190531203216780282000000.jpg (rw)}, err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: Read: len=53248, offset=0
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): _readAt: size=53248, off=0
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): openPending:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: truncate to size=52106
2022/10/14 08:36:12 DEBUG : sha: Added virtual directory entry vAddFile: "20190531203216780282000000.jpg"
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): >openPending: err=<nil>
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): >_readAt: n=52106, err=EOF
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: >Read: read=52106, err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: Flush:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): RWFileHandle.Flush
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: >Flush: err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: Release:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): RWFileHandle.Release
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): close:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9f40): >close: err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: >Release: err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: Release:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): RWFileHandle.Release
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): close:
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: setting modification time to 2022-10-14 08:36:12.296898707 -0500 CDT m=+4013.179873808
2022/10/14 08:36:12 INFO  : sha/20190531203216780282000000.jpg: vfs cache: queuing for upload in 10s
2022/10/14 08:36:12 DEBUG : sha/20190531203216780282000000.jpg(0xc030cc9740): >close: err=<nil>
2022/10/14 08:36:12 DEBUG : &{sha/20190531203216780282000000.jpg (rw)}: >Release: err=<nil>
2022/10/14 08:36:19 DEBUG : vfs cache RemoveNotInUse (maxAge=30000000000, emptyOnly=false): item sha/20190531203216780282000000.jpg not removed, freed 0 bytes
2022/10/14 08:36:22 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: starting upload
2022/10/14 08:36:22 DEBUG : sha/20190531203216780282000000.jpg: sha1 = bb28e7805a1b0f05035a4e878317655b1755c0a8 OK
2022/10/14 08:36:22 INFO  : sha/20190531203216780282000000.jpg: Copied (new)
2022/10/14 08:36:22 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: fingerprint now "52106,2022-10-14 13:36:12.296 +0000 UTC,bb28e7805a1b0f05035a4e878317655b1755c0a8"
2022/10/14 08:36:22 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: writeback object to VFS layer
2022/10/14 08:37:19 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: cancelling writeback (uploading true) 0xc0306e6b60 item 250267
2022/10/14 08:37:19 DEBUG : sha/20190531203216780282000000.jpg: vfs cache: cancelling upload

I'm pretty sure it did not, though I'm not certain on that. I'll definitely keep my eye open for that if/when it happens again.

I've got the beta running on 3 of 4 servers that are tackling this sync now. I suppose the fact that I haven't run into this problem on them, yet, is perhaps a sign that commit fixed it. I'll keep you updated.

Thanks again!
Andy

(Edit: Fixed DEBUG output which included some unrelated lines)

Thank you for your logs.

I think I've figured out what is going on - the cache cleaner is trying to reap the object because it got marked as clean too early.

Hopefully this will fix it - can you give it a go please?

v1.60.0-beta.6492.016ea95f7.fix-vfs-deadlock on branch fix-vfs-deadlock (uploaded in 15-30 mins)

Thanks for the quick turnarounds!

I've been trying to test the 6492 release over the last day, and have been running into some issues. The problems aren't specific to that release, however, and have slowly gotten worse as my sync has progressed. What I appear to be running into is that there are now enough file entries in each of the sub-folders I'm syncing to that the initialization process (a directory listing of sorts?) is failing.

The specific sequence of events: I can run the rclone mount, do an ls in the root, get the list of about 10 sub-folders, and everything is fine; but as soon as I try to cp a file into one of those folders, things start to go poorly. This step was fairly quick early on in this project, but has taken progressively longer as the sub-folders have filled up. Now it's taking multiple hours, and the amount of memory (not disk) that the rclone mount process uses grows out of control.

This is a real shot in the dark, but is there a way to prevent a directory listing from happening while I'm copying files into a folder? I'm starting to think that maybe I should have adjusted our folder structure to split the contents into a deeper tree, but it would be a bummer to have to start over at this point.

Thoughts?

Thanks,
Andy

May be helpful: --no-check-dest

Thanks @darthShadow. I'm not sure if/how that flag affects how rclone mount works. For some reason I thought it only worked for rclone sync and rclone copy... but I digress. Either way, I started a mount with --no-check-dest about an hour ago, and the sync still hasn't started -- the mount process is still just slowly eating up RAM. So far, it feels like --no-check-dest isn't affecting the performance problem I'm hitting.

After letting it run all night, one of my sync processes actually did end up starting, though, so that might be enough to verify that the beta fixes the VFS bug I was running into.

Yeah, it only works for copy/sync, which may be faster for your case. There is no way to prevent a mount from reading the remote, since it needs to know which files are present.
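(For completeness, the context in which that flag applies -- a hypothetical copy invocation; the local source path is made up, and --no-check-dest uploads without listing or checking the destination at all:)

rclone copy /data/images/lbr backblaze-fbs-photos:fbs-photos/lbr \
 --no-check-dest --transfers=50 -P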

Listing directories should be quick with B2 unless we've messed something up!

How many files do you have in a directory? That will be the limiting factor on memory.
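(A rough way to measure that from the B2 side, reusing the bucket and folder names seen earlier in the thread:)

# count the objects in one of the top-level folders
rclone lsf --files-only backblaze-fbs-photos:fbs-photos/rva | wc -l
# or let rclone report the object count and total size
rclone size backblaze-fbs-photos:fbs-photos/rva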
