ExecStop / systemd + rclone - not exiting correctly

What is the problem you are having with rclone?

When I run systemctl restart/stop, rclone does not stop properly: it is left in a stale state on the machine, and the mounted file system does not unmount. Once it is stopped, I cannot restart the systemd service as it is in a broken state.

My belief is that rclone is not stopping cleanly: fusermount cannot forcefully unmount the remote, and/or the remote-control port is not being released, leaving the process hung. From the looks of it, this happens if you have a daemon process continually spawning, or if rclone gets into a bad state.

What is your rclone version (output from rclone version)

rclone version
rclone v1.53.1
- os/arch: linux/amd64
- go version: go1.15

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 20.04.1

Which cloud storage system are you using? (eg Google Drive)

Google Drive File Stream - Shared Drives

The command you were trying to run (eg rclone copy /tmp remote:tmp)

The rclone config contents with secrets removed.

A log from the command with the -vv flag

andrew@nas:/etc/systemd/system$ sudo systemctl status rclone-ebooks-crypt.service 
● rclone-ebooks-crypt.service - RClone Service
     Loaded: loaded (/etc/systemd/system/rclone-ebooks-crypt.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-10-01 15:40:00 PDT; 2min 32s ago
   Main PID: 2191 (rclone)
      Tasks: 12 (limit: 57733)
     Memory: 42.0M
     CGroup: /system.slice/rclone-ebooks-crypt.service
             └─2191 /usr/bin/rclone mount gdriveebooks-crypt: /mnt/rclone/gebooks/books --allow-other --buffer-size 256M --dir-cache-time 1000h --log-level INFO --log-file /var/log/rclone/books-mount.log --poll-interval 15s --timeout 1>

Oct 01 15:39:58 nas systemd[1]: Starting RClone Service...
Oct 01 15:40:00 nas systemd[1]: Started RClone Service.
andrew@nas:/etc/systemd/system$ sudo ps aux | grep 2191
andrew      2191  0.1  0.1 742824 58880 ?        Ssl  15:39   0:00 /usr/bin/rclone mount gdriveebooks-crypt: /mnt/rclone/gebooks/books --allow-other --buffer-size 256M --dir-cache-time 1000h --log-level INFO --log-file /var/log/rclone/books-mount.log --poll-interval 15s --timeout 1h --umask 002 --rc --rc-addr 127.0.0.1:5584
andrew     32976  0.0  0.0   6432  2612 pts/0    S+   15:42   0:00 grep --color=auto 2191
andrew@nas:/etc/systemd/system$ sudo systemctl stop rclone-ebooks-crypt
andrew@nas:/etc/systemd/system$ sudo ps aux | grep 2191
andrew      2191  0.1  0.1 742824 58880 ?        Ssl  15:39   0:00 /usr/bin/rclone mount gdriveebooks-crypt: /mnt/rclone/gebooks/books --allow-other --buffer-size 256M --dir-cache-time 1000h --log-level INFO --log-file /var/log/rclone/books-mount.log --poll-interval 15s --timeout 1h --umask 002 --rc --rc-addr 127.0.0.1:5584
andrew     36470  0.0  0.0   6432  2500 pts/0    S+   15:42   0:00 grep --color=auto 2191
andrew@nas:/etc/systemd/system$ sudo journalctl -u rclone-ebooks-crypt.service
-- Logs begin at Sun 2020-09-27 11:31:42 PDT, end at Thu 2020-10-01 15:54:31 PDT. --
Sep 27 11:32:17 nas systemd[1]: Stopping RClone Service...
Sep 27 11:32:17 nas systemd[1]: rclone-ebooks-crypt.service: Succeeded.
Sep 27 11:32:17 nas systemd[1]: Stopped RClone Service.
-- Reboot --
Sep 27 11:32:36 nas systemd[1]: Starting RClone Service...
Sep 27 11:32:37 nas systemd[1]: Started RClone Service.
Oct 01 15:33:45 nas systemd[1]: Stopping RClone Service...
Oct 01 15:33:45 nas systemd[1]: rclone-ebooks-crypt.service: Succeeded.
Oct 01 15:33:45 nas systemd[1]: Stopped RClone Service.
Oct 01 15:33:45 nas systemd[1]: rclone-ebooks-crypt.service: Found left-over process 2036 (rclone) in control group while starting unit. Ignoring.
Oct 01 15:33:45 nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Oct 01 15:33:45 nas systemd[1]: Starting RClone Service...
Oct 01 15:33:45 nas systemd[1]: rclone-ebooks-crypt.service: Main process exited, code=exited, status=1/FAILURE
Oct 01 15:33:45 nas systemd[1]: rclone-ebooks-crypt.service: Failed with result 'exit-code'.
Oct 01 15:33:45 nas systemd[1]: Failed to start RClone Service.
Oct 01 15:33:50 nas systemd[1]: rclone-ebooks-crypt.service: Scheduled restart job, restart counter is at 1.
Oct 01 15:33:50 nas systemd[1]: Stopped RClone Service.
Oct 01 15:33:50 nas systemd[1]: rclone-ebooks-crypt.service: Found left-over process 2036 (rclone) in control group while starting unit. Ignoring.
Oct 01 15:33:50 nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Oct 01 15:33:50 nas systemd[1]: Starting RClone Service...
Oct 01 15:33:50 nas systemd[1]: rclone-ebooks-crypt.service: Main process exited, code=exited, status=1/FAILURE
Oct 01 15:33:50 nas systemd[1]: rclone-ebooks-crypt.service: Failed with result 'exit-code'.
Oct 01 15:33:50 nas systemd[1]: Failed to start RClone Service.
(The same five-second stop/start/fail cycle repeats, with the restart counter climbing from 2 to 62.)
Oct 01 15:39:06 nas systemd[1]: Starting RClone Service...
Oct 01 15:39:06 nas systemd[1]: rclone-ebooks-crypt.service: Main process exited, code=exited, status=1/FAILURE
Oct 01 15:39:06 nas systemd[1]: rclone-ebooks-crypt.service: Failed with result 'exit-code'.
Oct 01 15:39:06 nas systemd[1]: Failed to start RClone Service.
Oct 01 15:39:11 nas systemd[1]: rclone-ebooks-crypt.service: Scheduled restart job, restart counter is at 63.
Oct 01 15:39:11 nas systemd[1]: Stopped RClone Service.
Oct 01 15:39:11 nas systemd[1]: rclone-ebooks-crypt.service: Found left-over process 2036 (rclone) in control group while starting unit. Ignoring.
Oct 01 15:39:11 nas systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Oct 01 15:39:11 nas systemd[1]: Starting RClone Service...
Oct 01 15:39:11 nas systemd[1]: rclone-ebooks-crypt.service: Main process exited, code=exited, status=1/FAILURE
Oct 01 15:39:11 nas systemd[1]: rclone-ebooks-crypt.service: Failed with result 'exit-code'.
Oct 01 15:39:11 nas systemd[1]: Failed to start RClone Service.
(The cycle repeats through restart counter 66.)
Oct 01 15:39:32 nas systemd[1]: rclone-ebooks-crypt.service: Stop job pending for unit, delaying automatic restart.
Oct 01 15:39:37 nas systemd[1]: rclone-ebooks-crypt.service: Stop job pending for unit, delaying automatic restart.
Oct 01 15:39:37 nas systemd[1]: rclone-ebooks-crypt.service: Got notification message from PID 2036, but reception only permitted for main PID which is currently not known
Oct 01 15:39:38 nas systemd[1]: Stopped RClone Service.
-- Reboot --
Oct 01 15:39:58 nas systemd[1]: Starting RClone Service...
Oct 01 15:40:00 nas systemd[1]: Started RClone Service.
Oct 01 15:42:51 nas systemd[1]: Stopping RClone Service...
Oct 01 15:42:51 nas systemd[1]: rclone-ebooks-crypt.service: Succeeded.
Oct 01 15:42:51 nas systemd[1]: Stopped RClone Service.

rclone-ebooks-crypt.service

[Unit]
Description=RClone Service
Wants=network-online.target
Before=docker.service
After=network-online.target

[Service]
Type=notify
Environment=RCLONE_CONFIG=/home/andrew/.config/rclone/rclone.conf
KillMode=none
RestartSec=5
ExecStart=/usr/bin/rclone mount gdriveebooks-crypt: /mnt/rclone/gebooks/books \
--allow-other \
--buffer-size 256M \
--dir-cache-time 1000h \
--log-level INFO \
--log-file /var/log/rclone/books-mount.log \
--poll-interval 15s \
--timeout 1h \
--umask 002 \
--rc \
--rc-addr 127.0.0.1:5584

ExecStop=/bin/fusermount -uz /mnt/rclone/gebooks/books
Restart=on-failure
User=andrew
Group=andrew

[Install]
WantedBy=multi-user.target

Log file: /var/log/rclone/books-mount.log

2020/10/01 15:39:59 NOTICE: Serving remote control on http://127.0.0.1:5563/
2020/10/01 15:39:59 NOTICE: Serving remote control on http://127.0.0.1:5584/
2020/10/01 15:40:00 NOTICE: Serving remote control on http://127.0.0.1:5563/
2020/10/01 15:40:05 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:10 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:15 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:20 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:26 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:31 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:36 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:41 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:47 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:52 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:40:57 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:02 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:08 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:13 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:18 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:23 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:29 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:34 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:39 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:44 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:49 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:41:55 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:42:00 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
2020/10/01 15:42:05 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5563: bind: address already in use
[gdriveebooks]
type = drive
client_id = OMIT.apps.googleusercontent.com
client_secret = OMIT
token = {"access_token":"","token_type":"Bearer","refresh_token":"","expiry":"2020-10-01T16:42:29.277101958-07:00"}
team_drive = DriveFolderID

[gdriveebooks-crypt]
type = crypt
remote = gdriveebooks:/eBooks/
password = OMIT
password2 = OMIT1

The way fusermount -uz works is that it is a lazy unmount, so you have to stop the IO to that mount point before it will actually unmount.

With KillMode=none (which you should use), the stop just times out eventually, as I'd guess you still have IO hitting / accessing the mount point.

You'd want to stop the services hitting the mount before stopping rclone. I have those dependencies expressed with Requires= in my systemd setup.
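For anyone wanting to replicate that, a minimal sketch of the dependency wiring could look like the fragment below (the unit names here are hypothetical):

```ini
# plex.service (a hypothetical consumer of the mount)
[Unit]
# Stop/start this unit together with the mount service,
# and order it strictly after the mount is up.
Requires=rclone-mount.service
After=rclone-mount.service
```

With Requires= plus After=, stopping the mount service stops this consumer first (shutdown ordering is the reverse of startup), so no open file handles are left to block fusermount.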

I have had this issue since forever too.

I just fix it with

ExecStartPre=-/bin/fusermount -uz /YOUR/FOLDER

Don't remove the leading -; it tells systemd to ignore the command's exit status.
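In context, that line sits in the [Service] section before ExecStart. This fragment is just a sketch, with the path and remote name as placeholders:

```ini
[Service]
# The "-" prefix makes systemd ignore a non-zero exit
# (e.g. nothing was mounted yet on a clean first start).
ExecStartPre=-/bin/fusermount -uz /YOUR/FOLDER
ExecStart=/usr/bin/rclone mount remote: /YOUR/FOLDER
```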

If no IO is hitting the mount, the first fusermount just finishes and you can remount. Running multiple fusermounts should not change it.

Do you normally have IO hitting the mount still ?

If I'm restarting the rclone mount, then I don't care whether something is writing to it or not. I want it to just stop, done.

I always restart my mounts with this ExecStartPre= command and it just works.

Right, I understand that, but fuse won't give up the mount, and running that command won't do anything.

Here's my mount:

Here's mine with some process using the mount:


The second fusermount does nothing as the mount isn't found:

The mount is still active until I remove my processes from hitting the mount.

Once I exit the process that was using the mount point, it unmounts from the first fusermount.

I'll chime in here, as @Animosity022 and @random404 know, I have recently been experiencing exactly the same issue. It is quickly solved by killing the processes and then restarting the mount; however, it's an annoyance.

I believe mine is solely down to IO from docker containers causing fusermount to fail to unmount cleanly and leave processes behind.
@Animosity022 I know I said it was happening with docker disabled, but I believe I was fooled by the fact that I was still having the issues following a full system reboot; this was probably due to the left-over processes from when Docker had been running.

@random404 your workaround doesn't work for me as there is nothing to unmount.

@lkno do you happen to have any docker containers accessing your mount? also you didn't say, but is this a new issue?

Killing the mount with processes on it leaves the mount point in an error state:

felix@gemini:~$ ps -ef | grep rclone | grep test
felix    2251500 2220729  0 09:03 pts/0    00:00:00 rclone mount gcrypt: /home/felix/test
felix@gemini:~$ date
Fri 02 Oct 2020 09:03:53 AM EDT
felix@gemini:~$ kill 2251500
felix@gemini:~$ date
Fri 02 Oct 2020 09:04:08 AM EDT
felix@gemini:~$

and you'd see

felix@gemini:~/test$ ls
mounted  Movies  TV
felix@gemini:~/test$ date
Fri 02 Oct 2020 09:03:50 AM EDT
felix@gemini:~/test$ ls
ls: cannot open directory '.': Transport endpoint is not connected
felix@gemini:~/test$ date
Fri 02 Oct 2020 09:04:07 AM EDT
felix@gemini:~/test$

Which is why KillMode=none is in the unit file: the mount should not be killed, and the proper requirements need to be in place to stop processes from accessing the mount first.

So nothing is going 'wrong'; you need to build the requirements that stop services/processes from accessing the mount. You can't unmount a file system that has processes accessing it.

When my mount stops via systemd, everything relating to the mount stops as well, since those services require the rclone service to be running, so in my case fusermount always works.

Yes, absolutely, and in my case I also have the mount set up as a requirement of docker; however, the issue continues, so I do believe my problem is docker related.
I'm currently testing running the rclone mount from a container to see if that alleviates the issue... but that's another story altogether :slight_smile:

All I can tell you is that I had lots of issues until I added that command to my systemd unit.

So what is the recommended way to restart the mount? If I'm manually restarting the mount, then I want to force it, and I don't care if there is stuff using it already.

My command works for my setup. Maybe because my mount is read-only, fusermount lets me force it anyway?

However, there is still a more extreme solution it seems:

Think of a fuse mount as any other mount or a mounted disk on Windows.

Before you'd unmount those, you'd have to stop any processes or application that are accessing the mount point.

I do that by having a rclone and mergerfs rolled up to a gmedia service.

In any service that uses them (Sonarr/Radarr/Plex/etc), I have this in the service file:

Requires=gmedia.service
Wants=gmedia.service

So if I was to stop gmedia, it would stop all the services that require it.

You can see if something is using a file system by running lsof as root; it will show open file handles:

root@gemini:~# lsof /GD
lsof: WARNING: can't stat() fuse.rclone file system /home/felix/test
      Output information may be incomplete.
COMMAND   PID USER   FD   TYPE DEVICE   SIZE/OFF                 NODE NAME
mergerfs 2213 root   16r   REG   0,51 2495736874 10580154092584230383 /GD/TV/M-A-S-H/M-A-S-H - S09E01.mkv
root@gemini:~#

So in this case, for me to unmount that, I'd have to stop mergerfs, which in turn is being used by Plex/etc.

The fusermount tries to unmount, but the final release back to rclone can't happen until nothing is using the file system.

@lkno do you happen to have any docker containers accessing your mount? also you didn't say, but is this a new issue?

Yes, and I think this is the main problem.

You can't (as far as I know) declare dependencies on specific containers within docker, only on docker as a whole, and restarting all docker services is probably not the most convenient method here.

It's been an issue for a while, to be honest. I have just ignored it and restarted the system.
I think the way forward is to try running rclone in docker, but I haven't found a good solution for that yet.

I literally did the exact same thing this week and pushed my rclone install into a container, and it has fixed all my issues with systemd and left-over processes. I'm using the official rclone docker image.

You can control individual containers using Portainer if you want to test which ones are causing the issue with the mount. But to be honest, if you're comfortable using docker, then doing the same with rclone is a no-brainer.

It doesn't fix it; it just masks the issue, since you 'rip out' the mount by stopping the container.

It has the potential to mask an issue rather than fix the root of the problem.

As I've said a few times, it's best to figure out what application/processes are preventing rclone from stopping.

If you check for open files on the mount before running your stop, it's not that hard to figure out.
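If lsof isn't available, a rough equivalent can be sketched with a plain /proc walk (Linux-only; run as root to see other users' processes, and note the function name here is made up):

```shell
#!/bin/sh
# Print the PIDs of processes holding open file descriptors under a directory.
pids_using() {
    # Canonicalise the directory; bail out if it doesn't exist.
    dir=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Resolve each fd's target; skip fds we aren't allowed to read.
        tgt=$(readlink "$fd" 2>/dev/null) || continue
        case $tgt in
            "$dir"/*|"$dir")
                pid=${fd#/proc/}   # e.g. "1234/fd/5"
                echo "${pid%%/*}"  # keep just the PID
                ;;
        esac
    done | sort -un
}

# Example: check the mount point before running your stop.
pids_using /mnt/rclone/gebooks/books
```

Anything it prints is a process you would need to stop before fusermount can fully release the mount.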

Docker adds a layer on top of everything; you pay a performance cost for that, along with the complexity of managing another layer, and you have to be careful to run official images, as you are dependent on another person to make sure things work properly in terms of versions/dependencies.

What specific container did you use, or did you roll your own?

From my experience, this usually happens because the network faults and the connection to the drive mount can't be maintained, causing mergerfs to drop it from its file system, which then subsequently drops it from whatever Docker host is being used. Then, when it attempts to come back online, mergerfs is stuck with stale file pointers to it.

Once it fails again and everything drops off, I will report back with a full lsof to show.

Even then, rclone shouldn't be completely unresponsive on other mounts if a single mount is failing/looping like above.

What do you mean by that? You lose internet? A network-based mount losing its network would cause some odd things to happen.

I used the official rclone container from the download page.

If you don't care, just install psmisc and run

fuser -k /YOUR/FOLDER

Add -9 after -k if you want to really kill everything using it. My mount is read-only, so it's safe to kill everything using it.