Running as systemd daemon fails with large vfs cache

What is the problem you are having with rclone?

When restarting an rclone mount via systemd, the service doesn't finish starting until the vfs queue fully enumerates. For context, I restarted the service while there were a large number of items in the cache which were previously queued for upload. When troubleshooting, I tailed the log file on the right and a terminal prompt to restart the service on the left. When I issued systemctl start rclone.service, the command hung while the log started listing all the files still in the vfs cache, noting which items were being queued up. However, the start command never actually finishes, so while rclone is listing vfs queue, systemd times out. The bug here, I believe, is that systemd never actually gets a successful start message from rclone until after the vfs queue list is made. I believe this to be true, since I can manually run the rclone mount+vfs command in the foreground with no issues.

What is your rclone version (output from rclone version)

1.55.0

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux 64 bit - Debian

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone.service

[Unit]
Description=Encrypted Vault rclone Service
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
#Environment=RCLONE_CONFIG=/opt/rclone/rclone.conf
KillMode=none
RestartSec=5
ExecStart=/usr/bin/rclone mount c-vault: /mnt/cvault \
--allow-other \
--attr-timeout 1000h \
--buffer-size 64M \
--dir-cache-time 20m \
--log-level INFO \
--log-file /opt/rclone/logs/cvault-service.log \
--poll-interval 30s \
--use-mmap \
--tpslimit 10 \
--tpslimit-burst 10 \
--cache-dir=/bufferpool/rclone-cache-cvault \
--vfs-cache-mode writes \
--transfers 1 \
--bwlimit "08:00,1M:off 23:59,off" \
--timeout 1h \
--umask 002 \
--user-agent vaultapp \
--rc \
--rc-addr 127.0.0.1:5573
ExecStop=/bin/fusermount -uz /mnt/cvault
Restart=on-failure
User=root
Group=root

[Install]
WantedBy=multi-user.target

A log from the command with the -vv flag

After running systemctl start rclone.service:

2021/05/21 14:28:32 INFO  : Starting bandwidth limiter at 1M:offBytes/s
2021/05/21 14:36:03 INFO  : Starting transaction limiter: max 10 transactions/s with burst 10
2021/05/21 14:36:03 NOTICE: Serving remote control on http://127.0.0.1:5573/
2021/05/21 14:36:06 INFO  : path/to/file: vfs cache: queuing for upload in 5s
...
...
...
(100s of more vfs queue entries)
...
...
2021/05/21 14:30:07 INFO  : Starting bandwidth limiter at 1M:offBytes/s
2021/05/21 14:30:07 INFO  : Starting transaction limiter: max 10 transactions/s with burst 10
2021/05/21 14:30:07 Failed to start remote control: start server failed: listen tcp 127.0.0.1:5573: bind: address already in use

My systemctl settings cause the service to restart on failure after 5 seconds, which means the service failed after 90 seconds. However, the rclone service is still running (verified with htop) and the port is still in use.

To troubleshoot, I run the command manually in the foreground and everything seems to work just fine.

2021/05/21 14:36:03 INFO  : Starting bandwidth limiter at 1M:offBytes/s
2021/05/21 14:36:03 INFO  : Starting transaction limiter: max 10 transactions/s with burst 10
2021/05/21 14:36:03 NOTICE: Serving remote control on http://127.0.0.1:5573/
2021/05/21 14:36:06 INFO  : path/to/file: vfs cache: queuing for upload in 5s
...
...
...
(100s of more vfs queue entries)
...
...
2021/05/21 14:39:07 INFO  : last/file/to/upload: vfs cache: queuing for upload in 5s

As you can see from the real timestamps, the vfs queue list took 3 minutes to complete when manually running the command in the foreground.

Do you have a full debug log you run the systemctl start?

Can you share that?

Can you share the journalctl output for the service as well?

May 21 14:25:19 dal-prod-media02 systemd[1]: Starting Encrypted Vault rclone Service...

May 21 14:26:49 dal-prod-media02 systemd[1]: rclone.service: Start operation timed out. Terminating.

May 21 14:26:49 dal-prod-media02 systemd[1]: rclone.service: Failed with result 'timeout'.

May 21 14:26:49 dal-prod-media02 systemd[1]: Failed to start Encrypted Vault rclone Service.

May 21 14:26:54 dal-prod-media02 systemd[1]: rclone.service: Service RestartSec=5s expired, scheduling restart.

May 21 14:26:54 dal-prod-media02 systemd[1]: rclone.service: Scheduled restart job, restart counter is at 1.

May 21 14:26:54 dal-prod-media02 systemd[1]: Stopped Encrypted Vault rclone Service.

May 21 14:26:54 dal-prod-media02 systemd[1]: rclone.service: Found left-over process 935 (rclone) in control group while starting unit. Ignoring.

May 21 14:26:54 dal-prod-media02 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.

Used to have the same issue. Doesn't happen when running rcd as systemd unit and issuing a mount command later.
A workaround for this is telling systemd to wait longer until it quits waiting for a signal from rclone using this option under [Service]: TimeoutStartSec

1 Like

can you explain this a little further? I'm very interested about your setup.

Aye, here's a couple of systemd units, one for rclone rcd and another for rclone rc mount:

$ cat rclone-rcd.service
[Unit]
Description=rclone rcd
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
TimeoutSec=300
Restart=always
RestartSec=10
ExecStart=/usr/bin/rclone rcd \
        --config /root/.config/rclone/rclone.conf \
        --cache-dir /srv/storage/rclone-cache \
        --use-mmap \
        --buffer-size 100M \
        --drive-stop-on-upload-limit \
        --drive-stop-on-download-limit \
        --rc-addr 127.0.0.1:5080 \
        --rc-user user \
        --rc-pass supersecurepasswordthatnobodywilleveruess \
        --rc-web-gui \
        --rc-web-gui-no-open-browser

[Install]
WantedBy=default.target
$ cat rclone-backups.service
[Unit]                                                                                                                                                                                                            Description=rclone backups
BindsTo=rclone-rcd.service
After=rclone-rcd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/bin/rclone rc --rc-addr 127.0.0.1:5080 --rc-user user --rc-pass supersecurepasswordthatnobodywilleveruess mount/unmount \
        mountPoint=/srv/shares/backups/
ExecStartPre=-/bin/fusermount -u /srv/shares/backups
ExecStartPre=-/bin/sleep 10
ExecStart=/usr/bin/rclone rc --rc-addr 127.0.0.1:5080 --rc-user user --rc-pass supersecurepasswordthatnobodywilleveruess  mount/mount \
        fs=my-remote:backups \
        mountPoint=/srv/shares/backups/ \
        mountOpt='{ \
                "AllowNonEmpty": false, \
                "AllowOther": true, \
                "AllowRoot": false, \
                "AsyncRead": true, \
                "AttrTimeout": 2000000000, \
                "Daemon": false, \
                "DaemonTimeout": 0, \
                "DebugFUSE": false, \
                "DefaultPermissions": true, \
                "ExtraFlags": [], \
                "ExtraOptions": [], \
                "MaxReadAhead": 131072, \
                "NetworkMode": false, \
                "NoAppleDouble": true, \
                "NoAppleXattr": false, \
                "VolumeName": "", \
                "WritebackCache": true \
        }' \
        vfsOpt='{ \
                "CacheMaxAge": 21600000000000, \
                "CacheMaxSize": 1073741824, \
                "CacheMode": 1, \
                "CachePollInterval": 60000000000, \
                "CaseInsensitive": false, \
                "ChunkSize": 31457280, \
                "ChunkSizeLimit": 209715200, \
                "DirCacheTime": 600000000000, \
                "DirPerms": 504, \
                "FilePerms": 438, \
                "GID": 0, \
                "NoChecksum": false, \
                "NoModTime": false, \
                "NoSeek": false, \
                "PollInterval": 60000000000, \
                "ReadAhead": 50000000, \
                "ReadOnly": false, \
                "ReadWait": 20000000, \
                "UID": 0, \
                "Umask": 63, \
                "WriteBack": 300000000000, \
                "WriteWait": 1000000000 \
        }'
ExecStop=-/usr/bin/rclone rc --rc-addr 127.0.0.1:5080 --rc-user user --rc-pass supersecurepasswordthatnobodywilleveruess mount/unmount \
        mountPoint=/srv/shares/backups/
ExecStop=-/bin/fusermount -u /srv/shares/backups

[Install]
WantedBy=default.target

Pay attention to the trailing slashes in some places. No idea why they are necessary in some cases, but I've had to use them...
Also you can see that there's plenty of attempts to unmount a share. This is because rclone really doesn't feel like unmounting them sometimes. It can take several attempts, and I have no idea why.
Your file paths may differ. These are valid on Ubuntu 20.04 and 18.04.
You may have to read the code to figure out what some of the options mean. I certainly had to do that.
Also things like umask is in decimal (when you use rclone mount your specify it in octal), timestamps are in nanoseconds or whatever Go uses, etc.
This isn't easy, but it certainly makes things a bit cleaner and allows you to use the web GUI to monitor activity from more than one mount point at a time.

I had the same issue. Just add TimeoutSec=1500 to your systemd

1 Like

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.