Webdav deadlock from high default timeout

What is the problem you are having with rclone?

The high default timeout value causes the WebDAV mount to deadlock if the backend fails.

Run the command 'rclone version' and share the full output of the command.

$ rclone version
rclone v1.65.2

  • os/version: ubuntu 22.04 (64 bit)
  • os/kernel: 5.15.0-100-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.21.6
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

WebDAV backend via the MEGA WebDAV server (served by mega-cmd)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Systemd mount with options:

Options=rw,_netdev,allow_other,args2env,config=/root/.config/rclone/rclone.conf,timeout=30s,async-read,use-mmap,transfers=6,multi-thread-streams=0,disable-http2,uid=1000,gid=1000,cache-dir=/mnt/other/rclone,fast-list,vfs-cache-max-age=8760h,vfs-cache-max-size=512G,vfs-cache-mode=full,buffer-size=64M,vfs-read-ahead=256M,max-read-ahead=128K
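
For reference, that Options= line corresponds roughly to the following standalone command (a sketch only: the mount point /mnt/mdav is made up, and the rw, _netdev and args2env entries are mount-helper specifics with no direct flag equivalent):

# Rough equivalent of the systemd mount options as rclone flags;
# the mount point is illustrative.
rclone mount mdav: /mnt/mdav \
  --config /root/.config/rclone/rclone.conf \
  --allow-other \
  --timeout 30s \
  --async-read \
  --use-mmap \
  --transfers 6 \
  --multi-thread-streams 0 \
  --disable-http2 \
  --uid 1000 --gid 1000 \
  --cache-dir /mnt/other/rclone \
  --fast-list \
  --vfs-cache-max-age 8760h \
  --vfs-cache-max-size 512G \
  --vfs-cache-mode full \
  --buffer-size 64M \
  --vfs-read-ahead 256M \
  --max-read-ahead 128K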

The rclone config contents with secrets removed.

[mdav]
type = webdav
url = http://127.0.0.1:4443/secret

A log from the command with the -vv flag

It's just some attr checks and ChunkedRead calls on the file.
I deleted the log as it was 2 GB.

If I run a seek with ffmpeg, it triggers a chunked read from the WebDAV backend.
But the mega-cmd dav server won't return anything, because the node is bugged.
This in turn causes the read to restart every once in a while.
It also leaves ffmpeg unkillable on Linux, and it becomes a zombie after a while.

It should be noted that it's a Docker container that reads from the WebDAV backend, and that container is what becomes the zombie.
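
For anyone hitting the same thing, this is a rough way to confirm the stuck state (just a sketch; the D/Z check is generic Linux, not rclone-specific):

# List processes stuck in uninterruptible sleep (D) or left as zombies (Z).
# The header row is kept so the STAT column is easy to read.
ps -eo pid,ppid,stat,comm | awk 'NR==1 || $3 ~ /^[DZ]/'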

I found that I could work around this by reducing the timeout from 5 minutes to 30 seconds, which simply causes the read to fail after a while.

Operating systems (Linux, Windows, macOS, etc.) expect their disks to be 100% reliable. Your hard disk will only return an error to the OS when it has a hardware problem, and even then the OS will retry it many times to try to fix it.

Rclone tries to retry errors too, and has long timeouts to maximise the chance of things succeeding, because things go downhill very quickly if rclone starts returning errors to the OS. This gives the symptoms you see.
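
For reference, these are the main knobs involved (a rough summary; the defaults quoted are as I remember them, so check rclone help flags for your version, and the mount point below is just an example):

# Main timeout/retry knobs:
#   --timeout 5m             IO idle timeout - the one reduced to 30s above
#   --contimeout 1m          connect timeout
#   --low-level-retries 10   retries of each low level operation
rclone mount mdav: /mnt/mdav --vfs-cache-mode full --timeout 30s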

You can use fusermount -z -u /path/to/mnt to unmount rclone and kill off anything that is going on. This doesn't always work, but it is pretty good. Sometimes you will end up with unkillable processes stuck in D state though, and a reboot is the only way of getting rid of those.
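
For example, something like this (the path is a placeholder for your real mount point):

# Lazily detach the stuck mount even though processes still have files open,
# then confirm it is no longer mounted.
fusermount -z -u /path/to/mnt
mountpoint -q /path/to/mnt || echo detached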

This means it likely would have failed after 5 minutes, but if the 30-second timeout works for you then please use it.
