WebDAV with persistent connections?

What is the problem you are having with rclone?

hey! been using rclone to sync stuff to a WebDAV server, which mostly works great :+1: love how flexible this thing is!

but I've noticed that it keeps opening a new http session / tcp connection for every request, which causes a bit of a slowdown... and the connection setup doesn't seem to parallelize, so for really small files you get the same transfer speed regardless of how many threads you use (--transfers 1 --checkers 1 runs at the same speed as --transfers 8 --checkers 8, for instance).

the same thing happens if you run rclone on both ends, so i'll provide an example with that.

am I right in guessing that Connection: keep-alive / session reuse is not supported over WebDAV? or perhaps I forgot an important flag somewhere?

Run the command 'rclone version' and share the full output of the command.

rclone v1.63.0-beta.6851.3affba6fa
- os/version: fedora 37 (64 bit)
- os/kernel: 6.1.15-200.fc37.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.2
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Rclone WebDAV

The command you were trying to run (eg rclone copy /tmp remote:tmp)

  • config: ~/bin/rclone config create dav0 webdav url=http://127.0.0.1:3939 vendor=other
  • server: ~/bin/rclone serve webdav . --addr :3939
  • client: time ~/bin/rclone sync . dav0: --transfers 1 --checkers 1

The rclone config contents with secrets removed.

[dav0]
type = webdav
url = http://127.0.0.1:3939
vendor = other

A log from the command with the -vv flag

Rclone should be using persistent http connections in the client at least. I'm not so sure about the server, but I think it should be too.

Looking at your wireshark pic I see the same port number repeated, so there is some connection re-use, but not much - I agree. Rclone sets the number of idle connections it holds on to to --checkers + --transfers + 1 I think, so with your settings there should only be 3 persistent connections.
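For reference, the client-side knob is basically Go's idle connection pool -- a minimal sketch using only the standard library, not rclone's actual fshttp code (the checkers/transfers parameters here just stand in for the command line flags):

package main

import (
	"fmt"
	"net/http"
	"time"
)

// newClient sizes the idle connection pool from the concurrency settings.
// Sketch only, not rclone's actual code.
func newClient(checkers, transfers int) *http.Client {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.MaxIdleConnsPerHost = checkers + transfers + 1 // one idle conn per worker, plus one
	t.IdleConnTimeout = 60 * time.Second             // drop idle conns after a minute
	return &http.Client{Transport: t}
}

func main() {
	c := newClient(1, 1)                                           // --checkers 1 --transfers 1
	fmt.Println(c.Transport.(*http.Transport).MaxIdleConnsPerHost) // 3
}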

I think this is more likely to be a server problem than a client problem.

When I try a test with a nextcloud instance rclone opens --transfers connections and keeps re-using them.

When I try a test with the rclone serve webdav I see exactly what you see.

I checked the server code - we set the idle timeout to 60 seconds, so the connections should hang around that long for re-use.
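For context, both the idle timeout and the "http connection: ... state=..." lines in the log below are standard library http.Server features -- roughly like this (a simplified sketch, not the actual rclone serve code; the file server handler is just a stand-in for the webdav handler):

package main

import (
	"log"
	"net"
	"net/http"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:        ":3939",
		Handler:     http.FileServer(http.Dir(".")), // stand-in for the webdav handler
		IdleTimeout: 60 * time.Second,               // keep-alive connections linger this long
		// ConnState is where the "state=new/active/idle/closed" log lines come from.
		ConnState: func(c net.Conn, s http.ConnState) {
			log.Printf("http connection: %s state=%s", c.RemoteAddr(), s)
		},
	}
	log.Fatal(srv.ListenAndServe())
}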

It looks like the server closes the connections after every two transactions. Or at least the connections get closed...

2023/03/23 10:54:07 http connection: 127.0.0.1:54450 state=new
2023/03/23 10:54:07 http connection: 127.0.0.1:54450 state=active
2023/03/23 10:54:07 DEBUG : /1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: OpenFile: flags=O_RDWR|O_CREATE|O_TRUNC, perm=-rw-rw-rw-
2023/03/23 10:54:07 DEBUG : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: Open: flags=O_RDWR|O_CREATE|O_TRUNC
2023/03/23 10:54:07 DEBUG : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l: Added virtual directory entry vAddFile: "pikuton7f"
2023/03/23 10:54:07 DEBUG : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: >Open: fd=1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f (w), err=<nil>
2023/03/23 10:54:07 DEBUG : /1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: >OpenFile: fd=1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f (w), err=<nil>
2023/03/23 10:54:07 DEBUG : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l: Added virtual directory entry vAddFile: "pikuton7f"
2023/03/23 10:54:07 DEBUG : Local file system at /tmp/webdav: File to upload is small (10 bytes), uploading instead of streaming
2023/03/23 10:54:07 DEBUG : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: md5 = e010fdf08d4dccdf6a7aaa27644bd8e5 OK
2023/03/23 10:54:07 INFO  : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: Copied (new)
2023/03/23 10:54:07 DEBUG : 1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l: Added virtual directory entry vAddFile: "pikuton7f"
2023/03/23 10:54:07 INFO  : /1000files/ferejej3gux/raqulof3fin/pase/vufubeq7/wine/padumin0l/pikuton7f: PUT from 127.0.0.1:54450
2023/03/23 10:54:07 http connection: 127.0.0.1:54450 state=idle
2023/03/23 10:54:07 http connection: 127.0.0.1:54450 state=closed

Interestingly when I ran the client with -vv --dump bodies (which will tend to serialize the connections) it re-uses the connection perfectly.

Can you use your wireshark skills to find out whether the client or the server initiates the close of the TCP connection? That would be useful information.

Thanks for looking into this!

The curious thing is I first noticed this behavior when running rclone against my own webdav server, which I think is fairly compliant -- at least davfs2 did everything over one connection when I used it instead of rclone... but maybe davfs2 is just very forgiving :slight_smile:

I recall checking which side initiated the connection shutdown, and I'm 99% sure it was rclone-client -- but I'll double-check that as soon as I'm home. I have not tried running davfs2 against rclone-server, so I want to try that as well (unless you beat me to it!)

That kind of makes sense. It is either a bug in the Go standard library server code (not impossible) or it is the client doing something weird.

Annoying that the bug disappears when using -vv --dump bodies as that is my main debugging tool!

I'd be interested in your double check and davfs vs rclone server as I still don't know whether to investigate the server or the client!

alright, so while running an rclone sync against my own server, I noticed that rclone-client will disconnect when receiving a response to a MKCOL or PUT if the server response has a non-zero content length. The RFC is a bit ambiguous on whether this is permitted, so I've gone ahead and removed the response bodies from my server. One problem down :>

but, looks like that's not all -- rclone-client may randomly panic-close a connection when it receives a 207 Multi-Status response -- it sends the server a tcp packet with the RST flag set (a FIN would have been a normal shutdown request). I can't tell why; the response headers and body are identical to all the other ones, and sometimes it happens in bursts for a handful of files.

there are some more peculiarities to that last issue,

  • no warning or anything when it happens, not even in -vv
  • it happens much more often over HTTPS than over HTTP
  • like you mentioned, it goes away if you add --dump bodies to the client
  • and it also goes away if you simulate a 10ms latency to the network; sudo tc qdisc replace dev lo root netem delay 10ms (remove it with sudo tc qdisc delete dev lo root)

could be a race perhaps? I've done all the tests with --transfers 1 --checkers 1 for simplicity, but it behaves similarly with other values too

regarding davfs2 - it successfully does all uploads (PUT, UNLOCK) over a single session when copying files to an rclone webdav server. There's a batch of initial calls (HEAD, MKCOL, LOCK) which are done on separate connections, but it behaves the same for all webdav servers... maybe LOCK is an expensive operation on some servers and that's why they did it that way? Just guessing...

Ah, that is very useful info.

If you read this bit of the Go docs, that will start to make sense

If the returned error is nil, the Response will contain a non-nil Body which the user is expected to close. If the Body is not both read to EOF and closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.

So if the server sends rclone a body that it doesn't read, then the connection can't be re-used.

Luckily this is all abstracted through an internal library, and adding a bit of body draining there appears to have fixed the problems with the rclone server at least.
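The fix is essentially the standard pattern the Go docs describe above -- read whatever is left of the body and close it so the Transport can put the connection back into its idle pool. A simplified sketch of that pattern (not the exact change in the internal library; the URL is just a placeholder):

package main

import (
	"fmt"
	"io"
	"net/http"
)

// drainAndClose reads any unread response body to EOF and closes it so the
// underlying TCP connection can be re-used for the next keep-alive request.
func drainAndClose(resp *http.Response) {
	if resp == nil || resp.Body == nil {
		return
	}
	_, _ = io.Copy(io.Discard, resp.Body) // discard e.g. an unwanted MKCOL/PUT response body
	_ = resp.Body.Close()
}

func main() {
	resp, err := http.Get("http://127.0.0.1:3939/") // placeholder URL
	if err != nil {
		fmt.Println(err)
		return
	}
	drainAndClose(resp)
}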

This is the one bit of the puzzle I'm not sure about. I think the http library can drain leftover bodies on its own (but I can't find that bit of code). Maybe it only does that if the connection has been idle for a while.

Please give this a go:

v1.63.0-beta.6867.abbf80afb.fix-webdav-keepalive on branch fix-webdav-keepalive (uploaded in 15-30 mins)

Nice, this is looking much better!

The random disconnects would happen even when the server replied without a body (content-length 0), but the beta you linked seems to have fixed those as well, so everything regarding connection reuse looks solid now :>

However, I'm surprised to say there was no performance gain. There might be some hints in this wireshark screenshot, which was taken with rclone on both ends and the client running with --transfers 8 --checkers 8,

  • when rclone-client receives a server response, it seems to wait for "exactly" 0.01 seconds before it sends the next request
  • it doesn't parallelize over multiple connections even with --transfers 8; instead it multiplexes all the requests on one TCP connection, running them all in series

I want to see if I can figure out what's causing the 0.01 sec delay, but I don't think I'll have a good shot at the multithreading...

Also let me know if you prefer to handle the remaining issues in a different thread or place :>

Glad that fix worked - it will help with all backends which use lib/http.

The 0.01 second delay is the rclone pacer doing its job - it is there to stop rclone overloading the server. It can be tweaked though.
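To illustrate what the pacer does (a toy sketch of the idea, not the actual lib/pacer code): it enforces a minimum gap between calls, and with thousands of tiny files that gap dominates the transfer time.

package main

import (
	"fmt"
	"sync"
	"time"
)

// pacer enforces a minimum gap between calls -- a toy version of the idea,
// not rclone's actual lib/pacer implementation.
type pacer struct {
	mu       sync.Mutex
	minSleep time.Duration
	last     time.Time
}

func (p *pacer) Call(fn func() error) error {
	p.mu.Lock()
	if wait := p.minSleep - time.Since(p.last); wait > 0 {
		time.Sleep(wait)
	}
	p.last = time.Now()
	p.mu.Unlock()
	return fn()
}

func main() {
	p := &pacer{minSleep: 10 * time.Millisecond} // the ~10ms gap seen in wireshark
	start := time.Now()
	for i := 0; i < 5; i++ {
		_ = p.Call(func() error { return nil }) // stand-in for one API call
	}
	fmt.Println(time.Since(start)) // roughly 40ms - the gap dominates when calls are tiny
}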

Not parallelizing over multiple connections sounds like a bug unless all the files are very short. Can you try with bigger files?

Aha, yep that's it -- reducing minSleep to 0 makes it as fast as I hoped for :>

And that was also why the files didn't parallelize, since they were small enough to finish uploading well within 1ms.

Guess that's it then, thanks again :+1:

I've merged the keepalive fix to master now which means it will be in the latest beta in 15-30 minutes and released in v1.63

If you want to make a PR to make minsleep configurable, you could copy these ones from the google drive backend:

  --drive-pacer-burst int             Number of API calls to allow without sleeping (default 100)
  --drive-pacer-min-sleep Duration    Minimum time to sleep between API calls (default 100ms)

They don't need to be configurable in most backends but webdav has a lot of different providers.
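The shape of it would be roughly this -- a hypothetical fragment of the webdav backend's option list, where the name, help text and default are just placeholders (follow the drive backend for the real naming):

// Hypothetical sketch only -- an entry for the backend's Options list in
// backend/webdav/webdav.go; name, help and default are placeholders.
{
	Name:     "pacer_min_sleep",
	Help:     "Minimum time to sleep between API calls.",
	Default:  fs.Duration(10 * time.Millisecond),
	Advanced: true,
},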

Yeah, I'll do that!

Burst doesn't seem to be part of the default pacer, however -- would it be alright to only expose min-sleep as a setting? Or would it be preferable to add the burst parameter to the default pacer so that other backends can use it as well?

Just adding min sleep is fine :slightly_smiling_face:

