Rclone as docker volume, docker-compose often times out

What is the problem you are having with rclone?

I'm trying to use rclone with Docker so I can use B2 as a backend for a number of docker containers. I am running under docker-compose with the rclone docker volume plugin.

In general rclone is working great and doing exactly what I want. However, after a docker-compose down followed by docker-compose up I often get timeouts. The only way I've found to recover is to restart the docker service, disable the rclone plugin, delete the plugin state, re-enable the plugin and try again. It often takes multiple attempts before things come right.
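For reference, the recovery sequence looks roughly like this as a shell sketch (the plugin alias and state path are assumptions for illustration and may differ on your install):

```shell
# Sketch of the recovery steps described above (not authoritative).
systemctl restart docker
docker plugin disable rclone                # plugin alias assumed to be "rclone"
# "delete the plugin state": clear the plugin's cached VFS state
rm -rf /var/lib/docker-plugins/rclone/cache # path is an assumption
docker plugin enable rclone
docker-compose up -d                        # may need repeating
```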

How this looks from a high level:

 docker-compose up
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Run the command 'rclone version' and share the full output of the command.

rclone v1.53.3-DEV
- os/arch: linux/amd64
- go version: go1.15.4

Also tried:

rclone v1.58.0
- os/version: ubuntu 21.10 (64 bit)
- os/kernel: 5.13.0-40-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.17.8
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Backblaze B2

The command you were trying to run (eg rclone copy /tmp remote:tmp)

docker-compose up with a volume defined as follows:

  mediatv:
    driver: rclone:latest
    driver_opts:
      remote: 'tv-encrypted:media/tv'
      allow_other: 'true'
      vfs_cache_mode: full
      poll_interval: 0
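For context, a complete (hypothetical) compose file around that fragment might look like the following; the volume definition and remote name are from the snippet above, while the service and mount path are purely illustrative:

```yaml
# Hypothetical surrounding docker-compose file (sketch, not my exact file)
volumes:
  mediatv:
    driver: rclone:latest
    driver_opts:
      remote: 'tv-encrypted:media/tv'
      allow_other: 'true'
      vfs_cache_mode: full
      poll_interval: 0

services:
  jellyfin:                    # example consumer; name is illustrative
    image: jellyfin/jellyfin
    volumes:
      - mediatv:/media/tv
```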

The rclone config contents with secrets removed.

[media-unencrypted]
type = b2
account = xxx
key = xxx
hard_delete = true

[media-encrypted]
type = crypt
remote = media-unencrypted:s-media
filename_encryption = standard
directory_name_encryption = true
password = xxx
password2 = xxx
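Since the crypt remote wraps the b2 remote, the mount path resolves in two steps. These are the kind of sanity checks I can run outside docker (sketch; remote names are from the config above, and the commands assume the same rclone.conf):

```shell
# media-encrypted:x resolves to media-unencrypted:s-media/<encrypted-names> on B2.
rclone lsd media-unencrypted:s-media   # raw (encrypted) directory names on B2
rclone lsd media-encrypted:            # decrypted view through the crypt remote
```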

A log from the command with the -vv flag

I'm trying to get some better logging, but here is a starting point:

Apr 24 03:41:59 vultr sudo[2230]:     root : TTY=pts/0 ; PWD=/root/media-repo ; USER=root ; COMMAND=/usr/bin/docker plugin rm rclone/docker-volume-rclone:amd64
Apr 24 03:42:34 vultr dockerd[854]: time="2022-04-24T03:42:34Z" level=error msg="2022/04/24 03:42:34 DEBUG : Setting --config \"/data/config/rclone.conf\" from environment variable RCLONE_CONFIG=\"/data/config/rclone.conf\"" plugin=c8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465
Apr 24 03:42:34 vultr dockerd[854]: time="2022-04-24T03:42:34Z" level=error msg="2022/04/24 03:42:34 DEBUG : rclone: Version \"v1.58.0\" starting with parameters [\"rclone\" \"serve\" \"docker\" \"-vv\"]" plugin=c8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465
Apr 24 03:42:34 vultr dockerd[854]: time="2022-04-24T03:42:34Z" level=error msg="2022/04/24 03:42:34 DEBUG : Using config file from \"/data/config/rclone.conf\"" plugin=c8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465
Apr 24 03:42:36 vultr dockerd[854]: time="2022-04-24T03:42:36Z" level=error msg="2022/04/24 03:42:36 INFO  : Serving unix socket: /run/docker/plugins/rclone.sock" plugin=c8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465
Apr 24 03:44:24 vultr dockerd[854]: time="2022-04-24T03:44:24.672894924Z" level=warning msg="error while unmounting volume media-repo_configsonarr: Post \"http://%2Frun%2Fdocker%2Fplugins%2Fc8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465%2Frclone.sock/VolumeDriver.Unmount\": context deadline exceeded"
Apr 24 03:44:24 vultr dockerd[854]: time="2022-04-24T03:44:24.693061330Z" level=warning msg="error while unmounting volume media-repo_confighydra: get media-repo_confighydra: error while checking if volume \"media-repo_confighydra\" exists in driver \"rclone:latest\": Post \"http://%2Frun%2Fdocker%2Fplugins%2Fc8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465%2Frclone.sock/VolumeDriver.Get\": context deadline exceeded"
Apr 24 03:44:24 vultr dockerd[854]: time="2022-04-24T03:44:24.787109519Z" level=error msg="Handler for POST /v1.41/containers/32932ae32294546325654005eaa79584850caa4aaf506bed040df2fc98c1ad1c/start returned error: error while mounting volume '': get media-repo_confignzb: error while checking if volume \"media-repo_confignzb\" exists in driver \"rclone:latest\": Post \"http://%2Frun%2Fdocker%2Fplugins%2Fc8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465%2Frclone.sock/VolumeDriver.Get\": context deadline exceeded"
Apr 24 03:44:24 vultr dockerd[854]: time="2022-04-24T03:44:24.915752115Z" level=warning msg="error while unmounting volume media-repo_configjellyfin: get media-repo_configjellyfin: error while checking if volume \"media-repo_configjellyfin\" exists in driver \"rclone:latest\": Post \"http://%2Frun%2Fdocker%2Fplugins%2Fc8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465%2Frclone.sock/VolumeDriver.Get\": context deadline exceeded"
Apr 24 03:45:24 vultr dockerd[854]: time="2022-04-24T03:45:24.853375525Z" level=error msg="Handler for POST /v1.41/containers/3db762aac5d372a03d2ebe45ab3e0909e7c34e720707355ad8e01be51329acfc/start returned error: error while mounting volume '': Post \"http://%2Frun%2Fdocker%2Fplugins%2Fc8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465%2Frclone.sock/VolumeDriver.Mount\": context deadline exceeded"
Apr 24 03:45:24 vultr dockerd[854]: time="2022-04-24T03:45:24.923545695Z" level=error msg="Handler for POST /v1.41/containers/create returned error: get media-repo_media: error while checking if volume \"media-repo_media\" exists in driver \"rclone:latest\": Post \"http://%2Frun%2Fdocker%2Fplugins%2Fc8868bf7acec078406d4fc0d6d8969bbf4a3e407784a3df167d6df6fff61e465%2Frclone.sock/VolumeDriver.Get\": context deadline exceeded"

Extra Information

Docker Version:

Client: Docker Engine - Community
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:48:02 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:53 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Linux version:

Linux vultr 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal

Also tried

Linux vultr 5.13.0-40-generic #45-Ubuntu SMP Tue Mar 29 14:48:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

and

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 21.10
Release:	21.10
Codename:	impish

I've spent quite a few hours trying to run this down and haven't had much luck as of yet. My assumption is that the rclone daemon is dying and docker isn't switching sockets, but I have yet to find any real evidence or log that suggests as much. If there is additional information I can gather that would be useful, let me know.

This error means that some go code was expecting something to be complete in a certain time and it didn't complete in that time.

I don't think those timeouts are coming from rclone though so are there timeouts in the docker code you can increase?

Maybe this one?

consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Does it take 60 seconds to give that error?
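For example, something like this before bringing the stack up (300 is an arbitrary example value, not a recommendation; DOCKER_CLIENT_TIMEOUT is also read by some docker-compose v1 versions):

```shell
# Raise docker-compose's client-side HTTP timeout (default 60 seconds).
export COMPOSE_HTTP_TIMEOUT=300
export DOCKER_CLIENT_TIMEOUT=300   # read by some compose/docker-py versions
# docker-compose up -d             # then retry bringing the stack up
```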

I'm facing the same issue here, but with swarm instead of compose. This seems to happen randomly without warning, and is very frustrating to fix.

Yep, it takes 60 seconds to give the error. I've tried changing the timeouts, but no dice.

I'm pretty sure it has something to do with the rclone plugin not bringing the mount up in a timely fashion, leading to a timeout in docker. That aligns with what I've observed, but it's difficult for me to pin down. The logs don't allude to anything going awry until the timeout occurs.

I've ended up ditching the docker plugin and have had a lot more success. After reading more about what others are doing, I found that using the rclone mount directly as I intended was probably not a wise approach anyway. I now have mergerfs set up with a local directory acting as the ingestion point, and an rclone mount set up primarily for reads. On a cron schedule I move files to the cloud using rclone move. All my reliability issues seem to have disappeared; perhaps I was hitting the IO a bit hard.
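A rough sketch of that setup (all paths, the mergerfs policy, and the schedule are illustrative assumptions, not my exact configuration):

```shell
# Local ingestion branch first, rclone mount second: new writes land locally,
# reads fall through to the cloud mount. (Paths are illustrative.)
mergerfs /srv/media/local:/srv/media/cloud /srv/media/merged \
    -o category.create=ff,cache.files=off

# Crontab entry (sketch): nightly, move settled local files up to B2.
# --min-age avoids moving files that are still being written.
# 0 3 * * *  rclone move /srv/media/local tv-encrypted:media/tv --min-age 1h -v
```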

Thanks for trying that. I think we'll have to see if @ivandeex had any thoughts as I've run out of ideas!

Looks like a deadlock/race provoked by timings/interactions of:

  • the go-routine serving the unix domain socket on behalf of the rclone plugin in its container
  • go-routines implementing the rclone vfs workers
  • fuse kernel module
  • docker daemon passing the new mount record between contexts of the plugin container and mount consumer containers
  • docker-compose spinning up volumes and containers in order, talking to the docker daemon

I do not expect to have this soup of moving parts investigated and resolved in the short term. It's a complex task. If you are unlucky enough to hit the race pattern in your environment, I suggest staying with your workarounds for now (sigh).

The rclone team, including me, have observed similar plugin deadlocks provoked by go stress tests on github ci/cd runners. This is captured in the ticket below:

I have finally worked out why the mount tests deadlock - it is to do with accessing the mount from the same process that is serving the mount - this can cause a deadlock.

It appears that this is a known problem in bazil/fuse#264 or the go runtime golang/go#21014

The solution is to run the mount in a subprocess when we are doing our end to end mount tests which seems like it should be possible.

Exactly how that is related to this problem I'm not sure, but it is a clue!
