Rclone mount (Hetzner storage) md5 errors

What is the problem you are having with rclone?

I have an rclone SFTP mount of a Hetzner storage box that is fine for ~hours until it suddenly isn't. Specifically, the mount remains usable for read/write but there are rampant md5 checksum errors.

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.1
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-46-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Hetzner Storage Box

The command you were trying to run (eg rclone copy /tmp remote:tmp)

/usr/local/bin/rclone mount \
    --cache-dir=/var/cache/rclone \
    --config=/etc/rclone.conf \
    --allow-other \
    --rc \
    --rc-addr=<addr> \
    --rc-enable-metrics \
    --vfs-cache-mode=writes \
    remote: /mnt/apps

The rclone config contents with secrets removed.

[remote]
type = sftp
host = <user>.your-storagebox.de
user = <user>
port = 23
key_file = /etc/ssh/ssh_host_ed25519_key
shell_type = unix
md5sum_command = md5 -r
sha1sum_command = sha1 -r

(last three lines added automatically by rclone)

A log from the command with the -vv flag

This has happened a few times without -vv. I updated the service to add the flag, waited 18 hours, and...

Aug 31 11:20:30 rclone[1850421]: DEBUG : pacer: low level retry 10/10 (error couldn't initialise SFTP: ssh: subsystem request failed)
Aug 31 11:20:30 rclone[1850421]: ERROR : <app1/path1/>: Dir.Stat error: List failed: dirExists: couldn't initialise SFTP: ssh: subsystem request failed
Aug 31 11:20:30 rclone[1850421]: ERROR : IO error: List failed: dirExists: couldn't initialise SFTP: ssh: subsystem request failed
Aug 31 11:20:30 rclone[1850421]: DEBUG : <app1/path1/>: >Lookup: node=<nil>, err=List failed: dirExists: couldn't initialise SFTP: ssh: subsystem request failed
[lots of normal debug output for file access lasting ~20 seconds]
Aug 31 11:20:49 rclone[1850421]: DEBUG : app1/path2/file: vfs cache: starting upload
Aug 31 11:20:49 rclone[1850421]: DEBUG : sftp://user@user.your-storagebox.de:23/: Shell path "/home/app1/path2/file"
Aug 31 11:20:49 rclone[1850421]: DEBUG : sftp://user@user.your-storagebox.de:23/: Running remote command: md5 -r /home/app1/path2/file
Aug 31 11:20:49 rclone[1850421]: ERROR : app1/path2/file: Failed to calculate dst hash: failed to calculate md5 hash: failed to run "md5 -r /home/app1/path2/file": : ssh: command md5 -r /home/app1/path2/file failed
Aug 31 11:20:49 rclone[1850421]: ERROR : app1/path2/file: corrupted on transfer: md5 hash differ "<some hash>" vs ""
Aug 31 11:20:49 rclone[1850421]: INFO  : app1/path2/file: Removing failed copy
Aug 31 11:20:49 rclone[1850421]: ERROR : app1/path2/file: vfs cache: failed to upload try #1, will retry in 10s: vfs cache: failed to transfer file from cache to remote: corrupted on transfer: md5 hash differ "<some hash>" vs ""

At this point, the md5 command seems to work intermittently - I don't see any pattern to which files or directories fail. Eventually some of my services start falling over, likely due to inconsistencies between rclone's cache and the remote.

Bouncing the rclone service manually clears things up for a while (no md5 failures), but the problem always returns within a day.
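
To check whether the failures really are patternless, it may help to tally them per file. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name is made up for illustration.

```shell
# Tally "Failed to calculate dst hash" errors per file from an rclone log on stdin.
# Assumes the line shape shown in the log excerpt above:
#   ... ERROR : <path>: Failed to calculate dst hash: ...
count_md5_failures() {
    grep 'Failed to calculate dst hash' |
        sed -e 's/^.*ERROR : //' -e 's/: Failed to calculate dst hash.*$//' |
        sort | uniq -c | sort -rn
}
```

Possible usage, assuming the service unit is called rclone.service (a guess, adjust to the real unit name): journalctl -u rclone.service --since today | count_md5_failures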

hello and welcome to the forum,

i also have an rclone mount using a hetzner storagebox, and always use a debug log.

i have never seen or experienced your issue.

fwiw, in the forum, often that turns out to be an issue with networking

Standard networking, this server has successfully served >1 TB of traffic in the past two weeks. Everything looks fine on that front, happy to check any specific suggestions you have!

Server is running fully up-to-date Ubuntu 22.04, in case that helps.

Hetzner support has indicated that I'm exceeding their allowed number of connections, which is 10. I had mistakenly thought that one running instance of rclone = 1 connection. Which setting(s) should I tune downward to reduce the number of connections?

It might be this:

--sftp-concurrency int

Got it from the flags page - Global Flags

64 (the default) is a lot more than 10! I've dropped this to 4, hopefully that does the trick! I also see --transfers, does that multiplex or create multiple connections?

SFTP is also not a requirement. Hetzner supports: FTP, FTPS, SAMBA, CIFS, SFTP, SCP, SSH (limited commands), rsync, and WebDAV. Are any of those a better transport protocol for rclone?

might tweak
--tpslimit

and maybe
https://rclone.org/sftp/#sftp-disable-concurrent-reads
https://rclone.org/sftp/#sftp-disable-concurrent-writes

Connections are still hitting the limit with --sftp-concurrency=2. I'm now trying --tpslimit=2 as well.
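
For anyone wanting to watch the connection count from the client side while tuning, here is a rough sketch. It assumes the iproute2 `ss` tool is available and that port 23 is only used for the storage box (as in the config above); the function name is made up for illustration, and Hetzner's own accounting may differ from what the client sees.

```shell
# Count established TCP connections whose peer port is 23 (the storage box's
# SFTP/SSH port in the config above). Reads `ss -Htn` output on stdin:
#   State  Recv-Q  Send-Q  Local-Address:Port  Peer-Address:Port
count_port23_conns() {
    awk '$1 == "ESTAB" && $5 ~ /:23$/ { n++ } END { print n + 0 }'
}
```

Possible usage: ss -Htn | count_port23_conns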

rclone seems to use 1 ssh connection per --transfers and 1 ssh connection per --checkers (and maybe some extra for the checksum commands).

I would therefore also try with --transfers=1.

Mounts don't use --checkers, but they do use (one or more?) sftp connections to refresh the directory cache, so increasing --dir-cache-time may also help. The default is 5m.
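
Putting the suggestions from this thread together, the mount command might end up looking something like the sketch below. This is untested against Hetzner's limit: the values for --transfers, --tpslimit and --sftp-concurrency are the ones mentioned above, while 1h for --dir-cache-time is only an illustrative assumption. The function name is made up, and it prints the command instead of running it so it can be reviewed first.

```shell
# Print (rather than run) the original mount command with the flags suggested
# in this thread added, so it can be reviewed before going into the service.
# The 1h value for --dir-cache-time is an illustrative assumption.
print_tuned_mount_cmd() {
    cat <<'EOF'
/usr/local/bin/rclone mount \
    --cache-dir=/var/cache/rclone \
    --config=/etc/rclone.conf \
    --allow-other \
    --rc \
    --rc-addr=<addr> \
    --rc-enable-metrics \
    --vfs-cache-mode=writes \
    --transfers=1 \
    --tpslimit=2 \
    --sftp-concurrency=2 \
    --dir-cache-time=1h \
    remote: /mnt/apps
EOF
}
print_tuned_mount_cmd
```

Note that a longer --dir-cache-time means changes made to the storage box from outside this mount will take longer to show up in it.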