Serve sftp: upload fails ("Illegal seek", SSH_FX_FAILURE)

What is the problem you are having with rclone?

Uploading a file using the rclone SFTP backend to the rclone serve sftp server fails with "Illegal seek" (SSH_FX_FAILURE).

What is your rclone version (output from rclone version)

v1.56.2 or v1.57.0-beta.5823.da8f9be84

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Debian 10 Buster, 64 bit

Which cloud storage system are you using? (eg Google Drive)

Local Filesystem

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Server & Client

wget https://downloads.rclone.org/v1.56.2/rclone-v1.56.2-linux-amd64.zip
unzip rclone-v1.56.2-linux-amd64.zip
cd rclone-v1.56.2-linux-amd64/

Server only

mkdir server-folder
./rclone serve sftp server-folder --no-auth --log-file rclone-server.log -vvv

Client only

mkdir client-folder
fallocate -l 100MB client-folder/100MB.test
./rclone copy --sftp-host 127.0.0.1 --sftp-port 2022 --sftp-ask-password client-folder :sftp: --log-file rclone-client.log -vvv

For the beta version I updated with

./rclone selfupdate --beta

and then repeated the commands above.

The rclone config contents with secrets removed.

No config

A log from the command with the -vvv flag

hello and welcome to the forum,

might be helpful to see a debug log of the rclone serve sftp

not sure the exact cause, but i would add these flags and see what happens.

The program/application writing is trying to seek in the file while writing:

2021/11/01 13:49:44 ERROR : 100MB.test: WriteFileHandle.Write: can't seek in file without --vfs-cache-mode >= writes

so it needs cache mode writes on to work.
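Concretely, that would mean restarting the server from the repro above with the cache flag added, something like (a sketch based on the OP's command, not retested here):

```shell
# Same serve command as in the repro, with the VFS write cache enabled
# so seeks during upload can be satisfied from the local cache.
./rclone serve sftp server-folder --no-auth --vfs-cache-mode writes \
    --log-file rclone-server.log -vvv
```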

hi,
2021/11/01 13:49:44 ERROR : 100MB.test: WriteFileHandle.Write: can't seek in file without --vfs-cache-mode >= writes

i looked at both log files, did not find that?
both log files are for the rclone copy, not rclone serve sftp

It's in the server log in pastebin:

hmm, the OP posted two rclone log files, both for the rclone copy, from gist.githubusercontent.com
i cannot find the pastebin link?

From the link in the OP, a gist, not pastebin. I've shared the screenshot and a line from the log. It's there, so not sure what to say.

sorry, my mistake, did not realize multiple logs were posted per link.

Thank you both for the quick responses and the suggestions. I can confirm that the suggested workarounds of either adding --sftp-disable-concurrent-writes to the client or --vfs-cache-mode writes to the server seem to work on my test setup.
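For anyone finding this later, the two working variants are each the command from the OP with one flag added (sketches of what I ran, not retested verbatim):

```shell
# Variant 1: client-side workaround, disable concurrent writes
./rclone copy --sftp-host 127.0.0.1 --sftp-port 2022 --sftp-ask-password \
    --sftp-disable-concurrent-writes client-folder :sftp: -vvv

# Variant 2: server-side workaround, enable the VFS write cache
./rclone serve sftp server-folder --no-auth --vfs-cache-mode writes -vvv
```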

I still believe this could be a bug as the documentation for rclone serve sftp explicitly states:

Note that the default of "--vfs-cache-mode off" is fine for the rclone sftp backend, but it may not be with other SFTP clients.

So it shouldn't be needed, and it would be great if the rclone sftp backend and the rclone serve sftp command would work together out of the box.


There isn't a really easy answer here: if you disable something useful in the client by default, it would also make it slower against all the other SFTP servers out there.

Not sure I'd say it's a bug either as you can just set the flags to meet your needs.

In this case, if you are using both rclone for the client and server, you gotta pick a spot to handle those writes. Doesn't matter where.

This is unfortunate, and I'm sure it used to work. In fact, if I test it with v1.55.1 it does work, but not with v1.56.2.

However, the upstream sftp library rewrote the file upload code and I suspect that is what broke it.

Can you open a new issue on Github about this and I'll see if we can make the two interoperate properly again!

Thanks

@ncw Thank you for taking a look at it. I can create an issue on GitHub. In the meantime I discovered an additional problem when using serve sftp with the --stdio option (I have included the details below). Maybe this is related to the underlying code changes? The files are corrupted on transfer and rclone is therefore reporting "Removing failed copy". Let me know if I should include both problems in the same GitHub issue or create a separate one.

What is the problem you are having with rclone?

When using serve sftp with the --stdio option, files are corrupted on transfer, so rclone is reporting "Removing failed copy" and retries until it runs out of attempts. Some files are transferred successfully in the end, some are not. When using the serve sftp command directly, without --stdio, this problem doesn't seem to occur.

What is your rclone version (output from rclone version)

v1.57.0-beta.5823.da8f9be84

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Debian 10 Buster, 64 bit

Which cloud storage system are you using? (eg Google Drive)

Local Filesystem

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Server & Client

cd /home/debian/rclone
ssh-keygen -t ed25519 -f test-ed25519 -N "" -C test-ed25519

Server

mkdir /home/debian/rclone/server-folder
cat test-ed25519.pub
vi /home/debian/.ssh/authorized_keys

restrict,command="/home/debian/rclone/rclone serve sftp --log-file /home/debian/rclone/rclone-server-stdio.log --stdio /home/debian/rclone/server-folder -vvv" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICmP10QPLoM7zou9G8zVW+GHzUYa7X9b9JbRB4Kxg7w5 test-ed25519

Client

mkdir /home/debian/rclone/client-folder
vi /home/debian/.config/rclone/rclone.conf

[sftp-test]
type = sftp
host = 127.0.0.1
user = debian
disable_concurrent_writes = true
key_file = /home/debian/rclone/test-ed25519
md5sum_command = none
sha1sum_command = none

fallocate -l 100MB /home/debian/rclone/client-folder/100MB-1.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-2.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-3.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-4.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-5.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-6.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-7.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-8.test
fallocate -l 100MB /home/debian/rclone/client-folder/100MB-9.test

./rclone copy /home/debian/rclone/client-folder sftp-test: --log-file /home/debian/rclone/rclone-client.log -vvv

A log from the command with the -vvv flag

Can you try running the server with --vfs-cache-mode writes? If it works then it is probably the same problem in a different disguise.

Can you try also using rclone v1.55.x as the client (leave the server version alone) with --stdio to see if that works too?

I have attached the logs below:

Unfortunately GitHub truncated the server logs as they were too long. Let me know if something important is missing.

The corrupted on transfer problem is present in all 4 cases.

Could the problem be related to the number of file transfers that run in parallel (default: --transfers 4)? When running rclone serve sftp as the server, only 1 server is responsible for handling all 4 connections. When running the --stdio variant, OpenSSH starts up rclone serve sftp --stdio 4 times.

With --transfers 1 set on the client the corrupted on transfer problem doesn't occur:
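That is, the same copy command as above but limited to one transfer at a time, so (as far as I can tell) only a single rclone serve sftp --stdio process is active per copy:

```shell
# One transfer at a time: the sftp backend then only needs one SSH
# session, so OpenSSH spawns just one rclone serve sftp --stdio process.
./rclone copy /home/debian/rclone/client-folder sftp-test: --transfers 1 \
    --log-file /home/debian/rclone/rclone-client.log -vvv
```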

OK that is super interesting...

Running 4 different copies of the rclone server is leading to different views of the state of the filing system.

There is a certain amount of smoke and mirrors going on when you use an rclone server (the VFS layer of rclone) when you are writing a file. Rclone knows it is writing a file and will return the status of the file it is writing, rather than the status of the file on the filing system. This is to allow for the file being written asynchronously.

However, when running multiple servers, the client rclone chooses indiscriminately which server to send commands to. It might therefore get a server on which that file hasn't been fully written to disk yet, and since that server wasn't the one doing the writing, it doesn't have the smoke and mirrors to return the expected size of the file, hence the corrupted on transfer: sizes differ XXX vs YYY errors.

We really only want one rclone server running at once, otherwise we'll get these problems. One rclone server is quite capable of running multiple streams...

Maybe we can think of a different way to solve this. What problem are you trying to solve by using --stdio? That of not having a permanently resident rclone server? Or something else?

Thank you for the explanation. That makes sense to me and is indeed very interesting.

My use case is the following: I want to upload files from machine A (behind NAT) to machine B (a server with a public IP). As an SSH server is running on machine B anyway, the initial idea was to add rclone serve sftp --stdio to the authorized_keys file and start the upload on machine A. This way no additional port is needed and the SFTP client is confined to a defined folder. The disadvantage is that sha1sum/md5sum are not available because of the restrict in authorized_keys, and with the discussed workarounds of --sftp-disable-concurrent-writes and --transfers 1 this is also rather slow.

As an alternative I set up a separate rclone serve sftp on machine B that machine A can reach via a VPN. This still needs --sftp-disable-concurrent-writes as a workaround, but with --transfers 9 I can reach the full throughput of the connection, so this would be fine.

The solution I'm using now is to run rclone serve sftp on machine A and download the files using an rclone client on machine B via the VPN. This removes the need for a bugfix for me, but maybe it's still relevant to other users.

Should I create the two issues on GitHub ([1] --sftp-disable-concurrent-writes necessary due to underlying changes in the SFTP library and [2] --transfers 1 necessary when running rclone serve sftp --stdio)?
Additionally or alternatively I can prepare a pull request to update the documentation with these workarounds. Let me know what you think.

Thank you for your explanation.

Yes, we definitely need that one and that should be fixable I hope.

I can't think of an easy way to fix that, so I think we'll have to put that in the documentation for rclone serve sftp --stdio for the moment.

Really we want a no caching at all mode for the VFS, but that will play very badly with remote storage...

For future reference: I have now created the issue for [1] and the pull request for [2].


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.