Thank you both for the quick responses and the suggestions. I can confirm that the suggested workarounds of either adding --sftp-disable-concurrent-writes to the client or --vfs-cache-mode writes to the server seem to work on my test setup.
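For reference, the two workarounds sketched as commands (remote name, paths, and port are placeholders, not from my actual setup):

```shell
# Workaround 1: on the client, disable concurrent writes in the sftp backend
rclone copy /local/dir sftpremote:dir --sftp-disable-concurrent-writes

# Workaround 2: on the server, buffer writes through the VFS cache instead
rclone serve sftp /srv/data --addr :2022 --vfs-cache-mode writes
```

Either one alone was enough in my tests; they address the same out-of-order-write issue from opposite ends.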
I still believe this could be a bug as the documentation for rclone serve sftp explicitly states:
Note that the default of "--vfs-cache-mode off" is fine for the rclone sftp backend, but it may not be with other SFTP clients.
So it shouldn't be needed, and it would be great if the rclone sftp backend and the rclone serve sftp command worked together out of the box.
There isn't a really easy answer here: if you disable something useful in the client by default, it becomes slower by default against all the other SFTP servers out there as well.
Not sure I'd say it's a bug either, as you can just set the flags to meet your needs.
In this case, if you are using rclone for both the client and the server, you have to pick one side to handle those writes. It doesn't matter which.
@ncw Thank you for taking a look at it. I can create an issue on GitHub. In the meantime I discovered an additional problem when using serve sftp with the --stdio option (details below). Maybe this is related to the underlying code changes? The files are corrupted on transfer and rclone therefore logs "Removing failed copy". Let me know if I should include both problems in the same GitHub issue or create a separate one.
What is the problem you are having with rclone?
When using serve sftp with the --stdio option, files are corrupted on transfer; rclone therefore logs "Removing failed copy" and retries until it runs out of attempts. Some files transfer successfully in the end, some do not. When using the serve sftp command directly, without --stdio, this problem doesn't seem to occur.
What is your rclone version (output from rclone version)
v1.57.0-beta.5823.da8f9be84
Which OS you are using and how many bits (eg Windows 7, 64 bit)
Debian 10 Buster, 64 bit
Which cloud storage system are you using? (eg Google Drive)
Local Filesystem
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Unfortunately GitHub truncated the server logs as they were too long. Let me know if something important is missing.
The corrupted on transfer problem is present in all 4 cases.
Could the problem be related to the number of file transfers that run in parallel (default: --transfers 4)? When running rclone serve sftp as the server, a single server process handles all 4 connections. When running the --stdio variant, OpenSSH starts rclone serve sftp --stdio 4 times.
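If that is the cause, serializing the transfers so that only one rclone serve sftp --stdio process exists at a time should make the corruption disappear. A quick check could look like this (remote name and paths are placeholders):

```shell
# With --transfers 1 only one SFTP connection, and therefore only one
# forced-command "rclone serve sftp --stdio" process, is active at a time.
rclone copy /local/dir stdio-remote:dir --transfers 1 -v
```

If the errors vanish with --transfers 1 but return at the default of 4, that would support the multiple-server hypothesis.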
Running 4 different copies of the rclone server leads to 4 different views of the state of the file system.
There is a certain amount of smoke and mirrors going on in an rclone server (the VFS layer of rclone) while a file is being written. Rclone knows it is writing the file and will return the status of the file it is writing, rather than the status of the file on the file system. This allows the file to be written asynchronously.
However, when running multiple servers, the rclone client chooses indiscriminately which server to send commands to. It might therefore hit a server on which that file hasn't been fully written to disk yet; since that server wasn't the one doing the writing, it doesn't have the smoke and mirrors to return the expected size of the file, hence the corrupted on transfer: sizes differ XXX vs YYY errors.
We really only want one rclone server running at once, otherwise we'll get these problems. One rclone server is quite capable of running multiple streams...
Maybe we can think of a different way to solve this. What problem are you trying to solve by using --stdio? Not having a permanently resident rclone server? Or something else?
Thank you for the explanation. That makes sense to me and is indeed very interesting.
My use case is the following: I want to upload files from machine A (behind NAT) to machine B (a server with a public IP). As an SSH server is running on machine B anyway, the initial idea was to add rclone serve sftp --stdio to the authorized_keys file and start the upload from machine A. This way no additional port is needed and the SFTP client is confined to a defined folder. The disadvantage is that sha1sum/md5sum are not available because of the restrict option in authorized_keys, and with the discussed workarounds of --sftp-disable-concurrent-writes and --transfers 1 this is also rather slow.

As an alternative, I set up a separate rclone serve sftp on machine B that machine A can reach via a VPN. This still needs --sftp-disable-concurrent-writes as a workaround, but with --transfers 9 I can reach the full throughput of the connection, so this would be fine.

The solution I'm using now is to run rclone serve sftp on machine A and download the files with an rclone client on machine B via the VPN. This removes the need for a bugfix on my side, but maybe it's still relevant to other users.
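For context, the authorized_keys approach on machine B looks roughly like this (key shortened, folder name is a placeholder for my actual upload directory):

```
# ~/.ssh/authorized_keys on machine B
# "restrict" disables port/agent/X11 forwarding and PTY allocation,
# and the forced command means the key can only run the SFTP server,
# confined to ./upload -- but it also rules out running sha1sum/md5sum.
restrict,command="rclone serve sftp --stdio ./upload" ssh-ed25519 AAAA... machineA
```

Machine A then uses an ordinary sftp remote pointing at machine B; every SSH connection it opens spawns one rclone serve sftp --stdio process under this forced command.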
Should I create the two issues on GitHub ([1] --sftp-disable-concurrent-writes necessary due to underlying changes in the SFTP library and [2] --transfers 1 necessary when running rclone serve sftp --stdio)?
Additionally or alternatively I can prepare a pull request to update the documentation with these workarounds. Let me know what you think.