SSH handshake failure for some but not all files

First, thanks for the great tool and taking the time to read.

I am having an intermittent SFTP issue with rclone. Seemingly random files in a sync process are giving me an error (NewFs: couldn't connect SSH: ssh: handshake failed). I'm syncing a directory on SFTP. The sync command:

rclone -vv sync sftp:/Dir/ ~/dir/

This command transfers the majority of the files, but some directories and some files do not transfer. After a failure I seem to get locked out for a while (a couple of minutes), during which telnet sftp_IP 22 connections are refused. If I run that telnet command from another machine I can connect, so it seems like there is a temporary IP blacklisting happening.

Even though the original command gave the error: "Failed to create file system for "sftp:/Dir/sub_dir": NewFs: couldn't connect SSH: ssh: handshake failed", I can sync the files that fail by running the command above on the sub-directories, for example:

rclone -vv sync sftp:/Dir/sub_dir ~/dir/sub_dir

This behavior is the same on my local machine and from an EC2 instance. It's also the same when just doing ls instead of sync. I don’t have access to the remote SFTP command line, but I do have access to tech support, so I’m hoping that I can get some advice about what might be happening on their side.

Config:
[sftp_connection]
type = sftp
host = <sftp_IP>
user = USERNAME
pass = SOME HASH
key_file_pass = SOME OTHER HASH

Excerpt from the log:

2019/06/14 21:17:46 DEBUG : File_1.xlsx: Unchanged skipping
2019/06/14 21:17:46 DEBUG : File_2.txt: Size and modification time the same (differ by 0s, within tolerance 1s)
2019/06/14 21:17:46 DEBUG : File_2.txt: Unchanged skipping
2019/06/14 21:17:46 DEBUG : File_3.xlsx: Size and modification time the same (differ by 0s, within tolerance 1s)
2019/06/14 21:17:46 DEBUG : File_3.xlsx: Unchanged skipping
2019/06/14 21:17:47 ERROR : Dir_1: error reading source directory: List failed: dirExists: couldn't connect SSH: ssh: handshake failed: read tcp an_IP:54298->sftp_IP:22: read: connection reset by peer
2019/06/14 21:17:47 ERROR : Dir 2: error reading source directory: List failed: dirExists: couldn't connect SSH: ssh: handshake failed: read tcp an_IP:54300->sftp_IP:22: read: connection reset by peer
2019/06/14 21:17:47 ERROR : Dir_3: error reading source directory: List failed: dirExists: couldn't connect SSH: ssh: handshake failed: read tcp an_IP:54304->sftp_IP:22: read: connection reset by peer
...

I suspect the sftp server is blocking you for overuse. Try reducing --transfers and --checkers to reduce the total number of connections rclone makes. By default rclone will make up to 12 sftp connections (transfers + checkers) which may be too many.
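
For anyone following along, here is a rough sketch of how those two flags fit onto the sync command from the original post. The values 2 and 2 are illustrative assumptions, not a recommendation; the right numbers depend on whatever limit the server enforces. The final line only prints the command so it can be reviewed first.

```shell
# Illustrative values (assumptions); choose them to fit the server's limit.
CHECKERS=2
TRANSFERS=2

# Upper bound on simultaneous SFTP connections: transfers + checkers.
MAX_CONNS=$((CHECKERS + TRANSFERS))
echo "rclone may open up to ${MAX_CONNS} SFTP connections"

# Print the command for review; drop the leading 'echo' to actually run it.
echo rclone -vv sync --checkers "$CHECKERS" --transfers "$TRANSFERS" sftp:/Dir/ "$HOME/dir/"
```

With the defaults (8 checkers, 4 transfers) the same sum gives the 12 connections mentioned above.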

Thank you @ncw, I had not fully understood the significance of the checker and transfer parameters as worker counts for those processes. Reducing --checkers worked: I reduced checkers to 4 (from the default 8) and was able to ls the full directory with no errors. I had tried reducing connections with the MaxSessions setting in sshd_config, to no avail.

Just for anyone reading in the future: in my case the remote SFTP server has a connection limit of 4. Setting checkers to 4 works for ls, but it fails for sync because sync runs both checker and transfer workers. This means I had to add the --transfers argument and have checkers and transfers add up to 4. Depending on the application you can tune the ratio; for example, we check often and transfer rarely, so I think I'll end up with checkers=3 and transfers=1 (the defaults are checkers=8 and transfers=4).
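
The split described above can be sketched as a small script. The limit of 4 is specific to this server, and the variable names are made up for illustration; the commands are printed rather than run so they can be checked before use.

```shell
# Connection limit of this particular server (from the post above).
LIMIT=4
# Check-heavy workload: keep one transfer worker, give the rest to checkers.
TRANSFERS=1
CHECKERS=$((LIMIT - TRANSFERS))

# Guard against a split that leaves no checker workers.
if [ "$CHECKERS" -lt 1 ]; then
    echo "no connections left for checkers" >&2
    exit 1
fi

# sync uses checkers and transfers, so both must fit inside the limit.
echo rclone -vv sync --checkers "$CHECKERS" --transfers "$TRANSFERS" sftp:/Dir/ "$HOME/dir/"

# ls worked here with checkers alone set to the full limit.
echo rclone -vv ls --checkers "$LIMIT" sftp:/Dir/
```

Remove the leading echo on either line to run the command for real.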
