Vsftpd vs rclone - Part 2

What is the problem you are having with rclone?

I am sure it's not duplicate of Downloads fail from remote server - Error 426 failure writing network stream

426 Failure writing network stream. also happens while transferring files, usually when the target filesystem supports multi-thread copy (e.g. local)

The error has originally happened while connecting to seedboxes that use misconfigured vsftpd (especially strict_ssl_read_eof=YES)
For your reference, here's the content of /etc/vsftpd.conf of the server described below:

listen=NO
listen_ipv6=YES
anonymous_enable=YES
local_enable=NO

write_enable=NO
chroot_local_user=YES

dirmessage_enable=YES
use_localtime=YES
xferlog_enable=YES
connect_from_port_20=NO
secure_chroot_dir=/var/run/vsftpd/empty
pam_service_name=vsftpd
listen_port=12345
user_sub_token=nobody
nopriv_user=ftp
anon_root=/home/lesmi/tmp/vsftpd_test
rsa_cert_file=/etc/ssl/private/vsftpd.pem
rsa_private_key_file=/etc/ssl/private/vsftpd.pem

ssl_enable=YES
allow_anon_ssl=YES
force_local_data_ssl=YES
force_local_logins_ssl=YES
ssl_tlsv1=YES
ssl_sslv2=NO
ssl_sslv3=NO
require_ssl_reuse=YES
ssl_ciphers=HIGH
strict_ssl_read_eof=YES

Run the command 'rclone version' and share the full output of the command.

$ rclone version
rclone v1.62.0
- os/version: ubuntu 20.04 (64 bit)
- os/kernel: 5.11.0-1027-oracle (aarch64)
- os/type: linux
- os/arch: arm64 (ARMv8 compatible)
- go/version: go1.20.2
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

My vsftpd setup to reliably reproduce the error

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy -P --ftp-host 205.185.113.189 --ftp-port 12345 --ftp-user anonymous --ftp-pass cUNHf_mdiQCKlipXVzZ2qM6o49U :ftp: ~/test1

The rclone config contents with secrets removed.

[example]
type = ftp
host = 205.185.113.189
port = 12345
user = anonymous
pass = cUNHf_mdiQCKlipXVzZ2qM6o49U

I'm glad to provide the credentials of my setup - the server is running only for this report, therefore not production thing
You can use it to test and reproduce this error, all the files consist of random bytes. You are even free to leak the files there

A log from the command with the -vv flag

Great bug report - thank you :slight_smile:

So it looks like that we are getting 426 Failure writing network stream. errors while reading parts of the file.

This looks similar to several reports in the forums.

Does it always fail on that part? Or does it sometimes fail on different parts?

Maybe rclone isn't retrying this error and if it did that would fix it up?

I don't know whether this is an rclone problem or a vsftp problem. But I note that vsftp seems very lightly maintained at the moment.

If it is an rclone problem then the fix will be in the upstream library GitHub - jlaffaye/ftp: FTP client package for Go - You'll see I've made lots of contributions there over the years!

BTW did you try

  --ftp-disable-tls13   Disable TLS 1.3 (workaround for FTP servers with buggy TLS)

Always fails on the same part/offset (but for different file)

Absolutely

Looks like rclone is stalling with this option?

2023-03-14 16:45:40 DEBUG : test1: multi-thread copy: stream 2/4 (1073741824-2147483648) size 1Gi starting
2023-03-14 16:45:40 DEBUG : test12: Starting multi-thread copy with 4 parts of size 512Mi
2023-03-14 16:45:40 DEBUG : test12: multi-thread copy: stream 4/4 (1610612736-2147483648) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test12: multi-thread copy: stream 2/4 (536870912-1073741824) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test12: multi-thread copy: stream 1/4 (0-536870912) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test12: multi-thread copy: stream 3/4 (1073741824-1610612736) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test10: Starting multi-thread copy with 4 parts of size 512Mi
2023-03-14 16:45:40 DEBUG : test10: multi-thread copy: stream 4/4 (1610612736-2147483648) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test10: multi-thread copy: stream 2/4 (536870912-1073741824) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test10: multi-thread copy: stream 3/4 (1073741824-1610612736) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test10: multi-thread copy: stream 1/4 (0-536870912) size 512Mi starting
2023-03-14 16:45:40 DEBUG : test11: Starting multi-thread copy with 4 parts of size 1Gi
2023-03-14 16:45:40 DEBUG : test11: multi-thread copy: stream 4/4 (3221225472-4294967296) size 1Gi starting
2023-03-14 16:45:40 DEBUG : test11: multi-thread copy: stream 3/4 (2147483648-3221225472) size 1Gi starting
2023-03-14 16:45:40 DEBUG : test11: multi-thread copy: stream 2/4 (1073741824-2147483648) size 1Gi starting
2023-03-14 16:45:40 DEBUG : test11: multi-thread copy: stream 1/4 (0-1073741824) size 1Gi starting
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:58759")
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:45:40 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-14 16:45:41 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-14 16:46:57 DEBUG : test1: multi-thread copy: stream 4/4 (3221225472-4294967296) size 1Gi finished
2023-03-14 16:47:57 DEBUG : ftp://205.185.113.189:12345: closing 1 unused connections
Transferred:            1 GiB / 60 GiB, 2%, 0 B/s, ETA -
Transferred:            0 / 20, 0%
Elapsed time:      5m35.7s
Transferring:
 *                                         test1: 25% /4Gi, 0/s, 1144900h19m39s
 *                                        test10:  0% /2Gi, 0/s, -
 *                                        test11:  0% /4Gi, 0/s, -
 *                                        test12:  0% /2Gi, 0/s, -

Is there any randomness? Does it fail on different files each time?

Retrying is probably the easiest fix, and I'm not quite sure why rclone isn't retrying - I think it should be.

Strange... I would have thought that option would be harmless unless SSL has been configured to ignore non TLS13 requests.

Maybe one of these options forces it

4GiB files on the server always fail with the error, 2GiB ones rarely fail

IMO yes. I'm not sure why it doesn't retry too

I think any or some of these involve about whether it accepts TLS or not

allow_anon_ssl=YES
force_local_data_ssl=YES
force_local_logins_ssl=YES

Here is what is happening - these two logs tell the whole story

2023/03/14 15:44:37 DEBUG : test1: multi-thread copy: stream 1/4 (0-1073741824) size 1Gi finished
2023/03/14 15:44:37 DEBUG : test1: multi-thread copy: stream 1/4 failed: 1 error occurred:

The first log is produced here indicating that we read all the data successfully and are returning nil from the function

The second log is produced on a defer in the same function

Which is a puzzle, until we notice

So what is happening is that all the data is being read correctly, however there is an error on Close() of the stream which rclone is returning.

Rclone doesn't retry an error on Close() and so fails the stream.

Digging into it further, this error on close comes from this function in the FTP backend

And it should already be ignoring error 426

(Error 426 is ftp.StatusTransfertAborted which seems to be spelled wrong in the library!)

So why isn't that error being ignored? What is the type of the error here? Maybe it is wrapped or maybe it isn't a 426 error?

Can you investigate further?

Thanks for digging in, I'll do it

So I made modifications like this:

diff --git a/backend/ftp/ftp.go b/backend/ftp/ftp.go
index 8dd520efb..9c9ee712a 100644
--- a/backend/ftp/ftp.go
+++ b/backend/ftp/ftp.go
@@ -319,10 +319,13 @@ func (dl *debugLog) Write(p []byte) (n int, err error) {
 func isRetriableFtpError(err error) bool {
 	switch errX := err.(type) {
 	case *textproto.Error:
+		fs.Errorf(err, "ftp error code (isRetriableFtpError): %d", errX.Code)
 		switch errX.Code {
 		case ftp.StatusNotAvailable, ftp.StatusTransfertAborted:
 			return true
 		}
+	default:
+		fs.Errorf(err, "ftp error returned (isRetriableFtpError): %v", err)
 	}
 	return false
 }
@@ -479,6 +482,9 @@ func (f *Fs) putFtpConnection(pc **ftp.ServerConn, err error) {
 				_ = c.Quit()
 				return
 			}
+			fs.Errorf(f, "ftp error code (putFtpConnection): %d", tpErr.Code)
+		} else {
+			fs.Errorf(f, "ftp error returned (putFtpConnection): %v", err)
 		}
 	}
 	f.poolMu.Lock()
@@ -623,10 +629,13 @@ func (f *Fs) Shutdown(ctx context.Context) error {
 func translateErrorFile(err error) error {
 	switch errX := err.(type) {
 	case *textproto.Error:
+		fs.Errorf(err, "ftp error code (translateErrorFile): %d", errX.Code)
 		switch errX.Code {
 		case ftp.StatusFileUnavailable, ftp.StatusFileActionIgnored:
 			err = fs.ErrorObjectNotFound
 		}
+	default:
+		fs.Errorf(err, "ftp error returned (translateErrorFile): %v", err)
 	}
 	return err
 }
@@ -635,10 +644,13 @@ func translateErrorFile(err error) error {
 func translateErrorDir(err error) error {
 	switch errX := err.(type) {
 	case *textproto.Error:
+		fs.Errorf(err, "ftp error code (translateErrorDir): %d", errX.Code)
 		switch errX.Code {
 		case ftp.StatusFileUnavailable, ftp.StatusFileActionIgnored:
 			err = fs.ErrorDirNotFound
 		}
+	default:
+		fs.Errorf(err, "ftp error returned (translateErrorDir): %v", err)
 	}
 	return err
 }
@@ -927,12 +939,15 @@ func (f *Fs) mkdir(ctx context.Context, abspath string) error {
 	f.putFtpConnection(&c, err)
 	switch errX := err.(type) {
 	case *textproto.Error:
+		fs.Errorf(f, "ftp error code (mkdir): %d", errX.Code)
 		switch errX.Code {
 		case ftp.StatusFileUnavailable: // dir already exists: see issue #2181
 			err = nil
 		case 521: // dir already exists: error number according to RFC 959: issue #2363
 			err = nil
 		}
+	default:
+		fs.Errorf(f, "ftp error returned (mkdir): %v", err)
 	}
 	return err
 }
@@ -1169,10 +1184,13 @@ func (f *ftpReadCloser) Close() error {
 	// See: https://github.com/rclone/rclone/issues/3445#issuecomment-521654257
 	switch errX := err.(type) {
 	case *textproto.Error:
+		fs.Errorf(f.f, "ftp error code (Close): %d", errX.Code)
 		switch errX.Code {
 		case ftp.StatusTransfertAborted, ftp.StatusFileUnavailable, ftp.StatusAboutToSend:
 			err = nil
 		}
+	default:
+		fs.Errorf(f.f, "ftp error returned (Close): %v", err)
 	}
 	return err
 }
@@ -1249,10 +1267,13 @@ func (o *Object) Update(ctx context.Context, in io.Reader, src fs.ObjectInfo, op
 	if err != nil {
 		switch errX := err.(type) {
 		case *textproto.Error:
+			fs.Errorf(o, "ftp error code (Update): %d", errX.Code)
 			switch errX.Code {
 			case ftp.StatusRequestedFileActionOK:
 				err = nil
 			}
+		default:
+			fs.Errorf(o, "ftp error returned (Update): %v", err)
 		}
 	}
 	if err != nil {

I got this log:

2023-03-15 12:04:24 ERROR : ftp://205.185.113.189:12345: ftp error returned (Close): 1 error occurred:                                                 
        * 426 Failure writing network stream.                                                                                                          
2023-03-15 12:04:24 DEBUG : test13: multi-thread copy: stream 1/4 failed: 1 error occurred:                                                            
        * 426 Failure writing network stream.                                                                                                          
2023-03-15 12:04:24 ERROR : ftp://205.185.113.189:12345: ftp error returned (Close): 1 error occurred:                                                 
        * 426 Failure writing network stream.                                                                                                          
2023-03-15 12:04:24 DEBUG : test13: multi-thread copy: stream 3/4 failed: context canceled                                                             
2023-03-15 12:04:24 ERROR : ftp://205.185.113.189:12345: ftp error returned (Close): 1 error occurred:                                                 
        * 426 Failure writing network stream.                                                                                                          
2023-03-15 12:04:24 DEBUG : test13: multi-thread copy: stream 2/4 failed: context canceled  
2023-03-15 12:04:24 ERROR : test13: Failed to copy: 1 error occurred:                                                                                  
        * 426 Failure writing network stream.                                                                                                          

As it explains it went the ftp error returned (Close) clause, which isn't the case for textproto.Error
So it requires something another I guess? I'll have to uncover this error

Ended up with this patch:

diff --git a/backend/ftp/ftp.go b/backend/ftp/ftp.go
index 8dd520efb..3ab731163 100644
--- a/backend/ftp/ftp.go
+++ b/backend/ftp/ftp.go
@@ -10,11 +10,13 @@ import (
 	"net"
 	"net/textproto"
 	"path"
+	"reflect"
 	"runtime"
 	"strings"
 	"sync"
 	"time"
 
+	"github.com/hashicorp/go-multierror"
 	"github.com/jlaffaye/ftp"
 	"github.com/rclone/rclone/fs"
 	"github.com/rclone/rclone/fs/accounting"
@@ -1169,10 +1171,30 @@ func (f *ftpReadCloser) Close() error {
 	// See: https://github.com/rclone/rclone/issues/3445#issuecomment-521654257
 	switch errX := err.(type) {
 	case *textproto.Error:
+		fs.Errorf(f.f, "ftp error code (Close): %d", errX.Code)
 		switch errX.Code {
 		case ftp.StatusTransfertAborted, ftp.StatusFileUnavailable, ftp.StatusAboutToSend:
 			err = nil
 		}
+	case *multierror.Error:
+		for idx, errY := range errX.Errors {
+			switch errZ := errY.(type) {
+			case *textproto.Error:
+				fs.Errorf(f.f, "ftp multi error code (Close): %d", errZ.Code)
+				switch errZ.Code {
+				case ftp.StatusTransfertAborted, ftp.StatusFileUnavailable, ftp.StatusAboutToSend:
+					err = nil
+				}
+			default:
+				if err != nil {
+					fs.Errorf(f.f, "ftp multi error returned (Close): %d %v %v", idx, reflect.TypeOf(errY), errY)
+				}
+			}
+		}
+	default:
+		if err != nil {
+			fs.Errorf(f.f, "ftp error returned (Close): %v %v", reflect.TypeOf(err), err)
+		}
 	}
 	return err
 }

Now I see:

2023-03-15 12:18:19 DEBUG : test20: multi-thread copy: stream 3/4 (1073741824-1610612736) size 512Mi finished
2023-03-15 12:18:20 ERROR : ftp://205.185.113.189:12345: ftp multi error code (Close): 426
2023-03-15 12:18:20 DEBUG : test20: multi-thread copy: stream 4/4 (1610612736-2147483648) size 512Mi finished
2023-03-15 12:18:21 DEBUG : test20: multi-thread copy: stream 2/4 (536870912-1073741824) size 512Mi finished
2023-03-15 12:18:21 ERROR : ftp://205.185.113.189:12345: ftp multi error code (Close): 426
2023-03-15 12:18:22 DEBUG : test20: multi-thread copy: stream 1/4 (0-536870912) size 512Mi finished
2023-03-15 12:18:22 ERROR : ftp://205.185.113.189:12345: ftp multi error code (Close): 426
2023-03-15 12:18:22 DEBUG : test20: Finished multi-thread copy with 4 parts of size 512Mi
2023-03-15 12:18:22 INFO  : test20: Multi-thread Copied (new)
2023-03-15 12:18:22 DEBUG : test6: Starting multi-thread copy with 4 parts of size 512Mi
2023-03-15 12:18:22 DEBUG : test6: multi-thread copy: stream 4/4 (1610612736-2147483648) size 512Mi starting
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:35363")
2023-03-15 12:18:22 DEBUG : test6: multi-thread copy: stream 1/4 (0-536870912) size 512Mi starting
2023-03-15 12:18:22 DEBUG : test6: multi-thread copy: stream 2/4 (536870912-1073741824) size 512Mi starting
2023-03-15 12:18:22 DEBUG : test6: multi-thread copy: stream 3/4 (1073741824-1610612736) size 512Mi starting
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: Connecting to FTP server
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:12345")
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:12345")
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:12345")
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:60756")
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:20585")
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: dial("tcp","205.185.113.189:6489")
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:22 DEBUG : ftp://205.185.113.189:12345: > dial: conn=*fshttp.timeoutConn, err=<nil>
2023-03-15 12:18:26 DEBUG : test4: multi-thread copy: stream 3/4 (1073741824-1610612736) size 512Mi finished
2023-03-15 12:18:26 ERROR : ftp://205.185.113.189:12345: ftp multi error code (Close): 426
2023-03-15 12:18:27 DEBUG : test4: multi-thread copy: stream 4/4 (1610612736-2147483648) size 512Mi finished

Did it! Integer 416!
Also I'm seeing previously failing files are successfully transferred now

The major cause was the error being wrapped in multierror.Error, giving a weird message (i.e. containing a break line)
Looks like the fight is going to end, but how should it be fixed properly then? Duplicating the same code like the patch is a bad idea

Well done!

You should be able to look in the multierror by using errors.As I think. Something like this (untested)

@@ -1167,8 +1167,8 @@ func (f *ftpReadCloser) Close() error {
 	// mask the error if it was caused by a premature close
 	// NB StatusAboutToSend is to work around a bug in pureftpd
 	// See: https://github.com/rclone/rclone/issues/3445#issuecomment-521654257
-	switch errX := err.(type) {
-	case *textproto.Error:
+	var errX *textproto.Error
+	if errors.As(err, &errX) {
 		switch errX.Code {
 		case ftp.StatusTransfertAborted, ftp.StatusFileUnavailable, ftp.StatusAboutToSend:
 			err = nil

PS Looked through the ftp backend code and there is another place we are using errors.As and lots of places we aren't. We should probably use it everywhere. It would be easy enough to factor into a function

// Return a *textproto.Error if err contains one or nil otherwise
func textprotoError(err error) (errX *textproto.Error) {
    if errors.As(err, &errX) {
        return errX
    }
    return nil
}

Then use it like

if errX := textprotoErr(err); errX != nil {
// do stuff with errX
}

That looks a promising fix, nice

Do you want to make a PR for this?

Yes, it shouldn't take too long to fix as there are only 8 occurrences in the code

Thank you.

Also did you see [FTP] torrent file copy fails · Issue #6793 · rclone/rclone · GitHub which looks very similar. The op reports that 1.62 no longer connects also which is odd.

This definitively fixes the 426 error on that issue, but I don't think it fixes the TLS error thing

edit: let's go ftp: properly extract `textproto.Error` from the errors returned by `github.com/jlaffaye/ftp` by Lesmiscore · Pull Request #6851 · rclone/rclone · GitHub

1 Like

Thank you - I've merged that now :slight_smile:

1 Like

This fix is released in v1.62.2 - thank you @Lesmiscore

It along with --ftp-disable-tls13 fixed #6793 which is great!

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.