FTP does not play well with --files-from

What is the problem you are having with rclone?

I am trying to copy a list of files from a FTP to the local disk and it does not play well with '--files-from-raw' option.

It triggers a full listing for each file contained in the file used as filter which simply do not scale at all when you have hundreds of thousands of files:

  • MakeListR(filter.go) calls NewObject(ctx, remote) for each object of input file
  • NewObject(ftp.go) calls findItem(ftp.go)
  • findItem(ftp.go) calls List on the parent directory

I'm not sure why we perform a full list here (using MLST). If I send a raw FTP command to the server I can return only the file I want:

Commande :	MLST /path/to/redacted/file.log.gz
Réponse :     	250-Begin
Réponse :     	 type=file;size=58394517;modify=20220317011011;UNIX.mode=0664;UNIX.uid=0;UNIX.gid=0;unique=0g0; /path/to/redacted/file.log.gz
Réponse :     	250 End.

Digging the forum I found a similar issue that was closed without resolution:

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.0-beta.5996.3a89ee1bb
- os/version: darwin 11.5.2 (64 bit)
- os/kernel: 20.6.0 (x86_64)
- os/type: darwin
- os/arch: amd64
- go/version: go1.16.2
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Source: FTP
Target: Local

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --files-from-raw files.txt ftp:/ .

The rclone config contents with secrets removed.

[ftp]
type = ftp
host = **redacted**
user = **redacted**
pass = **redacted**

A log from the command with the -vv flag

There is nothing to see in debug mode, I had to add more logs in rclone to see what was going on.

The command "LIST /path/to/file" does not seems to be working.

Since we need modtime + size, I wonder if the listing (in case MLST is not supported) could be replaced by 2 commands:

  • SIZE
  • MDTM

To retrieve information on the remote file... the annoying part being that it would not be helpful to distinguish files & directories :thinking:

I also noticed that the command MLST is not available in the FTP dependency used :frowning:

Yes, what this needs is for NewObject to be using a more efficient way of finding a file than listing its parent directory. We try to implement this where possible in backends.

The FTP library will use MLSD for listing if it is available - I don't know how that behaves if given a file?

Though MLST looks better since it doesn't need a data connection.

I would be interested in a patch to fix this.

Note that currently we have our own fork of the FTP library so we can add things to it...


You can also turn your --files-from-raw into an --include-from file and rclone will then just list each directory once.

Tried modifying the code to call .List() on the file directly (with MLSD being used) and did not seem to work.

Looking at the corresponding RFC it looks like it is by design:

The MLST and MLSD commands each allow a single optional argument.
This argument may be either a directory name or, for MLST only, a
file name.

Using SIZE/MDTM is not a solution because there is nothing to distinguish file vs directory (as far as I can tell).

Ideally I'd like to keep using --files-from-raw because this is used in a common script for me and the source is not always FTP and I'm not 100% sure of the tiny changes in behavior if I change this.

I've seen that it uses a forked version of the FTP library, I was thinking of maybe adding a getEntryInfo method which would use MLST underneath and probably raise an error if not available.

Something like this: Comparing rclone:dev...cogniteev:feat/get-entry-info · rclone/ftp · GitHub

I'm also not sure how I could do to just forbid the default behavior that is to do a full listing in findItem without relying on a forked version of rclone forever. It seems to be that relying on a list to find a single item is broken by design, unless you deal with really tiny use cases.

I stumbled into this problem because I had a copy task that latest 14h+ without actually having copied anything because it did not finished "loading" the 17k files I gave as an input.

And there is not a single line of debug anywhere around this thus you really don't know what is going on (unless you use "--dump headers" - if you find it - but even then you do not know why it list the FTP in a loop)

Ah, OK.

That code looks nice.

Traditionally that call is called Stat - to find info about a single object and will be analogous to os.Stat in the Go standard library.

We can get this working I'm sure and get it into rclone proper.

Rclone needs to know if Stat is supported so it can fallback to doing the listing.

We have func (c *ServerConn) IsTimePreciseInList() bool which returns the status of mlst support which is OK apart from the name to check if MLST works.

The other alternative is for the Stat call to return an exported sentinel error. so define var ErrNotSupported = ... outside the function and return that, then the user can check the error against NotSupported and do something specific. That would probably be the best idea.

What I suggest you do is try to make a PR with the above changes s/GetEntryInfo/Stat/ and ErrNotSupported to the jlaffaye/ftp repo - the author appears to be active at the moment. We can pull that into our fork also.

That's all part of the fun :slight_smile:

I was able to make GetEntryInfo with something like this:

func (f *Fs) findItem(ctx context.Context, remote string) (entry *ftp.Entry, err error) {
	// defer fs.Trace(remote, "")("o=%v, err=%v", &o, &err)
	fullPath := path.Join(f.root, remote)
	if fullPath == "" || fullPath == "." || fullPath == "/" {
		// if root, assume exists and synthesize an entry
		return &ftp.Entry{
			Name: "",
			Type: ftp.EntryTypeFolder,
			Time: time.Now(),
		}, nil
	}

	c, err := f.getFtpConnection(ctx)
	if err != nil {
		return nil, fmt.Errorf("findItem: %w", err)
	}

	entry, err = c.GetEntryInfo(fullPath)
	f.putFtpConnection(&c, err)

	if err != nil {
		return nil, translateErrorFile(err)
	}

	if entry != nil {
		f.entryToStandard(entry)
	}

	return entry, nil
}

Traditionally that call is called Stat - to find info about a single object and will be analogous to os.Stat in the Go standard library.

Make sense but I did not go this way because STAT is actually a ral FTP command and I thought that It might be confusing, but I can rename it.

What I suggest you do is try to make a PR with the above changes s/GetEntryInfo/Stat/ and ErrNotSupported to the jlaffaye/ftp repo - the author appears to be active at the moment. We can pull that into our fork also.

Done:

And PR updated for rclone also:

Great work!

I just went through all our outstanding PRs to jlaffaye/ftp and discovered that they've all been merged so I will revert rclone back to the upstream library.

If you want to prepare a patch for rclone which uses the new feature, then you can use a replace github.com/jlaffaye/ftp => /path/to/local/checkout line in go.mod to do local testing. This can't be finalised until your patch is merged upstream (or in our fork) but you can get going with it if you want!

Can you open a new issue on Github about this and summarise where you've got to so far (and put a link to this thread) we should really be doing software dev on Github so we can link the references there! Thank you.