Directory recursion optimisation affects non-root filters

What is the problem you are having with rclone?

It appears to me that the directory recursion optimisation is overly eager in some cases.

I encountered this in a more complex case, but a minimal example is the following:
Given I have an S3 bucket (let's name it dh-test-bucket) with one single object:

test/aaa/testfile.csv

and then I run (after signing in with aws sso login)

rclone copy --dump headers --s3-provider=AWS --s3-region=eu-west-1 --s3-env-auth --checksum --include "aaa/testfile.csv" :s3:dh-test-bucket rctest/

rclone does not match or copy any file.
The filtering docs say:

If the filter pattern starts with a / then it only matches at the top level of the directory tree, relative to the root of the remote (not necessarily the root of the drive). If it does not start with / then it is matched starting at the end of the path/file name but it only matches a complete path element - it must match from a / separator or the beginning of the path/file.

I understand this to imply that --include aaa/testfile.csv should match test/aaa/testfile.csv.

However, it appears that a directory filter rule is being created (see logs below) that excludes all directories except the ones matching aaa.

So, since rclone only queries/lists the root of the bucket, the only thing to match against is the "directory" test, which is excluded.
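
To double-check my reading, here is a small Go sketch using the two regexes taken verbatim from the --dump filters output further down (this is only an illustration of the matching, not rclone's actual code):

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Regexes copied from the --dump filters output below.
	fileInclude := regexp.MustCompile(`(^|/)aaa/testfile\.csv$`)
	dirInclude := regexp.MustCompile(`(^|/)aaa/$`)

	// The file rule does match the object, so with --fast-list it gets copied.
	fmt.Println(fileInclude.MatchString("test/aaa/testfile.csv")) // true

	// But when recursing from the root, the first thing rclone sees is the
	// top-level directory (matched as "test/" if I read the rules right),
	// which the directory include does not match, so it falls through to
	// "- ^.*$" and the subtree is pruned before the file rule is consulted.
	fmt.Println(dirInclude.MatchString("test/")) // false
}

As far as I can tell, that is exactly what the "DEBUG : test: Excluded" line in the log shows.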

If I use the --fast-list flag, the filtering works as expected (since the whole tree is listed initially).

So it seems to me that the directory recursion optimisation is a bit too eager here.
My not-entirely-thought-through take would be that any pattern that doesn't start with a /, and thus should match at the end of the path, cannot be used to limit the directories to traverse.
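
To make that concrete, here is a tiny Go sketch of the decision I have in mind (the function is hypothetical and does not exist in rclone; this is just how I picture the rule):

package main

import (
	"fmt"
	"strings"
)

// canPruneDirs is a hypothetical sketch, not rclone's code: only a rooted
// pattern gives enough information to decide which directories to skip.
func canPruneDirs(pattern string) bool {
	// A rooted pattern anchors at the root of the remote, so its leading
	// path elements can be turned into directory include rules.
	// A non-rooted pattern may match at any depth, so every directory has
	// to be traversed to find out whether it contains a match.
	return strings.HasPrefix(pattern, "/")
}

func main() {
	fmt.Println(canPruneDirs("/test/aaa/testfile.csv")) // true: safe to prune
	fmt.Println(canPruneDirs("aaa/testfile.csv"))       // false: must recurse everywhere
}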

The docs state that the optimisation does not apply when using --fast-list, but they also state:

Directory recursion optimisation may affect performance, but normally not the result.

Also, the docs about --fast-list (can only post two links... rclone.org/docs/#fast-list) say:

[...] or you can allow rclone to use ListR where it would normally choose not to do so due to higher memory usage, using the --fast-list option. Rclone should always produce identical results either way.

I see basically two options:

  1. This is a bug in how the directory filters get created for these "non-root-includes"
  2. This works as intended. In that case, however, the docs should be updated (at least some of the sections quoted above).

Run the command 'rclone version' and share the full output of the command.

rclone v1.68.1
- os/version: ubuntu 24.04 (64 bit)
- os/kernel: 5.15.153.1-microsoft-standard-WSL2 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.23.1
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

$ aws sso login  # to get access
$ rclone copy --dump headers --s3-provider=AWS --s3-region=eu-west-1 --s3-env-auth --checksum --include "aaa/testfile.csv" :s3:dh-test-bucket rctest/
$ rclone copy --fast-list --dump headers --s3-provider=AWS --s3-region=eu-west-1 --s3-env-auth --checksum --include "aaa/testfile.csv" :s3:dh-test-bucket rctest/

The rclone config contents with secrets removed.

No config; everything is configured via command-line arguments.

A log from the command with the -vv flag

The relevant parts are below (I also used --dump filters):

2024/10/15 18:21:33 NOTICE: Automatically setting -vv as --dump is enabled
--- start filters ---
--- File filter rules ---
+ (^|/)aaa/testfile\.csv$
- ^.*$
--- Directory filter rules ---
+ (^|/)aaa/$
- ^.*$
--- end filters ---
[...]
2024/10/15 18:21:34 DEBUG : test: Excluded
