How can I stop rclone from parsing/listing excluded/filtered directories on a sync command?

Why does rclone parse/list the contents of directories that are excluded by a filter on a sync command? In my mind it should just ignore the directory, on both the source and the destination.

E.g. I have a --filter-from file that includes the rule: - /some/dir/path/something.sparsebundle/**

Why does rclone use resources/time to list what's inside the excluded directory, and then match it?

2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cf: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75ce: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cd: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cc: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cb: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75ca: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c9: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c8: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c7: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c6: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c5: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c4: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c3: Excluded (Path Filter)

Why not just match the directory:

2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle: Excluded (Path Filter)

and move on to the next (instead of listing/comparing the thousands of files that are there)?

My use case is a large dataset with many files of different sizes, but with a more or less fixed folder structure. It has already been synced thousands of times, but it has now become big enough that I need to take measures to cut down the sync time (and to avoid some Dropbox errors such as "error reading source directory: <!DOCTYPE html>").

One way (or so I thought) would be to just use a regex to ignore a "path type", to limit the number of files rclone chews through.

I did not really think they were relevant to this question. Not even the version (within reason) should be important here; this is more of a "design" / feature question than anything.

If you think any specific information will help to expand on my problem/question, I'm happy to contribute.

Rclone should be doing this for the local backend at least.

If you provide more info, such as the full command line, redacted config, rclone version, and the contents of the --filter-from file, we can have a go at working out why it isn't.

Sure thing, if you think it helps. Regarding what you said about the local backend: this is remote to remote, specifically Dropbox to Google Drive (non-encrypted to crypt).

Modified some of it, but "function wise" it should be correct.

Version:

rclone v1.67.0
- os/version: debian 12.6 (64 bit)
- os/kernel: 6.1.0-21-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.4
- go/linking: static
- go/tags: none

Command:

rclone sync dropbox:/ gdrive-crypt-sa:/last_snapshot --backup-dir=gdrive-crypt-sa:/archive/2024/2024-06-28_19:18:39 --log-file=logfile.log  --config /path/to/rclone.conf --filter-from filter_rules --fast-list --transfers 12 --checkers 12 --dropbox-chunk-size 128M --multi-thread-chunk-size 4G --dropbox-batch-mode sync --tpslimit 8 --tpslimit-burst 8 --drive-chunk-size 128M --create-empty-src-dirs --dump filters --log-level DEBUG

Filters

--- start filters ---
--- File filter rules ---
- (^|/)\.AppleDouble/.*$
- (^|/)\.Spotlight-[^/]*/.*$
- (^|/)\.Spotlight-V100/.*$
- (^|/)\.TemporaryItems/.*$
- (^|/)\.Trashes/.*$
- (^|/)\.Trash/.*$
- (^|/)\.\$EXTEND/.*$
- (^|/)[^/]*\.idlk$
- (^|/)\.DS_Store$
- (^|/)\._\.DS_Store$
- (^|/)\.metadata$
- (^|/)\.localized$
- (^|/)\.com\.apple\.timemachine\.supported$
- (^|/)\._[^/]*$
- (^|/)'~\$[^/]*'$
- (^|/)Thumbs\.db$
- (^|/)\.dbfseventsd$
- (^|/)\.fseventsd$
- (^|/)\.\$QUOTA$
- (^|/)~\$[^/]*$
- ^some/dir/path/something.sparsebundle/.*$
- ^some/other/dir/[^/]\+/\(2009\|201[0-9]\|202[0-1]\)/\.[^/]*$
+ ^Directory/.*$
+ ^Directory2/.*$
- ^.*$
--- Directory filter rules ---
- (^|/)\.AppleDouble/.*$
- (^|/)\.Spotlight-[^/]*/.*$
- (^|/)\.Spotlight-V100/.*$
- (^|/)\.TemporaryItems/.*$
- (^|/)\.Trashes/.*$
- (^|/)\.Trash/.*$
- (^|/)\.\$EXTEND/.*$
- ^some/dir/path/something.sparsebundle/.*$
+ ^Directory/.*$
+ ^Directory2/.*$
- ^.*$
--- end filters ---

Log excerpt:

2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cf: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75ce: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cd: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cc: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cb: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75ca: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c9: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c8: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c7: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c6: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c5: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c4: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75c3: Excluded (Path Filter)

Thanks.

This is caused by --fast-list, which does a recursive listing from the root and filters afterwards.

Try without --fast-list

Thanks. Will do.

What is the reasoning behind --fast-list not respecting filters?

Fast list uses a different API which recursively lists everything under a directory. There is no option to that API to tell it to ignore directories.
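
To illustrate the difference, here is a toy model in Python (not rclone's actual code). The tree, the exclude rule, and the function names are all made up for the example, and the trailing-slash check used to prune a directory is an assumption about how a directory gets matched against a file rule. It compares how much each strategy has to list on the same tree.

```python
import re

# Hypothetical in-memory "remote": directory path -> (subdirectory names, file names).
TREE = {
    "": (["Directory", "big.sparsebundle"], []),
    "Directory": ([], ["a.txt", "b.txt"]),
    "big.sparsebundle": (["bands"], ["Info.plist"]),
    "big.sparsebundle/bands": ([], ["75c3", "75c4", "75c5"]),
}

# Hypothetical exclude rule, written in the same style as the dumped filter rules.
EXCLUDE = re.compile(r"^big\.sparsebundle/.*$")


def join(parent, name):
    """Build a path relative to the root of the toy remote."""
    return f"{parent}/{name}" if parent else name


def list_per_directory(tree):
    """List one directory at a time and never descend into an excluded one."""
    kept, listed_dirs = [], 0
    stack = [""]
    while stack:
        current = stack.pop()
        listed_dirs += 1  # one listing call per visited directory
        subdirs, files = tree[current]
        for name in files:
            path = join(current, name)
            if not EXCLUDE.match(path):
                kept.append(path)
        for name in subdirs:
            path = join(current, name)
            # Assumption: a directory is tested with a trailing "/" appended.
            if EXCLUDE.match(path + "/"):
                continue  # pruned: its contents are never listed
            stack.append(path)
    return sorted(kept), listed_dirs


def list_recursive_then_filter(tree):
    """Model a single recursive listing: every file comes back, then gets filtered."""
    all_files = [join(d, name) for d, (_, files) in tree.items() for name in files]
    kept = sorted(p for p in all_files if not EXCLUDE.match(p))
    return kept, len(all_files)
```

In this model both strategies keep the same two files, but the per-directory walk lists only 2 directories and never looks inside big.sparsebundle, while the recursive listing returns all 6 files and only then discards 4 of them.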

This section in the docs should probably note that, if you are using filters, --fast-list will recurse into all directories and filter afterwards, which may be less efficient.

Fancy sending a docs update?

Yes, that I get, but shouldn't the filters be respected when rclone is parsing the list from the --fast-list API call, i.e. ignoring all files/dirs inside an ignored directory?

What makes you think the filters are being ignored?

The logs say the files were excluded which looks correct to me.

No, unless I'm missing something.

filter:
- ^some/dir/path/something.sparsebundle/.*$

and then:

2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75cf: Excluded (Path Filter)
2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle/bands/75ce: Excluded (Path Filter)
(…)

My expected result would be for rclone not to "parse" through it at all, i.e. to behave the same way as without --fast-list:

2024/06/28 19:20:38 DEBUG : /some/dir/path/something.sparsebundle: Excluded

I think this is working as designed. The files were excluded like the filter says.

- ^some/dir/path/something.sparsebundle/.*$

In rclone speak, that filter says: exclude any files whose paths start with some/dir/path/something.sparsebundle/. Rclone filters only refer to files, not directories. The fact that they cause directories to be skipped when not using --fast-list is an optimization.

With --fast-list, the directory-skipping optimization is not available, hence each file is listed as Excluded.
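
As a rough sketch of what "the filter refers to files" means, here is a small Python model using three of the rules from the filter dump earlier in the thread. It is not rclone's implementation: the function names are invented, first-match-wins is modelled in the simplest possible way, and testing a directory by appending "/" to its path is an assumption about how the optimization could decide that a whole directory can be skipped.

```python
import re

# Three rules copied from the filter dump in this thread: (include?, regex).
RULES = [
    (False, re.compile(r"^some/dir/path/something.sparsebundle/.*$")),
    (True, re.compile(r"^Directory/.*$")),
    (False, re.compile(r"^.*$")),
]


def included(path):
    """Return the verdict of the first rule that matches the path."""
    for include, pattern in RULES:
        if pattern.match(path):
            return include
    return True  # nothing matched: keep the path


def directory_included(dir_path):
    """Assumption: a directory is checked as if it were the path 'dir/'."""
    return included(dir_path.rstrip("/") + "/")


# Each file under the sparsebundle is excluded one by one by the file rule...
print(included("some/dir/path/something.sparsebundle/bands/75cf"))
# ...and the same rule is enough to rule out the directory as a whole.
print(directory_included("some/dir/path/something.sparsebundle"))
```

Both calls print False. The first corresponds to the per-file "Excluded (Path Filter)" lines seen with --fast-list; the second corresponds to the single directory-level exclusion that is possible when directories are listed one at a time.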

I hope that makes sense!