I am horribly confused about the logic of rule-order in --filter-from files, despite having read the relevant rclone documentation. Perhaps someone could help me out.
The documentation says:
Each file is matched in order [what does that mean?], starting from the top [of the rules in the file?], against the rule in the list until it finds a match. The file is then included or excluded according to the rule type.
How does Rclone proceed as it moves from the filter file (that file being the file that contains the filter-from rules)? Does it start with a set of files to be included, where that set contains all files in the folder (together with its subfolder) at issue? If it does, then each rule in the filter file will expand or restrict that set. Except, hold on, it can't expand the set, because the set is already maximally large - unless, of course, one of the rules has performed a restriction . .
You see how confused I am. Add in the fact that rclone follows, but not exactly, rsync's own - seemingly underdocumented - syntax, and the whole thing becomes a horror.
Here's what I am trying to do. For some directory - call it dir - which is specified in the rclone command (as against in the filter file), I wish to:
(1) include all top-level items (files and directories) that start with . except (1a) some particular ones, let's say .bad1 and .bad2;
(2) include, recursively, everything matched by rule 1;
(3) exclude all else.
Thanks. (Note that in the example I gave I wish to exclude those two items with 'bad' in their name. That's why I chose those names.) However, I still do not understand. I think the trick is to find a description of the mechanics that is all of pithy, clear and accurate (though it need not be comprehensive).
Here is the code and filter file I am working with. I am running the backup at the moment but I am unsure whether it works. (Unsure, because I have ended up with an item in the backup that should not be there; yet another rclone job may have run - automatically - and put it there, even though my system is meant to be set up to run only one job at a time.)
Here is some bash that is part of a process of passing arguments to rclone.
Use the same filter with rclone ls; that will tell you whether it is working or not without running a backup.
Comments on your filter-from file:
if you want to exclude a directory and its contents you need dir/**
if you want stuff to only happen at the root, then prefix with /
your rules starting with ./Trash will not be working - rclone will be looking for files called Trash in a directory called . (which it will never find). You either want Trash/** (to exclude a directory called Trash and its contents everywhere) or /Trash/** (to exclude a directory called Trash and its contents at the root only).
you don't appear to have included any files at the root as your includes don't include them
Thank you. However, I still lack an illuminating, pithy explanation of the matching logic (though what you wrote was some help).
Two specific matters remain unclear to me.
(1) I take it that directories are matched only if the relevant rule ends /** - yes?
(2) '[Y]ou don't appear to have included any files at the root as your includes don't include them.' Please state the correct rule to match everything within the backup's path that is such that its full path starts with a . . (For that is the rule I want and I tire of guessing, even if I can test with the ls command that you have.) Thank you.
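One way to check a candidate answer without touching a real backup is to simulate the matching in a few lines of Python. This is only a sketch under assumptions: the filter lines in FILTER are a hypothetical rule set for the goal stated earlier (top-level dot items except .bad1 and .bad2, everything else excluded), glob_to_regex is a simplified stand-in for rclone's own glob handling (it knows only a leading /, * and **), and the first matching rule is assumed to decide. The sample paths are made up. Whatever it suggests should still be confirmed with rclone ls and the real filter file.

```python
import re

# Hypothetical filter file for the stated goal; rule order matters.
FILTER = """\
- /.bad1
- /.bad1/**
- /.bad2
- /.bad2/**
+ /.*
+ /.*/**
- **
"""

def glob_to_regex(glob):
    """Simplified translation: a leading '/' anchors at the root,
    '**' may cross '/', a single '*' may not."""
    prefix = "^" if glob.startswith("/") else "(^|/)"
    body = ""
    for part in re.split(r"(\*\*|\*)", glob.lstrip("/")):
        if part == "**":
            body += ".*"
        elif part == "*":
            body += "[^/]*"
        else:
            body += re.escape(part)
    return re.compile(prefix + body + "$")

# Each rule is (kind, compiled regex); kind is "+" or "-".
RULES = [(line[0], glob_to_regex(line[2:])) for line in FILTER.splitlines()]

def is_included(path):
    """The first rule that matches decides; with no match the file is kept."""
    for kind, pattern in RULES:
        if pattern.search(path):
            return kind == "+"
    return True

for path in [".bashrc", ".config/app/settings.ini", ".bad1/secret",
             ".bad2", "Documents/report.txt", "Documents/.hidden"]:
    print(path, "->", "include" if is_included(path) else "exclude")
```

The exclusions for .bad1 and .bad2 come first because an earlier, broader include such as /.*/** would otherwise match those paths and end the search before the excludes were ever consulted.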
It seems that few people understand rsync's logic or indeed rclone's. However, the following from man rsync seems a tolerably clear account.
The filter rules allow for flexible selection of which files to transfer (include) and which
files to skip (exclude). The rules either directly specify include/exclude patterns or they
specify a way to acquire more include/exclude patterns (e.g. to read them from a file).
As the list of files/directories to transfer is built, rsync checks each name to be transferred
against the list of include/exclude patterns in turn, and the first matching pattern is
acted on: if it is an exclude pattern, then that file is skipped; if it is an include pattern
then that filename is not skipped; if no matching pattern is found, then the filename is not
skipped.
Rsync builds an ordered list of filter rules as specified on the command-line. Filter rules
have the following syntax:
    RULE [PATTERN_OR_FILENAME]
    RULE,MODIFIERS [PATTERN_OR_FILENAME]
You have your choice of using either short or long RULE names, as described below. If you use
a short-named rule, the ’,’ separating the RULE from the MODIFIERS is optional. The PATTERN
or FILENAME that follows (when present) must come after either a single space or an underscore
(_). Here are the available rule prefixes:
exclude, - specifies an exclude pattern.
include, + specifies an include pattern.
Here is how I understand that. The default is to include all items (files and directories, the latter including their files, and recursively). The filter rules modify that set: exclude rules reduce it; include rules increase it (when it can be increased, i.e. is not at its maximally large stage). The rules operate upon the set in the order in which they are given. Imagine that the filter file starts with an exclude rule, call that rule X; proceeds immediately to an include rule, call it Y; and then has, as its last rule, an exclude rule, call it Z. In that case, rsync (and rclone) will modify the initial, maximal set by applying X to it, then by applying Y to it, and then by applying Z to it.
As to the syntax - all the stuff with slashes and asterisks - the rsync manual proceeds to describe that (though rclone's syntax is slightly different).
Thank you. However, you need to appreciate the following.
Rclone's documentation says, in effect: 'rclone works like rsync, only with these differences; or rather that's how include and exclude flags/files work; and here's how the filter rules work.'
Yet, consider the following.
Few people understand rsync's logic and indeed it is not that easy to grasp that logic properly (for instance, I do not think the explanations given by people other than me in this thread are good).
How rclone modifies that logic (or at least syntax) is described tersely (e.g. 'Rclone always does a wildcard match so \ must always escape a \ ')
Rclone's two (or more?) stage explanation (that's includes-and-excludes, now here is filtering) adds a further level of complexity.
I do not mean to be abrasive. Rather, I mean to convince you that unless the documentation is improved then - at least for people who do not already understand rsync well - the result is liable to be utter confusion and much wasted time.
EDIT: and I still don't understand whether every mention - in an rclone filter rule - of a directory (as against a file) needs some slashes and whatnot after it.
However, I have now come to think that actually I do not understand the syntax or even the logic. So (1) Is my account above - the one starting, 'Here is how I understand that' correct, or not? Also, and again: (2) how does the /** symbol work? Thanks.
EDIT: I see myself now that my account was incorrect, because, as you say: 'if you match something too broad at the top, it won't continue down the rule set as it first match stops the chain.' So, am I right in thinking the following? Everything starts off included, but then . . Actually, I am unsure about 'everything starts off included' and I am unsure about the 'then'. Please give a clear, fairly non-technical description of the algorithm. That's all I am asking (and I have some background in computers and in logic so I don't think the problem is me being dim).
On /**: yes, I had seen that documentation. Does every line ending with a folder need that suffix? More precisely: if I wish to include some folder (be it a specific folder or a 'wildcarded' range of folders), or if I wish to exclude some folder (with the same qualifications), do I need to append /** to it?
'You have neither affirmed nor denied whether the set of files to be operated upon starts as all files in the destination path.' Sorry, but you still have not done that, so far as I can tell. You write: 'Regardless of a filter, it operates on the files in the destination path that you provide.' The 'it' here is the filter, right? But does that answer my question? I don't see how. Here is why. Suppose I supply an empty filter-from file. If passed such a file, and a destination path, will rclone include everything in the destination path, or exclude it? I take it - from the rsync manual - that the answer is: include it.
Perhaps the fundamental problem in all this is as follows. The rsync manual describes rsync's logic in terms of expanding and constricting an initially maximal set of files. That idea is fairly clear. However, rclone's documentation - and what you've written - presents filtering in a different way, namely, in terms of the stopping after matches. I find that latter idea less clear than the idea in rsync's manual. Also, it is unhelpful to describe rclone's logic in a different manner to the way rsync's manual describes rsync's logic; for, rclone is (in a certain way) based on rsync.
Finally: I've little idea what (^|/)[^/]*\.jpg$ means, though I suppose that rclone's documentation will tell me.
if it matches and it was an include rule, include it and finish
if it matches and it was an exclude rule, exclude it and finish
if you get to the end, include the file
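The steps above can be sketched in a few lines of Python. This is an illustration of the first-match-wins idea only, not rclone's actual code: the rule list and the hand-written regexes (stand-ins for what the globs named in the comments might become) are assumptions, as are the sample paths.

```python
import re

# Each rule is (kind, regex); kind is "+" (include) or "-" (exclude).
# The regexes are hand-written stand-ins for translated globs.
RULES = [
    ("-", re.compile(r"(^|/)[^/]*\.tmp$")),  # like "- *.tmp"
    ("+", re.compile(r"^docs/.*$")),         # like "+ /docs/**"
    ("-", re.compile(r"^.*$")),              # like "- **"
]

def is_included(path, rules):
    """Walk the rules top to bottom; the first rule that matches decides."""
    for kind, pattern in rules:
        if pattern.search(path):
            return kind == "+"   # matched: include or exclude, then stop
    return True                  # reached the end with no match: include

for path in ["docs/a.txt", "docs/b.tmp", "src/main.go"]:
    print(path, is_included(path, RULES))
```

Here docs/a.txt is kept by the second rule, docs/b.tmp is dropped by the first rule before the include for docs is ever consulted, and src/main.go falls through to the final catch-all exclude. With an empty rule list every path is kept, which is the sense in which everything starts off included.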
Note that this says nothing about directories. Rclone does not filter on directories, only on file paths. Hence, to include a directory you need /path/to/dir/**, which matches all files under /path/to/dir.
There is an issue to make /path/to/dir/ the same as /path/to/dir/** which we'll do eventually but this is merely a shorthand syntax. If you write /path/to/dir rclone will think you are talking about a file.
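To make the difference concrete, here is a small Python sketch. The two regexes are hand-written approximations of how the two patterns might be translated (an assumption, not rclone's real output), and the file paths are invented; paths are written relative to the root, without a leading slash.

```python
import re

# Assumed translations of the two rooted patterns.
as_file = re.compile(r"^path/to/dir$")     # "/path/to/dir"
as_tree = re.compile(r"^path/to/dir/.*$")  # "/path/to/dir/**"

# Only file paths are tested; there is no separate entry for the directory.
files = [
    "path/to/dir/a.txt",
    "path/to/dir/sub/b.txt",
    "path/to/other.txt",
]

matched_as_file = [f for f in files if as_file.search(f)]
matched_as_tree = [f for f in files if as_tree.search(f)]

print("/path/to/dir    matches:", matched_as_file)
print("/path/to/dir/** matches:", matched_as_tree)
```

Under these assumptions the bare pattern matches none of the files, since no file has exactly that path, while the /** form matches both files beneath the directory.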
I wrote the filter documentation, but I've had lots of contributions to make it clearer and I'd welcome yours too
That is a regular expression - the one that is used for the match. The file globs are converted into regular expressions which, if you know them, will tell you exactly what matches and, if you don't, look like someone bashed the keyboard with their head.
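For anyone who wants to see where a regular expression like that could come from, here is a rough Python sketch of a glob-to-regex translation. The function glob_to_regex is hypothetical and handles only a leading /, *, ** and ?; rclone's real conversion supports more syntax. It assumes Python 3.7 or later, where re.escape leaves ordinary characters such as / alone.

```python
import re

def glob_to_regex(glob):
    """Translate a small subset of glob syntax into a regex string."""
    if glob.startswith("/"):
        out, glob = "^", glob[1:]  # a leading "/" anchors at the root
    else:
        out = "(^|/)"              # otherwise match at any directory level
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            out += ".*"            # "**" may cross directory separators
            i += 2
        elif glob[i] == "*":
            out += "[^/]*"         # "*" stays within one path segment
            i += 1
        elif glob[i] == "?":
            out += "[^/]"          # "?" is one character, but not "/"
            i += 1
        else:
            out += re.escape(glob[i])
            i += 1
    return out + "$"

pattern = glob_to_regex("*.jpg")
print(pattern)
for path in ["cat.jpg", "photos/cat.jpg", "cat.jpeg", "cat.jpg/readme"]:
    print(path, bool(re.search(pattern, path)))
```

Read piece by piece: (^|/) means the start of the path or just after a slash, [^/]* means any run of characters that are not slashes, \.jpg is a literal .jpg, and $ is the end of the path. So the pattern matches a file whose name ends in .jpg in any directory.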
Let filter-file be the filter file (the name of which is passed to rclone).
Let rules be the set of rules in filter-file.
Let rule be a (any) single rule within rules.
Let destination be the destination path passed to rclone.
Let item be some (any) file or directory. EDIT: and if the item is a directory then it should be expressed in any rule as item/**.
Let backup be the set of files to be backed-up.
Set backup to contain each item in destination.
For each item within destination
For each rule, moving serially through rules
If rule matches the full path of item then
if rule is an include rule then add item to backup (unless it is in there already).
if rule is an exclude rule then remove item from backup (if it was in there).
Proceed to the next item.
My explanation did contain a concept of a backup set.
@ncw did respond in the affirmative to the following sentence of mine: 'You have neither affirmed nor denied whether the set of files to be operated upon starts as all files in the destination path.' For, he said (with admittedly some unclarity): 'Yes initially all files are passed to the filter'.
I am sorry to be increasingly direct but I feel entitled to decent answers to my reasonable questions. Also, and frankly: I did not in fact in this instance ask you.