Filter based on path/file character length?

Hi,

I'm syncing two remotes, where one has a 255 character path/file length limit, and would like to just ignore the files/paths that are too long.

Is it possible to make a filter rule that does that?

I think I could do something like ^\w{5,10} (just an example pulled off the internet, matching words between 5 and 10 characters), but I'm not sure how to go about making a rule that works regardless of whether it's a filename, a directory name or both.

Here is an example of a path/file (I'm syncing encrypted):

2020/11/04 07:55:51 ERROR : ag2j4insftm8urj2973jke1ku8/ue7lj1bpra3g9lf1vhf1o1n4ck/nma8d04np79kp72jpm2k80jgr172jkj9ptfp6r1il3u3tiiu89o5p29t7rm8kq9d1t73gu38k5nnmdpfcf93jau1dir675p7otitoqg/oitsf258um5f1e0gn0jj0nnbr3qv89frv6qbdepa12lj5ukon152fad77v381ije4epj433hmk4mh08bnfknn8mdk5qib81lu8lj1g5apbe7atmnaopgublvhm4h6fjr5mpfbs3qnl3fb8nimav4q49jnmvmsiv1pvif2bv06btpi0lvcje3g30948d008apoma9qcn4dqid7vck984gguniqv8u4932t4aep6gtqcgoq3u4hntbt1p5c6p4ui4b: Failed to copy: HTTP error 400 (400 Bad Request) returned body: "{\"code\":400,\"message\":\"JFS file names may not be longer than 255 characters\",\"cause\":\"\",\"error_id\":\"InvalidArgumentException\",\"x-id\":\"323262574767\"}"

Perhaps (hopefully) there is a different approach to work around your challenge, but I will just try to answer the question about filtering:

General regex syntax is not supported in rclone, so your example ^\w{5,10} will not work. Rclone uses the following kind of patterns:

The patterns used to match files for inclusion or exclusion are based on "file globs" as used by the unix shell.

Though, typical regex character classes are supported:

A [ and ] together make a character class, such as [a-z] or [aeiou] or [[:alpha:]] .

So the equivalent of \w would be [[:word:]]. But to match a generic path/file string, would you not want to include any character, such as punctuation, rather than just [0-9A-Za-z_]? I would at least use [[:graph:]]. Even then, many characters, such as accented letters, would not be included. So why not match any character:

A ? matches any character except a slash / .

To match any character, including the path separator, we could use the following syntax:

A { and } define a choice between elements.

So the following will match any character in a generic path string:

{/,?}

Now, I don't think you can specify a number of repetitions with rclone. One very, very naive approach would be to generate an include file matching every possible path length up to 255 characters:

/?{?,/}
/?{?,/}{?,/}
/?{?,/}{?,/}{?,/}
/?{?,/}{?,/}{?,/}{?,/}
/?{?,/}{?,/}{?,/}{?,/}{?,/}
/?{?,/}{?,/}{?,/}{?,/}{?,/}{?,/}
/?{?,/}{?,/}{?,/}{?,/}{?,/}{?,/}{?,/}
etc...

Every line begins with / to always match from the start of the path (the top level of your directory tree), followed by a ? since the path must contain at least one character. Then {?,/} is repeated to match either a character or a path separator, from 1 to 254 times. Since ? accounts for one character and each {?,/} for one more, the lines together match any path from 2 to 255 characters long (possibly one less if the leading / of the path should be counted).
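
To check the length arithmetic without touching a remote, the patterns can be approximated with ordinary regular expressions. This is only a sketch in Python: the translation of ? and {?,/} below is hand-written for these two constructs and is an assumption, not rclone's own glob handling, and the names emulate_filter_line and matches are made up for the illustration.

```python
import re

def emulate_filter_line(repeats):
    """Regex stand-in for the glob '/?' followed by `repeats` copies of '{?,/}'.

    Hand-written approximation for illustration only; it is not rclone's
    own glob-to-regex translation. The leading '/' only anchors the match
    at the root, so it consumes no character of the relative path.
    """
    first = r"[^/]"          # '?'     : any one character except '/'
    either = r"(?:[^/]|/)"   # '{?,/}' : any one character, '/' included
    return re.compile(first + either * repeats)

def matches(repeats, path):
    """True if the whole path is matched by the emulated filter line."""
    return emulate_filter_line(repeats).fullmatch(path) is not None

print(matches(3, "a/bc"))               # 4-character path, 3 repeats: True
print(matches(3, "a/bcd"))              # 5-character path, 3 repeats: False
print(matches(254, "d/" + "x" * 253))   # 255-character path, 254 repeats: True
```

A line with n repetitions matches paths of exactly n + 1 characters, which is why the longest line in the file needs 254 repetitions.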

Using PowerShell you can generate a filter file with all 254 lines like this:

1..254 | % { '/?'+('{?,/}'*$_) } | Out-File myfilterfile.txt -Encoding utf8
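
If PowerShell is not available, the same file can be produced with a few lines of Python. This is a rough equivalent of the one-liner above, not something tested against rclone here; the function names are made up for the example, and the file name is the one used in this post.

```python
def build_filter_lines(max_len=255):
    """Return one glob line per path length from 2 up to max_len characters.

    Each line is '/?' followed by n copies of '{?,/}', for n = 1 .. max_len - 1,
    mirroring the PowerShell one-liner.
    """
    return ["/?" + "{?,/}" * n for n in range(1, max_len)]

def write_filter_file(path="myfilterfile.txt", max_len=255):
    """Write the generated lines to `path` as UTF-8 and return the line count."""
    lines = build_filter_lines(max_len)
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)

lines = build_filter_lines()
print(len(lines))   # 254
print(lines[0])     # /?{?,/}
```

Calling write_filter_file() creates myfilterfile.txt in the current directory, ready to be passed to --include-from or --exclude-from.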

Test it with the rclone ls command, and see which files would be included:

rclone ls jfs:/Test --include-from myfilterfile.txt

Or, probably more revealing, see which files would be excluded:

rclone ls jfs:/Test --exclude-from myfilterfile.txt

Thank you for an extremely well written/explained post. Much appreciated!

If it's not possible to do it a smarter way, I think I'm just going to bite the bullet and go through and rename the directories/files in the source. Thankfully it's a doable amount of files.

I did have the idea that I should leave a "backdoor" into the glob patterns so you could write a regular expression as understood by the Go standard library.

I haven't figured out what the backdoor might be but it needs to be an invalid glob pattern or at least something wildly unlikely - maybe {{regexp goes here}} - that syntax could be mixed with glob syntax potentially.
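
For comparison, this is roughly how short a length limit becomes once a real regular expression is available. The sketch uses Python's re module purely to show the idea; the pattern and the names are hypothetical, and nothing here says rclone accepts this or any other regexp syntax in its filters.

```python
import re

MAX_PATH_CHARS = 255  # limit quoted in the error message earlier in the thread

# Any run of 1..255 characters, '/' included. DOTALL lets '.' match newlines too.
WITHIN_LIMIT = re.compile(r".{1,%d}" % MAX_PATH_CHARS, re.DOTALL)

def within_limit(path):
    """True if the whole path is between 1 and MAX_PATH_CHARS characters long."""
    return WITHIN_LIMIT.fullmatch(path) is not None

print(within_limit("dir/" + "x" * 251))   # 255 characters: True
print(within_limit("dir/" + "x" * 252))   # 256 characters: False
```

One bounded repetition replaces all 254 generated glob lines. Note that Python counts characters here, not bytes, which may or may not be what a given remote enforces.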
