Raw filename exclude - best way to exclude specific list of complex file names?

I have a large filesystem (an sftp target with ZFS backing, around 1,000,000 files) that I am syncing with "rclone sync" to a crypt target (also sftp with ZFS backing). For approximately 50 files I get SSH_FX_ERROR, which I understand is caused by the 255-character filename limit in ZFS (the crypt target makes filenames longer).

My goal is to maintain a list of these files and exclude them from the rclone sync. All of these filenames are long and gnarly (200+ characters, non-English characters, spaces, periods, parentheses, brackets, etc.), which makes them hard to regex. I tried putting the list in an --exclude-from file, but the files weren't excluded. I believe the reason is that the various special characters are interpreted as pattern syntax, since that file contains regexes and not filenames.

I found --files-from-raw which inputs raw file names, but that is inclusion not exclusion. Is there a method to exclude just filenames without any regexes? Is there a way to put raw filenames in --exclude-from file? Or an easy way to generate an --exclude-from file from a list of raw filenames?

I'm not sure there is a way to do this directly in rclone, but you can use Python to regex-escape them. It may be worth trying out.

Though as I think about it, I don't think they are regexes so much as glob patterns. Reading the filtering rules, you may be able to write a script that applies the escapes and see if that works. You can always test it with ls and --include-from.
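
A rough sketch of what such a script could look like in Python. The function names here are made up for illustration, and the set of characters escaped (backslash, *, ?, [, ], {, }) is my reading of the glob syntax in the rclone filtering docs, so treat it as an assumption and test the result with ls before relying on it.

```python
import re

# Characters ASSUMED to be special in rclone glob patterns, based on the
# filtering docs: backslash, *, ?, [, ], { and }. Adjust if testing shows otherwise.
_GLOB_SPECIALS = re.compile(r'([\\*?\[\]{}])')


def escape_rclone_glob(name: str) -> str:
    """Put a backslash in front of each glob metacharacter in a filename."""
    return _GLOB_SPECIALS.sub(r'\\\1', name)


def make_exclude_lines(names):
    """Escape every non-blank filename; line endings are dropped first."""
    cleaned = (n.rstrip("\r\n") for n in names)
    return [escape_rclone_glob(n) for n in cleaned if n.strip()]
```

Feeding the raw list through make_exclude_lines and writing the result to a new file should give something usable with --exclude-from. One caveat: as far as I know, lines starting with # or ; are treated as comments in filter files, and this sketch does nothing special for filenames that begin with those characters.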

FWIW, to create and test regexes with Python, I use https://pythex.org/

Thanks for the suggestions. With a bit of trial and error, I came up with a sed one-liner that escapes all of the glob pattern characters. It's not pretty, and I'm not sure it is 100% complete, but it did successfully exclude all of the problematic files for me.

rclone sync -v --exclude-from <(sed 's/[.\*^$()+?{|]/\\&/g;s/[][]/\\&/g' exclude.txt)   ....
