Documentation update: no linebreaks in file names

What is the problem you are having with rclone?

Linux allows most characters in file names, including line breaks. The rclone documentation states that only NUL and / are not supported. However, rclone also does not seem to handle line breaks in file names correctly. This is hardly a bug, as line breaks really should not be used in file names; nevertheless, I suggest clarifying this in the documentation.

PS: I use a deliberately awkward filename to test my own scripts.

What is your rclone version (output from rclone version)

v1.55.0

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux 5.8.0-48-generic #54~20.04.1-Ubuntu x86_64 GNU/Linux

Which cloud storage system are you using? (eg Google Drive)

local

The command you were trying to run (eg rclone copy /tmp remote:tmp)

buero:/tmp/rclone$ # create an unusual and discouraged, yet allowed, file with a multiline filename
buero:/tmp/rclone$ declare testfile='multiline
> filename'
buero:/tmp/rclone$ touch "${testfile}"
buero:/tmp/rclone$ # confirm that the file was created
buero:/tmp/rclone$ ls -n multiline*
-rw-rw-r-- 1 1000 1000 0 Apr   5 10:52 'multiline'$'\n''filename'
buero:/tmp/rclone$ # confirm that 'ls' handles the file correctly
buero:/tmp/rclone$ declare reassigned_filename="$( ls multiline* )"
buero:/tmp/rclone$ ls -n "${reassigned_filename}"
-rw-rw-r-- 1 1000 1000 0 Apr   5 10:52 'multiline'$'\n''filename'
buero:/tmp/rclone$ # test how rclone handles this file
buero:/tmp/rclone$ rclone version | head -1
rclone v1.55.0
buero:/tmp/rclone$ rclone lsf -vv ./
<7>DEBUG : Using config file from "/home/xxx/.config/rclone/rclone.conf"
<7>DEBUG : rclone: Version "v1.55.0" starting with parameters ["/opt/rclone/rclone" "lsf" "-vv" "./"]
<7>DEBUG : rclone: systemd logging support activated
<7>DEBUG : Creating backend with remote "./"
<7>DEBUG : fs cache: renaming cache item "./" to be canonical "/tmp/rclone"
multiline␊filename
<7>DEBUG : 2 go routines active
buero:/tmp/rclone$ declare reassigned_filename="$( rclone lsf ./ )"
buero:/tmp/rclone$ ls -n "${reassigned_filename}"
ls: cannot access 'multiline'$'\342\220\212''filename': No such file or directory

The rclone config contents with secrets removed.

[mygdrive]
type = drive
scope = drive
export_formats = odt,ods,odp,svg
token = {"access_token":"xxx","token_type":"Bearer","refresh_token":"xxx","expiry":"2021-04-03T14:20:08.76658435+02:00"}
root_folder_id = xxx
client_id = xxx.apps.googleusercontent.com
client_secret = xxx

[encryptgdrive]
type = crypt
remote = mygdrive:Backups
filename_encryption = standard
directory_name_encryption = true
password = xxx
password2 = xxx

A log from the command with the -vv flag

buero:/tmp/rclone$ rclone lsf -vv ./
<7>DEBUG : Using config file from "/home/xxx/.config/rclone/rclone.conf"
<7>DEBUG : rclone: Version "v1.55.0" starting with parameters ["/opt/rclone/rclone" "lsf" "-vv" "./"]
<7>DEBUG : rclone: systemd logging support activated
<7>DEBUG : Creating backend with remote "./"
<7>DEBUG : fs cache: renaming cache item "./" to be canonical "/tmp/rclone"
multiline␊filename
<7>DEBUG : 2 go routines active

This seems to work for me

$ testfile='multiline
filename'
$ echo hello > "${testfile}"
$ ls
'multiline'$'\n''filename'
$ rclone lsf .
multiline␊filename
$ rclone cat multiline␊filename
hello
$ rclone copyto multiline␊filename multiline␊filename.copy
$ rclone cat multiline␊filename.copy
hello
$ ls
'multiline'$'\n''filename'  'multiline'$'\n''filename.copy'

I'm not quite sure why your example isn't working.

I guess the question (OP's case) is why rclone lsf is printing the filename

multiline
filename

as multiline␊filename:

$ ls
'multiline'$'\n''filename'
$ rclone lsf .
multiline␊filename

That is because of the encoding scheme in use; see the "Overview of cloud storage systems" page in the rclone docs.

Rclone translates the line feed into the Unicode character ␊ (U+240A) to make it more manageable for logs etc., and translates it back again when it writes the file name.

You can turn the encoding off if you want.

So yes, rclone does deal with files that have control characters in their names, but it uses an encoding scheme to make them usable.
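
To make the substitution concrete, here is a minimal shell sketch that dumps the bytes behind both forms. It assumes a POSIX `printf`, `od` and `tr`; the helper name `hex_bytes` is made up for this illustration and is not an rclone command. The octal sequence `\342\220\212` is the same one that shows up in the `ls` error message earlier in the thread; it is the UTF-8 encoding of ␊ (U+240A).

```shell
# hex_bytes is a hypothetical helper: it prints the bytes of its argument
# as one run of lowercase hex digits.
hex_bytes() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
    echo
}

# A real line feed, written literally because $(...) would strip it.
lf='
'
# The three-byte symbol that rclone prints in its place.
symbol=$(printf '\342\220\212')

hex_bytes "$lf"        # prints 0a
hex_bytes "$symbol"    # prints e2908a
```

So the name `ls` sees contains the single byte 0x0A, while the name rclone displays contains the three bytes 0xE2 0x90 0x8A at the same position.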

Modern versions of ls also use an encoding scheme:

$ ls
'multiline'$'\n''filename'

vs

$ rclone lsf .
multiline␊filename

Each represents the string that would be written in C as "multiline\nfilename".

I don't know whether this needs to be called out more in the docs.

I guess this section in the docs is the relevant part:

The name shown by rclone to the user or during log output will only contain a minimal set of replaced characters to ensure correct formatting and not necessarily the actual name used on the cloud storage.

Well enough documented, probably..

I know I can configure encodings per backend, but is the "Standard" encoding, the one used "to the user or during log output", configurable as well?

Thank you, @albertony and @ncw. This is very helpful and explains the behaviour. Agreed that the documentation is complete.

As the filenames shown by ls and by rclone lsf are not mutually recognized (see the test code below), I can either convert the names from ls to the rclone encoding in my scripts, or ignore the issue, since line breaks should really not be present in filenames.

buero:/tmp/rclone$ ls
'multiline'$'\n''filename.txt'
buero:/tmp/rclone$ localname="$( ls )"
buero:/tmp/rclone$ rclonename="$( rclone lsf . )"
buero:/tmp/rclone$ echo "${localname}"
multiline
filename.txt
buero:/tmp/rclone$ echo "${rclonename}"
multiline␊filename.txt
buero:/tmp/rclone$ ls -n "${localname}"
-rw-rw-r-- 1 1000 1000 130 Apr 5 16:51 'multiline'$'\n''filename.txt'
buero:/tmp/rclone$ rclone lsf "${localname}"
buero:/tmp/rclone$ ls -n "${rclonename}"
ls: cannot access 'multiline'$'\342\220\212''filename.txt': No such file or directory
buero:/tmp/rclone$ rclone lsf --format stp "${rclonename}"
130;2021-04-05 16:51:17;multiline␊filename.txt

A good question...

It isn't configurable at the moment, but it could be.

I'm not sure it needs to be but I'm open to suggestions :slight_smile:

Or use rclone lsf like you did instead of ls.

You might think line breaks in file names are uncommon, but on macOS the Finder creates files called Icon␍ all the time (with an actual carriage return, ^M or \r), so I admire your extensive testing of file names with strange characters in them!

Speaking for myself: I definitely don't need it! :slight_smile: I just wanted to understand what was going on. If the intention were to make rclone lsf output match that of ls, the replacement would have to be made configurable as well. :no_bicycles:

@ncw In my script to accelerate copy / sync, I indeed use rclone lsf; no problem expected with linebreaks and other strange filenames, as per your explanations of the encoding.

FYI - My next step is allowing tar files as a local source. With multiple 10+ GB tar files, I cannot upload the entire archives to the remote backup solution every day. Instead, I intend to list the local files with tar --list --verbose --file archive.tar and the remote files with rclone lsf remote:, then identify which files require upload (based on timestamp and size), then extract (only) the modified files from the tar archive and upload them. This works, except for strange filenames, which tar --list and rclone lsf encode differently.
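
The comparison step could be sketched roughly as follows. This is only an illustration under loud assumptions: both listings have already been brought into the `size;time;path` form that `rclone lsf --format stp` prints elsewhere in this thread (the tar listing would need converting first, which is not shown), timestamps use the same format and precision on both sides, and `needs_upload` is a made-up helper name.

```shell
# needs_upload is a hypothetical helper, not an rclone or tar command.
# $1: file with one "size;time;path" line per local file
# $2: file with one "size;time;path" line per remote file
# Prints the path of every local entry without an identical remote entry,
# i.e. files that are new or differ in size or timestamp.
needs_upload() {
    local_list=$1
    remote_list=$2
    awk '
        FILENAME == ARGV[1] { remote[$0] = 1; next }  # remember remote lines
        !($0 in remote) {                             # new or changed locally
            sub(/^[^;]*;[^;]*;/, "")                  # drop "size;time;"
            print
        }
    ' "$remote_list" "$local_list"
}
```

Only the first two `;` separated fields are stripped, so a path that itself contains `;` survives intact. Because rclone prints a line feed inside a name as ␊, each file still occupies exactly one line in such a listing.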

:+1:

Yes that could be a problem...

The encoding rclone uses is well defined, so a sed script could convert one to the other easily enough.

PS I did get half way through writing a tar backend...

OK. I had to do some reading as I am a Unicode novice (as you will have noted). uconv converts between different normalizations. So I will need to convert both outputs (rclone lsf and tar --list) to the same normalization, e.g. NFC, and then do the JOIN in SQL. But I need to keep the original filenames, as rclone sync and tar --extract will each want to be fed their preferred form.

For later readers and myself in the future: uconv takes care of the normalization of Unicode letters but not of ASCII control characters (hex codes 0x01 to 0x1F). These also need to be translated between the output of rclone lsf and that of ls or tar --list: the UTF-8 byte sequence 0xE2 0x90 0x8? in rclone's output stands for the control character 0x0?, and 0xE2 0x90 0x9? stands for 0x1?.
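
The byte mapping just described can be sketched in bash as follows. This is a rough illustration, not rclone's own code: `rclone_name_to_raw` is a made-up helper name, it needs bash (for `printf -v` and `${var//pattern/replacement}`) rather than a plain POSIX shell, and it uses parameter expansion instead of the sed script suggested earlier so that line feeds inside a name are not treated as record separators. It only covers 0x01 to 0x1F as listed above, and it does not try to handle names that genuinely contain one of these symbols.

```shell
#!/usr/bin/env bash
# rclone_name_to_raw is a hypothetical helper, not an rclone command.
# It replaces the UTF-8 bytes 0xE2 0x90 0x81..0x9F (U+2401..U+241F) in its
# argument with the raw control characters 0x01..0x1F and prints the result.
rclone_name_to_raw() {
    local name=$1 i esc symbol ctl
    for ((i = 1; i < 32; i++)); do
        # Build the three-byte symbol; its last byte is 0x80 + i.
        printf -v esc '\\xe2\\x90\\x%02x' "$((0x80 + i))"
        printf -v symbol '%b' "$esc"
        # Build the matching raw control character.
        printf -v esc '\\x%02x' "$i"
        printf -v ctl '%b' "$esc"
        name=${name//"$symbol"/$ctl}
    done
    printf '%s' "$name"
}
```

For example, `raw=$(rclone_name_to_raw "$rclonename")` should give a name that `ls -n "$raw"` accepts for the multiline test file. Note that `$(...)` strips trailing line feeds, so a name that ends in one would need different handling.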
