to_code=UTF-8-MAC for crypt doesn't work with umlauts

What is the problem you are having with rclone?

When I mount a directory from a crypt remote on Google Drive, files with umlauts in their names don't appear, e.g. 'Steuererklärung.pdf'.

What is your rclone version (output from rclone version)

rclone v1.55.1

  • os/type: darwin
  • os/arch: arm64
  • go/version: go1.16.3
  • go/linking: dynamic
  • go/tags: cmount

Which OS you are using and how many bits (eg Windows 7, 64 bit)

macOS Big Sur, arm64

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

With this command I can't see my 'Steuererklärung.pdf':

rclone cmount -vv genc: ~/mount/genc --read-only -o modules=iconv,from_code=UTF-16,to_code=UTF-8-MAC

(-o option is the default, just added for clarity)

With this command I can see it:

rclone cmount -vv genc: ~/mount/genc --read-only -o modules=iconv,from_code=UTF-16,to_code=UTF-8

Not sure whether this has unintended consequences. What else does UTF-8-MAC do?

The rclone config contents with secrets removed.

Standard google drive config + Standard crypt config in subfolder.

A log from the command with the -vv flag

2021/05/12 23:17:26 DEBUG : Using config file from "/Users/xyz/.config/rclone/rclone.conf"
2021/05/12 23:17:26 DEBUG : rclone: Version "v1.55.1" starting with parameters ["rclone" "cmount" "-vv" "genc:" "/Users/xyz/mount/genc" "--read-only"]
2021/05/12 23:17:26 DEBUG : Creating backend with remote "genc:"
2021/05/12 23:17:26 DEBUG : Creating backend with remote "mygoogledrive:encrypted"
2021/05/12 23:17:27 DEBUG : Mounting on "/Users/xyz/mount/genc" ("genc")
2021/05/12 23:17:27 DEBUG : Adding "-o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC" for macOS
2021/05/12 23:17:27 DEBUG : Encrypted drive 'genc:': Mounting with options: ["-o" "attr_timeout=1" "-o" "fsname=genc:" "-o" "subtype=rclone" "-o" "max_readahead=131072" "-o" "atomic_o_trunc" "-o" "ro" "-o" "volname=genc" "-o" "noappledouble" "-o" "modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC"]
2021/05/12 23:17:27 DEBUG : Encrypted drive 'genc:': Init: 
2021/05/12 23:17:27 DEBUG : Encrypted drive 'genc:': >Init: 

Hmmm - how did that file get uploaded? Was it rclone on a mac from a macOS disk or was it a different way?

Can you use rclone lsf to list the file and cut and paste the output here? Then I can examine it and see exactly how the UTF-8 is composed!

I would have expected the default with no -o iconv to work as rclone adds what I believe to be the correct version.

Google drive doesn't output UTF-16 - where did the UTF-16 come from?

It was uploaded with the same rclone, remote, and machine a few hours earlier, via

rclone copy cloud genc:

For testing purposes I did the same now via rclone rcat

echo 'meine steuern'|rclone rcat genc:/testing/'Färbung.pdf'

It's important to note that

rclone ls genc:testing
ABC.pdf
Färbung.pdf
Steuererklärung.pdf
Steuererklärung.txt
Zusätzliche.pdf
zusätzliche erklärung tätigkeitsbeschreibung allgemein.pdf
zusätzliche.pdf

also lists the file. The problem seems to be with the mount option.

ls mount/genc/testing
ABC.pdf
Färbung.pdf
Steuererklärung.pdf
Steuererklärung.txt

I don't really see the pattern yet.

When listing the directory via the mount point, I get this error code for the files that are not listed:

2021/05/15 08:17:34 DEBUG : /testing/zusätzliche.pdf: Getattr: fh=0xFFFFFFFFFFFFFFFF
2021/05/15 08:17:34 DEBUG : /testing/zusätzliche.pdf: >Getattr: errc=-2

Do you know what this Getattr and errc is about?
I want to highlight again that genc is a crypt remote -- without crypt I didn't have this issue.

Thanks!

I wonder if the crypt is storing the native macOS encoding...

What happens if you try "-o modules=iconv,from_code=UTF-8-MAC,to_code=UTF-8-MAC"?

No, I don't see the pattern either.

They are all apparently using the same encoding

>>> "Steuererklärung.pdf".encode("utf-8")
b'Steuererkl\xc3\xa4rung.pdf'
>>> "Zusätzliche.pdf".encode("utf-8")
b'Zus\xc3\xa4tzliche.pdf'
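One caveat on the check above: pasted text may already have been normalized on the way through the browser or terminal, so identical bytes here do not prove that the names stored in the crypt are identical. A minimal sketch of the two forms an "ä" can take, using only Python's standard unicodedata module (the file name is the one from this thread, written with an escape so copy and paste cannot change its form):

```python
import unicodedata

# Written with an escape so copy/paste cannot silently change the form.
name = "Steuererkl\u00e4rung.pdf"

nfc = unicodedata.normalize("NFC", name)  # precomposed: a single code point U+00E4
nfd = unicodedata.normalize("NFD", name)  # decomposed: "a" followed by U+0308

print(nfc.encode("utf-8"))  # the umlaut is the two bytes c3 a4
print(nfd.encode("utf-8"))  # the umlaut is "a" plus the two bytes cc 88
print(nfc == nfd)           # the strings differ although they render identically
```

macOS file names are normally in a decomposed form close to NFD, which is what UTF-8-MAC refers to.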

Getattr is the OS asking about that file, and the -2 return is, I think, -ENOENT, i.e. "no such file or directory".
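If you want to decode such a value yourself, here is a small sketch. It assumes the mount reports failures as negated errno values, which is the usual FUSE convention; the errc value is copied from the log line above.

```python
import errno
import os

errc = -2  # copied from the ">Getattr: errc=-2" log line

code = -errc                     # assumption: errc is a negated errno value
symbol = errno.errorcode[code]   # symbolic name for the number
print(symbol, "-", os.strerror(code))
```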

Notice the lower case in zusätzliche.pdf - crypt is case sensitive, unlike the usual macOS file system.

Indeed, that works. Does this have any side-effects? Should it become the default?

Thanks for your help :)

Yes, I shouldn't have used it in the example above -- this is unrelated.

Great!

It basically tells the mount not to translate file names, so the crypt keeps storing macOS native (decomposed) file names.

I think the default, which is "modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC", is probably correct for most cloud storage systems, where you want normal UTF-8 on the remote end.

I suspect that if you'd used the default (that is, without supplying an -o modules=iconv parameter) and uploaded the data through the mount, this would have worked properly.

I think the problem arose because rclone copy preserved the native macOS encoding in the crypt, which is why you need the "-o modules=iconv,from_code=UTF-8-MAC,to_code=UTF-8-MAC".
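To illustrate why the lookup can fail, here is a toy model in Python. It is not rclone's or FUSE's code: the remote is just a set of exact names (crypt encrypts names, so only an exact match can be found), the `lookup` helper is hypothetical, and treating from_code=UTF-8 as "convert to NFC" and UTF-8-MAC as "roughly NFD" is my reading of the iconv options rather than something confirmed in this thread.

```python
import unicodedata

def nfc(s):
    return unicodedata.normalize("NFC", s)

def nfd(s):
    return unicodedata.normalize("NFD", s)

# Toy remote: names must match exactly, as they would after crypt has encrypted them.
# Assumption: this file was uploaded with a decomposed (macOS native) name.
remote = {nfd("zus\u00e4tzliche.pdf")}

def lookup(name_from_os, to_remote):
    """Return True if the name the OS asks for exists on the toy remote."""
    return to_remote(name_from_os) in remote

asked = nfd("zus\u00e4tzliche.pdf")  # macOS asks with a decomposed name

# Like from_code=UTF-8: the name is composed to NFC before the lookup.
default_mount = lookup(asked, nfc)
# Like from_code=UTF-8-MAC: the name is passed through unchanged.
passthrough_mount = lookup(asked, lambda s: s)

print(default_mount, passthrough_mount)
```

In this model the default conversion misses the file and the pass-through finds it, which matches the behaviour reported above.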

What you could have done is use the

  --local-unicode-normalization   Apply unicode NFC normalization to paths and filenames

flag when you were using rclone copy and I think then your data would work through rclone mount without any parameters.
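Before deciding whether to re-copy, it may help to see which names are actually affected. A rough sketch, assuming you have the file names as Python strings (for example read from the output of rclone lsf); the `needs_normalizing` helper and the sample listing are made up for illustration:

```python
import unicodedata

def needs_normalizing(names):
    """Return the names that are not already in NFC form."""
    return [n for n in names if unicodedata.normalize("NFC", n) != n]

# Hypothetical listing; in practice this would come from the remote.
listing = [
    "ABC.pdf",
    "F\u00e4rbung.pdf",       # precomposed ä (NFC)
    "zusa\u0308tzliche.pdf",  # "a" + combining diaeresis (NFD)
]

for name in needs_normalizing(listing):
    print(ascii(name))  # ascii() makes the combining character visible
```

Names reported by this check are the ones that a mount with the default options would probably not find.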

Summary: Unicode is complicated! macOS encodings make it even worse!

Thanks Nick!
Can I use "-o modules=iconv,from_code=UTF-8-MAC,to_code=UTF-8" on non-Mac machines then? Otherwise I may wipe the data and copy it again.

Yes I think that should work just fine.
