I seem to hit a special character encoding issue on macOS using a crypto destination.
Copying the same file once with the destination mounted and then using rclone copy yields two files in the encrypted destination and two cleartext files with the same name.
I've put the full repro and verbose logs in the attached text file here.
I will attempt to paste the useful things back into this post as well. repro-steps.txt (45.6 KB)
Please let me know if I have missed anything.
Thank you for taking the time!
There is definitely an issue here. I immediately thought of normalization issues, which are not ideal but are a fact of life. What is worse, however, is that Tést.txt copied to the crypt remote using rclone copy not only has a different encrypted name but is also not visible in the mount. All files are visible when doing rclone lsl:
Thank you for looking into this issue. I really appreciate it!
I forgot to mention, but you already discovered it, that the reason I started debugging this is indeed that I copied files with rclone copy/sync and they didn't show up when mounted. I was worried about data loss.
You could try uninstalling macFUSE and using https://www.fuse-t.org/ instead. It might be a temporary workaround, as I really suspect FUSE is the problem here.
A different encrypted name is expected, as the file names are not the same... welcome to the Unicode world. There is not just one é, even if they look the same. More details about this "phenomenon" and related issues can be found here.
And here is the macOS solution: you have to add the -o modules=iconv,from_code=UTF-8,to_code=UTF-8 flag to your mount:
rclone mount crypt: mountPoint -o modules=iconv,from_code=UTF-8,to_code=UTF-8
This is already mentioned in the docs. But it seems that nowadays it also has to be added with macFUSE, not only with FUSE-T. So maybe it can be made the default going forward.
The lesson here is also that for mission-critical data and applications (especially when working cross-platform) it is better to stick to ASCII characters only. This is still the reality in 2023.
With macFUSE at least, -o modules=iconv,from_code=UTF-8,to_code=UTF-8 makes files visible in Finder, but they are not accessible:
file copied to mount directly:
$ cat Tést.txt
123
the same file copied by rclone copy:
$ ls -l
total 8
-rw-r--r-- 1 kptsky staff 4 Jun 15 07:10 Tést.txt
drwxr-xr-x 1 kptsky staff 0 Jun 15 07:13 test
drwxr-xr-x 1 kptsky staff 0 Jun 15 07:17 test_copy
$ cat Tést.txt
cat: Tést.txt: No such file or directory
so we have an issue with macFUSE
I have uninstalled macFUSE and installed FUSE-T - there is exactly the same problem.
Without -o modules=iconv,from_code=UTF-8,to_code=UTF-8, files copied with rclone copy are not visible. With the extra mount flag all files are visible, but those same files are not accessible.
rclone mount in macOS seems to be partially broken then:(
This leads me to believe there is some sort of UTF normalization that takes place on the rclone copy side on macOS but only if the file doesn't already exist?
I am trying to understand this issue:) It has been around for a very long time and it would be good to finally fix it. But I am still trying to grasp the problem.
I did more tests - crypt/no crypt, macFUSE/FUSE-T, local/remote - and my conclusion is that rclone mount in macOS is simply broken. Some past workarounds, like setting iconv in macFUSE, are no better than snake oil: they fix some issues but create new ones. IMHO it is not as simple as a FUSE problem or an rclone problem. There is a fundamental issue in how these two programs work together on macOS.
None of these issues are present in Linux or Windows, and at the same time there are many other programs using FUSE on macOS that work fine. So the problem to fix is subtle, which is an issue of its own.
This is something I've spent quite a lot of time on in the past!
The problem is that macOS stores its file names in unicode NFD format rather than the format everyone else uses which is NFC.
This is the difference between the two forms of the Tést.txt file.
All the cloud providers (and in fact everyone else in the entire universe) use NFC format. That is é encoded as \xc3\xa9, rather than the NFD format é which is e\xcc\x81. rclone copy goes to some effort to match the two types of normalization up. rclone mount doesn't though.
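To make the two forms concrete, here is a small Python sketch (not rclone code, just the standard unicodedata module) that prints the UTF-8 bytes of the same visible name in NFC and NFD. The helper name hex_bytes is made up for this example.

```python
import unicodedata

# The same visible name, written with an escape so the source file's own
# normalization cannot affect the result.
name = "T\u00e9st.txt"

nfc = unicodedata.normalize("NFC", name)  # precomposed é (U+00E9)
nfd = unicodedata.normalize("NFD", name)  # e followed by combining acute (U+0301)


def hex_bytes(s: str) -> str:
    """Return the UTF-8 encoding of s as space-separated hex bytes."""
    return s.encode("utf-8").hex(" ")


print("NFC:", hex_bytes(nfc))  # 54 c3 a9 73 74 2e 74 78 74
print("NFD:", hex_bytes(nfd))  # 54 65 cc 81 73 74 2e 74 78 74
print("equal as strings:", nfc == nfd)  # False
```

The two strings render identically but compare unequal, which is why a byte-exact lookup for one form does not find a file stored under the other.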
What -o modules=iconv,from_code=UTF-8,to_code=UTF-8 does is tell fuse not to touch the UTF-8 format rclone uses.
The default here is -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC which tells fuse to convert the UTF-8 rclone uses into NFD UTF-8 which macOS likes.
This used to work fine! However I believe that newer macOS versions don't actually need the NFD form any more, or something has changed in macFUSE.
Note in your example above
ls gave the file name as Tést.txt which is 54 65 cc 81 73 74 2e 74 78 74 which is NFD but you typed Tést.txt which is 54 c3 a9 73 74 2e 74 78 74 which is NFC. I think if you'd cut and pasted exactly what you got from ls it would have worked.
That is macOS doing the changing, not rclone.
Maybe rclone should be doing the NFD->NFC itself in rclone mount on macOS so you can use either normalisation.
Anyway this is a can of worms which you can thank Apple for!
I have to spend more time to get more understanding/testing:)
It is true that Apple using NFD while everybody else uses NFC creates some funny problems. It is also true that the Apple filesystem does not actually apply any normalization: filenames are just a ‘bag of bytes’, which moves the problem to user space.
Still, I think we can improve it or at least document it better. I will report back when I have some facts.
Thank you @ncw and @kapitainsky for taking the time to look into this!
I am so far unable to reproduce this issue with a debugger attached. fs/sync/sync.go @ 886: NoUnicodeNormalization is indeed set to false and I can see both versions of the string being correctly transformed to c3 a9
However, in my original example and in @kapitainsky's quote at the bottom, the un-normalized 'e' + cc 81 slipped through an rclone copy into the crypt somehow.
rclone copy produced 4mvk0n0mss2apim05t4a72hbc4 which decodes to:
My tests so far show that it does not matter whether I use Linux or macOS: with rclone I can create content with either NFD or NFC encoded names. The difference is that Linux does not care, and any content works.
macOS rclone mount has no problem dealing with NFC names - either folders or files. No special options are required for FUSE-T. Content is accessible in shell and in Finder.
When there are NFD names, we have to add -o modules=iconv,from_code=UTF-8,to_code=UTF-8 to the mount to make them visible. But then there are new problems: NFD files are not accessible in the shell and NFC files are not accessible in Finder... far from usable.
It looks to me like FUSE-T/rclone should convert NFD->NFC to make it work in macOS.
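For anyone repeating these tests, a quick way to tell which form each name in a listing is in is sketched below in Python. It assumes Python 3.8+ (for unicodedata.is_normalized); the function name classify and the sample names are made up for illustration, and the listing itself could come from any source.

```python
import unicodedata


def classify(name: str) -> str:
    """Label a file name by Unicode normalization form."""
    if name.isascii():
        return "ascii"  # pure ASCII is identical in every form
    is_nfc = unicodedata.is_normalized("NFC", name)
    is_nfd = unicodedata.is_normalized("NFD", name)
    if is_nfc and is_nfd:
        return "NFC+NFD"  # non-ASCII, but nothing to compose or decompose
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"  # contains both precomposed and decomposed sequences


samples = [
    "test_copy",
    "T\u00e9st.txt",         # precomposed é
    "Te\u0301st.txt",        # e + combining acute accent
    "\u00e9e\u0301.txt",     # one of each
]
for s in samples:
    print(f"{classify(s):8} {s.encode('utf-8').hex(' ')}")
```

Names reported as NFD or mixed are the ones that would be expected to misbehave in the mount scenarios described above.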