Support for Korean Jaso conversion

When files come from macOS, Korean file names like 문서.docx become ㅁㅜㄴ ㅅㅓ.docx on Windows and 문서.docx on Linux (the name renders correctly on Linux, but a duplicate is created if a properly composed 문서.docx already exists).

Rclone doesn't convert Jaso (자소, the individual jamo letters that make up a Hangul syllable) on its own, which causes file system problems when transferring files from macOS. This bug is hard for non-Korean users to identify and diagnose, but here's a blog post explaining the issue (in Korean):
[맥북/윈도우] 한글 자소 교정기 ("[MacBook/Windows] Korean Jaso Corrector") https://blog.naver.com/PostView.nhn?blogId=moricable&logNo=221583013845

It explains that macOS uses NFD (Normalization Form Canonical Decomposition) while Windows uses NFC (Normalization Form Canonical Composition), so a conversion between the two might fix this bug.
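To see what the two forms look like, here is a small Python sketch using the standard `unicodedata` module (an illustration only, not part of rclone or the blog's tool; `code_points` is a made-up helper). The name 문서 is written with `\u` escapes so the source file's own encoding can't change which form it is in:

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return each code point of a string as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text]

name = "\uBB38\uC11C"  # 문서, already in NFC (precomposed syllables)
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)  # the decomposed form described for macOS

print("NFC:", code_points(nfc))  # two precomposed syllables
print("NFD:", code_points(nfd))  # five conjoining jamo
# Composing the NFD form again restores the original name.
print("NFC(NFD) == NFC:", unicodedata.normalize("NFC", nfd) == nfc)
```

In NFC each syllable is a single code point; in NFD 문 becomes the three jamo ㅁ, ㅜ, ㄴ and 서 becomes ㅅ, ㅓ, which matches the ㅁㅜㄴ ㅅㅓ display described above.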

Thanks in advance.

Rclone doesn't alter the UTF-8 representation of the filenames it gets from the OS or cloud backends. macOS delivers the NFD form to rclone, and rclone just passes it straight on.
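Because the bytes are passed through unchanged, the NFD and NFC spellings of the same name arrive as different byte strings. A quick Python sketch (illustrative only; `utf8_forms` is a hypothetical helper) comparing the UTF-8 encodings of 문서.docx:

```python
import unicodedata

def utf8_forms(name: str) -> tuple[bytes, bytes]:
    """Return the UTF-8 encodings of the NFC and NFD forms of a name."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    return nfc, nfd

nfc_bytes, nfd_bytes = utf8_forms("\uBB38\uC11C.docx")  # 문서.docx
print(len(nfc_bytes), nfc_bytes.hex(" "))
print(len(nfd_bytes), nfd_bytes.hex(" "))
print("same bytes:", nfc_bytes == nfd_bytes)
```

Each precomposed syllable and each conjoining jamo takes 3 bytes in UTF-8, so the NFD name is 20 bytes against 11 for NFC, even though both describe the same visible name.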

Rclone used to do unicode normalization, but it caused far more problems than it solved.

So I think what you are asking is for the local backend of rclone to do unicode normalization (or at least to have a flag for it). Does that sound right?


Yes, an optional flag would be perfect. Right now, when I transfer a folder such as 새 폴더 (which is actually stored as ㅅㅐ ㅍㅗㄹㄷㅓ) to my NAS where 새 폴더 already exists, rclone skips it and marks the folder as a duplicate. If an optional flag converted names to the form Windows and Linux systems expect, syncing and moving files between macOS and other systems would be easier.

Rclone does use unicode normalization when syncing to work out whether files already exist, so it sounds like this is working.

If that isn't what you want, you can disable it with

  --no-unicode-normalization   Don't normalize unicode characters in filenames.
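The name matching described above can be sketched like this. It is a hypothetical Python illustration of comparing names after NFC normalization, not rclone's Go implementation; `same_name` and its `normalize` parameter are invented for the example, with `normalize=False` standing in loosely for `--no-unicode-normalization`:

```python
import unicodedata

def same_name(a: str, b: str, normalize: bool = True) -> bool:
    """Compare two file names, optionally after NFC normalization.

    Illustration of the idea only, not rclone's implementation.
    """
    if normalize:
        a = unicodedata.normalize("NFC", a)
        b = unicodedata.normalize("NFC", b)
    return a == b

# 새 폴더 as macOS would hand it over (NFD) and as it exists on the NAS (NFC)
local = unicodedata.normalize("NFD", "\uC0C8 \uD3F4\uB354")
remote = "\uC0C8 \uD3F4\uB354"

print(same_name(local, remote, normalize=False))  # False: code points differ
print(same_name(local, remote))                   # True: treated as the same name
```

With normalization the two spellings match, so the existing folder is recognized; without it they are compared code point by code point and do not match.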

I think that would mean bringing back the unicode normalization in the local backend.

If you want to experiment, the code would need to go here

Yes, I just tested syncing some folders with files to OneDrive to see how Windows displays them. Here are some screenshots showing the issue (macOS vs. Windows):

I'm not sure what that means, as I don't have coding experience. Since this problem only affects Korean users, maybe I'll use other tools to convert the file names first, so it won't affect users elsewhere or create more problems for rclone.
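If you go the route of converting names first, a script along these lines could do it. This is a hypothetical sketch, not an existing tool: it walks a directory tree bottom-up and renames any NFD name to NFC, skipping names whose NFC form is already present in the same directory. It assumes it runs where names are stored exactly as given (for example on the Linux/NAS side); macOS file systems may store or match names in their own way, so try it on a copy first.

```python
import os
import unicodedata

def rename_to_nfc(root: str) -> list[tuple[str, str]]:
    """Rename files and folders under root from NFD to NFC names.

    Returns the (old_path, new_path) pairs that were renamed.
    """
    renamed = []
    # Bottom-up walk: a folder's contents are handled before the folder
    # itself is renamed, so the paths os.walk reported stay valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        present = set(os.listdir(dirpath))  # names exactly as stored
        for name in filenames + dirnames:
            nfc = unicodedata.normalize("NFC", name)
            if nfc == name:
                continue  # already NFC, nothing to do
            if nfc in present:
                # An NFC twin already exists: leave both for manual review.
                print(f"skipping {os.path.join(dirpath, name)!r}: NFC name exists")
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, nfc)
            os.rename(old_path, new_path)
            present.discard(name)
            present.add(nfc)
            renamed.append((old_path, new_path))
    return renamed
```

Skipping rather than overwriting on a collision keeps the duplicate 새 폴더 case visible instead of silently merging or losing data.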

Regards,
Seohyun Joo

It is potentially a problem for any macOS user.

However, in other cases I've seen, the NFD representation of file names displays properly, so this seems to be specific to Korean NFD not displaying properly.
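One plausible explanation (an assumption, not something established in this thread): an NFD accented Latin letter is a base letter followed by a combining mark (Unicode category `Mn`), which renderers attach to the previous character, whereas NFD Hangul is a run of conjoining jamo (category `Lo`) that the font and text renderer must assemble into a syllable block. Where that assembly isn't done, the jamo appear one by one. A small Python sketch to inspect the difference (`decomposition_report` is a made-up helper):

```python
import unicodedata

def decomposition_report(text: str) -> list[tuple[str, str, str]]:
    """List each code point of the NFD form with its category and name."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in unicodedata.normalize("NFD", text)
    ]

for label, sample in (("Latin", "caf\u00e9"), ("Korean", "\uBB38")):  # café, 문
    print(label)
    for row in decomposition_report(sample):
        print("  ", *row)
```

The accent on é decomposes into a combining mark, while 문 decomposes into three standalone jamo letters.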

Try this with the --local-unicode-normalization flag and see if it works as you expect.

v1.54.0-beta.4794.db3c724b1.fix-local-utf on branch fix-local-utf (uploaded in 15-30 mins)
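Assuming the flag NFC-normalizes names as the local backend reads them from disk (my reading of how the rclone docs describe `--local-unicode-normalization`), its effect can be modeled with this toy Python sketch. `list_local` and its `unicode_normalization` parameter are hypothetical, not rclone's Go code, and nothing on disk is renamed:

```python
import os
import tempfile
import unicodedata

def list_local(path: str, unicode_normalization: bool = False) -> list[str]:
    """Hypothetical model of a local listing with optional NFC normalization.

    Names are normalized in memory only; the files on disk keep their names.
    """
    names = sorted(os.listdir(path))
    if unicode_normalization:
        names = [unicodedata.normalize("NFC", n) for n in names]
    return names

with tempfile.TemporaryDirectory() as d:
    # Simulate a macOS-style (NFD) name for 문서.docx.
    nfd_name = unicodedata.normalize("NFD", "\uBB38\uC11C.docx")
    open(os.path.join(d, nfd_name), "w").close()
    print(list_local(d))                              # name as stored (NFD)
    print(list_local(d, unicode_normalization=True))  # name sent on (NFC)
```

Doing this in the backend rather than renaming files leaves the macOS side untouched while the destination receives NFC names.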

