Cannot correctly handle filenames with ":" (U+FF1A full-width colon)

What is the problem you are having with rclone?

Rclone cannot correctly handle filenames with ":" (U+FF1A full-width colon, typical in Asian languages like Chinese, Japanese, etc.). Rclone treats it as ":" (half-width colon, typical in Western languages like English).

For example, when uploading a file named A:B.txt, Rclone converts the filename to A:B.txt. This is inconvenient at best, and it can also cause unwanted results such as duplicate files.

Below are detailed steps to reproduce the problem. Let's say I have a text file E:/Test/A:B.txt

  1. If I upload the file (or the whole "Test" directory) to GoogleDrive on the web, the filename is correctly preserved. See screenshot below:
  2. Now if I run the following command:
rclone copy "E:/Test" "GoogleDrive:/Test" --ignore-existing

Expected result: Rclone should have nothing to do, because the txt file already exists on GoogleDrive.
Actual result: Rclone mistakenly uploads a new file named A:B.txt, which becomes a duplicate of the original file. See screenshot below:

What is your rclone version (output from rclone version)

rclone v1.57.0-beta.5664.1409b89f6

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Windows 11, 64 bit

Which cloud storage system are you using? (eg Google Drive)

Google Drive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy "E:/Test" "GoogleDrive:/Test" --ignore-existing

The rclone config contents with secrets removed.

Nothing really to show - my config file only contains credentials.

type = drive
scope = drive
token = {XXXXXXX}

A log from the command with the -vv flag

2021/09/24 21:44:51 DEBUG : rclone: Version "v1.57.0-beta.5664.1409b89f6" starting with parameters ["rclone" "copy" "E://Test" "GoogleDrive:/Test" "--ignore-existing" "-vv"]
2021/09/24 21:44:51 DEBUG : Creating backend with remote "E://Test"
2021/09/24 21:44:51 DEBUG : Using config file from "E:\\Rclone\\rclone.conf"
2021/09/24 21:44:51 DEBUG : fs cache: renaming cache item "E://Test" to be canonical "//?/E://Test"
2021/09/24 21:44:51 DEBUG : Creating backend with remote "GoogleDrive:/Test"
2021/09/24 21:44:52 DEBUG : Google drive root 'Test': 'root_folder_id = 0ANtG9umnZs7kUk9PVA' - save this in the config to speed up startup
2021/09/24 21:44:52 DEBUG : fs cache: renaming cache item "GoogleDrive:/Test" to be canonical "GoogleDrive:Test"
2021/09/24 21:44:52 DEBUG : Google drive root 'Test': Waiting for checks to finish
2021/09/24 21:44:52 DEBUG : Google drive root 'Test': Waiting for transfers to finish
2021/09/24 21:44:53 DEBUG : A:B.txt: md5 = cb08ca4a7bb5f9683c19133a84872ca7 OK
2021/09/24 21:44:53 INFO  : A:B.txt: Copied (new)
2021/09/24 21:44:53 INFO  :
Transferred:              4 B / 4 B, 100%, 3 B/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:         2.2s
2021/09/24 21:44:53 DEBUG : 5 go routines active

Since : is not allowed on Windows, it is replaced with : when a file is copied from a remote where : is valid, and replaced back when the file is copied to the remote. This leads to your case.

You may be able to work around it by setting --backend-encoding described here:

Thanks for your reply, but what you said is actually the opposite of my problem. The local filename is A:B.txt (with U+FF1A, a full-width colon, not the half-width colon : that you are familiar with; you can paste it into a text editor to see the difference). It's a perfectly legitimate filename, otherwise I wouldn't be able to create it on my Windows machine in the first place. However, Rclone replaces it with A:B.txt, which is not a legitimate filename on Windows.

To summarize, Rclone replaces : (good filename) with : (bad filename) and causes problems, not the other way around.
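For readers who cannot easily tell the two characters apart on screen, here is a minimal Python sketch that prints their code points and official Unicode names. The helper name `describe` is made up for this illustration and has nothing to do with rclone itself.

```python
import unicodedata

HALF = "\u003a"  # ASCII colon, not allowed in Windows filenames
FULL = "\uff1a"  # full-width colon, allowed in Windows filenames


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (HALF, FULL):
    print(describe(ch))
```

The two characters look alike in many fonts but are distinct code points, so A:B.txt and A:B.txt are different filenames.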

I understand your case. But it follows from the strategy I described: when rclone sees a file with a full-width colon, it (incorrectly) "assumes" the name originally had a regular colon that had to be renamed, and therefore converts it back to a regular colon. (Please excuse my brevity, I'm writing from my phone atm.)
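The strategy described above can be sketched roughly as follows. This is a simplified model for illustration only, not rclone's actual code; the function names `to_windows_name` and `from_windows_name` are hypothetical, and only the colon is handled.

```python
HALF_COLON = "\u003a"  # ':'  illegal in Windows filenames
FULL_COLON = "\uff1a"  # full-width colon, legal in Windows filenames


def to_windows_name(standard_name: str) -> str:
    """Name written to a Windows disk: swap ':' for its full-width form."""
    return standard_name.replace(HALF_COLON, FULL_COLON)


def from_windows_name(disk_name: str) -> str:
    """Reverse step: every full-width colon is assumed to stand in for ':'."""
    return disk_name.replace(FULL_COLON, HALF_COLON)


# A remote name with a regular colon survives the round trip as intended.
remote = "A" + HALF_COLON + "B.txt"
assert from_windows_name(to_windows_name(remote)) == remote

# A local name that genuinely contains a full-width colon does not:
# the reverse step cannot tell it apart from an escaped ':'.
genuine = "A" + FULL_COLON + "B.txt"
uploaded = from_windows_name(genuine)
print(uploaded == genuine)  # False: the name changed on upload
```

The reverse step has no way of knowing whether a full-width colon on disk was typed by the user or produced by the escaping, which is why the name changes on upload.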

Thanks again! Now that I re-read your comments it makes a lot of sense to me. Sorry for misunderstanding what you said, I should have thought more before writing.

Unfortunately, even if I run

rclone copy "E:/Test" "GoogleDrive:/Test" --ignore-existing --drive-encoding None

it still converts the full-width colon to the regular colon. The --drive-encoding flag (as I'm using GoogleDrive) doesn't seem to make any difference :astonished:

I'm not sure, but try changing the local encoding instead.

Yes sir - it solves my problem. Thank you so much for your help! :partying_face: :partying_face: :partying_face:

This should probably go in the FAQ....

We put the encoding in without realising that the character we were using to encode : (ie :) was used regularly by people (I suspect for the same reasons), so this question comes up fairly regularly.

Thanks Nick. True - it is commonly used in my language, so at first I didn't think of it as a feature / a workaround for another issue. Either an entry in the FAQ or simply a warning message like Warning: ':' is being converted to ':'. Use XXX to disable it / See YYY for more details. would help novice users like me know where to look.

There is a patch for the FAQ in progress - thanks @albertony

Can I ask you which language that is?

And what do you use the symbol for?

Cool news. I'm using Chinese. In Chinese, every punctuation mark is full-width, so there is no normal (half-width) punctuation at all. You won't see things like ,.: in Chinese; instead, ,。: are used.

Let's say there is a TV episode called "Episode 6: Second Contact". In English, people (especially the majority who are using Windows) would naturally name the video file Episode 6 - Second Contact.mkv to avoid the illegal : in a Windows filename. But in Chinese, people don't have to do this "proactive renaming" because 第6集:第二类接触.mkv is already an accepted filename in Windows. So common practice differs between the two.

Thank you for explaining @jiangzhenjerry

I see on Xah Lee's page

A full-width character means the character has the same width as a Chinese character, regardless of font choice.

So I expect that is the reason.

I wish I had known that when I chose the escaping scheme for rclone and I would have chosen something different... Alas it is too late to change now.

Yes. Full-width punctuation is used in Japanese, too. Yeah, once the decision was made it cannot be changed without disrupting users' workflows. Thanks to rclone tree I was able to list all of my 100TB of files on GoogleDrive and quickly find the files whose names were altered by Rclone. So after all, no biggie :grinning:
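A rough sketch of how such a check could be automated, assuming the remote listing has already been saved as plain paths, one per entry (for instance from a recursive listing command such as rclone lsf -R). The function name `suspicious_names` and the sample paths are made up for illustration. The idea is that a name uploaded from a Windows disk cannot legitimately contain a character Windows forbids, so any such name on the remote was probably altered.

```python
# Characters Windows rejects in a filename ('/' is treated as the separator).
WINDOWS_RESERVED = set('<>:"\\|?*')


def suspicious_names(paths):
    """Return paths whose final component has a character Windows would reject."""
    hits = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if any(ch in WINDOWS_RESERVED for ch in name):
            hits.append(path)
    return hits


# Hypothetical listing; \uff1a is the full-width colon.
listing = [
    "Test/A:B.txt",                      # half-width colon: likely altered
    "Test/notes.txt",
    "Shows/第6集\uff1a第二类接触.mkv",    # full-width colon kept intact
]
flagged = suspicious_names(listing)
print(flagged)
```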


Japanese language has similar use cases.
I encountered issues with the Unicode variants of : and /
That's why I'm using the below in my config.

type = local
encoding = Slash,InvalidUtf8

type = drive
encoding = Slash,InvalidUtf8

This results in the right characters on windows, gdrive and linux.

True. There are a few other cases in both Japanese and Chinese, like ! vs !.

I wonder if we could add a central variable encode.WideChars = keep|none in lib/encode to make it silently skip all wide-to-half character translations but asymmetrically continue opposite half-to-wide translation, augmented by the RCLONE_WIDE_CHARS=keep environment variable and CLI flag (with settings other than keep reserved for future).
@ncw @jiangzhenjerry Do you think it can make our square-writing users happy?

Thanks. Sounds like a great idea for my use case :smiley:

This scheme has the potential for clashes given that multiple files locally may map to one file remotely.

That probably wouldn't be a problem for Windows though, but might be for Linux.
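To illustrate the kind of clash meant here, below is a small sketch of a one-directional mapping in which half-width colons are still converted to full-width but the reverse conversion is skipped. The helper names `half_to_wide_only` and `find_clashes` are hypothetical, and the sketch only demonstrates the many-to-one property; it is not a model of any actual rclone setting.

```python
from collections import defaultdict


def half_to_wide_only(name):
    """Asymmetric mapping: ':' becomes full-width, full-width is left alone."""
    return name.replace("\u003a", "\uff1a")


def find_clashes(names):
    """Group names by their mapped form and keep groups with more than one member."""
    groups = defaultdict(list)
    for n in names:
        groups[half_to_wide_only(n)].append(n)
    return {mapped: srcs for mapped, srcs in groups.items() if len(srcs) > 1}


# On Linux both of the first two names can coexist in one directory.
names = ["A\u003aB.txt", "A\uff1aB.txt", "C.txt"]
clashes = find_clashes(names)
print(clashes)
```

On Windows the half-width variant cannot exist on disk in the first place, which is why the clash is mainly a concern for filesystems that accept both characters.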
