I don't know what character set is being used here; all I see on my screen are squares, which makes me think the names use characters that no font on my machine covers.
Hmm, in theory there's a test to stop "invalid characters" from appearing:
case runeValue >= 0x100:
    // Some random Unicode range; we have no good rules here
    thisdir := (dir % 127) + 1
    base := int(runeValue - runeValue%256)
    newRune := rune(base + (int(runeValue)-base+thisdir)%256)
    // If the new character isn't a valid UTF8 char
    // then don't rotate it. Quote it instead
    if !utf8.ValidRune(newRune) {
        _, _ = result.WriteRune(obfuscQuoteRune)
        _, _ = result.WriteRune(runeValue)
    } else {
        _, _ = result.WriteRune(newRune)
    }
So it looks like the Go library isn't flagging it as bad. I'm not sure what we can do about that...
UPDATE: the [U+D7A5].txt file uploaded successfully to Google Drive.
I think it's valid; it just can't be used in Dropbox.
U+D7A5 seems to be a character that cannot be used in a file name on Dropbox. I have just tried uploading a text file named [U+D7A5].txt through the Dropbox web interface, and it failed with an error saying there was an incorrect character in the file name.
$ echo hello | rclone rcat "dropbox:ν₯"
2021/05/01 14:01:44 ERROR : ν₯: Failed to copy: upload failed: Error in call to API function "files/upload": Invalid path: INVALID_PATH
2021/05/01 14:01:44 ERROR : ν₯: Post request rcat error: upload failed: Error in call to API function "files/upload": Invalid path: INVALID_PATH
2021/05/01 14:01:44 Failed to rcat: upload failed: Error in call to API function "files/upload": Invalid path: INVALID_PATH
Where that character is:
>>> hex(ord("\ud7a5"))
'0xd7a5'
According to fileformat this isn't a valid Unicode character. However, Go (see playground) does believe it to be valid.
The check Go uses is quite simple, though:
// ValidRune reports whether r can be legally encoded as UTF-8.
// Code points that are out of range or a surrogate half are illegal.
func ValidRune(r rune) bool {
    switch {
    case 0 <= r && r < surrogateMin:
        return true
    case surrogateMax < r && r <= MaxRune:
        return true
    }
    return false
}
Maybe we should be using something a bit more sophisticated, like unicode.IsPrint.
However, I have a little concern here: what is valid and invalid in Unicode (as opposed to UTF-8) can and does change between Unicode revisions. Does this mean that the encoding/decoding of file names would then depend on the Unicode version?
Potentially, yes. The obfuscated results could change depending on the Go version!
Also, it doesn't handle the case where Go supports a newer Unicode version than the cloud provider, so we'd be back where we started: Go thinks the character is good but the provider thinks it's bad.
I'm surprised at Dropbox - I would have thought they would go for a forward-compatible file naming scheme, much more like the utf8.ValidRune check.
@QxXU8Yknm2 I can't think of a good solution to this... You could try name encryption instead of obfuscation? Dropbox file names are limited to 255 characters, though, so this may not be the best solution.