What's going on with the proposal about base32768 file name encoding?

I noticed that it was discussed in https://forum.rclone.org/t/base32768-to-compress-filename-length/13202.

I consider this proposal useful for cloud services with strict file name length limits (e.g. OneDrive), especially when file name encryption is used. Is @Max-Sum still interested in this project? I noticed that a lot of work has been done in Commits · Max-Sum/rclone · GitHub (I haven't tried it yet), but no PR has been created.


Considering the last commits are from January 2021, I'd guess nothing is happening with it currently.

This proposal is only useful on providers that have strict name limits and that count those limits in Unicode characters or UTF-16 code units rather than in UTF-8 bytes.

I just tested this with OneDrive:

// onedrive
maxFileLength = 256 // for 1 byte unicode characters
maxFileLength = 256 // for 2 byte unicode characters
maxFileLength = 202 // for 3 byte unicode characters
maxFileLength = 128 // for 4 byte unicode characters

So OneDrive seems to use a 2-byte encoding (probably UTF-16) internally, and would therefore benefit from base32768 encoding.

Whereas for Google Drive I get:

// drive
maxFileLength = 7754 // for 1 byte unicode characters
maxFileLength = 1292 // for 2 byte unicode characters
maxFileLength = 861 // for 3 byte unicode characters
maxFileLength = 646 // for 4 byte unicode characters

Which is very odd!

I've put a beta with this extra test here:

v1.58.0-beta.5857.daafd4416.fix-test-length on branch fix-test-length (uploaded in 15-30 mins)

You can run it with

rclone test -v info --check-length remote:path-to-test-directory

It would be interesting to run this on Windows against a Windows file system too. On Linux I get:

// local
maxFileLength = 255 // for 1 byte unicode characters
maxFileLength = 127 // for 2 byte unicode characters
maxFileLength = 85 // for 3 byte unicode characters
maxFileLength = 63 // for 4 byte unicode characters

That is what I'd expect, but Windows is different, which suggests that names are stored internally as UTF-16, maybe? Perhaps I picked the wrong example character for the 3-byte UTF-8 sequence, as it is probably only one UTF-16 code unit.

// local
maxFileLength = 255 // for 1 byte unicode characters
maxFileLength = 255 // for 2 byte unicode characters
maxFileLength = 255 // for 3 byte unicode characters
maxFileLength = 127 // for 4 byte unicode characters

Correct, on NTFS file system.
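Both tables are consistent with a 255-byte UTF-8 limit on Linux (as on ext4) and a 255 UTF-16 code unit limit on NTFS. The short Go sketch below makes the arithmetic explicit: characters of up to 3 UTF-8 bytes are a single UTF-16 code unit, while 4-byte characters need a surrogate pair. The sample characters are chosen for illustration and are not necessarily the ones `rclone test info` uses; `maxChars` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// maxChars returns how many copies of r fit in a name when the limit is
// counted in UTF-8 bytes (Linux ext4 style) and in UTF-16 code units
// (NTFS style).
func maxChars(r rune, limit int) (byUTF8, byUTF16 int) {
	return limit / utf8.RuneLen(r), limit / len(utf16.Encode([]rune{r}))
}

func main() {
	// Sample runes taking 1, 2, 3 and 4 bytes in UTF-8: a, é, 中, 😀.
	for _, r := range []rune{'a', '\u00e9', '\u4e2d', '\U0001F600'} {
		b, u := maxChars(r, 255)
		fmt.Printf("%d-byte UTF-8: %d by bytes, %d by UTF-16 units\n",
			utf8.RuneLen(r), b, u)
	}
}
```

It prints 255/255, 127/255, 85/255 and 63/127, matching the Linux and NTFS results above, so a 3-byte character giving 255 is exactly what a UTF-16 limit predicts.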

Yes, that's the case. This proposal seems to be a workaround designed specifically for OneDrive, but I'm especially interested in it because I'm planning to use the OneDrive backend with the encryption feature. ><

I created an issue for this on GitHub: https://github.com/rclone/rclone/issues/5801.

I'm also continuing to work from the original commit by @Max-Sum on my own branch: https://github.com/tinytangent/rclone/tree/crypt_encoding_base32768


I created a pull request for this: Support base64 and base32768 file name encoding for crypt. by tinytangent · Pull Request #5802 · rclone/rclone · GitHub

I believe I should update all the test cases in backend/crypt/cipher_test.go, but that looks like a huge amount of work QAQ...


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.