Couple of questions regarding filename length and filename encoding

I'm testing IDrive e2, which has a maximum object name length of 254 characters. I'm also considering OneDrive, which is limited to 400 chars for the entire path and 255 for each segment (filename or folder).

  1. I learned that S3 doesn't have the concept of directories, so the 254 limit applies to the entire path within the bucket, right?
  2. I use crypt. OneDrive is case insensitive, so I know I can't, or shouldn't, use base64 for filename encoding. I'm looking to use base32768 to minimize the chance of being affected by filename length limits, but I'm not sure which providers are suitable for it. As I said, I'm currently interested in IDrive e2 (S3) and OneDrive. Are these two suitable for this encoding?

Thanks in advance.

For full rclone path: bucket:dir1/dir2/file

max bucket name length = 63
max length of each of file, dir1 or dir2 = 254
max length of dir1/dir2/file = the iDrive docs do not specify it, but for S3 storage it is usually 1024 - if you use extremely long paths I suggest testing it yourself to get the exact number

You can validate it yourself:

$ rclone test info --check-length --check-base32768 onedrive:test_info

// onedrive
maxFileLength = 256 // for 1 byte unicode characters
maxFileLength = 256 // for 2 byte unicode characters
maxFileLength = 202 // for 3 byte unicode characters
maxFileLength = 128 // for 4 byte unicode characters
base32768isOK = true // make sure maxFileLength for 2 byte unicode chars is the same as for 1 byte characters
$ rclone test info --check-length --check-base32768 iDrive:test/test_info

// iDrive
maxFileLength = 255 // for 1 byte unicode characters
maxFileLength = 127 // for 2 byte unicode characters
maxFileLength = 85 // for 3 byte unicode characters
maxFileLength = 63 // for 4 byte unicode characters
base32768isOK = true // make sure maxFileLength for 2 byte unicode chars is the same as for 1 byte characters

Neither remote has any issues storing any of the base32768 characters (base32768isOK = true).

For onedrive, 2-byte unicode characters (as used in base32768 encoding) are counted as 1, so base32768 will provide significant length savings - you might actually be able to store a path longer than the 400-char limit would allow for names made of 1-byte characters:

400 * 15 / 8 = 750
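The saving above can be sanity-checked with plain shell arithmetic (this is only the raw bit math; real crypt names also carry encryption overhead and padding, so the practical limit will be somewhat lower):

```shell
# base32768 packs 15 bits into each character, plaintext bytes are 8 bits,
# and OneDrive counts each 2-byte base32768 character as 1 toward its limit,
# so a 400-character encrypted path can carry roughly this many plaintext bytes:
echo $((400 * 15 / 8))   # prints 750
```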

That is not the case for iDrive: the max file name length is only 127 for base32768 characters.
iDrive S3 is case sensitive, so base64 encoding will also work.

But for iDrive S3 I would still consider using base32768, because it encodes 15 bits per character vs 6 bits for base64, and

127 * 15 > 6 * 255
1905 > 1530

8 bits per char:
1905 / 8 > 1530 / 8
238 > 191

This maths does not include padding etc., but clearly even at half the length base32768 encodes more data than base64.
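The comparison can be reproduced with shell arithmetic (raw bit capacity only, ignoring crypt padding and encryption overhead):

```shell
base32768_bits=$((127 * 15))  # 127 chars max on iDrive, 15 bits per char
base64_bits=$((255 * 6))      # 255 chars max on iDrive, 6 bits per char
echo "$base32768_bits vs $base64_bits bits"                   # prints "1905 vs 1530 bits"
echo "$((base32768_bits / 8)) vs $((base64_bits / 8)) bytes"  # prints "238 vs 191 bytes"
```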

Thanks.

From what I understood, it's 254 for path + filename. There are no real directories in S3; folders are handled implicitly using the delimiter /, but inside a bucket the structure is flat, without hierarchy, so strictly speaking the object name is the entire path (after the bucket). ref

I was scared to use this command because of the warning: "this can create undeletable files and other hazards - use with care". Thanks for running it and confirming both are base32768 compatible, but:

is this right? "make sure maxFileLength for 2 byte unicode chars is the same as for 1 byte characters". Well, it's not the same: 255 ≠ 127, so base32768isOK should be false, right?

If you're not 100% sure, I suggest you simply test, as it is very easy to validate any of your assumptions in this case.

Here is an example of a 500+ character long path on iDrive - so your understanding is not right...

$ rclone mkdir iDrive:test2/12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij

$ rclone copy long123456long123456long123456long123456long123456long123456long123456long123456long123456long123456.txt iDrive:test2/12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij

$ rclone ls iDrive:test2
        4 12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij/long123456long123456long123456long123456long123456long123456long123456long123456long123456long123456.txt

Yes, it can create some mess indeed - that is what a test bucket/folder is for... you can delete the whole bucket/folder later.

No.

base32768isOK indicates that the remote can store any of the characters used in this encoding. It has nothing to do with max length.

I think this rclone test output should be reworded, as it now implies that base32768 is only useful when 2-byte characters are counted as 1... and that is not the case.

I'm having issues uploading some files, and after a few tests I concluded that base32768 is the cause of the error, though I don't know why.

So I created a test bucket and ran the same command as you:

rclone test info --check-length --check-base32768 tst:tst/test_info

I got a SlowDown warning, so I tried again appending --upload-wait 1s. I got SlowDown again, but at the end I got:

// tst
maxFileLength = 143 // for 1 byte unicode characters
maxFileLength = 71 // for 2 byte unicode characters
maxFileLength = 47 // for 3 byte unicode characters
maxFileLength = 35 // for 4 byte unicode characters
base32768isOK = false // make sure maxFileLength for 2 byte unicode chars is the same as for 1 byte characters

So IDrive isn't compatible with base32768 (base32768isOK = false). But you ran the same command on the same provider and got true. How can this be possible? Maybe different bucket settings? Mine: screenshot.

Now I have tested on a normal remote and got the same results as you:

// itst
maxFileLength = 255 // for 1 byte unicode characters
maxFileLength = 127 // for 2 byte unicode characters
maxFileLength = 85 // for 3 byte unicode characters
maxFileLength = 63 // for 4 byte unicode characters
base32768isOK = true // make sure maxFileLength for 2 byte unicode chars is the same as for 1 byte characters

So that's the difference: in my previous reply I tested on a crypt remote with filename_encoding = base32768, while you tested on a normal remote.

The docs don't clarify whether the test must be done on a normal or a crypt remote. My guess is that I've done it right (testing on a crypt remote with filename_encoding = base32768), because the false result is in line with the upload issues I'm facing.

The test can be run on any remote - its purpose is to test the remote's capabilities. The base32768 test was added to it only because the question was often raised: does my remote support this encoding?

So of course you can run it on crypt if you wish. And it's good you did, as the results are very interesting.

I have run the same test, but on another remote and a crypt using base32... a very similar problem.

After a bit of investigation, the issue seems to be related to the base32768 test file used by rclone test info:

0062-␀␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏␐␑␒␓␔␕␖␗␘␙␚␛␜␝␞␟.txt

This file does not play well with crypt remotes. Below are attempts to copy it to a crypt remote - rclone copy . crypt:

For an iDrive-based crypt (base32768) I get:

<3>ERROR : 0062-␀‛␁‛␂‛␃‛␄‛␅‛␆‛␇‛␈‛␉‛␊‛␋‛␌‛␍‛␎‛␏‛␐‛␑‛␒‛␓‛␔‛␕‛␖‛␗‛␘‛␙‛␚‛␛‛␜‛␝‛␞‛␟.txt: Failed to copy: SlowDown: Resource requested is unreadable, please reduce your request rate

but for a onedrive-based crypt (base32) I get:

<3>ERROR : 0062-␀‛␁‛␂‛␃‛␄‛␅‛␆‛␇‛␈‛␉‛␊‛␋‛␌‛␍‛␎‛␏‛␐‛␑‛␒‛␓‛␔‛␕‛␖‛␗‛␘‛␙‛␚‛␛‛␜‛␝‛␞‛␟.txt: Failed to copy: invalidRequest: parameterIsTooLong: Parameter Exceeds Maximum Length
<3>ERROR : Attempt 1/3 failed with 1 errors and: invalidRequest: parameterIsTooLong: Parameter Exceeds Maximum Length

Clearly neither error message reflects the true cause - they are only a sign of a deeper issue.

This suggests to me that crypt overlays have some limitation regarding allowed characters... Not something I expected.

Before digging further I would like @ncw to comment - is this something you are aware of? Do crypt remotes have some "forbidden" unicode characters?

And what exactly is this issue?

I had issues with nearly 1% of the files I uploaded to the base32768 remote. For instance, a 267MB video file named Cruzeiro x Goiás - Brasileirão 2014 - Primeiro Tempo Jogo Completo.webm fails to upload:

2023-09-24 02:15:44 ERROR : Cruzeiro x Goiás - Brasileirão 2014 - Primeiro Tempo Jogo Completo.webm: Failed to copy: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDDD68673803, host id:
2023-09-24 02:15:44 ERROR : Attempt 1/3 failed with 1 errors and: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDDD68673803, host id:
2023-09-24 02:16:01 ERROR : Cruzeiro x Goiás - Brasileirão 2014 - Primeiro Tempo Jogo Completo.webm: Failed to copy: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDE160D293A7, host id:
2023-09-24 02:16:01 ERROR : Attempt 2/3 failed with 1 errors and: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDE160D293A7, host id:
2023-09-24 02:16:17 ERROR : Cruzeiro x Goiás - Brasileirão 2014 - Primeiro Tempo Jogo Completo.webm: Failed to copy: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDE550C54B43, host id:
2023-09-24 02:16:17 ERROR : Attempt 3/3 failed with 1 errors and: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDE550C54B43, host id:
Transferred:          288 KiB / 288 KiB, 100%, 2.850 KiB/s, ETA 0s
Errors:                 1 (retrying may help)
Elapsed time:        52.1s
Failed to copy: failed to upload chunk 1 with 280012386 bytes: NoSuchUpload: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
        status code: 404, request id: 1787BDE550C54B43, host id: 

This log is from when I tried changing the options so the upload would not be split into chunks. With the default setting of 5MB per chunk, the error is the same.

The same file uploads fine to other remotes, including a crypt with the default filename encoding (base32). I tried creating a new bucket/remote with the same config, only removing filename_encoding = base32768, and there was no issue.

More info about the file above that fails to upload to the base32768 remote:

  • I tried renaming the file to temp.webm and the upload was fine.
  • I have a file with a similar name, size, the same video encoding and so on (they are both videos of a single sports match split into two halves)... Only one word differs in the name: Segundo instead of Primeiro ("primeiro" means "first" and "segundo" means "second"). This sibling file uploaded just fine.
  • I renamed this other file to the failing name (replacing Segundo with Primeiro), and it failed too. Then I thought this exact filename was causing the issue, but...
  • I renamed some other random, unrelated video file to that bad filename and the upload went fine! So it's not just the filename... there's something more involved.
  • Finally, the last test was to rename the original failing file to its sibling's name, replacing the word Primeiro with Segundo, and it worked.

So... I have no idea.

  1. Renaming the file works.
  2. But the bad name is only bad for some files. Other files renamed to the same name and uploaded to the same path produce no error on upload.

For this issue I will create a new bug report, as I think it is not related to the key problem this thread is covering.

Done:

Thanks. Just to be clear, do you think the cause of the issue I'm facing (from my previous comment) is the same as the issue you reported in the new thread? Because the log messages are different... So I'm not sure if I should create a new thread too.

Not sure - but I feel I was too optimistic in thinking that base32768 would work perfectly fine with iDrive. Maybe it's better to stick with good old base64 :slight_smile:

And BTW, I cannot reproduce your issue:

I created a file named Cruzeiro x Goiás - Brasileirão 2014 - Primeiro Tempo Jogo Completo.webm but I do not see issues like yours. So there is definitely something more going on than simple character encoding.

I think yes - so we can move from this "Couple of questions..." thread to issue-specific ones.

I'm leaning towards using filename_encryption = obfuscate instead of any filename_encoding. Filename encoding is not a big deal, I guess; it's just to avoid automated checks based on filenames, and obfuscate should be enough. Also, obfuscate allows longer names.
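For reference, this is roughly what such a crypt remote would look like in rclone.conf (the remote name and target path here are placeholders, and note that obfuscate is a simple rotation of the name rather than real encryption, so it only hides names from casual inspection):

```ini
[mycrypt]
type = crypt
remote = iDrive:bucket/path
filename_encryption = obfuscate
directory_name_encryption = true
password = ...
```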

As I said, I was also able to upload other files with this name. But there were two files that failed with this name. So somehow the file data seems to be involved along with the file name; these two factors are related.

Just my two cents.

A few months ago, I tried filename encryption (filename_encryption and directory_name_encryption) with OneDrive. Suddenly, OneDrive protected my account, considering that my drive was the victim of ransomware.

Did you make the decision to encrypt filenames after uploading, so that you moved and renamed the files, or were they directly uploaded with encrypted filenames? I've never heard of that; I guess there would be a note on the OneDrive and/or crypt pages of the rclone docs warning about this risk if it were common.

Anyway, thanks for telling us; that's important. Did you manage to unlock the files and keep them with encrypted filenames? Was it hard?

No big issue.
It was just a soft lock from OneDrive. Easy to fix... but I'm not sure if it could happen again.
I disabled that rclone feature and uploaded the files with their original filenames.