Wasabi Does Not Support 4 Byte UTF8 Characters In File Names

What is the problem you are having with rclone?

When rclone backs up to Wasabi, uploads fail for any file whose name contains 4-byte UTF-8 characters (such as emojis), because Wasabi does not support them. It would be great if there was a way to get rclone to reliably mangle these names, or just strip the characters, when syncing.

Run the command 'rclone version' and share the full output of the command.

rclone v1.58.1
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-25-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.17.9
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

From Google Drive to Wasabi

The command you were trying to run (eg rclone copy /tmp remote:tmp)

#!/bin/bash

RCLONE_CONFIG=/root/.config/rclone/rclone.conf
export RCLONE_CONFIG

rclone config file
# Loop over the shared drives, one compact JSON object per line
while IFS= read -r i; do
        id=$(echo "${i}" | jq -r '.["id"]')
        name=$(echo "${i}" | jq -r '.["name"]')
        echo "Backing up ${name} drive with id ${id}"
        /usr/bin/rclone sync --checksum --track-renames --log-file=/root/backup.log --log-level=DEBUG Drive: --drive-alternate-export --drive-team-drive "$id" --drive-root-folder-id "" Wasabi:shared-drive-backup/"$name"
done < <(rclone backend drives Drive: | jq -c '.[]')

echo "Backups Complete"

The rclone config contents with secrets removed.

[Drive]
type = drive
client_id =
client_secret =
scope = drive.readonly
token = 
root_folder_id = 0APBtOQOZ7XT8Uk9PVA

[Wasabi]
provider = Wasabi
env_auth = false
access_key_id = 
endpoint = s3.eu-central-1.wasabisys.com
secret_access_key =

A log from the command with the -vv flag

I know this is only the end of the log but I think it explains the issue.

coding="UTF-8"?>
<Error><Code>NotImplemented</Code><Message>UTF-8 using four byte encodings is not supported.</Message><RequestId>D116ECB1AA149DB3</RequestId><HostId>NoHla3M6ydByzbYKwDLIv4eJDlBcFLu9WSWG/+b9CVHW/lN3PvDruEheHoLoaMuQfd7bqS5GeAeO</HostId></Erro

2022/05/24 12:54:04 ERROR : Notion Backups 2/Export-4f90f08c-ca1f-49cd-b6f6-7391c7a271e6/The GameDev tv Way 33223ec34f214a18bd971222986c889d/Operations Director's Manual cb0290997da84374b993decb6d930e5c/🔊 Bulk Normalisation Of Audio Levels fa89bf1b86f148bebfe2fc00220490f7.md: Failed to copy: s3 upload: 501 Not Implemented: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NotImplemented</Code><Message>UTF-8 using four byte encodings is not supported.</Message><RequestId>D116ECB1AA149DB3</RequestId><HostId>NoHla3M6ydByzbYKwDLIv4eJDlBcFLu9WSWG/+b9CVHW/lN3PvDruEheHoLoaMuQfd7bqS5GeAeO</HostId></Error>
2022/05/24 12:54:04 ERROR : S3 bucket shared-drive-backup path Data Exports: not deleting files as there were IO errors
2022/05/24 12:54:04 ERROR : S3 bucket shared-drive-backup path Data Exports: not deleting directories as there were IO errors
2022/05/24 12:54:04 ERROR : Attempt 3/3 failed with 5 errors and: s3 upload: 501 Not Implemented: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NotImplemented</Code><Message>UTF-8 using four byte encodings is not supported.</Message><RequestId>D116ECB1AA149DB3</RequestId><HostId>NoHla3M6ydByzbYKwDLIv4eJDlBcFLu9WSWG/+b9CVHW/lN3PvDruEheHoLoaMuQfd7bqS5GeAeO</HostId></Error>
2022/05/24 12:54:04 INFO  :
Transferred:      829.549 KiB / 829.549 KiB, 100%, 3.761 KiB/s, ETA 0s
Errors:                 5 (retrying may help)
Checks:              6300 / 6300, 100%
Elapsed time:      1m27.1s

2022/05/24 12:54:04 DEBUG : 24 go routines active
2022/05/24 12:54:04 Failed to sync with 5 errors: last error was: s3 upload: 501 Not Implemented: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NotImplemented</Code><Message>UTF-8 using four byte encodings is not supported.</Message><RequestId>D116ECB1AA149DB3</RequestId><HostId>NoHla3M6ydByzbYKwDLIv4eJDlBcFLu9WSWG/+b9CVHW/lN3PvDruEheHoLoaMuQfd7bqS5GeAeO</HostId></Error>

If there are any workarounds I could implement instead, please let me know. I couldn't find a way of using the allowed character filters already built in.

What you could do is wrap the wasabi remote with the crypt backend - that will encode all the file names - that is the first thing that comes to mind. You don't have to encrypt the data if you don't want to.
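As a rough illustration of that suggestion, a crypt remote layered over the existing Wasabi remote might look something like the fragment below. The remote name `WasabiCrypt` is made up for the example, the bucket path is taken from the sync command above, and `no_data_encryption` is my reading of the crypt option that leaves file contents unencrypted, so check it against the crypt docs for your rclone version.

```ini
# Hypothetical remote name; wraps the existing Wasabi remote and bucket.
[WasabiCrypt]
type = crypt
remote = Wasabi:shared-drive-backup
# Encrypted names are stored as plain ASCII, so no 4-byte characters reach Wasabi.
filename_encryption = standard
directory_name_encryption = true
# Assumption: leaves the file contents unencrypted.
no_data_encryption = true
# Secret removed; this needs an obscured password, e.g. set via rclone config.
password =
```

The sync would then target `WasabiCrypt:"$name"` instead of `Wasabi:shared-drive-backup/"$name"`. The stored names are encrypted, so they will not be readable when browsing the bucket directly.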

That is an S3 incompatibility BTW - is it documented somewhere?

Documented here: https://wasabi-support.zendesk.com/hc/en-us/articles/360002101712-Does-Wasabi-support-4-byte-UTF8-characters-

I would rather not actually encrypt the file names as I would like to be able to browse the backups on the Wasabi site. I assume there's no way around that?

Unfortunately rclone doesn't yet have an encoding mode for emojis, so renaming the files or using crypt are your only options right now.
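For the rename route, here is a minimal sketch of how the affected names could be spotted and what a stripped name would look like. It relies on the fact that every 4-byte UTF-8 sequence starts with a lead byte in the range 0xF0 to 0xF4. The function names `has_four_byte` and `strip_four_byte` are hypothetical helpers, not anything rclone provides, and the sketch assumes GNU grep and sed.

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical and not part of rclone.

# Succeeds if the given name contains a 4-byte UTF-8 sequence.
# Such sequences always begin with a byte in 0xF0-0xF4, so a
# byte-wise match in the C locale is enough to detect them.
has_four_byte() {
    printf '%s' "$1" | LC_ALL=C grep -q $'[\xf0-\xf4]'
}

# Prints the given name with every 4-byte UTF-8 sequence removed
# (one lead byte followed by three continuation bytes).
strip_four_byte() {
    printf '%s' "$1" | LC_ALL=C sed $'s/[\xf0-\xf4][\x80-\xbf][\x80-\xbf][\x80-\xbf]//g'
}
```

One way to use this would be to feed the output of `rclone lsf -R Drive:` through `has_four_byte` line by line to get a list of paths that need renaming. Note that stripping can leave a leading space behind (as in the "🔊 Bulk Normalisation" file above), and that renaming on the Drive side would need more than the `drive.readonly` scope in the posted config.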

There was a thread recently about Dropbox not allowing emojis in file names too.

I have a feeling both might be limitations of the Java platform, which uses UTF-16 internally rather than UTF-8, but I might be wrong about that.

I've switched to using S3 and Glacier as I couldn't face working around this issue.
