Trying to figure out how to handle a file with an illegal character

What is the problem you are having with rclone?

With the S3 backend, a few files are uploaded every time we run rclone. It can't seem to handle the illegal character in the filename: the upload is sent, but the file never shows up in the bucket. We used to use the Azure Blob backend, and as far as we know there was no issue with this same file.

I have been looking through the Unicode and encoding options, but I don't think any of the extra options would apply to this issue. The standard backend encoding already includes InvalidUtf8. Of course we could rename the files in question, but I was hoping to find something in rclone that could auto-convert this character.

ls -lab shows:
\360\237\224\264\ TPUSA\ LIVE.mp4
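
The octal escapes in that listing are the raw UTF-8 bytes of the first character. As a rough sketch (the helper name `decode_ls_escapes` is made up for illustration, and it only handles `\NNN` octal bytes and backslash-escaped literal characters, not escapes such as `\n`), they can be decoded like this to see which character is involved:

```python
import re
import unicodedata


def decode_ls_escapes(escaped: str) -> str:
    """Decode `ls -b` style output: \\NNN octal bytes and backslash-escaped characters.

    Hypothetical helper for illustration; C-style escapes like \\n are not handled.
    """
    def replace(match: "re.Match[bytes]") -> bytes:
        token = match.group(1)
        if len(token) == 3:
            # Three octal digits stand for one raw byte.
            return bytes([int(token, 8)])
        # Anything else (for example "\ ") is just the escaped character itself.
        return token

    data = re.sub(rb"\\([0-7]{3}|.)", replace, escaped.encode("utf-8"))
    return data.decode("utf-8")


name = decode_ls_escapes(r"\360\237\224\264\ TPUSA\ LIVE.mp4")
first = name[0]
# Prints: U+1F534 LARGE RED CIRCLE
print(f"U+{ord(first):04X} {unicodedata.name(first)}")
```

So the bytes are valid UTF-8, which would explain why the InvalidUtf8 encoding makes no difference here.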

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.1
- os/version: unknown
- os/kernel: 3.10.105 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

s3 provider / other (oracle)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync -P /source/ oci:bucket/ --ignore-case --delete-excluded --filter-from=/volume1/Storage1/~Software_IT/rclone/filter-avid.txt --transfers=8 --s3-upload-concurrency=8 --size-only --retries 1 --s3-chunk-size 128M -n

The rclone config contents with secrets removed.

[oci]
type = s3
provider = Other
access_key_id = xxx
secret_access_key = xxx
endpoint = https://xxx.compat.objectstorage.us-ashburn-1.oraclecloud.com
location_constraint = us-ashburn-1
acl = private

A log from the command with the -vv flag

2022/09/15 13:00:37 DEBUG : rclone: Version "v1.59.1" starting with parameters ["rclone" "sync" "-P" "/source/" "oci:bucket/" "--ignore-case" "--delete-excluded" "--filter-from=/volume1/Storage1/~Software_IT/rclone/filter-avid.txt" "--transfers=8" "--s3-upload-concurrency=8" "--size-only" "--retries" "1" "--s3-chunk-size" "128M" "-n" "-vv"]
2022/09/15 13:00:37 DEBUG : Creating backend with remote "/volume1/X/source/"
2022/09/15 13:00:37 DEBUG : Using config file from "/root/.config/rclone/rclone.conf"
2022/09/15 13:00:37 DEBUG : Creating backend with remote "oci:source/"
2022/09/15 13:00:37 DEBUG : oci: detected overridden config - adding "{x2mCh}" suffix to name
2022/09/15 13:00:37 DEBUG : fs cache: renaming cache item "oci:source/" to be canonical "oci{x2mCh}:source"
2022-09-15 13:00:37 DEBUG : S3 bucket bucket path: Waiting for checks to finish
2022-09-15 13:00:37 DEBUG : LC_RD_Joy Reid_Nov 11.mp4: Sizes identical
2022-09-15 13:00:37 DEBUG : LC_RD_Joy Reid_Nov 11.mp4: Unchanged skipping
2022-09-15 13:00:37 DEBUG : LC_RD_Joy Reid_Nov 10.mp4: Sizes identical
2022-09-15 13:00:37 DEBUG : LC_RD_Joy Reid_Nov 10.mp4: Unchanged skipping
2022-09-15 13:00:37 NOTICE: � TPUSA LIVE - FRONTLINES DEBUT with Drew Hernandez and Kyle Rittenhouse.mp4: Skipped copy as --dry-run is set (size 1.370Gi)
2022-09-15 13:00:37 DEBUG : S3 bucket bucket path: Waiting for transfers to finish
2022-09-15 13:00:37 DEBUG : Waiting for deletions to finish

Does it appear as a different name?

S3 should be able to take any characters. I suspect an Oracle incompatibility here. You could test with AWS S3 to see.

You could try putting the InvalidUtf8 encoding on the S3 backend, as I don't think it's on by default. Use --s3-encoding to add it (remember to keep the ones already there).

I also suspect an issue in Oracle's S3 implementation :confused: I will try with AWS S3 directly as a check.

A quick check with --s3-encoding "Slash,InvalidUtf8,Dot" gives the same result. From the docs, I believe InvalidUtf8 is already in the default for S3 anyway.

I also thought it might be related to chunking, since the file was broken up into a multipart upload, so I tried without multipart: same result :confused:

I guess either way no one should be using emojis in filenames, for God's sake... but I will see what AWS S3 does.

It worked as expected in AWS S3.

Yes you are right - sorry

  --s3-encoding MultiEncoder   The encoding for the backend (default Slash,InvalidUtf8,Dot)

Is it an emoji? Ah yes, it is a :red_circle: Large Red Circle

Quite a few backends have problems with emojis (Dropbox for example). I think there is some technical reason within Java which makes emoji handling hard.
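
One plausible reason (an assumption on my part, not something confirmed for Oracle or Dropbox) is that emojis sit outside Unicode's Basic Multilingual Plane. They take four bytes in UTF-8 and a surrogate pair in UTF-16, which is what Java strings use internally, and software that assumes one 16-bit unit per character can mishandle them. A small sketch to show the difference, with `describe` being an illustrative helper name:

```python
def describe(ch: str) -> dict:
    """Summarise how a single character is represented in UTF-8 and UTF-16."""
    utf16 = ch.encode("utf-16-be")
    # Split the big-endian UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_units": [f"{u:04X}" for u in units],
        "outside_bmp": ord(ch) > 0xFFFF,
    }


# The red circle needs 4 UTF-8 bytes and two UTF-16 units (a surrogate pair).
print(describe("\U0001F534"))
# An accented Latin letter fits in a single UTF-16 unit.
print(describe("\u00e9"))
```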

That is what I expected...

I guess an encoding which did something to emojis would be of general interest.
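
To make the idea concrete, here is a rough sketch of what such a mapping could look like. This is not an existing rclone encoding: the function names and the `{U+XXXXX}` token format are invented for illustration, and a filename that already contains a literal token of that shape would not round-trip correctly.

```python
import re

# Matches the hypothetical token for code points U+10000..U+10FFFF.
_TOKEN = re.compile(r"\{U\+(10[0-9A-F]{4}|[1-9A-F][0-9A-F]{4})\}")


def escape_non_bmp(name: str) -> str:
    """Replace every character above U+FFFF with a plain-ASCII token."""
    return "".join(
        f"{{U+{ord(c):X}}}" if ord(c) > 0xFFFF else c
        for c in name
    )


def unescape_non_bmp(name: str) -> str:
    """Reverse escape_non_bmp by turning tokens back into characters."""
    return _TOKEN.sub(lambda m: chr(int(m.group(1), 16)), name)


original = "\U0001F534 TPUSA LIVE.mp4"
safe = escape_non_bmp(original)
print(safe)
print(unescape_non_bmp(safe) == original)
```

The same functions could be used in a one-off rename script before syncing, if renaming the source files is acceptable.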

In the meantime, you could use crypt if you want to translate the names into something else.