Specifying `unicode_normalization` in `rclone.conf` does NOT get `--local-unicode-normalization` applied

What is the problem you are having with rclone?

Adding --local-unicode-normalization=true to the rclone command works fine for file-name normalization, whereas unicode_normalization = true specified in rclone.conf does NOT trigger the same effect.

Here is an example. I have the following folder to transfer to the remote storage:

/Users/z/Downloads/test
└── 니.txt

If I do not specify unicode_normalization = true in rclone.conf (actually, it does not matter whether it is specified):

- unicode_normalization = true

and run command with flag --local-unicode-normalization=true:

rclone copy \
      -P \
+     --local-unicode-normalization=true \
      ~/Downloads/test \
      s3:data/personal-files/Asset/test

Then the test dir and all its files get transferred successfully, because the file 니.txt, whose Korean characters are in NFD form, is correctly normalized to NFC before transferring.
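For context, the NFC and NFD forms of this filename really are different strings. A short Python snippet (purely illustrative, nothing to do with rclone's internals) shows the difference:

```python
import unicodedata

name = "니.txt"
nfc = unicodedata.normalize("NFC", name)  # precomposed syllable U+B2C8
nfd = unicodedata.normalize("NFD", name)  # decomposed jamo U+1102 + U+1175

print(len(nfc))    # 5 code points: 니 . t x t
print(len(nfd))    # 6 code points: ᄂ ᅵ . t x t
print(nfc == nfd)  # False -- a backend comparing raw names sees two different files
```

macOS filesystems store names in the decomposed (NFD) form, which is why the problem shows up there.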

However, if I specify the unicode_normalization = true in rclone.conf:

+ unicode_normalization = true

and run command WITHOUT flag --local-unicode-normalization=true:

rclone copy \
      -P \
-     --local-unicode-normalization=true \
      ~/Downloads/test \
      s3:data/personal-files/Asset/test

Then an error occurs, because the remote storage does not handle files with NFD-normalized filenames.

Run the command 'rclone version' and share the full output of the command.

rclone v1.67.0
- os/version: darwin 14.5 (64 bit)
- os/kernel: 23.5.0 (x86_64)
- os/type: darwin
- os/arch: amd64
- go/version: go1.22.4
- go/linking: dynamic
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy \
      -P \
      ~/Downloads/test \
      s3:data/personal-files/Asset/test

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[s3]
type = s3
provider = XXX
access_key_id = XXX
secret_access_key = XXX
endpoint = XXX
acl = bucket-owner-full-control
upload_cutoff = 100Mi
chunk-size = 50Mi
upload_concurrency = 16
list_chunk = 1000000
+ unicode_normalization = true
# the `+` sign above is not in the actual config, used here only for highlighting

A log from the command that you were trying to run with the -vv flag

The test dir to transfer to the remote storage:

/Users/z/Downloads/test
└── 니.txt

Running the command with unicode_normalization = true added to rclone.conf:

rclone copy \
      -P -vv \
      ~/Downloads/test \
      s3:data/personal-files/Asset/test
2024/07/22 00:04:46 DEBUG : rclone: Version "v1.67.0" starting with parameters ["rclone" "copy" "-P" "-vv" "/Users/z/Downloads/test" "s3:data/personal-files/Asset/test"]
2024/07/22 00:04:46 DEBUG : Creating backend with remote "/Users/z/Downloads/test"
2024/07/22 00:04:46 DEBUG : Using config file from "/Users/z/.config/rclone/rclone.conf"
2024/07/22 00:04:46 DEBUG : Creating backend with remote "qb:data/personal-files/Asset/test"
2024/07/22 00:04:46 DEBUG : Resolving service "s3" region "us-east-1"
2024/07/22 00:04:52 DEBUG : 니.txt: Need to transfer - File not found at Destination
2024/07/22 00:04:52 DEBUG : S3 bucket data path personal-files/Asset/test: Waiting for checks to finish
2024/07/22 00:04:52 DEBUG : S3 bucket data path personal-files/Asset/test: Waiting for transfers to finish
2024/07/22 00:04:54 ERROR : 니.txt: Failed to copy: object not found
2024/07/22 00:04:54 ERROR : Attempt 1/3 failed with 1 errors and: object not found
2024/07/22 00:04:54 DEBUG : 니.txt: Need to transfer - File not found at Destination
2024/07/22 00:04:54 DEBUG : S3 bucket data path personal-files/Asset/test: Waiting for checks to finish
2024/07/22 00:04:54 DEBUG : S3 bucket data path personal-files/Asset/test: Waiting for transfers to finish
2024/07/22 00:04:54 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43940AD6C0, host id: )
2024/07/22 00:04:54 DEBUG : pacer: Rate limited, increasing sleep to 10ms
2024/07/22 00:04:54 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43940AD6C0, host id:  - low level retry 0/10
2024/07/22 00:04:54 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D439ABAA6E6, host id: )
2024/07/22 00:04:54 DEBUG : pacer: Rate limited, increasing sleep to 20ms
2024/07/22 00:04:54 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D439ABAA6E6, host id:  - low level retry 1/10
2024/07/22 00:04:54 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43A1C77DBF, host id: )
2024/07/22 00:04:54 DEBUG : pacer: Rate limited, increasing sleep to 40ms
2024/07/22 00:04:54 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43A1C77DBF, host id:  - low level retry 2/10
2024/07/22 00:04:54 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43AB69F13F, host id: )
2024/07/22 00:04:54 DEBUG : pacer: Rate limited, increasing sleep to 80ms
2024/07/22 00:04:54 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43AB69F13F, host id:  - low level retry 3/10
2024/07/22 00:04:55 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43B2E82856, host id: )
2024/07/22 00:04:55 DEBUG : pacer: Rate limited, increasing sleep to 160ms
2024/07/22 00:04:55 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43B2E82856, host id:  - low level retry 4/10
2024/07/22 00:04:55 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43BB2FFC22, host id: )
2024/07/22 00:04:55 DEBUG : pacer: Rate limited, increasing sleep to 320ms
2024/07/22 00:04:55 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43BB2FFC22, host id:  - low level retry 5/10
2024/07/22 00:04:55 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43C44886C1, host id: )
2024/07/22 00:04:55 DEBUG : pacer: Rate limited, increasing sleep to 640ms
2024/07/22 00:04:55 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43C44886C1, host id:  - low level retry 6/10
2024/07/22 00:04:55 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43D7966200, host id: )
2024/07/22 00:04:55 DEBUG : pacer: Rate limited, increasing sleep to 1.28s
2024/07/22 00:04:55 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43D7966200, host id:  - low level retry 7/10
2024/07/22 00:04:56 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43FDD98100, host id: )
2024/07/22 00:04:56 DEBUG : pacer: Rate limited, increasing sleep to 2s
2024/07/22 00:04:56 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D43FDD98100, host id:  - low level retry 8/10
2024/07/22 00:04:57 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D444E9C0460, host id: )
2024/07/22 00:04:57 DEBUG : 니.txt: Received error: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D444E9C0460, host id:  - low level retry 9/10
2024/07/22 00:04:57 ERROR : 니.txt: Failed to copy: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D444E9C0460, host id:
2024/07/22 00:04:57 ERROR : Attempt 2/3 failed with 1 errors and: InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E46D444E9C0460, host id:

This looks like a bug to me.

Theoretically, unicode_normalization = true in the config should have the same effect as --local-unicode-normalization=true on the command line. Let me know if I should open an issue on GitHub.

Thanks.

There is no s3_unicode_normalization option available... hence it has no effect when placed in the [s3] remote section of the config file (it is ignored).

But there is a local-unicode-normalization option available for the local remote...

You could add local remote definition to your config file:

[local]
type = local
unicode_normalization = true

and then I think all will work

I tried with it added to the config file, and it works.

It seems that I have misunderstood how this flag should be specified in the config file.

Thanks for the help!

Do you know how to get this working with rclone mount?

I tried both adding the flag to the config:

[local]
type = local
unicode_normalization = true

and to the mounting command:

  rclone mount \
    --allow-other \
    --attr-timeout=1s \
    --buffer-size=64M \
    --config=~/.config/rclone/rclone.conf \
    --dir-cache-time=1200h \
    --gid=1000 \
+   --local-unicode-normalization \
    --log-level=INFO \
    --log-file=/tmp/rclone-s3.log \
    --umask=022 \
    --uid=1000 \
    --use-mmap \
    --use-server-modtime \
    --vfs-cache-max-age=24h \
    --vfs-cache-max-size=100G \
    --vfs-cache-mode=full \
    --vfs-fast-fingerprint \
    --vfs-read-chunk-size=128M \
    --vfs-read-chunk-size-limit=off \
    s3: ~/mount/s3

But when I copy the example folder to the mounted remote, the error happens:

/Users/z/Downloads/test
└── 니.txt

Here is the rclone-s3.log:

2024/07/22 17:25:28 INFO  : vfs cache: cleaned: objects 96 (was 96) in use 0, to upload 0, uploading 0, total size 90.582Gi (was 90.582Gi)
2024/07/22 17:26:28 INFO  : vfs cache: cleaned: objects 97 (was 97) in use 0, to upload 0, uploading 0, total size 90.582Gi (was 90.582Gi)
2024/07/22 17:27:28 INFO  : vfs cache: cleaned: objects 97 (was 97) in use 0, to upload 0, uploading 0, total size 90.582Gi (was 90.582Gi)
2024/07/22 17:28:06 INFO  : s3/test/니.txt: vfs cache: queuing for upload in 5s
2024/07/22 17:28:12 ERROR : s3/test/니.txt: Failed to copy: object not found
2024/07/22 17:28:12 ERROR : s3/test/니.txt: vfs cache: failed to upload try #1, will retry in 10s: vfs cache: failed to transfer file from cache to remote: object not found

Do you know how to get it working in rclone mount as well?

Updates

Note

This mount error is generated on the Arch Linux server with the following rclone version:

rclone v1.67.0
- os/version: arch (64 bit)
- os/kernel: 6.9.10-arch1-1 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.22.5
- go/linking: dynamic
- go/tags: none

Can you try nfsmount instead of mount? It works much better on macOS and, as I remember, it should normalise NFD to NFC.

I want to make it work on both my macOS and my Arch Linux server; does nfsmount work better on Linux as well?

Try the --no-unicode-normalization=false flag with your mount command.

Adding --no-unicode-normalization=false to the rclone mount command still results in the same error.

2024/07/23 05:19:55 INFO  : s3/test/니.txt: vfs cache: queuing for upload in 5s
2024/07/23 05:20:01 ERROR : s3/test/니.txt: Failed to copy: object not found
2024/07/23 05:20:01 ERROR : s3/test/니.txt: vfs cache: failed to upload try #1, will retry in 10s: vfs cache: failed to transfer file from cache to remote: object not found

Does this mean the mount command does not honor the --local-unicode-normalization=true flag at all? Is this expected?

--local-unicode-normalization is a local-remote-specific flag and does nothing when you mount an S3 remote.

--no-unicode-normalization is global flag and should work. @nielash I think you added this flag? Should it normalise NFD to NFC when set to false?

At least for macOS, what should definitely work to convert any NFD to NFC is to use FUSE-T and then pass the -o nfc option in your mount command:

rclone mount remote: mountpoint -o nfc 

What surprises me is that your S3 remote does not support NFD. I have just tested with an S3 I have access to, and there was no issue at all saving a file named 니.txt. What is your S3 provider?

It should do, because the rclone mount command I use has the flag --config=~/.config/rclone/rclone.conf, and that config declares the [local] remote with unicode_normalization = true.

Shouldn't a file operation like copy/move from the local file system to the VFS honor the local settings?


What surprises me is that --no-unicode-normalization=false not only doesn't work with the mount, but also doesn't work with normal operations like copy/sync:

rclone \
      copy -Pvv \
      --no-unicode-normalization=false \
      ~/data/test \
      s3:data/personal-files/test
2024/07/23 13:14:49 DEBUG : rclone: Version "v1.67.0" starting with parameters ["rclone" "copy" "-Pvv" "--no-unicode-normalization=false" "/home/z/data/test/니.txt" "s3:data/personal-files/test"]
2024/07/23 13:14:49 DEBUG : Creating backend with remote "/home/z/data/test/니.txt"
2024/07/23 13:14:49 DEBUG : Using config file from "/home/z/.config/rclone/rclone.conf"
2024/07/23 13:14:49 DEBUG : fs cache: adding new entry for parent of "/home/z/data/test/니.txt", "/home/z/data/test"
2024/07/23 13:14:49 DEBUG : Creating backend with remote "s3:data/personal-files/test"
2024/07/23 13:14:49 DEBUG : Resolving service "s3" region "us-east-1"
2024/07/23 13:14:50 DEBUG : 니.txt: Need to transfer - File not found at Destination
2024/07/23 13:14:51 INFO  : S3 bucket data path personal-files/test: Bucket "data" created with ACL "bucket-owner-full-control"
2024/07/23 13:14:52 ERROR : 니.txt: Failed to copy: object not found
2024/07/23 13:14:52 ERROR : Attempt 1/3 failed with 1 errors and: object not found
2024/07/23 13:14:52 DEBUG : 니.txt: Need to transfer - File not found at Destination
2024/07/23 13:14:52 DEBUG : pacer: low level retry 1/1 (error InternalError: We encountered an internal error, please try again.: cause({"Id":"pydio.grpc.data.index.personal","Code":409,"Detail":"Node path already in use","Status":"Conflict"})
        status code: 500, request id: 17E4D34EB6F249D3, host id: )

So is the flag not working at all?


Quotaless

I did some more tests.

So it turns out the --local-unicode-normalization only works on macOS.

When unicode_normalization = true is added to rclone.conf, or the --local-unicode-normalization=true flag is specified while running the rclone copy command on Linux, the NFD-to-NFC normalization does not happen.

However, the global flag --no-unicode-normalization=false does not work on either macOS or Linux when performing rclone copy, let alone rclone mount. It seems totally ineffective.

--no-unicode-normalization=false is the default, so it is expected that adding this flag would not change anything.

On the other hand, --no-unicode-normalization=true would be a change (and would disable the unicode normalization).

What is sometimes confusing about this flag is that it defaults to a double-negative (false of something that is already no...) In other words, rclone DOES normalize by default, and if you want to disable this normalization, you need to add --no-unicode-normalization.
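The double negative can be sketched as follows (hypothetical pseudologic for illustration, not rclone's actual source):

```python
def normalizes_for_comparison(no_unicode_normalization: bool = False) -> bool:
    # The flag defaults to false, so rclone DOES normalize when comparing names;
    # passing the flag (true) disables that normalization.
    return not no_unicode_normalization

print(normalizes_for_comparison())      # True  (default behaviour)
print(normalizes_for_comparison(True))  # False (--no-unicode-normalization set)
```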

Hope that's helpful.

The issue for me is that rclone does not do the normalization.

If I run rclone copy <dir-contains-NFD-named-files> remote:, it throws errors.

However if I run the conversion command first:

convmv \
  -f UTF-8 \
  -t UTF-8 \
  --nfc -r <dir-contains-NFD-named-files> \
  --notest

Then run the same rclone copy command that throws errors, it works fine without errors.

An alternate explanation here is that your remote: just does not allow NFD filenames.

When rclone normalizes during copy, it is only for the purpose of comparing the existing names on each side to determine if two files should be considered the same (despite their different names.) It does not proactively convert the filenames when there is otherwise no need to (for example, when the destination directory is empty).
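That comparison-only behaviour could be sketched like this (a simplified illustration of the idea, not rclone's actual code):

```python
import unicodedata

def names_match(src: str, dst: str) -> bool:
    # Normalize both sides to a common form purely for the comparison;
    # the name that is actually stored or transferred is left untouched.
    return unicodedata.normalize("NFC", src) == unicodedata.normalize("NFC", dst)

nfd = unicodedata.normalize("NFD", "니.txt")  # name as macOS stores it
nfc = unicodedata.normalize("NFC", "니.txt")  # name as many remotes store it

print(nfd == nfc)             # False -- the raw strings differ
print(names_match(nfd, nfc))  # True  -- treated as the same object when syncing
```

So an NFD source name is still uploaded in NFD form, which is exactly where a backend that rejects NFD names will fail.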


And the S3 storage I know and use does not care whether names are NFD or NFC. So in this case we have some oddball designed to be different :) and it causes all sorts of trouble.
