Error copying some files to Azure Blob

What is the problem you are having with rclone?

I have some files that fail to upload to Azure (my best guess is that their filenames include some strange characters that Azure dislikes). The copy command produces an Azure error, so I was wondering whether there is something else I could try to get them copied, rather than renaming them at the source. Something with encoding, perhaps?

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.0

  • os/version: ubuntu 20.04 (64 bit)
  • os/kernel: 5.15.0-1014-azure (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.18.3
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Local to Azure Blob

The command you were trying to run (eg rclone copy /tmp remote:tmp)

/usr/bin/rclone --low-level-retries=1 --retries=1 --progress --checkers=1 --transfers=1 --log-file=logfile2.txt --no-traverse --files-from=retry-files2.txt --log-level=DEBUG copy . az-blobteststdarcglrs202101:test-test

The rclone config contents with secrets removed.

[az-blobteststdarcglrs202101]
type = azureblob
account = <redacted>
key = <redacted>
access_tier = Archive
archive_tier_delete = true
chunk_size = 100M

A log from the command with the -vv flag

2022/07/27 08:51:06 DEBUG : rclone: Version "v1.59.0" starting with parameters ["/usr/bin/rclone" "--low-level-retries=1" "--retries=1" "--progress" "--checkers=1" "--transfers=1" "--log-file=logfile2.txt" "--no-traverse" "--files-from=retry-files2.txt" "--log-level=DEBUG" "copy" "." "az-blobteststdarcglrs202101:test-test"]
2022/07/27 08:51:06 DEBUG : Creating backend with remote "."
2022/07/27 08:51:06 DEBUG : Using config file from "/home/dbsyncuser/.config/rclone/rclone.conf"
2022/07/27 08:51:06 DEBUG : fs cache: renaming cache item "." to be canonical "/home/dbsyncuser/testfiles"
2022/07/27 08:51:06 DEBUG : Creating backend with remote "az-blobteststdarcglrs202101:test-test"
2022/07/27 08:51:06 DEBUG : Azure container test-test: Waiting for checks to finish
2022/07/27 08:51:06 DEBUG : Azure container test-test: Waiting for transfers to finish
2022/07/27 08:51:06 ERROR : N<8e>R TROLLFAMILJEN KOM P<8f> MIDDAG Minisaga.doc: Failed to copy: write error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=failed to unmarshal response body, Details: (none)
   PUT https://blobteststdarcglrs202101.blob.core.windows.net/test-test/N%C2%8ER%20TROLLFAMILJEN%20KOM%20P%C2%8F%20MIDDAG%20Minisaga.doc?blockid=Dv5qqvCvSQ21WLXngnt6OQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D%3D&comp=block&timeout=31536001
   Authorization: REDACTED
   Content-Length: [37376]
   User-Agent: [rclone/v1.59.0]
   X-Ms-Client-Request-Id: [45c935d9-06cd-4258-69ea-02971393815a]
   X-Ms-Date: [Wed, 27 Jul 2022 08:51:06 GMT]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Content-Length: [324]
   Content-Type: [text/html; charset=us-ascii]
   Date: [Wed, 27 Jul 2022 08:51:06 GMT]
   Server: [Microsoft-HTTPAPI/2.0]


xml: (*azblob.storageError).UnmarshalXML did not consume entire <HTML> element
2022/07/27 08:51:06 ERROR : Attempt 1/1 failed with 1 errors and: write error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=failed to unmarshal response body, Details: (none)
   PUT https://blobteststdarcglrs202101.blob.core.windows.net/test-test/N%C2%8ER%20TROLLFAMILJEN%20KOM%20P%C2%8F%20MIDDAG%20Minisaga.doc?blockid=Dv5qqvCvSQ21WLXngnt6OQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D%3D&comp=block&timeout=31536001
   Authorization: REDACTED
   Content-Length: [37376]
   User-Agent: [rclone/v1.59.0]
   X-Ms-Client-Request-Id: [45c935d9-06cd-4258-69ea-02971393815a]
   X-Ms-Date: [Wed, 27 Jul 2022 08:51:06 GMT]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Content-Length: [324]
   Content-Type: [text/html; charset=us-ascii]
   Date: [Wed, 27 Jul 2022 08:51:06 GMT]
   Server: [Microsoft-HTTPAPI/2.0]


xml: (*azblob.storageError).UnmarshalXML did not consume entire <HTML> element
2022/07/27 08:51:06 INFO  :
Transferred:       36.500 KiB / 36.500 KiB, 100%, 0 B/s, ETA -
Errors:                 1 (retrying may help)
Elapsed time:         0.1s

2022/07/27 08:51:06 DEBUG : 3 go routines active
2022/07/27 08:51:06 Failed to copy: write error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=failed to unmarshal response body, Details: (none)
   PUT https://blobteststdarcglrs202101.blob.core.windows.net/test-test/N%C2%8ER%20TROLLFAMILJEN%20KOM%20P%C2%8F%20MIDDAG%20Minisaga.doc?blockid=Dv5qqvCvSQ21WLXngnt6OQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D%3D&comp=block&timeout=31536001
   Authorization: REDACTED
   Content-Length: [37376]
   User-Agent: [rclone/v1.59.0]
   X-Ms-Client-Request-Id: [45c935d9-06cd-4258-69ea-02971393815a]
   X-Ms-Date: [Wed, 27 Jul 2022 08:51:06 GMT]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Content-Length: [324]
   Content-Type: [text/html; charset=us-ascii]
   Date: [Wed, 27 Jul 2022 08:51:06 GMT]
   Server: [Microsoft-HTTPAPI/2.0]


xml: (*azblob.storageError).UnmarshalXML did not consume entire <HTML> element
2022/07/27 08:58:53 DEBUG : rclone: Version "v1.59.0" starting with parameters ["/usr/bin/rclone" "--low-level-retries=1" "--retries=1" "--progress" "--checkers=1" "--transfers=1" "--log-file=logfile2.txt" "--no-traverse" "--files-from=retry-files2.txt" "--log-level=DEBUG" "copy" "." "az-blobteststdarcglrs202101:test-test"]
2022/07/27 08:58:53 DEBUG : Creating backend with remote "."
2022/07/27 08:58:53 DEBUG : Using config file from "/home/dbsyncuser/.config/rclone/rclone.conf"
2022/07/27 08:58:53 DEBUG : fs cache: renaming cache item "." to be canonical "/home/dbsyncuser/testfiles"
2022/07/27 08:58:53 DEBUG : Creating backend with remote "az-blobteststdarcglrs202101:test-test"
2022/07/27 08:58:53 DEBUG : Azure container test-test: Waiting for checks to finish
2022/07/27 08:58:53 DEBUG : Azure container test-test: Waiting for transfers to finish
2022/07/27 08:58:53 ERROR : N<8e>R TROLLFAMILJEN KOM P<8f> MIDDAG Minisaga.doc: Failed to copy: write error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=failed to unmarshal response body, Details: (none)
   PUT https://blobteststdarcglrs202101.blob.core.windows.net/test-test/N%C2%8ER%20TROLLFAMILJEN%20KOM%20P%C2%8F%20MIDDAG%20Minisaga.doc?blockid=qRgoutMRQaeoRB7VI%2BhBOgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D%3D&comp=block&timeout=31536001
   Authorization: REDACTED
   Content-Length: [37376]
   User-Agent: [rclone/v1.59.0]
   X-Ms-Client-Request-Id: [4b0fae78-a1b9-4ca0-4120-4d754727992a]
   X-Ms-Date: [Wed, 27 Jul 2022 08:58:53 GMT]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Content-Length: [324]
   Content-Type: [text/html; charset=us-ascii]
   Date: [Wed, 27 Jul 2022 08:58:53 GMT]
   Server: [Microsoft-HTTPAPI/2.0]


xml: (*azblob.storageError).UnmarshalXML did not consume entire <HTML> element
2022/07/27 08:58:53 ERROR : Attempt 1/1 failed with 1 errors and: write error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=failed to unmarshal response body, Details: (none)
   PUT https://blobteststdarcglrs202101.blob.core.windows.net/test-test/N%C2%8ER%20TROLLFAMILJEN%20KOM%20P%C2%8F%20MIDDAG%20Minisaga.doc?blockid=qRgoutMRQaeoRB7VI%2BhBOgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D%3D&comp=block&timeout=31536001
   Authorization: REDACTED
   Content-Length: [37376]
   User-Agent: [rclone/v1.59.0]
   X-Ms-Client-Request-Id: [4b0fae78-a1b9-4ca0-4120-4d754727992a]
   X-Ms-Date: [Wed, 27 Jul 2022 08:58:53 GMT]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Content-Length: [324]
   Content-Type: [text/html; charset=us-ascii]
   Date: [Wed, 27 Jul 2022 08:58:53 GMT]
   Server: [Microsoft-HTTPAPI/2.0]


xml: (*azblob.storageError).UnmarshalXML did not consume entire <HTML> element
2022/07/27 08:58:53 INFO  :
Transferred:       36.500 KiB / 36.500 KiB, 100%, 0 B/s, ETA -
Errors:                 1 (retrying may help)
Elapsed time:         0.1s

2022/07/27 08:58:53 DEBUG : 3 go routines active
2022/07/27 08:58:53 Failed to copy: write error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=) =====
Description=failed to unmarshal response body, Details: (none)
   PUT https://blobteststdarcglrs202101.blob.core.windows.net/test-test/N%C2%8ER%20TROLLFAMILJEN%20KOM%20P%C2%8F%20MIDDAG%20Minisaga.doc?blockid=qRgoutMRQaeoRB7VI%2BhBOgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%3D%3D&comp=block&timeout=31536001
   Authorization: REDACTED
   Content-Length: [37376]
   User-Agent: [rclone/v1.59.0]
   X-Ms-Client-Request-Id: [4b0fae78-a1b9-4ca0-4120-4d754727992a]
   X-Ms-Date: [Wed, 27 Jul 2022 08:58:53 GMT]
   X-Ms-Version: [2020-10-02]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 Bad Request
   Content-Length: [324]
   Content-Type: [text/html; charset=us-ascii]
   Date: [Wed, 27 Jul 2022 08:58:53 GMT]
   Server: [Microsoft-HTTPAPI/2.0]


xml: (*azblob.storageError).UnmarshalXML did not consume entire <HTML> element


adding...
Listing the file with "ls -la" locally gives this:

-rw-rw-r-- 1 dbsyncuser dbsyncuser 37376 Jul  9  2021 'N'$'\302\216''R TROLLFAMILJEN KOM P'$'\302\217'' MIDDAG Minisaga.doc'

and this is how the entry in "retry-files2.txt" looks in vi:

N<8e>R TROLLFAMILJEN KOM P<8f> MIDDAG Minisaga.doc

It looks like your file names are not encoded in UTF-8 but in some other encoding, maybe ISO-8859-1 or a Windows code page.

Reading "Naming and Referencing Containers, Blobs, and Metadata" (Azure Storage, Microsoft Docs), I think it is likely that those characters can't be stored as-is on Azure Blob.

What I would do is use the convmv tool to convert the encoding into UTF-8 on the disk.

You could also wrap the azure blob in a crypt backend which will fix the problem in a different way.

I don't think you can use rclone's encoding scheme to fix this unfortunately :frowning:

Thanks for the input @ncw !

I checked with convmv, and the file does actually seem to be UTF-8 already...

convmv -i -f iso-8859-1 -t UTF-8 -r .
Starting a dry run without changes...
Skipping, already UTF-8: ./NR TROLLFAMILJEN KOM P MIDDAG Minisaga.doc
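
For anyone wanting to double-check convmv's verdict, here is a minimal Python sketch. The bytes are copied from the octal escapes in the "ls -la" listing above (\302\216 and \302\217); the variable names are just illustrative, not anything rclone uses.

```python
from urllib.parse import quote

# Raw filename bytes as shown by `ls -la`:
# octal \302\216 is 0xC2 0x8E, octal \302\217 is 0xC2 0x8F.
raw_name = b"N\302\216R TROLLFAMILJEN KOM P\302\217 MIDDAG Minisaga.doc"

# A strict decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
name = raw_name.decode("utf-8")

# Each two-byte sequence decodes to a single non-ASCII code point.
code_points = [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 0x7F]

# Percent-encoding the UTF-8 bytes gives the same path as the PUT request in the log.
url_path = quote(name)

print(code_points)
print(url_path)
```

So the name is well-formed UTF-8, and the %C2%8E and %C2%8F in the failing request are exactly these two code points.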

I then tried converting it from UTF-8 to iso-8859-1 as follows:

convmv --notest -i -f UTF-8 -t iso-8859-1 -r .
mv "./NR TROLLFAMILJEN KOM P MIDDAG Minisaga.doc"       "./N�R TROLLFAMILJEN KOM P� MIDDAG Minisaga.doc" (y/n) y

Ready! I converted 1 files in 1 seconds.

The file then looks almost identical in an "ls -la" listing (only the \302 bytes are gone):

ls -la
-rw-rw-r-- 1 dbsyncuser dbsyncuser 37376 May 11  2015 'N'$'\216''R TROLLFAMILJEN KOM P'$'\217'' MIDDAG Minisaga.doc'

But surprisingly, rclone now seems to be able to detect and modify it automatically so that Azure accepts it:

2022/07/28 10:20:56 DEBUG : rclone: Version "v1.59.0" starting with parameters ["/usr/bin/rclone" "--low-level-retries=0" "--retries=1" "--progress" "--checkers=1" "--transfers=1" "--log-file=logfile.txt" "--no-traverse" "--log-level=DEBUG" "--exclude" "logfile.txt" "copy" "." "az-blobteststdarcglrs202101:test-test"]
2022/07/28 10:20:56 DEBUG : Creating backend with remote "."
2022/07/28 10:20:56 DEBUG : Using config file from "/home/dbsyncuser/.config/rclone/rclone.conf"
2022/07/28 10:20:56 DEBUG : fs cache: renaming cache item "." to be canonical "/home/dbsyncuser/testfiles/newtest-20220727"
2022/07/28 10:20:56 DEBUG : Creating backend with remote "az-blobteststdarcglrs202101:test-test"
2022/07/28 10:20:56 NOTICE: Local file system at /home/dbsyncuser/testfiles/newtest-20220727: Replacing invalid UTF-8 characters in "N\x8eR TROLLFAMILJEN KOM P\x8f MIDDAG Minisaga.doc"
2022/07/28 10:20:56 DEBUG : logfile.txt: Excluded
2022/07/28 10:20:56 DEBUG : Azure container test-test: Waiting for checks to finish
2022/07/28 10:20:56 DEBUG : Azure container test-test: Waiting for transfers to finish
2022/07/28 10:20:56 DEBUG : N�R TROLLFAMILJEN KOM P� MIDDAG Minisaga.doc: md5 = 03d0d558ce04b555480a5cd4b4e99358 OK
2022/07/28 10:20:56 INFO  : N�R TROLLFAMILJEN KOM P� MIDDAG Minisaga.doc: Copied (new)
2022/07/28 10:20:56 INFO  :

So it looks as if it is a UTF-8 encoded filename, but still with illegal characters of some sort?

All this "encoding stuff" is not my cup of tea at all, but I know these files have been migrated many times over the years with different tools, and have resided on many different platforms, including AFS, Samba, CIFS, USB drives, Dropbox, FTP etc.

All in all I have a few hundred files that behave like this, so perhaps I should just handle them manually and rename them...

OK, that is good to know - I wasn't 100% sure!

I think what is happening is that Azure Blob is refusing the Unicode characters with code points U+008E and U+008F. These are C1 control characters, so it seems reasonable not to allow them.

I think the names were probably originally in something like Windows code page 850 and got translated incorrectly to UTF-8. In code page 850 the bytes 0x8E and 0x8F are the characters Ä and Å respectively, so the file name should be
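
As a rough check of the "control character" part, here is a small Python sketch using the standard unicodedata module. It only shows how Unicode classifies these code points; which characters Azure actually rejects is not something this confirms. The helper name control_chars is made up for the example.

```python
import unicodedata

def control_chars(name: str) -> list[str]:
    """Return the code points in name that Unicode classifies as control characters (Cc)."""
    return [f"U+{ord(ch):04X}" for ch in name if unicodedata.category(ch) == "Cc"]

# The name as decoded from the UTF-8 bytes on disk.
broken = "N\x8eR TROLLFAMILJEN KOM P\x8f MIDDAG Minisaga.doc"

print(control_chars(broken))
print(control_chars("plain name.doc"))
```

Both U+008E and U+008F come back with category Cc, the same category as tab or newline, which is why they are invisible in most listings.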

NÄR TROLLFAMILJEN KOM PÅ MIDDAG Minisaga.doc

which I think is probably correct, as it translates from Swedish as "When the troll family came to dinner" according to Google Translate.
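
The guess above can be tried out with a short Python sketch. It assumes the original encoding really was code page 850 (that is only my guess) and that the wrong conversion treated each byte as ISO-8859-1; repair_name is a hypothetical helper, not an rclone or convmv feature.

```python
def repair_name(name: str, original_codepage: str = "cp850") -> str:
    """Undo a wrong 'ISO-8859-1 to UTF-8' conversion of a name that was really in original_codepage.

    Raises UnicodeEncodeError if the name contains characters above U+00FF,
    which means it was not damaged in the way assumed here.
    """
    # Step 1: map each code point back to the single byte it was created from.
    raw = name.encode("latin-1")
    # Step 2: decode those bytes with the encoding they were originally written in.
    return raw.decode(original_codepage)

broken = "N\x8eR TROLLFAMILJEN KOM P\x8f MIDDAG Minisaga.doc"
print(repair_name(broken))
```

If the output reads as sensible Swedish, the code page guess is probably right; if it shows other odd letters, try a different code page before renaming anything.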

Converting the names to iso-8859-1 leaves bytes on disk that are not valid UTF-8, which causes rclone's invalid-UTF-8 replacement mechanism to kick in. That is what the � symbols in the log are. I don't recommend doing this.
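
To illustrate why the ISO-8859-1 version of the name triggers the "Replacing invalid UTF-8 characters" notice, here is a sketch that uses Python's own decoder as a stand-in. This is not rclone's code, and rclone's replacement logic may differ in detail; is_valid_utf8 is a name invented for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Filename bytes after `convmv -f UTF-8 -t iso-8859-1` (octal \216 and \217 in the ls output).
latin1_bytes = b"N\x8eR TROLLFAMILJEN KOM P\x8f MIDDAG Minisaga.doc"

print(is_valid_utf8(latin1_bytes))

# A lenient decode swaps each undecodable byte for U+FFFD, the � replacement character.
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced)
```

The upload then succeeds only because the original characters have been thrown away, so the name on Azure no longer says what the characters were.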

I have spent a lot of time on this in the past!

If you can rename the files to the correct UTF-8, future data archaeologists will thank you :slight_smile:

Hopefully the Windows code page will give you an idea as to what the characters should be. Viewing the file names in vi seems like the best way to see what the characters really are.


I ended up just removing these unreadable characters (mostly <8b>, <81>, <8f>, and <8e>) from the filenames manually at the source, and then they copied fine into Azure Blob.
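
In case it helps someone with the same problem, here is a minimal sketch for listing the affected files before renaming them. It assumes the names are valid UTF-8 containing characters in the range U+0080 to U+009F, as in this thread; both function names are made up for the example.

```python
import os
from typing import Iterator

def has_c1_control(name: str) -> bool:
    """Return True if name contains a code point in the C1 control range U+0080..U+009F."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in name)

def find_suspect_names(root: str) -> Iterator[str]:
    """Yield paths under root whose final component contains a C1 control character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_c1_control(name):
                yield os.path.join(dirpath, name)

# Example use (read-only, nothing is renamed):
#   for path in find_suspect_names("."):
#       print(ascii(path))  # ascii() shows the invisible characters as \x8e etc.
```

Names that are not valid UTF-8 at all show up in Python as surrogate escapes rather than C1 characters, so this check will not flag them.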

Thanks for the help!
