Bug with filenames on Amazon S3 / S3-compatible remotes: all operations failing. Crypt related?

This is a copy from GitHub issue #3345 to discuss.

What is the problem you are having with rclone?

After unmounting the crypt remote "encloud" with "fusermount -u", I manually cleared the VFS cache folder and remounted with the --cache-db-purge option. After that I was unable to list the directory or perform any other action on that remote, even when I mounted the underlying remote directly (bypassing crypt) to see the encrypted files. The object storage crashed. Scaleway support had to apply a hotfix for this bug on their end and explained the issue comprehensively.

What is your rclone version (output from rclone version)

rclone v1.48.0

  • os/arch: linux/amd64
  • go version: go1.12.6

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 18.04.2 LTS

Which cloud storage system are you using? (eg Google Drive)

S3 - Scaleway

The command you were trying to run (eg rclone copy /tmp remote:tmp)

Mount

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Error logged by rclone before the hotfix at Scaleway:
2019/07/09 17:34:17 ERROR : /: Dir.Stat error: InternalError: We encountered an internal error. Please try again. status code: 500, request id: tx795027a68d2d416f8d704-005d24b3f8, host id: tx795027a68d2d416f8d704-005d24b3f8

Error logged by rclone after the hotfix at Scaleway:
2019/07/11 19:55:10 ERROR : /: Dir.Stat error: SerializationError: failed to decode REST XML response status code: 200, request id: txc7bdb342b42a43faa97f8-005d2777fd caused by: XML syntax error on line 2: illegal character code U+0001


Here is the rclone config:

[cloud]
type = s3
provider = Other
access_key_id = ***
secret_access_key = ***
region = nl-ams
endpoint = s3.nl-ams.scw.cloud
acl = private
bucket_acl = private
upload_cutoff = 256M
chunk_size = 256M
upload_concurrency = 8
force_path_style = false

[encloud]
type = crypt
remote = cloud:/album/cloud
filename_encryption = standard
directory_name_encryption = true
password = ***
password2 = ***

Scaleway team's response to the issue:

Hello,

First of all, thank you for your report, we did have an encoding bug on our
end, which is now fixed. The problem you are now having is on the rclone's
side, and there is nothing we can do about it.

Rclone did upload object with control characters in the names. For example:

data/-position-left-05

Which is:

00000000  64 61 74 61 2f 05 2d 70  6f 73 69 74 69 6f 6e 2d  |data/.-position-|
00000010  6c 65 66 74 2d 30 35 0a                           |left-05.|

Notice the 0x05 just before the '-'.

Now, the first time you reported the bug to us, the gateway was crashing on
listing, because our underlying encoding library was panicking on characters
such as this one. We now have the same behaviour as Amazon, which is
introducing the characters in the XML: "&#x05".

However, in this specific case, the 1.0 XML parsers are not compatible with
such characters, that is why rclone is failing on mount (The same bug is
happening to our console right now, which is why you cannot list via the web
interface).

Amazon specifies the ?encoding-type=url parameter for such cases.
From Amazon's documentation[1]:

Param: encoding-type
Description: Requests Amazon S3 to encode the response and specifies the
encoding method to use.

An object key can contain any Unicode character. However, XML 1.0 parsers
cannot parse some characters, such as characters with an ASCII value from 0 to
10. For characters that are not supported in XML 1.0, you can add this
parameter to request that Amazon S3 encode the keys in the response.

Type: String
Default: None
Valid value: url

We did test on the Amazon S3 gateway, with those objects names, and rclone
fails the same way. This is why I believe the rclone s3 crypt implementation is
broken in this way, since it uploads special object names, without specifying
the encoding type to url on listing.

In the meantime, I suggest you use another tool to access your data. Another
workaround will be to list your files with aws s3 ls (which is using the
encoding-type=url) and spot the files that have control characters in their
names. You can attempt to delete them in order to be able to mount your bucket
with rclone.

In the hope I have answered all your questions,

[1] https://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html

Cordialement / Best regards,

Pierre-Antoine PAGANELLI

Customer Success Specialist Advanced
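
To make the parser problem described in the reply concrete, here is a minimal sketch in Python using only the standard library. It is not rclone's code and not a real S3 response: `listing_xml`, `parse_key` and the simplified `ListBucketResult` layout are names and shapes I made up for illustration. The only things taken from the reply are the example key with the 0x05 byte and the idea that `encoding-type=url` makes the server percent-encode keys in the listing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

# The object name from the hexdump: "data/" + 0x05 + "-position-left-05"
KEY = "data/\x05-position-left-05"


def listing_xml(key, url_encode):
    """Build a toy listing document for one key (hypothetical, simplified layout)."""
    if url_encode:
        # What encoding-type=url asks for: keys are percent-encoded in the response.
        body = quote(key, safe="/")
        encoding = "<EncodingType>url</EncodingType>"
    else:
        # Control characters emitted as numeric character references, e.g. "&#x05;".
        body = "".join(f"&#x{ord(c):02x};" if ord(c) < 0x20 else c for c in key)
        encoding = ""
    return (
        f"<ListBucketResult>{encoding}"
        f"<Contents><Key>{body}</Key></Contents>"
        f"</ListBucketResult>"
    )


def parse_key(xml_text):
    """Parse the toy listing and return the key, decoding it if it was URL-encoded."""
    root = ET.fromstring(xml_text)
    key = root.find("Contents/Key").text
    if root.findtext("EncodingType") == "url":
        key = unquote(key)
    return key


if __name__ == "__main__":
    try:
        parse_key(listing_xml(KEY, url_encode=False))
    except ET.ParseError as err:
        # An XML 1.0 parser rejects a reference to U+0005.
        print("plain listing failed:", err)

    # With URL encoding the document is valid XML and the key round-trips.
    print(repr(parse_key(listing_xml(KEY, url_encode=True))))
```

The plain variant fails in the XML parser before any key is returned, which matches the kind of "XML syntax error ... illegal character code" message in the log above. The URL-encoded variant parses because `%05` is ordinary text as far as XML is concerned.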

Scaleway team replicated the same situation on Amazon S3 and got the same error from rclone.
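
For the workaround suggested in the reply (list the files with another tool and spot the names containing control characters), a small helper like the following could do the spotting. This is only a sketch: `control_chars` and `suspect_keys` are hypothetical names of mine, and it assumes the keys have already been obtained and decoded by whatever tool did the listing.

```python
import unicodedata


def control_chars(key):
    """Return (index, 'U+XXXX') for every control character (category Cc) in key."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(key)
        if unicodedata.category(c) == "Cc"
    ]


def suspect_keys(keys):
    """Map each key that contains control characters to their positions."""
    found = {}
    for key in keys:
        hits = control_chars(key)
        if hits:
            found[key] = hits
    return found


if __name__ == "__main__":
    # Example input: one clean key and the key from the support reply.
    keys = ["data/position-left-05", "data/\x05-position-left-05"]
    for key, hits in suspect_keys(keys).items():
        print(repr(key), hits)
```

Printing with `repr` matters here, because a terminal would otherwise show the two names as nearly identical.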

This has caused a lot of trouble for us because this setup is running in a production environment.
So far I have not tried to interact with the storage directly through Scaleway's native CLI; I just restored the backup to a new bucket.

This doesn't look production ready.
Any thoughts?

If you already have a bug open on GitHub, there is no need to duplicate the conversation here; please use that issue for anything related to it.