Problems with Content-Disposition and Backblaze B2 (using S3)

What is the problem you are having with rclone?

We are trying to copy a large number of objects, including their metadata, from a MinIO instance to Backblaze B2. One of the metadata attributes we store with the objects is X-Amz-Meta-Content-Disposition.

It turns out that when that metadata attribute is set, rclone sets the Content-Disposition header to the value of X-Amz-Meta-Content-Disposition. This causes problems for us because we have values like this one:

X-Amz-Meta-Content-Disposition : attachment; filename="3 =?utf-8?q?Interaci=C3=B3n=2Epdf=22?=

which will cause this error from the B2 backend:

2022/09/28 20:33:16 ERROR : ddcce9dc0f5d4ab2a7e0ab2900b01dca: Failed to copy: InvalidRequest: invalid b2-content-disposition: hit end of string while looking for closing quote starting from 21
        status code: 400, request id: 5394e15f7ac106ae, host id: aMLNlZWTDY7xkDmJWM9k5SjTpZipmvjhN
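To find which objects would trigger this error before copying, the check can be approximated locally. The following is a minimal sketch in Python, not B2's actual validator: the function name `find_unclosed_quote` is hypothetical, and the assumption is that B2 rejects any value where a double quote opens a quoted string that never closes. For the value shown above it returns 21, which matches the position in the error message.

```python
def find_unclosed_quote(value):
    """Return the index of a double quote that opens a quoted string
    which is never closed, or None if every quoted string is closed."""
    i = 0
    n = len(value)
    while i < n:
        if value[i] == '"':
            start = i
            i += 1
            # Scan for the closing quote of this quoted string.
            while i < n and value[i] != '"':
                # Inside a quoted string a backslash escapes the next character.
                i += 2 if value[i] == "\\" else 1
            if i >= n:
                return start  # ran off the end: the quote at `start` is unclosed
        i += 1
    return None


# The problematic value from this thread (assumed to be passed through unchanged).
bad = 'attachment; filename="3 =?utf-8?q?Interaci=C3=B3n=2Epdf=22?='
print(find_unclosed_quote(bad))
```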

I think that with the S3 protocol, setting Content-Disposition on PUT (when creating an object) has the advantage that the same "filename=" is returned automatically on a later GET, but I am not 100% sure of that.

Regardless of the "correctness" of the X-Amz-Meta-Content-Disposition values we have, we need to keep them as stored because we simply cannot change tens of thousands of objects.

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.1
- os/version: centos 7.4.1708 (64 bit)
- os/kernel: 5.11.22-5-pve (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.5
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Source is MinIO. Target is Backblaze B2 (configured with type s3 in rclone's config).

The command you were trying to run (eg rclone copy /tmp remote:tmp)

$ rclone --config=/etc/rclone.conf -M -v copy myminio:attachments/00/aa/ffc334afecd backblaze:attachments/00/aa/ffc334afecd

The rclone config contents with secrets removed.

[backblaze]
type = s3
provider = Other
env_auth = false
access_key_id = huehuehuehue
secret_access_key = secretpass
endpoint = s3.eu-central-003.backblazeb2.com
sse_customer_algorithm=AES256
sse_customer_key=mylongandsecurekey
#sse_customer_key_md5=

A log from the command with the -vv flag

Not sure which part of the log I should paste here, can you please specify?

I don't think rclone can do anything about B2 not accepting Content-Disposition with no closing quote.

So you can delete the metadata:

  • You can use --metadata-set to set the content disposition to something else, maybe an empty string: --metadata-set content-disposition= (this should work with current rclone)
  • Ideally you'd want a --metadata-delete flag which I intend to implement at some point.

You could also attempt to fix the metadata. I could conceive of a filter where rclone pipes metadata through your program which has an opportunity to fix it.

Would setting --metadata-set content-disposition=... cause the metadata attribute X-Amz-Meta-Content-Disposition to be lost? We cannot afford to lose any metadata attribute, or our application would not work as expected :frowning:

You are right that rclone is not responsible for what the B2 server accepts or refuses (and the validation is probably legitimate), at least to some degree. It is also true that rclone can't do anything about our "invalid" string being stored in the metadata attribute. The question is: why is rclone setting Content-Disposition based on the value of X-Amz-Meta-Content-Disposition?

As for fixing the metadata, even if it would be desirable at some point, it is outside the scope of the massive copy we are doing right now, which covers tens of thousands of files at least.

How would that metadata piping work? Is that something that would have to be implemented in rclone?

Many thanks!

Yes it would. So that isn't a solution.

Ah, I missed that detail.

It is because rclone takes all the metadata from the source object, both the X-Amz-Meta-* attributes and headers such as Content-Type and Content-Disposition, and puts it into a single Metadata object. If you want to see this, run:

$ rclone lsjson -M --stat s3:rclone/test.txt
{
	"Path": "test.txt",
	"Name": "test.txt",
	"Size": 6,
	"MimeType": "text/plain; charset=utf-8",
	"ModTime": "2022-03-04T09:15:07.446367359Z",
	"IsDir": false,
	"Tier": "STANDARD",
	"Metadata": {
		"btime": "2022-03-04T09:15:09Z",
		"content-type": "text/plain; charset=utf-8",
		"mtime": "2022-03-04T09:15:07.446367359Z"
	}
}

For an object that has it, you'll see that X-Amz-Meta-Content-Disposition becomes just content-disposition in the Metadata object.

When rclone sets the metadata on the destination, it reads the keys from that object. The ones it understands, like Content-Type and Content-Disposition, it sets as attributes on the object; the others it sets as X-Amz-Meta-*. This makes metadata transfer between different backend types possible.

However, it isn't a perfect solution because the "system" metadata and the "user" metadata get put in the same object. This is deliberate so that the metadata gets transferred properly between different systems. It does mean, though, that if you have "user" metadata (X-Amz-Meta-Content-Disposition) with the same name as "system" metadata (Content-Disposition), then things like what you are seeing can happen.

Is all the metadata you want to preserve X-Amz-Meta-* or do you want to preserve other things like Content-Type?

It would be easy to make a flag to disable the reading / setting of system metadata on S3. This would mean the X-Amz-Meta-* get transferred only.
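The collision and the effect of such a flag can be illustrated with a minimal Python sketch. This is not rclone's code: the function `copy_headers`, the parameter `skip_system`, and the two-entry `SYSTEM_KEYS` table are hypothetical and cover only the two system attributes named in this thread.

```python
# Hypothetical table: only the system keys mentioned in this thread.
SYSTEM_KEYS = {
    "content-type": "Content-Type",
    "content-disposition": "Content-Disposition",
}
PREFIX = "x-amz-meta-"


def copy_headers(src_headers, skip_system=False):
    """Model a metadata copy: source headers -> flat metadata -> destination headers."""
    # Read side: user and system metadata end up in one flat dict,
    # so X-Amz-Meta-Content-Disposition becomes the key "content-disposition".
    meta = {}
    for name, value in src_headers.items():
        lower = name.lower()
        if lower.startswith(PREFIX):
            meta[lower[len(PREFIX):]] = value
        elif lower in SYSTEM_KEYS and not skip_system:
            meta[lower] = value

    # Write side: keys that look like system metadata become real headers,
    # everything else is written back with the user-metadata prefix.
    dst = {}
    for key, value in meta.items():
        if key in SYSTEM_KEYS and not skip_system:
            dst[SYSTEM_KEYS[key]] = value
        else:
            dst[PREFIX + key] = value
    return dst


src = {
    "Content-Type": "application/pdf",
    "X-Amz-Meta-Content-Disposition": 'attachment; filename="broken',
}
print(copy_headers(src))                    # user value is promoted to Content-Disposition
print(copy_headers(src, skip_system=True))  # user value stays under x-amz-meta-
```

With `skip_system=True` only the X-Amz-Meta-* entries are carried over in this model, so the malformed value is never sent as a real Content-Disposition header.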

I could also rename any X-Amz-Meta-* attributes which have the same names as system metadata. So I could output X-Amz-Meta-Content-Disposition as x-amz-meta-content-disposition in the Metadata object, say, which would mean it gets preserved properly.

What do you think? Any other ideas?

OK

Yes, rclone would have to implement the machinery, but then you'd write a program which received Metadata objects on standard input and output transformed versions of them on standard output.
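As an illustration of what such a program might look like, here is a minimal Python sketch. The wire format is an assumption, since no such machinery is described in this thread: it supposes that a single flat JSON Metadata object arrives on standard input and the transformed object is written to standard output. The function names and the repair rule (append a closing quote when one is missing) are hypothetical.

```python
import json


def close_open_quote(value):
    """Append a closing quote if the value contains an odd number of
    double quotes. Naive on purpose: escaped quotes are not handled."""
    return value + '"' if value.count('"') % 2 == 1 else value


def transform(raw):
    """Take one JSON metadata object as text and return the fixed object as text."""
    meta = json.loads(raw)
    if "content-disposition" in meta:
        meta["content-disposition"] = close_open_quote(meta["content-disposition"])
    # Every other key is passed through untouched.
    return json.dumps(meta)


# When used as a filter, the entry point would be roughly:
#     import sys
#     sys.stdout.write(transform(sys.stdin.read()))
```

Keeping the repair in a pure function makes it easy to try against a sample of stored values before using it on a real transfer.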

Sorry for the late reply.

No worries!

Ok, this answers one of my questions indeed :slight_smile:

Yes, all the metadata we want to preserve is within X-Amz-Meta-*.

In my opinion, this sounds desirable whether or not other options are available.

So you mean effectively separating system metadata and "user" metadata? The metadata to be transferred must remain untouched, though. I assume this is what you meant here, but I'm not sure.

In any case, should I create an issue on GitHub for this, referencing this forum thread?

Thanks!

Give this a go

v1.60.0-beta.6481.0ed92dcb1.fix-s3-metadata-no-system on branch fix-s3-metadata-no-system (uploaded in 15-30 mins)

I added a flag --s3-no-system-metadata (or config variable no_system_metadata = true) which should fix the problem by neither reading nor writing the system metadata.

I've tested it with a few problematic files and it solved the issue. Many thanks! :smiley:

Thanks for testing. I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.60.