Copying AWS S3 metadata

What is the problem you are having with rclone?

Inconsistent metadata updates between 2 AWS S3 buckets.

Hello forum,

I am trying to update the metadata of objects in two different S3 buckets. All objects were initially synced between the buckets using the aws cli sync command. As it turns out the aws cli does not sync metadata when objects are transferred with multipart uploads. Therefore objects with smaller sizes had metadata synced whereas larger files are missing metadata.

Rclone to the rescue! When running the rclone command below, all files are copied again and metadata is correctly included for the larger files that were initially missing it.

However smaller files that already had all the metadata from the original sync are also copied again. But in this case the metadata with key Content-Disposition is removed.

Is it possible to run a copy that will update all files to contain the same metadata as the source without affecting any newly created files in the destination?

Your feedback would be greatly appreciated.

Run the command 'rclone version' and share the full output of the command.

rclone v1.62.2
- os/version: amazon 2 (64 bit)
- os/kernel: 4.14.301-224.520.amzn2.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.2
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --metadata --progress copy aws:source-bucket/ aws:destination-bucket

The rclone config contents with secrets removed.

type = s3
provider = AWS
env_auth = true
region = us-west-2
location_constraint = us-west-2
server_side_encryption = AES256

That is odd - rclone should be preserving the Content-Disposition metadata.

Maybe this flag?

  --ignore-existing   Skip all files that exist on destination

Not 100% sure what you are trying to achieve.

Thank you for your reply, much appreciated. I didn't articulate the issue very well.

I copied the content from one (AWS) S3 bucket to another using the aws cli. This resulted in some of the larger files losing their metadata due to a limitation in the aws cli with multipart uploads.

I am now trying to backfill the missing metadata using the rclone command: rclone -vv --metadata --progress copy aws:original-bucket/ aws:new-bucket/

Larger files which did not have their metadata copied during the initial aws cli sync are being updated correctly.

However smaller files, which did have all their metadata copied during the initial sync, are losing the "Content-Disposition" metadata after the copy.

My question therefore boils down to: how can I copy objects from one bucket to another if the source contains metadata that the destination doesn't?

Content-Disposition should be preserved when using rclone copy --metadata from one s3 bucket to another.

I don't really understand this question, because rclone doesn't ever update metadata. It will sync an entire file from the source to the destination but it won't update metadata only at the destination.

So rclone should be copying objects with all the supported metadata from source to dest.

Can you do this on an object in the source and destination to dump the metadata?

$ rclone lsjson --stat --metadata s3:rclone/test.txt
{
	"Path": "test.txt",
	"Name": "test.txt",
	"Size": 6,
	"MimeType": "text/plain; charset=utf-8",
	"ModTime": "2022-10-11T17:53:10.286745272+01:00",
	"IsDir": false,
	"Tier": "STANDARD",
	"Metadata": {
		"btime": "2022-10-11T16:53:11Z",
		"content-type": "text/plain; charset=utf-8",
		"mtime": "2022-10-11T17:53:10.286745272+01:00",
		"test": "=?UTF-8?B?w4PChE3Dg8KEWsODwpXDg8KRIFMz?="
	}
}

In the following scenario the file 014b04ed-f59d-4dec-939d-cdca87a15514.docx was copied from the original-bucket to the new-bucket on December 31 using the aws cli. It contained the content-disposition in the new bucket after it was copied (the bucket has versioning enabled and I can still check the file).

When I ran rclone it logged the following:

2023-04-23 18:52:06 DEBUG : 014b04ed-f59d-4dec-939d-cdca87a15514.docx: Modification times differ by 29184h40m22s: 2019-09-02 03:44:18 +0000 UTC, 2022-12-31 04:24:40 +0000 UTC
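
As a sanity check on that log line, the reported difference can be reproduced from the two timestamps it prints. This is only a sketch: it assumes GNU date (for the -d option), and the function name modtime_gap is made up for illustration.

```shell
# Hypothetical helper: print the gap between two UTC timestamps as XhYmZs.
# Assumes GNU date, which accepts -d "<timestamp>".
modtime_gap() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    diff=$((end - start))
    printf '%dh%dm%ds\n' $((diff / 3600)) $((diff % 3600 / 60)) $((diff % 60))
}

# The two timestamps from the DEBUG line above.
modtime_gap "2019-09-02 03:44:18" "2022-12-31 04:24:40"
```

The gap works out to 1216 days plus 40m22s, which is the 29184h40m22s in the log. The later timestamp falls on December 31, the day of the aws cli copy, so the destination time appears to be the copy date rather than the file's original modification time.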

In the new version the content-disposition has disappeared in the new bucket and mtime was added:

# rclone lsjson --stat --metadata aws:original-bucket/014b04ed-f59d-4dec-939d-cdca87a15514.docx
{
        "Path": "014b04ed-f59d-4dec-939d-cdca87a15514.docx",
        "Name": "014b04ed-f59d-4dec-939d-cdca87a15514.docx",
        "Size": 11535,
        "MimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "ModTime": "2019-09-02T03:44:18.000000000Z",
        "IsDir": false,
        "Tier": "STANDARD",
        "Metadata": {
                "btime": "2019-09-02T03:44:18Z",
                "content-disposition": "filename=\"test2.docx\"",
                "content-type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
        }
}

# rclone lsjson --stat --metadata aws:new-bucket/014b04ed-f59d-4dec-939d-cdca87a15514.docx
{
        "Path": "014b04ed-f59d-4dec-939d-cdca87a15514.docx",
        "Name": "014b04ed-f59d-4dec-939d-cdca87a15514.docx",
        "Size": 11535,
        "MimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "ModTime": "2019-09-02T13:44:18.000000000+10:00",
        "IsDir": false,
        "Tier": "STANDARD",
        "Metadata": {
                "btime": "2023-04-23T08:52:07Z",
                "content-type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "mtime": "2019-09-02T13:44:18+10:00"
        }
}

I tried to replicate this with the commands below, but it worked - the content-disposition metadata was server-side copied just fine.

echo with metadata > /tmp/with_metadata.txt
rclone copy -vv --dump bodies --metadata --metadata-set 'content-disposition=filename="test2.docx"'  /tmp/with_metadata.txt s3:rclone/
rclone lsjson --stat --metadata s3:rclone/with_metadata.txt
rclone copyto --metadata s3:rclone/with_metadata.txt s3:rclone/with_metadata-2.txt -vv --dump bodies
rclone lsjson --stat --metadata s3:rclone/with_metadata-2.txt

Can you make a series of commands like the above which replicates the problem for me that I can try here?
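
For anyone putting together such a reproduction, a small check like the following might help confirm whether a given metadata key survived a copy. It is a rough sketch, not part of rclone: the function name has_metadata_key and the RCLONE override variable are assumptions for illustration, and it does a crude text match on the lsjson output rather than parsing the JSON.

```shell
# Hypothetical helper, not part of rclone: report whether a metadata key
# shows up in the `rclone lsjson --stat --metadata` output for an object.
# RCLONE can be overridden to point at a different binary.
RCLONE="${RCLONE:-rclone}"

has_metadata_key() {
    # $1 = remote:path/to/object, $2 = metadata key, e.g. content-disposition
    if "$RCLONE" lsjson --stat --metadata "$1" | grep -q "\"$2\":"; then
        echo "$1: $2 present"
    else
        echo "$1: $2 MISSING"
        return 1
    fi
}

# Example, run against the source and then the destination object:
#   has_metadata_key s3:rclone/with_metadata.txt content-disposition
#   has_metadata_key s3:rclone/with_metadata-2.txt content-disposition
```

Running it before and after each copy step should show at which point, if any, the key disappears.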

After a couple of attempts I am also unable to replicate the issue, even when using the exact same files and process that previously resulted in lost metadata. Apologies for sending you on a wild goose chase.

One last question that could be really helpful. Does rclone have an option to copy, and overwrite, files even if they already exist in the destination?
"files are always transferred"
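
The quoted phrase appears to match the wording in the rclone documentation for --no-check-dest. A related documented flag is -I / --ignore-times, which transfers all files even when size and modification time match. A minimal sketch of both follows; the function names and the RCLONE override are assumptions for illustration, and the bucket names are the placeholders used earlier in this thread.

```shell
# Hypothetical wrappers around rclone copy; RCLONE can be overridden
# to point at a different binary.
RCLONE="${RCLONE:-rclone}"

# Re-copy everything, still listing the destination first.
force_copy() {
    "$RCLONE" copy --metadata --ignore-times "$@"
}

# Re-copy everything without checking the destination at all.
# As far as I recall, the rclone docs suggest --retries 1 with this flag
# so a retry does not transfer everything again.
force_copy_no_check() {
    "$RCLONE" copy --metadata --no-check-dest --retries 1 "$@"
}

# Example (append --dry-run first to preview what would be transferred):
#   force_copy aws:original-bucket/ aws:new-bucket/
```

Either way, every object is transferred again regardless of what is already in the destination, so it is worth previewing with --dry-run on a large bucket.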
