Inconsistent metadata updates between 2 AWS S3 buckets.
Hello forum,
I am trying to update the metadata of objects in two different S3 buckets. All objects were initially synced between the buckets using the aws cli sync command. As it turns out, the aws cli does not sync metadata when objects are transferred with multipart uploads. As a result, smaller objects had their metadata synced, whereas larger files are missing metadata.
Rclone to the rescue! When running the trailing rclone command all files are copied again and metadata is correctly included for larger files which were initially missing the metadata.
However, smaller files that already had all their metadata from the original sync are also copied again, and in this case the metadata with key Content-Disposition is removed.
Is it possible to run a copy that will update all files to contain the same metadata as the source, without affecting any newly created files in the destination?
Your feedback would be greatly appreciated.
Run the command 'rclone version' and share the full output of the command.
Thank you for your reply, much appreciated. I didn't articulate the issue very well.
I copied the content from one (AWS) S3 bucket to another using the aws cli. This resulted in some of the larger files losing their metadata due to a limitation in the aws cli with multipart uploads.
I am now trying to backfill the missing metadata using the rclone command: rclone -vv --metadata --progress copy aws:original-bucket/ aws:new-bucket/
Larger files which did not have their metadata copied during the initial aws cli sync are being updated correctly.
However smaller files, which did have all their metadata copied during the initial sync, are losing the "Content-Disposition" metadata after the copy.
My question therefore boils down to: how can I copy objects from one bucket to another if the source contains metadata that the destination doesn't?
Content-Disposition should be preserved when using rclone copy --metadata from one s3 bucket to another.
I don't really understand this question, because rclone doesn't ever update metadata. It will sync an entire file from the source to the destination but it won't update metadata only at the destination.
So rclone should be copying objects with all the supported metadata from source to dest.
Can you do this on an object in the source and destination to dump the metadata?
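For anyone following along, here is a rough sketch of one way to dump and compare the metadata of a single object in both buckets. It is not necessarily the exact command meant above. It assumes rclone v1.59 or later (where --metadata / -M is available) and uses the remote and bucket names from this thread; the helper names dump_metadata and compare_metadata are made up for illustration.

```shell
# Sketch: print what rclone sees for one object in each bucket.
# Helper names are hypothetical; remote and bucket names are from this thread.

dump_metadata() {
    # --stat reports on the single object given rather than listing a directory
    # -M (--metadata) adds the object's metadata to the JSON output
    rclone lsjson --stat -M "$1"
}

compare_metadata() {
    key="$1"
    echo "== source =="
    dump_metadata "aws:original-bucket/$key"
    echo "== destination =="
    dump_metadata "aws:new-bucket/$key"
}

# Usage:
#   compare_metadata "014b04ed-f59d-4dec-939d-cdca87a15514.docx"
```

Any key that shows up in the source output but not in the destination output (for example content-disposition) is one that did not make it across.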
In the following scenario the file 014b04ed-f59d-4dec-939d-cdca87a15514.docx was copied from the original-bucket to the new-bucket on December 31 using the aws cli. It contained the content-disposition in the new bucket after it was copied (the bucket has versioning enabled and I can still check the file).
When I ran rclone it logged the following:
2023-04-23 18:52:06 DEBUG : 014b04ed-f59d-4dec-939d-cdca87a15514.docx: Modification times differ by 29184h40m22s: 2019-09-02 03:44:18 +0000 UTC, 2022-12-31 04:24:40 +0000 UTC
In the new version the content-disposition has disappeared in the new bucket and mtime was added:
After a couple of attempts I am also unable to replicate the issue, even when using the exact same files and process that previously resulted in lost metadata. Apologies for sending you on a wild goose chase.
One last question that could be really helpful. Does rclone have an option to copy, and overwrite, files even if they already exist in the destination?
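A possible approach, sketched here under the assumption that the goal is to re-transfer every object regardless of whether it already exists in the destination: the --ignore-times (-I) flag tells rclone not to skip files whose size and modification time already match. The wrapper name force_copy is made up for illustration, and the bucket names are the ones used earlier in this thread.

```shell
# Sketch: copy every object again, even if rclone would normally skip it.
# force_copy is a hypothetical helper; extra rclone flags can be appended.

force_copy() {
    src="$1"
    dst="$2"
    shift 2
    # --ignore-times : transfer all files, even those matching in size and modtime
    # --metadata     : carry object metadata across with the copy
    # "$@"           : any extra flags passed by the caller, e.g. --dry-run
    rclone copy --ignore-times --metadata -vv "$@" "$src" "$dst"
}

# Preview first, then run for real:
#   force_copy aws:original-bucket/ aws:new-bucket/ --dry-run
#   force_copy aws:original-bucket/ aws:new-bucket/
```

Since this is copy rather than sync, objects that exist only in the destination are not deleted, but any destination object with the same name as a source object will be overwritten.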