When transferring a stream with rclone rcat to AWS S3, with a file size over 10 MB (so multipart upload is used), the resulting object does not have a hash set. The ETag is something like 7e1d484c8fafe880099947d7b9d9fb82-1 and there is no X-Amz-Meta-Md5chksum metadata tag.
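For reference, the two values differ only in encoding: a normal single-part ETag is the hex MD5 digest, while the Md5chksum metadata rclone writes is the same digest base64-encoded (the trailing -1 above marks a multipart upload). A quick local illustration, using the placeholder input "hello":

```shell
# the same 16-byte MD5 digest in two encodings
printf 'hello' | md5sum | cut -d' ' -f1          # hex, like a single-part ETag
printf 'hello' | openssl md5 -binary | base64    # base64, like Md5chksum
```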
To reproduce:
rclone config:
[backup]
type = s3
provider = aws
env_auth = true
acl = private
region = eu-west-1
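The actual commands I run look like this (bucket and directory names are placeholders); the stream is over 10 MB, so rclone switches to multipart upload:

```shell
# stream a tarball straight into S3 without spooling it to disk
tar czf - ./mydir | rclone rcat backup:my-bucket/archive.tar.gz

# inspect the result: the ETag has a "-N" suffix and there is
# no x-amz-meta-md5chksum entry in the Metadata section
aws s3api head-object --bucket my-bucket --key archive.tar.gz
```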
However, if we modify the object in any way so that a new version of it is created, the new ETag will be a correct md5 checksum. We can do that by adding any metadata to the object.
I tried using the hasher backend with rcat, but it did not change anything:
[hasher]
type = hasher
remote = backup:
hashes = md5
I'm not sure if that's a bug or maybe a feature request.
My use case:
- upload a directory, bundling it into a single .tar.gz archive on the fly
- check whether the content changed before uploading a new version next time
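The change-detection step I have in mind looks roughly like this (paths are placeholders, and it assumes the archive is built reproducibly, e.g. with stable timestamps); it is exactly the remote md5 that is missing for streamed objects:

```shell
# compare a locally computed md5 against the one the remote reports;
# for objects streamed in via rcat, the remote hash comes back empty
local_md5=$(tar czf - ./mydir | md5sum | cut -d' ' -f1)
remote_md5=$(rclone hashsum md5 backup:my-bucket/archive.tar.gz | cut -d' ' -f1)
[ "$local_md5" = "$remote_md5" ] || echo "content changed, upload a new version"
```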
i can confirm your output and @Animosity022's output.
and that might explain why the OP's command did not store the hash whereas @Animosity022's command did:
@Animosity022's command was uploading instead of streaming;
the OP's command was streaming instead of uploading.
imho, odds are this is not a bug, but rather a feature that is not currently implemented.
and even if the feature were implemented, it still would not work with s3:
with s3, once a file is uploaded, its metadata cannot be changed in place.
when rclone copy uploads to s3, rclone has to calculate the hash BEFORE the upload starts,
and rclone rcat cannot know the hash until the stream ends.
@Animosity022 sorry, I started this as a feature type, then changed it to "suspected bug", and the template didn't load since I had already written something, I guess.
@asdffdsa I understand. I had assumed that rclone would always save the hash. In this case it would probably need to calculate it on the fly from the stream and update the object metadata after the upload completes. Probably not worth it.
Strictly speaking, this COULD be implemented at the cost of a longer upload and wasted local disk (hence optional, behind a flag): spool the streamed-in data to a temporary local file while calculating the hash, then upload to s3 with the hash set as metadata.
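That spool-based approach can already be approximated by hand with today's rclone (paths below are placeholders): once the data sits in a local file, the size and hash are known up front, so a plain copy stores the md5 with the object.

```shell
# spool the stream to local disk first, at the cost of time and disk space
tmp=$(mktemp /tmp/archive.XXXXXX)
tar czf - ./mydir > "$tmp"

# a regular upload knows the file size and hash before it starts,
# so rclone can attach the md5 to the object
rclone copyto "$tmp" backup:my-bucket/archive.tar.gz
rm -f "$tmp"
```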
Cloud storage backends frequently skip the checksum for streaming or multipart uploads for similar reasons (constraining spool space). So this is an unimplemented niche feature, definitely not a bug.
An old version of rclone did exactly that. However, adding a hash to an object after it is uploaded requires a server-side copy of it, which is an expensive operation, so I removed it after user complaints!
This should maybe be an optional flag, I'm not sure.
Note that you can set this flag larger:
--streaming-upload-cutoff SizeSuffix Cutoff for switching to chunked upload if file size is unknown, upload starts after reaching cutoff or when file ends (default 100Ki)
Below that limit, rclone will buffer the stream in memory and then upload it with a checksum.
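So, if memory permits, raising the cutoff above the expected stream size keeps the upload on the checksummed path (the value and paths below are placeholders):

```shell
# buffer streams of up to 256Mi fully in memory so the upload carries an md5;
# streams larger than the cutoff still fall back to chunked upload without one
tar czf - ./mydir | rclone rcat --streaming-upload-cutoff 256Mi backup:my-bucket/archive.tar.gz
```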
That is not a bad idea, so maybe a --streaming-upload-disk-cutoff or something like that.