S3: ERROR corrupted on transfer: sizes differ NNN vs MMM

What is the problem you are having with rclone?

ERROR : test.json: corrupted on transfer: sizes differ 898 vs 283

I am setting up a backup of Cloudflare R2 (S3 compatible) to Google Drive, and I noticed right away that JSON files were unexpectedly failing to transfer.

I saw that there was a similar ticket here (Error about corrupted transfer) and that there's a new feature in 1.60, which I just updated to (https://forum.rclone.org/t/rclone-1-60-0-release/33646):

Add --s3-decompress flag to decompress gzip-encoded files (Nick Craig-Wood)

With this flag, the JSON files are now transferring, so it looks like a necessary solution for S3 sources, at least for Cloudflare R2 (the command I'm now running is shown below). I'm posting this to:
a) validate that the flag was the right thing to enable in my case
b) ask whether it should perhaps be on by default, or whether rclone could autodetect when it needs to be enabled and turn it on without manual intervention
c) ask whether this would also have been the correct solution to Error about corrupted transfer.
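
For reference, this is roughly the workaround command I'm running now; the bucket and destination folder names are placeholders, but the remote names match my config below:

rclone sync --s3-decompress "Cloudflare R2:my-bucket" "Drive:R2-backup" -vv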

Run the command 'rclone version' and share the full output of the command.

rclone version
rclone v1.60.0
- os/version: opensuse-leap 15.3 (64 bit)
- os/kernel: 5.19.2-x86_64-linode156 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.2
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Cloudflare R2 (S3 compatible) -> Google Drive.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync

The rclone config contents with secrets removed.

[Drive]
type = drive
client_id = 
client_secret = 
scope = drive
token = 
root_folder_id = 
team_drive = 

[Cloudflare R2]
type = s3
provider = Other
env_auth = false
access_key_id = 
secret_access_key = 
endpoint = 
acl = private
bucket_acl = private
upload_cutoff = 5G
force_path_style = false

A log from the command with the -vv flag

2022-10-24 16:13:04 NOTICE: test.json: Not decompressing 'Content-Encoding: gzip' compressed file. Use --s3-decompress to override
2022-10-24 16:13:06 ERROR : test.json: corrupted on transfer: sizes differ 898 vs 283
2022-10-24 16:13:06 INFO  : test.json: Removing failed copy

hi

perhaps that should be provider = Cloudflare

i could be wrong, but why use force_path_style = false?
as that would imply using virtual hosted style

and from the source code
case "Cloudflare": virtualHostStyle = false

Ah, I set it up before upgrading rclone to the latest version, and Cloudflare wasn't there yet. I've just updated the config to all defaults, including removing force_path_style:

[Cloudflare R2]
type = s3
provider = Cloudflare
access_key_id = 
secret_access_key = 
endpoint = 

but I still have to use --s3-decompress.

It looks like your JSON files are uploaded with Content-Encoding: gzip - does that seem likely?

You can check with

rclone lsjson -M --stat remote:path/to/test.json

Cloudflare R2 deviates slightly from the S3 standard in that it auto-decompresses Content-Encoding: gzip files - that is where this error comes from.

Adding --s3-decompress makes rclone expect decompressed files.
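
If you want a quick single-file test of that, something like this should work (the bucket and path are placeholders):

rclone copy --s3-decompress -vv "Cloudflare R2:my-bucket/test.json" /tmp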

I guess it is possible to make a workaround in rclone for this, provided the provider is set correctly - that would probably be a good idea.

Forcing --s3-decompress if provider == Cloudflare maybe?
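
Purely as a sketch of what I mean - this is not the actual rclone source, and the variable name is hypothetical - the quirk could sit next to the existing provider switch:

// hypothetical sketch only, not the real code
case "Cloudflare":
	virtualHostStyle = false
	mightGzip = true // R2 may transparently gzip objects on download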

This would be an easy patch to make, but I'm unable to find the official Cloudflare documentation on this - I don't know if Cloudflare implement Cache-Control: no-transform or not.
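
If Cloudflare does honour Cache-Control: no-transform, uploaders could in theory set it per object - for example something like this (the bucket and path are placeholders, and I haven't verified R2's behaviour here):

rclone copyto --header-upload "Cache-Control: no-transform" test.json "Cloudflare R2:my-bucket/test.json"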

I have the same error as described by archon810.
This error appeared with the release of version 1.60.
We are running a pipeline in Azure DevOps to create a backup of the files.

This is the error for each file:

<5>NOTICE: xyz: Not decompressing 'Content-Encoding: gzip' compressed file. Use --s3-decompress to override
<3>ERROR : xyz: Failed to copy: RequestError: send request failed

I used the flag --s3-decompress and the error went away.

What changed in the new release to cause such different behavior?

Where are you copying from and to? Do your files use Content-Encoding: gzip?

Here's the result of the rclone lsjson command:

{
        "Path": "test2.json",
        "Name": "test2.json",
        "Size": 6717,
        "MimeType": "application/json",
        "ModTime": "2022-10-18T20:38:28.000000000Z",
        "IsDir": false,
        "Tier": "STANDARD",
        "Metadata": {
                "btime": "2022-10-18T20:38:28Z",
                "content-type": "application/json"
        }
}

I'm copying the files from STACKIT.cloud to STACKIT (STACKIT: Cloud & Colocation).

Maybe JSON files are being compressed by a proxy...

Can you both (@archon810 @sammetb) run something like this on a failing JSON file without --s3-decompress and post or attach the result:

rclone copy --retries 1 --low-level-retries 1 -vv --dump bodies remote:path/to/test.json /tmp

This may contain the contents of the JSON file, which you can remove if you want (I don't need to see it).

I did some tests before using the --s3-decompress parameter, but without the remote:path/to/test.json /tmp part.

Would that test be enough?

If you could do the test above that would be most useful

@ncw I just did your test on a small JSON file, which downloaded OK, and then on a slightly larger file that showed the gzip issue.

Logs:
File without the issue: 2022-10-28_12-26-50.txt · GitHub
File with the issue: 2022-10-28_12-25-48.txt · GitHub

Hope this helps.

Thank you for that @archon810 , that is very helpful.

Here is the problem.

First, rclone HEADs the file to make sure it exists:

2022/10/28 12:23:22 DEBUG : HTTP REQUEST (req 0xc000bf2200)
2022/10/28 12:23:22 DEBUG : HEAD /<SNIP>/test.json HTTP/1.1
Host: <SNIP>.r2.cloudflarestorage.com
User-Agent: rclone/v1.60.0
Authorization: XXXX
X-Amz-Content-Sha256: <SNIP>
X-Amz-Date: 20221028T192322Z

And the response is:

2022/10/28 12:23:22 DEBUG : HTTP RESPONSE (req 0xc000bf2200)
2022/10/28 12:23:22 DEBUG : HTTP/1.1 200 OK
Content-Length: 130
Accept-Ranges: bytes
Cf-Ray: <SNIP>
Connection: keep-alive
Content-Type: application/json
Date: Fri, 28 Oct 2022 19:23:22 GMT
Etag: "<SNIP>"
Last-Modified: Fri, 28 Oct 2022 19:23:05 GMT
Server: cloudflare

Note that this does not have a Content-Encoding: gzip header. The length is declared to be 130 bytes here.

However, when rclone comes to download the file:

2022/10/28 12:23:22 DEBUG : HTTP REQUEST (req 0xc000067900)
2022/10/28 12:23:22 DEBUG : GET /<SNIP>/test.json HTTP/1.1
Host: <SNIP>.r2.cloudflarestorage.com
User-Agent: rclone/v1.60.0
Accept-Encoding: gzip
Authorization: XXXX
X-Amz-Content-Sha256: <SNIP>
X-Amz-Date: 20221028T192322Z

Cloudflare sends it with Content-Encoding: gzip:

2022/10/28 12:23:22 DEBUG : HTTP RESPONSE (req 0xc000067900)
2022/10/28 12:23:22 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Cf-Ray: <SNIP>
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Fri, 28 Oct 2022 19:23:22 GMT
Etag: W/"<SNIP>"
Last-Modified: Fri, 28 Oct 2022 19:23:05 GMT
Server: cloudflare
Vary: Accept-Encoding

So it looks like Cloudflare is auto-compressing those JSON files - AWS would not do this - it would return Content-Encoding: gzip in the HEAD response if it was going to send gzipped content in the GET response.

I've tried to fix this with a provider quirk - can you give this a go?

v1.61.0-beta.6506.b615cbeaa.fix-r2-gzip on branch fix-r2-gzip (uploaded in 15-30 mins)

This should be set automatically for provider = Cloudflare, but you can set it manually with --s3-might-gzip=true, which might help you @sammetb.
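
For example, with the beta above, something like this (the remote name and paths are placeholders):

rclone copy --s3-might-gzip=true -vv remote:my-bucket/test.json /tmp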

