Transfer percentages >100%? Again

This is something I've been meaning to fix for a while!

If you try this, you'll find the objects will transfer correctly. Rclone by default transfers them compressed (i.e. unaltered from how they are stored on the cloud storage) - if you want them decompressed, use --s3-decompress.
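For example (the bucket and path here are just placeholders):

$ rclone copy s3:my-bucket/path/file.gz /tmp/out                   # default: copies the stored, compressed bytes as-is
$ rclone copy --s3-decompress s3:my-bucket/path/file.gz /tmp/out   # decompresses anything with Content-Encoding: gzip on the way down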

The fact your objects end with .gz and have a Content-Encoding: gzip header is puzzling. I would expect these files not to have Content-Encoding: gzip, as they are most likely intended to be downloaded still gzipped. I guess they could be double gzipped, in which case you will need the --s3-decompress flag.

v1.60.0-beta.6387.bf43d3f59.fix-s3-versions on branch fix-s3-versions (uploaded in 15-30 mins)

Yes. Rclone is highly concurrent and anything can happen! If you want a defined order, use --check-first together with --order-by.
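For example (the remote, bucket and destination are placeholders):

$ rclone sync --check-first --order-by name,ascending s3:my-bucket /local/dir

--check-first makes rclone finish all the checks before it starts transferring, and --order-by then controls the order the transfers are queued in.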

The thing is, they aren't "double gzipped". These files would all have been uploaded either with the AWS CLI or the boto3 Python library, using the default options. So I think it is reasonable to assume that there are a lot of .gz files out there in S3 that have these headers set. Would something like a --no-s3-decompress flag make sense here, to instruct rclone to ignore the Content-Encoding and always download the content verbatim?

It would be interesting if you could find out which tool did the uploading. I'll test it and report a bug!
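For what it's worth, one way such a header ends up on an object is an explicit flag at upload time, e.g. with the AWS CLI (the bucket and file name are made up, and whether your uploads actually did this is an assumption):

$ aws s3 cp report.json.gz s3://my-bucket/reports/ --content-encoding gzip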

This is the default if you don't supply the --s3-decompress flag in the beta above.

Are you sure? Based on the output, it really looks like it is decompressing the gzipped content. Remember, the .gz file is only 150k.

It sure looks like we are decompressing the content based on its encoding by default...

The --s3-decompress flag is allowing my sync to proceed, thank you!

That said, I still suggest we have an option like --no-s3-decode to prevent rclone from obeying the Content-Encoding header.

I am getting my files out, but I am going to end up with a bunch of files that end with .gz that are not in fact gzipped.

If you are getting this then it's a bug. Not using --s3-decompress should copy the files as-is and not decompress them. Are you 100% sure you used the beta for this test?
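You can check which binary actually ran with:

$ rclone version

It should report v1.60.0-beta.6387.bf43d3f59.fix-s3-versions if the beta is the one on your PATH.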

Here is my test - I created a 10M file of compressible data, gzipped it and uploaded it with Content-Encoding: gzip - I think this is the same as your file.
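Roughly like this (a sketch, not the exact commands I ran - in particular the source of the compressible data and the upload flags are assumptions):

$ yes "some compressible data" | head -c 10M > 10M-file    # 10 MiB of very compressible data (GNU head)
$ gzip 10M-file                                            # leaves 10M-file.gz
$ rclone copy -M --header-upload "Content-Encoding: gzip" 10M-file.gz s3:rclone-gzip-encoding/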

$ rclone lsjson --stat -M s3:rclone-gzip-encoding/10M-file.gz
{
	"Path": "10M-file.gz",
	"Name": "10M-file.gz",
	"Size": 66441,
	"MimeType": "application/gzip",
	"ModTime": "2022-07-29T15:15:31.335789869+01:00",
	"IsDir": false,
	"Tier": "STANDARD",
	"Metadata": {
		"atime": "2022-07-29T15:15:31.283790045+01:00",
		"btime": "2022-07-29T14:17:20Z",
		"content-encoding": "gzip",
		"content-type": "application/gzip",
		"gid": "1000",
		"mode": "100664",
		"mtime": "2022-07-29T15:15:31.335789869+01:00",
		"uid": "1000"
	}
}

$ aws s3api head-object --bucket rclone-gzip-encoding --key 10M-file.gz
{
    "AcceptRanges": "bytes",
    "LastModified": "2022-07-29T14:17:20+00:00",
    "ContentLength": 66441,
    "ETag": "\"692d987dc7254e4deb254c5364f103d3\"",
    "ContentEncoding": "gzip",
    "ContentType": "application/gzip",
    "Metadata": {
        "mode": "100664",
        "gid": "1000",
        "uid": "1000",
        "atime": "2022-07-29T15:15:31.283790045+01:00",
        "btime": "2022-07-29T15:15:31.283790045+01:00",
        "mtime": "1659104131.335789869"
    }
}

I then retrieved it without --s3-decompress - note the NOTICE saying that rclone isn't going to decompress it.

$ rclone copy -vv s3:rclone-gzip-encoding/10M-file.gz /tmp/
2022/08/01 16:45:02 DEBUG : rclone: Version "v1.60.0-beta.6387.bf43d3f59.fix-s3-versions" starting with parameters ["rclone" "copy" "-vv" "s3:rclone-gzip-encoding/10M-file.gz" "/tmp/"]
2022/08/01 16:45:02 DEBUG : Creating backend with remote "s3:rclone-gzip-encoding/10M-file.gz"
2022/08/01 16:45:02 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2022/08/01 16:45:02 DEBUG : fs cache: adding new entry for parent of "s3:rclone-gzip-encoding/10M-file.gz", "s3:rclone-gzip-encoding"
2022/08/01 16:45:02 DEBUG : Creating backend with remote "/tmp/"
2022/08/01 16:45:02 DEBUG : 10M-file.gz: Need to transfer - File not found at Destination
2022/08/01 16:45:02 NOTICE: 10M-file.gz: Not decompressing 'Content-Encoding: gzip' compressed file. Use --s3-decompress to override
2022/08/01 16:45:02 DEBUG : 10M-file.gz: md5 = 692d987dc7254e4deb254c5364f103d3 OK
2022/08/01 16:45:02 INFO  : 10M-file.gz: Copied (new)
2022/08/01 16:45:02 INFO  : 
Transferred:   	   64.884 KiB / 64.884 KiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.4s

2022/08/01 16:45:02 DEBUG : 5 go routines active

$ ls -l /tmp/10M-file.gz 
-rw-rw-r-- 1 ncw ncw 66441 Jul 29 15:15 /tmp/10M-file.gz

I then retrieved it with --s3-decompress - now the file is 10M but we can't check the checksum (there's a manual check sketched after the listing below).

$ rm /tmp/10M-file.gz 
$ rclone copy -vv --s3-decompress s3:rclone-gzip-encoding/10M-file.gz /tmp/
2022/08/01 16:45:50 DEBUG : rclone: Version "v1.60.0-beta.6387.bf43d3f59.fix-s3-versions" starting with parameters ["rclone" "copy" "-vv" "--s3-decompress" "s3:rclone-gzip-encoding/10M-file.gz" "/tmp/"]
2022/08/01 16:45:50 DEBUG : Creating backend with remote "s3:rclone-gzip-encoding/10M-file.gz"
2022/08/01 16:45:50 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2022/08/01 16:45:50 DEBUG : s3: detected overridden config - adding "{JT4Z6}" suffix to name
2022/08/01 16:45:51 DEBUG : fs cache: adding new entry for parent of "s3:rclone-gzip-encoding/10M-file.gz", "s3{JT4Z6}:rclone-gzip-encoding"
2022/08/01 16:45:51 DEBUG : Creating backend with remote "/tmp/"
2022/08/01 16:45:51 DEBUG : 10M-file.gz: Need to transfer - File not found at Destination
2022/08/01 16:45:51 DEBUG : 10M-file.gz: md5 = 6be7ac6047eec4b8e652751a7d2bcacc OK
2022/08/01 16:45:51 DEBUG : 10M-file.gz: Size and md5 of src and dst objects identical
2022/08/01 16:45:51 DEBUG : 10M-file.gz: Src hash empty - aborting Dst hash check
2022/08/01 16:45:51 INFO  : 10M-file.gz: Copied (Rcat, new)
2022/08/01 16:45:51 INFO  : 
Transferred:   	       10 MiB / 10 MiB, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.5s

2022/08/01 16:45:51 DEBUG : 5 go routines active

$ ls -l /tmp/10M-file.gz 
-rw-rw-r-- 1 ncw ncw 10485760 Jul 29 15:15 /tmp/10M-file.gz
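
If you want to double-check the decompressed data by hand, something like this works (a sketch - it re-downloads the raw object to compare against):

$ rclone copyto s3:rclone-gzip-encoding/10M-file.gz /tmp/raw.gz   # without --s3-decompress: the stored, compressed bytes
$ gunzip -c /tmp/raw.gz | md5sum
$ md5sum < /tmp/10M-file.gz

The two md5sums should match, since /tmp/10M-file.gz already contains the decompressed data.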

How does the same test fare when you try it on one of your .gz objects?
