Prevent HEAD on Amazon S3 family

What is the problem you are having with rclone?

Unnecessary use of Amazon S3 HEAD requests when using rclone copy.
I'm pretty sure I've tried every flag combination I could think of to prevent it, as per the command examples below.
There is always 1 HEAD after each PUT to S3. When you have 400m files, this is expensive.
There is also 1 HEAD on the top-level prefix in each job. I'm not fussed about that one, as it won't really cost me anything. It fails anyway (a HEAD on a prefix gives 404, a HEAD on the bucket itself gives 409), so why have it at all?

What is your rclone version (output from rclone version)

Which OS you are using and how many bits (eg Windows 7, 64 bit)

rclone v1.53.4

  • os/arch: windows/amd64
  • go version: go1.15.6

Which cloud storage system are you using? (eg Google Drive)

Local windows NTFS (Source)
Amazon S3 Standard (Destination)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

my first sync

./rclone copy --dump headers --log-level DEBUG --log-file firstsync.txt --cache-db-purge --transfers 1 --no-check-dest --ignore-checksum --fast-list --auto-confirm --no-traverse --local-no-check-updated --s3-disable-checksum --s3-no-check-bucket C:\testdata s3:REDACTED

my incremental forever sync

rclone copy --max-age 24h --dump headers --log-level DEBUG --log-file incrementalsync.txt --transfers 1 --no-check-dest --ignore-checksum --fast-list --auto-confirm --no-traverse --local-no-check-updated --s3-disable-checksum --s3-no-check-bucket C:\testdata s3:REDACTED

The rclone config contents with secrets removed.

[s3]
type = s3
provider = AWS
env_auth = true
region = ap-southeast-2
location_constraint = ap-southeast-2
server_side_encryption = AES256
storage_class = STANDARD
shared_credentials_file = %USERPROFILE%\.aws\credentials


A log from the command with the -vv flag

2021/01/28 14:50:55 DEBUG : rclone: Version "v1.53.4" starting with parameters ["REDACTED\\rclone.exe" "copy" "--dump" "headers" "--log-level" "DEBUG" "--log-file" "firstsync.txt" "--cache-db-purge" "--transfers" "1" "--no-check-dest" "--ignore-checksum" "--fast-list" "--auto-confirm" "--no-traverse" "--local-no-check-updated" "--s3-disable-checksum" "--s3-no-check-bucket" "C:\\REDACTED" "s3:REDACTED"]
2021/01/28 14:50:55 DEBUG : Creating backend with remote "C:\\testdata"
2021/01/28 14:50:55 DEBUG : Using config file from "REDACTED\\rclone.conf"
2021/01/28 14:50:55 DEBUG : fs cache: renaming cache item "C:\\testdata" to be canonical "//?/C:/testdata"
2021/01/28 14:50:55 DEBUG : Creating backend with remote "s3:REDACTED"
2021/01/28 14:50:55 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2021/01/28 14:50:55 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:55 DEBUG : HTTP REQUEST (req 0xc00043f800)
2021/01/28 14:50:55 DEBUG : HEAD /testdata HTTP/1.1
Host: REDACTED
User-Agent: rclone/v1.53.4
Authorization: XXXX
X-Amz-Content-Sha256: REDACTED
X-Amz-Date: 20210128T065055Z

2021/01/28 14:50:55 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:55 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:55 DEBUG : HTTP RESPONSE (req 0xc00043f800)
2021/01/28 14:50:55 DEBUG : HTTP/1.1 404 Not Found
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Thu, 28 Jan 2021 06:50:55 GMT
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Request-Id: REDACTED


2021/01/28 14:50:55 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:55 DEBUG : fs cache: renaming cache item "s3:REDACTED" to be canonical "s3:REDACTED"
2021/01/28 14:50:55 DEBUG : S3 bucket Host: REDACTED-datasync path testdata: Waiting for checks to finish
2021/01/28 14:50:55 DEBUG : S3 bucket Host: REDACTED-datasync path testdata: Waiting for transfers to finish
2021/01/28 14:50:55 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:55 DEBUG : HTTP REQUEST (req 0xc00043fd00)
2021/01/28 14:50:55 DEBUG : PUT /testdata/testfile1?REDACTED
Host: REDACTED
User-Agent: rclone/v1.53.4
Content-Length: 104857
content-md5: REDACTED
content-type: application/octet-stream
x-amz-acl: private
x-amz-meta-mtime: 1611816654.4557248
x-amz-server-side-encryption: AES256
x-amz-storage-class: STANDARD
Accept-Encoding: gzip

2021/01/28 14:50:55 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : HTTP RESPONSE (req 0xc00043fd00)
2021/01/28 14:50:56 DEBUG : HTTP/1.1 200 OK
Content-Length: 0
Date: Thu, 28 Jan 2021 06:50:57 GMT
Etag: REDACTED
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Request-Id: REDACTED
X-Amz-Server-Side-Encryption: AES256

2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : HTTP REQUEST (req 0xc000136c00)
2021/01/28 14:50:56 DEBUG : HEAD /testdata/testfile1 HTTP/1.1
Host: REDACTED
User-Agent: rclone/v1.53.4
Authorization: XXXX
X-Amz-Content-Sha256: REDACTED
X-Amz-Date: 20210128T065056Z

2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : HTTP RESPONSE (req 0xc000136c00)
2021/01/28 14:50:56 DEBUG : HTTP/1.1 200 OK
Content-Length: 104857
Accept-Ranges: bytes
Content-Type: application/octet-stream
Date: Thu, 28 Jan 2021 06:50:57 GMT
Etag: REDACTED
Last-Modified: Thu, 28 Jan 2021 06:50:57 GMT
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Meta-Mtime: 1611816654.4557248
X-Amz-Request-Id: REDACTED
X-Amz-Server-Side-Encryption: AES256
X-Amz-Version-Id: null

2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 INFO  : testfile1: Copied (new)
2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : HTTP REQUEST (req 0xc00066b400)
2021/01/28 14:50:56 DEBUG : PUT /testdata/sub1/testfile2?REDACTED
Host: REDACTED
User-Agent: rclone/v1.53.4
Content-Length: 104857
content-md5: REDACTED
content-type: application/octet-stream
x-amz-acl: private
x-amz-meta-mtime: 1611816654.4867294
x-amz-server-side-encryption: AES256
x-amz-storage-class: STANDARD
Accept-Encoding: gzip

2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : HTTP RESPONSE (req 0xc00066b400)
2021/01/28 14:50:56 DEBUG : HTTP/1.1 200 OK
Content-Length: 0
Date: Thu, 28 Jan 2021 06:50:57 GMT
Etag: REDACTED
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Request-Id: REDACTED
X-Amz-Server-Side-Encryption: AES256

2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : HTTP REQUEST (req 0xc00066ba00)
2021/01/28 14:50:56 DEBUG : HEAD /testdata/sub1/testfile2 HTTP/1.1
Host: REDACTED
User-Agent: rclone/v1.53.4
Authorization: XXXX
X-Amz-Content-Sha256: REDACTED
X-Amz-Date: 20210128T065056Z

2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : HTTP RESPONSE (req 0xc00066ba00)
2021/01/28 14:50:56 DEBUG : HTTP/1.1 200 OK
Content-Length: 104857
Accept-Ranges: bytes
Content-Type: application/octet-stream
Date: Thu, 28 Jan 2021 06:50:57 GMT
Etag: REDACTED
Last-Modified: Thu, 28 Jan 2021 06:50:57 GMT
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Meta-Mtime: 1611816654.4867294
X-Amz-Request-Id: REDACTED
X-Amz-Server-Side-Encryption: AES256
X-Amz-Version-Id: null

2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 INFO  : sub1/testfile2: Copied (new)
2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : HTTP REQUEST (req 0xc000137400)
2021/01/28 14:50:56 DEBUG : PUT /testdata/sub1/sub11/testfile3?REDACTED
Host: REDACTED
User-Agent: rclone/v1.53.4
Content-Length: 104857
content-md5: REDACTED
content-type: REDACTED
x-amz-acl: private
x-amz-meta-mtime: 1611816654.5167276
x-amz-server-side-encryption: AES256
x-amz-storage-class: STANDARD
Accept-Encoding: gzip

2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : HTTP RESPONSE (req 0xc000137400)
2021/01/28 14:50:56 DEBUG : HTTP/1.1 200 OK
Content-Length: 0
Date: Thu, 28 Jan 2021 06:50:57 GMT
Etag: REDACTED
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Request-Id: REDACTED
X-Amz-Server-Side-Encryption: AES256

2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : HTTP REQUEST (req 0xc00022be00)
2021/01/28 14:50:56 DEBUG : HEAD /testdata/sub1/sub11/testfile3 HTTP/1.1
Host: REDACTED
User-Agent: rclone/v1.53.4
Authorization: XXXX
X-Amz-Content-Sha256: REDACTED
X-Amz-Date: 20210128T065056Z

2021/01/28 14:50:56 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 DEBUG : HTTP RESPONSE (req 0xc00022be00)
2021/01/28 14:50:56 DEBUG : HTTP/1.1 200 OK
Content-Length: 104857
Accept-Ranges: bytes
Content-Type: application/octet-stream
Date: Thu, 28 Jan 2021 06:50:57 GMT
Etag: REDACTED
Last-Modified: Thu, 28 Jan 2021 06:50:57 GMT
Server: AmazonS3
X-Amz-Id-2: REDACTED
X-Amz-Meta-Mtime: 1611816654.5167276
X-Amz-Request-Id: REDACTED
X-Amz-Server-Side-Encryption: AES256
X-Amz-Version-Id: null

2021/01/28 14:50:56 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/01/28 14:50:56 INFO  : sub1/sub11/testfile3: Copied (new)
2021/01/28 14:50:56 INFO  : 
Transferred:   	  307.198k / 307.198 kBytes, 100%, 409.954 kBytes/s, ETA 0s
Transferred:            3 / 3, 100%
Elapsed time:         1.8s

2021/01/28 14:50:56 DEBUG : 4 go routines active
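
The request pattern in the log above can be tallied with a short script. This is only a sketch: it assumes the log was written with --dump headers as in the command above, and the function name count_requests is made up for illustration.

```python
import re
from collections import Counter

# Matches request lines such as
#   "2021/01/28 14:50:56 DEBUG : HEAD /testdata/testfile1 HTTP/1.1"
# and the PUT lines, which in the log above end in a redacted query
# string instead of "HTTP/1.1". Response lines ("DEBUG : HTTP/1.1 200 OK")
# do not start with a method name, so they are not counted.
REQUEST_LINE = re.compile(r"DEBUG : (HEAD|PUT|GET|POST|DELETE) (/\S*)")


def count_requests(log_text):
    """Return a Counter of HTTP methods seen in an rclone header dump."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three request lines copied from the log above.
sample = """\
2021/01/28 14:50:55 DEBUG : HEAD /testdata HTTP/1.1
2021/01/28 14:50:55 DEBUG : PUT /testdata/testfile1?REDACTED
2021/01/28 14:50:56 DEBUG : HEAD /testdata/testfile1 HTTP/1.1
"""
counts = count_requests(sample)
print(counts["HEAD"], counts["PUT"])
```

Pointing it at the full firstsync.txt (for example count_requests(open("firstsync.txt").read())) should show one more HEAD than PUT: one HEAD per uploaded object plus the single HEAD on the prefix.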

hello and welcome to the forum,

--cache-db-purge does nothing on aws s3

if you want forever forward incremental backups, you can use
https://rclone.org/docs/#backup-dir-dir with a timestamp for the archive dir
for example,
rclone sync c:\data remote:data/backup --backup-dir=remote:data/archive/20210117.085703

This is rclone checking the file got uploaded OK...

There isn't a way of stopping this at the moment as it is done by the backend here

There is some argument to say rclone should be filling in the metadata from the response of the PUT rather than doing a separate HEAD request (some of the other backends do that).

However the size of the upload doesn't come back in the response from PUT and that is kind of important in ensuring data integrity. We could assume that it is what size we set, but that isn't really ensuring the integrity of the data.

I guess we could put in an option which would do this. It would

  • read the Etag from the response of PUT
  • read the Date header and use that as uploaded time
  • assume that the size was whatever we uploaded
  • assume that the metadata was whatever we uploaded
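
As a rough illustration of the option sketched in the list above, the object's metadata could be assembled from the PUT response plus what was sent, with no second request. This is a Python sketch, not rclone's Go code; metadata_from_put and the dictionary layout are hypothetical, and the ETag value is made up.

```python
from email.utils import parsedate_to_datetime


def metadata_from_put(response_headers, uploaded_size, uploaded_meta):
    """Build object metadata without a follow-up HEAD request.

    Hypothetical helper for illustration only.
    """
    return {
        # Read from the PUT response headers.
        "etag": response_headers["Etag"].strip('"'),
        "uploaded": parsedate_to_datetime(response_headers["Date"]),
        # Assumed to be what was sent, since the PUT response
        # does not echo the size or the metadata back.
        "size": uploaded_size,
        "meta": dict(uploaded_meta),
    }


put_response = {
    "Etag": '"0123456789abcdef0123456789abcdef"',  # made-up value
    "Date": "Thu, 28 Jan 2021 06:50:57 GMT",       # as in the log above
}
info = metadata_from_put(put_response, 104857, {"mtime": "1611816654.4557248"})
print(info["etag"], info["size"], info["uploaded"])
```

The trade-off is visible in the code: the size and metadata are never confirmed by the server, which is exactly the integrity concern raised above.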

What is bothering you the most - the time of the extra transaction, or the cost of it? It is my understanding that HEAD requests are reasonably cheap.

Hi Nick, thanks for getting back to me.

Yes, the cost is important for me. In my scenario (tape replacement kind of thing), my storage on S3 Deep Archive would be around $300/month, but if HEAD or LIST is used then HEAD would be $5000/month and LIST would be $58000/month (and therefore no longer a tape replacement candidate). To clarify, I didn't see any LIST, which was great and already a win over some other utilities I have tested.

Therefore a new flag to not check integrity doesn't put me in a worse position than using tape drives today. If you get a 200 OK response from a PutObject, that is enough for me.

So I guess we are in new flag territory then? What do you think?

It would be super cool if AWS would include the date & content-length in the PutObject response too.

What would happen if I did my own build and set the response of err = 'Skipped meta data check' in the code you quoted as a dodgy workaround? Would other parts of rclone expect to see a HEAD response and get upset?

The --no-check-dest is removing the LISTs for you.

However you'll still be charged for the PUTs, won't you? According to https://aws.amazon.com/s3/pricing/ PUT requests cost $0.005 per 1000 whereas HEAD requests cost $0.0004 per 1000, so avoiding the HEAD request saves $0.0004 out of $0.005 + $0.0004, which is about 7%. So not nothing, but not a massive saving.
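
The arithmetic can be checked with a few lines of Python. This is a sketch using only the per-1000 prices quoted in this post and the 400m file count mentioned earlier in the thread; the prices are not checked against current AWS pricing or other storage classes, and the function names are made up.

```python
PUT_PER_1000 = 0.005    # USD per 1000 PUT requests, as quoted above
HEAD_PER_1000 = 0.0004  # USD per 1000 HEAD requests, as quoted above


def request_cost(n_objects, price_per_1000):
    """Cost in USD of one request per object at the given per-1000 price."""
    return n_objects / 1000 * price_per_1000


def head_share():
    """Fraction of the per-object request cost that the HEAD accounts for."""
    return HEAD_PER_1000 / (PUT_PER_1000 + HEAD_PER_1000)


n = 400_000_000  # the 400m files mentioned in the original post
print(f"PUT:  ${request_cost(n, PUT_PER_1000):,.2f}")
print(f"HEAD: ${request_cost(n, HEAD_PER_1000):,.2f}")
print(f"HEAD share of request cost: {head_share():.1%}")
```

At these prices the HEADs come to roughly $160 against $2,000 for the PUTs, which is where the figure of about 7% comes from.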

I had a go at adding the new flag here. I think the minimum set of flags for uploading with one transaction per file becomes --s3-no-head --no-check-dest --s3-no-check-bucket

v1.54.0-beta.5112.a20c1df9d.fix-s3-no-head on branch fix-s3-no-head (uploaded in 15-30 mins)

Here is the help for it

If set, don't HEAD uploaded objects to check integrity

This can be useful when trying to minimise the number of transactions
rclone does.

Setting it means that if rclone receives a 200 OK message after
uploading an object with PUT then it will assume that it got uploaded
properly.

In particular it will assume:

  • the metadata, including modtime, storage class and content type was as uploaded
  • the size was as uploaded

It reads these items from the response for a single part PUT - for
multipart uploads these aren't read:

  • the MD5SUM
  • the uploaded date

If an object of unknown length is uploaded then rclone will do a
HEAD request.

Setting this flag increases the chance for undetected upload failures,
in particular an incorrect size, so it isn't recommended for normal
operation. In practice the chance of an undetected upload failure is
very small even with this flag.
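
The rule described in the help text can be summarised as a small decision function. This is a Python sketch of the documented behaviour only, not the rclone source; needs_head and its parameters are hypothetical names.

```python
def needs_head(no_head, size_known):
    """Return True if a HEAD request should follow the upload.

    no_head:    whether --s3-no-head is set
    size_known: whether the object's length was known before uploading
    """
    if not no_head:
        # Default behaviour: always HEAD the object to check it.
        return True
    # With the flag set, HEAD only when the length was unknown,
    # because the size cannot then be assumed "as uploaded".
    return not size_known


for no_head in (False, True):
    for size_known in (False, True):
        print(no_head, size_known, needs_head(no_head, size_known))
```

Only the combination of the flag being set and a known size skips the HEAD, which matches the one-transaction-per-file case for ordinary file copies.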

This is rclone trying to work out whether the destination you've supplied points to an object or not.

I plan at some point to allow users to put a trailing / on the destination and rclone to take it as gospel that it points to a directory.

This works well, thanks Nick! I now only see 1 head (at the bucket prefix specified) instead of per object.

Thanks for testing. I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.54


Legendary! I have asked AWS for a feature request for content-length in the response of PutObject as well. That way we could have the best of both.

Yes, that would be perfect :slight_smile:
