--s3-profile failing when explicit s3 endpoint is present

The use case is a copy from a local (Linux) filesystem to AWS S3 using both:

  • --s3-profile
  • An S3 endpoint in the rclone configuration

Version is:

rclone v1.61.1
- os/version: centos 7.9.2009 (64 bit)
- os/kernel: 3.10.0-1160.83.1.el7.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.19.4
- go/linking: static
- go/tags: none

Here's the command, and the sad result:

$ ~/bin/rclone-v1.61.1-linux-amd64/rclone copy --s3-acl "" --s3-profile MYNAMEDPROFILE 1gb-accountingtest.tz s3-standard:TARGETBUCKET/standard/x3 -v
2023/02/09 12:05:23 ERROR : Attempt 1/3 failed with 1 errors and: SerializationError: failed to unmarshal error message
        status code: 400, request id: 
caused by: UnmarshalError: failed to unmarshal error message
        00000000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version="1|
00000010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 55 54  |.0" encoding="UT|
00000020  46 2d 38 22 3f 3e 0a 3c  45 72 72 6f 72 3e 3c 43  |F-8"?>.<Error><C|
00000030  6f 64 65 3e 49 6e 76 61  6c 69 64 52 65 71 75 65  |ode>InvalidReque|
00000040  73 74 3c 2f 43 6f 64 65  3e 3c 4d 65 73 73 61 67  |st</Code><Messag|
00000050  65 3e 4d 69 73 73 69 6e  67 20 72 65 71 75 69 72  |e>Missing requir|
00000060  65 64 20 68 65 61 64 65  72 20 66 6f 72 20 74 68  |ed header for th|
00000070  69 73 20 72 65 71 75 65  73 74 3a 20 78 2d 61 6d  |is request: x-am|
00000080  7a 2d 63 6f 6e 74 65 6e  74 2d 73 68 61 32 35 36  |z-content-sha256|
00000090  3c 2f 4d 65 73 73 61 67  65 3e 3c 52 65 71 75 65  |</Message><Reque|
000000a0  73 74 49 64 3e 32 45 46  57 41 31 41 4e 57 43 48  |stId>2EFWA1ANWCH|
000000b0  54 31 44 46 53 3c 2f 52  65 71 75 65 73 74 49 64  |T1DFS</RequestId|
000000c0  3e 3c 48 6f 73 74 49 64  3e 6d 77 38 78 32 6e 78  |><HostId>mw8x2nx|
000000d0  65 32 33 35 64 32 4f 6a  2f 57 65 77 37 33 43 6b  |e235d2Oj/Wew73Ck|
000000e0  48 6e 55 6a 71 36 48 6b  69 47 38 74 55 31 6e 66  |HnUjq6HkiG8tU1nf|
000000f0  45 67 63 55 76 77 45 58  64 4b 52 2b 49 6b 56 79  |EgcUvwEXdKR+IkVy|
00000100  73 4d 74 4c 77 50 67 52  6e 76 65 55 39 76 68 74  |sMtLwPgRnveU9vht|
00000110  6f 47 31 77 3d 3c 2f 48  6f 73 74 49 64 3e 3c 2f  |oG1w=</HostId></|
00000120  45 72 72 6f 72 3e                                 |Error>|

caused by: unknown error response tag, {{ Error} []}
2023/02/09 12:05:33 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:      1m37.3s

Here's the config, with the endpoint redacted. Note that the literal word 'bucket' in the endpoint is actually correct AWS syntax.

[s3-standard]
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://bucket.vpce-REDACTED.s3.us-east-1.vpce.amazonaws.com
storage_class = STANDARD

A log from the command with the -vv flag

$ ~/bin/rclone-v1.61.1-linux-amd64/rclone copy --s3-acl "" --s3-profile MYNAMEDPROFILE 1gb-accountingtest.tz s3-standard:TARGETBUCKET/standard/x3/ -vv
2023/02/09 12:24:48 DEBUG : rclone: Version "v1.61.1" starting with parameters ["STUFF/bin/rclone-v1.61.1-linux-amd64/rclone" "copy" "--s3-acl" "" "--s3-profile" "MYNAMEDPROFILE" "1gb-accountingtest.tz" "s3-standard:TARGETBUCKET/standard/x3/" "-vv"]
2023/02/09 12:24:48 DEBUG : Creating backend with remote "1gb-accountingtest.tz"
2023/02/09 12:24:48 DEBUG : Using config file from "/STUFF/.config/rclone/rclone.conf"
2023/02/09 12:24:48 DEBUG : fs cache: adding new entry for parent of "1gb-accountingtest.tz", "STUFF/drtesting"
2023/02/09 12:24:48 DEBUG : Creating backend with remote "s3-standard:TARGETBUCKET/standard/x3"
2023/02/09 12:24:48 DEBUG : s3-standard: detected overridden config - adding "{VJUoK}" suffix to name
2023/02/09 12:25:39 DEBUG : fs cache: renaming cache item "s3-standard:TARGETBUCKET/standard/x3" to be canonical "s3-standard{VJUoK}:TARGETBUCKET/standard/x3"
2023/02/09 12:26:27 ERROR : Attempt 1/3 failed with 1 errors and: SerializationError: failed to unmarshal error message
        status code: 400, request id: 
caused by: UnmarshalError: failed to unmarshal error message
        00000000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version="1|
00000010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 55 54  |.0" encoding="UT|
00000020  46 2d 38 22 3f 3e 0a 3c  45 72 72 6f 72 3e 3c 43  |F-8"?>.<Error><C|
00000030  6f 64 65 3e 49 6e 76 61  6c 69 64 52 65 71 75 65  |ode>InvalidReque|
00000040  73 74 3c 2f 43 6f 64 65  3e 3c 4d 65 73 73 61 67  |st</Code><Messag|
00000050  65 3e 4d 69 73 73 69 6e  67 20 72 65 71 75 69 72  |e>Missing requir|
00000060  65 64 20 68 65 61 64 65  72 20 66 6f 72 20 74 68  |ed header for th|
00000070  69 73 20 72 65 71 75 65  73 74 3a 20 78 2d 61 6d  |is request: x-am|
00000080  7a 2d 63 6f 6e 74 65 6e  74 2d 73 68 61 32 35 36  |z-content-sha256|
00000090  3c 2f 4d 65 73 73 61 67  65 3e 3c 52 65 71 75 65  |</Message><Reque|
000000a0  73 74 49 64 3e 52 36 31  51 34 30 50 4b 42 35 43  |stId>R61Q40PKB5C|
000000b0  43 39 36 54 33 3c 2f 52  65 71 75 65 73 74 49 64  |C96T3</RequestId|
000000c0  3e 3c 48 6f 73 74 49 64  3e 6d 4b 38 2f 6b 67 38  |><HostId>mK8/kg8|
000000d0  68 38 4a 4f 63 44 4f 69  76 74 38 50 55 6b 5a 59  |h8JOcDOivt8PUkZY|
000000e0  35 58 6b 5a 69 6f 77 41  5a 79 70 2b 51 79 5a 31  |5XkZiowAZyp+QyZ1|
000000f0  62 5a 36 6c 61 38 66 78  4a 64 47 72 68 42 72 46  |bZ6la8fxJdGrhBrF|
00000100  73 49 53 32 41 6c 37 49  48 6f 71 45 70 52 6e 58  |sIS2Al7IHoqEpRnX|
00000110  48 76 77 41 3d 3c 2f 48  6f 73 74 49 64 3e 3c 2f  |HvwA=</HostId></|
00000120  45 72 72 6f 72 3e                                 |Error>|

caused by: unknown error response tag, {{ Error} []}
2023/02/09 12:26:39 INFO  : 
Transferred:              0 B / 0 B, -, 0 B/s, ETA -
Elapsed time:      1m51.3s

Notes and thoughts

  1. This looks an awful lot like the error on empty ACLs with S3 profiles that was fixed in 1.61 (see this). Note that the test includes the empty --s3-acl parameter which was part of the 1.61 fix
  2. The problem is directly related to the presence or absence of a declared S3 endpoint in the config entry. For example, using a config entry like this, which is the same except it has no endpoint:
[s3-standard-none]
type = s3
provider = AWS
env_auth = true
region = us-east-1
storage_class = STANDARD

we see:

$ ~/bin/rclone-v1.61.1-linux-amd64/rclone copy --s3-acl "" --s3-profile dr-systems-storage-team 1gb-accountingtest.tz s3-standard-none:ncbi-dr-systems-test-1a/standard/x15/ -vv
2023/02/09 12:35:18 DEBUG : rclone: Version "v1.61.1" starting with parameters ["/home/pattcornerri/bin/rclone-v1.61.1-linux-amd64/rclone" "copy" "--s3-acl" "" "--s3-profile" "dr-systems-storage-team" "1gb-accountingtest.tz" "s3-standard-none:ncbi-dr-systems-test-1a/standard/x15/" "-vv"]
2023/02/09 12:35:18 DEBUG : Creating backend with remote "1gb-accountingtest.tz"
2023/02/09 12:35:18 DEBUG : Using config file from "/home/pattcornerri/.config/rclone/rclone.conf"
2023/02/09 12:35:18 DEBUG : fs cache: adding new entry for parent of "1gb-accountingtest.tz", "/home/pattcornerri/drtesting"
2023/02/09 12:35:18 DEBUG : Creating backend with remote "s3-standard-none:ncbi-dr-systems-test-1a/standard/x15/"
2023/02/09 12:35:18 DEBUG : s3-standard-none: detected overridden config - adding "{VJUoK}" suffix to name
2023/02/09 12:35:18 DEBUG : fs cache: renaming cache item "s3-standard-none:ncbi-dr-systems-test-1a/standard/x15/" to be canonical "s3-standard-none{VJUoK}:ncbi-dr-systems-test-1a/standard/x15"
2023/02/09 12:35:19 DEBUG : 1gb-accountingtest.tz: Need to transfer - File not found at Destination
2023/02/09 12:35:19 INFO  : S3 bucket ncbi-dr-systems-test-1a path standard/x15: Bucket "ncbi-dr-systems-test-1a" created with ACL ""
2023/02/09 12:35:21 DEBUG : 1gb-accountingtest.tz: multipart upload starting chunk 1 size 5Mi offset 0/1.096Gi
2023/02/09 12:35:21 DEBUG : 1gb-accountingtest.tz: multipart upload starting chunk 2 size 5Mi offset 5Mi/1.096Gi
2023/02/09 12:35:21 DEBUG : 1gb-accountingtest.tz: multipart upload starting chunk 3 size 5Mi offset 10Mi/1.096Gi
2023/02/09 12:35:21 DEBUG : 1gb-accountingtest.tz: multipart upload starting chunk 4 size 5Mi offset 15Mi/1.096Gi
2023/02/09 12:35:22 DEBUG : 1gb-accountingtest.tz: multipart upload starting chunk 5 size 5Mi offset 20Mi/1.096Gi

etc.
  3. The endpoint is required to use direct connects
  4. The named profile is required for cross-account access

This looks to be the same issue as --s3-profile is failing when used with VPC endpoints · Issue #6443 · rclone/rclone · GitHub

What I need in order to debug this is to see it failing with -vv --dump bodies - preferably unedited!

If it contains things you don't want to be made public then you can email a log to nick@craig-wood.com - put a link to this forum post in the email for context.

If you could post a link to a doc about these vpce endpoints that would be useful too - thanks!

Thanks so much ... a response is in your email, along with a little table of tests I've tried.

Probably the best doc on endpoints is here. The tricky bit in this not-terribly-well-written doc is the reference to the term 'bucket' in the endpoint URL. That's really a literal 'bucket' and tells the endpoint that we're using it to reference a bucket in S3, not something else.

Continued testing just illustrates and affirms the problem. --s3-profile alone works fine. endpoint alone works fine. The two together fail.

I'm wondering if the interaction between the AWS credentials file and the AWS config file is entirely correct in the code. The rclone documentation often refers to a "profile" residing in the AWS credentials file, but the term "profile" is badly overloaded. The usual (AWS) usage is that the profile is in the AWS config file and the source_profile that the profile relies upon is in the AWS credentials file. The profile in config ties together a source_profile from credentials with a role in the config profile. The net effect is:

  1. The reference in the CLI's --profile, and rclone's --s3-profile, refers to the config profile entry
  2. The --profile or (hopefully) --s3-profile reads the source_profile entry in config, finds the corresponding section in credentials, then establishes a session using the keys in credentials and...
  3. Immediately uses that session to assume the role listed in config
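That three-step resolution chain can be sketched like this. This is a hypothetical helper (the names `resolve_profile`, `CONFIG`, and `CREDENTIALS` are mine, and the real logic lives in the AWS SDK), just to make the config-vs-credentials split concrete:

```python
# Sketch of the config/credentials resolution chain described above.
# Hypothetical helper names; the real implementation is in the AWS SDK.
import configparser

def resolve_profile(profile, config_text, credentials_text):
    """Follow a named profile: find it in the AWS config file, follow its
    source_profile into the credentials file, and return the static keys
    plus the role to assume with them."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    creds = configparser.ConfigParser()
    creds.read_string(credentials_text)

    # Config-file sections are written as "[profile NAME]";
    # credentials-file sections are bare "[NAME]".
    section = config["profile " + profile]
    source = creds[section["source_profile"]]
    return {
        "role_arn": section["role_arn"],
        "access_key_id": source["aws_access_key_id"],
        "secret_access_key": source["aws_secret_access_key"],
    }

CONFIG = """\
[profile MyProfile]
role_arn = arn:aws:iam::999999999999:role/MY-ROLE
source_profile = MyCreds
"""

CREDENTIALS = """\
[MyCreds]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = shhh
"""

print(resolve_profile("MyProfile", CONFIG, CREDENTIALS))
```

The point is that the keys come from one file and the role from the other, tied together only by the source_profile name.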

That's enough to glaze anybody's eyes. Could we have possibly not quite gotten that right in rclone? Because the term profile is so overloaded that we really have two possible use cases with different interpretations of the word:

  1. The use case this ticket involves, where the indicated --s3-profile points to an entry in the config file which does the authentication and subsequent role assumption indicated above, and
  2. The much more common use case where --s3-profile merely refers to the heading of an entry in AWS credentials.

Maybe we have those two cases mixed up. But why that would interact with the presence or absence of a custom endpoint is a mystery.

I'll keep looking at the code, but don't really have a build toolchain to test anything in go.

Here's an example of the use case of concern (profile is in config, role is assumed after initial auth):

The ~/.aws/config file has the profile: a pointer to a credentials file entry, plus a role to assume

[profile dr-systems-storage-team]
role_arn = arn:aws:iam::999999999999:role/ROLE-RCLONE-WILL-ASSUME
source_profile = 888888888888-service-user-dr-systems-storageteam

The ~/.aws/credentials file has the entry that the config's source_profile refers to

[888888888888-service-user-dr-systems-storageteam]
aws_access_key_id=AKIAstuff
aws_secret_access_key = ssssshItsASecret

Any progress here, or any way I can help there be progress? It appears we have a straightforward conflict between --s3-profile and --s3-endpoint.

Also looks like if we could get the two working together it might solve the long session problem posted here via the caching mechanism.

It looks like using --s3-profile (without an explicit private endpoint) does indeed refresh creds which is cool ... but using private endpoints is a security essential.

Apologies for the delay in replying.

Here is where we set the endpoint

I suspect these lines are the culprit as they remove the Endpoint config which got set a few lines earlier

These lines were added in this commit, and you can see from the commit history that they've been in and out!

Can you try commenting those lines out and see if it works for you?

If that works then we need to find a better fix than those lines are giving.

Any help here much appreciated :slight_smile:

Let me know if you need me to make a build for you.

Yes, thanks Nick! Please do a build ... I'll also try, but I have not done so before and will need to install Go. Let me know when your build is done, and meantime I'll try to find build and toolchain instructions for future experiments, as it seems a simple one-line fix is just the beginning!

Also, I'm not clear from the notes exactly which lines need commenting out, as there were a few involved

Here is a build with the relevant line commented out

v1.62.0-beta.6743.31cb3beb7.fix-s3-endpoint on branch fix-s3-endpoint (uploaded in 15-30 mins)

I'm not sure if it will make a difference but it is a place to start! If you click on the branch link and look at the latest commit you'll see what I did.

Can you think of a way I could test this myself?

I have an AWS account, but I only have shared key auth set up, so I'd need very detailed instructions to set anything different up :slight_smile:

Took a while to set up. No change from the branch: we error out using the v1.62.0-beta.6743.31cb3beb7.fix-s3-endpoint build in the familiar way.

2023/02/28 12:57:13 DEBUG : 2 go routines active
2023/02/28 12:57:13 Failed to copy: SerializationError: failed to unmarshal error message
        status code: 400, request id: 
caused by: UnmarshalError: failed to unmarshal error message
        00000000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version="1|
00000010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 55 54  |.0" encoding="UT|
00000020  46 2d 38 22 3f 3e 0a 3c  45 72 72 6f 72 3e 3c 43  |F-8"?>.<Error><C|
00000030  6f 64 65 3e 49 6e 76 61  6c 69 64 52 65 71 75 65  |ode>InvalidReque|
00000040  73 74 3c 2f 43 6f 64 65  3e 3c 4d 65 73 73        |st</Code><Mess|

Testing should be simple because the failure occurs when specifying an endpoint and profile, even if we specify a standard public endpoint (just tried that). An exact duplicate test would require a custom endpoint, but the standard one seems just as bad.

I'm very happy to work as directly as you can stand on this, as it's become quite an issue and coding around it is proving difficult and expensive! Slack or other screen shares will work fine ... I'm on ET and free until 5:30 PM (changed)

To duplicate, something like this failcase:

rclone copy --s3-profile YOURPROFILE  --s3-acl "" ~/testbed/bigmovie s3-standard-internet:YOURBUCKET-1 -vv
  1. The --s3-acl appears optional assuming the bucket has no ACL
  2. Sourcefile is any old thing
  3. The remote config needs to look something like this:
[s3-standard-internet]
type = s3
provider = AWS
env_auth = true
region = us-east-1
endpoint = https://s3.us-east-1.amazonaws.com
storage_class = STANDARD

The important thing there is to specify an endpoint (I put in the public internet endpoint, which fails as badly as our custom endpoint)

  4. The profile works like this ... I'm going to get real specific because the term profile is badly overloaded and can mean either the name of a section in AWS credentials or a section in AWS config. In our case it must refer to AWS config. I don't want to belabor the point, but as you asked for detail, it works like this:

a. In your AWS account, make a role you can assume to test the capability. The role should allow S3 capabilities. If you're not used to IAM, I can walk you thru the role creation. A role in a second AWS account is what we're doing, but for now let's just get it working with any old role.

b. Identify a bucket in your AWS account. For an even better test, make the bucket policy require the new role. If that's too fiddly for now, any old bucket will do at first.

c. Put your AWS credentials in the AWS credentials file, usually located at `~/.aws/credentials`. Something like this:

[MyCreds]
aws_access_key_id=AKIAXXXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYY

This is not the entry that you'll refer to in the command line!

d. Create an entry in your AWS profile file (usually ~/.aws/config) that names the profile we use on the rclone command line, and that ties together the role we'll assume and the credentials you put in the creds file. Like this:

[profile MyProfile]
role_arn = arn:aws:iam::999999999999:role/YOUR-ROLE-FROM-STEP-1
source_profile = MyCreds
duration_seconds = 900

duration_seconds is optional and good for testing timeouts.

You can see how Amazon's lack of documentation skills messes people up here. We are literally creating a profile (in the config sense) and pointing it at a "source_profile" (in the credentials sense). This is why engineers can benefit from a writing class!

Anyway, flame off. We have a config profile called MyProfile that refers to a credentials entry called MyCreds and says that when invoked we use the MyCreds keys to:

  • Establish a session with MyCreds
  • Assume the role for duration_seconds
  • (the cool part) Cache those creds, and refresh them before they expire.
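The cache-and-refresh step in that last bullet can be sketched like this. A hypothetical structure with names of my own choosing; the real caching lives in the SDK's credential provider:

```python
# Sketch of expiring-credential caching: reuse the cached credentials
# until shortly before they expire, then fetch fresh ones.
# Hypothetical names; the real logic is the SDK's credential provider.
import time

class CachedCredentials:
    def __init__(self, fetch, duration_seconds, window=60):
        self._fetch = fetch            # e.g. a function that calls sts:AssumeRole
        self._duration = duration_seconds
        self._window = window          # refresh this many seconds early
        self._creds = None
        self._expires_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        if self._creds is None or now >= self._expires_at - self._window:
            self._creds = self._fetch()
            self._expires_at = now + self._duration
        return self._creds

calls = []
cache = CachedCredentials(lambda: calls.append(1) or len(calls),
                          duration_seconds=900)
cache.get(now=0)      # first call fetches
cache.get(now=100)    # still valid: served from cache
cache.get(now=900)    # inside the expiry window: refreshed
print(len(calls))     # fetched twice in total
```

A long transfer just keeps calling get(), and the role is silently re-assumed as the duration_seconds window runs out.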

Then just plug it all into a command:

rclone copy --s3-profile MyProfile --s3-acl "" SOMESOURCEFILE s3-standard-internet:YOURBUCKET-1 -vv

and see if it dies.

The easy way to show that it's the presence of endpoint along with an --s3-profile that causes the problem is to try the same copy with a rclone config that has no endpoint specified, e.g.

[s3-standard-none]
type = s3
provider = AWS
env_auth = true
region = us-east-1
storage_class = STANDARD

You should have my work email, or key me with a response and I'll be happy to talk about it, or walk you through.

I did figure out (I think) how to do builds ... easier than I thought, just had to find the docs.

R.

ADDENDUM: Figured out from timestamps you're probably across the pond. Will try to get up early and leave another note in case you want a collaboration to get things going, dogs willing.

OK that is one thing ticked off the list :slight_smile:

OK, let me give that a go. I'm using :s3: which doesn't use the config file here, so I can get everything into the command line.

My ~/.aws/credentials looks like

[default]
aws_access_key_id=XXX
aws_secret_access_key=XXX

[rclone]
aws_access_key_id=YYY
aws_secret_access_key=YYY

[profile MyProfile]
role_arn = arn:aws:iam::123123123:role/s3-full
source_profile = rclone
duration_seconds = 900

Now test, first without endpoint

rclone-v1.61.1 lsf --s3-provider AWS --s3-profile "MyProfile" --s3-env-auth --s3-acl "" :s3:rclone -vv --dump headers --s3-region eu-west-2

Works fine, now with endpoint

rclone-v1.61.1 lsf --s3-provider AWS --s3-profile "MyProfile" --s3-env-auth --s3-acl "" :s3:rclone -vv --dump bodies --s3-region eu-west-2 --s3-endpoint s3.EU-west-2.amazonaws.com

Does not work! And gives the same error - so I have a reproduction :smile:

No idea as to why it is failing, but will continue after lunch!

Thanks so much! All the above aligns with what I'm seeing too, and perhaps there's a way forwards now. I'll check in periodically through the day and tomorrow in the hope that reproduction is the key to success!

R.

I have found the problem...

The first request, to assume the role, should go to sts.amazonaws.com, but if an endpoint is set then it goes to that endpoint instead, which is clearly wrong.

Working (without --s3-endpoint)

2023/03/01 15:35:19 DEBUG : POST / HTTP/1.1
Host: sts.amazonaws.com
User-Agent: rclone/v1.62.0-beta.6752.9baa4d1c3
Content-Length: 151
Authorization: XXXX
Content-Type: application/x-www-form-urlencoded; charset=utf-8
X-Amz-Date: 20230301T153519Z
Accept-Encoding: gzip

Action=AssumeRole&DurationSeconds=900&RoleArn=arn%3Aaws%3Aiam%3A%XXX%3Arole%2Fs3-full&RoleSessionName=XXX&Version=2011-06-15

2023/03/01 15:35:19 DEBUG : HTTP RESPONSE (req 0xc000169900)
2023/03/01 15:35:19 DEBUG : HTTP/1.1 200 OK

Broken (with --s3-endpoint s3.eu-west-2.amazonaws.com)

2023/03/01 15:35:29 DEBUG : HTTP REQUEST (req 0xc0006ae400)
2023/03/01 15:35:29 DEBUG : POST / HTTP/1.1
Host: s3.eu-west-2.amazonaws.com
User-Agent: rclone/v1.62.0-beta.6752.9baa4d1c3
Content-Length: 151
Authorization: XXXX
Content-Type: application/x-www-form-urlencoded; charset=utf-8
X-Amz-Date: 20230301T153529Z
Accept-Encoding: gzip

Action=AssumeRole&DurationSeconds=900&RoleArn=arn%3Aaws%3Aiam%3A%XXX%3Arole%2Fs3-full&RoleSessionName=XXX&Version=2011-06-15

2023/03/01 15:35:29 DEBUG : HTTP RESPONSE (req 0xc0006ae400)
2023/03/01 15:35:29 DEBUG : HTTP/1.1 400 Bad Request
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Wed, 01 Mar 2023 15:35:29 GMT
Server: AmazonS3
X-Amz-Id-2: Ocj8aCVasmvlq2LrKbPqZtZUj64zo1EZOonL4cFE/81mMuMyCUSrzGbzWkbZ+WMwZcQR+T2Tmwk=
X-Amz-Request-Id: ZF315VTSGNPDHP0F

126
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>Missing required header for this request: x-amz-content-sha256</Message><RequestId>ZF315VTSGNPDHP0F</RequestId><HostId>Ocj8aCVasmvlq2LrKbPqZtZUj64zo1EZOonL4cFE/81mMuMyCUSrzGbzWkbZ+WMwZcQR+T2Tmwk=</HostId></Error>

So that all makes sense now. Now how to work around....

OK here is a potential fix!

v1.62.0-beta.6753.c6b0587dc.fix-6443-s3-endpoint on branch fix-6443-s3-endpoint (uploaded in 15-30 mins)

This implements an endpoint resolver so rclone can override only the "s3" service. It seems to work in my testing.
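A rough sketch of the resolver idea, with hypothetical names (the actual fix hooks the SDK's endpoint resolver, not this function):

```python
# Sketch of a service-scoped endpoint resolver: the user's endpoint
# override applies only to the "s3" service; every other service (sts,
# iam, ...) falls through to a default. Hypothetical names only.

def resolve_endpoint(service, region, s3_override=None):
    if service == "s3" and s3_override:
        return s3_override
    if service == "sts":
        # Global STS endpoint, as seen in the working log above.
        return "https://sts.amazonaws.com"
    return "https://{}.{}.amazonaws.com".format(service, region)

# AssumeRole now reaches STS even when an S3 override is set:
print(resolve_endpoint("sts", "eu-west-2",
                       s3_override="https://s3.eu-west-2.amazonaws.com"))
print(resolve_endpoint("s3", "eu-west-2",
                       s3_override="https://s3.eu-west-2.amazonaws.com"))
```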

There may be a problem. The reason we (and others) use custom endpoints is for (a) security, and (b) operation in environments without direct access to the internet.

So, if the first call goes to an STS endpoint, we can't just assume it's the public AWS STS endpoint (and even if we did, we'd have to deal with region). We also cannot use the S3 endpoint we already specified, because the specified --s3-endpoint is precisely an S3 endpoint and endpoints are specific to a service.

Ideally we'd create an STS endpoint in a VPC, and specify that STS endpoint (different from the S3 endpoint, as noted above) for STS calls.

Do you think you might engineer the solution such that we can specify the STS endpoint as a command line or configuration option (ideally command line)? That way we can handle both scenarios. We could even "default" to the regional STS endpoint, but override with the command line or the configuration STS_ENDPOINT option.

Might that work?

Here's a link to some AWS doc on STS endpoints. They almost imply that a public endpoint call may be necessary along with the private VPC endpoint -- I certainly hope not!

I'm doing double duty as a plumber in our kitchen, will get back in a half hour or so

This should use the same STS endpoint as was used without the --s3-endpoint flag. So if it was working without --s3-endpoint it should carry on working.

I don't know how sts endpoints are configured in the environment - maybe they aren't?

An --s3-sts-endpoint flag would be straightforward to implement and you could set that in the config file or on the command line.

Give the code a test when it's built and let me know if you need that :slight_smile:

I'm running rclone-v1.62.0-beta.6753.c6b0587dc.fix-6443-s3-endpoint-linux-amd64 on CentOS7 and it's working in a preliminary test! :grinning: Will push a little harder including some timing tests to check the cached creds.

As far as STS endpoints, I've never done them yet, but will do so in anticipation of being able to use them. Endpoints are typically configured (via console, terraform, api, etc) in a VPC, which is something I can definitely do!


Creating an STS endpoint turned out to be (at least until we test it) as simple as you'd think. Select a VPC in a region, specify STS for the service, pick a decent SG, etc. If we get something to test, I'm ready, although preliminary testing can be done, I think, with the public STS endpoint by direct specification as well as default.

Thanks for the help on the core issue, about to test with some biggish transfers.

So far the large transfers have successfully outlived their DurationSeconds and are continuing ... I'll know for sure in the early morning, but things look good.

Yes, if you can implement the separate --s3-sts-endpoint flag that would be gratefully received and would complete our use cases! If it defaults to the region's public endpoint it should avoid getting in anyone else's way!

I've given that a go here

v1.62.0-beta.6754.59e798204.fix-6443-s3-endpoint on branch fix-6443-s3-endpoint (uploaded in 15-30 mins)

That adds an --s3-sts-endpoint flag and a corresponding config file entry, sts_endpoint. This seems to work in my testing!
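The behaviour of the flag pair can be sketched as two service-scoped overrides. Hypothetical function and hostnames, not rclone's actual code:

```python
# Sketch of the --s3-endpoint / --s3-sts-endpoint pair: each override
# applies only to its own service, so a private S3 endpoint never
# captures the AssumeRole call. Hypothetical names and hostnames.

def resolve_endpoint(service, region, s3_endpoint=None, sts_endpoint=None):
    if service == "s3" and s3_endpoint:
        return s3_endpoint
    if service == "sts":
        # Defaults to the global STS endpoint unless overridden.
        return sts_endpoint or "https://sts.amazonaws.com"
    return "https://{}.{}.amazonaws.com".format(service, region)

print(resolve_endpoint("sts", "us-east-1",
                       s3_endpoint="https://bucket.vpce-xxx.s3.us-east-1.vpce.amazonaws.com",
                       sts_endpoint="https://sts.us-east-1.vpce.amazonaws.com"))
```

With both overrides set, S3 traffic and STS traffic can each stay inside the VPC on their own interface endpoints, which covers the no-internet use case above.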