Rclone with Amazon S3 access point

What is the problem you are having with rclone?

When trying to use rclone to access an S3 access point, it fails with a Signature Version 4 error.

./rclone ls afs1-ap:<redacted>-s3alias/
2023/10/18 14:13:38 Failed to ls: InvalidRequest: The authorization mechanism you have provided is not supported. Please use Signature Version 4.
        status code: 400, request id: <redacted>, host id: <redacted>

The AWS CLI works correctly with the access point, from the same place.

aws s3 ls <redacted>-s3alias --region af-south-1
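
FWIW, the lower-level S3 API calls via the CLI also accept an access point ARN in place of the bucket name - a quick illustration (the ARN here is a placeholder):

aws s3api list-objects-v2 \
  --bucket arn:aws:s3:af-south-1:123456123456:accesspoint/my-access-point \
  --region af-south-1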

Run the command 'rclone version' and share the full output of the command.

./rclone version
rclone v1.63.1
- os/version: amazon 2 (64 bit)
- os/kernel: 4.14.322-246.539.amzn2.x86_64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.6
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Amazon S3, but via an Access Point.

The S3 bucket and the access point are created in a different account from mine.
I am trying to access it from my account using the access point alias.

Note: I have also tried using the Access Point ARN, but rclone rejects it as invalid (maybe all the colons etc. are confusing it). The Access Point Alias does not have special characters, and since it works with the AWS CLI tool, I thought I would log this issue with the same context - using the access point alias rather than the ARN.

Here is roughly what I get when trying the AP ARN:

./rclone -vvv ls afs1-ap:arn:aws:s3:af-south-1:123456123456:accesspoint/my-access-point

<output>
2023/10/18 14:22:50 Failed to ls: InvalidARNError: invalid ARN
caused by: invalid Amazon s3 ARN, resource-id not set, arn:aws:s3:af-south-1:123456123456:accesspoint

Note: I am not using the more recent "Cross Account Access Point" feature, where the bucket and the access point are in different accounts. I am using the older "standard" access point described above, where the bucket and access point are in the same account (a different account from mine) and have been shared with my account (specifically, with my VPC in my account).

I have tried specifying the standard access point endpoint for my region, as documented in the AWS S3 documentation.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

./rclone -vvvvv ls afs1-ap:<redacted>-s3alias/

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

 ./rclone config redacted
Command config needs 0 arguments maximum: you provided 1 non flag arguments: ["redacted"]
./rclone config file
Configuration file is stored at:
/home/ec2-user/.config/rclone/rclone.conf

cat /home/ec2-user/.config/rclone/rclone.conf

[afs1-ap]
type = s3
provider = AWS
no_check_bucket = true
server_side_encryption = aws:kms

region = af-south-1
# location_constraint = af-south-1
endpoint = https://s3-accesspoint.af-south-1.amazonaws.com

A log from the command that you were trying to run with the -vv flag

./rclone -vvvvv ls afs1-ap:<redacted>-s3alias/

2023/10/18 14:09:03 DEBUG : rclone: Version "v1.63.1" starting with parameters ["./rclone" "-vvvvv" "ls" "afs1-ap:<redacted>-s3alias/"]
2023/10/18 14:09:03 DEBUG : Creating backend with remote "afs1-ap:<redacted>-s3alias/"
2023/10/18 14:09:03 DEBUG : Using config file from "/home/ec2-user/.config/rclone/rclone.conf"
2023/10/18 14:09:03 DEBUG : name = "afs1-ap", root = "<redacted>-s3alias/", opt = &s3.Options{Provider:"AWS", EnvAuth:false, AccessKeyID:"", SecretAccessKey:"", Region:"af-south-1", Endpoint:"https://s3-accesspoint.af-south-1.amazonaws.com", STSEndpoint:"", LocationConstraint:"", ACL:"", BucketACL:"", RequesterPays:false, ServerSideEncryption:"aws:kms", SSEKMSKeyID:"", SSECustomerAlgorithm:"", SSECustomerKey:"", SSECustomerKeyBase64:"", SSECustomerKeyMD5:"", StorageClass:"", UploadCutoff:209715200, CopyCutoff:4999341932, ChunkSize:5242880, MaxUploadParts:10000, DisableChecksum:false, SharedCredentialsFile:"", Profile:"", SessionToken:"", UploadConcurrency:4, ForcePathStyle:true, V2Auth:false, UseAccelerateEndpoint:false, LeavePartsOnError:false, ListChunk:1000, ListVersion:0, ListURLEncode:fs.Tristate{Value:false, Valid:false}, NoCheckBucket:true, NoHead:false, NoHeadObject:false, Enc:0x3000002, MemoryPoolFlushTime:60000000000, MemoryPoolUseMmap:false, DisableHTTP2:false, DownloadURL:"", DirectoryMarkers:false, UseMultipartEtag:fs.Tristate{Value:false, Valid:false}, UsePresignedRequest:false, Versions:false, VersionAt:fs.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Decompress:false, MightGzip:fs.Tristate{Value:false, Valid:false}, UseAcceptEncodingGzip:fs.Tristate{Value:false, Valid:false}, NoSystemMetadata:false}
2023/10/18 14:09:03 DEBUG : Resolving service "s3" region "af-south-1"
2023/10/18 14:09:03 DEBUG : fs cache: renaming cache item "afs1-ap:<redacted>-s3alias/" to be canonical "afs1-ap:<redacted>-s3alias"
2023/10/18 14:09:03 DEBUG : 4 go routines active
2023/10/18 14:09:03 Failed to ls: InvalidRequest: The authorization mechanism you have provided is not supported. Please use Signature Version 4.
        status code: 400, request id: <redacted>, host id: <redacted>

I saw a similar topic was marked as resolved, but unfortunately specifying the endpoint didn't help me.

You don't appear to have any auth specified? Try adding env_auth = true
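
e.g. one way to set that without editing the file by hand - a sketch using the remote name from your config:

./rclone config update afs1-ap env_auth true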

Wow, @ncw thank you for spotting that!
(And in general, thanks for the quick response and all the contributions on this amazing piece of software)

It works perfectly now.
I will paste my working configuration below for others who find this post.

One suggestion from my side, which I'm happy to log as a feature request if you agree it would be helpful...

Most common AWS tools use similar credential providers, including checking the environment by default. I work with these tools a lot, so I'm used to how they operate. This, plus the cryptic error message bubbling up from (I assume) the AWS SDK, is why I didn't straight away realise I needed to set env_auth = true.
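
To illustrate what I mean by "checking the environment": with env_auth = true, rclone picks up the same standard sources the SDK does - a sketch with placeholder values:

export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
export AWS_SECRET_ACCESS_KEY=example-secret-key
# or point at a named profile in ~/.aws/credentials
export AWS_PROFILE=my-profile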

Because of my background, I think that needing to set env_auth is a bit of an "unexpected" nuance of rclone.
It could be helpful to call this out in the debug logging somewhere, if possible, to guide users like me.

I suggest two ways to do this:

  1. If using the AWS provider and env_auth is not true (and maybe if no other auth-related options are specified), print a debug message specifically mentioning this.
  2. Catch the cryptic error from the SDK and print a debug message suggesting something like "did you mean to set env_auth = true?"

Either way, thanks again for the help :slight_smile:

And one last suggestion...

For AWS, the suffix -s3alias in bucket names is reserved specifically for Access Point aliases; a user cannot create a bucket with that suffix. This means rclone could perhaps determine the default access point endpoint to use automatically.
(This could also be done from the Access Point ARN, if that is supported one day.)
ref Bucket naming rules - Amazon Simple Storage Service
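
As a rough illustration of the idea (a pure sketch, not rclone code - names are made up):

# if the "bucket" is actually an access point alias, the regional
# access point endpoint could be inferred from it
bucket="my-access-point-abc123xyz-s3alias"   # illustrative alias
region="af-south-1"
if [[ "$bucket" == *-s3alias ]]; then
  endpoint="https://s3-accesspoint.${region}.amazonaws.com"
fi
echo "$endpoint"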

Working configuration pasted below.

Note the custom endpoint needed for the S3 access point; this is per the AWS S3 documentation mentioned above, for my region.

cat ~/.config/rclone/rclone.conf

[afs1-ap]
type = s3
provider = AWS
no_check_bucket = true
server_side_encryption = aws:kms
region = af-south-1
endpoint = https://s3-accesspoint.af-south-1.amazonaws.com
env_auth = true

Sample usage. The Access Point ARN does not work, but the Access Point alias does, which is good enough.

rclone ls afs1-ap:my-access-point-<redacted>-s3alias/some-prefix

Note: If you then get "Access Denied", it is likely a missing or incorrect AWS IAM permission.
The AWS documentation does not clearly mention this, but the following three authorizations may be involved:

  • Bucket policy on the bucket
  • Access point policy on the access point
  • IAM policy on the consumer role/user, referencing the S3 bucket/object resources

To elaborate on the third one: the consumer's IAM policy must allow access to the S3 bucket/object resources even though an access point is being used - it does not work if the access point resource is referenced instead.

    # terraform for the policy on my IAM role
    # (wrapped in a policy document here so it is complete; names are illustrative)
    data "aws_iam_policy_document" "s3_access" {
      statement {
        actions = [
          "s3:List*",
          "s3:Get*"
        ]
        resources = [
          "arn:aws:s3:::my-bucket-not-my-access-point",
          "arn:aws:s3:::my-bucket-not-my-access-point/*"
        ]
      }
    }

Glad you got it working :slight_smile:

If I was starting rclone from scratch again today, I'd make env_auth = true the default and turn it off if you provided any other auth.

We could almost do this without breaking anything today, except for anonymous access, which doesn't need any auth...

A log message is not a bad idea, though detecting whether there is auth in the environment isn't easy. It runs the risk of a false positive with anonymous access.

This message is quite odd! What it should say, I think, is that no authorization was found. If you try this on a normal bucket you'll get "AccessDenied: Access Denied", which is a bit more of a hint that there is something wrong with the authorization.

So I think this particular message is an oddity of S3 access points. They are looking for v4 auth, and if there isn't any, they give this message.

Ah, interesting.

Maybe there should be a config flow to help users with this. How common are S3 access points? I'm not really sure what they are for - can you give a simple explanation?

You are accessing the ARN as if it were a bucket... I'm not sure that will work with rclone, but there is probably some way of giving it to the S3 SDK and getting it to do something useful with it!

Thanks for your useful notes - I'm sure those who follow will find them invaluable!

Thanks for the feedback :slight_smile:

Maybe a message like this - regardless of the environment state or any auth settings provided. It simply states to the user what env_auth = false does. To me, it sounds compatible with anonymous access as well.

2023/10/18 14:09:03 DEBUG : Since env_auth=false, the runtime environment is not used for authentication.

This could well be true; in general, S3 access-related messages can be cryptic or misleading. Not an rclone problem.

A simple explanation is unfortunately not a short one :smiley:

S3 access points are a relatively new feature.

The problem
With large organisations using many AWS accounts, managing cross-account access to S3 buckets can become a headache. The options are bucket policies (with a short 20 kB length limit), IAM users with IAM policies, or IAM roles with IAM policies. Provisioning what should be "simple" access therefore takes quite a bit of testing (what does the application support?) and usually ends in spaghetti (a mix of these methods in practice).

Most of the time, IAM users are not allowed because of the risk of static long-lived credentials. So then it comes down to bucket policy or IAM role policy.

If you allow access via the bucket policy, it fills up real quick. Worse, since there can only be one policy per bucket, it often means a single team or repo managing it, and that doesn't scale very well - especially if each consumer needs their own fine-grained permissions in the policy...

If you go with an IAM role policy, you need to configure this in both the source and target accounts, and then "jump into" the role - assuming the application can do this.
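
By "jump into" I mean an explicit role assumption, something like this (role ARN illustrative):

aws sts assume-role \
  --role-arn arn:aws:iam::123456123456:role/bucket-consumer \
  --role-session-name s3-test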

A solution?
Access points are basically aliases for a bucket.

A bucket can have many access points defined - for example, maybe you create two per consumer: one for "read" and one for "read write".
Then each access point can have a policy on it that controls that consumer's specific low-level access within the bucket.
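
For a feel of the mechanics, creating one is a single API call - a sketch with made-up names (the optional --vpc-configuration ties the access point to one VPC, which becomes relevant below):

aws s3control create-access-point \
  --account-id 123456123456 \
  --name consumer-a-read \
  --bucket my-bucket \
  --vpc-configuration VpcId=vpc-0abc123def456789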

The net permissions are still determined by the combination of bucket policy + access point policy + IAM policy.

This still ends up with some spaghetti, but it's a bit less than before. Not a complete solution.

One useful thing: you can restrict an access point to a specific VPC in a consumer account. So you could have a backend VPC with things that are allowed to talk to the bucket, and a frontend VPC which is not.

Another solution?

Since March 2023 there are also "cross account access points", which work the same way except the access point can be created in the consumer's account rather than co-located with the bucket in its account. The consumer can see and manage their own fine-grained permissions for their application in their own account, without needing access to anything in the bucket owner's account.

ref Simplify and scale access management to shared datasets with cross-account Amazon S3 Access Points | AWS Storage Blog

TL;DR
It's a way to implement cross-account access to buckets that scales better in a large organisation, particularly if you have a "central" account with many consumers, e.g. a "data lake" pattern.

I think it will become a more common pattern in future at least in large orgs :slight_smile:

I tried this, which gives a debug message if rclone detects anonymous credentials. Can you give it a go and see if you think it would have helped you?

v1.65.0-beta.7449.debbb4ccb.fix-s3-auth-warning on branch fix-s3-auth-warning (uploaded in 15-30 mins)

Thank you for your explanation of S3 access points. I think I now grasp roughly what they are for. Yes, they do sound like the sort of thing that big enterprises will love, and I suspect they will become more common.

Looking at your config, you have the endpoint set explicitly.

When you run through the interactive S3 config for AWS, it doesn't show you the endpoint list at all at the moment.

Option endpoint.
Endpoint for S3 API.
Leave blank if using AWS to use the default endpoint for the region.
Enter a value. Press Enter to leave empty.
endpoint> 

What I could do is add the list of access point endpoints there and change the message to something like

Leave blank if using AWS to use the default endpoint for the region.

It is not necessary to set this unless you are using an Access Point.

If you are using an Access Point then please pick the correct one for your region.

That would be relatively easy - do you think that would be enough help? Any suggestions on the wording? Do you think it would confuse the majority of users who don't use access points?

Still working on this :wink:

Meanwhile, a correction to the IAM policy required on the consumer side.
Both the bucket AND the access point resources are needed.

    # terraform for the corrected policy on my IAM role
    # (wrapped in a policy document here so it is complete; names are illustrative)
    data "aws_iam_policy_document" "s3_access" {
      statement {
        actions = [
          "s3:List*",
          "s3:Get*"
        ]
        resources = [
          # Access point resources
          "arn:aws:s3:af-south-1:<account with bucket>:accesspoint/<access point name>",
          "arn:aws:s3:af-south-1:<account with bucket>:accesspoint/<access point name>/object/*",
          # Bucket resources
          "arn:aws:s3:::my-bucket-not-my-access-point",
          "arn:aws:s3:::my-bucket-not-my-access-point/*"
        ]
      }
    }

Circling back to this :slight_smile:

I've tested it and it is perfect - it would definitely have helped me pick up my configuration issue. And it does not break anything.
Thank you!

So at first I didn't understand why this didn't work by default, as surely the AWS CLI uses the same AWS SDK and it "just works". Plus, looking at the rclone code, it should fall back to what the SDK does... This led me to re-test what I was doing.

And (of course) after I removed the endpoint= from my rclone.conf, rclone was still working perfectly against the access point.

So I don't think any further changes are necessary at this point; the interactive config is clear enough. To be honest, I've also not been using it.

Falling back to the AWS SDK is perfect for the case when the default values change.

If anything else pops up I'll let you know!

Thanks again for the help.

Excellent!

Great.

I've merged the change above to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.65.
