Securely Calling AWS APIs From Kubernetes

How does a workload on Kubernetes securely communicate with AWS APIs? If you're running EKS this is pretty trivial, since EKS has been able to generate credentials for AWS APIs for quite some time. But what if you're on an on-prem cluster or running in a different cloud? In this blog post we're going to look at several solutions to this problem, depending on what kind of infrastructure you're running.

Option 1 - Static Credentials

The easiest way to go is to generate a user with a static API key and secret. This method is simple, but it is also the most likely to lead to a compromised credential. There's a reason the AWS console throws up all kinds of warnings against this approach. When using a static credential with AWS, you'll want to:

Put as many restrictions as is feasible on the credentials, such as restricting policies by source IP and limiting which services they can reach.
Rotate often. It's best to use a secrets vault to store these credentials so that it's easier to track and rotate them.
Monitor usage to ensure bad actors haven't started using the credentials.
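
As one way to apply the first restriction, the sketch below prints an IAM policy that only allows read access to a single S3 bucket, and only from one source network. The function name, bucket name, and CIDR range are placeholders chosen for illustration, not values from this post; aws:SourceIp is a standard IAM global condition key.

```shell
# Print an IAM policy that allows read-only access to one S3 bucket,
# and only when the request comes from the given source CIDR.
# $1: bucket name (placeholder), $2: allowed source CIDR (placeholder)
restricted_s3_policy() {
  local bucket="$1" cidr="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ],
      "Condition": {
        "IpAddress": { "aws:SourceIp": "${cidr}" }
      }
    }
  ]
}
EOF
}
```

For example, running restricted_s3_policy my-example-bucket 203.0.113.0/24 > /tmp/policy.json writes a policy document you could attach to the user that owns the static key.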

While the static credentials route is the easiest to get started with, it adds quite a bit of complexity over the long term to keep it secure. You could build an operator (or use an existing one) to limit access to these credentials, but what you're really doing is creating another attack vector into your infrastructure. It's best to avoid this route.

Option 2 - Identity Federation

AWS provides a very robust framework for integrating external identities so you can avoid static credentials. It's built around the Security Token Service (STS), which lets you exchange a JSON Web Token (JWT) or a SAML2 assertion for an AWS session associated with an IAM role, provided the role's conditions are met. For OpenID Connect, setting up an AWS identity provider is pretty straightforward:

Create an IAM Role with the permissions you want identities from your identity provider to have. For instance, create a Role that lets a workload push artifacts into S3.
Create an IAM Identity Provider with your identity provider's issuer URL, which AWS uses to locate the OIDC discovery document.
Associate your IAM Identity Provider with the IAM Role you created.
Have your workload call AWS' STS to get an IAM session.
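
The association between the identity provider and the role is expressed as the role's trust policy. Here is a minimal sketch of generating one, assuming an illustrative account id of 111122223333, an example issuer host, and an audience of aws. The function name and all three values are hypothetical; the condition key format (issuer host followed by :aud) is how IAM names OIDC claims.

```shell
# Print a trust policy for an IAM role that can be assumed with a JWT
# issued by an OIDC identity provider registered in IAM.
# $1: AWS account id
# $2: issuer host (and path, if any) without the "https://" prefix
# $3: audience the JWT must carry
oidc_trust_policy() {
  local account_id="$1" issuer_host="$2" audience="$3"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${issuer_host}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": { "${issuer_host}:aud": "${audience}" }
      }
    }
  ]
}
EOF
}
```

Saved to a file, the output can be passed to aws iam create-role with --assume-role-policy-document, the same command used later in this post for the SAML2 role.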

There's no need to establish a trust based on shared secrets between AWS and your workload because there are already multiple layers of security to establish trust:

Your identity provider's OIDC discovery document points to the public keys that can be used to verify your JWTs. It's safe for these keys to be public because they can only verify a JWT, not sign one.
Your identity provider's OIDC discovery document is hosted behind your TLS certificate, which has to be signed by a commercial CA. This means that the document retrieved by AWS came from the service you specified during your configuration.
AWS takes thumbprints of the certificates, so even if you rotate your keys you'll need to reestablish the trust with AWS.
When configuring AWS, your OIDC document MUST be served from the issuer's "well-known" address. For instance, if your issuer were "https://myidp.aws.tremolo.dev" then your OIDC discovery document MUST be hosted at "https://myidp.aws.tremolo.dev/.well-known/openid-configuration". This way, you couldn't generate tokens issued from a domain that you don't own.
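
The relationship between issuer and discovery URL is mechanical, so it's easy to check what AWS will try to fetch. A small sketch (the function name is made up for this example):

```shell
# Derive the OIDC discovery URL that must be served for a given issuer.
# $1: issuer URL, exactly as it appears in the token's iss claim
discovery_url() {
  local issuer="${1%/}"   # drop one trailing slash, if present
  printf '%s/.well-known/openid-configuration\n' "$issuer"
}
```

Calling discovery_url "https://myidp.aws.tremolo.dev" prints https://myidp.aws.tremolo.dev/.well-known/openid-configuration, which you can then fetch with curl to confirm it is reachable from outside your network.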

Now that we know how AWS can leverage an external identity using an STS, let's look at some ways that you can generate identities to use with AWS.

Kubernetes TokenRequest API

Kubernetes is able to generate JWTs both for itself and for third parties, like AWS, using the TokenRequest API. Before we dig into the how, let's walk through what the TokenRequest API is.

When Kubernetes was first released, every Pod had an identity. The identity of the Pod was a JWT associated with a ServiceAccount that was generated by the API server and stored as a Secret inside of Kubernetes' etcd database. While this JWT could be regenerated by deleting and recreating the ServiceAccount it was associated with, the JWT itself had no expiration. In addition to never expiring, these tokens were associated with a ServiceAccount, not an individual Pod. If your Deployment had ten Pods, they all shared the same identity, so if one was compromised it was impossible to know which. Finally, these tokens were often abused by generating them and then handing them to users or external systems. I wrote a previous blog post about why this is an anti-pattern.

Due to the issues with ServiceAccount tokens, Kubernetes introduced the TokenRequest API to provide a way for Kubernetes to generate a unique, short-lived identity for each Pod. A JWT is generated for each Pod but is not stored inside of Kubernetes' etcd database. Also, when the Pod is destroyed, the JWT is no longer accepted by the API server. Using the TokenRequest API became the standard starting with Kubernetes 1.24.
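
To see the TokenRequest API in action outside of a Pod, kubectl create token (available since kubectl 1.24) asks the API server for a token the same way. Below is a small wrapper around it; the function name, and the namespace and ServiceAccount you pass in, are hypothetical.

```shell
# Request a short-lived ServiceAccount token through the TokenRequest API.
# $1: namespace, $2: ServiceAccount name, $3: audience (defaults to "aws")
request_sa_token() {
  local namespace="$1" service_account="$2" audience="${3:-aws}"
  # 10m is the shortest lifetime the API server will issue.
  kubectl create token "$service_account" \
    --namespace "$namespace" \
    --audience "$audience" \
    --duration 10m
}
```

Note that a token requested this way is bound to the ServiceAccount rather than to a specific Pod, unless you also pass kubectl's --bound-object-kind and --bound-object-name flags.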

Now that we know how the TokenRequest API came to be, let's explore how you can use it to work with AWS. First, you need to enable access to the OIDC discovery document to unauthenticated users:

kubectl create clusterrolebinding oidc-reviewer --clusterrole=system:service-account-issuer-discovery --group=system:unauthenticated

The above command creates a ClusterRoleBinding that allows unauthenticated users to access the discovery document. You can now get your API server's OIDC discovery document:

curl --insecure https://192.168.2.23:6443/.well-known/openid-configuration 2>/dev/null | jq -r
{
  "issuer": "https://kubernetes.default.svc.cluster.local",
  "jwks_uri": "https://192.168.2.23:6443/openid/v1/jwks",
  "response_types_supported": [
    "id_token"
  ],
  "subject_types_supported": [
    "public"
  ],
  "id_token_signing_alg_values_supported": [
    "RS256"
  ]
}

There are a few issues with this document for use with AWS. First, the issuer, which becomes the iss claim in tokens, is "https://kubernetes.default.svc.cluster.local", which isn't the URL of our cluster. We can set a new issuer by adding the "--service-account-issuer" flag to our API server, but that assumes you have the ability to change API server flags, and most managed clusters don't give you that. Also, since we had to allow unauthenticated access via the above RBAC binding, we may have created a security issue. You could overcome this by setting up a reverse proxy that has a constrained ServiceAccount, but even then you need a public-facing URL and proxy to make this document work with AWS' identity provider configuration.

Assuming you've engineered a system that works for you to expose your OIDC discovery document to the internet, the next thing is to project a token into your Pod:

    volumes:
    - name: oidc-token
      projected:
        sources:
          - serviceAccountToken:
              path: oidc-token
              expirationSeconds: 7200
              audience: aws

In the above YAML, we're specifying an audience for the generated token, and the TokenRequest API will mount a token that is good for two hours. The minimum time for a token is ten minutes. Assuming you are able to publish your OIDC discovery document on the web, this token can now be used to get an AWS identity.
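
Before handing the token to AWS, it can help to look at the claims it actually carries. The helper below is a rough sketch that decodes a JWT's payload without verifying the signature, so it's only useful for inspection. The function name is invented for this example, and it assumes a base64 command that accepts -d.

```shell
# Print the decoded payload (claims) of a JWT. The signature is NOT verified.
# $1: the JWT as a string
jwt_payload() {
  local payload
  # The payload is the second dot-separated segment, base64url encoded.
  # Map the URL-safe alphabet back to the standard base64 alphabet.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the trailing "=" padding; restore it before decoding.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}
```

Inside the Pod you would run something like jwt_payload "$(cat /var/run/secrets/tokens/oidc-token)", where the directory is whatever mountPath you chose for the volume (the path here is an assumption, since the YAML above only defines the volume). You'd expect to see aws in the aud claim and an exp roughly two hours after iat.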

While this option can be made to work, it's quite difficult to do it securely. You would need to:

Configure your API Server to generate JWTs for appropriate issuers
Expose your OIDC discovery document on the internet
Mount a TokenRequest API volume to your Pod
Use the mounted token to get an AWS session

The first two steps can be problematic for multiple reasons. First, you may not have the ability to update your API server's parameters. Next, you need a secure way to expose your discovery document. Finally, you need to host your OIDC discovery document publicly on the internet, which especially for on-prem clusters is a non-starter. This feature is great for internal resources a cluster may need to interact with, such as a vault, but not for accessing AWS.
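
If you do get the first three steps working, the last one is a single STS call. Here is a sketch using the AWS CLI; the function name, the default session name, and the role ARN and token path you pass in are all placeholders, while the subcommand and flags are the CLI's own.

```shell
# Exchange a projected ServiceAccount token for temporary AWS credentials.
# $1: ARN of the IAM role to assume
# $2: path to the projected token file inside the Pod
# $3: session name (optional, defaults to "k8s-workload")
assume_role_with_projected_token() {
  local role_arn="$1" token_file="$2" session_name="${3:-k8s-workload}"
  # 900 seconds is the shortest session STS will issue.
  aws sts assume-role-with-web-identity \
    --role-arn "$role_arn" \
    --role-session-name "$session_name" \
    --web-identity-token "$(cat "$token_file")" \
    --duration-seconds 900 \
    --output json
}
```

The JSON response contains a Credentials object with the access key, secret key, and session token. Alternatively, the AWS CLI and SDKs will perform this exchange on their own when the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables are set in the container.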

Having explored how the TokenRequest API can be used to interact with AWS, let's look at how the SPIRE project can be used.

SPIRE and SPIFFE

The Secure Production Identity Framework for Everyone, or SPIFFE, is a standard that was created to generate product-agnostic identities for any workload. The SPIRE project is a graduated Cloud Native Computing Foundation project that implements the SPIFFE framework. It can be used to provide any workload an identity in the form of a JWT or a PKI key pair. Since Amazon can consume an online OIDC discovery document and accept JWTs that are covered by that document, you can use SPIRE to generate JWTs that AWS will trust to generate AWS session keys. There are two drawbacks to this approach:

You need to deploy SPIRE, which requires its own infrastructure and domain knowledge
For AWS to consume SPIRE's OIDC discovery document, it must be hosted on the internet

SPIRE is a great project, and the KubeCon NA 2019 talk "Multi-Cloud Workload Identity with SPIFFE" does a great job of showing how SPIRE can be used for this very use case. Deploying SPIRE will provide your build infrastructure with significant benefits. The question is whether you can deploy it securely, and quickly enough to make it useful for your workloads. If you have the resources to deploy SPIRE for your workloads, then you should absolutely do it.

OpenUnison as a Security Token Service (STS)

OpenUnison is what's known as an "Identity Proxy", meaning that it can translate between two different identity systems. For instance, if you're using GitHub, OpenUnison translates a GitHub session into an OpenID Connect or impersonation session so you can access your Kubernetes cluster. The same functionality can be used to translate a Pod's ServiceAccount token into a token that can be consumed by AWS. The first question you might ask is, "How is this different than using the TokenRequest API? Won't I have the same limitations?" The difference is that instead of generating an OpenID Connect token, OpenUnison will generate a SAML2 assertion for the AWS STS. Unlike OIDC IAM Identity Providers, SAML2 IAM Identity Providers require that you upload your SAML2 metadata, so AWS never has to fetch anything from your cluster. This means that using OpenUnison, you don't need to make your cluster accessible from the internet!

Let's walk through creating a Pod that's able to interact with S3 without needing an AWS identity of its own. The first step is to deploy OpenUnison with your authentication system of choice. Once OpenUnison is deployed we need to do a few things:

Create a SAML2 identity provider in OpenUnison
Generate the metadata from the identity provider and deploy an AWS IAM SAML2 Identity Provider
Create an Application that uses the ScaleJS Token system to generate an AWS session

First, create your identity provider:

kubectl apply -f https://gist.githubusercontent.com/mlbiam/facea0acecfcd7fbfb08f44b0c8f305e/raw/3a0af30c81901bb842f716045791d79a5d422cc5/aws-saml1-idp.yaml

With the identity provider created in OpenUnison, we can download its metadata and register it with AWS using the aws cli:

curl --insecure https://k8sou.192-168-2-24.nip.io/auth/forms/saml2_idp_metadata.jsp\?idp\=aws > /tmp/aws-saml2.xml
aws iam create-saml-provider --name kube-saml2 --saml-metadata-document "$(< /tmp/aws-saml2.xml)"
{
    "SAMLProviderArn": "arn:aws:iam::252245117542:saml-provider/kube-saml2"
}

Hold on to the SAMLProviderArn; we'll need it in a moment. Use it to set the Federated option in the AWS trust document below, and update SAML:iss with your OpenUnison host. Mine looks like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::XXXXXXXXXXXXXXX:saml-provider/kube-saml2"
            },
            "Action": "sts:AssumeRoleWithSAML",
            "Condition": {
                "StringEquals": {
                    "SAML:iss": "https://k8sou.192-168-2-24.nip.io/auth/idp/aws"
                }
            }
        }
    ]
}

Save this file to /tmp/trust.json. Next we'll create an IAM role:

aws iam create-role --role-name kube-saml2 --assume-role-policy-document "$(< /tmp/trust.json )"
{
    "Role": {
        "Path": "/",
        "RoleName": "kube-saml2",
        "RoleId": "AROATVOX3BJTEDCFFW6ER",
        "Arn": "arn:aws:iam::XXXXXXX:role/kube-saml2",
        "CreateDate": "2023-04-04T22:07:10Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Federated": "arn:aws:iam::XXXXXXXX:saml-provider/kube-saml2"
                    },
                    "Action": "sts:AssumeRoleWithSAML",
                    "Condition": {
                        "StringEquals": {
                            "SAML:iss": "https://k8sou.192-168-2-24.nip.io/auth/idp/aws"
                        }
                    }
                }
            ]
        }
    }
}

Finally, attach your policy. I created a simple policy that lets me list the contents of an S3 bucket:

aws iam attach-role-policy --role-name kube-saml2  --policy-arn 'arn:aws:iam::XXXXXXXXX:policy/unit-test-s3-read-only'

The last step is to set up our token endpoint where we can get our AWS session. First, download the token endpoint's application manifest. Next, update the idpName and the roleName with the ARN of your idp and role. Then add the manifest to your Kubernetes cluster. We're now ready to get our token.

I'm going to cheat a bit to keep this post shorter and just run the test from my workstation. First, let's make sure I don't have an existing set of credentials:

➜  ~ aws s3 ls s3://ou-unit-test-bucket
Unable to locate credentials. You can configure credentials by running "aws configure".
➜  ~ aws sts get-caller-identity
Unable to locate credentials. You can configure credentials by running "aws configure".
➜  ~

Next, let's get a token from our OpenUnison container:

export TOKEN="$(kubectl exec $(kubectl get pods -l app=openunison-orchestra -n openunison -o json | jq -r '.items[0].metadata.name') -n openunison -- cat /var/run/secrets/kubernetes.io/serviceaccount/token)"

With our token in hand, let's get our session:

curl  --insecure -H "Authorization: Bearer $TOKEN" https://k8sou.192-168-2-24.nip.io/aws/token/user 2>/dev/null | jq -r '.token["Set Environment Variables"]' > /tmp/creds

This command does quite a bit. It uses our container's token to call the /aws/token/user endpoint, which we just deployed into OpenUnison above. This URL returns JSON with our AWS session. We pipe that into a file so that we can run it to set our environment. Next, we run that file to set up our environment and check using the aws command:

➜  ~ chmod +x /tmp/creds
➜  ~ . /tmp/creds
➜  ~ aws sts get-caller-identity
{
    "UserId": "AROATVOX3BJTEDCFFW6ER:system_serviceaccount_openunison_openunison-orchestra",
    "Account": "252245117542",
    "Arn": "arn:aws:sts::XXXXXXXXX:assumed-role/kube-saml2/system_serviceaccount_openunison_openunison-orchestra"
}
➜  ~ aws s3 ls s3://ou-unit-test-bucket
                           PRE test-folder/

Now, for the next fifteen minutes, we can do anything that our policy will let us, all without any static keys! This is a big security boon for any pipeline and didn't require setting up an internet accessible identity provider or special command line tools.
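
In a pipeline, it's worth failing fast if the session didn't load rather than discovering it on the first AWS call. The check below is a sketch that assumes the script written to /tmp/creds exports the three standard variables the AWS CLI reads for temporary credentials (the post doesn't show that file's contents); the function name is made up.

```shell
# Succeed only if the standard AWS session variables are set and non-empty.
aws_session_env_ready() {
  local name value
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "missing ${name}" >&2
      return 1
    fi
  done
  return 0
}
```

A pipeline step could then run . /tmp/creds && aws_session_env_ready before calling any aws command, and request a fresh session once the fifteen minutes are up.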

We covered quite a bit in this section, so let's do a quick review. First, we set up an AWS SAML2 identity provider in IAM so that we can generate an AWS session. We then configured the token service to use that AWS identity provider to quickly get an AWS session based on a Kubernetes ServiceAccount, and finally we were able to retrieve our AWS session and do something useful.

What's Next?

Assuming you decide to use OpenUnison to translate between your Kubernetes identity and AWS, you can customize which ServiceAccounts are allowed to interact with the endpoint. You can also set up multiple endpoints for different scenarios. If you want to run this approach, but want a support contract, please don't hesitate to contact us!