
Amazon EKS and Identity, 2.0

March 6, 2024

by Marc Boorshtein

In the past, we've written blog posts on EKS The Easy Way and how to integrate Okta and EKS pretty easily. In December 2023 Amazon released several new features for integrating EKS into the AWS IAM framework. Let's walk through what's changed in how you access your EKS clusters, and how Kubernetes identity and AWS IAM relate to each other. Before we go too far, I want to thank Micah Hausler, Principal Engineer at AWS, for reviewing this article and keeping me honest!

Mapping from AWS IAM to Kubernetes RBAC

Before we dive too deep into how to configure Kubernetes to understand AWS IAM, let's take a look at these two different identity systems and how they relate to each other. This will make it much easier to work through the different options and how to choose which one is the right one for you.

First off, let's make sure we separate out the nomenclature. Kubernetes and AWS IAM have their own vocabulary, and if you're more used to one or the other, you may get lost in the overlap. Let's start with each system's definition of a Role.

Kubernetes Role - A Kubernetes Role is a collection of permissions that are applied to a group of APIs. If you're coming from the Kubernetes world, you're likely familiar with this term. There are two types: namespace-scoped Roles that apply to APIs scoped to namespaces, and ClusterRoles that apply to cluster-wide objects. A single identity in Kubernetes can be bound to multiple Roles and ClusterRoles via RoleBinding and ClusterRoleBinding objects. These bindings allow an identity to interact with the Kubernetes API based on its identifier or group memberships.
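
To make the Kubernetes side concrete, here's a minimal sketch of a namespace-scoped Role and a RoleBinding that lets a group read Pods. The namespace, group, and object names are made up for this example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: dev
subjects:
- kind: Group
  name: dev-viewers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io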

AWS IAM Role - An IAM Role is the smallest unit of identity of significance in AWS. A Role has no credentials, but is the placeholder for all of a user's interaction with the AWS API. Roles may be associated with multiple Policy objects, but each user can only assume a single Role at a time.

While both an IAM Role and a Kubernetes Role are used to map a "user" to a set of permissions, they do it very differently. This is why so much work goes into mapping an AWS identity into a Kubernetes identity. The two are handled in fundamentally different ways. Where Kubernetes' identity is focused on the user's binding to different permissions via Roles, AWS assumes you'll only ever have one Role at a time. This leads to the pattern of:

  1. Obtain an AWS IAM Identity - Hopefully via some form of SSO that generates a short lived token
  2. Assume a Role for EKS access - Once you have your identity, you assume a Role that is able to access your EKS cluster
  3. Generate a Kubectl configuration - Once you have assumed a Role that has access to Kubernetes, you need a kubectl configuration that uses a kubectl authentication plugin to establish an identity with EKS and the Kubernetes API

On each API interaction, EKS maps your Role to a Kubernetes identity that looks like any other identity. You can map this identity to a specific identifier and grant that identifier group memberships that you reference in your cluster's RBAC configuration. We'll talk about how that's done next, but for now just know that this is the sequence of events.
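
To make that sequence concrete, here's a sketch of the user entry that aws eks update-kubeconfig generates; the cluster name (blogiam, the cluster used throughout this post) and region are placeholders for your own values, and the exact output may vary slightly by CLI version:

users:
- name: arn:aws:eks:us-east-1:252XXXXXXXXX:cluster/blogiam
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args:
      - --region
      - us-east-1
      - eks
      - get-token
      - --cluster-name
      - blogiam

Every kubectl call runs aws eks get-token, which produces a token based on your current AWS credentials; EKS then maps that token to a Kubernetes identity as described above.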

Now that we see how an IAM identity relates to a Kubernetes identity, let's walk through that mapping.

Granting Access to an EKS Cluster

Having walked through how a user accesses an EKS cluster, let's walk through providing access. We'll start with a simple EKS cluster and a user. Let's take a look at how AWS sees our identity:

$ aws sts get-caller-identity
{
    "UserId": "AIDATVOX3BJTGAGEPYY2C",
    "Account": "252XXXXXXXXX",
    "Arn": "arn:aws:iam::252XXXXXXXXX:user/blog-eks-user"
}

This user doesn't really have any permissions right now. If we want our user to access our EKS cluster, we need to let them assume a Role that has cluster access.

NOTE: You can assign IAM users directly to a cluster to provide access, but this is an antipattern. Just as explicitly listing users in a RoleBinding or a ClusterRoleBinding is an antipattern, it's important to use role-driven access management instead of explicitly enumerating users. This makes access easier to audit and easier to externalize.

The first thing to do is create a Role that can be assumed by our users. Here's the example I created for this post:

{
    "Path": "/",
    "RoleName": "blog-iam-eks",
    "RoleId": "AROATVOX3XXXXXXXXXXXX",
    "Arn": "arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks",
    "CreateDate": "2024-03-05T16:47:20Z",
    "AssumeRolePolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::252XXXXXXXXX:user/blog-eks-user"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    },
    "Description": "",
    "MaxSessionDuration": 3600
}

This IAM Role is a trust role that will allow our user to assume it. We're hard-coding our user as an allowed principal, but in a production rollout you'd tie this role to an identity provider based on some kind of group membership or other Condition instead of statically listing accounts. Once we have an IAM Role to assume, we need to add some permissions so that our user can access our EKS cluster. Here's an example permission:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "VisualEditor0",
			"Effect": "Allow",
			"Action": [
				"eks:ListEksAnywhereSubscriptions",
				"eks:DescribeFargateProfile",
				"eks:ListTagsForResource",
				"eks:DescribeInsight",
				"eks:ListAccessEntries",
				"eks:ListAddons",
				"eks:DescribeEksAnywhereSubscription",
				"eks:DescribeAddon",
				"eks:ListAssociatedAccessPolicies",
				"eks:DescribeNodegroup",
				"eks:ListUpdates",
				"eks:DescribeAddonVersions",
				"eks:ListIdentityProviderConfigs",
				"eks:ListNodegroups",
				"eks:DescribeAddonConfiguration",
				"eks:DescribeAccessEntry",
				"eks:DescribePodIdentityAssociation",
				"eks:ListInsights",
				"eks:ListPodIdentityAssociations",
				"eks:ListFargateProfiles",
				"eks:DescribeIdentityProviderConfig",
				"eks:DescribeUpdate",
				"eks:AccessKubernetesApi",
				"eks:DescribeCluster",
				"eks:ListClusters",
				"eks:ListAccessPolicies"
			],
			"Resource": "*"
		}
	]
}

This policy grants read-only access to a cluster, or, since we don't list any specific resources, to all clusters. In a production rollout you'll want to bind it either to specifically named clusters or to clusters selected by tags.
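
As a hedged sketch, scoping actions that support resource-level permissions to a single named cluster might look like the statement below. The region is a placeholder, and some of the List* actions above don't accept a cluster ARN, so they'd stay in a separate statement:

{
	"Sid": "SingleClusterRead",
	"Effect": "Allow",
	"Action": [
		"eks:DescribeCluster",
		"eks:AccessKubernetesApi"
	],
	"Resource": "arn:aws:eks:us-east-1:252XXXXXXXXX:cluster/blogiam"
}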

With a user and IAM role in place, the next step is to tell EKS to trust our IAM Role.

Legacy - Using the aws-auth ConfigMap

Until December 2023, the only way to map your IAM Role into your EKS cluster was by updating the aws-auth ConfigMap in the kube-system namespace. This ConfigMap provides a way to map your IAM Role to a Kubernetes "user" by mapping to a user identifier and a set of groups. This is a legacy approach, though, and is only included here for reference and context. You should avoid this method and instead use the EKS API to set up the access mapping.

New clusters default to a hybrid of the two possible access control methods, but you really should disable the legacy aws-auth ConfigMap. It's still available in case there are existing controls built around the aws-auth ConfigMap, but this ConfigMap can be abused for privilege escalation, even when not using AWS native authentication. Do not use this method, and disable it unless keeping it is absolutely necessary.
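
If you've confirmed nothing still depends on the ConfigMap, you can switch an existing cluster to API-only authentication. Here's a sketch using the AWS CLI, assuming the cluster name from this post; AWS documents this change as one-way, so verify before running it:

aws eks update-cluster-config --name blogiam --access-config authenticationMode=API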

That said, here's an example update to allow our user to access our cluster via the aws-auth ConfigMap:

apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::252XXXXXXXXX:role/eksctl-blogiam-nodegroup-ng-XXXXX-NodeInstanceRole-XXXXXXXX
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - blog-cluster-managers
      rolearn: arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks
      username: blog-eks-cm
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system

The mapRoles key is embedded YAML that tells EKS how to map our IAM Role into Kubernetes: anyone who assumes our IAM Role will get the username blog-eks-cm and the group blog-cluster-managers. We can see this by assuming our role and getting a kubectl configuration:

$ aws sts assume-role --role-arn arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks --role-session-name testing > /tmp/dontdothis
$ export AWS_ACCESS_KEY_ID=$(jq -r '.Credentials.AccessKeyId' < /tmp/dontdothis)
$ export AWS_SESSION_TOKEN=$(jq -r '.Credentials.SessionToken' < /tmp/dontdothis)
$ export AWS_SECRET_ACCESS_KEY=$(jq -r '.Credentials.SecretAccessKey' < /tmp/dontdothis)
$ aws sts get-caller-identity
{
    "UserId": "AROATVOX3BJXXXXXXXXXX:testing",
    "Account": "252XXXXXXXXX",
    "Arn": "arn:aws:sts::252XXXXXXXXX:assumed-role/blog-iam-eks/testing"
}
$ aws eks update-kubeconfig --name blogiam
$ kubectl auth whoami
ATTRIBUTE             VALUE
Username              blog-eks-cm
UID                   aws-iam-authenticator:252XXXXXXXXX:AROATVOX3BJXXXXXXXXXX
Groups                [blog-cluster-managers system:authenticated]
Extra: accessKeyId    [ASIATVOX3BXXXXXXXXXX]
Extra: arn            [arn:aws:sts::252XXXXXXXXX:assumed-role/blog-iam-eks/testing]
Extra: canonicalArn   [arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks]
Extra: principalId    [AROATVOX3BXXXXXXXXXX]
Extra: sessionName    [testing]

We can see that the Username from Kubernetes and the Groups line up with our aws-auth ConfigMap. The UID and Extra keys provide a tie to our assumed IAM Role. You can see the Extra: arn matches the ARN from our call to aws sts get-caller-identity. We can now create RBAC bindings to either the user blog-eks-cm or the group blog-cluster-managers.
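
For example, here's a minimal sketch of a ClusterRoleBinding that grants the mapped group the built-in view ClusterRole (the binding name is arbitrary):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: blog-cluster-managers-view
subjects:
- kind: Group
  name: blog-cluster-managers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io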

There are some issues with this approach. First, since the aws-auth ConfigMap has embedded YAML in a key, there's no error checking. Make updates carefully! Also, if you have multiple users under the same IAM Role, Kubernetes can't tell the difference between them. We'll dive into this more later in the post, since the same issue applies to cluster access provided by the EKS API.

Speaking of the EKS API for access mapping, let's look at that next.

Mapping Access via the EKS API

The new EKS mapping API was introduced in December 2023. It provides some major benefits over the legacy ConfigMap method:

  1. Type Checked API - Whether you're using the AWS Console, the CLI, or orchestration tools like Pulumi or Terraform, you'll get immediate feedback if something was set up incorrectly.
  2. AWS Authorization - The new API allows you to specify not just how to map an IAM Role into Kubernetes, but also how to authorize them for access outside of RBAC. This allows you to use AWS' own policy language instead of RBAC.
  3. Protection Against Deleted Role Collisions - If you create an access entry, delete the IAM Role, but leave the access entry in place, then create a new IAM Role with the same name, the new IAM Role won't be granted access.

A typed API makes a misconfiguration much less likely, and assuming you're already using an IaC tool like Pulumi or Terraform to provision your clusters, this is a major win for better automation and consistency.

The fact that AWS can provide its own authorization opens the door to not only automate your authorization rules via the same IaC tools you use for automating cluster rollouts, but also for AWS to provide other authorization options in the future.

To set up our IAM Role for access to our cluster, we can use the following AWS CLI commands:

aws eks create-access-entry --cluster-name blogiam --principal-arn arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks --type STANDARD --username blok-eks --kubernetes-groups blog-eks-group
aws eks associate-access-policy --cluster-name blogiam --principal-arn arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks --access-scope type=cluster --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy 

The first command does the same work as we did earlier by updating the aws-auth ConfigMap. If this was the only command that we ran, we'd need to create an RBAC binding for the user or group the IAM Role is bound to. The second command tells AWS to authorize the user via the AmazonEKSClusterAdminPolicy cluster level policy, allowing us to skip the creation of an RBAC ClusterRoleBinding to our user or group.
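
The access scope doesn't have to be cluster-wide. You can list the managed policies AWS provides and, as a sketch, associate a read-only policy with a single namespace instead (the namespace name here is just an example):

aws eks list-access-policies
aws eks associate-access-policy --cluster-name blogiam --principal-arn arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks --access-scope type=namespace,namespaces=dev --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy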

With that done we can now access our cluster:

$ aws sts assume-role --role-arn arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks --role-session-name testing > /tmp/dontdothis
$ export AWS_ACCESS_KEY_ID=$(jq -r '.Credentials.AccessKeyId' < /tmp/dontdothis)
$ export AWS_SESSION_TOKEN=$(jq -r '.Credentials.SessionToken' < /tmp/dontdothis)
$ export AWS_SECRET_ACCESS_KEY=$(jq -r '.Credentials.SecretAccessKey' < /tmp/dontdothis)
$ aws sts get-caller-identity
{
    "UserId": "AROATVOX3BJXXXXXXXXXX:testing",
    "Account": "252XXXXXXXXX",
    "Arn": "arn:aws:sts::252XXXXXXXXX:assumed-role/blog-iam-eks/testing"
}
$ aws eks update-kubeconfig --name blogiam
$ kubectl auth whoami
ATTRIBUTE             VALUE
Username              blok-eks
UID                   aws-iam-authenticator:252XXXXXXXXX:AROATVOX3BJXXXXXXXXXX
Groups                [blog-eks-group system:authenticated]
Extra: accessKeyId    [ASIATVOX3BXXXXXXXXXX]
Extra: arn            [arn:aws:sts::252XXXXXXXXX:assumed-role/blog-iam-eks/testing]
Extra: canonicalArn   [arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks]
Extra: principalId    [AROATVOX3BJXXXXXXXXXX]
Extra: sessionName    [testing]
$ kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
ip-192-168-24-176.ec2.internal   Ready    <none>   3h56m   v1.29.0-eks-5e0fdde
ip-192-168-61-125.ec2.internal   Ready    <none>   3h56m   v1.29.0-eks-5e0fdde

Just as with the updates to the aws-auth ConfigMap, we can see that our access to our cluster tracks from our assumed identity.

Having configured our cluster for access, we'll next look to understand how AWS' approach to identity impacts Kubernetes cluster management.

How AWS IAM Changes Kubernetes Cluster Management

Most of this blog post has been written from the perspective of AWS. Earlier, we compared the differences between an IAM Role and a Kubernetes Role, but since then it's been all AWS for configuration and updates. If you are using AWS specific tooling for deploying and managing your clusters, this can be a real advantage, but what impact does it have on the management of your cluster compared to a self-deployed cluster?

The first major impact is that the user's unique identity never makes it to the cluster. When we ran kubectl auth whoami earlier, we found that the identity, as far as Kubernetes is concerned, comes from the assumed role, not the original user. This means that when we look at the audit logs in CloudWatch, we'll see:

"user": {
    "username": "blok-eks",
    "uid": "aws-iam-authenticator:252XXXXXXXXX:AROATVOX3BJXXXXXXXXXX",
    "groups": [
        "blogs",
        "system:authenticated"
    ],
    "extra": {
        "accessKeyId": [
            "ASIATVOX3BXXXXXXXXXX"
        ],
        "arn": [
            "arn:aws:sts::252XXXXXXXXX:assumed-role/blog-iam-eks/testing"
        ],
        "canonicalArn": [
            "arn:aws:iam::252XXXXXXXXX:role/blog-iam-eks"
        ],
        "principalId": [
            "AROATVOX3BJXXXXXXXXXX"
        ],
        "sessionName": [
            "testing"
        ]
    }
},

You can see that the audit log provides the same user information as kubectl auth whoami did. If we want to track the access back to the original user, we're going to need to tie the assumed role and session back to the user in CloudTrail. In addition to losing the original user's identity, if you leverage AWS' new authorization capability, you will need to audit AWS' authorizations in addition to the RBAC authorizations inside of your cluster.
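
As a starting point for that correlation, here's a hedged sketch of pulling AssumeRole events from CloudTrail with the CLI; you'd match the sessionName and accessKeyId from the Kubernetes audit log against the event details to find who actually assumed the role:

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRole --query 'Events[].CloudTrailEvent'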

The next major area of change is how to access management GUIs. Whether you're talking about the Kubernetes UI, TektonCD's UI, or Kiali, most clusters are made up of more than just the CLI. While it's still possible to deploy these tools, it's much harder using the AWS native access methods because there's no way to replicate the assumed role the same way. You could integrate these tools directly into your identity provider using impersonation, the same way we did when Comparing Kubernetes Authentication Methods, but then you have a difference between how users access the Kubernetes API via web UIs vs the CLI.

Another impact on cluster management could come from having multiple mechanisms to access clusters depending on which cloud is managing them. Few large enterprises have a single cloud provider, and forcing users to use different methods for accessing their clusters can cause support headaches.

Finally, the AWS approach can complicate multi-tenancy. While AWS does provide authorizations at the namespace level, those authorization rules are not currently customizable. That's likely to change in the future, but for now it limits your ability to customize authorizations for current deployments.

If these issues impact the way you want to manage your clusters, we'll next cover impersonation and how you can use it to provide a Kubernetes native access solution.

Using Impersonation for EKS Access

While EKS provides limited support for OpenID Connect, the easiest way to provide a Kubernetes native access experience is an impersonating proxy. When using impersonation, a reverse proxy sits between your users and the API server. The reverse proxy authenticates the user, and the request is then forwarded to the API server with additional headers that indicate who the user is and what groups the user is a member of. This mechanism works the same across clouds and works with both the API and web UIs like the dashboard, providing the same level of security regardless of whether you're accessing the API via kubectl, a local dashboard, or a web UI. If you want the details of how impersonation works, you can find them in the free authentication chapter of Kubernetes: An Enterprise Guide - 2nd Edition (links directly to a PDF in GitHub, no registration or DRM).
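
Under the hood, the proxy calls the API server with its own service account credentials and adds the standard Kubernetes impersonation headers. Here's a rough sketch of the forwarded request, with a made-up user, group, and API server address:

curl -H "Authorization: Bearer $PROXY_SERVICE_ACCOUNT_TOKEN" \
     -H "Impersonate-User: mmosley" \
     -H "Impersonate-Group: cluster-admins" \
     https://kubernetes.example.com:6443/api/v1/namespaces/default/pods

The proxy's service account needs RBAC permission to impersonate users and groups; the API server then evaluates the request as if mmosley, a member of cluster-admins, had made it directly.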

In addition to providing a Kubernetes native access mechanism, using an impersonating proxy moves the responsibility of managing access to a provisioned cluster away from the cloud team and onto cluster owners. Depending on how your management is siloed, this can provide a significant relief to your cloud team!

While there are several ways to implement impersonation, I'm comfortable saying that Tremolo Security's OpenUnison, combined with the kube-oidc-proxy, is the easiest way to deploy it, and it is trusted by multiple large financial institutions, global consulting firms, US Federal agencies, and even other identity management companies. We pre-build the certificate management, integration, NetworkPolicies, and RBAC needed to provide a "secure-by-default" implementation whether your authentication method is Okta, AzureAD, Google, LDAP/Active Directory, SAML2, or GitHub! Check out our documentation and getting started guide at https://openunison.github.io/.

Which Access Method is Right For You?

Choosing between AWS' native authentication methods and using an impersonating proxy doesn't have a "correct" answer. The benefit of using AWS native authentication is that it's native to AWS, so any tooling or IaC you use with AWS will translate better to EKS. If you're only on AWS, this can also be an advantage, as your developers are far more likely to have AWS "muscle memory" and be comfortable using the AWS CLI to access EKS.

There are drawbacks though, including the separation of authentication and authorization, no integration with web UIs for cluster management and tooling, and the requirement that your cloud team also own Kubernetes access management. If these impact you, an impersonating proxy like OpenUnison could be the better approach.
