Securing Multi-Cluster ArgoCD

I'm on the final chapters of Kubernetes: An Enterprise Guide, 3rd ed. Just as in previous editions, we're finishing the book by building out a platform based on what we learned throughout the book. In the first two editions, we built out a GitOps platform using ArgoCD in a single cluster. In the 3rd ed., we're using three separate clusters: control plane, development, and production. Each tenant will get a namespace in both the development and production clusters, and each namespace will run a virtual cluster. In previous attempts at a similar architecture, I deployed an ArgoCD with each vCluster. While this gave quite a bit of control to tenants, it consumes a lot of extra resources and adds another system that needs to be maintained in each tenant. We were already deploying a centralized Vault and OpenUnison. It would be better to deploy a single, centralized ArgoCD which could then manage each tenant's individual vClusters remotely.

High level platform architecture from Kube: Enterprise Guide 3rd Ed

ArgoCD already has the ability to add a remote cluster for management, but there's a critical issue with how ArgoCD does this: it uses a ServiceAccount and an associated token to authenticate to the remote cluster. We've blogged before about how this is an anti-pattern. ServiceAccount tokens were never designed to be used from outside the cluster. Since Kubernetes 1.24, the standard has been to generate a token that has an expiration, but that means you need a rotation strategy, and if you were to lose that token, there's no way to invalidate it. Ideally, we want to be able to generate an identity for our remote cluster based on an existing identity in the control plane cluster.

I wanted my control plane ArgoCD to communicate with clusters via a very short-lived token. The token should be tightly scoped, and the remote cluster should be able to accept the token without a pre-shared secret. I'd also like to be able to rotate the key used to sign the token on a regular basis without having to update the downstream cluster. After doing some digging into how ArgoCD works, it became apparent that I had all the pieces I needed; I just had to make them work together.

Part I - Cluster Management

The first component I needed was a secure way to generate a token for my remote cluster. I had already deployed OpenUnison's Namespace as a Service to the control plane cluster and integrated my development cluster. OpenUnison needs to be able to call the remote API in the same way ArgoCD does, and we accomplish this by deploying a kube-oidc-proxy on the managed cluster that trusts the control plane OpenUnison. This way, OpenUnison can generate a token that can manage the remote cluster. The remote cluster trusts the control plane OpenUnison the same way an on-premises cluster trusts a remote identity provider: by ingesting an OIDC discovery document that includes the public keys needed to validate all tokens generated by OpenUnison.

When OpenUnison needs to call the remote cluster's API, it generates a token with a one-minute lifetime that's scoped to the kube-oidc-proxy. The proxy is only able to impersonate a specific identity that has cluster-admin access. The proxy authenticates each request based on the short-lived token before injecting the impersonation headers and forwarding the request to the API server. This lets us securely manage remote clusters without needing a long-lived token.
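
If you want to confirm how long one of these tokens actually lives, you can decode its payload and look at the standard JWT `exp` and `aud` claims. The helpers below are a minimal sketch, not part of OpenUnison or the book's code: the function names are made up for illustration, and they only decode the token without verifying its signature.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a JWT; they decode but do NOT verify it.

# Print the decoded payload (second dot-separated segment) of a JWT.
jwt_payload() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding, so restore it before decoding.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Print the seconds remaining until the token's exp claim.
# $2 is "now" in epoch seconds and defaults to the current time.
jwt_seconds_left() {
  local exp now
  exp=$(jwt_payload "$1" | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  now="${2:-$(date +%s)}"
  echo $(( exp - now ))
}
```

Running `jwt_seconds_left "$TOKEN"` right after a token is issued should print a number close to the configured lifetime, which for the setup described here would be around sixty seconds.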

Now that we have a way to generate tokens that match our criteria, we need ArgoCD to know how to use them.

Part II - ArgoCD and Remote Cluster Authentication

After digging through ArgoCD's documentation, I found how the ApplicationSet operator can create a remote cluster in ArgoCD based on a Secret. ArgoCD uses the client-go SDK for Kubernetes and supports configuring credential plugins. This is the same mechanism kubectl uses with cloud-hosted clusters when you rely on their native IAM integration, and the docs provide examples of how to do this with ArgoCD for the major clouds. There are some drawbacks to relying on a cloud's IAM for Kubernetes:

If your clusters aren't all on the same cloud, you won't get very far with this approach
Cloud IAM permissions don't always line up well with Kubernetes RBAC
Cloud IAM won't work for on-premises clusters
You, as the Kubernetes team, may not have the ability to manage cloud IAM roles on your own

Since ArgoCD already has a mechanism to use custom credentials and an identity provided by its cluster, I next needed to figure out how to get the identity and tell ArgoCD to use it. Luckily, OpenUnison has me covered!

Part III - Getting a Token

OpenUnison makes it really easy to create an API that I can use to generate a token. Almost any OpenUnison component can be customized via JavaScript. In this case, we're going to define an Application object (in OpenUnison, not ArgoCD) that will take the name of a registered remote cluster, generate a token, and return it as the response to the call. If you're thinking "wow, that could really be abused", you're right! To make sure that only ArgoCD can call our service, we'll want to validate ArgoCD's token with a TokenReview request to make sure it's bound to a running Pod. Thankfully, OpenUnison does this right out of the box! That's how OpenUnison validates Prometheus when it calls the OpenUnison metrics endpoint. We're now going to create a pretty simple API:

---
apiVersion: openunison.tremolo.io/v1
kind: Application
metadata:
  name: get-target-token
  namespace: openunison
spec:
  azTimeoutMillis: 3000
  isApp: true
  urls:
  - hosts:
    - "#[OU_HOST]"
    filterChain:
    - className: com.tremolosecurity.proxy.filters.JavaScriptFilter
      params:
        javaScript: |-
          GlobalEntries = Java.type("com.tremolosecurity.server.GlobalEntries");
          HashMap = Java.type("java.util.HashMap");
          
          function initFilter(config) {

          }

          function doFilter(request,response,chain) {
            var targetName = request.getParameter("targetName").getValues().get(0);
            var k8s = GlobalEntries.getGlobalEntries().getConfigManager().getProvisioningEngine().getTarget(targetName).getProvider()


            response.getWriter().print(k8s.getAuthToken());
          }

    uri: /api/get-target-token
    azRules:
    - scope: filter
      constraint: (sub=system:serviceaccount:argocd:argocd-application-controller)
    authChain: oauth2k8s
    results: {}
  cookieConfig:
    sessionCookieName: tremolosession
    domain: "#[OU_HOST]"
    secure: true
    httpOnly: true
    logoutURI: "/logout"
    keyAlias: session-unison
    timeout: 1
    scope: -1
    cookiesEnabled: false

This Application has a single endpoint that uses JavaScript to look up the target (cluster) and generate a token. The authChain makes sure that the API is authenticated by a valid ServiceAccount token from a running Pod, and the authorization rule makes sure that only the ArgoCD application controller can call this endpoint. For instance, if someone were to gain control of the ArgoCD UI, which has its own identity, they couldn't call this service to get tokens.
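
To see what that validation looks like from the Kubernetes side, you can submit a TokenReview yourself. This is a sketch of the Kubernetes API call rather than OpenUnison's internal code: the function name is made up, and `/tmp/token` is a hypothetical file holding the ServiceAccount token you want to check.

```shell
#!/bin/bash
# Render a TokenReview manifest for the token passed as the first argument.
# (Hypothetical helper for illustration only.)
token_review_manifest() {
  cat <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "$1"
EOF
}

# Example (needs permission to create TokenReviews in the cluster):
#   token_review_manifest "$(cat /tmp/token)" | kubectl create -o yaml -f -
```

In the returned object, `status.authenticated` says whether the token is valid and `status.user.username` carries the identity, which for the controller would be the `system:serviceaccount:argocd:argocd-application-controller` value used in the authorization rule above.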

Now that I have an endpoint, I need a way for kubectl to call it. It turns out you can write a credential plugin in anything, even bash! So I wrote a really simple plugin that just uses curl to call our endpoint with the right data for our remote cluster using ArgoCD's identity:

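#!/bin/bash
# Usage: remote-token.sh <openunison-host> <target-name> <serviceaccount-token-file>

# Ask OpenUnison for a short-lived token, authenticating with the token stored in the file $3.
REMOTE_TOKEN=$(curl -H "Authorization: Bearer $(<"$3")" "https://$1/api/get-target-token?targetName=$2" 2>/dev/null)

# Print the ExecCredential object that client-go reads from stdout.
echo -n "{\"apiVersion\": \"client.authentication.k8s.io/v1\",\"kind\": \"ExecCredential\",\"status\": {\"token\": \"$REMOTE_TOKEN\"}}"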

client-go passes the configured arguments to the credential plugin just as any command line would. I built a simple kubectl configuration file to test and it worked great!

apiVersion: v1
kind: Config
users:
- name: openunison-control-plane
  user:
    exec:
        command: /path/to/remote-token.sh
        apiVersion: "client.authentication.k8s.io/v1"
        env: []
        args:
        - k8sou.idp-cp.tremolo.dev
        - k8s-kubernetes-satelite
        - /tmp/token
        installHint: |
            copy shell file
        provideClusterInfo: false
        interactiveMode: Never
clusters:
- name: kubernetes-satelite
  cluster:
    server: https://oumgmt-proxy.idp-dev.tremolo.dev
    extensions:
    - name: client.authentication.k8s.io/exec
contexts:
- name: openunison-control-plane@kubernetes-satelite
  context:
    cluster: kubernetes-satelite
    user: openunison-control-plane
current-context: openunison-control-plane@kubernetes-satelite
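
Before pointing kubectl at the plugin, it can help to run the script by hand and check what it prints, because the script will happily emit an ExecCredential with an empty token if the curl call fails. The check below is a rough sketch with a made-up function name; it matches the exact spacing the script above produces and is not a general JSON validator.

```shell
#!/bin/bash
# Return 0 if the text looks like an ExecCredential carrying a non-empty token.
# (Hypothetical helper; relies on the exact output format of remote-token.sh.)
check_exec_credential() {
  local out="$1"
  case "$out" in
    *'"kind": "ExecCredential"'*) ;;
    *) echo "output is not an ExecCredential" >&2; return 1 ;;
  esac
  case "$out" in
    *'"token": ""'*) echo "token is empty (did the curl call fail?)" >&2; return 1 ;;
  esac
  return 0
}

# Example, using the host, target, and token file from the test setup:
#   check_exec_credential "$(/path/to/remote-token.sh k8sou.idp-cp.tremolo.dev k8s-kubernetes-satelite /tmp/token)"
```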

Now that I have my credential plugin, I need to get it into ArgoCD.

Part IV - Deployment

The great thing about containers is, well, they're self-contained. This is also a problem. Thankfully, ArgoCD has a few ways to make the deployment easier. It turns out that in addition to the bash script, I needed to download curl too. The Helm chart makes it easy to add additional volumes and init containers, so I updated my values.yaml so that an init container downloads curl and copies my script to the appropriate place for the controller:

controller:
  volumes:
  - name: custom-tools
    emptyDir: {}
  - name: remote-tokens
    configMap:
      name: argocd-remote-tokens
  volumeMounts:
  - mountPath: /custom-tools
    name: custom-tools
  initContainers:
  - name: downloadtools
    image: alpine
    command: [sh, -c]
    args:
    - wget -O /custom-tools/curl https://github.com/moparisthebest/static-curl/releases/download/v8.7.1/curl-amd64 && chmod +x /custom-tools/curl && cp /remote-tokens/remote-token.sh /custom-tools && chmod +x /custom-tools/remote-token.sh
    volumeMounts:
    - mountPath: /custom-tools
      name: custom-tools
    - mountPath: /remote-tokens
      name: remote-tokens
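
The values above expect a ConfigMap named argocd-remote-tokens that contains the script under the key remote-token.sh. One way to produce it is sketched below; the function name is made up, and the argocd namespace is an assumption based on where the cluster Secret and the controller's ServiceAccount live in this post.

```shell
#!/bin/bash
# Render a ConfigMap that carries the credential plugin script.
# $1 is the path to remote-token.sh. (Hypothetical helper; the "argocd"
# namespace is assumed, adjust it to match your install.)
render_script_configmap() {
  local script="$1"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-remote-tokens
  namespace: argocd
data:
  remote-token.sh: |
EOF
  # Indent every line of the script so it sits inside the YAML block scalar.
  sed 's/^/    /' "$script"
}

# Example:
#   render_script_configmap remote-token.sh | kubectl apply -f -
```

Running `kubectl -n argocd create configmap argocd-remote-tokens --from-file=remote-token.sh` should give the same result if you'd rather not keep the manifest in git.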

I can run my script manually and get a token from inside of the ArgoCD controller pod! Finally, it's time to tell ArgoCD to sync some YAML!

Part V - Configuration

The first step to integrating with our cluster is to generate a Secret that stores our cluster connection information:

---
apiVersion: v1
kind: Secret
metadata:
  name: k8s-kubernetes-satelite
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    tremolo.io/clustername: k8s-kubernetes-satelite
type: Opaque
stringData:
  name: k8s-kubernetes-satelite
  server: https://oumgmt-proxy.idp-dev.tremolo.dev
  config: |
    {
      "execProviderConfig": {
        "command": "/custom-tools/remote-token.sh",
        "args": ["k8sou.idp-cp.tremolo.dev","k8s-kubernetes-satelite","/var/run/secrets/kubernetes.io/serviceaccount/token"],
        "apiVersion": "client.authentication.k8s.io/v1"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "LS0tLS1C..."
      }
    }
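
The caData value is the remote endpoint's CA certificate in PEM form, base64-encoded. If you are generating these Secrets for several clusters, the config JSON can be templated along these lines. This is only a sketch: the function name and the idea of reading the CA from a local file are assumptions for illustration, while the field names and script path come from the Secret above.

```shell
#!/bin/bash
# Render the "config" JSON for an ArgoCD cluster Secret.
# $1: OpenUnison host, $2: target name, $3: path to the CA certificate (PEM).
# (Hypothetical helper for illustration only.)
render_cluster_config() {
  local ou_host="$1" target="$2" ca_file="$3" ca_data
  # Base64-encode the PEM file onto a single line.
  ca_data=$(base64 < "$ca_file" | tr -d '\n')
  cat <<EOF
{
  "execProviderConfig": {
    "command": "/custom-tools/remote-token.sh",
    "args": ["$ou_host","$target","/var/run/secrets/kubernetes.io/serviceaccount/token"],
    "apiVersion": "client.authentication.k8s.io/v1"
  },
  "tlsClientConfig": {
    "insecure": false,
    "caData": "$ca_data"
  }
}
EOF
}

# Example:
#   render_cluster_config k8sou.idp-cp.tremolo.dev k8s-kubernetes-satelite ca.pem
```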

We're telling ArgoCD to use our script and giving it the information the script needs to generate a token. The great thing is that, while this is stored in a Secret, there's nothing really secret here! No credentials, keys, etc. If you just create this Secret though, you won't find our new cluster in the ArgoCD interface. You still need to deploy an ApplicationSet for the operator to pick it up. Here's my ApplicationSet:

---
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: test-remote-cluster
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - clusters: 
      selector:
        matchLabels:
          tremolo.io/clustername: k8s-kubernetes-satelite
  template:
    metadata:
      name: '{{.name}}-guestbook' # 'name' field of the Secret
    spec:
      project: "default"
      source:
        repoURL: https://github.com/mlbiam/test-argocd-repo.git
        targetRevision: HEAD
        path: yaml
        directory:
          recurse: true
      destination:
        server: '{{.server}}' # 'server' field of the secret
        namespace: myns

The magic happens because we specify a cluster via label matching. The labels in the ApplicationSet line up with the label in our cluster Secret. Now that all of our objects are in place, we can test to see if this process works.

Part VI - Synchronization

I waited a minute to let everything catch up (eventual consistency is a lie!). I logged into ArgoCD and BAM! I now have an Application object, a cluster, and a synchronized repository!

ArgoCD synchronizing from a git repo to a remote cluster using no static keys or credentials

ArgoCD registered a remote cluster without static credentials

So what's the full process? Look at the below diagram:

ArgoCD syncing to a remote cluster with a short lived token.

ArgoCD runs a sync process and needs to interact with the remote cluster. Since the cluster is configured with a client-go credential plugin, the application controller calls our API with the Pod's projected ServiceAccount token.
OpenUnison submits a TokenReview request to the API server to validate the token.
The API server responds with the result of the review. If the token had expired, or was associated with a Pod that is no longer running, this step would fail.
OpenUnison generates a token signed by its private key that will be trusted by our cluster's kube-oidc-proxy.
The client-go SDK uses the token returned by the credential plugin to synchronize our git repo into the remote cluster.

We're now successfully using ArgoCD with no long lived credentials!

Part VII - Operational Security and Multi-tenancy

This is great: a user can create an ApplicationSet that tells ArgoCD to securely generate a token for the remote cluster without the user knowing how to generate that token. Of course, this also means that a user could create an ApplicationSet that uses someone else's vCluster! Good thing our control plane has GateKeeper on it, so we can write a policy that makes sure ApplicationSets only get associated with the correct cluster. Even better would be a mutating webhook that automatically sets the cluster information so the user doesn't even have to know it. I'll keep working on this for the chapter. I'm really excited about how it's coming together!

It hasn't been checked in yet, but all the code for the book is in GitHub. We have scripts for Vault, OpenSearch, GateKeeper, vCluster, OpenUnison, ... It's a great repo. This code will make it in there too once the last two chapters are done! The book is set to release in July, 2024 and is available for pre-order now.

April 8, 2024
