TL;DR
- Yes, it is
- Evaluate multi-tenancy as a platform, not just Kubernetes
- Building multi-tenant clusters overlaps with the work you'd do for single-tenant clusters anyway
I regularly get into discussions in Slack channels and on Twitter about whether Kubernetes is "designed" to be multi-tenant. These discussions often focus on specific aspects of security. In this post I want to address several of these common threads and provide context to support the assertion that yes, Kubernetes is designed to be multi-tenant.
Kubernetes Is Not Designed to Be Multi-Tenant
This is a common thread I hear. It generally goes: "You can't do multi-tenancy out of the box with Kubernetes without special tools and configuration, so it wasn't designed to support it." There are several problems with this argument:
- Kubernetes is not a platform, it's a platform for building platforms. Assuming that everything you do with Kubernetes has to be built directly into its API contradicts its core design principle of extensibility.
- Kubernetes 1.0 included Namespaces, Secrets, and ServiceAccounts. Even from the beginning, Kubernetes was built to provide workloads with identity and to separate workloads by a logical boundary. As an example, you can't reference a Secret from a different Namespace in your Pod (see the sketch after this list).
- The first "enterprise" capable Kubernetes distribution, OpenShift, has been multi-tenant since 2016.
The first bullet is really important. Kubernetes does not intend to be a Platform as a Service, or PaaS. It's designed to be a starting point for building an opinionated platform. Statements that it's not multi-tenant, or that multi-tenancy may only be "technically feasible", forget that the entire point of Kubernetes is to build platforms for your needs, not to be a platform in and of itself.
Another reason it's unreasonable to expect Kubernetes to have all multi-tenant configuration built in is that multi-tenancy will be specific to each implementation. We spend quite a bit of time helping customers build multi-tenant clusters, and each one is different because every organization is different. We have customers that use a request/approval model for creating namespaces, supplying cost centers for charge-back. We have other customers that drive tenancy based on Active Directory groups, without any approval process. This logic is both too complex and too specific to include directly in the Kubernetes API. It's important to remember that Kubernetes has the foundation for building these processes; it's not meant to implement them on its own.
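To make that concrete, the end result of any of those processes is usually just a Namespace carrying whatever tenant metadata your platform cares about. The sketch below is hypothetical; the label and annotation keys aren't anything Kubernetes defines, they're whatever your charge-back or approval tooling expects.

```yaml
# Hypothetical Namespace stamped out by a tenant-onboarding workflow. The
# example.com/* keys are made up for illustration; Kubernetes only provides
# the Namespace primitive, the metadata scheme is up to your platform.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
  labels:
    example.com/tenant: team-a
    example.com/cost-center: "cc-1234"
  annotations:
    example.com/owner-group: "CN=Team-A-Admins,OU=Groups,DC=example,DC=com"
    example.com/approved-by: "request-12345"
```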
The fact that Kubernetes doesn't try to implement these multi-tenant features on its own, but instead supplies primitives to build a multi-tenant platform from, is an important advantage and a lesson learned from the PaaS platforms of the past. One of the reasons Kubernetes has overtaken almost every legacy PaaS platform is that it gives implementers the room to implement their own opinions rather than imposing its opinions on them. Security is an important set of opinions that will be different for each organization, especially the authorization process.
The reason these requirements are so different is the natural silos in each enterprise. The person who is responsible for the up-time and roll-out of an application, and whose paycheck and bonus are affected by it, is not the same person who owns the shared compute infrastructure. When it's your paycheck and your bonus on the line, you want as much control as possible. You want to control who has access and how they get it. If you're not able to control something, you're not going to accept responsibility for it. There's no way for the Kubernetes developers to understand these dynamics for each organization, so it's (correctly) left to implementers to build as part of a platform. That is very different from "it's not supported natively so it shouldn't be done at all".
This is similar to the reason Ian Coldwater, and many others, point out that Kubernetes isn't secure by default. It's not that it can't be secured, but that "security" means different things to different people, and trying to embody all of those definitions in a single implementation is both impossible and unreasonable to put on the Kubernetes developers. The wide breadth of options in both commercial and open source solutions shows just how powerful this approach is.
Instead of focusing on whether Kubernetes is a multi-tenant system, focus on whether your platform is multi-tenant. The primitives are there for building out multi-tenancy. Choose your platform based on how its opinions align with your organization's own design. Kubernetes is just one part of your platform. See how multi-tenancy works with your CI/CD pipeline, your container registry, your monitoring solution, your service mesh, etc. before declaring whether or not a multi-tenant approach is right for your clusters.
Multi-Tenant Clusters Can't Protect Nodes
This argument takes a few forms. The first is that containers do not isolate well on their own, which is true, so you can't trust two tenants to run on the same node. There are multiple ways to handle this. One of the simplest is to use Pod Security Policies (PSPs). It will be pointed out that PSPs are going away. That is 100% true. They won't be removed until 1.25 at the earliest, which is at best 15 months away. Assuming you're using a commercial distribution of some kind, you're at least one version behind, more likely two, so as of today (December 2020) you likely have until at least mid-to-late 2022 before PSPs are removed from your clusters. There is no drop-in replacement right now. GateKeeper is getting there; you can write the policies, but it doesn't handle the mutation aspect of PSPs (at KubeCon NA 2020 I was told in the hallway track that it's about six months out). I'm going to guess that by late 2021 we'll start to see GateKeeper being able to take over for PSPs.
Does this mean you shouldn't use PSPs at all? No. I spend some time talking about what makes PSPs so hard in my older post on Kubernetes Security Myths. Most PSP implementations look something like this (a sketch of the default policy follows the list):
- Default deny-all policy
- Privileged policies for your storage, network, node, and most control plane pods
- The Ingress controller is allowed to open ports 443 and 80
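For reference, the "default deny-all" policy in the first bullet usually looks a lot like the restricted example in the Kubernetes documentation. The sketch below is just that, a sketch; you'd still bind it to users and ServiceAccounts via RBAC and layer the privileged policies on top of it.

```yaml
# A restrictive "default" PSP, similar to the restricted example in the
# Kubernetes docs. Workloads matched by this policy can't run privileged,
# can't escalate privileges, must run as non-root, and can't use host
# namespaces.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default-restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
  fsGroup:
    rule: MustRunAs
    ranges:
      - min: 1
        max: 65535
  volumes:
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim
  hostNetwork: false
  hostIPC: false
  hostPID: false
```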
This covers the vast majority of workloads. Is migrating to something new going to be painful? Yes. Is it worth it to have isolation on your nodes? I'd say so. If you're like most enterprises, you're not running the latest Kubernetes release; you're probably at best 1-2 versions behind, if not further. If you're reading this and are using PSPs, make sure to let the sig-auth and sig-security teams know. Also let your voice be heard on GitHub!
Let's look at this from another perspective. Let's say, for the sake of argument, that you should use the cluster as the tenant boundary. Does that remove the need to protect the nodes? I doubt any security expert would argue it does. An application vulnerability can still lead to a breach and access to the node through an escape. An attacker can still use the node to run bitcoin miners, start a ransomware attack, or get access to credentials. Granted, the blast radius is smaller, but that doesn't mean you won't go through the work of protecting the nodes anyway, right? Of course you will. So whether it's via PSPs or some generic OPA policies, you will go through the work of locking down container processes on your nodes regardless of whether the node hosts multiple tenants. The protection of the nodes turns out not to be a moot point against multi-tenancy, because a secure Kubernetes environment will address it regardless of whether it's single-tenant or multi-tenant.
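If you go the OPA route, the GateKeeper equivalent of "no privileged containers" might look something like the sketch below. It assumes the K8sPSPPrivilegedContainer ConstraintTemplate from the community gatekeeper-library is already installed in the cluster; the kube-system exemption is purely illustrative.

```yaml
# GateKeeper constraint that rejects privileged containers cluster-wide.
# Assumes the K8sPSPPrivilegedContainer ConstraintTemplate from the
# gatekeeper-library project has already been installed.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: psp-privileged-container
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system   # illustrative exemption for control plane / CNI pods
```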
RBAC is Hard, I'm Going to Only Allow Interaction via CI/CD or GitOps
This one didn't come up in the latest conversations, but it does come up often. The idea is that you only enable access to a cluster via a higher-level construct like a pipeline or a GitOps controller. The first thing to point out is that you haven't eliminated the multi-tenancy authorization issues, you've just moved them. Maybe your CI/CD tools have great multi-tenancy? It's a possibility, but it hasn't been my experience. We'll assume that multi-tenancy and security are great with those platforms. The next question becomes: what can the ServiceAccount that's used by your pipeline or GitOps controller do? Is it a cluster-admin? Even if your tenant has the entire cluster to themselves, do you want them able to create pods in kube-system? Probably not. You now need to manage and maintain RBAC rules for the ServiceAccount. If your tenant boundary is the cluster, then these RBAC policies need to be replicated across clusters and maintained. Aggregated roles would certainly help, but they can still be difficult to maintain, especially as new custom resources become available. You're back to managing and maintaining RBAC. If your tenant boundary were the Namespace instead of the cluster, you could automate this more easily and consistently. In fact, Kubernetes comes with an aggregated role called admin that gives you most of the capabilities you'd need for almost all user-land applications. Regardless of whether your boundary is the Namespace or the cluster, you're going to need to do some work on RBAC. You'll also still need to work within the multi-tenant capabilities of the rest of your platform.
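As an illustration of how little custom RBAC the Namespace-as-boundary model needs, a single RoleBinding to the built-in admin ClusterRole covers most of a tenant's day-to-day needs. The group and ServiceAccount names below are hypothetical.

```yaml
# Binds the built-in, aggregated "admin" ClusterRole inside a single tenant
# Namespace. The group and pipeline ServiceAccount names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-admins
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: team-a-admins            # e.g. mapped from an Active Directory group
  - kind: ServiceAccount
    name: team-a-pipeline          # the pipeline / GitOps controller identity
    namespace: team-a
```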
Another potential issue with this approach is that it puts the Kubernetes team between an application owner and their application. Every time an app owner wants something, they're reliant on the k8s team to implement it. If the k8s team becomes a bottleneck, this can cause political issues. I've seen this happen multiple times, where an over-engineered PaaS leads to the cloud team being the reason applications can't move forward, which is the opposite of what you're likely trying to accomplish! Work with the primitives that Kubernetes has to build your platform; don't try to overshadow them or hide them.
Let Each Silo Have Its Own Clusters
I did mention that enterprises are siloed and that those silos have an impact on how Kubernetes is implemented. You could argue that each silo should have its own cluster. There's valid logic to this, but you need to be careful about where the silo is. Anything more than a single application team and you have a multi-tenant cluster again! Determining how to divide up clusters is a risk-management and communications exercise as much as it is a technical endeavor. Just focusing on the technology aspect won't solve your issues.
Let's again assume, for the sake of argument, that you'll give each application team its own cluster. This aligns to how most enterprises assign VMs today. Each application team is made up of developers, contractors, admins, managers, etc. Each cluster has the application as well as monitoring, ingress, and other systems that need to be accessed. So now, instead of VM sprawl, where you likely have some centralized control over authentication and authorization, you have cluster sprawl. You've solved the issue of container multi-tenancy, but built a system that's harder to manage and easier to compromise. There's a happy medium. Identifying which silos are important to respect gives you a balance between the manageability of shared resources and operating within the same constraints as your enterprise, leading to a more secure platform.
How Do I Start Building a Multi-Tenant Cluster?
There are several technologies and tools out there. If you're interested in learning how authentication and authorization work in Kubernetes, we have published a self-contained "lab" that is available on GitHub. You'll need a VM with Ubuntu 18.04 and Active Directory to get started. Our blog section, The Whammy Bar, has several posts about building out multi-tenancy. If you want a really deep dive into how Kubernetes authentication, authorization, PSPs, Gatekeeper, and pipelines work, I was honored to co-author Kubernetes and Docker, An Enterprise Guide. Finally, I would humbly submit OpenUnison as a great way to start your multi-tenant approach!
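If you just want a feel for the primitives before picking any tooling, a first tenant Namespace often starts with little more than a quota and a default-deny NetworkPolicy, with a RoleBinding like the earlier one layered on top. All of the names and limits below are hypothetical starting points, not recommendations.

```yaml
# A bare-bones tenant Namespace: a quota to cap resource use and a
# default-deny NetworkPolicy. Names and limits are hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```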