A few days ago we published a blog post about how to use Apache Kafka’s native authentication and authorization (ACL) with Kubernetes Service Accounts via Supertubes. If you skimmed that post or have actually used it with Supertubes, you might think that this is (as Shakespeare put it) much ado about nothing, as things just work out-of-the-box.
As the CTO of Banzai Cloud, and someone involved in the details of our product development, I can confirm that this has been our single biggest engineering push of 2020 so far (spoiler alert - that record won’t last long: there are three or four parallel efforts already challenging it). Nevertheless, the view from 10,000 feet is unchanged:
- your Kafka clients are the same,
- the ACLs stay the same, however,
- you are suddenly on track to using Kubernetes RBAC to control Kafka’s authentication and authorization.
The purpose of this post is not to go into details about how Kafka ACL on Kubernetes works, but to highlight some of its more under-discussed aspects - a quick glance under the hood if you like - and a reflection on building non-intrusive software.
Simple is hard
See, us engineers, we like building complex things - the more complex the better - and, as a result, we often don’t focus enough on the end user and on legacy software. Similarly, here in Banzai-land we love building complex things, but with the twist that whatever we build on Kubernetes must converge toward two points:
- Heroku’s simplicity and
- Kubernetes’ flexibility.
Be it Pipeline, our container management platform for building hybrid clouds; PKE, our CNCF certified Kubernetes distribution; Backyards, the Istio distribution that just works; or the subject of this post, Supertubes: the primary objective is to make using our software simple.
Making a feature simple to use is often difficult, and potentially unrewarding. Many competing container management platforms (and there are plenty out there) check off a feature by doing a simple helm install behind the scenes. However, the rub lies in the flexibility, integration, edge cases and our obsession with going the extra mile - solutions that are usable out-of-the-box, with production-ready defaults, but with replaceable batteries (not just included).
But enough rambling, let’s get back to business.
The Kafka ACL problem
We wanted to accomplish two things without having to modify Kafka client libraries or code:
- Authenticate Kafka clients using their Kubernetes namespaces and service accounts
- Get the ACLs defined for the <namespace>-<serviceaccount> Kafka user to automatically apply to client applications
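To make the second goal concrete, here is a minimal sketch of how a workload identity could map to a Kafka user and be checked against ACLs. It assumes a simple in-memory ACL table; the `Acl` type, the `kafka_user` and `is_allowed` helpers, and the example namespace, service account, and topic names are all hypothetical and are not Supertubes or Kafka APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    """Hypothetical, simplified ACL entry: who may do what on which topic."""
    principal: str  # Kafka principals are written as "User:<name>"
    operation: str  # e.g. "Read" or "Write"
    topic: str


def kafka_user(namespace: str, service_account: str) -> str:
    """Build the <namespace>-<serviceaccount> Kafka user name."""
    return f"{namespace}-{service_account}"


def is_allowed(acls, namespace, service_account, operation, topic) -> bool:
    """Return True if an ACL entry exists for this workload's Kafka user."""
    principal = f"User:{kafka_user(namespace, service_account)}"
    return Acl(principal, operation, topic) in acls


# Example data (assumed): the "checkout" service account in the "orders"
# namespace may write to the "payments" topic, and nothing else.
acls = {Acl("User:orders-checkout", "Write", "payments")}

print(is_allowed(acls, "orders", "checkout", "Write", "payments"))
print(is_allowed(acls, "orders", "checkout", "Read", "payments"))
```

The point of the sketch is that the ACL entry itself is an ordinary Kafka ACL; only the way the user name is derived changes.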
If you’re familiar with Supertubes, you know that we run Kafka inside an Istio service mesh (with all the benefits that implies). This has been part of an incremental process on the road toward “perfection”. Initially, we started using Envoy alongside each broker to control access and to benefit from Envoy’s protocol filters for Kafka. As we deployed larger and larger Kafka clusters, we needed a way to control the Envoy data plane, which is what Istio does as a control plane. We are big Istio users and fans, and have an Istio distribution called Backyards, so we added a few more Istio benefits to Supertubes: for example, automatic mTLS, certificate rotation and management, WASM filters, and so on.
Simplifying things a bit, the authentication and authorization flow looks like this under the hood:
- When a Kafka client application runs inside an Istio mesh where mTLS communication is provided by Istio, Istio generates a certificate for the client application, which is used during the mTLS session. This Istio-generated certificate contains the namespace and the name of the service account of the client application.
- Our Kafka ACL WASM filter for Envoy checks this certificate when traffic from the client application flows towards a broker. It extracts the namespace and the service account information, then mutates the traffic by injecting <namespace>-<serviceaccount> into it.
- The Kafka brokers receive this mutated stream that contains <namespace>-<serviceaccount>. Supertubes reads the <namespace>-<serviceaccount> from the TCP stream and uses it as the Principal that represents the client application.
From the client application’s point of view this is the least intrusive solution, as it doesn’t require any changes.
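The identity extraction in the second step can be sketched as follows. Istio workload certificates carry the identity as a SPIFFE URI of the form spiffe://<trust-domain>/ns/<namespace>/sa/<serviceaccount>. The real filter is written in C++ and runs inside Envoy; this Python sketch only illustrates the parsing, and the function name `principal_from_spiffe` and the example URI are assumptions made for illustration.

```python
from urllib.parse import urlparse


def principal_from_spiffe(uri: str) -> str:
    """Turn an Istio SPIFFE identity into a <namespace>-<serviceaccount> name.

    Illustrative only: the function name and error handling are assumptions,
    not the behavior of the actual WASM filter.
    """
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    # Expected path layout: ns/<namespace>/sa/<serviceaccount>
    if (
        parsed.scheme != "spiffe"
        or len(parts) != 4
        or parts[0] != "ns"
        or parts[2] != "sa"
        or not parts[1]
        or not parts[3]
    ):
        raise ValueError(f"not an Istio workload identity: {uri!r}")
    namespace, service_account = parts[1], parts[3]
    return f"{namespace}-{service_account}"


# Example identity (assumed values) as it would appear in the certificate.
print(principal_from_spiffe("spiffe://cluster.local/ns/orders/sa/checkout"))
```

Because the identity comes from the certificate that Istio issues for the mTLS session, the client application never has to present it explicitly.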
Note: There are other solutions out there which provide a custom library that reads the service account of the client application and passes that info to Kafka through the SASL mechanism. In my opinion, this solution is relatively intrusive, since you have to modify the client application to interact with Kafka through this library (which may not be available in all programming languages).
Complexity under the hood
So what did it take to achieve this level of simplicity for the end user?
- A WASM filter for Envoy in C++ to extract the client application identity from the presented certificate
- New ChannelBuilder for PLAINTEXT protocol in Kafka for consuming the additional info (client app identity) sent by the WASM filter
- New Authenticator in Kafka to authenticate the client app identity passed in by the WASM filter
- Istio Pilot fix/enhancement to properly use the portNumber filter for the listener matching specified in EnvoyFilter. This was needed in order to be able to hook the WASM filter only to specific ports
- Istio Pilot enhancement to allow the PASSTHROUGH Ingress gateway to forward the original TLS packets received from downstream, unmodified, to the upstream recipient
- A Koperator change to remove Kafka users’ hard dependency on cert-manager/Vault
- A Koperator change to set up Kafka superusers for service accounts used by the Koperator, Cruise Control, and minion
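As a rough illustration of the last item: Kafka brokers accept a super.users setting, a semicolon-separated list of principals that bypass ACL checks. Assuming the operator's own service accounts follow the same <namespace>-<serviceaccount> naming, the value could be assembled as in the sketch below. The `super_users` helper and the namespace and service account names are hypothetical, not what Koperator actually generates.

```python
def super_users(service_accounts):
    """Build a Kafka super.users value from (namespace, service account) pairs.

    Hypothetical helper: the naming follows the <namespace>-<serviceaccount>
    convention, and the input names below are made up for the example.
    """
    return ";".join(f"User:{ns}-{sa}" for ns, sa in service_accounts)


accounts = [("kafka", "kafka-operator"), ("kafka", "cruise-control")]
print("super.users=" + super_users(accounts))
```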
This project turned out to be more complex than we originally envisioned, but it was worth the effort, as we managed to radically simplify the user experience and align legacy Kafka mechanisms with their Cloud Native equivalents. In the process, we managed to reuse a lot of know-how and components between our teams to uniquely benefit our customers. We believe that simplicity is a competitive advantage, but of course only if it is as simple as it can be, and no simpler…
And yes, we know we’re not the first to have made this point. ;)