It’s no news that for quite a while our take on Apache Kafka on Kubernetes, Supertubes, has been happily running inside an Istio-based service mesh, in both single- and multi-cluster setups across hybrid clouds. While we have touched on several of the advantages Istio gives us, this post aims to collect the issues, cornerstones and benefits in one place.
We see the service mesh as a key component of every modern Cloud Native stack. To make this a reality, we are on a mission to make Istio simple to use and manage for everyone. We have built a product called Backyards (now Cisco Service Mesh Manager), Banzai Cloud’s operationalized and automated service mesh, which makes setting up and operating an Istio-based mesh a cinch.
Check out Supertubes in action on your own clusters:
Register for an evaluation version and run a simple install command!
As you might know, Cisco has recently acquired Banzai Cloud. We are currently in a transitional period and are moving our infrastructure, so evaluation downloads are temporarily suspended. Contact us to discuss your needs and requirements, and to organize a live demo.
supertubes install -a --no-demo-cluster --kubeconfig <path-to-k8s-cluster-kubeconfig-file>
or read the documentation for details.
Take a look at some of the Kafka features that we’ve automated and simplified through Supertubes and the Koperator, which we’ve already blogged about:
- Oh no! Yet another Kafka operator for Kubernetes
- Monitor and operate Kafka based on Prometheus metrics
- Kafka rack awareness on Kubernetes
- Running Apache Kafka over Istio - benchmark
- User authenticated and access controlled clusters with the Koperator
- Kafka rolling upgrade and dynamic configuration on Kubernetes
- Envoy protocol filter for Kafka, meshed
- Right-sizing Kafka clusters on Kubernetes
- Kafka disaster recovery on Kubernetes with CSI
- Kafka disaster recovery on Kubernetes using MirrorMaker2
- The benefits of integrating Apache Kafka with Istio
- Kafka ACLs on Kubernetes over Istio mTLS
- Declarative deployment of Apache Kafka on Kubernetes
- Bringing Kafka ACLs to Kubernetes the declarative way
- Kafka Schema Registry on Kubernetes the declarative way
- Announcing Supertubes 1.0, with Kafka Connect and dashboard
Kafka on Istio, the usual suspects (problems)
The internet is full of questions and problems reported by people struggling to run Istio and Kafka alongside each other. Most problems are related to communication or bootstrapping, and they all come from a single source: the sidecar.
One of the major problems is that a sidecar is not yet a first-class citizen in Kubernetes. Native sidecar support has been coming for a while and was announced for the 1.18 release; however, it has been pushed back to 1.19. To understand the problem and the solution in more detail, check out our post about sidecars: Sidecar container lifecycle changes in Kubernetes. Why does this cause issues? Mainly because Kafka and its metadata store, Zookeeper, are designed to have all the required resources available at startup time.
Note that Kafka and Zookeeper were designed for physical on-prem datacenters, and while they work fairly well in the cloud, out of the box they are not ready to run in a dynamic environment such as Kubernetes. The most common symptoms are:
- Zookeeper tries to speak with quorum members at startup. If the Envoy proxy is not ready yet, ZK members may be unable to form a quorum.
- Kafka tries to connect to Zookeeper at startup. If the Envoy proxy is not ready, brokers crash.
- By default, Zookeeper binds only to the pod IP. This causes problems with Istio, because the sidecar proxy forwards packets to the localhost address, where nothing is listening on port 3888, resulting in “connection refused” errors. The end result is that the Zookeeper nodes are unable to elect a leader and the ensemble never starts.
- In older (<1.4.3) Istio versions, Pilot sends the whole configuration to the proxies, which triggers a reload of the entire configuration. During these reloads Envoy terminates all existing connections.
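A common mitigation for the first two items is to hold back the Zookeeper or Kafka process until the sidecar reports ready, for example from a small entrypoint wrapper. The following is a minimal Python sketch of such a wait loop, not what Supertubes ships. It assumes the sidecar exposes its readiness endpoint at http://127.0.0.1:15021/healthz/ready (the port used by recent Istio releases; older releases used 15020), and the names ENVOY_READY_URL, envoy_is_ready and wait_for_sidecar are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumption: the Istio sidecar reports readiness on this endpoint.
# Recent Istio releases use port 15021; older ones used 15020.
ENVOY_READY_URL = "http://127.0.0.1:15021/healthz/ready"


def envoy_is_ready(url: str = ENVOY_READY_URL, timeout: float = 1.0) -> bool:
    """Return True if the sidecar's readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or a non-2xx answer (HTTPError is a
        # URLError subclass): the proxy is not ready yet.
        return False


def wait_for_sidecar(
    probe: Callable[[], bool] = envoy_is_ready,
    deadline_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or roughly `deadline_s` seconds pass.

    The budget counts only time spent sleeping (probe latency is ignored),
    which keeps the loop simple and deterministic to test.
    """
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A wrapper would call wait_for_sidecar() and only then exec the Zookeeper or Kafka start script, exiting with an error if the deadline passes. Newer Istio releases also offer a holdApplicationUntilProxyStarts option that delays application containers until the proxy is ready, which can make such a loop unnecessary.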
While the above are some of the most common runtime problems you might face, new kinds of problems appear as well. Let’s assume that Kafka on Istio is already working and all of the typical Kafka and Zookeeper communication failures are fixed. What is one of the first areas you would focus on? Yes, security.
Kafka on Istio, security a different way
Kafka (with its own, proprietary mechanisms) and Kubernetes both have well-defined ways of handling security, but these don’t match. Getting them to just work together without rewriting Kafka clients, while persisting the existing ACLs and translating and enforcing them as K8s RBAC, is an extremely hard challenge. There are several benefits to using Istio’s built-in security mechanisms instead (more details in the next section):
- It provides full mTLS inside and outside the cluster.
- mTLS can be used for all the components (Kafka, Zookeeper, Cruise Control, MirrorMaker), so you don’t need to set up JKS truststores and keystores for each.
- If you have a client application accessing Kafka, you only have to drop it into the mesh and you get instant mTLS.
But coming back to the original question: how should you handle fine-grained Kafka ACLs when clients access brokers using client certificates, the whole mesh is secured with mTLS, and Envoy does the SSL termination? The new Envoy Kafka protocol filter (available since Istio 1.5) comes to the rescue. Together with a KafkaPrincipalBuilder provided by Supertubes, it makes the whole process transparent to broker clients: the client’s identity makes it past Envoy to the broker, instead of Envoy handing the broker a PLAINTEXT connection with an anonymous principal.
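In Kafka itself, a custom principal builder is a Java class implementing org.apache.kafka.common.security.auth.KafkaPrincipalBuilder. The Python sketch below only illustrates the mapping logic such a builder could apply, under two assumptions: Istio workload certificates carry a SPIFFE ID of the form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> as a URI SAN, and the User:<namespace>/<service-account> naming scheme is hypothetical, not necessarily the one Supertubes uses.

```python
from typing import NamedTuple, Optional, Sequence
from urllib.parse import urlparse


class WorkloadIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> Optional[WorkloadIdentity]:
    """Parse an Istio-style SPIFFE ID: spiffe://<trust-domain>/ns/<ns>/sa/<sa>."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        return None
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        return None
    if not parts[1] or not parts[3]:
        return None
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])


def kafka_principal(uri_sans: Sequence[str]) -> str:
    """Map a peer certificate's URI SANs to a Kafka principal string.

    Hypothetical naming scheme: User:<namespace>/<service-account>.
    Falls back to User:ANONYMOUS, the principal Kafka assigns to
    unauthenticated PLAINTEXT connections, when no identity is found.
    """
    for uri in uri_sans:
        identity = parse_spiffe_id(uri)
        if identity is not None:
            return f"User:{identity.namespace}/{identity.service_account}"
    return "User:ANONYMOUS"
```

Deriving the principal from the mesh identity means broker-side ACLs can refer to Kubernetes service accounts, while clients never handle keystores themselves.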
Kafka on Istio, the benefits
Now let’s go through the benefits. I won’t list all of them, as we’ve already blogged about many of them (check out the Supertubes posts listed above).
Security benefits
Developers and operators do not have to worry about implementing security features; they can rely on the transparent security features brought in by the service mesh:
- Brokers are accessed over mTLS provided by Istio, both from inside and outside the mesh.
- Certificate issuance and renewal are fully managed by Istio.
- Relying on Istio’s mTLS alone can yield up to a 20% performance improvement (see our Kafka over Istio benchmark).
- No modifications or reconfigurations are needed on the client side to make mTLS work.
- Secure Zookeeper quorum communication and access are provided for Kafka clusters installed with Supertubes on Kubernetes.
- Fine-grained access control, built on Kubernetes-native building blocks.
Operational benefits
- The ability to span Kafka clusters across availability zones, multiple datacenters or hybrid clouds with ease.
- Operators can leverage the power of Kubernetes, with no need to rely on Kafka-specific security implementations.
- All perimeter and access security relies on native Kubernetes concepts, managed by Istio.
- Out-of-the-box observability: Supertubes uses federated Prometheus-based monitoring and provides deep insights, dashboards and alerts.
- As deep observability comes free with Supertubes, properly sizing a Kafka cluster on Kubernetes is super easy and becomes a simple iterative process based on metrics.
- Multiple, fine-grained access gateways, relying on Istio’s support for multiple ingress gateways.
- Extended Kafka protocol-level metrics without client or broker modification.
- Locality-based load balancing helps Kafka clusters that span multiple Kubernetes clusters across clouds operate efficiently.
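The idea behind locality-based load balancing can be shown with a simplified model: prefer healthy endpoints in the client’s own zone, then in its region, and only then anywhere else. This Python sketch implements strict tiered failover under that assumption; Istio and Envoy additionally support subzones, weighted distribution and gradual spillover between tiers, which the sketch leaves out. The Locality and Endpoint types and the pick_endpoints function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Locality:
    region: str
    zone: str


@dataclass(frozen=True)
class Endpoint:
    address: str
    locality: Locality
    healthy: bool = True


def priority(client: Locality, endpoint: Locality) -> int:
    """Lower is better: 0 = same zone, 1 = same region, 2 = elsewhere."""
    if endpoint.region == client.region and endpoint.zone == client.zone:
        return 0
    if endpoint.region == client.region:
        return 1
    return 2


def pick_endpoints(client: Locality, endpoints: Sequence[Endpoint]) -> List[Endpoint]:
    """Return the healthy endpoints in the best priority tier that has any."""
    healthy = [ep for ep in endpoints if ep.healthy]
    if not healthy:
        return []
    best = min(priority(client, ep.locality) for ep in healthy)
    return [ep for ep in healthy if priority(client, ep.locality) == best]
```

Traffic only leaves the client’s zone (and then its region) when no healthy endpoint is left closer by, which is what keeps cross-zone and cross-cloud traffic, and its cost and latency, down.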
Additional Supertubes benefits
The features above are already built into and provided by Supertubes, but that is not all. Setting up a production-ready Apache Kafka cluster on Kubernetes is already as simple as registering for an evaluation version and running a simple command to install the CLI tool, yet there is a lot more Istio can provide.
- With the new Envoy protocol filter, RBAC integration can be handled entirely by the filter. This makes it possible to define ACLs at a finer granularity than topics: we can push ACLs down to the Kafka partition level.
- The filter can perform protocol-level transformations between major versions. Even though clients might stay on older Kafka versions, the cluster itself can be upgraded, with version incompatibilities handled at the Kafka protocol filter level.
- The observability and management UI highlights the complete flow.
- Extended client throttling based not just on throughput (provided by Kafka already) but on other metrics as well.
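Partition-level ACLs are not something Kafka’s own authorizer offers, so the rule model below is hypothetical. The sketch keeps Kafka’s familiar evaluation semantics, where an explicit deny overrides any allow and a request with no matching rule is denied (as with allow.everyone.if.no.acl.found=false), and adds an optional partition field to each rule. It ignores wildcards and prefixed resource patterns for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AclRule:
    principal: str            # e.g. "User:kafka/producer"
    operation: str            # e.g. "Write", "Read"
    topic: str
    partition: Optional[int]  # None means every partition of the topic
    allow: bool               # False marks an explicit deny


def authorize(rules: Iterable[AclRule], principal: str, operation: str,
              topic: str, partition: int) -> bool:
    """Deny wins over allow; no matching rule means the request is denied."""
    allowed = False
    for rule in rules:
        matches = (
            rule.principal == principal
            and rule.operation == operation
            and rule.topic == topic
            and (rule.partition is None or rule.partition == partition)
        )
        if not matches:
            continue
        if not rule.allow:
            return False  # an explicit deny short-circuits evaluation
        allowed = True
    return allowed
```

For example, a principal can be allowed to write to every partition of a topic except one, by combining a topic-wide allow with a partition-specific deny.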
Finally, Istio has introduced WebAssembly extensibility support, which brings a totally new option (and languages other than C++) for writing filters for the chain.
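Whichever way a filter is written, in C++ or compiled to WebAssembly, it starts by decoding Kafka’s request framing: a 4-byte big-endian size, followed by a header holding the API key, API version, correlation ID and a nullable client ID string. The Python sketch below decodes that header and turns it into per-API request counts, similar in spirit to the protocol-level metrics mentioned above. It is an illustration, not Envoy’s implementation; the API_NAMES table covers only a few well-known API keys, and the tagged fields that newer (flexible) header versions append after the client ID are not parsed.

```python
import struct
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Subset of well-known Kafka API keys, used to label the metrics.
API_NAMES = {0: "produce", 1: "fetch", 2: "list_offsets", 3: "metadata", 18: "api_versions"}

# api_key (int16), api_version (int16), correlation_id (int32), client_id length (int16)
HEADER_FMT = ">hhih"


class RequestHeader(NamedTuple):
    api_key: int
    api_version: int
    correlation_id: int
    client_id: Optional[str]


def parse_request(frame: bytes) -> RequestHeader:
    """Decode the header of one complete, size-prefixed Kafka request frame."""
    (size,) = struct.unpack_from(">i", frame, 0)
    if size != len(frame) - 4:
        raise ValueError("frame length does not match its size prefix")
    api_key, api_version, correlation_id, id_len = struct.unpack_from(HEADER_FMT, frame, 4)
    if id_len < 0:  # a length of -1 encodes a null client ID
        return RequestHeader(api_key, api_version, correlation_id, None)
    start = 4 + struct.calcsize(HEADER_FMT)
    raw = frame[start:start + id_len]
    if len(raw) != id_len:
        raise ValueError("truncated client ID")
    return RequestHeader(api_key, api_version, correlation_id, raw.decode("utf-8"))


def count_requests(frames: Iterable[bytes]) -> Counter:
    """Count requests per API, labelling unknown keys as api_<key>."""
    counts: Counter = Counter()
    for frame in frames:
        header = parse_request(frame)
        counts[API_NAMES.get(header.api_key, f"api_{header.api_key}")] += 1
    return counts
```

Because only the header is decoded, this kind of accounting works without touching clients or brokers; features such as partition-level ACLs or version translation would additionally need to parse the request body for each API key and version.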
Conclusion
Supertubes was designed to be a best-in-class implementation of Kafka on Kubernetes that leverages Cloud Native technologies. As such, we opted to integrate tightly with the Istio service mesh, which, among other things, brings security and manageability, along with performance benefits, to Kafka. This is a particularly compelling package if you are a SaaS provider who wants to run Kafka “as a service” on your own Kubernetes infrastructure, on your own terms. Supertubes installs, configures, and manages all the components that are required for Kafka success on Kubernetes.
If you plan to run Kafka on Kubernetes and are interested in learning more about Supertubes, check out the product page or read the documentation.
About Supertubes
Banzai Cloud Supertubes (Supertubes) is the automation tool for setting up and operating production-ready Apache Kafka clusters on Kubernetes, leveraging a Cloud-Native technology stack. Supertubes includes Zookeeper, the Koperator, Envoy, Istio and many other components that are installed, configured, and managed to operate a production-ready Kafka cluster on Kubernetes. Some of the key features are fine-grained broker configuration, scaling with rebalancing, graceful rolling upgrades, alert-based graceful scaling, monitoring, out-of-the-box mTLS with automatic certificate renewal, Kubernetes RBAC integration with Kafka ACLs, and multiple options for disaster recovery.