Ever since its inception, Supertubes has been about asking the question, “What if?” What if we built Kafka on the solid foundation of Istio, leveraging the powerful networking capabilities of Istio and the Envoy proxy to create something more powerful than either alone? Today, along with releasing ksqlDB support for Supertubes, we’re speculating even further about what Istio and Kafka can do together to improve our lives.
Over the years, Kafka has grown into a whole platform with an entire ecosystem built around it, consisting of components like Kafka Connect, Kafka Schema Registry, and MirrorMaker 2. ksqlDB was one of the last components of that ecosystem that Supertubes did not support, until today.
Of course, as always, we wanted to add the secret Banzai Cloud sauce to the recipe and take it to the next level. In this case, that meant making it more secure with Istio, and taking some of the more tedious parts of ksqlDB management off our customers’ shoulders.
So what’s in the box?
In Supertubes, we made ACLs first-class citizens of Kubernetes. This makes sense, since we like to blur the lines between Kafka, Istio and Kubernetes, turning them into one big ecosystem whose parts work together instead of against each other and, more importantly, instead of against us. This saves us a lot of frustration, time and money.
It also integrates Kafka more closely with the everyday Kubernetes tools we already use, like kubectl. Furthermore, with the help of our operators, ACLs can be deployed declaratively from a CI/CD pipeline and managed automatically. I’m sure most of you won’t miss handling them by hand in the slightest.
Assuming that you already have a Kubernetes cluster with the latest Supertubes installed on it, and a working ksqlDB installation, all you have to do is deploy a KsqlDB custom resource to the cluster.
The Supertubes CLI, which also validates the resource in this case, makes this easy:
supertubes cluster ksqldb create -f <path-to-ksqldb-cr-file> -n my-namespace
Alternatively, you can apply the custom resource file with a plain kubectl apply.
Here’s a simple example that creates a pre-configured ksqlDB cluster in interactive mode and references pre-existing Schema Registry and Kafka Connect custom resources.
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "my-kafka-cluster"
  schemaRegistryRef:
    name: "my-schema-registry"
  kafkaConnectRef:
    name: "my-kafka-connect"
This also creates all the ACLs that ksqlDB needs for its internal operations in both interactive and headless mode, allowing it to manage its record processing log topic and to produce to its command topic.
To allow the ksqlDB cluster to work on our input and output topics, we first have to list them in the spec of the custom resource.
Input topics are the topics you want to read data from, and output topics are the ones you want to write to. Output topics can also be newly created topics, for example when you create a stream. If you want to chain queries so that one query’s output topic is another query’s input topic, you have to list that topic in both places.
Supertubes then generates the ACLs that ksqlDB needs to operate on them. Knowing your input and output topics in advance is easiest when you run ksqlDB in headless mode, since the queries are already defined when the instance starts. The same functionality is available in interactive mode as well, for example if there are topics you always want to read from.
...
spec:
  inputTopics:
    - kafkacat-airports
  outputTopics:
    - AIRPORTS_ALL
...
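The chaining rule can be illustrated with a small model. The Python below is hypothetical. It is not how Supertubes generates its ACL resources, and the function name and the extra AIRPORTS_EU topic are made up for illustration. It only models the idea that input topics need read access, output topics need write access, and a topic used to chain two queries needs both.

```python
from typing import Dict, Iterable, Set


def topic_permissions(input_topics: Iterable[str],
                      output_topics: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each topic to the operations ksqlDB needs on it (hypothetical model).

    Input topics are read and output topics are written. A topic listed in
    both places (one query's output feeding another query's input) gets both.
    """
    perms: Dict[str, Set[str]] = {}
    for topic in input_topics:
        perms.setdefault(topic, set()).add("READ")
    for topic in output_topics:
        perms.setdefault(topic, set()).add("WRITE")
    return perms


if __name__ == "__main__":
    # Topics from the spec above, plus a hypothetical downstream query that
    # reads AIRPORTS_ALL and writes AIRPORTS_EU.
    inputs = ["kafkacat-airports", "AIRPORTS_ALL"]
    outputs = ["AIRPORTS_ALL", "AIRPORTS_EU"]
    for topic, ops in sorted(topic_permissions(inputs, outputs).items()):
        print(topic, sorted(ops))
```

If AIRPORTS_ALL were listed only under outputTopics, the downstream query would have no read permission on it. This is why chained topics must be listed in both places.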
Just scale it out
We take care of scaling ksqlDB using a Horizontal Pod Autoscaler (HPA). The twist here is that, by default, HPAs only scale on basic CPU or memory usage. While that’s generally enough for most workloads, for ksqlDB it’s a much better idea to scale by consumer lag.
When ksqlDB cannot keep up with the rate at which messages are produced to your Kafka topics, it falls behind in processing incoming data, and its consumer lag grows. Scaling by consumer lag addresses this directly, and far better than scaling by any traditional metric. In the Supertubes ecosystem, we already track consumer lag in our Prometheus instance. You just have to make these metrics available to the HPA by deploying the kube-metrics-adapter Helm chart. An already deployed and configured HPA will do the rest for you.
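The following sketch builds intuition for how lag-based scaling behaves. It reproduces the standard Kubernetes HPA rule, desired = ceil(current × currentMetric / targetMetric), with the HPA’s default tolerance of 0.1, applied to average lag per replica. The function name and parameters are our own illustration, not part of Supertubes or kube-metrics-adapter. The optional partition cap is an assumption: replicas beyond the input topic’s partition count would have nothing to consume. The real HPA also applies stabilization windows and scaling policies, which are omitted here.

```python
import math
from typing import Optional


def desired_replicas(current_replicas: int,
                     total_lag: float,
                     target_lag_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1,
                     partition_cap: Optional[int] = None) -> int:
    """Approximate the HPA replica count for an average-value consumer-lag target.

    Follows the documented HPA rule desired = ceil(current * metric / target),
    where the metric is the average lag per current replica.
    """
    if current_replicas <= 0 or target_lag_per_replica <= 0:
        raise ValueError("replicas and target must be positive")
    ratio = (total_lag / current_replicas) / target_lag_per_replica
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    upper = max_replicas
    if partition_cap is not None:
        # Assumption: replicas beyond the input topic's partition count sit idle.
        upper = min(upper, partition_cap)
    return max(min_replicas, min(desired, upper))


if __name__ == "__main__":
    # 2 replicas, 5000 messages of total lag, target of 1000 per replica.
    print(desired_replicas(current_replicas=2, total_lag=5000,
                           target_lag_per_replica=1000))  # 5
```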
Next, let’s see how Supertubes can improve our lives when it comes to ksqlDB security.
Securing ksqlDB
When it comes to securing ksqlDB, we have to distinguish between two modes: headless and interactive. Interactive mode is more complex to secure than headless mode.
Securing ksqlDB running in interactive mode
In interactive mode, ksqlDB exposes REST API endpoints, which require additional configuration to secure.
The plain ksqlDB way
ksqlDB supports authenticating and encrypting client-server communication on its RESTful and WebSocket endpoints, using HTTP Basic Authentication and TLS, respectively. To enable encryption, you need to provide the following configuration parameters:
listeners=https://hostname:port
ssl.keystore.location=/ssl/certs/keystore.jks
ssl.keystore.password=supersecure
This configuration doesn’t look complex at first glance. However, having to import new or renewed TLS certificates into the keystore makes it cumbersome to maintain.
ksqlDB’s built-in authentication uses HTTP Basic authentication, which means users must authenticate with a username and password. Moreover, it provides role-based authorization by specifying which roles can access the server.
Configuring authentication requires a standard JAAS file, which defines how the server authenticates users.
In the simplest case, the JAAS file points to a password file, which might look like this:
marty: delorean,user,admin
mcfly: drbrown,user,developer
The same file also contains each user’s roles, which are matched against a configuration value on the ksqlDB server. When connecting from the client side, you have to provide a username and password:
bin/ksql --user marty --password delorean http://localhost:8088
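To make both halves of this flow concrete, the sketch below does two things. First, it parses password-file lines of the form user: password,role,... and checks a user’s roles against the set of roles the server accepts (in ksqlDB, the authentication.roles setting). Second, it builds, but does not send, a request to ksqlDB’s /ksql REST endpoint with the same credentials in an HTTP Basic Authorization header, which is what clients other than the ksql CLI would do. The helper names are hypothetical. The role check only illustrates the matching logic; on a real server, the JAAS login module performs it. With TLS enabled, the URL would use https and the listener configured earlier.

```python
import base64
import json
import urllib.request
from typing import Dict, Iterable, List, Tuple

Users = Dict[str, Tuple[str, List[str]]]


def parse_password_file(lines: Iterable[str]) -> Users:
    """Parse 'user: password,role1,role2' lines into {user: (password, roles)}."""
    users: Users = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        user, _, rest = line.partition(":")
        fields = [f.strip() for f in rest.split(",")]
        users[user.strip()] = (fields[0], fields[1:])
    return users


def is_authorized(users: Users, user: str, password: str,
                  allowed_roles: Iterable[str]) -> bool:
    """True if the credentials match and the user has at least one allowed role."""
    entry = users.get(user)
    if entry is None or entry[0] != password:
        return False
    return bool(set(entry[1]) & set(allowed_roles))


def build_ksql_request(base_url: str, user: str, password: str,
                       statement: str) -> urllib.request.Request:
    """Build (but do not send) a POST /ksql request using HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ksql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
    )


if __name__ == "__main__":
    users = parse_password_file(["marty: delorean,user,admin",
                                 "mcfly: drbrown,user,developer"])
    print(is_authorized(users, "marty", "delorean", ["admin", "developer", "user"]))
    req = build_ksql_request("http://localhost:8088", "marty", "delorean", "SHOW STREAMS;")
    print(req.get_method(), req.full_url)
    # Against a running server: urllib.request.urlopen(req) sends it.
```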
That was a quick tour of how to configure ksqlDB security when interactive mode is enabled. Now, let’s take a deeper look at how Supertubes handles all of this configuration.
Securing ksqlDB with Supertubes
Supertubes integrates ksqlDB in a unique way, mostly by leveraging Istio. We tried to make our customers’ lives a little easier by reducing the time they have to spend on configuration. With Supertubes, ksqlDB is pre-configured to use mTLS and advanced authorization out of the box. We also ditched HTTP Basic authentication entirely and replaced it with Istio’s AuthorizationPolicy.
As the official documentation states, authorization policies can be used to enable access control on workloads in the mesh.
They support both allow and deny policies, enabling fine-grained access control.
Moreover, they let us narrow access control down to individual REST endpoints.
For example, users can be allowed to access the /info endpoint while, at the same time, being denied any POST calls.
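The following sketch shows how allow and deny policies interact. It is a simplified Python model of how Istio evaluates these policies for a workload: matching DENY policies are checked first, and then, if any ALLOW policy exists, the request must match one of them. This is not Istio’s implementation. It ignores CUSTOM policies, namespaces and most match options, and the classes, function and client identity are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    # An empty list matches anything for that field, mirroring Istio's behavior.
    principals: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    paths: List[str] = field(default_factory=list)


@dataclass
class Policy:
    action: str  # "ALLOW" or "DENY"
    rules: List[Rule] = field(default_factory=list)


def _path_matches(pattern: str, path: str) -> bool:
    # Exact match, or a prefix match when the pattern ends with "*".
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def _rule_matches(rule: Rule, principal: str, method: str, path: str) -> bool:
    return ((not rule.principals or principal in rule.principals)
            and (not rule.methods or method in rule.methods)
            and (not rule.paths or any(_path_matches(p, path) for p in rule.paths)))


def is_allowed(policies: List[Policy], principal: str, method: str, path: str) -> bool:
    """Simplified Istio evaluation order: DENY first, then ALLOW, then default deny."""
    def matches(policy: Policy) -> bool:
        # A policy matches if any of its rules match; no rules means no match.
        return any(_rule_matches(r, principal, method, path) for r in policy.rules)

    if any(p.action == "DENY" and matches(p) for p in policies):
        return False
    allow_policies = [p for p in policies if p.action == "ALLOW"]
    if not allow_policies:
        return True  # no ALLOW policies: requests are allowed by default
    return any(matches(p) for p in allow_policies)


if __name__ == "__main__":
    client = "cluster.local/ns/apps/sa/reporter"  # hypothetical client identity
    policies = [
        Policy("ALLOW", [Rule(principals=[client], paths=["/info"])]),
        Policy("DENY", [Rule(methods=["POST"])]),
    ]
    print(is_allowed(policies, client, "GET", "/info"))   # True
    print(is_allowed(policies, client, "POST", "/info"))  # False: DENY wins
    print(is_allowed(policies, client, "GET", "/ksql"))   # False: no ALLOW matches
```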
The following table summarizes the differences between HTTP Basic authentication and Istio’s AuthorizationPolicy:
Feature | HTTP Basic authentication | Istio’s AuthorizationPolicy |
---|---|---|
Works without certificates | Yes | No |
Supports role-based access | Yes | Yes |
Fine-grained access control | No | Yes |
Requires client-side configuration | Yes | No (only needed if the app is outside the mesh) |
Here’s an example AuthorizationPolicy:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ...
  namespace: ...
spec:
  selector:
    matchLabels:
      app: ksqldb
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/<principal-id>"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/..."]
To identify the client, the policy uses the SAN URI field of the client’s certificate. For certificates issued by Istio, this field has the following format:
spiffe://cluster.local/ns/<client-app-namespace>/sa/<client-app-service-account>
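The mapping between this SAN URI and the principal used in an AuthorizationPolicy is mechanical: Istio’s principals field holds the SPIFFE ID without its spiffe:// scheme. Below is a minimal Python sketch of that conversion. It assumes the default Istio layout shown above. The helper names and sample values are hypothetical.

```python
from typing import NamedTuple


class WorkloadIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Split an Istio-style SPIFFE ID: spiffe://<trust-domain>/ns/<ns>/sa/<sa>."""
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a SPIFFE URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa" or not all(parts):
        raise ValueError(f"unexpected SPIFFE ID layout: {uri!r}")
    return WorkloadIdentity(parts[0], parts[2], parts[4])


def to_istio_principal(identity: WorkloadIdentity) -> str:
    """Istio's 'principals' field uses the SPIFFE ID without the spiffe:// scheme."""
    return f"{identity.trust_domain}/ns/{identity.namespace}/sa/{identity.service_account}"


if __name__ == "__main__":
    ident = parse_spiffe_id("spiffe://cluster.local/ns/apps/sa/reporter")
    print(to_istio_principal(ident))  # cluster.local/ns/apps/sa/reporter
```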
To allow such an application to connect to ksqlDB, the AuthorizationPolicy of the ksqlDB deployment would have to include:
action: ALLOW
rules:
- from:
  - source:
      principals: ["cluster.local/ns/<client-app-namespace>/sa/<client-app-service-account>"]
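If you manage many client applications, generating these policies from code can save some typing. The sketch below is a hypothetical helper, not part of Supertubes. It assumes the app: ksqldb label from the example above and the default cluster.local trust domain. It emits the manifest as JSON, which kubectl apply -f - accepts on standard input just like YAML.

```python
import json
from typing import Any, Dict, List, Optional


def ksqldb_allow_policy(name: str, namespace: str, client_namespace: str,
                        client_service_account: str,
                        methods: Optional[List[str]] = None,
                        trust_domain: str = "cluster.local") -> Dict[str, Any]:
    """Build an ALLOW AuthorizationPolicy for one client workload (hypothetical helper)."""
    principal = f"{trust_domain}/ns/{client_namespace}/sa/{client_service_account}"
    rule: Dict[str, Any] = {"from": [{"source": {"principals": [principal]}}]}
    if methods:
        # Restrict the allowed HTTP methods; without "to", any operation is allowed.
        rule["to"] = [{"operation": {"methods": methods}}]
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": {"app": "ksqldb"}},
            "action": "ALLOW",
            "rules": [rule],
        },
    }


if __name__ == "__main__":
    policy = ksqldb_allow_policy("ksqldb-allow-reporter", "kafka",
                                 "apps", "reporter", ["GET"])
    # Pipe the output to: kubectl apply -f -
    print(json.dumps(policy, indent=2))
```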
Client application types
We distinguish between three different client types based on where they run:
- Client applications which reside inside the same Istio mesh as the ksqlDB server
- Client applications which reside on the same Kubernetes cluster as the ksqlDB server but outside of the Istio mesh
- Client applications which reside outside of the Kubernetes cluster
We’ll look at each of these separately.
Clients running inside the mesh
When client applications that connect to ksqlDB run in the same Istio mesh, they don’t need to provide certificates themselves. Istio provides one for them out of the box.
This certificate is special, in that it carries information about the namespace and the service account of the application.
The authorization policy collects the SAN URI information from it and uses that as the application’s identity.
Clients running outside of the mesh
When a client application connects from outside the Istio mesh, the value of the SAN URI field (extracted from the client application’s own certificate) is used as the application’s identity.
This certificate is special, in that it must contain the required SAN URI fields.
It can be generated by a tool like cert-manager or Vault, but a KafkaUser resource will do the trick as well.
As of now, KafkaUser custom resources generate certificates that include all the fields required to use ksqlDB with Supertubes.
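Before handing a certificate to an out-of-mesh client, it can be worth checking that the certificate actually carries a SPIFFE-style SAN URI. The sketch below assumes the certificate is available as a dictionary in the shape returned by Python’s ssl.SSLSocket.getpeercert(), which lists SAN entries as (type, value) pairs, including URI entries. Under that assumption, the check is a simple scan. The function name and sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def spiffe_uri_from_peercert(cert: Dict[str, Any]) -> Optional[str]:
    """Return the first spiffe:// URI SAN in a getpeercert()-style dict, or None."""
    # getpeercert() reports SANs as a tuple of (type, value) pairs.
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None


if __name__ == "__main__":
    # Hand-written stand-in for the dict returned by ssl.SSLSocket.getpeercert().
    sample = {
        "subject": ((("commonName", "reporter"),),),
        "subjectAltName": (
            ("DNS", "reporter.apps.svc"),
            ("URI", "spiffe://cluster.local/ns/apps/sa/reporter"),
        ),
    }
    print(spiffe_uri_from_peercert(sample))
```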
Clients running outside of the Kubernetes cluster
When the client application is external to the Kubernetes cluster, the flow differs from the above in that the application’s traffic passes through a LoadBalancer and an ingress gateway.
Securing ksqlDB running in headless mode
In headless mode, ksqlDB does not initialize any REST endpoints, so securing the communication between components is enough. Since Supertubes uses and configures Istio behind the scenes, no additional changes are required, either on the ksqlDB side or on any service it connects to. In other words, we get mTLS out of the box when we use ksqlDB with Supertubes.
Summary
Today we are glad to announce support for ksqlDB, which means that all the major Kafka companion products, including Schema Registry, Kafka Connect and ksqlDB, now ship with Supertubes. Supertubes bundles Istio, Kubernetes and Kafka in a unique way that makes our customers’ lives easier. We are not done yet, though. We plan to introduce additional improvements, features and capabilities that extend the range of use cases where Supertubes can help. Stay tuned.