Supertubes automates the deployment of the ksqlDB event streaming database by introducing a new custom resource called KsqlDB. Supertubes provides two ways to manage ksqlDB instances: imperatively, through the Supertubes CLI, and declaratively, through the KsqlDB custom resource.
Both methods use the KsqlDB Custom Resource Definition under the hood to manage ksqlDB instances.
Imperative management of ksqlDB instances
The Supertubes CLI provides commands to deploy ksqlDB instances with either default or custom settings with ease.
Note: To deploy ksqlDB instances or manage existing ones, run the supertubes cluster ksql create and supertubes cluster ksql update commands.
Declarative management of ksqlDB instances
Managing ksqlDB instances with Supertubes is as simple as creating and updating the KsqlDB custom resource. Supertubes automatically monitors the ksqlDB deployment and the configuration settings specified in the KsqlDB custom resource, and performs the necessary steps to spin up new ksqlDB instances or reconfigure existing ones with the desired configuration. For details on the custom resource, see The KsqlDB custom resource definition.
Introduction to ksqlDB
For a detailed description of how to manage ksqlDB with Supertubes, see our Managing ksqlDB with Supertubes blog post.
Modes of operation
The ksqlDB server has two modes of operation: interactive and non-interactive (or headless) mode. For details, see the official ksqlDB documentation.
Supertubes supports both modes, and uses the interactive mode by default. To enable and configure ksqlDB in headless mode, see Running ksqlDB in headless mode.
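As a rough illustration of headless mode, the following sketch combines the headless and queryConfigMapName fields of the KsqlDB custom resource with a ConfigMap that stores the queries under the queries.sql key. The ConfigMap name follows the documented default pattern (<ksqldb cr name>-ksql-queries-configmap); the pageviews stream, its columns, and its topic are hypothetical placeholders. The authoritative procedure is in Running ksqlDB in headless mode.

```yaml
# Sketch only: the stream definition below is a made-up example.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ksqldb-sample-ksql-queries-configmap
  namespace: kafka
data:
  queries.sql: |
    CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
---
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Run the server in headless (non-interactive) mode
  headless: true
  # Optional here, because this matches the documented default name
  queryConfigMapName: ksqldb-sample-ksql-queries-configmap
```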
Scaling by HPA
Supertubes takes care of scaling ksqlDB using a Horizontal Pod Autoscaler (HPA). The twist here is that by default, HPAs only support scaling through basic CPU or memory usage. While that’s generally enough for most workloads, in the case of ksqlDB it’s much better to scale by consumer lag.
When ksqlDB cannot keep up with the rate of messages produced on your Kafka topics, it can fall behind in its processing of incoming data. Scaling by consumer lag helps solve this issue far better than scaling by any traditional metric. In the Supertubes ecosystem, we already track consumer lag in our Prometheus instance.
To enable the HPA to understand the consumer lag metrics, deploy the kube-metrics-adapter Helm chart. An already deployed and configured HPA will do the rest for you.
# Default HPA configuration
scaling:
  prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
  # Name of the ksqlDB streams that the PrometheusMetric will be filtered by
  streams: []
  # Minimum number of replicas
  minValue: 1
  # Maximum number of replicas
  maxValue: 5
  # Threshold for the hpa to activate
  threshold: 30
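To show how these defaults might be overridden, here is a sketch of a KsqlDB resource that restricts the metric to specific streams and allows more replicas. The stream names are hypothetical, and the assumption that streams takes a plain list of names rests only on the comment in the default configuration above.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  scaling:
    prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
    # Hypothetical stream names; assumed to be a list of strings
    streams:
      - PAGEVIEWS
      - ORDERS
    # Keep at least two replicas and allow up to ten
    minValue: 2
    maxValue: 10
    # Same threshold as the default
    threshold: 30
```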
Security
Supertubes security features (like Kafka ACLs) apply to the ksqlDB deployment as well. The following sections detail the additional options that allow you to configure security for ksqlDB.
Authorization
You can configure the authorization policy through the authorizations field of the KsqlDB custom resource. Only the listed principals can access the ksqlDB server.
You can list an arbitrary number of KafkaUser or ServiceAccount entities in the specification.
Example authorization settings
Here’s an example authorization spec that allows traffic to the ksqlDB server for the user-1 user and the default service account.
Authorizations:
- Principal:
    Kind: KafkaUser
    Namespace: kafka
    Name: user-1
- Principal:
    Kind: ServiceAccount
    Namespace: kafka
    Name: default
Access ksqlDB from outside the service mesh
To access ksqlDB from a CLI instance that runs outside the service mesh, you have to configure the certificates manually:
- Extract the certificates from Istio as described in Client applications outside the Istio mesh.
- Use those certificates to configure the CLI as described in the ksqlDB documentation.
Access control
Supertubes manages ACLs for ksqlDB, and lets you fine-tune them through the KsqlDB Custom Resource Definition. For example:
...
spec:
  # Input topics to be used in ksql queries for reading
  inputTopics: []
  # Output topics to be used in ksql queries for write and create
  outputTopics: []
...
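For instance, a KsqlDB resource whose queries read from one topic and write to another might look like the following sketch. The topic names are hypothetical, and both fields are assumed to take plain lists of topic names.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Topics that ksqlDB queries read from (hypothetical name)
  inputTopics:
    - pageviews
  # Topics that ksqlDB queries write to or create (hypothetical name)
  outputTopics:
    - pageviews-enriched
```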
The KsqlDB custom resource definition
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  # Name of the KafkaCluster custom resource that represents the Kafka cluster this ksqlDB instance connects to
  clusterRef:
    name: kafka
  # Name of the SchemaRegistry custom resource that represents the Schema registry to be made available for ksqlDB
  schemaRegistryRef:
  # Name of the KafkaConnect custom resource that represents the Kafka Connect to be made available for ksqlDB
  kafkaConnectRef:
  # Controls whether mTLS is enforced between ksqlDB and client applications (default: true)
  MTLS: true
  # Affinity settings for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
  affinity:
  # Controls the list of principals who are authorized to access the ksqlDB REST API
  authorizations:
  # Settings for exposing ksqlDB REST API outside the Kubernetes cluster when running in interactive mode
  externalEndpoint:
  # Controls whether the ksqlDB is running in headless or interactive mode (default: false)
  headless: false
  # Heap settings for ksqlDB (default: -Xms512M -Xmx2G)
  heapOpts: -Xms512M -Xmx2G
  image:
  # PullPolicy describes a policy for if/when to pull a container image
  imagePullPolicy:
  imagePullSecrets:
  # Input topics to be used in ksql queries for reading
  inputTopics:
  # Output topics to be used in ksql queries for write and create
  outputTopics:
  # JmxExporterSpec defines the configuration for jmx exporter
  jmxExporter:
  # Defines the config values for ksqlDB
  ksqlDBConfig:
  # Node selector setting for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector:
  # Annotations to be applied to ksqlDB pod
  podAnnotations:
  # Labels to be applied to ksqlDB pod
  podLabels:
  # Controls the name of the configmap which contains the ksqldb queries executed in headless mode. (default: <ksqldb cr name>-ksql-queries-configmap) Inside the configmap the query should be named as `queries.sql`
  queryConfigMapName:
  # Resources describes the compute resource requirements
  # default:
  #   requests:
  #     cpu: 1
  #     memory: 1.5Gi
  #   limits:
  #     cpu: 2
  #     memory: 2.5Gi
  resources:
  # Defines HPA configurations
  scaling:
  # Service account for ksqlDB pod
  serviceAccountName:
  # Annotations to be applied on the service that exposes ksqlDB API on port `ServicePort`
  serviceAnnotations:
  # Labels to be applied to the service that exposes ksqlDB API on port `ServicePort`
  serviceLabels:
  # The port ksqlDB listens for REST API requests
  servicePort:
  # Toleration settings for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
  tolerations:
  # Volume mounts for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/storage/volumes/
  volumeMounts:
  # Volumes for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/storage/volumes/
  volumes:
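As an example of tuning a few of these fields together, the sketch below raises the JVM heap and the pod resources in step, so that the maximum heap stays below the memory limit. The values are illustrative rather than recommendations, and the resources field is assumed to follow the standard Kubernetes requests/limits layout shown in the default above.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Larger heap than the default (-Xms512M -Xmx2G)
  heapOpts: -Xms1G -Xmx4G
  # Leave headroom above the 4G maximum heap for non-heap JVM memory
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 5Gi
```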
The following KsqlDB configurations are computed and maintained by Supertubes, and cannot be overridden:
- bootstrap.servers
- listeners
- ksql.schema.registry.url (if schemaRegistryRef is provided)
- ksql.connect.url (if kafkaConnectRef is provided)
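Other ksqlDB properties can presumably be set through the ksqlDBConfig field. The sketch below assumes that the field is a flat map of property names to string values, which this page does not spell out; ksql.streams.auto.offset.reset is a standard ksqlDB server property used here purely for illustration.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  ksqlDBConfig:
    # Assumed format: property name mapped to a string value
    ksql.streams.auto.offset.reset: "earliest"
    # bootstrap.servers and listeners are omitted on purpose:
    # Supertubes computes and maintains them
```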
The default KsqlDB custom resource
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "kafka"