The content of this page hasn't been updated for years and might refer to discontinued products and projects.

Supertubes

Using ksqlDB with Supertubes

Supertubes automates the deployment of the ksqlDB event streaming database by introducing a new custom resource called KsqlDB. Supertubes provides two modes to manage ksqlDB backend(s):

imperative mode, and
declarative mode.

Both methods use the KsqlDB Custom Resource Definition under the hood to manage ksqlDB instances.

Imperative management of ksqlDB instances

The Supertubes CLI provides commands to easily deploy ksqlDB instances with either default or custom settings.

Note: To deploy new ksqlDB instances or manage existing ones, run the supertubes cluster ksql create and supertubes cluster ksql update commands, respectively.

Declarative management of ksqlDB instances

Managing ksqlDB instances with Supertubes is as simple as creating and updating the KsqlDB custom resource. Supertubes automatically monitors the ksqlDB deployment and configuration settings specified in the KsqlDB custom resource, and performs the necessary steps to spin up new ksqlDB instances or reconfigure existing ones with the desired configuration. For details on the custom resource, see The KsqlDB custom resource definition.
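
A minimal sketch of the declarative workflow, assuming a KafkaCluster custom resource named kafka in the kafka namespace (as in the other samples on this page): save a manifest like the following and apply it with kubectl apply -f. The field names come from the custom resource definition on this page; the heapOpts and resources values are illustrative only.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  # KafkaCluster custom resource to connect to (assumed to exist in the same namespace)
  clusterRef:
    name: kafka
  # Illustrative heap settings; the default is -Xms512M -Xmx2G
  heapOpts: -Xms1G -Xmx3G
  # Standard Kubernetes compute resource requirements; illustrative values
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi
```

To reconfigure the instance later, edit a field such as heapOpts and re-apply the manifest; Supertubes reconciles the running deployment with the updated resource.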

Introduction to ksqlDB

For a detailed description of how to manage ksqlDB with Supertubes, see our Managing ksqlDB with Supertubes blog post.

Modes of operation

The ksqlDB server has two modes of operation: interactive and non-interactive (or headless) mode. For details, see the official ksqlDB documentation.

Supertubes supports both modes, and uses the interactive mode by default. To enable and configure ksqlDB in headless mode, see Running ksqlDB in headless mode.
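
As a sketch of headless mode based on the headless and queryConfigMapName fields of the custom resource definition: set headless: true and store the queries in a ConfigMap under the queries.sql key. The ConfigMap below uses the default name (<ksqldb cr name>-ksql-queries-configmap), so queryConfigMapName can be left unset. The pageviews topic and the stream names are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Default name expected by Supertubes: <ksqldb cr name>-ksql-queries-configmap
  name: ksqldb-sample-ksql-queries-configmap
  namespace: kafka
data:
  # The queries must be stored under the queries.sql key
  queries.sql: |
    -- Hypothetical source stream over an existing pageviews topic
    CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
    -- Hypothetical derived stream written to the pageviews_user1 topic
    CREATE STREAM pageviews_user1 WITH (KAFKA_TOPIC='pageviews_user1') AS
      SELECT * FROM pageviews WHERE userid = 'User_1';
---
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Run the queries from the ConfigMap instead of accepting them through the REST API
  headless: true
```

If Kafka ACLs are enforced, the source and output topics likely also need to be listed in the inputTopics and outputTopics fields.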

Scaling by HPA

Supertubes takes care of scaling ksqlDB using a Horizontal Pod Autoscaler (HPA). The twist here is that, by default, HPAs only support scaling on basic CPU or memory usage. While that’s enough for most workloads, ksqlDB is much better scaled by consumer lag.

When ksqlDB cannot keep up with the rate of messages produced on your Kafka topics, it falls behind in processing incoming data. Because consumer lag measures this backlog directly, scaling by consumer lag addresses the problem far better than scaling by traditional metrics such as CPU or memory usage. In the Supertubes ecosystem, we already track consumer lag in our Prometheus instance.

To make the consumer lag metrics available to the HPA, deploy the kube-metrics-adapter Helm chart. The HPA that Supertubes has already deployed and configured does the rest for you.

  # Default HPA configuration
  scaling:
    prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
    # Names of the ksqlDB streams used to filter the Prometheus metric
    streams: []
    # Minimum number of replicas
    minValue: 1
    # Maximum number of replicas
    maxValue: 5
    # Threshold for the HPA to activate
    threshold: 30
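
For example, a KsqlDB resource that restricts autoscaling to a single stream might set the scaling fields as follows. This is a sketch: the stream name is hypothetical, and it assumes that threshold acts as the consumer lag target that the HPA compares the filtered metric against.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  scaling:
    # Prometheus instance holding the consumer lag metrics (the default value shown above)
    prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
    # Hypothetical stream name used to filter the lag metric
    streams:
      - pageviews_user1
    # Never run fewer than 2 or more than 4 replicas
    minValue: 2
    maxValue: 4
    # Assumed to be the consumer lag target for the HPA
    threshold: 100
```

Under the standard Kubernetes HPA algorithm, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. Assuming threshold is used as the desired value, 2 replicas observing a lag of 300 would yield ceil(2 × 300 / 100) = 6, which is then capped at maxValue, that is, 4 replicas.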

Security

Supertubes security features (like Kafka ACLs) apply to the ksqlDB deployment as well. The following sections detail the additional options that allow you to configure security for ksqlDB.

Authorization

You can configure the authorization policy through the authorizations field of the KsqlDB custom resource. Only the listed principals can access the ksqlDB server.

You can list an arbitrary number of KafkaUser or ServiceAccount entities in the specification.

Example authorization settings

Here’s an example authorization spec that allows traffic to the ksqlDB server for the user-1 user and the default service account.

  authorizations:
    - principal:
        kind: KafkaUser
        namespace: kafka
        name: user-1
    - principal:
        kind: ServiceAccount
        namespace: kafka
        name: default

Access ksqlDB from outside the service mesh

To access ksqlDB from a CLI instance that runs outside the service mesh, you have to configure the certificates manually:

Extract the certificates from Istio as described in Client applications outside the Istio mesh.
Use that certificate to configure the CLI as described in the ksqlDB’s documentation.

Access control

Supertubes manages ACLs for ksqlDB and even lets you fine-tune them through the KsqlDB Custom Resource Definition: ksqlDB queries can read the topics listed in inputTopics, and write to and create the topics listed in outputTopics. For example:

...
spec:
  # Input topics to be used in ksql queries for reading
  inputTopics: []
  
  # Output topics to be used in ksql queries for write and create
  outputTopics: []
...
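
A filled-in version of these fields might look like the following. It is a sketch that assumes both fields take a list of Kafka topic names; the topic names themselves are hypothetical.

```yaml
spec:
  # Topics that ksqlDB queries may read from (hypothetical names)
  inputTopics:
    - pageviews
    - users
  # Topics that ksqlDB queries may write to or create (hypothetical name)
  outputTopics:
    - pageviews_enriched
```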

The KsqlDB custom resource definition

apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  # Name of the KafkaCluster custom resource that represents the Kafka cluster this ksqlDB instance connects to
  clusterRef:
    name: kafka

  # Name of the SchemaRegistry custom resource that represents the Schema Registry to make available to ksqlDB
  schemaRegistryRef:

  # Name of the KafkaConnect custom resource that represents the Kafka Connect cluster to make available to ksqlDB
  kafkaConnectRef:

  # Controls whether mTLS is enforced between ksqlDB and client applications (default: true)
  MTLS: true

  # Affinity settings for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
  affinity:

  # Controls the list of principals who are authorized to access the ksqlDB REST API
  authorizations:

  # Settings for exposing ksqlDB REST API outside the Kubernetes cluster when running in interactive mode
  externalEndpoint:

  # Controls whether ksqlDB runs in headless (true) or interactive (false) mode (default: false)
  headless: false

  # Heap settings for ksqlDB (default: -Xms512M -Xmx2G)
  heapOpts: -Xms512M -Xmx2G

  image:

  # PullPolicy describes a policy for if/when to pull a container image
  imagePullPolicy:

  imagePullSecrets:

  # Input topics to be used in ksql queries for reading
  inputTopics:
  
  # Output topics to be used in ksql queries for write and create
  outputTopics:

  # JmxExporterSpec defines the configuration for jmx exporter
  jmxExporter:

  # Defines the config values for ksqlDB
  ksqlDBConfig:

  # Node selector setting for ksqlDB pods
  # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector:

  # Annotations to be applied to ksqlDB pod
  podAnnotations:

  # Labels to be applied to ksqlDB pod
  podLabels:

  # Name of the ConfigMap that contains the ksqlDB queries executed in headless mode (default: <ksqldb cr name>-ksql-queries-configmap). Inside the ConfigMap, the queries must be stored under the `queries.sql` key
  queryConfigMapName:

  # Resources describes the compute resource requirements 
  # default: 
  #   requests:
  #     cpu: 1 
  #     memory: 1.5Gi 
  #   limits: 
  #     cpu: 2 
  #     memory: 2.5Gi
  resources:

  # Defines HPA configurations
  scaling:

  # Service account for ksqlDB pod
  serviceAccountName:

  # Annotations to be applied to the service that exposes the ksqlDB API on port `ServicePort`
  serviceAnnotations:

  # Labels to be applied to the service that exposes the ksqlDB API on port `ServicePort`
  serviceLabels:

  # The port on which ksqlDB listens for REST API requests
  servicePort:

  # Toleration settings for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
  tolerations:

  # Volume mounts for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/storage/volumes/
  volumeMounts:

  # Volumes for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/storage/volumes/
  volumes:
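
As an illustration of the scheduling-related fields, the following fragment pins ksqlDB pods to labelled nodes and tolerates a matching taint. It assumes these fields take the standard Kubernetes nodeSelector and toleration formats referenced in the comments of the definition; the node label, taint, and pod label are hypothetical.

```yaml
spec:
  # Schedule ksqlDB pods only on nodes carrying this (hypothetical) label
  nodeSelector:
    workload: ksqldb
  # Allow scheduling onto nodes tainted with dedicated=ksqldb:NoSchedule (hypothetical taint)
  tolerations:
    - key: dedicated
      operator: Equal
      value: ksqldb
      effect: NoSchedule
  # Extra labels added to every ksqlDB pod (hypothetical)
  podLabels:
    team: analytics
```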

The following ksqlDB configuration properties are computed and maintained by Supertubes and cannot be overridden:

bootstrap.servers
listeners
ksql.schema.registry.url (if schemaRegistryRef is provided)
ksql.connect.url (if kafkaConnectRef is provided)

The default KsqlDB custom resource

apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "kafka"