Troubleshooting the operator

The following tips and commands can help you troubleshoot your Koperator installation.

First things to do

  1. Verify that the Koperator pod is running. Issue the following command: kubectl get pods -n kafka | grep kafka-operator. The output should include a running pod, for example:

    NAME                                       READY   STATUS    RESTARTS   AGE
    kafka-operator-operator-6968c67c7b-9d2xq   2/2     Running   0          10m
    
  2. Verify that the Kafka broker pods are running. Issue the following command: kubectl get pods -n kafka. The output should include a numbered running pod for each broker, with names like kafka-0-zcxk7, kafka-1-2nhj5, and so on. For example:

    NAME                                       READY   STATUS    RESTARTS   AGE
    kafka-0-zcxk7                              1/1     Running   0          3h16m
    kafka-1-2nhj5                              1/1     Running   0          3h15m
    kafka-2-z4t84                              1/1     Running   0          3h15m
    kafka-cruisecontrol-7f77ccf997-cqhsw       1/1     Running   1          3h15m
    kafka-operator-operator-6968c67c7b-9d2xq   2/2     Running   0          3h17m
    prometheus-kafka-prometheus-0              2/2     Running   1          3h16m
    
  3. If you see any problems, check the logs of the affected pod (a short pod-inspection sketch follows this list), for example:

    kubectl logs kafka-0-zcxk7 -n kafka
    
  4. Check the status (State) of your resources. For example:

    kubectl get KafkaCluster kafka -n kafka -o jsonpath="{.status}" | jq
    
  5. Check the status of your ZooKeeper deployment, and the logs of the zookeeper-operator and zookeeper pods.

    kubectl get pods -n zookeeper
    

Check the KafkaCluster configuration

You can display the full configuration and current status of your Kafka cluster using the following command: kubectl get KafkaCluster kafka -n kafka -o yaml

The output looks like the following:

apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaCluster
metadata:
  creationTimestamp: "2022-11-21T16:02:55Z"
  finalizers:
  - finalizer.kafkaclusters.kafka.banzaicloud.io
  - topics.kafkaclusters.kafka.banzaicloud.io
  - users.kafkaclusters.kafka.banzaicloud.io
  generation: 4
  labels:
    controller-tools.k8s.io: "1.0"
  name: kafka
  namespace: kafka
  resourceVersion: "3474369"
  uid: f8744017-1264-47d4-8b9c-9ee982728ecc
spec:
  brokerConfigGroups:
    default:
      storageConfigs:
      - mountPath: /kafka-logs
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      terminationGracePeriodSeconds: 120
  brokers:
  - brokerConfigGroup: default
    id: 0
  - brokerConfigGroup: default
    id: 1
  clusterImage: ghcr.io/banzaicloud/kafka:2.13-3.1.0
  cruiseControlConfig:
    clusterConfig: |
      {
        "min.insync.replicas": 3
      }      
    config: |
    ...
    cruiseControlTaskSpec:
      RetryDurationMinutes: 0    
  disruptionBudget: {}
  envoyConfig: {}
  headlessServiceEnabled: true
  istioIngressConfig: {}
  listenersConfig:
    externalListeners:
    - containerPort: 9094
      externalStartingPort: 19090
      name: external
      type: plaintext
    internalListeners:
    - containerPort: 29092
      name: plaintext
      type: plaintext
      usedForInnerBrokerCommunication: true
    - containerPort: 29093
      name: controller
      type: plaintext
      usedForControllerCommunication: true
      usedForInnerBrokerCommunication: false
  monitoringConfig: {}
  oneBrokerPerNode: false
  readOnlyConfig: |
    auto.create.topics.enable=false
    cruise.control.metrics.topic.auto.create=true
    cruise.control.metrics.topic.num.partitions=1
    cruise.control.metrics.topic.replication.factor=2    
  rollingUpgradeConfig:
    failureThreshold: 1
  zkAddresses:
  - zookeeper-server-client.zookeeper:2181
status:
  alertCount: 0
  brokersState:
    "0":
      configurationBackup: H4sIAAAAAAAA/6pWykxRsjLQUUoqys9OLXLOz0vLTHcvyi8tULJSSklNSyzNKVGqBQQAAP//D49kqiYAAAA=
      configurationState: ConfigInSync
      gracefulActionState:
        cruiseControlState: GracefulUpscaleSucceeded
        volumeStates:
          /kafka-logs:
            cruiseControlOperationReference:
              name: kafka-rebalance-bhs7n
            cruiseControlVolumeState: GracefulDiskRebalanceSucceeded
      image: ghcr.io/banzaicloud/kafka:2.13-3.1.0
      perBrokerConfigurationState: PerBrokerConfigInSync
      rackAwarenessState: ""
      version: 3.1.0
    "1":
      configurationBackup: H4sIAAAAAAAA/6pWykxRsjLUUUoqys9OLXLOz0vLTHcvyi8tULJSSklNSyzNKVGqBQQAAP//pYq+WyYAAAA=
      configurationState: ConfigInSync
      gracefulActionState:
        cruiseControlState: GracefulUpscaleSucceeded
        volumeStates:
          /kafka-logs:
            cruiseControlOperationReference:
              name: kafka-rebalance-bhs7n
            cruiseControlVolumeState: GracefulDiskRebalanceSucceeded
      image: ghcr.io/banzaicloud/kafka:2.13-3.1.0
      perBrokerConfigurationState: PerBrokerConfigInSync
      rackAwarenessState: ""
      version: 3.1.0
  cruiseControlTopicStatus: CruiseControlTopicReady
  listenerStatuses:
    externalListeners:
      external:
      - address: a0abb7ab2e4a142d793f0ec0cb9b58ae-1185784192.eu-north-1.elb.amazonaws.com:29092
        name: any-broker
      - address: a0abb7ab2e4a142d793f0ec0cb9b58ae-1185784192.eu-north-1.elb.amazonaws.com:19090
        name: broker-0
      - address: a0abb7ab2e4a142d793f0ec0cb9b58ae-1185784192.eu-north-1.elb.amazonaws.com:19091
        name: broker-1
    internalListeners:
      plaintext:
      - address: kafka-headless.kafka.svc.cluster.local:29092
        name: headless
      - address: kafka-0.kafka-headless.kafka.svc.cluster.local:29092
        name: broker-0
      - address: kafka-1.kafka-headless.kafka.svc.cluster.local:29092
        name: broker-1
  rollingUpgradeStatus:
    errorCount: 0
    lastSuccess: ""
  state: ClusterRunning
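
Instead of reading through the whole manifest, you can query individual status fields with jsonpath and jq, as in step 4 of the First things to do section. A few examples, assuming the cluster is named kafka in the kafka namespace as above:

    # Overall cluster state, for example ClusterRunning
    kubectl get KafkaCluster kafka -n kafka -o jsonpath="{.status.state}"

    # Per-broker configuration and Cruise Control state
    kubectl get KafkaCluster kafka -n kafka -o jsonpath="{.status.brokersState}" | jq

    # Listener addresses advertised to internal and external clients
    kubectl get KafkaCluster kafka -n kafka -o jsonpath="{.status.listenerStatuses}" | jq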

Getting support

If you encounter any problems that the documentation does not address, file an issue or talk to us on the Banzai Cloud Slack channel #kafka-operator.

Various support channels are also available for Koperator.

Before asking for help, prepare the following information to make troubleshooting faster (example commands for collecting it follow this list):

  • Koperator version
  • Kubernetes version (kubectl version)
  • Helm/chart version (if you installed the Koperator with Helm)
  • Koperator logs from both containers, for example kubectl logs kafka-operator-operator-6968c67c7b-9d2xq -c manager -n kafka and kubectl logs kafka-operator-operator-6968c67c7b-9d2xq -c kube-rbac-proxy -n kafka
  • Kafka broker logs
  • Koperator configuration
  • Kafka cluster configuration (kubectl describe KafkaCluster kafka -n kafka)
  • ZooKeeper configuration (kubectl describe ZookeeperCluster zookeeper-server -n zookeeper)
  • ZooKeeper logs (kubectl logs zookeeper-operator-5c9b597bcc-vkdz9 -n zookeeper)

Do not forget to remove any sensitive information (for example, passwords and private keys) before sharing.
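
The following commands collect most of this information. They are a sketch based on the resource names used in the examples above; the deployment name kafka-operator-operator and the namespaces are assumptions, so substitute the names from your own installation:

    # Versions
    kubectl version
    helm list -n kafka

    # Koperator logs from both containers of the operator pod
    # (deployment and pod names are examples; use your own)
    kubectl logs deployment/kafka-operator-operator -c manager -n kafka > koperator-manager.log
    kubectl logs deployment/kafka-operator-operator -c kube-rbac-proxy -n kafka > koperator-rbac-proxy.log

    # Broker logs and Kafka cluster configuration
    kubectl logs kafka-0-zcxk7 -n kafka > kafka-0.log
    kubectl describe KafkaCluster kafka -n kafka > kafkacluster.txt

    # ZooKeeper configuration and logs
    kubectl describe ZookeeperCluster zookeeper-server -n zookeeper > zookeepercluster.txt
    kubectl logs zookeeper-operator-5c9b597bcc-vkdz9 -n zookeeper > zookeeper-operator.log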

Common errors

Upgrade failed

If you get the following error during a Helm upgrade, update your KafkaCluster CRD. This error typically occurs when you upgrade Koperator to a new version but forget to update the KafkaCluster CRD.

Error: UPGRADE FAILED: cannot patch "kafka" with kind KafkaCluster: KafkaCluster.kafka.banzaicloud.io "kafka" is invalid

The recommended way to upgrade Koperator is to update the KafkaCluster CRD first, and then upgrade the operator itself. For details, see Upgrade the operator.
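
A rough sketch of that order, assuming a Helm-based installation; the release manifest URL, chart repository, and <version> placeholder below are illustrative and should be replaced with the values for the Koperator version you are upgrading to:

    # 1. Apply the CRDs that ship with the target Koperator release
    #    (URL and file name are illustrative; use the manifest published for your target version)
    kubectl apply --server-side --force-conflicts \
      -f https://github.com/banzaicloud/koperator/releases/download/<version>/kafka-operator.crds.yaml

    # 2. Upgrade the operator itself, for example with Helm
    helm repo update
    helm upgrade kafka-operator banzaicloud-stable/kafka-operator -n kafka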