Istio configuration validation with Backyards

Author Zsolt Varga

The content of this page hasn't been updated for years and might refer to discontinued products and projects.

If you’re an active Istio user, there’s a good chance that Istio’s configuration reference is bookmarked in your browser, and that you’ve read the pages on VirtualServices and ServiceEntries over and over, yet still struggle to set up even simple configurations in your mesh.

Istio’s custom resource configuration is very powerful and flexible, but infamous for being overly complex. At its best, its YAML consists of lists of lists, cross-references, conflicting fields, and wildcards.

Even though Istio’s maintainers are aware of this hyper-complexity and, at least in the last few releases, have tried to bring user-friendliness into focus, Istio still routinely strands us in quagmires of minutiae and uncertainty. We’re down to ~25 custom resources from ~50 a year ago, and useful CLI features such as istioctl analyze are now available, but we feel that there’s more to be done.

That’s why we’ve added our own validation subsystem to our service mesh platform, Backyards (now Cisco Service Mesh Manager). The Backyards service mesh platform maintains total compatibility with upstream Istio, but also extends its feature set, while avoiding lock-in through a new abstraction layer. A good example of this is its validation subsystem, which takes Istio’s validation system to a whole new level. It does this by considering the cluster state, as a whole, rather than just Istio’s configuration.

Backyards (now Cisco Service Mesh Manager) is Banzai Cloud’s multi- and hybrid-cloud-enabled service mesh platform for constructing and observing modern infrastructure. It is an Istio distribution and an SRE toolbox in one neat package that takes you from constructing your service mesh to forming SLOs against Envoy-produced metrics.

Istio configuration validation in Backyards

Validation results can be seen on the Overview page of the UI:

[Screenshot: validation results on the Overview page]

Validation can also be checked from the CLI tool, as follows:

❯ backyards analyze
✓ 0 validation errors found

The UI displays the relevant parts of the configuration for each error that is detected, wherever that is applicable:

[Screenshot: a validation error shown with the relevant part of the configuration]

Validation examples

Backyards performs many validation checks on various aspects of the configuration, both syntactic and semantic. The checks are continuously curated, and new ones are added with every release. This post presents a few examples to show how helpful this feature is.

Sidecar injection template validation

This check validates whether any pods within the environment run with an outdated sidecar proxy image or configuration. In this example, the global configuration setting of the sidecar proxy image was changed from banzaicloud/istio-proxyv2:1.7.3-bzc to banzaicloud/istio-proxyv2:1.7.3-bzc.1.

❯ backyards analyze --namespace backyards-demo
pod backyards-demo/frontpage-v1-8f9d69c97-phv4k:
    Cluster: master
    Error: sidecar injector proxy image mismatch
        Control Plane: cp-v17x.istio-system
        Error ID: pod/sidecar-check/sidecar/proxy-image-mismatch
        Context:
            podImage: banzaicloud/istio-proxyv2:1.7.3-bzc
            configImage: banzaicloud/istio-proxyv2:1.7.3-bzc.1
...
...
✗ 4 validation errors were found

This helps operators to get information about outdated proxies within the environment.
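
The idea behind this check can be illustrated with a minimal Python sketch. It is not how Backyards implements the check; it only assumes that we already know each pod's proxy image and the image currently configured for the sidecar injector. The Pod class, the find_outdated_proxies function, and the second pod name (catalog-v1-example) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    namespace: str
    name: str
    proxy_image: str  # image of the sidecar proxy container in the running pod


def find_outdated_proxies(pods, config_image):
    """Return the pods whose proxy image differs from the configured one."""
    return [pod for pod in pods if pod.proxy_image != config_image]


# Illustrative data: the first pod mirrors the CLI output above,
# the second pod name is made up for this example.
pods = [
    Pod("backyards-demo", "frontpage-v1-8f9d69c97-phv4k",
        "banzaicloud/istio-proxyv2:1.7.3-bzc"),
    Pod("backyards-demo", "catalog-v1-example",
        "banzaicloud/istio-proxyv2:1.7.3-bzc.1"),
]

outdated = find_outdated_proxies(pods, "banzaicloud/istio-proxyv2:1.7.3-bzc.1")
for pod in outdated:
    print(f"pod {pod.namespace}/{pod.name}: proxy image mismatch "
          f"(podImage: {pod.proxy_image})")
```

Pods flagged this way typically need to be restarted so that the injector can apply the current proxy image.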

Gateway port protocol configuration conflict validation

This example demonstrates a check for the common mistake of setting conflicting port configurations in different Gateway resources. Istio’s built-in validation won’t reject such a configuration, but it can cause unwanted behavior at ingress. Here, port 9443 of the same ingress gateway has been set to TCP in one resource and to TLS in another.

The following YAMLs were applied:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: tcp
      number: 9443
      protocol: TCP
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: tls
      number: 9443
      protocol: TLS
    tls:
      serverCertificate: /certs/cert.pem
      privateKey: /certs/key.pem
      mode: SIMPLE

Check the configuration’s validity by running the CLI tool’s analyze command.

❯ backyards analyze --namespace istio-system
gateway istio-system/demo-gw-port-conflict-01:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TCP

gateway istio-system/demo-gw-port-conflict-02:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TLS

✗ 2 validation errors found

This result shows the issue exactly, and provides all the information necessary for the operator to quickly pinpoint the problem in the configuration.
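
To make the logic of such a check concrete, here is a rough Python sketch. It is an assumption-laden illustration rather than the Backyards implementation: Gateway resources are represented as plain dictionaries that mirror the YAML above, and two gateways are treated as targeting the same ingress only when their selectors are identical, which is a simplification. The gateway helper and find_port_protocol_conflicts are hypothetical names.

```python
from collections import defaultdict


def find_port_protocol_conflicts(gateways):
    """Report servers that share a selector and port number but differ in protocol."""
    # (selector, port number) -> [(gateway name, server index, protocol)]
    seen = defaultdict(list)
    for gw in gateways:
        selector = tuple(sorted(gw["spec"]["selector"].items()))
        for index, server in enumerate(gw["spec"]["servers"]):
            port = server["port"]
            seen[(selector, port["number"])].append(
                (gw["metadata"]["name"], index, port["protocol"])
            )

    conflicts = []
    for (_selector, number), entries in seen.items():
        protocols = {protocol for _name, _index, protocol in entries}
        if len(protocols) > 1:
            conflicts.extend(
                (name, index, number, protocol)
                for name, index, protocol in entries
            )
    return conflicts


def gateway(name, host, port_name, protocol):
    """Build a dictionary shaped like the Gateway YAML shown above."""
    return {
        "metadata": {"name": name, "namespace": "istio-system"},
        "spec": {
            "selector": {
                "app": "demo-gw",
                "gateway-name": "demo-gw",
                "gateway-type": "ingress",
            },
            "servers": [{
                "hosts": [host],
                "port": {"name": port_name, "number": 9443, "protocol": protocol},
            }],
        },
    }


gateways = [
    gateway("demo-gw-port-conflict-01", "demo1.example.com", "tcp", "TCP"),
    gateway("demo-gw-port-conflict-02", "demo2.example.com", "tls", "TLS"),
]

conflicts = find_port_protocol_conflicts(gateways)
for name, index, number, protocol in conflicts:
    print(f"gateway istio-system/{name}: servers[{index}] "
          f"port {number} protocol {protocol}")
```

Grouping by selector and port number is what makes this a cross-resource check: each Gateway is valid on its own, and the conflict only becomes visible when the resources are compared with one another.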

Multiple gateways with the same TLS certificate validation

Configuring more than one gateway, using the same TLS certificate, will cause browsers that leverage HTTP/2 connection reuse (i.e., most browsers) to produce 404 errors when accessing a second host after a connection to another host has already been established.

You can read more about this issue in the Istio docs.

Let’s apply the following resources to demonstrate how this issue works:

apiVersion: istio.banzaicloud.io/v1beta1
kind: MeshGateway
metadata:
  labels:
    app: demo-gw
  name: demo-gw
  namespace: istio-system
spec:
  labels:
    app: demo-gw
  maxReplicas: 1
  minReplicas: 1
  ports:
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
  replicaCount: 1
  runAsRoot: true
  serviceType: LoadBalancer
  type: ingress
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-wildcard-cert
  namespace: istio-system
spec:
  secretName: example-wildcard-cert
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: "test wildcard certificate"
  isCA: false
  keySize: 2048
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  usages:
    - server auth
  dnsNames:
  - "*.example.com"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE

The following resources were created:

  • an ingress gateway
  • an *.example.com wildcard certificate
  • two Gateway resources, both of which specify the same wildcard cert

Check the configuration’s validity by running the analyze command in the CLI tool.

❯ backyards analyze --namespace istio-system
gateway istio-system/demo-gw-demo1:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

gateway istio-system/demo-gw-demo2:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v17x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

✗ 2 validation errors were found
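
A simplified version of this check can be sketched in Python as follows. It is only an illustration under stated assumptions, not the actual Backyards code: the Gateway resources are plain dictionaries shaped like the YAML above, and the sketch ignores selectors and namespaces, which a real check would have to take into account. The names find_reused_credentials and tls_gateway are hypothetical.

```python
from collections import defaultdict


def find_reused_credentials(gateways):
    """Map each TLS credentialName used by more than one gateway to those gateways."""
    users = defaultdict(set)
    for gw in gateways:
        for server in gw["spec"]["servers"]:
            credential = server.get("tls", {}).get("credentialName")
            if credential:
                users[credential].add(gw["metadata"]["name"])
    return {
        credential: sorted(names)
        for credential, names in users.items()
        if len(names) > 1
    }


def tls_gateway(name, host, credential):
    """Build a dictionary shaped like the HTTPS Gateway YAML shown above."""
    return {
        "metadata": {"name": name, "namespace": "istio-system"},
        "spec": {
            "servers": [{
                "hosts": [host],
                "port": {"name": "https", "number": 443, "protocol": "HTTPS"},
                "tls": {"credentialName": credential, "mode": "SIMPLE"},
            }],
        },
    }


gateways = [
    tls_gateway("demo-gw-tls-conflict-01", "demo1.example.com", "example-wildcard-cert"),
    tls_gateway("demo-gw-tls-conflict-02", "demo2.example.com", "example-wildcard-cert"),
]

reused = find_reused_credentials(gateways)
for credential, names in reused.items():
    print(f"certificate {credential} is used by gateways: {', '.join(names)}")
```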

The analyze command can also produce JSON output.

❯ backyards analyze --namespace istio-system -o json
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v17x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v17x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo2",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
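
The JSON form lends itself to scripting. As a small sketch, assuming the report always has the shape shown above (an object that maps each subject to a list of results with checkID and passed fields), the failed checks can be counted per check ID like this. The summarize function is a hypothetical helper, and SAMPLE is a trimmed copy of the output above.

```python
import json
from collections import Counter

# Trimmed copy of the report shown above.
SAMPLE = """
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "passed": false,
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "passed": false,
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
"""


def summarize(report_text):
    """Count the failed results per checkID in a JSON report."""
    report = json.loads(report_text)
    return Counter(
        result["checkID"]
        for results in report.values()
        for result in results
        if not result["passed"]
    )


summary = summarize(SAMPLE)
for check_id, count in summary.items():
    print(f"{check_id}: {count} failed")
```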

Future plans

With Backyards (now Cisco Service Mesh Manager), you can already identify numerous configuration issues with today’s validation, but we’re planning to take this functionality a step further. It would be great to catch misconfigurations before they are applied to the cluster, not after.

Tip of the day: You can simply download the Backyards CLI tool and run backyards analyze with KUBECONFIG set for your cluster to detect any validation issues on it. Please note that only evaluation use is free; contact us if you’d like to use Backyards in production.

From both the UI and the CLI tool, Backyards can manipulate Istio configuration resources by setting traffic management rules, changing mutual TLS settings, or restricting outbound configurations for Envoy proxies. When a user manipulates any of these Istio resources, the validations are run against the hypothetical cluster state in which the changes would already be applied, before the Istio resources are actually changed on the cluster. If any validation issue is present in this state, the user is notified and can cancel or modify the faulty change they were about to make.

Another use case is a GitOps workflow, where the Istio resources are modified via a PR. In this case, an automated job can run the backyards analyze command against the new hypothetical cluster state; if any issues are discovered, the job fails and the PR merge can be blocked.
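
A CI gate along these lines could look like the following Python sketch. It rests on several assumptions: validation against a hypothetical cluster state is a planned feature, so the sketch simply evaluates whatever JSON report it is given; the report is assumed to have the shape shown earlier in this post; and the file name report.json and the gate function are hypothetical. The report could be produced beforehand with the flags shown above, for example by redirecting the output of backyards analyze --namespace istio-system -o json into a file.

```python
import json
import sys


def gate(report_text):
    """Return 0 if every result in the report passed, 1 otherwise."""
    # An empty or malformed report raises an exception, so the job fails closed.
    report = json.loads(report_text)
    failed = [
        result
        for results in report.values()
        for result in results
        if not result.get("passed", False)
    ]
    for result in failed:
        print(
            f"{result.get('subjectContextKey')}: "
            f"{result.get('checkID')}: {result.get('errorMessage')}",
            file=sys.stderr,
        )
    return 1 if failed else 0


# Usage in a CI job (hypothetical file name): python gate.py report.json
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as report_file:
        sys.exit(gate(report_file.read()))
```

A non-zero exit code makes the CI job fail, which most Git hosting platforms can use as a required status check to block the merge.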

With the implementation of these features, it will be possible to catch issues early, and Backyards users will be further protected from potential downtime and Istio misconfigurations.

Takeaway

This was just the tip of the iceberg. Backyards’ validation subsystem provides lots of checks that result in faster root cause analysis and more stable operation of the service mesh.

Check out Backyards in action on your own clusters!

Want to know more? Get in touch with us, or delve into the details of the latest release.

Or just take a look at some of the Istio features that Backyards automates and simplifies for you, and which we’ve already blogged about.