If you’re an active Istio user, there’s a good chance that Istio’s configuration reference is bookmarked in your browser, and that you’ve read the pages on VirtualServices and ServiceEntries over and over, yet still struggle to set up even simple configurations in your mesh.
Istio’s custom resource configuration is very powerful and flexible, but infamous for being overly complex. At its best, its YAML consists of lists of lists, cross-references, conflicting fields, and wildcards.
Even though Istio’s maintainers are aware of this hyper-complexity and, at least in the last few releases, have tried to bring user-friendliness into focus, Istio still routinely strands us in quagmires of minutiae and uncertainty. We’re down to ~25 custom resources from ~50 a year ago, and Istio now ships useful CLI features like istioctl analyze, but we feel that there’s more to be done.
That’s why we’ve added our own validation subsystem to our service mesh platform, Backyards (now Cisco Service Mesh Manager). Backyards maintains full compatibility with upstream Istio but extends its feature set, while avoiding lock-in through a new abstraction layer. A good example is its validation subsystem, which goes well beyond Istio’s built-in validation by considering the cluster state as a whole, rather than just Istio’s configuration.
Backyards (now Cisco Service Mesh Manager) is Banzai Cloud’s multi- and hybrid-cloud enabled service mesh platform for constructing and observing modern infrastructure. It is an Istio distribution and an SRE toolbox in one neat package that takes you from constructing your service mesh to forming SLOs against Envoy-produced metrics.
Istio configuration validation in Backyards
Validation results can be seen on the Overview page of the UI.
Validation can also be checked from the CLI tool, as follows:
❯ backyards analyze
✓ 0 validation errors found
For each detected error, the UI also displays the relevant parts of the configuration, where applicable.
Validation examples
Backyards performs many validation checks on various aspects of the configuration, both syntactic and semantic. The checks are continuously curated, and new ones are added with every release. The following examples show a few of them and how they help in practice.
Sidecar injection template validation
This check validates whether any pods in the environment are running with an outdated sidecar proxy image or configuration. In this example, the global sidecar proxy image setting was changed from banzaicloud/istio-proxyv2:1.7.3-bzc to banzaicloud/istio-proxyv2:1.7.3-bzc.1.
❯ backyards analyze --namespace backyards-demo
pod backyards-demo/frontpage-v1-8f9d69c97-phv4k:
Cluster: master
Error: sidecar injector proxy image mismatch
Control Plane: cp-v17x.istio-system
Error ID: pod/sidecar-check/sidecar/proxy-image-mismatch
Context:
podImage: banzaicloud/istio-proxyv2:1.7.3-bzc
configImage: banzaicloud/istio-proxyv2:1.7.3-bzc.1
...
...
✗ 4 validation errors were found
This helps operators find outdated proxies in the environment.
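At its core, the image part of this check compares each running sidecar against the image the injector is currently configured to use. The sketch below illustrates only that comparison; it is not Backyards’ implementation, it ignores the rest of the injection template, and the pod data (including the `example/frontpage:v1` app image) is hypothetical. It assumes the sidecar container is named `istio-proxy`, which is the name Istio gives the injected proxy.

```python
from dataclasses import dataclass

# Name Istio gives the injected sidecar container.
SIDECAR_CONTAINER = "istio-proxy"


@dataclass
class ImageMismatch:
    pod: str           # "namespace/name"
    pod_image: str     # image the running sidecar was injected with
    config_image: str  # image the injector is currently configured to use


def find_outdated_sidecars(pods, config_image):
    """Return an ImageMismatch for every pod whose sidecar image differs from config_image.

    `pods` is a list of dicts shaped like trimmed-down Kubernetes Pod objects.
    """
    mismatches = []
    for pod in pods:
        meta = pod["metadata"]
        for container in pod["spec"]["containers"]:
            if container["name"] == SIDECAR_CONTAINER and container["image"] != config_image:
                mismatches.append(ImageMismatch(
                    pod=f'{meta["namespace"]}/{meta["name"]}',
                    pod_image=container["image"],
                    config_image=config_image,
                ))
    return mismatches


# Hypothetical pod data mirroring the example above; the app image is made up.
pods = [
    {
        "metadata": {"namespace": "backyards-demo", "name": "frontpage-v1-8f9d69c97-phv4k"},
        "spec": {"containers": [
            {"name": "frontpage", "image": "example/frontpage:v1"},
            {"name": "istio-proxy", "image": "banzaicloud/istio-proxyv2:1.7.3-bzc"},
        ]},
    },
]

for m in find_outdated_sidecars(pods, "banzaicloud/istio-proxyv2:1.7.3-bzc.1"):
    print(f"pod {m.pod}: podImage={m.pod_image} configImage={m.config_image}")
```

Pods flagged this way typically need a restart so that the injector can add a sidecar built from the current configuration.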
Gateway port protocol configuration conflict validation
This example demonstrates a check for a common mistake: setting conflicting port configurations in different Gateway resources. Istio’s built-in validation does not reject this, but it can cause unwanted behavior at ingress. Here, port 9443 of the same ingress gateway has been set to TCP in one resource and to TLS in another.
The following YAMLs were applied:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: tcp
      number: 9443
      protocol: TCP
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: tls
      number: 9443
      protocol: TLS
    tls:
      serverCertificate: /certs/cert.pem
      privateKey: /certs/key.pem
      mode: SIMPLE
Check the configuration’s validity by running the CLI tool’s analyze command.
❯ backyards analyze --namespace istio-system
gateway istio-system/demo-gw-port-conflict-01:
Cluster: master
Error: Conflicting gateway port protocols
Control Plane: cp-v17x.istio-system
Error ID: gateway/port/gateway/port/protocol-conflict
Path: servers[0]
Context:
port: 9443
protocol: TCP
gateway istio-system/demo-gw-port-conflict-02:
Cluster: master
Error: Conflicting gateway port protocols
Control Plane: cp-v17x.istio-system
Error ID: gateway/port/gateway/port/protocol-conflict
Path: servers[0]
Context:
port: 9443
protocol: TLS
✗ 2 validation errors found
This output pinpoints the issue and gives the operator all the information needed to quickly locate the problem in the configuration.
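The idea behind this check can be sketched as grouping every server by the gateway workload it targets and the port it binds, then flagging any group that mixes protocols. This is a minimal sketch, not Backyards’ implementation: it assumes two Gateways target the same workload only when their `spec.selector` maps are identical (a real check would resolve which pods each selector actually matches), and it treats any two distinct protocol strings on one port as a conflict.

```python
from collections import defaultdict


def find_port_protocol_conflicts(gateways):
    """Flag servers that bind the same port on the same gateway workload with different protocols."""
    # (selector, port number) -> list of (gateway id, server index, protocol)
    bindings = defaultdict(list)
    for gw in gateways:
        gw_id = f'{gw["metadata"]["namespace"]}/{gw["metadata"]["name"]}'
        selector = frozenset(gw["spec"]["selector"].items())
        for i, server in enumerate(gw["spec"]["servers"]):
            port = server["port"]
            bindings[(selector, port["number"])].append((gw_id, i, port["protocol"].upper()))

    conflicts = []
    for (_, number), entries in bindings.items():
        if len({proto for _, _, proto in entries}) > 1:  # more than one protocol on this port
            for gw_id, i, proto in entries:
                conflicts.append({"gateway": gw_id, "path": f"servers[{i}]",
                                  "port": number, "protocol": proto})
    return conflicts


SELECTOR = {"app": "demo-gw", "gateway-name": "demo-gw", "gateway-type": "ingress"}


def gateway(name, host, port_name, protocol):
    """Build a trimmed-down Gateway like the ones applied above."""
    return {"metadata": {"name": name, "namespace": "istio-system"},
            "spec": {"selector": SELECTOR,
                     "servers": [{"hosts": [host],
                                  "port": {"name": port_name, "number": 9443,
                                           "protocol": protocol}}]}}


gateways = [
    gateway("demo-gw-port-conflict-01", "demo1.example.com", "tcp", "TCP"),
    gateway("demo-gw-port-conflict-02", "demo2.example.com", "tls", "TLS"),
]
for conflict in find_port_protocol_conflicts(gateways):
    print(conflict)
```

Reporting every participant of a conflicting group, rather than just the second one, matches the CLI output above, where both Gateways are listed with their `servers[0]` path.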
Multiple gateways with the same TLS certificate validation
Configuring more than one gateway with the same TLS certificate causes browsers that leverage HTTP/2 connection reuse (i.e., most browsers) to produce 404 errors when accessing a second host after a connection to another host has already been established.
You can read more about this issue in the Istio docs.
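In simplified terms, a browser may send a request for a second host over an already-open HTTP/2 connection when the certificate presented on that connection also covers the second host and both names resolve to the same IP address. The sketch below models only that decision; real browsers apply additional rules that differ between implementations, and the IP address is a documentation placeholder standing in for the ingress load balancer.

```python
def wildcard_matches(pattern, host):
    """Certificate name matching where '*' may only replace the entire left-most label."""
    pattern, host = pattern.lower(), host.lower()
    if not pattern.startswith("*."):
        return pattern == host
    label, sep, rest = host.partition(".")
    return bool(label) and sep == "." and rest == pattern[2:]


def can_reuse_connection(cert_names, conn_ip, new_host, new_host_ip):
    """Simplified browser rule for reusing an open HTTP/2 connection for a different host."""
    return new_host_ip == conn_ip and any(wildcard_matches(n, new_host) for n in cert_names)


# A connection to demo1.example.com was opened with a *.example.com wildcard certificate.
cert_names = ["*.example.com"]
gateway_ip = "203.0.113.10"  # placeholder address for the ingress load balancer

print(can_reuse_connection(cert_names, gateway_ip, "demo2.example.com", gateway_ip))  # True
```

Roughly speaking, once the connection is reused, the request for the second host arrives inside the TLS session negotiated for the first host, so the gateway’s Envoy handles it with the first Gateway’s route configuration. That configuration has no route for the second host, hence the 404.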
Let’s apply the following resources to demonstrate the issue:
apiVersion: istio.banzaicloud.io/v1beta1
kind: MeshGateway
metadata:
  labels:
    app: demo-gw
  name: demo-gw
  namespace: istio-system
spec:
  labels:
    app: demo-gw
  maxReplicas: 1
  minReplicas: 1
  ports:
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
  replicaCount: 1
  runAsRoot: true
  serviceType: LoadBalancer
  type: ingress
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-wildcard-cert
  namespace: istio-system
spec:
  secretName: example-wildcard-cert
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: "test wildcard certificate"
  isCA: false
  keySize: 2048
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  usages:
  - server auth
  dnsNames:
  - "*.example.com"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-demo1
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-demo2
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE
The following resources were created:
- an ingress gateway
- a self-signed ClusterIssuer
- a *.example.com wildcard certificate
- two Gateway resources, both of which specify the same wildcard certificate
Check the configuration’s validity by running the analyze command in the CLI tool.
❯ backyards analyze --namespace istio-system
gateway istio-system/demo-gw-demo1:
Cluster: master
Error: multiple gateways configured with same TLS certificate
Control Plane: cp-v17x.istio-system
Error ID: gateway/reused-cert/gateway/reused-cert
Path: port[443]
Context:
reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert
gateway istio-system/demo-gw-demo2:
Cluster: master
Error: multiple gateways configured with same TLS certificate
Control Plane: cp-v17x.istio-system
Error ID: gateway/reused-cert/gateway/reused-cert
Path: port[443]
Context:
reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert
✗ 2 validation errors were found
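Conceptually, this check looks for a TLS secret that is referenced by more than one Gateway resource on the same gateway workload and port. The sketch below is a minimal illustration under the same simplifying assumption as before (identical `spec.selector` maps mean the same workload); it is not Backyards’ implementation, and it ignores which namespace the `credentialName` secret is resolved in.

```python
from collections import defaultdict


def find_reused_certificates(gateways):
    """Flag Gateways that reference the same TLS credentialName on the same workload and port."""
    # (selector, port number, credentialName) -> ids of the Gateways using that secret
    users = defaultdict(set)
    for gw in gateways:
        gw_id = f'{gw["metadata"]["namespace"]}/{gw["metadata"]["name"]}'
        selector = frozenset(gw["spec"]["selector"].items())
        for server in gw["spec"]["servers"]:
            credential = (server.get("tls") or {}).get("credentialName")
            if credential:
                users[(selector, server["port"]["number"], credential)].add(gw_id)

    findings = []
    for (_, port, credential), gw_ids in users.items():
        if len(gw_ids) > 1:  # the same secret is served by more than one Gateway resource
            for gw_id in sorted(gw_ids):
                findings.append({"gateway": gw_id, "path": f"port[{port}]",
                                 "reusedCertificateSecret": credential})
    return findings


SELECTOR = {"app": "demo-gw", "gateway-name": "demo-gw", "gateway-type": "ingress"}


def https_gateway(name, host):
    """Build a trimmed-down HTTPS Gateway that uses the wildcard certificate."""
    return {
        "metadata": {"name": name, "namespace": "istio-system"},
        "spec": {"selector": SELECTOR, "servers": [{
            "hosts": [host],
            "port": {"name": "https", "number": 443, "protocol": "HTTPS"},
            "tls": {"mode": "SIMPLE", "credentialName": "example-wildcard-cert"},
        }]},
    }


gateways = [
    https_gateway("demo-gw-demo1", "demo1.example.com"),
    https_gateway("demo-gw-demo2", "demo2.example.com"),
]
for finding in find_reused_certificates(gateways):
    print(finding)
```

Because the Gateway ids are collected in a set, a single Gateway that lists several hosts under one server with the shared certificate is not flagged, which is in line with the usual remedy of serving all hosts that share a certificate from one Gateway.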
The analyze command can also produce JSON output.
❯ backyards analyze --namespace istio-system -o json
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v17x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v17x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo2",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
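The JSON output is convenient to consume from scripts, for example to fail an automated job when any check does not pass. The sketch below assumes the report always has the shape shown above (subject keys mapping to lists of check results with `checkID`, `passed` and `errorMessage` fields) and, for `run_analyze`, that the `backyards` binary is on `PATH` and prints only JSON to stdout when `-o json` is set; the helper names are hypothetical.

```python
import json
import subprocess


def run_analyze(namespace):
    """Run the CLI with JSON output and parse it (requires `backyards` on PATH)."""
    result = subprocess.run(
        ["backyards", "analyze", "--namespace", namespace, "-o", "json"],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def failed_checks(report):
    """Flatten the report into (subject, checkID, errorMessage) tuples for checks that did not pass."""
    return [
        (subject, check["checkID"], check.get("errorMessage", ""))
        for subject, checks in report.items()
        for check in checks
        if not check.get("passed", False)
    ]


def gate(report):
    """Print every failed check and return a process exit code: 1 if anything failed, else 0."""
    failures = failed_checks(report)
    for subject, check_id, message in failures:
        print(f"{subject}: [{check_id}] {message}")
    return 1 if failures else 0


# Trimmed copy of the JSON output shown above.
SAMPLE = json.loads("""
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {"checkID": "gateway/reused-cert", "passed": false,
     "errorMessage": "multiple gateways configured with same TLS certificate"}
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {"checkID": "gateway/reused-cert", "passed": false,
     "errorMessage": "multiple gateways configured with same TLS certificate"}
  ]
}
""")

print("exit code:", gate(SAMPLE))
```

In a job script, `sys.exit(gate(run_analyze("istio-system")))` would turn any failed check into a non-zero exit status. Parsing the report instead of relying on the CLI’s own exit status keeps the gating logic explicit.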
Future plans
With Backyards (now Cisco Service Mesh Manager), you can already identify numerous configuration issues with today’s validation, but we’re planning to take this functionality a step further. It would be great to catch misconfigurations before they are applied to the cluster, not after.
Tip of the day: You can simply download the Backyards CLI tool and run backyards analyze with KUBECONFIG set for your cluster to detect any validation issues on it. Please note that only evaluation use is free; contact us if you’d like to use Backyards in production.
Both from the UI and from the CLI tool, Backyards can manipulate Istio configuration resources: setting traffic management rules, changing mutual TLS settings, or restricting outbound configurations for Envoy proxies. When a user manipulates any of these Istio resources, the validations will be run against the new, hypothetical cluster state in which the changes would be applied, before the Istio resources are actually changed on the cluster. If any validation issue is present in that state, the user will be notified and can cancel or adjust the change they were about to make.
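The planned flow amounts to a dry run: apply the proposed change to a copy of the cluster state, run the checks on that copy, and leave the live state untouched. The sketch below illustrates the idea only; it is not how Backyards implements it. It assumes resources are plain dicts keyed by (kind, namespace, name), does not model deletions, and uses a hypothetical example check that flags VirtualServices pointing at Gateways that do not exist.

```python
import copy


def key(res):
    meta = res["metadata"]
    return (res["kind"], meta["namespace"], meta["name"])


def hypothetical_state(current, proposed):
    """Return a copy of `current` with the proposed resources created or replaced."""
    state = copy.deepcopy(current)
    for res in proposed:
        state[key(res)] = copy.deepcopy(res)
    return state


def dangling_gateway_refs(resources):
    """Example check: VirtualServices must only reference Gateways that exist."""
    gateways = {(r["metadata"]["namespace"], r["metadata"]["name"])
                for r in resources if r["kind"] == "Gateway"}
    issues = []
    for r in resources:
        if r["kind"] != "VirtualService":
            continue
        ns = r["metadata"]["namespace"]
        for ref in r["spec"].get("gateways", []):
            if ref == "mesh":  # reserved name for the sidecars, not a Gateway resource
                continue
            gw_ns, _, gw_name = ref.rpartition("/")  # "name" alone means the VirtualService's namespace
            if (gw_ns or ns, gw_name) not in gateways:
                issues.append(f'VirtualService {ns}/{r["metadata"]["name"]}: unknown gateway "{ref}"')
    return issues


def validate_change(current, proposed, checks):
    """Run every check against the state the change would produce."""
    state = list(hypothetical_state(current, proposed).values())
    return [issue for check in checks for issue in check(state)]


def make_resource(kind, namespace, name, spec=None):
    return {"kind": kind, "metadata": {"namespace": namespace, "name": name}, "spec": spec or {}}


# Hypothetical current state and a proposed edit with a typo in the gateway name.
current = {key(r): r for r in [
    make_resource("Gateway", "istio-system", "demo-gw-demo1"),
    make_resource("VirtualService", "backyards-demo", "frontpage",
                  {"gateways": ["istio-system/demo-gw-demo1"]}),
]}
proposed = [make_resource("VirtualService", "backyards-demo", "frontpage",
                          {"gateways": ["istio-system/demo-gw-demo"]})]

print(validate_change(current, proposed, [dangling_gateway_refs]))
```

Only the proposed edit introduces an issue; validating the unchanged state returns an empty list, so the user could be warned before anything reaches the cluster.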
Another use case is a GitOps workflow, where the Istio resources are modified via a PR. In this case, an automated job could run the backyards analyze command against the new hypothetical cluster state and, if any issues are discovered, fail the job and even prevent the PR from being merged.
With the implementation of these features, it will be possible to catch issues early, and Backyards users will be further protected from potential downtime and Istio misconfigurations.
Takeaway
This was just the tip of the iceberg. Backyards’ validation subsystem provides lots of checks that result in faster root cause analysis and more stable operation of the service mesh.
Check out Backyards in action on your own clusters!
Want to know more? Get in touch with us, or delve into the details of the latest release.
Or just take a look at some of the Istio features that Backyards automates and simplifies for you, and which we’ve already blogged about.