
One of the main challenges enterprises currently face – whether they have an on-prem infrastructure or are running their workloads in the cloud – is application and network security. Data must be protected at all times, not just when it's resting in a database or sitting in storage somewhere. Ensuring data protection while it's moving means more than just encryption; it also means authentication and authorization.

A service mesh is a transparent infrastructure layer that sits between the network and the microservices; as such, it's the perfect place to ensure data encryption, authentication, and authorization.

Istio provides automatic mTLS and trusted identity between workloads by using SPIFFE IDs in X.509 certificates. Every workload in a Kubernetes environment runs under the name of a service account. Its identity is therefore based on the service account of the workload.

Identities in Istio conform to the SPIFFE standard and have the following format:

spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
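
For example, assuming the default cluster.local trust domain, a workload running with the analytics service account in the analytics namespace would carry the following identity:

spiffe://cluster.local/ns/analytics/sa/analytics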

Because identity is the basis for authorization and because the service account is the basis for identity, it’s important to set up an environment correctly, using proper policies. In Kubernetes, RBAC rules can and should be used to specify such policies and permission settings.

For starters, users and service accounts should be restricted to a set of namespaces. Of course, this is the bare minimum setting, and it is highly recommended that you use a more fine-grained RBAC setup. On the other hand, this does not prevent issues that might arise from misconfiguration, or from a malicious actor trying to run a workload in the name of a specific service account.
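
As a minimal sketch of what such a restriction might look like (the namespace and user names are purely illustrative), a namespaced Role and RoleBinding can confine a developer to deploying workloads only within a single namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-deployer
  namespace: team-a
rules:
# permissions apply only inside the team-a namespace
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "serviceaccounts"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-deployer-binding
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: team-a-deployer
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: jane             # illustrative user name
EOF

Because both the Role and the RoleBinding are namespaced, the same user has no standing permissions in any other namespace unless they are explicitly granted there.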

What could go wrong 🔗︎

Let’s imagine a system in which we have collected data protected by GDPR or a similar set of legal restrictions, such that it cannot leave the country or can only be used for very specific cases or under very specific circumstances. Beginning with the architecture, we have a data provider serving this data to allowed services and an analytics consumer performing calculations based on anonymized data.

RBAC rules restrict who can deploy to which Kubernetes namespaces. The Istio authorization policies are set so that only the analytics service has access to the data service or, more precisely, the pods running with the service account of the analytics service are able to reach the pods of the data service.

Even if this setup seems fine, the policies are based on service identities that, in and of themselves, do not restrict which workload can run with which service account, or where. It's easy to imagine a developer making the honest mistake of running a different workload under the analytics service account, one that can reach the restricted data and use it in ways it wasn't meant to be used. Similarly, an attacker could use the same vector to do a great deal of harm to an organization.

It is also increasingly common for an environment to consist of interconnected clusters in different geographic locations; larger enterprises' service meshes typically span multiple clusters in multiple regions. This raises the question of how to control and enforce workload placement within an environment, as there are more and more regulations about how and when data can leave certain regions, if at all.

Kubernetes RBAC is a good base for deployment restrictions; Istio authorization policies can help to restrict service to service communication based on identities, but we need something better for policy management, to secure the environment and make it as airtight as possible.

OPA to the rescue 🔗︎

Open Policy Agent is a general-purpose policy engine that can be used to enforce policies across a range of applications. It can act as a policy middleware application — policies can be defined in OPA, and applications can query OPA to make decisions. Essentially, it provides policy-as-a-service and decouples policies from application configurations.

Policies may be written in Rego (pronounced “ray-go”), which is purpose-built for expressing policies over complex hierarchical data structures. For detailed information on Rego see the Policy Language documentation.
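
As a tiny, illustrative taste of the language (not one of the policies used later in this post), the following rule emits a denial message whenever the document under evaluation is missing a team label:

package example

# `input` is the JSON document supplied with the query;
# this rule adds msg to the deny set when the "team" label is missing
deny[msg] {
    not input.metadata.labels.team
    msg := "every resource must carry a team label"
}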

OPA can be easily integrated with Kubernetes through the use of admission controllers. These enforce policies on objects during create, update, and delete operations.

Here’s a couple examples of what’s possible when using OPA as a validating admission controller:

  • Require specific labels or annotations on all resources.
  • Require container images to come from trusted sources only.
  • Require all workloads to specify resource requests and limits.
  • Prevent conflicting objects from being created.

By deploying OPA as a mutating admission controller you can, for example:

  • Inject sidecar containers into pods.
  • Set specific labels or annotations on resources.
  • Rewrite container images to point at the corporate image registry.
  • Include node and pod (anti-)affinity selectors on deployments.
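
In both cases the integration point is an admission webhook registration that tells the API server to send AdmissionReview requests to OPA. The Helm chart used later in this post sets this up for us; a simplified sketch of the validating variant (the service name, namespace and CA bundle handling are assumptions) looks roughly like this:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: opa-validating-webhook
webhooks:
- name: webhook.openpolicyagent.org
  clientConfig:
    service:
      name: opa          # assumed name of the OPA service installed by the chart
      namespace: opa
    # caBundle: <base64-encoded CA certificate used to verify the OPA server>
  rules:
  - operations: ["CREATE", "UPDATE"]
    apiGroups: ["*"]
    apiVersions: ["*"]
    resources: ["*"]
  admissionReviewVersions: ["v1beta1"]
  sideEffects: None
  failurePolicy: Ignore  # a stricter setup would use Fail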

What does this look like in practice? 🔗︎

OPA policies

Assuming the previously described setup involving a data service and an analytics service, we would like to allow communication between the two. At the same time, we would like to restrict which workload can run with the analytics service account. Let’s take a look at how OPA might help us implement these kinds of controls.

We’re going to use Banzai Cloud’s Istio-based service mesh platform, Backyards (now Cisco Service Mesh Manager), for the purposes of this demonstration.

Setup 🔗︎

  1. Create a Kubernetes cluster.

    If you need a hand with this, you can create a cluster with our free version of Banzai Cloud’s Pipeline platform.

  2. Point KUBECONFIG at your cluster.

  3. Install Backyards: register for the free tier of Cisco Service Mesh Manager (formerly called Banzai Cloud Backyards) and follow the Getting Started Guide for up-to-date installation instructions.

  4. Install OPA

    ~ ❯ helm repo add stable https://kubernetes-charts.storage.googleapis.com/
    "stable" has been added to your repositories
    
    ~ ❯ kubectl create namespace opa
    namespace/opa created
    
    ~ ❯ helm upgrade --install opa stable/opa --namespace opa \
        --values https://raw.githubusercontent.com/banzaicloud/opa-samples/master/helm-values.yaml
    

Deploy the analytics and data service 🔗︎

First deploy the data service, which runs with the data service account.

We are going to use the same application that is used for generic Backyards demonstrations, called Allspark.

~ ❯ kubectl apply -f https://raw.githubusercontent.com/banzaicloud/opa-samples/master/data-service-deploy.yaml
namespace/data created
serviceaccount/data created
deployment.apps/data created
service/data created

Configure an Istio authorization policy so that only the analytics service is allowed to reach the data service’s /api/v1/data endpoint.

~ ❯ kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: data
  namespace: data
spec:
  selector:
    matchLabels:
      app: data
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/analytics/sa/analytics"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/api/v1/data"]
EOF

Now let’s deploy the analytics service, which runs with the analytics service account and checks whether it’s possible to connect to the data service.

~ ❯ kubectl apply -f https://raw.githubusercontent.com/banzaicloud/opa-samples/master/analytics-service-deploy.yaml
namespace/analytics created
serviceaccount/analytics created
deployment.apps/analytics created
service/analytics created

Check service to service communication 🔗︎

Deploy a test pod and send an HTTP request to the analytics service. In the background, it should trigger communication with the data service.

~ ❯ kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: analytics
  labels:
    app: test-pod
spec:
  containers:
  - name: app
    image: curlimages/curl:7.72.0
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 3000; done;" ]
EOF
pod/test-pod created

~ ❯ kubectl exec -ti test-pod -c app -- curl http://analytics:8080
analytics response

By checking the logs, we can determine that the analytics service was able to reach the data service.

~ ❯ kubectl logs -l app=analytics -c service | tail -2
time="2020-09-09T21:48:27Z" level=info msg="outgoing request" correlationID=70fad725-7791-4070-b655-1f80a85730f1 server=http url="http://data.data:8080/api/v1/data"
time="2020-09-09T21:48:27Z" level=info msg="response to outgoing request" correlationID=70fad725-7791-4070-b655-1f80a85730f1 responseCode=200 server=http url="http://data.data:8080/api/v1/data"

However, the test pod is not able to reach the data service directly, since the Istio authorization policy forbids that.

~ ❯ kubectl exec -ti test-pod -c app -- curl http://data.data:8080/api/v1/data
RBAC: access denied

We’ve anticipated everything that’s happened so far. Now let’s see what happens if we’re trying to use another test pod, but this time we start it with the analytics service account.

~ ❯ kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-with-analytics
  namespace: analytics
  labels:
    app: test-pod
spec:
  serviceAccount: analytics
  containers:
  - name: app
    image: curlimages/curl:7.72.0
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 3000; done;" ]
EOF
pod/test-pod-with-analytics created

Check the communication from the test-pod-with-analytics pod.

~ ❯ kubectl exec -ti test-pod-with-analytics -c app -- curl http://data.data:8080/api/v1/data
data service response

As expected, this shows that the Istio authorization policy allows the analytics service account to connect to the data service, regardless of the actual workload using that service account.

Use OPA to restrict which workloads can be used with which service account 🔗︎

OPA policy 1.

To prevent this kind of situation, we should restrict which Docker images are allowed to run with the analytics service account. This is where OPA comes in. We already installed the OPA validating admission webhook handler in the setup phase, but we still need to configure it.

~ ❯ kubectl apply -f - <<EOF
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    openpolicyagent.org/policy: rego
  name: opa-main
  namespace: opa
data:
  main: |
    package system

    import data.kubernetes.admission

    main = {
      "apiVersion": "admission.k8s.io/v1beta1",
      "kind": "AdmissionReview",
      "response": response,
    }

    default uid = ""

    uid = input.request.uid

    response = {
        "allowed": false,
        "uid": uid,
        "status": {
            "reason": reason,
        },
    } {
        reason = concat(", ", admission.deny)
        reason != ""
    }
    else = {"allowed": true, "uid": uid}
EOF

Besides the main configuration, we can add policies as configmaps and label them for OPA to discover. After a configmap is applied, the OPA policy manager annotates it, using the key openpolicyagent.org/policy-status, which contains the actual policy status. Any errors in the policy are reported using this annotation.

The following OPA policy describes a scenario in which only the banzaicloud/allspark:0.1.2 and the banzaicloud/istio-proxyv2:1.7.0-bzc images are allowed to run with the analytics service account. The first is the actual workload, and the second is the Istio proxy container image.

~ ❯ kubectl apply -f - <<EOF
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    openpolicyagent.org/policy: rego
  name: opa-pod-allowlist
  namespace: opa
data:
  main: |
    package kubernetes.admission

    allowlist = [
        {
            "serviceAccount": "analytics",
            "images": {"banzaicloud/allspark:0.1.2", "banzaicloud/istio-proxyv2:1.7.0-bzc"},
        },
    ]

    deny[msg] {
        input.request.kind.kind == "Pod"
        input.request.operation == "CREATE"

        serviceAccount := input.request.object.spec.serviceAccountName

        # check whether the service account is restricted
        allowlist[a].serviceAccount == serviceAccount

        image := input.request.object.spec.containers[_].image

        # check whether the pod's images are allowed to run with the specified service account
        not imageWithServiceAccountAllowed(serviceAccount, image)

        msg := sprintf("pod with serviceAccount %q, image %q is not allowed", [serviceAccount, image])
    }

    imageWithServiceAccountAllowed(serviceAccount, image) {
        allowlist[a].serviceAccount == serviceAccount
        allowlist[a].images[image]
    }
EOF
configmap/opa-pod-allowlist created

~ ❯ kubectl -n opa get cm opa-pod-allowlist -o jsonpath='{.metadata.annotations.openpolicyagent\.org/policy-status}'
{"status":"ok"}

After applying the OPA policy, we should try to re-create the test pod with the analytics service account and watch it fail miserably.

~ ❯ kubectl delete pods test-pod-with-analytics --grace-period=0
pod "test-pod-with-analytics" deleted

~ ❯ kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-with-analytics
  namespace: analytics
  labels:
    app: test-pod
spec:
  serviceAccount: analytics
  containers:
  - name: app
    image: curlimages/curl:7.72.0
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 3000; done;" ]
EOF
Error from server (pod with serviceAccount "analytics", image "curlimages/curl:7.72.0" is not allowed): error when creating "STDIN": admission webhook "webhook.openpolicyagent.org" denied the request: pod with serviceAccount "analytics", image "curlimages/curl:7.72.0" is not allowed

Use OPA to restrict where a workload with a specific service account can run 🔗︎

OPA policy 2.

Company policies may also require that some services run only in a specific region, or that they use specific hardware (for encryption or cryptographic signing, for example). With multi-cluster, multi-regional service meshes becoming more widespread, it is increasingly common for a single environment to have resources available in multiple regions, so policies should be in place to prevent the misplacement of workloads.

To demonstrate how that might happen, we can easily expand the existing Backyards service mesh by adding another Kubernetes cluster from another region; only the kubeconfig of the new cluster is needed.

The OPA validating admission controller has to be installed on the new cluster as well. To do this, repeat step 4. from the Setup section on the new cluster and apply the opa-main configmap.

~ ❯ backyards istio cluster attach ~/Download/waynz0r-0910-01.yaml
✓ creating service account and rbac permissions
...
✓ attaching cluster started successfully name=waynz0r-0910-01

~ ❯ backyards istio cluster status
Clusters in the mesh

Name             Type  Status     Gateway Address              Istio Control Plane   Message
waynz0r-0908-01  Host  Available  [18.195.59.67 52.57.74.102]  -
waynz0r-0910-01  Peer  Available  [35.204.169.33]              cp-v17x.istio-system

The `waynz0r-0908-01` cluster is running on AWS in the `eu-central-1` region, while the `waynz0r-0910-01` cluster is a GKE cluster in the `europe-west4` region.

The following OPA policy extends the previous one with restrictions based on the `nodeSelector` property of the pods in question. Let's apply it to both clusters!

~ ❯ kubectl apply -f - <<EOF
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    openpolicyagent.org/policy: rego
  name: opa-pod-allowlist
  namespace: opa
data:
  main: |
    package kubernetes.admission

    allowlist = [
        {
            "serviceAccount": "analytics",
            "images": {"banzaicloud/allspark:0.1.2", "banzaicloud/istio-proxyv2:1.7.0-bzc"},
            "nodeSelector": [{"failure-domain.beta.kubernetes.io/region": "eu-central-1"}],
        },
    ]

    deny[msg] {
        input.request.kind.kind == "Pod"
        input.request.operation == "CREATE"

        serviceAccount := input.request.object.spec.serviceAccountName

        # check whether the service account is restricted
        allowlist[a].serviceAccount == serviceAccount

        image := input.request.object.spec.containers[_].image

        # check whether the pod's images are allowed to run with the specified service account
        not imageWithServiceAccountAllowed(serviceAccount, image)

        msg := sprintf("pod with serviceAccount %q, image %q is not allowed", [serviceAccount, image])
    }

    imageWithServiceAccountAllowed(serviceAccount, image) {
        allowlist[a].serviceAccount == serviceAccount
        allowlist[a].images[image]
    }

    deny[msg] {
        input.request.kind.kind == "Pod"
        input.request.operation == "CREATE"

        serviceAccount := input.request.object.spec.serviceAccountName

        # check whether the service account is restricted
        allowlist[a].serviceAccount == serviceAccount

        # check whether pod location is restricted
        count(allowlist[a].nodeSelector[ns]) > 0

        image := input.request.object.spec.containers[_].image
        nodeSelector := object.get(input.request.object.spec, "nodeSelector", [])

        # check whether pod location is allowed
        not podAtLocationAllowed(serviceAccount, nodeSelector)

        msg := sprintf("pod with serviceAccount %q, image %q is not allowed at the specified location", [serviceAccount, image])
    }

    podAtLocationAllowed(serviceAccount, nodeSelector) {
        allowlist[a].serviceAccount == serviceAccount

        # requires that at least one nodeSelector combination matches this image and serviceAccount combination
        selcount := count(allowlist[a].nodeSelector[ns])
        count({k | allowlist[a].nodeSelector[s][k] == nodeSelector[k]}) == selcount
    }
EOF
configmap/opa-pod-allowlist created

Now let’s see what happens when we try to deploy the analytics service to the peer cluster.

~ ❯ kubectl apply -f https://raw.githubusercontent.com/banzaicloud/opa-samples/master/analytics-service-deploy.yaml
namespace/analytics created
serviceaccount/analytics created
deployment.apps/analytics created
service/analytics created

This looks fine at first glance, but the pod hasn’t actually been created, since the OPA policy prevented that from happening.

~ ❯ kubectl get event
LAST SEEN   TYPE      REASON              OBJECT                            MESSAGE
2m24s       Warning   FailedCreate        replicaset/analytics-6cb4bfc97f   Error creating: admission webhook "webhook.openpolicyagent.org" denied the request: pod with serviceAccount "analytics", image "banzaicloud/allspark:0.1.2" is not allowed at the specified location, pod with serviceAccount "analytics", image "banzaicloud/istio-proxyv2:1.7.0-bzc" is not allowed at the specified location

Finally, it's important to protect these OPA policies themselves: proper Kubernetes RBAC rules need to be applied to prevent unwanted access to the opa namespace and to the validating webhook configuration resource!
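
Since RBAC is deny-by-default, the practical approach is to grant write access to the policy configmaps only to a dedicated group and simply never grant it to anyone else. A rough sketch (the group name is illustrative) might look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: opa-policy-admin
  namespace: opa
rules:
# only members of the bound group may manage policy configmaps in the opa namespace
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: opa-policy-admin-binding
  namespace: opa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: opa-policy-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: security-team    # illustrative group name

Since the validating webhook configuration itself is cluster-scoped, guarding it requires a similarly restricted ClusterRole that covers the validatingwebhookconfigurations resource.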

Takeaway 🔗︎

Kubernetes provides the basic components necessary to enforce policies at the cluster level. The built-in Kubernetes RBAC rules provide a good base for securing an environment. When you have a service mesh, authorization policies are a natural next step towards implementing security policies for service to service communication. However, a general-purpose policy engine opens up a whole other range of options and possibilities. The OPA policy language has a steep learning curve, but as this post has demonstrated, it provides a great deal of power and flexibility in return.