
At Banzai Cloud we push different types of workloads to Kubernetes with our open source PaaS, Pipeline. We maintain Helm charts for many of the deployments we support, but Pipeline is able to deploy applications from any repository. These deployments run on-prem or in the cloud, and many of them share one common requirement: persistent volumes. Kubernetes provides abundant options in this regard, and each cloud provider offers custom or additional alternatives. This post sheds light on, and offers guidance in, choosing between the available options.

Volumes

In nearly every conceivable scenario, applications require some kind of storage, whether to store config files or the logs they produce, or simply to share results with other applications. This post provides a brief overview of the volumes Kubernetes supports, with a focus on Persistent Volumes and the ReadWriteMany access mode.

Docker volumes

By default, the data a Docker container writes lives in the container's writable layer and is not managed separately, so if the container goes away for any reason, that data is lost with it. Kubernetes addresses this problem by providing various managed volumes whose lifecycle is not tied to the container that uses them.

Kubernetes EmptyDir

The simplest of these is the emptyDir. Kubernetes creates an emptyDir volume when the Pod is assigned to a Node and, as its name suggests, it starts out empty. The volume survives container crashes and restarts within the Pod, but if the Pod is removed or rescheduled to a different Node, its contents are lost: the directory is deleted on the previous Node and created empty on the new one. By default, emptyDir uses the underlying machine's storage, but it can be configured to use the machine's memory instead. For details check the emptyDir documentation.
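As a minimal sketch (the Pod and volume names below are made up for illustration), an emptyDir volume is declared and mounted like this:

apiVersion: v1
kind: Pod
metadata:
  name: emptydir-example
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo hello > /cache/hello && sleep 3600"]
      volumeMounts:
        - name: cache-volume
          mountPath: /cache
  volumes:
    - name: cache-volume
      emptyDir: {}   # use emptyDir: { medium: Memory } to back it with RAM instead of disk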

Kubernetes Persistent Volumes

Persistent Volumes are the most durable storage solution Kubernetes has to offer. The lifetime of a PV is tied to the Kubernetes cluster rather than to any Pod; as long as the cluster is healthy, these volumes remain reachable. Different cloud providers back Persistent Volumes with different storage solutions: on Azure there are AzureDisk and AzureFile, on Google there is GCEPersistentDisk, and so on. Regardless of cloud provider, the following choices are mandatory when using a Persistent Volume:

Dynamic or Static Provisioning

Persistent Volumes can be provisioned in two ways:

  • Static: the administrator creates the Persistent Volumes in advance, and the application developer binds to one of them by referencing it (for example by name) from a Persistent Volume Claim.
  • Dynamic: unlike with static provisioning, the volume is not set up in advance by an administrator; instead, the user describes the volume they need in a Persistent Volume Claim (requested storage size, access mode, etc.). Administrators should also create at least one Storage Class, which classifies the underlying storage solutions by, for example, how redundant or fast the storage is. Cloud providers that offer managed Kubernetes set up a default storage class (the default storage class carries the annotation storageclass.beta.kubernetes.io/is-default-class=true). Application developers should specify a StorageClass name in the claim, otherwise the default is used, and the cluster tries to dynamically provision and bind a volume for the application (see the example claim right after this list).
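To make the dynamic flow concrete, here is a minimal sketch of a Persistent Volume Claim; the claim name and the standard storage class name are only examples and depend on your cluster:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
spec:
  # The access mode and size the application needs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  # Omit storageClassName to fall back to the cluster's default StorageClass
  storageClassName: standard
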
Provisioner

With static provisioning the Persistent Volume itself, and with dynamic provisioning the Storage Class, needs to contain information about the provisioner. Provisioners are the Kubernetes plugins that provision and bind the required volumes for Pods. Make sure to use a provisioner supported by your environment; for example, AzureFile cannot be used on a GKE cluster. There are cloud-agnostic solutions as well, like GlusterFS, but the effort of configuring them tends to be significantly larger (they are more complex, though the setup can be automated; enter Pipeline).

Access Mode

Persistent Volumes can be mounted on a node in three different ways.

Keep in mind that not all access modes are supported by every provider; the access modes table in the Kubernetes documentation lists the supported modes per volume plugin.

  • ReadWriteOnce (a volume that can be mounted as read-write by a single node)
  • ReadOnlyMany (a volume that can be mounted read-only by many nodes)
  • ReadWriteMany (a volume that can be mounted as read-write by many nodes)

Let’s try it out

Create the cluster

We are going to create an AKS cluster with a StorageClass that uses Azure File as its provisioner, which lets us acquire ReadWriteMany volumes. To do that, we first need a Kubernetes cluster. Pipeline makes it easy to create one from scratch in minutes on all major cloud providers, or to adopt an on-prem cluster. Just to recap, it:

  • Creates Kubernetes clusters on all major cloud providers
  • Provides an end-to-end language agnostic CI/CD solution
  • Manages application (Helm) repositories
  • Manages cluster profiles
  • Deploys applications using Helm and manages the app lifecycle
  • Deploys spotguides
  • Provides out of the box observability (log collection, tracing, monitoring)

Create an AKS cluster with the help of this Postman collection, using the Cluster Create AKS API call. Please modify the body of the request to install Kubernetes 1.7.9 instead of 1.9.2; we will upgrade the cluster in a later step.

If you need help creating a Kubernetes cluster with Pipeline, please read the following readme.

We are going to use the mysql chart with a dynamically provisioned AzureFile volume and the ReadWriteMany access mode.

If you are unfamiliar with AzureFile, please follow this link for more information.

Create the StorageClass

To use AzureFile through a StorageClass, a Storage Account has to be created in the resource group where the cluster is located.

az group list --output table
MC_RGbaluchicken_azclusterbaluchicken0_westeurope    westeurope  Succeeded

Once you’ve identified your resource group, create the Storage Account.

az storage account create --resource-group MC_RGbaluchicken_azclusterbaluchicken787_westeurope --name banzaicloudtest --location westeurope --sku Standard_LRS

Define the StorageClass and create it:

kubectl create -f - <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
parameters:
  location: westeurope
  skuName: Standard_LRS
  storageAccount: banzaicloudtest
EOF

To interact with the created cluster, download the Kubernetes config using the Postman collection's Cluster Config call.
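Assuming you saved the downloaded config to a local file (the path below is just an example), point kubectl at it and verify the connection and the new StorageClass:

export KUBECONFIG=$PWD/aks-cluster-config.yaml
kubectl get nodes
kubectl get storageclass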

Deploy Mysql

Use the given Postman collection to deploy the mysql chart to the cluster. Look for the Deployment Create API call and replace its body with this:

{
	"name": "stable/mysql",
	"values": {
		"persistence": {
			"accessMode": "ReadWriteMany",
			"storageClass": "azurefile"
		}
	}
}
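Pipeline deploys the chart through Helm; if you prefer to do the same by hand, roughly the equivalent Helm (v2) command would be the following, assuming Helm is already initialized against the cluster:

helm install stable/mysql \
  --set persistence.accessMode=ReadWriteMany \
  --set persistence.storageClass=azurefile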

Check if the pod is up and running:

kubectl get pods -w
NAME                                  READY     STATUS            RESTARTS   AGE
ardent-ferrit-mysql-140208387-f0mhb   0/1       PodInitializing   0          7s
ardent-ferrit-mysql-140208387-f0mhb   0/1       Running   0         20s
ardent-ferrit-mysql-140208387-f0mhb   1/1       Running   0         1m

Everything looks good; the volume has been successfully bound, and mysql is ready to be used. Now let's upgrade the cluster to a more recent version of Kubernetes, 1.8.6.

We’ll be adding programmatic cluster upgrades for AKS as well (as we already have for other providers). For more information, watch this issue.

az aks upgrade --name azclusterbaluchicken787 --resource-group RGbaluchicken --kubernetes-version 1.8.6

If you want to upgrade to an even higher version of Kubernetes, 1.9.2, you may, but keep in mind that AKS cannot upgrade from 1.7.9 to 1.9.2 directly; you need to upgrade the cluster to 1.8.x first.
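If you do go all the way to 1.9.2, the second step is simply another upgrade on top of the 1.8.x cluster, using the same resource names as above, for example:

az aks upgrade --name azclusterbaluchicken787 --resource-group RGbaluchicken --kubernetes-version 1.9.2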

Pipeline’s next release, 0.4.0, will support updating Kubernetes.

Now check back on the cluster:

kubectl get pods
NAME                                  READY     STATUS             RESTARTS   AGE
ardent-ferrit-mysql-140208387-vhm4q   0/1       CrashLoopBackOff   7          17m

Something’s gone wrong: the default directory and file modes for these volumes differ between Kubernetes versions. While the default mode is 0777 on Kubernetes v1.6.x and v1.7.x, it is 0755 on v1.8.6 and above.

The Helm chart for mysql runs its setup as root, so what’s the problem? If we check the image's Dockerfile, we find that it creates and switches to a mysql user, so with a mode of 0755 the mysql user cannot write anything to the required directory.

To solve this, modify the StorageClass created earlier and add the following (this forces the directory and file modes back to 0777):

mountOptions:
  - dir_mode=0777
  - file_mode=0777

Why didn’t we simply put this option in the StorageClass declaration for Kubernetes 1.7.9? Because Kubernetes only supports mount options in version 1.8.5 and above.
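For reference, a complete StorageClass with the mount options added might look like the following sketch (same storage account as before); if the API server rejects an in-place update, deleting and re-creating the StorageClass with the same name is the simplest route:

kubectl apply -f - <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
parameters:
  location: westeurope
  skuName: Standard_LRS
  storageAccount: banzaicloudtest
EOF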

We hope this has been helpful.