The following tutorial describes the manual process of upgrading the nodes of an EKS cluster managed by Banzai Cloud Pipeline. For Pipeline Enterprise users, there is an automated method as well.

This method is essentially a rolling upgrade: a new node pool is created with the desired version, the workload is transferred to it, and the old node pool is destroyed. Please note that this method is currently under development; always create a backup before making changes to production clusters.

Prerequisites 🔗︎

  • Only the node pools of clusters that were created from Banzai Cloud Pipeline can be upgraded. Upgrading the node pools of imported clusters is not possible.
  • Currently only Amazon EKS clusters are supported.

Upgrade a node pool 🔗︎

  1. Select the cluster to upgrade:

    The easiest way to execute subsequent kubectl and banzai commands on a Pipeline-managed cluster is to use the banzai cluster shell command. It lets you interactively select a cluster and opens a subshell with the proper Kubernetes context set:

    banzai cluster shell
    
  2. List the cluster node pools to determine the node pool to upgrade:

    banzai cluster nodepool list --cluster-name <cluster-name>
    

    The command returns the details of the cluster's node pools, including the node pool name that is used as the identifying reference of the node pool, for example:

    Name   Size  Autoscaling  MinimumSize  MaximumSize  VolumeEncryption     VolumeSize  VolumeType  InstanceType  Image                  SpotPrice  SubnetID                  SecurityGroups  Status  StatusMessage
    pool1  2     Enabled      1            2            AWS account default  50          gp2         t2.small      ami-03d9393d97f5959fe             subnet-0d922e468626e9e3f  READY
    pool2  1     Enabled      1            2            Disabled             25          gp3         t2.small      ami-0644e90665b26316b  0.03       subnet-0d922e468626e9e3f  READY
    
  3. Define the new node pool in a local YAML file:

    CAUTION:

    Your worker nodes must not run a newer Kubernetes version than your control plane. You can check the Kubernetes version of the nodes in a node pool by running the following command:

    kubectl get nodes -l nodepool.banzaicloud.io/name=<pool> -o json | jq '.items[].status.nodeInfo.kubeletVersion'
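
    To compare against the control plane, you can also print the server version reported by the Kubernetes API (the exact output format depends on your kubectl release):

    kubectl version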

    Mandatory values:

    value | description | example
    ------|-------------|--------
    name | Name of the node pool | pool2
    size | Desired capacity | 3
    instanceType | EC2 instance type | t3.micro

    Optional values:

    value | description | example
    ------|-------------|--------
    labels | Additional node labels | labels: {label1: label1}
    autoscaling.enabled | Enable or disable the cluster autoscaler for this node pool | true
    autoscaling.minSize | Minimum size of the node pool | 3
    autoscaling.maxSize | Maximum size of the node pool | 5
    image | AMI ID | ami-yyyyyyyyy
    subnetId | Identifier of the subnet to associate the node pool with | subnet-xxxxx
    spotPrice | The upper limit price for the requested spot instance. If this field is left empty or 0, on-demand instances are used. | "0.2"
    useInstanceStore | Use instance store volumes (NVMe disks) for the node pool as Kubelet root, and provision emptyDir volumes on local instance storage disks. For details, see [useInstanceStore (true \| false)](/docs/pipeline/clusters/create/eks/reference/#nodepool-useinstancestore). Default: false |
    volumeEncryption | Node EBS volume encryption | {"enabled": true, encryptionKeyARN: "arn:aws:kms:aws-region:000000000000:key/00000000-0000-0000-0000-000000000000"}
    volumeSize | Node EBS volume size in GiB. Default: the original value | 20
    volumeType | Node EBS volume type. Default: Pipeline-defined fallback value | gp3

    Example new_nodepool.yaml:

    name: pool2
    size: 2
    instanceType: t3.large
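
    If you also want to set some of the optional values listed above, a more complete definition might look like the following sketch. The label, AMI, and subnet values are placeholders, and the nested autoscaling block mirrors the dotted field names in the table; substitute values that match your environment and your control plane's Kubernetes version:

    name: pool2
    size: 2
    instanceType: t3.large
    labels:
      env: example
    autoscaling:
      enabled: true
      minSize: 2
      maxSize: 5
    image: ami-yyyyyyyyy
    subnetId: subnet-xxxxx
    volumeSize: 50
    volumeType: gp3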
    
  4. Add the new node pool to the EKS cluster:

    banzai cluster nodepool create --file=<new_nodepool.yaml>
    
  5. Wait for nodes to become ready:

    List nodes with the Banzai Cloud nodepool label (displayed in the last column):

    kubectl get nodes -L nodepool.banzaicloud.io/name
    NAME                                         STATUS   ROLES    AGE   VERSION              NAME
    ip-xxx-xxx-xxx-xxx.region.compute.internal   Ready    <none>   4d    v1.14.x-eks-xxxxxx   pool1
    ip-xxx-xxx-xxx-xxx.region.compute.internal   Ready    <none>   4d    v1.14.x-eks-xxxxxx   pool1
    ip-xxx-xxx-xxx-xxx.region.compute.internal   Ready    <none>   10m   v1.14.y-eks-yyyyyy   pool2
    ip-xxx-xxx-xxx-xxx.region.compute.internal   Ready    <none>   10m   v1.14.y-eks-yyyyyy   pool2
    

    Wait for the new nodes to appear and become ready. (You may add --watch to the above command, or execute it multiple times to see the exact status of the cluster.) Alternatively, use kubectl wait to block until every node of the new pool reports Ready:

    kubectl wait --for=condition=ready node -l nodepool.banzaicloud.io/name=pool2
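
    By default kubectl wait gives up after 30 seconds. If the new instances take longer to join the cluster, you can raise the limit with the --timeout flag, for example:

    kubectl wait --for=condition=ready node -l nodepool.banzaicloud.io/name=pool2 --timeout=15m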
    
  6. Mark old nodes unschedulable:

    Cordon the nodes of the old node pool; in this example pool1 is the old one. This prevents new pods from being scheduled onto them.

    kubectl cordon -l nodepool.banzaicloud.io/name=pool1
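
    To verify that the cordon took effect, list the nodes of the old pool again; cordoned nodes report a Ready,SchedulingDisabled status:

    kubectl get nodes -l nodepool.banzaicloud.io/name=pool1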
    
  7. Drain nodes:

    After the nodes of the old node pool are made unschedulable, the drain command tries to evict the pods that are already running on them. Run the following command to drain all nodes of the old pool; this evicts every pod from those nodes.

    kubectl drain -l nodepool.banzaicloud.io/name=pool1
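
    On EKS the old nodes typically run DaemonSet-managed pods (such as aws-node and kube-proxy) that cannot be evicted, so the command above stops with an error unless you tell it to ignore them. A variant that skips DaemonSet pods and also deletes pods using emptyDir volumes might look like this (older kubectl releases use --delete-local-data instead of --delete-emptydir-data):

    kubectl drain -l nodepool.banzaicloud.io/name=pool1 --ignore-daemonsets --delete-emptydir-data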
    

    After you drain a node, make sure the new pods are up and running before moving on to the next one. If you run into any issues during the migration, uncordon the old pool, then cordon and drain the new pool so that the pods are rescheduled back to the old pool.

    You may also drain the nodes one by one with kubectl drain <node_name>.

  8. Delete the old node pool:

    Once all the workloads are successfully moved to the new nodes, it’s time to delete the old pool pool1.

    banzai cluster nodepool delete pool1
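
    Afterwards, you can confirm that only the new pool remains by listing the node pools and the nodes again:

    banzai cluster nodepool list --cluster-name <cluster-name>
    kubectl get nodes -L nodepool.banzaicloud.io/name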