
How to drain a node in a Kubernetes cluster for a maintenance activity


In this article we will discuss how to drain a node in a Kubernetes cluster. Maintenance activities such as OS patching require downtime on the node. In order to carry out such activities without impacting the applications running within the cluster, we need to follow a few commands step by step. We will discuss those commands in this article so that the maintenance can be performed with minimal impact.
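
At a high level, the workflow looks like the outline below (a minimal sketch, assuming the node under maintenance is "k8s-worker2" as used throughout this article):

# 1. Evict the workloads and mark the node unschedulable
$ kubectl drain k8s-worker2 --ignore-daemonsets
# 2. Perform the maintenance (patching, reboot, etc.) on the node
# 3. Allow scheduling on the node again
$ kubectl uncordon k8s-worker2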


How to drain a node in a Kubernetes cluster?

Pre-requisite:

We have a Kubernetes cluster with one master node and two worker nodes, as shown below:

$ kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
k8s-control   Ready    master   167m   v1.18.5
k8s-worker1   Ready    <none>   166m   v1.18.5
k8s-worker2   Ready    <none>   166m   v1.18.5
$ 

Now let's create one Pod and one Deployment for testing purposes, using the following YAML definitions:

Pod Definition:

$ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

Deployment definition:

$ cat deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: my-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-deployment
  template:
    metadata:
      labels:
        app: my-deployment
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

Now let's create both objects using their definition files:

$ kubectl create -f pod.yaml
pod/my-pod created
$ kubectl create -f deploy.yaml
deployment.apps/my-deployment created
$
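
As a side note, "kubectl apply -f" would work equally well here if you prefer to manage these objects declaratively:

$ kubectl apply -f pod.yaml
$ kubectl apply -f deploy.yaml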

Cross-verify which nodes the standalone Pod and the Deployment's Pods are running on, using the command below:

$ kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP               NODE          NOMINATED NODE   READINESS GATES
my-deployment-655f58b4cb-2sn4r   1/1     Running   0          25s     192.168.194.70   k8s-worker1   <none>           <none>
my-deployment-655f58b4cb-t4s7t   1/1     Running   0          25s     192.168.126.6    k8s-worker2   <none>           <none>
my-pod                           1/1     Running   0          5m13s   192.168.126.5    k8s-worker2   <none>           <none>
$

You can see from the above output that the standalone Pod is running on the worker node “k8s-worker2” and the Deployment's Pods are running on both worker nodes of the Kubernetes cluster.
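
If the cluster runs many Pods, it can be handy to list only the Pods scheduled on the node you are about to drain. A minimal sketch using a field selector (assuming the node name "k8s-worker2"):

$ kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=k8s-worker2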

Draining the node:

Now let's drain the node “k8s-worker2”, since it is running both the standalone Pod and one of the Deployment's Pods.

$ kubectl drain k8s-worker2 --ignore-daemonsets --force
node/k8s-worker2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-svhvv, kube-system/kube-proxy-5nk2w; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/my-pod
evicting pod default/my-deployment-655f58b4cb-hkdfh
evicting pod default/my-pod
pod/my-pod evicted
pod/my-deployment-655f58b4cb-hkdfh evicted
node/k8s-worker2 evicted
$
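
As an aside, if you only want to stop new Pods from being scheduled on the node without evicting the Pods that are already running on it, "kubectl cordon" alone is enough (drain performs a cordon first and then evicts the workloads):

$ kubectl cordon k8s-worker2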

Note:

Make sure you use the option "--ignore-daemonsets" when DaemonSet-managed Pods (such as calico-node and kube-proxy) are present, and "--force" when standalone Pods not managed by a controller are running within your Kubernetes cluster; otherwise the system won't allow you to drain the node and throws the following error:

$ kubectl drain k8s-worker2
node/k8s-worker2 cordoned
error: unable to drain node "k8s-worker2", aborting command...

There are pending nodes to be drained:
 k8s-worker2
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-svhvv, kube-system/kube-proxy-5nk2w
cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/my-pod
$
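
Similarly, if any Pods on the node use emptyDir volumes, drain refuses to evict them unless you explicitly allow their local data to be deleted. A hedged example (on older kubectl releases such as v1.18 the flag is "--delete-local-data", while newer releases call it "--delete-emptydir-data"):

$ kubectl drain k8s-worker2 --ignore-daemonsets --force --delete-local-data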

Once you have drained the node successfully, check its status using the command below:

$ kubectl get node
NAME          STATUS                     ROLES    AGE    VERSION
k8s-control   Ready                      master   172m   v1.18.5
k8s-worker1   Ready                      <none>   171m   v1.18.5
k8s-worker2   Ready,SchedulingDisabled   <none>   171m   v1.18.5
$

In the above output you can see that the node “k8s-worker2” has been marked as “SchedulingDisabled”, so no new Pods will be scheduled on it. Now let's check the status of our standalone Pod and the Deployment's Pods after draining the node.

$ kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE    IP               NODE          NOMINATED NODE   READINESS GATES
my-deployment-655f58b4cb-2sn4r   1/1     Running   0          116s   192.168.194.70   k8s-worker1   <none>           <none>
my-deployment-655f58b4cb-gtjt4   1/1     Running   0          20s    192.168.194.71   k8s-worker1   <none>           <none>
$

From the above output it is clear that the standalone Pod “my-pod” was simply deleted by the drain activity, whereas the Pod managed by the Deployment was evicted and recreated by the Deployment's ReplicaSet on the other node, “k8s-worker1”.
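
You can also confirm that the Deployment still has its desired number of replicas after the eviction:

$ kubectl get deployment my-deployment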

Joining the maintenance node back to the Kubernetes cluster:

Now let's assume that our maintenance activity on the node “k8s-worker2” is completed and we want to bring the node back into the Kubernetes cluster. To do that, we need to uncordon the node as below:

$ kubectl uncordon k8s-worker2
node/k8s-worker2 uncordoned
$

Now if you check the status of the node, it shows “Ready” without the “SchedulingDisabled” flag, as below:

$ kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
k8s-control   Ready    master   174m   v1.18.5
k8s-worker1   Ready    <none>   174m   v1.18.5
k8s-worker2   Ready    <none>   174m   v1.18.5
$

Also, if you check the status of the Deployment's Pods, they remain unchanged:

$ kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP               NODE          NOMINATED NODE   READINESS GATES
my-deployment-655f58b4cb-2sn4r   1/1     Running   0          3m37s   192.168.194.70   k8s-worker1   <none>           <none>
my-deployment-655f58b4cb-gtjt4   1/1     Running   0          2m1s    192.168.194.71   k8s-worker1   <none>           <none>
$

This shows that even after you join the maintenance node back to the Kubernetes cluster, the already running Pods keep running where they are; Kubernetes does not automatically move them back to the uncordoned node.
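
If you do want workloads back on the uncordoned node, you have to recreate or reschedule them yourself. A minimal sketch (assuming the same pod.yaml used earlier; "kubectl rollout restart" simply recreates the Deployment's Pods so the scheduler can spread them across all schedulable nodes again, it does not guarantee placement on "k8s-worker2"):

$ kubectl create -f pod.yaml
$ kubectl rollout restart deployment my-deployment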

So this is all about the question “How to drain a node in Kubernetes?” for a maintenance activity.

In case you want to go through other Kubernetes articles, follow this link.
