Table of contents
- 1. System environment
- 2. Preface
- 3. Cordoning a node
- 3.1 Overview of cordon
- 3.2 Cordon a node
- 3.3 Uncordon the node
- 4. Draining a node
- 4.1 Overview of drain
- 4.2 Drain a node
- 4.3 Uncordon the node
- 5. Deleting a node
- 5.1 Overview of delete
- 5.2 Delete a node
1. System environment
Server OS version | Docker version | Kubernetes (k8s) cluster version | CPU architecture |
---|---|---|---|
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64 |
Kubernetes cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.
Server | OS version | CPU architecture | Processes | Role |
---|---|---|---|---|
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kube-apiserver, etcd, kube-scheduler, kube-controller-manager, kubelet, kube-proxy, coredns, calico | k8s master node |
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node |
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node |
2. Preface
This article covers cordoning a node, draining (evicting) a node, and deleting a node. These operations are needed when performing maintenance on Kubernetes cluster nodes, for example kernel upgrades or hardware maintenance.
Cordoning, draining, and deleting nodes all assume you already have a working Kubernetes cluster. For how to install and deploy a Kubernetes (k8s) cluster, see the blog post 《Centos7 安裝部署Kubernetes(k8s)集群》.
3. Cordoning a node
3.1 Overview of cordon
Cordoning a node stops scheduling onto it: the node is marked SchedulingDisabled, so pods created afterwards will not be scheduled to that node. Pods already running on the node are unaffected and continue serving traffic as usual.
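Under the hood, cordon simply sets the node's spec.unschedulable field to true (the node controller then also taints the node as unschedulable). The sketch below is for illustration only and is not part of the walkthrough that follows; the patch form is just an equivalent way to achieve the same thing:

```
# kubectl cordon only sets spec.unschedulable=true on the node object;
# this patch has the same effect (shown purely for illustration).
kubectl cordon k8scloude2
kubectl patch node k8scloude2 -p '{"spec":{"unschedulable":true}}'

# Verify what cordon changed on the node object (prints "true" while cordoned).
kubectl get node k8scloude2 -o jsonpath='{.spec.unschedulable}'
```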
3.2 Cordon a node
Create a directory to hold the YAML files.
```
[root@k8scloude1 ~]# mkdir deploy
[root@k8scloude1 ~]# cd deploy/
```
Use --dry-run to generate the Deployment manifest.
```
[root@k8scloude1 deploy]# kubectl create deploy nginx --image=nginx --dry-run=client -o yaml >nginx.yaml
[root@k8scloude1 deploy]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        resources: {}
status: {}
```
Edit the Deployment manifest; replicas: 5 sets the replica count to 5, so the Deployment will create five pods.
```
[root@k8scloude1 deploy]# vim nginx.yaml
# Changes made to the manifest:
#   replicas: 5                       (set the replica count to 5)
#   terminationGracePeriodSeconds: 0  (set the termination grace period to 0)
#   imagePullPolicy: IfNotPresent     (pull the image only if it is not already present locally)
[root@k8scloude1 deploy]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - image: nginx
        name: nginx
        imagePullPolicy: IfNotPresent
        resources: {}
status: {}
```
Create the Deployment, and create a standalone pod from its own YAML file.
```
[root@k8scloude1 deploy]# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: n1
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
[root@k8scloude1 deploy]# kubectl apply -f pod.yaml
pod/pod1 created
[root@k8scloude1 deploy]# kubectl apply -f nginx.yaml
deployment.apps/nginx created
```
Check the pods: the Deployment has created five pods (nginx-6cf858f6cf-XXXXXXX), plus the standalone pod pod1.
```
[root@k8scloude1 deploy]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
nginx-6cf858f6cf-fwhmh   1/1     Running   0          52s   10.244.251.217   k8scloude3   <none>           <none>
nginx-6cf858f6cf-hr6bn   1/1     Running   0          52s   10.244.251.218   k8scloude3   <none>           <none>
nginx-6cf858f6cf-j2ccs   1/1     Running   0          52s   10.244.112.161   k8scloude2   <none>           <none>
nginx-6cf858f6cf-l7n4w   1/1     Running   0          52s   10.244.112.162   k8scloude2   <none>           <none>
nginx-6cf858f6cf-t6qxq   1/1     Running   0          52s   10.244.112.163   k8scloude2   <none>           <none>
pod1                     1/1     Running   0          60s   10.244.251.216   k8scloude3   <none>           <none>
```
Suppose k8scloude2 needs maintenance one day and you do not want any new pods placed on it. Once a node is cordoned, no new pods will be scheduled onto it.
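Before cordoning, it can help to see exactly what is already running on the node. This helper command is not part of the original walkthrough, but the field selector is standard kubectl:

```
# List every pod currently placed on k8scloude2, across all namespaces.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=k8scloude2
```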
Cordon the k8scloude2 node; it switches to the SchedulingDisabled state.
```
[root@k8scloude1 deploy]# kubectl cordon k8scloude2
node/k8scloude2 cordoned
[root@k8scloude1 deploy]# kubectl get nodes
NAME         STATUS                     ROLES                  AGE     VERSION
k8scloude1   Ready                      control-plane,master   8d      v1.21.0
k8scloude2   Ready,SchedulingDisabled   <none>                 7d23h   v1.21.0
k8scloude3   Ready                      <none>                 7d23h   v1.21.0
```
Use kubectl scale deploy to scale the nginx Deployment to 10 replicas.
```
[root@k8scloude1 deploy]# kubectl scale deploy nginx --replicas=10
deployment.apps/nginx scaled
```
Check the pods: all of the newly created pods were scheduled onto k8scloude3. Once a node is cordoned, new pods are no longer scheduled to it, while existing pods stay where they are.
```
[root@k8scloude1 deploy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
nginx-6cf858f6cf-7fdnr   1/1     Running   0          4s      10.244.251.221   k8scloude3   <none>           <none>
nginx-6cf858f6cf-fwhmh   1/1     Running   0          9m9s    10.244.251.217   k8scloude3   <none>           <none>
nginx-6cf858f6cf-g92ls   1/1     Running   0          4s      10.244.251.219   k8scloude3   <none>           <none>
nginx-6cf858f6cf-hr6bn   1/1     Running   0          9m9s    10.244.251.218   k8scloude3   <none>           <none>
nginx-6cf858f6cf-j2ccs   1/1     Running   0          9m9s    10.244.112.161   k8scloude2   <none>           <none>
nginx-6cf858f6cf-l7n4w   1/1     Running   0          9m9s    10.244.112.162   k8scloude2   <none>           <none>
nginx-6cf858f6cf-lsvsg   1/1     Running   0          4s      10.244.251.223   k8scloude3   <none>           <none>
nginx-6cf858f6cf-mpwjl   1/1     Running   0          4s      10.244.251.222   k8scloude3   <none>           <none>
nginx-6cf858f6cf-s8x6b   1/1     Running   0          4s      10.244.251.220   k8scloude3   <none>           <none>
nginx-6cf858f6cf-t6qxq   1/1     Running   0          9m9s    10.244.112.163   k8scloude2   <none>           <none>
pod1                     1/1     Running   0          9m17s   10.244.251.216   k8scloude3   <none>           <none>
```
A more extreme example: scale the Deployment down to 0 and back up to 10. Now every pod runs on k8scloude3.
```
[root@k8scloude1 deploy]# kubectl scale deploy nginx --replicas=0
deployment.apps/nginx scaled
[root@k8scloude1 deploy]# kubectl get pod -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          10m   10.244.251.216   k8scloude3   <none>           <none>
[root@k8scloude1 deploy]# kubectl scale deploy nginx --replicas=10
deployment.apps/nginx scaled
[root@k8scloude1 deploy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
nginx-6cf858f6cf-5cx9s   1/1     Running   0          8s    10.244.251.231   k8scloude3   <none>           <none>
nginx-6cf858f6cf-6cblj   1/1     Running   0          8s    10.244.251.228   k8scloude3   <none>           <none>
nginx-6cf858f6cf-827cz   1/1     Running   0          8s    10.244.251.233   k8scloude3   <none>           <none>
nginx-6cf858f6cf-b989n   1/1     Running   0          8s    10.244.251.229   k8scloude3   <none>           <none>
nginx-6cf858f6cf-kwxhn   1/1     Running   0          8s    10.244.251.224   k8scloude3   <none>           <none>
nginx-6cf858f6cf-ljjxz   1/1     Running   0          8s    10.244.251.225   k8scloude3   <none>           <none>
nginx-6cf858f6cf-ltrpr   1/1     Running   0          8s    10.244.251.227   k8scloude3   <none>           <none>
nginx-6cf858f6cf-lwf7g   1/1     Running   0          8s    10.244.251.230   k8scloude3   <none>           <none>
nginx-6cf858f6cf-xw84l   1/1     Running   0          8s    10.244.251.226   k8scloude3   <none>           <none>
nginx-6cf858f6cf-zpwhq   1/1     Running   0          8s    10.244.251.232   k8scloude3   <none>           <none>
pod1                     1/1     Running   0          11m   10.244.251.216   k8scloude3   <none>           <none>
```
3.3 Uncordon the node
To let the node accept pods again, simply uncordon it.
Uncordon k8scloude2; its status returns to Ready and scheduling resumes.
```
# uncordon is needed
[root@k8scloude1 deploy]# kubectl uncordon k8scloude2
node/k8scloude2 uncordoned
[root@k8scloude1 deploy]# kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8scloude1   Ready    control-plane,master   8d    v1.21.0
k8scloude2   Ready    <none>                 8d    v1.21.0
k8scloude3   Ready    <none>                 8d    v1.21.0
```
4. Draining a node
4.1 Overview of drain
Before performing maintenance on a node (for example, a kernel upgrade or hardware maintenance), you can use kubectl drain to safely evict all pods from it. Safe eviction lets the pods' containers terminate gracefully and honors any PodDisruptionBudgets you have defined. A PodDisruptionBudget is an object that limits the maximum disruption that can be inflicted on a group of pods (a sketch follows at the end of this overview).
Note: by default, kubectl drain ignores certain system pods on the node that cannot be killed.
drain evicts or deletes all pods except mirror pods (which cannot be deleted through the API server). If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless of that flag it will never delete DaemonSet-managed pods, because they would immediately be replaced by the DaemonSet controller, which ignores the unschedulable marking. If any pod is neither a mirror pod nor managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or Job, drain will not delete any pods unless you use --force; --force also allows deletion to continue when the managing resource of one or more pods is missing.
A successful return from kubectl drain means that all pods (except those excluded as described above) have been safely evicted, respecting the desired termination grace period and the PodDisruptionBudgets you have defined. It is then safe to bring the node down, for example by powering off the physical machine or, on a cloud platform, deleting its virtual machine.
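As a hedged sketch of the PodDisruptionBudget mentioned above (the name nginx-pdb and the minAvailable value are illustrative, not something this article's cluster actually defines): with such a budget in place, drain's evictions would never take the nginx Deployment below three ready pods.

```
# Illustrative PodDisruptionBudget for the nginx Deployment used in this article.
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb        # hypothetical name, for illustration only
spec:
  minAvailable: 3        # evictions may not take the app below 3 available pods
  selector:
    matchLabels:
      app: nginx
EOF
```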
4.2 Drain a node
Check the node status and the pods.
```
[root@k8scloude1 deploy]# kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8scloude1   Ready    control-plane,master   8d    v1.21.0
k8scloude2   Ready    <none>                 8d    v1.21.0
k8scloude3   Ready    <none>                 8d    v1.21.0
[root@k8scloude1 deploy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
nginx-6cf858f6cf-58wnd   1/1     Running   0          65s   10.244.112.167   k8scloude2   <none>           <none>
nginx-6cf858f6cf-5rrk4   1/1     Running   0          65s   10.244.112.164   k8scloude2   <none>           <none>
nginx-6cf858f6cf-86wxr   1/1     Running   0          65s   10.244.251.237   k8scloude3   <none>           <none>
nginx-6cf858f6cf-89wj9   1/1     Running   0          65s   10.244.112.168   k8scloude2   <none>           <none>
nginx-6cf858f6cf-9njrj   1/1     Running   0          65s   10.244.251.236   k8scloude3   <none>           <none>
nginx-6cf858f6cf-hchtb   1/1     Running   0          65s   10.244.251.234   k8scloude3   <none>           <none>
nginx-6cf858f6cf-mb2ft   1/1     Running   0          65s   10.244.112.166   k8scloude2   <none>           <none>
nginx-6cf858f6cf-nq6zv   1/1     Running   0          65s   10.244.112.169   k8scloude2   <none>           <none>
nginx-6cf858f6cf-pl7ww   1/1     Running   0          65s   10.244.251.235   k8scloude3   <none>           <none>
nginx-6cf858f6cf-sf2w6   1/1     Running   0          65s   10.244.112.165   k8scloude2   <none>           <none>
pod1                     1/1     Running   0          36m   10.244.251.216   k8scloude3   <none>           <none>
```
Draining a node: drain = cordon + evict.
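A rough way to picture that equivalence: cordon the node yourself, then remove every pod on it. The naive sketch below deletes pods directly and ignores PodDisruptionBudgets, unlike the eviction API that kubectl drain actually uses, so treat it purely as illustration:

```
# Roughly what "drain = cordon + evict" means (current namespace only, no PDB handling).
kubectl cordon k8scloude2
kubectl get pods --field-selector spec.nodeName=k8scloude2 -o name | xargs -r kubectl delete
```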
Drain the k8scloude2 node. --delete-emptydir-data allows pods that use emptyDir local data to be deleted; --ignore-daemonsets ignores DaemonSet-managed pods.
```
[root@k8scloude1 deploy]# kubectl drain k8scloude2
node/k8scloude2 cordoned
error: unable to drain node "k8scloude2", aborting command...

There are pending nodes to be drained:
 k8scloude2
cannot delete Pods with local storage (use --delete-emptydir-data to override): kube-system/metrics-server-bcfb98c76-k5dmj
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-nsbfs, kube-system/kube-proxy-lpj8z

[root@k8scloude1 deploy]# kubectl get node
NAME         STATUS                     ROLES                  AGE   VERSION
k8scloude1   Ready                      control-plane,master   8d    v1.21.0
k8scloude2   Ready,SchedulingDisabled   <none>                 8d    v1.21.0
k8scloude3   Ready                      <none>                 8d    v1.21.0

[root@k8scloude1 deploy]# kubectl drain k8scloude2 --ignore-daemonsets
node/k8scloude2 already cordoned
error: unable to drain node "k8scloude2", aborting command...

There are pending nodes to be drained:
 k8scloude2
error: cannot delete Pods with local storage (use --delete-emptydir-data to override): kube-system/metrics-server-bcfb98c76-k5dmj

[root@k8scloude1 deploy]# kubectl drain k8scloude2 --ignore-daemonsets --force --delete-emptydir-data
node/k8scloude2 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-nsbfs, kube-system/kube-proxy-lpj8z
evicting pod pod/nginx-6cf858f6cf-sf2w6
evicting pod pod/nginx-6cf858f6cf-5rrk4
evicting pod kube-system/metrics-server-bcfb98c76-k5dmj
evicting pod pod/nginx-6cf858f6cf-58wnd
evicting pod pod/nginx-6cf858f6cf-mb2ft
evicting pod pod/nginx-6cf858f6cf-89wj9
evicting pod pod/nginx-6cf858f6cf-nq6zv
pod/nginx-6cf858f6cf-5rrk4 evicted
pod/nginx-6cf858f6cf-mb2ft evicted
pod/nginx-6cf858f6cf-sf2w6 evicted
pod/nginx-6cf858f6cf-58wnd evicted
pod/nginx-6cf858f6cf-nq6zv evicted
pod/nginx-6cf858f6cf-89wj9 evicted
pod/metrics-server-bcfb98c76-k5dmj evicted
node/k8scloude2 evicted
```
Check the pods: after k8scloude2 is drained, all pods are running on k8scloude3.
Draining a node essentially deletes the pods running on it: once k8scloude2 is drained, the pods that were running there are removed.
The Deployment is a controller that watches the pod replica count: when the pods on k8scloude2 are evicted, the count drops below 10, so it creates replacement pods on schedulable nodes to restore the replica count.
A standalone pod cannot regenerate itself: once deleted, it is gone for good. If k8scloude3 were drained, pod1 would be deleted and no other schedulable node would recreate it. (A quick way to see which pods are controller-managed is shown after the listing below.)
```
[root@k8scloude1 deploy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
nginx-6cf858f6cf-7gh4z   1/1     Running   0          84s     10.244.251.240   k8scloude3   <none>           <none>
nginx-6cf858f6cf-7lmfd   1/1     Running   0          85s     10.244.251.238   k8scloude3   <none>           <none>
nginx-6cf858f6cf-86wxr   1/1     Running   0          6m14s   10.244.251.237   k8scloude3   <none>           <none>
nginx-6cf858f6cf-9bn2b   1/1     Running   0          85s     10.244.251.243   k8scloude3   <none>           <none>
nginx-6cf858f6cf-9njrj   1/1     Running   0          6m14s   10.244.251.236   k8scloude3   <none>           <none>
nginx-6cf858f6cf-bqk2w   1/1     Running   0          84s     10.244.251.241   k8scloude3   <none>           <none>
nginx-6cf858f6cf-hchtb   1/1     Running   0          6m14s   10.244.251.234   k8scloude3   <none>           <none>
nginx-6cf858f6cf-hjddp   1/1     Running   0          84s     10.244.251.244   k8scloude3   <none>           <none>
nginx-6cf858f6cf-pl7ww   1/1     Running   0          6m14s   10.244.251.235   k8scloude3   <none>           <none>
nginx-6cf858f6cf-sgxfg   1/1     Running   0          84s     10.244.251.242   k8scloude3   <none>           <none>
pod1                     1/1     Running   0          41m     10.244.251.216   k8scloude3   <none>           <none>
```
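To check in advance which pods would come back after an eviction, inspecting each pod's owner reference is a quick, hedged way to do it: controller-managed pods show the kind of their controller (the Deployment's pods show ReplicaSet), while a standalone pod such as pod1 shows nothing.

```
# Show each pod's managing controller; <none> means a standalone pod that will not be recreated.
kubectl get pods -o custom-columns=NAME:.metadata.name,CONTROLLER:.metadata.ownerReferences[0].kind
```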
Check the node status.
```
[root@k8scloude1 deploy]# kubectl get nodes
NAME         STATUS                     ROLES                  AGE   VERSION
k8scloude1   Ready                      control-plane,master   8d    v1.21.0
k8scloude2   Ready,SchedulingDisabled   <none>                 8d    v1.21.0
k8scloude3   Ready                      <none>                 8d    v1.21.0
```
4.3 Uncordon the node
To undo a drain on a node, simply uncordon it; there is no undrain command.
```
[root@k8scloude1 deploy]# kubectl undrain k8scloude2
Error: unknown command "undrain" for "kubectl"

Did you mean this?
        drain

Run 'kubectl --help' for usage.
```
Uncordon k8scloude2; the node can be scheduled again.
```
[root@k8scloude1 deploy]# kubectl uncordon k8scloude2
node/k8scloude2 uncordoned
[root@k8scloude1 deploy]# kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8scloude1   Ready    control-plane,master   8d    v1.21.0
k8scloude2   Ready    <none>                 8d    v1.21.0
k8scloude3   Ready    <none>                 8d    v1.21.0
```
Scale the Deployment down to 0 and back up to 10, then observe how the pods are distributed.
```
[root@k8scloude1 deploy]# kubectl scale deploy nginx --replicas=0
deployment.apps/nginx scaled
[root@k8scloude1 deploy]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          52m   10.244.251.216   k8scloude3   <none>           <none>
[root@k8scloude1 deploy]# kubectl scale deploy nginx --replicas=10
deployment.apps/nginx scaled
```
k8scloude2 is schedulable again and receives pods.
```
[root@k8scloude1 deploy]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
nginx-6cf858f6cf-4sqj8   1/1     Running   0          6s    10.244.112.172   k8scloude2   <none>           <none>
nginx-6cf858f6cf-cjqxv   1/1     Running   0          6s    10.244.112.176   k8scloude2   <none>           <none>
nginx-6cf858f6cf-fk69r   1/1     Running   0          6s    10.244.112.175   k8scloude2   <none>           <none>
nginx-6cf858f6cf-ghznd   1/1     Running   0          6s    10.244.112.173   k8scloude2   <none>           <none>
nginx-6cf858f6cf-hnxzs   1/1     Running   0          6s    10.244.251.246   k8scloude3   <none>           <none>
nginx-6cf858f6cf-hshnm   1/1     Running   0          6s    10.244.112.171   k8scloude2   <none>           <none>
nginx-6cf858f6cf-jb5sh   1/1     Running   0          6s    10.244.112.170   k8scloude2   <none>           <none>
nginx-6cf858f6cf-l9xlm   1/1     Running   0          6s    10.244.112.174   k8scloude2   <none>           <none>
nginx-6cf858f6cf-pgjlb   1/1     Running   0          6s    10.244.251.247   k8scloude3   <none>           <none>
nginx-6cf858f6cf-rlnh6   1/1     Running   0          6s    10.244.251.245   k8scloude3   <none>           <none>
pod1                     1/1     Running   0          52m   10.244.251.216   k8scloude3   <none>           <none>
```
Delete the Deployment and the standalone pod.
```
[root@k8scloude1 deploy]# kubectl delete -f nginx.yaml
deployment.apps "nginx" deleted
[root@k8scloude1 deploy]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
[root@k8scloude1 deploy]# kubectl get pods -o wide
No resources found in pod namespace.
```
5. Deleting a node
5.1 Overview of delete
Deleting a node removes it from the Kubernetes cluster entirely. Drain the node before you delete it.
For details on deleting nodes and reinstalling them, see the blog post 《模擬重裝Kubernetes(k8s)集群:刪除k8s集群然后重裝》.
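For context, here is a hedged sketch of the usual removal and rejoin flow on a kubeadm cluster. The join command printed at the end is a placeholder generated by your own control plane, not a value from this article:

```
# On the control plane: drain the node, then remove it from the cluster.
kubectl drain k8scloude3 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k8scloude3

# On the removed node: wipe the old kubeadm state before any re-join.
kubeadm reset

# On the control plane: print a fresh join command to run on the node if it should rejoin.
kubeadm token create --print-join-command
```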
5.2 Delete a node
kubectl drain safely evicts all pods from the node. --ignore-daemonsets is usually required because DaemonSets ignore the SchedulingDisabled marking (kubectl drain automatically marks the node unschedulable). If drain removed a DaemonSet-managed pod, the DaemonSet controller would immediately start it again on the same node, creating an endless loop, so DaemonSet pods are ignored here.
```
[root@k8scloude1 ~]# kubectl drain k8scloude3 --ignore-daemonsets
node/k8scloude3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wmz4r, kube-system/kube-proxy-84gcx
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-rl2mh
pod/calico-kube-controllers-6b9fbfff44-rl2mh evicted
node/k8scloude3 evicted
```
k8scloude3 switches to SchedulingDisabled.
```
[root@k8scloude1 ~]# kubectl get nodes
NAME         STATUS                     ROLES                  AGE   VERSION
k8scloude1   Ready                      control-plane,master   64m   v1.21.0
k8scloude2   Ready                      <none>                 56m   v1.21.0
k8scloude3   Ready,SchedulingDisabled   <none>                 56m   v1.21.0
```
Delete the node k8scloude3.
```
[root@k8scloude1 ~]# kubectl delete nodes k8scloude3
node "k8scloude3" deleted
[root@k8scloude1 ~]# kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8scloude1   Ready    control-plane,master   65m   v1.21.0
k8scloude2   Ready    <none>                 57m   v1.21.0
```
That concludes this detailed walkthrough of cordoning, draining (evicting), and deleting nodes. For more on cordon, drain, and delete, see the other related articles.