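Contents
- Main text
- Overview of how Kubernetes schedules pods
- kubelet pod creation: code and diagrams
- About the kubelet
- How the kubelet creates and starts a pod
- Call graph of kubelet pod creation
- kubelet pod creation in detail
- How the kubelet calls the CRI
- Overall architecture of kubelet pod creation
- kubelet pod creation logs explained
Main text
This article describes how the kubelet creates a pod, covering the following:
- An overview of how Kubernetes schedules pods
- The kubelet pod creation code, explained with diagrams (a focus of this article)
- How the kubelet calls the CRI to create containers (a focus of this article)
- The full creation process as seen in real kubelet logs (a focus of this article)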
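Overview of how Kubernetes schedules pods
Kubernetes (k8s below) has three main ways to manage (create) pods:
- Declare and create a bare pod directly.
- Declare pods through a controller, for example a Deployment, ReplicationController, DaemonSet, or ReplicaSet.
- Use a static pod. This is less common: the pod manifest is placed in the corresponding kubernetes/manifest directory. It is typically used to run the pods of k8s control-plane components such as apiserver, controller-manager, and scheduler.
k8s recommends managing pods through controllers. This matches the way k8s is designed to manage pods and makes it easy to use features such as elastic scaling and automatic restart of failed pods. We also take a controller-managed pod as the example and briefly walk through how k8s creates and schedules a pod, as shown in the figure below:
- The client asks apiserver to create a ReplicaSet. After authentication, authorization, and admission control, apiserver persists the request to etcd.
- The ReplicaSet controller, run by controller-manager, sees the new ReplicaSet through the list-watch mechanism. Using the label selector, it finds that the current state of the pods associated with this ReplicaSet differs from the desired state, so it reconciles by sending pod creation requests to apiserver.
- The scheduler discovers unbound pods through list-watch, runs its filtering (predicates) and scoring (priorities) algorithms to pick the node the pod will be scheduled to, and writes the result to etcd through apiserver.
- The kubelet discovers through list-watch that a new pod is bound to its node and starts the pod creation flow.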
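The flow above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the in-memory `store` list stands in for apiserver plus etcd, `free_slots` is a made-up capacity model, and the names `reconcile`, `schedule`, and `pods_for_node` are hypothetical, not Kubernetes APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]
    node_name: Optional[str] = None  # None means "not yet bound to a node"


@dataclass
class ReplicaSet:
    name: str
    replicas: int
    selector: Dict[str, str]


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """A pod matches when it carries every key/value pair of the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(rs: ReplicaSet, store: List[Pod]) -> None:
    """Controller step: create pods until the actual count equals the desired count."""
    current = [pod for pod in store if matches(rs.selector, pod.labels)]
    for index in range(len(current), rs.replicas):
        store.append(Pod(name=f"{rs.name}-{index}", labels=dict(rs.selector)))


def schedule(store: List[Pod], free_slots: Dict[str, int]) -> None:
    """Scheduler step: filter nodes that have room, score them, bind the pod."""
    for pod in store:
        if pod.node_name is not None:
            continue  # already bound
        candidates = [node for node, free in free_slots.items() if free > 0]
        if not candidates:
            continue  # no feasible node: the pod stays pending
        best = max(candidates, key=lambda node: free_slots[node])
        pod.node_name = best
        free_slots[best] -= 1


def pods_for_node(store: List[Pod], node: str) -> List[Pod]:
    """Kubelet step: each kubelet only cares about pods bound to its own node."""
    return [pod for pod in store if pod.node_name == node]
```

Calling `reconcile` a second time with the same ReplicaSet creates nothing, which is the point of comparing current state against desired state rather than reacting to the request itself.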
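kubelet pod creation: code and diagrams
About the kubelet
The kubelet is somewhat like a controller: it list-watches the relevant objects, or polls local pod information and events, to trigger actions that bring pods to their "desired state", and it reports the status of its node (the host) and of every pod on that node to apiserver.
One thing that sets the kubelet apart from other controllers is that it is an agent deployed on every node. It has to talk to apiserver and also to the CRI (container runtime interface) in order to manage the containers on the node. It therefore watches apiserver for changes to its local pods and also keeps polling pod status so that it can sync the status back to apiserver in time. Overall, the kubelet works as a loop that listens for messages from various producers, or for timer-triggered messages, and calls the matching consumers (different submodules) to carry out the work. Examples of such messages are requests watched from apiserver, events generated by the PLEG (pod lifecycle event generator), and periodically triggered tasks.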
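The producer/consumer loop described above can be sketched as follows. This is a simplified, single-threaded illustration, not the kubelet's code: the event tuple layout, the `sync_loop` function, and the source names other than "api" are assumptions made for this example.

```python
import queue
from typing import Callable, Dict, Tuple

# (source, operation, pod name); the layout is an assumption of this sketch.
Event = Tuple[str, str, str]
Handler = Callable[[str, str], None]


def sync_loop(events: "queue.Queue[Event]", handlers: Dict[str, Handler]) -> int:
    """Drain the queue and dispatch each event to the handler for its operation.

    Returns the number of events that were actually handled.
    """
    handled = 0
    while True:
        try:
            source, operation, pod = events.get_nowait()
        except queue.Empty:
            return handled  # nothing left to process in this sketch
        handler = handlers.get(operation)
        if handler is None:
            continue  # unknown operation: skipped here
        handler(source, pod)
        handled += 1


# Usage: producers put events on one queue, consumers are plain callables.
seen = []
demo_events: "queue.Queue[Event]" = queue.Queue()
demo_events.put(("api", "ADD", "pod-a"))      # a change watched from apiserver
demo_events.put(("pleg", "SYNC", "pod-a"))    # a pod lifecycle event
demo_events.put(("timer", "SYNC", "pod-b"))   # a periodic trigger
demo_handlers = {
    "ADD": lambda source, pod: seen.append(("add", source, pod)),
    "SYNC": lambda source, pod: seen.append(("sync", source, pod)),
}
demo_handled = sync_loop(demo_events, demo_handlers)
```

The real loop blocks on several channels at once and never returns; draining one queue keeps the sketch runnable while still showing the dispatch-by-operation idea.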
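How the kubelet creates and starts a pod
Call graph of kubelet pod creation
kubelet pod creation in detail
- 1. The kubelet list-watches the pods in all namespaces that are bound to its node and sends them into the update channel. SyncLoop is the kubelet's main loop function; it drives the recurring work of receiving, updating, and processing pod changes. Its syncLoopIteration method listens on several message sources and triggers the matching action. It receives the update-channel messages produced by the list-watch above and hands them to the matching handler: pod creation is handled by HandlePodAdditions, and pod deletion is handled by HandlePodUpdates (DELETE is treated as a UPDATE because of graceful deletion.)
- 2. HandlePodAdditions sorts the pods, checks them, and runs admission validation. It then calls dispatchWork to hand the operation on each pod to podWorkers, which process it asynchronously (pod creation, deletion, or update).
- 3. The asynchronous operation calls the kubelet's syncPod method (syncPod is the transaction script for the sync of a single pod.). syncPod does the preparation needed before the pod is created:
a. If the pod updateType is podkill, kill the pod immediately and return (the pod deletion path)
b. Run the pod admission check to verify that the pod can run on this node
c. Send the status to the status manager, which reports the pod status to apiserver
d. Check that the network plugin is ready
e. Create and update the pod cgroups configuration
f. Create the directories for the pod: the pod directory and the volume directories
g. Wait until the volumes in the pod spec are attached and mounted
h. Fetch the pull secrets from apiserver
i. Call the containerRuntime SyncPod method to start creating containers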
- 4. The containerRuntime SyncPod method does the following main work:
a. Create the sandbox
b. Create ephemeral containers
c. Create init containers
d. Create normal containers
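Creating the sandbox is the key step. The sandbox can be thought of as the pod's runtime environment and the parent container of the application containers in the pod. In k8s it is the pause container, and it has to be created before any other container. The kubelet first generates the pod sandbox configuration, such as the DNS config, the pod hostname, sysctl settings, cgroups, and namespaces.
It then calls the CRI (container runtime interface), which drives the underlying container runtime to actually operate the containers. After that, the CNI plugin is called to set up networking for the containers.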
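The creation order in step 4 can be sketched as a small planning function. This is an illustration only: `plan_sync_pod` and the action strings are hypothetical, the pod is reduced to lists of container names, and the sketch ignores details of the real runtime such as restarts and waiting for each init container to finish.

```python
from typing import Dict, List

# Spec fields in the order the text above lists them.
CONTAINER_GROUPS = (
    ("ephemeral", "ephemeralContainers"),
    ("init", "initContainers"),
    ("normal", "containers"),
)


def plan_sync_pod(pod_spec: Dict[str, List[str]], sandbox_exists: bool) -> List[str]:
    """Return the ordered actions needed to bring this pod up.

    pod_spec maps a spec field name to the container names under it.
    """
    actions: List[str] = []
    if not sandbox_exists:
        # The sandbox (pause container) comes before every other container.
        actions.append("create sandbox")
    for kind, spec_field in CONTAINER_GROUPS:
        for name in pod_spec.get(spec_field, []):
            actions.append(f"start {kind} container {name}")
    return actions


# Usage: a pod with one init container and two normal containers.
example_plan = plan_sync_pod(
    {"initContainers": ["init-db"], "containers": ["app", "sidecar"]},
    sandbox_exists=False,
)
```

Keeping the plan as data, separate from executing it, makes the ordering rule easy to read and to test on its own.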
- 5. Now look at the steps of RunPodSandbox, which creates the sandbox. (ds *dockerService) RunPodSandbox is a CRI implementation and lives under dockershim. (dockershim is the CRI implementation built into the kubelet to bridge the kubelet and docker; the name, a "shim" for docker, describes it well.) The kubelet calls dockershim over gRPC to create and manage containers.
a. Call the docker API: Pull the image for the sandbox. (The kubelet's sandbox image: defaultSandboxImage = "k8s.gcr.io/pause:3.2")
b. Call docker: Create the sandbox container.
c. Create Sandbox Checkpoint.
d. Call docker: Start the sandbox container.
e. Rewrite resolv.conf file generated by docker.
f. Setup networking for the sandbox. This calls the CNI plugin to set up networking for the container.
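How the kubelet calls the CRI
Our container runtime is currently docker. docker does not support the CRI, so k8s ships a built-in dockershim in order to drive docker. dockershim can be thought of as a CRI-compliant container runtime: the kubelet calls dockershim over gRPC, and dockershim converts each request into REST API requests and sends them to the docker daemon. The docker daemon then assembles the request and calls the docker API to create and start the container.
Two points are worth explaining here:
1. Why is there a dockershim? There is a bit of history behind it. Once k8s had reached a certain market share, it wanted to decouple from docker and stop depending on it, and it also wanted to support multiple container runtimes, so it defined the CRI. As long as a runtime satisfies the CRI, the kubelet can call it directly to manage containers. docker did not support the CRI at first, so k8s settled on a compromise and developed dockershim (a "shim" for docker) to forward requests. That also decoupled k8s from docker. The extra hop is cumbersome and costs performance, however, so Kubernetes deprecated dockershim and removed it in v1.24. From that version on, we have to configure the container runtime ourselves.
2. docker prepared for this early on. It split out containerd, which supports the CRI standard, and manages containers through containerd.
So, as the figure below shows, after the docker API is called to create a container, docker in turn calls docker-containerd to create and manage it. docker-containerd manages the container indirectly through docker-containerd-shim. One benefit is that our application containers keep running when docker is upgraded or restarted. Finally, docker-containerd-shim creates the container through runc. runc is docker's OCI-based implementation, formerly libcontainer, and is what actually creates the container.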
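The translation that dockershim performs, from one CRI-style call to several REST-style requests, can be sketched with a small adapter. Everything here is hypothetical: `FakeDockerDaemon` merely records requests instead of talking to a real daemon, `ShimSketch` is not dockershim's code, and the returned container ID is made up. The two request paths follow the Docker Engine API's container create and start endpoints.

```python
from typing import List, Tuple


class FakeDockerDaemon:
    """Stand-in for the docker daemon's REST endpoint; it only records requests."""

    def __init__(self) -> None:
        self.requests: List[Tuple[str, str]] = []

    def handle(self, method: str, path: str) -> str:
        self.requests.append((method, path))
        return "sandbox-0001"  # made-up container ID for this sketch


class ShimSketch:
    """Adapter: exposes one CRI-style method, emits REST-style requests."""

    def __init__(self, daemon: FakeDockerDaemon) -> None:
        self.daemon = daemon

    def run_pod_sandbox(self, pod_name: str) -> str:
        # One high-level call fans out into a create request and a start request.
        container_id = self.daemon.handle("POST", f"/containers/create?name={pod_name}")
        self.daemon.handle("POST", f"/containers/{container_id}/start")
        return container_id


# Usage: the caller sees a single call; the daemon sees two requests.
daemon = FakeDockerDaemon()
sandbox_id = ShimSketch(daemon).run_pod_sandbox("opslk1-xxx")
```

The adapter shape is why the runtime behind it can be swapped: the caller only depends on the interface, not on what sits behind the shim.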
Overall architecture of kubelet pod creation
(container-runtime="docker"; most enterprises are probably still using this setup)
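kubelet pod creation logs explained
Let's see it in practice: with debug logging enabled, we look at what the kubelet does when it creates a pod.
Note: only the main log output is kept, and sensitive information has been filtered out.
1. A new-pod creation event is received and written to the update channel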
I0921 18:10:00.486345 26075 config.go:414] Receiving a new pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
2. syncLoop: the ADD event is received
I0921 18:10:00.757557 26075 kubelet.go:2007] SyncLoop (ADD, "api"): opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)
3. Admission validation: pod fit success
I0921 18:10:00.759786 26075 predicates.go:986] Pod: opslk1-xxx fit success. Node: xx.xx.10.9 has enough resources.
4. The work moves on to syncPod, with SyncPodType=create
I0921 18:10:00.759956 26075 kubelet.go:1498] syncPod "xxx-3995-11ed-80a8-48df37244930" updateType:{{ } types.SyncPodType=create)
5. Generate the pod status
I0921 18:10:00.760128 26075 kubelet_pods.go:1529] Generating status for "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:00.760148 26075 kubelet_pods.go:1494] pod waiting > 0, pending
I0921 18:10:00.760174 26075 kubelet.go:1603] apiPodStatus.Phase:Pending pod:"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
6. Configure cgroupConfig: set CPU and memory
I0921 18:10:00.760200 26075 kubelet_resources.go:149] Newest cgroupConfig for pod:"opslk1-5sfjn_lktest01(739e1c1a-3175-11ed-aff8-48df37244926)"
are kubelet.cgroupResource{cpuShares:xxx, cpuQuota:xxx, memoryLimit:xxx, memoryLimitSwap:xxx}.
7. Wait for the pod's volumes to be attached and mounted
I0921 18:10:00.768211 26075 volume_manager.go:350] Waiting for volumes to attach and mount for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
8. Sync the status to apiserver: GET first, then PATCH
I0921 18:10:00.791361 26075 round_trippers.go:419] curl -k -v -XGET 'https://xxx/api/v1/namespaces/lktest01/pods/opslk1-xxx'
I0921 18:10:00.794250 26075 round_trippers.go:419] curl -k -v -XPATCH 'https://xxx/api/v1/namespaces/lktest01/pods/opslk1-xxx/status'
I0921 18:10:00.798998 26075 status_manager.go:506] Status for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)" updated successfully: (1, {Phase:Pending Conditions:[{Type:Initialized
9. Start reconciling toward the desired state: Reconcile Pod "Ready" condition if necessary. Trigger sync pod for reconciliation.
I0921 18:10:00.799365 26075 kubelet.go:2020] SyncLoop (RECONCILE, "api"): "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
10. Mount the volume
I0921 18:10:02.177479 26075 operation_generator.go:506] MountVolume.WaitForAttach succeeded for volume "volume" DevicePath "/dev/mapper/docker-xxx_3995_11ed_80a8_48df37244930"
I0921 18:10:03.136754 26075 operation_generator.go:527] MountVolume.MountDevice succeeded for volume "volume" device mount path "/export/kubelet/pods/xxx-3995-11ed-80a8-48df37244930/volumes/kubernetes.io~lvm/volume"
I0921 18:10:03.136851 26075 operation_generator.go:567] MountVolume.SetUp succeeded for volume "volume" (UniqueName: "flexvolume-kubernetes.io/lvm/xxx_3995_11ed_80a8_48df37244930") pod "opslk1-xxx"
11. All volumes are attached and mounted
I0921 18:10:03.168555 26075 volume_manager.go:384] All volumes are attached and mounted for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
12. Call the containerRuntime SyncPod method to start creating containers
I0921 18:10:03.168568 26075 kuberuntime_manager.go:468] Syncing Pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)": &Pod{}
13. Create the sandbox container: Setting cgroup parent, RunPodSandbox, Calling network plugin cni to set up pod
I0921 18:10:03.168833 26075 kuberuntime_manager.go:398] No sandbox for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)" can be found. Need to start a new one"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:03.168885 26075 kuberuntime_manager.go:605] SyncPod received new pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)", will create a sandbox for it
I0921 18:10:03.168891 26075 kuberuntime_manager.go:614] Stopping PodSandbox for "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)", will start new one
I0921 18:10:03.168901 26075 kuberuntime_manager.go:841] Stop app containers for pod:"opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)".
I0921 18:10:03.168913 26075 kuberuntime_manager.go:666] Creating sandbox for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:03.170818 26075 docker_service.go:460] Setting cgroup parent to: "/kubepods/burstable/podxxx-3995-11ed-80a8-48df37244930"
I0921 18:10:03.170827 26075 docker_sandbox.go:108] RunPodSandbox PodName:opslk1-xxx PodUID:xxx-3995-11ed-80a8-48df37244930 NameSpace:lktest01
I0921 18:10:04.297831 26075 plugins.go:377] Calling network plugin cni to set up pod "opslk1-xxx_lktest01"
I0921 18:10:04.298323 26075 manager.go:1011] Added container: "/kubepods/burstable/podxxx-3995-11ed-80a8-48df37244930/805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b"
I0921 18:10:04.298535 26075 container.go:448] Start housekeeping for container "/kubepods/burstable/podxxx-3995-11ed-80a8-48df37244930/805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b"
I0921 18:10:04.298693 26075 cni.go:337] Got netns path /proc/26876/ns/net
I0921 18:10:04.298701 26075 cni.go:338] Using podns path lktest01
I0921 18:10:04.298820 26075 cni.go:307] About to add CNI network cni-loopback (type=loopback)
I0921 18:10:04.301399 26075 cni.go:337] Got netns path /proc/26876/ns/net
I0921 18:10:04.301405 26075 cni.go:338] Using podns path lktest01
I0921 18:10:04.301466 26075 cni.go:307] About to add CNI network cni (type=cni)
I0921 18:10:04.392172 26075 kuberuntime_manager.go:680] Created PodSandbox "805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b" for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"
I0921 18:10:04.396981 26075 kuberuntime_manager.go:699] Determined the ip "xx.xx.226.17" for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)" after sandbox changed
14. Create the normal containers
I0921 18:10:04.397114 26075 kuberuntime_manager.go:750] Creating container &Container{} in pod opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)
I0921 18:10:04.398859 26075 kuberuntime_container.go:108] Generating ref for container opslk: &v1.ObjectReference{Kind:"Pod", Namespace:"lktest01", Name:"opslk1-xxx"}
I0921 18:10:04.398883 26075 kuberuntime_container.go:117] To determine whether to restart the old container. Pod:opslk1-xxx_lktest01 PodIP: PodSandboxId: NameSpace:lktest01
I0921 18:10:04.398888 26075 kuberuntime_container.go:258] pod:opslk1-xxx default KeepRootDirForPod: true
I0921 18:10:04.398935 26075 server.go:471] Event(v1.ObjectReference{Kind:"Pod", Namespace:"lktest01", Name:"opslk1-xxx", UID:"xxx-3995-11ed-80a8-48df37244930", APIVersion:"v1", ResourceVersion:"19846024411", FieldPath:"spec.containers{opslk}"})
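To read timings out of logs like these, the klog header (severity letter, month and day, time with microseconds, thread ID, source file and line) can be parsed with a regular expression. The sketch below is a minimal helper, not part of the kubelet: `parse_klog` and `elapsed_seconds` are names made up for this example, and it assumes both lines come from the same day. The two sample lines are copied from steps 1 and 13 above.

```python
import re
from datetime import datetime
from typing import Dict, Optional

KLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread_id>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_klog(line: str) -> Optional[Dict[str, object]]:
    """Split one klog line into its header fields and message; None if no match."""
    match = KLOG_RE.match(line)
    if match is None:
        return None
    record: Dict[str, object] = dict(match.groupdict())
    record["ts"] = datetime.strptime(match.group("time"), "%H:%M:%S.%f")
    return record


def elapsed_seconds(first: str, last: str) -> float:
    """Seconds between two klog lines (assumes both are from the same day)."""
    start, end = parse_klog(first), parse_klog(last)
    if start is None or end is None:
        raise ValueError("both arguments must be klog-formatted lines")
    return (end["ts"] - start["ts"]).total_seconds()


# Sample lines taken from the log excerpts above.
POD_RECEIVED = 'I0921 18:10:00.486345 26075 config.go:414] Receiving a new pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"'
SANDBOX_CREATED = 'I0921 18:10:04.392172 26075 kuberuntime_manager.go:680] Created PodSandbox "805dda102e017247685240c2f740295396edcb7071dfe211979215eac0870e0b" for pod "opslk1-xxx_lktest01(xxx-3995-11ed-80a8-48df37244930)"'

sandbox_latency = elapsed_seconds(POD_RECEIVED, SANDBOX_CREATED)
```

For the excerpts above, this gives roughly 3.9 seconds between receiving the pod and the sandbox being created, most of it spent waiting for the volume to be attached and mounted and for the sandbox container to start.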