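After deploying an application on Kubernetes, what do you do if users grow faster than expected and compute resources run short? Kubernetes' answer is auto-scaling: a set of autoscaling components work together to monitor your cluster around the clock and adjust capacity dynamically as load changes, so that it keeps up with user demand.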

How to Make a Kubernetes Cluster Scale Automatically: Cluster Autoscaler in Depth

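Autoscaling components

Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of Pods in a Replication Controller, Deployment, or Replica Set based on real-time CPU utilization. CPU and memory usage are supplied by Metrics Server; HPA can also scale on other metrics when they are exposed through the custom or external metrics APIs.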
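As a rough illustration of how a utilization target turns into a replica count, here is a minimal sketch of the proportional rule described in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric)). The function name and the sample numbers are made up for this example, and the real controller applies further logic (tolerance, stabilization, min/max bounds) that is left out here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper that applies the proportional
// scaling rule: ceil(currentReplicas * currentUtilization / targetUtilization).
// Tolerance, stabilization windows and min/max bounds are deliberately omitted.
func desiredReplicas(currentReplicas int, currentUtilization, targetUtilization float64) int {
	if currentReplicas <= 0 || targetUtilization <= 0 {
		// Nothing sensible to compute; leave the replica count unchanged.
		return currentReplicas
	}
	return int(math.Ceil(float64(currentReplicas) * currentUtilization / targetUtilization))
}

func main() {
	// 4 replicas at 90% CPU with a 60% target: scale out.
	fmt.Println(desiredReplicas(4, 90, 60))
	// 4 replicas at 30% CPU with a 60% target: scale in.
	fmt.Println(desiredReplicas(4, 30, 60))
}
```

Multiplying before dividing keeps the arithmetic exact for the whole-number inputs used here.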

Vertical Pod Autoscaler (VPA)

VPA automatically sets the resources a Pod requests based on the resources the Pod actually uses, and can adjust them at runtime.

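Cluster Autoscaler (CA)

CA is a component that automatically scales the Nodes of a cluster. If the cluster has Pods that cannot be scheduled, it adds Nodes so that those Pods can run; if it finds Nodes whose resource utilization is too low, it removes them to save resources.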

Addon Resizer

This is a small add-on that runs as a sidecar and vertically scales another container in the same Deployment. Its only strategy at present is to scale linearly with the number of nodes in the cluster. It is usually paired with [Metrics Server](https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/metrics-server/metrics-server-deployment.yaml#L66) so that Metrics Server can keep serving the metrics API as the cluster grows.
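To make "scale linearly with the number of nodes" concrete, the sketch below computes a container's resources as a fixed base plus a per-node increment. The type, field names and sample values are assumptions chosen for illustration; they are not taken from the Addon Resizer source.

```go
package main

import "fmt"

// linearPolicy is a hypothetical description of a linear sizing rule:
// resources = base + perNode * nodeCount.
type linearPolicy struct {
	baseMilliCPU     int64 // CPU (millicores) granted regardless of cluster size
	perNodeMilliCPU  int64 // extra CPU (millicores) per cluster node
	baseMemoryMiB    int64 // memory (MiB) granted regardless of cluster size
	perNodeMemoryMiB int64 // extra memory (MiB) per cluster node
}

// estimate returns the CPU and memory the managed container should get
// for a cluster with the given number of nodes.
func (p linearPolicy) estimate(nodes int64) (milliCPU, memoryMiB int64) {
	if nodes < 0 {
		nodes = 0
	}
	return p.baseMilliCPU + p.perNodeMilliCPU*nodes,
		p.baseMemoryMiB + p.perNodeMemoryMiB*nodes
}

func main() {
	// Illustrative values only.
	p := linearPolicy{baseMilliCPU: 40, perNodeMilliCPU: 1, baseMemoryMiB: 40, perNodeMemoryMiB: 4}
	for _, n := range []int64{10, 100, 1000} {
		cpu, mem := p.estimate(n)
		fmt.Printf("nodes=%d cpu=%dm memory=%dMi\n", n, cpu, mem)
	}
}
```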

HPA scales stateless applications, VPA scales stateful applications, and CA guarantees compute resources. Used together, they form a complete autoscaling solution.

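Cluster Autoscaler in detail

Of the four components above, HPA lives in the kubernetes code repository and is released with each kubernetes version, so it needs no separate deployment and can be used directly. The other three live in a repository maintained by the official community (https://github.com/kubernetes/autoscaler). Cluster Autoscaler v1.0 (GA) was released alongside kubernetes 1.8; the other two are still in beta.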

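Deployment

Cluster Autoscaler usually has to be used together with a cloud vendor. It exposes a Cloud Provider interface for vendors to implement, and the vendor adds or removes ECS-style compute nodes through its Scaling Group or Node Pool feature.

Cloud vendors supported as of v1.18.1: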

Alicloud: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/alicloud/README.md

AWS: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md

Azure: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md

Baiducloud: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/baiducloud/README.md

Digitalocean: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/digitalocean/README.md

Google Cloud GCE: https://kubernetes.io/docs/tasks/administer-cluster/cluster-management/#upgrading-google-compute-engine-clusters

Google Cloud GKE: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler

OpenStack Magnum: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/magnum/README.md

Packet: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/packet/README.md

Startup parameters: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca

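How it works

Cluster Autoscaler abstracts a NodeGroup concept, which corresponds to a cloud vendor's scaling-group service. It uses the NodeGroups supplied by the CloudProvider to compute the node resources in the cluster and scales on that basis.

Once started, Cluster Autoscaler periodically (every 10s by default) checks for unschedulable Pods and the resource usage of Nodes, and performs Scale UP or Scale Down as needed.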

Scale UP

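When Cluster Autoscaler finds Pods that cannot be scheduled because of insufficient resources, it calls `Scale UP` to expand the cluster.

Scale UP only considers Nodes that belong to a NodeGroup, so we can hand all Worker Nodes over to scaling groups for management. Because a scaling group adds nodes asynchronously, Upcoming Nodes are taken into account as well.

To meet business needs, a cluster may contain Nodes of different sizes, so we can create multiple NodeGroups. When scaling up, the policy set with the --expander option decides which node group to expand. The following [five policies](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders) are supported: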

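random: picks a NodeGroup at random. This is the default when no policy is specified.

most-pods: picks the NodeGroup that can schedule the most Pods. For example, if some Pods are unscheduled because of a nodeSelector, this policy prefers a NodeGroup that satisfies it, so that as many Pods as possible get scheduled.

least-waste: to avoid waste, picks the NodeGroup with the smallest node type that still satisfies the Pods' resource requests, that is, the one that leaves the least capacity idle after scale-up.

price: picks the cheapest NodeGroup according to the pricing model supplied by the CloudProvider.

priority: picks by configured priorities. It is more cumbersome to use because it needs extra configuration; see the documentation (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md).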
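The selection logic behind two of these policies can be sketched in a few lines. The example below is a simplified illustration, not the autoscaler's own expander code: the `option` type and its fields are hypothetical, and ties simply keep the earliest candidate, which may differ from how the real expanders break ties.

```go
package main

import "fmt"

// option is a hypothetical, simplified stand-in for one scale-up candidate.
type option struct {
	nodeGroup       string
	schedulablePods int     // pending Pods that would fit if this group grew
	wastedFraction  float64 // share of the added capacity that would stay idle
}

// mostPods returns the candidate that lets the most pending Pods schedule.
// Ties keep the earliest candidate in this sketch.
func mostPods(opts []option) *option {
	var best *option
	for i := range opts {
		if best == nil || opts[i].schedulablePods > best.schedulablePods {
			best = &opts[i]
		}
	}
	return best
}

// leastWaste returns the candidate that leaves the least capacity idle.
// Ties keep the earliest candidate in this sketch.
func leastWaste(opts []option) *option {
	var best *option
	for i := range opts {
		if best == nil || opts[i].wastedFraction < best.wastedFraction {
			best = &opts[i]
		}
	}
	return best
}

func main() {
	opts := []option{
		{nodeGroup: "small", schedulablePods: 3, wastedFraction: 0.10},
		{nodeGroup: "large", schedulablePods: 5, wastedFraction: 0.40},
	}
	fmt.Println("most-pods picks:", mostPods(opts).nodeGroup)
	fmt.Println("least-waste picks:", leastWaste(opts).nodeGroup)
}
```

The two policies can disagree on the same input, which is why the choice of --expander matters once a cluster has node groups of different sizes.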

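If needed, Cluster Autoscaler can also balance the number of Nodes across similar NodeGroups, which avoids one NodeGroup hitting MaxSize and being unable to take new Nodes. This is controlled by the --balance-similar-node-groups option, which defaults to false.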
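One simple way to picture balancing is to hand out the new nodes one at a time to whichever similar group is currently smallest and still below its maximum. The sketch below implements that greedy idea; it is an assumption made for illustration and is not claimed to match the autoscaler's actual balancing processor. The `group` type is hypothetical.

```go
package main

import "fmt"

// group is a hypothetical, simplified view of one node group.
type group struct {
	name    string
	size    int // current target size
	maxSize int // upper bound for the group
}

// balanceScaleUp distributes newNodes across groups, always growing the
// smallest group that still has room. It returns how many nodes each group
// receives; nodes that fit nowhere are dropped.
func balanceScaleUp(groups []group, newNodes int) map[string]int {
	sizes := make([]int, len(groups))
	for i, g := range groups {
		sizes[i] = g.size
	}
	added := make(map[string]int)
	for n := 0; n < newNodes; n++ {
		best := -1
		for i, g := range groups {
			if sizes[i] >= g.maxSize {
				continue // this group is already full
			}
			if best == -1 || sizes[i] < sizes[best] {
				best = i
			}
		}
		if best == -1 {
			break // every group has reached its maximum
		}
		sizes[best]++
		added[groups[best].name]++
	}
	return added
}

func main() {
	groups := []group{
		{name: "zone-a", size: 1, maxSize: 3},
		{name: "zone-b", size: 3, maxSize: 5},
	}
	fmt.Println(balanceScaleUp(groups, 4))
}
```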

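After this series of steps, Cluster Autoscaler arrives at the number of Nodes to add and the NodeGroup to add them to, then has the CloudProvider perform IncreaseSize, which increases the size of the vendor's scaling group and completes the scale-up.

If anything in this description is unclear, refer to the ScaleUP source walkthrough below.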

Scale Down

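Scale-down is optional. It is controlled by the --scale-down-enabled option, which defaults to true.

While monitoring Node resources, Cluster Autoscaler marks a Node as unneeded when it meets all three of the following conditions:

The CPU and memory requests of all Pods running on the Node add up to less than 50% of the Node's allocatable capacity. The threshold can be changed with the --scale-down-utilization-threshold option.

All Pods on the Node can be rescheduled onto other nodes.

The Node has no annotation that disables scale-down (the FAQ documents `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` for this).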

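If a Node stays marked unneeded for more than 10 minutes (configurable with the --scale-down-unneeded-time option), the CloudProvider's DeleteNodes operation is used to remove it. Non-empty unneeded Nodes are removed one at a time, but empty Nodes can be removed in bulk, up to 10 at a time (configurable with the --max-empty-bulk-delete option).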
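The three conditions and the timer can be expressed as two small predicates. This is a minimal sketch under stated assumptions: the `nodeSnapshot` type is hypothetical, and utilization is taken as the larger of the CPU and memory request ratios, which is this example's simplification rather than a statement about the autoscaler's exact formula.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	defaultUtilizationThreshold = 0.5              // mirrors --scale-down-utilization-threshold
	defaultUnneededTime         = 10 * time.Minute // mirrors --scale-down-unneeded-time
)

// nodeSnapshot is a hypothetical summary of what is known about one Node.
type nodeSnapshot struct {
	cpuRequestedMilli   int64
	cpuAllocatableMilli int64
	memRequestedBytes   int64
	memAllocatableBytes int64
	allPodsMovable      bool // every Pod could be scheduled elsewhere
	scaleDownDisabled   bool // the Node carries a "do not scale down" annotation
}

// utilization returns the larger of the CPU and memory request ratios
// (an assumption of this sketch).
func utilization(n nodeSnapshot) float64 {
	if n.cpuAllocatableMilli <= 0 || n.memAllocatableBytes <= 0 {
		return 1 // unknown capacity: treat the node as fully used and keep it
	}
	cpu := float64(n.cpuRequestedMilli) / float64(n.cpuAllocatableMilli)
	mem := float64(n.memRequestedBytes) / float64(n.memAllocatableBytes)
	return math.Max(cpu, mem)
}

// isUnneeded applies the three conditions from the text.
func isUnneeded(n nodeSnapshot, threshold float64) bool {
	return utilization(n) < threshold && n.allPodsMovable && !n.scaleDownDisabled
}

// readyForRemoval reports whether a node has been unneeded for long enough.
func readyForRemoval(unneededFor, unneededTime time.Duration) bool {
	return unneededFor > unneededTime
}

func main() {
	n := nodeSnapshot{
		cpuRequestedMilli: 400, cpuAllocatableMilli: 1000,
		memRequestedBytes: 1 << 30, memAllocatableBytes: 4 << 30,
		allPodsMovable: true,
	}
	fmt.Println("unneeded:", isUnneeded(n, defaultUtilizationThreshold))
	fmt.Println("remove now:", readyForRemoval(11*time.Minute, defaultUnneededTime))
}
```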

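In practice this is not the only criterion. Other conditions can block the removal of a Node, for example the NodeGroup is already at MinSize, or a Scale UP happened within the last 10 minutes (configurable with the --scale-down-delay-after-add option). See the documentation for details (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-down-work).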
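The two blocking conditions named here can be checked before any removal is attempted. The helper below is a hypothetical sketch that covers only those two conditions; the function name and return convention are invented for this example.

```go
package main

import (
	"fmt"
	"time"
)

const defaultDelayAfterAdd = 10 * time.Minute // mirrors --scale-down-delay-after-add

// scaleDownBlockedReason returns a non-empty reason when a removal should be
// skipped, or "" when neither of the two conditions from the text applies.
func scaleDownBlockedReason(groupSize, minSize int, sinceLastScaleUp, delayAfterAdd time.Duration) string {
	if groupSize <= minSize {
		return "node group is already at MinSize"
	}
	if sinceLastScaleUp < delayAfterAdd {
		return "a scale-up happened too recently"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", scaleDownBlockedReason(3, 3, time.Hour, defaultDelayAfterAdd))
	fmt.Printf("%q\n", scaleDownBlockedReason(5, 3, 2*time.Minute, defaultDelayAfterAdd))
	fmt.Printf("%q\n", scaleDownBlockedReason(5, 3, time.Hour, defaultDelayAfterAdd))
}
```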

Cluster Autoscaler's mechanics are complex, but most of them can be configured through flags. If you need to tune them, read the documentation carefully:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md

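How to implement a CloudProvider

If you use one of the cloud vendors listed above, you only need to name it with the --cloud-provider option. If you want to integrate your own IaaS or have specific business logic, you need to implement the CloudProvider and NodeGroup interfaces yourself and register the implementation in the builder so that it can be selected with --cloud-provider.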

Builders are registered in builder_all.go (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/builder/builder_all.go) under cloudprovider/builder. You can also add a build file of your own there and use the +build constraint in the Go file to choose which CloudProvider gets compiled in.

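The CloudProvider and NodeGroup interfaces are defined in cloud_provider.go (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/cloud_provider.go). Pay particular attention to the Refresh method: it is called at the start of every loop (every 10 seconds by default), which makes it the place to query the vendor's API and refresh NodeGroup state. The usual approach is to add a manager that holds this state. If something is unclear, look at the other CloudProvider implementations.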

type CloudProvider interface {
	// Name returns name of the cloud provider.
	Name() string

	// NodeGroups returns all node groups configured for this cloud provider.
	// Called several times per loop, so it should not hit the vendor's API
	// on every call; store the state during Refresh instead.
	NodeGroups() []NodeGroup

	// NodeGroupForNode returns the node group for the given node, nil if the node
	// should not be processed by cluster autoscaler, or non-nil error if such
	// occurred. Must be implemented.
	// Same caveat as NodeGroups: serve it from the state stored during Refresh.
	NodeGroupForNode(*apiv1.Node) (NodeGroup, error)

	// Pricing returns pricing model for this cloud provider or error if not available.
	// Implementation optional.
	// Can be left unimplemented if the price expander is not used.
	Pricing() (PricingModel, errors.AutoscalerError)

	// GetAvailableMachineTypes get all machine types that can be requested from the cloud provider.
	// Implementation optional.
	// Not used in practice; no need to implement it.
	GetAvailableMachineTypes() ([]string, error)

	// NewNodeGroup builds a theoretical node group based on the node definition provided. The node group is not automatically
	// created on the cloud provider side. The node group is not returned by NodeGroups() until it is created.
	// Implementation optional.
	// Usually not needed, but it can be implemented if ClusterAutoscaler should create a default NodeGroup.
	// A better approach is to define the default NodeGroup in the cloud-side scaling group.
	NewNodeGroup(machineType string, labels map[string]string, systemLabels map[string]string,
		taints []apiv1.Taint, extraResources map[string]resource.Quantity) (NodeGroup, error)

	// GetResourceLimiter returns struct containing limits (max, min) for resources (cores, memory etc.).
	// The resource limiter is passed in at build time. It normally should not be changed unless the
	// cloud side explicitly tells users about the change; otherwise it will confuse them.
	GetResourceLimiter() (*ResourceLimiter, error)

	// GPULabel returns the label added to nodes with GPU resource.
	// GPU-related: must return the matching label if the cluster uses GPU resources.
	// hack: we assume anything which is not cpu/memory to be a gpu.
	GPULabel() string

	// GetAvailableGPUTypes return all available GPU types cloud provider supports.
	// GPU-related, same as GPULabel.
	GetAvailableGPUTypes() map[string]struct{}

	// Cleanup cleans up open resources before the cloud provider is destroyed, i.e. go routines etc.
	// The CloudProvider is initialized only once at startup; anything it holds that needs releasing is handled here.
	Cleanup() error

	// Refresh is called before every main loop and can be used to dynamically update cloud provider state.
	// In particular the list of node groups returned by NodeGroups can change as a result of CloudProvider.Refresh().
	// Called from StaticAutoscaler RunOnce.
	Refresh() error
}

// NodeGroup contains configuration info and functions to control a set
// of nodes that have the same capacity and set of labels.
type NodeGroup interface {
	// MaxSize returns maximum size of the node group.
	MaxSize() int

	// MinSize returns minimum size of the node group.
	MinSize() int

	// TargetSize returns the current target size of the node group. It is possible that the
	// number of nodes in Kubernetes is different at the moment but should be equal
	// to Size() once everything stabilizes (new nodes finish startup and registration or
	// removed nodes are deleted completely). Implementation required.
	// Returns the scaling group's node count, which does not necessarily match the number of nodes in kubernetes.
	TargetSize() (int, error)

	// IncreaseSize increases the size of the node group. To delete a node you need
	// to explicitly name it and use DeleteNode. This function should wait until
	// node group size is updated. Implementation required.
	// The scale-up method: increases the scaling group's node count.
	IncreaseSize(delta int) error

	// DeleteNodes deletes nodes from this node group. Error is returned either on
	// failure or if the given node doesn't belong to this node group. This function
	// should wait until node group size is updated. Implementation required.
	// The nodes to delete must belong to this node group.
	DeleteNodes([]*apiv1.Node) error

	// DecreaseTargetSize decreases the target size of the node group. This function
	// doesn't permit to delete any existing node and can be used only to reduce the
	// request for new nodes that have not been yet fulfilled. Delta should be negative.
	// It is assumed that cloud provider will not delete the existing nodes when there
	// is an option to just decrease the target. Implementation required.
	// Called when ClusterAutoscaler finds that the kubernetes node count and the scaling
	// group's node count have disagreed for a long time.
	DecreaseTargetSize(delta int) error

	// Id returns an unique identifier of the node group.
	Id() string

	// Debug returns a string containing all information regarding this node group.
	Debug() string

	// Nodes returns a list of all nodes that belong to this node group.
	// It is required that Instance objects returned by this method have Id field set.
	// Other fields are optional.
	// This list should include also instances that might have not become a kubernetes node yet.
	// Returns every node in the scaling group, even those that are not kubernetes nodes yet.
	Nodes() ([]Instance, error)

	// TemplateNodeInfo returns a schedulernodeinfo.NodeInfo structure of an empty
	// (as if just started) node. This will be used in scale-up simulations to
	// predict what would a new node look like if a node group was expanded. The returned
	// NodeInfo is expected to have a fully populated Node object, with all of the labels,
	// capacity and allocatable information as well as all pods that are started on
	// the node by default, using manifest (most likely only kube-proxy). Implementation optional.
	// ClusterAutoscaler maps node info to node groups to evaluate resource conditions;
	// for an empty node group it uses this method to synthesize the node info.
	TemplateNodeInfo() (*schedulernodeinfo.NodeInfo, error)

	// Exist checks if the node group really exists on the cloud provider side. Allows to tell the
	// theoretical node group from the real one. Implementation required.
	Exist() bool

	// Create creates the node group on the cloud provider side. Implementation optional.
	// Used together with CloudProvider.NewNodeGroup.
	Create() (NodeGroup, error)

	// Delete deletes the node group on the cloud provider side.
	// This will be executed only for autoprovisioned node groups, once their size drops to 0.
	// Implementation optional.
	Delete() error

	// Autoprovisioned returns true if the node group is autoprovisioned. An autoprovisioned group
	// was created by CA and can be deleted when scaled to 0.
	Autoprovisioned() bool
}
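The "manager that holds state" idea mentioned before the interface definitions can be sketched as follows: Refresh talks to the vendor once per loop and stores a snapshot, and the frequently called lookups are answered from that snapshot. Everything here (`groupState`, `cloudAPI`, `manager`, `fakeAPI`) is hypothetical and deliberately independent of the real cloudprovider package, so it illustrates the caching pattern rather than a complete CloudProvider.

```go
package main

import (
	"fmt"
	"sync"
)

// groupState is a hypothetical cached snapshot of one scaling group.
type groupState struct {
	id          string
	targetSize  int
	instanceIDs []string
}

// cloudAPI is a hypothetical client for a vendor's scaling-group service.
type cloudAPI interface {
	ListGroups() ([]groupState, error)
}

// manager caches scaling-group state between refreshes.
type manager struct {
	mu         sync.RWMutex
	api        cloudAPI
	groups     []groupState
	byInstance map[string]string // instance ID -> group ID
}

func newManager(api cloudAPI) *manager {
	return &manager{api: api, byInstance: make(map[string]string)}
}

// Refresh queries the vendor once and replaces the cached snapshot.
// On error the previous snapshot is kept.
func (m *manager) Refresh() error {
	groups, err := m.api.ListGroups()
	if err != nil {
		return err
	}
	byInstance := make(map[string]string)
	for _, g := range groups {
		for _, id := range g.instanceIDs {
			byInstance[id] = g.id
		}
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.groups, m.byInstance = groups, byInstance
	return nil
}

// NodeGroups answers from the cache; it never calls the vendor.
func (m *manager) NodeGroups() []groupState {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return append([]groupState(nil), m.groups...)
}

// GroupForInstance answers from the cache; it never calls the vendor.
func (m *manager) GroupForInstance(instanceID string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	id, ok := m.byInstance[instanceID]
	return id, ok
}

// fakeAPI is an in-memory cloudAPI that counts how often it is called.
type fakeAPI struct {
	calls  int
	groups []groupState
}

func (f *fakeAPI) ListGroups() ([]groupState, error) {
	f.calls++
	return f.groups, nil
}

func main() {
	api := &fakeAPI{groups: []groupState{{id: "workers", targetSize: 2, instanceIDs: []string{"i-1", "i-2"}}}}
	m := newManager(api)
	if err := m.Refresh(); err != nil {
		fmt.Println("refresh failed:", err)
		return
	}
	group, found := m.GroupForInstance("i-2")
	fmt.Println(group, found, len(m.NodeGroups()), "API calls:", api.calls)
}
```

A real provider would wrap such a manager behind the CloudProvider and NodeGroup methods, so that NodeGroups and NodeGroupForNode stay cheap no matter how often a loop calls them.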

ScaleUP source walkthrough

func ScaleUp(context *context.AutoscalingContext, processors *ca_processors.AutoscalingProcessors, clusterStateRegistry *clusterstate.ClusterStateRegistry, unschedulablePods []*apiv1.Pod, nodes []*apiv1.Node, daemonSets []*appsv1.DaemonSet, nodeInfos map[string]*schedulernodeinfo.NodeInfo, ignoredTaints taints.TaintKeySet) (*status.ScaleUpStatus, errors.AutoscalerError) {

	......

	// Check whether every ready node in the cluster comes from a node group,
	// and collect the nodes that do not belong to any group.
	nodesFromNotAutoscaledGroups, err := utils.FilterOutNodesFromNotAutoscaledGroups(nodes, context.CloudProvider)
	if err != nil {
		return &status.ScaleUpStatus{Result: status.ScaleUpError}, err.AddPrefix("failed to filter out nodes which are from not autoscaled groups: ")
	}

	nodeGroups := context.CloudProvider.NodeGroups()
	gpuLabel := context.CloudProvider.GPULabel()
	availableGPUTypes := context.CloudProvider.GetAvailableGPUTypes()

	// The resource limiter is passed in when the cloud provider is built.
	// It can be changed inside the CloudProvider if necessary, but that is not
	// recommended because it confuses users.
	resourceLimiter, errCP := context.CloudProvider.GetResourceLimiter()
	if errCP != nil {
		return &status.ScaleUpStatus{Result: status.ScaleUpError}, errors.ToAutoscalerError(
			errors.CloudProviderError,
			errCP)
	}

	// Compute the resource limits.
	// nodeInfos maps each node group to a sample node.
	// The sample node prefers data from a real node; if the NodeGroup has no real
	// node deployed yet, the Template node data is used.
	scaleUpResourcesLeft, errLimits := computeScaleUpResourcesLeftLimits(context.CloudProvider, nodeGroups, nodeInfos, nodesFromNotAutoscaledGroups, resourceLimiter)
	if errLimits != nil {
		return &status.ScaleUpStatus{Result: status.ScaleUpError}, errLimits.AddPrefix("Could not compute total resources: ")
	}

	// Work out how many nodes are about to join the cluster from the current nodes
	// and the nodes in the NodeGroups.
	// A vendor's increase-size operation does not add nodes synchronously, so they
	// are counted here for the node resource calculations below.
	upcomingNodes := make([]*schedulernodeinfo.NodeInfo, 0)
	for nodeGroup, numberOfNodes := range clusterStateRegistry.GetUpcomingNodes() {
		......
	}
	klog.V(4).Infof("Upcoming %d nodes", len(upcomingNodes))

	// Node groups that make it into the final selection.
	expansionOptions := make(map[string]expander.Option, 0)

	......

	// Node groups that cannot take new nodes because of some limit or error,
	// e.g. the group is already at MaxSize.
	skippedNodeGroups := map[string]status.Reasons{}

	// Filter the node groups, taking all of the above into account.
	for _, nodeGroup := range nodeGroups {
	......
	}

	if len(expansionOptions) == 0 {
		klog.V(1).Info("No expansion options")
		return &status.ScaleUpStatus{
			Result:                  status.ScaleUpNoOptionsAvailable,
			PodsRemainUnschedulable: getRemainingPods(podEquivalenceGroups, skippedNodeGroups),
			ConsideredNodeGroups:    nodeGroups,
		}, nil
	}

	......

	// Pick the best node group to expand. The expander chooses a suitable node group;
	// the default is RandomExpander (flag: expander).
	// random: picks one at random; suitable when there is only one node group.
	// most-pods: picks the node group that can schedule the most pods; e.g. if some
	//   unschedulable pods have a nodeSelector, it prefers the matching node group
	//   so that most pods are satisfied.
	// least-waste: prefers the node group with the smallest node type that satisfies the pods' resource requests.
	// price: picks the cheapest according to the pricing model.
	// priority: picks by priority.
	bestOption := context.ExpanderStrategy.BestOption(options, nodeInfos)
	if bestOption != nil && bestOption.NodeCount > 0 {
	......
		newNodes := bestOption.NodeCount

		// Recompute the number of nodes to add this time, taking upcomingNodes into account.
		if context.MaxNodesTotal > 0 && len(nodes)+newNodes+len(upcomingNodes) > context.MaxNodesTotal {
			klog.V(1).Infof("Capping size to max cluster total size (%d)", context.MaxNodesTotal)
			newNodes = context.MaxNodesTotal - len(nodes) - len(upcomingNodes)
			if newNodes < 1 {
				return &status.ScaleUpStatus{Result: status.ScaleUpError}, errors.NewAutoscalerError(
					errors.TransientError,
					"max node total count already reached")
			}
		}

		createNodeGroupResults := make([]nodegroups.CreateNodeGroupResult, 0)

		// If the node group does not exist on the cloud vendor's side, try to create
		// one there from the information at hand.
		// However, none of the current CloudProvider implementations appears to allow
		// this, so the method looks redundant.
		// Cloud vendors do not want to, and should not, hand the right to create
		// cloud-side node groups to ClusterAutoscaler.
		if !bestOption.NodeGroup.Exist() {
			oldId := bestOption.NodeGroup.Id()
			createNodeGroupResult, err := processors.NodeGroupManager.CreateNodeGroup(context, bestOption.NodeGroup)
		......
		}

		// Get the sample node of the best node group.
		nodeInfo, found := nodeInfos[bestOption.NodeGroup.Id()]
		if !found {
			// This should never happen, as we already should have retrieved
			// nodeInfo for any considered nodegroup.
			klog.Errorf("No node info for: %s", bestOption.NodeGroup.Id())
			return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, errors.NewAutoscalerError(
				errors.CloudProviderError,
				"No node info for best expansion option!")
		}

		// Compute how many Nodes are needed from CPU, memory and any GPU resources
		// (hack: we assume anything which is not cpu/memory to be a gpu.).
		newNodes, err = applyScaleUpResourcesLimits(context.CloudProvider, newNodes, scaleUpResourcesLeft, nodeInfo, bestOption.NodeGroup, resourceLimiter)
		if err != nil {
			return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, err
		}

		// Node groups to balance.
		targetNodeGroups := []cloudprovider.NodeGroup{bestOption.NodeGroup}

		// If balancing is requested via the balance-similar-node-groups flag,
		// find similar node groups and balance the node counts between them.
		if context.BalanceSimilarNodeGroups {
		......
		}

		// For the balancing strategy itself, see the (b *BalancingNodeGroupSetProcessor) BalanceScaleUpBetweenGroups method.
		scaleUpInfos, typedErr := processors.NodeGroupSetProcessor.BalanceScaleUpBetweenGroups(context, targetNodeGroups, newNodes)
		if typedErr != nil {
			return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, typedErr
		}
		klog.V(1).Infof("Final scale-up plan: %v", scaleUpInfos)

		// Start scaling up, via IncreaseSize.
		for _, info := range scaleUpInfos {
			typedErr := executeScaleUp(context, clusterStateRegistry, info, gpu.GetGpuTypeForMetrics(gpuLabel, availableGPUTypes, nodeInfo.Node(), nil), now)
			if typedErr != nil {
				return &status.ScaleUpStatus{Result: status.ScaleUpError, CreateNodeGroupResults: createNodeGroupResults}, typedErr
			}
		}

		......
	}

	......
}
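The capping step in the middle of ScaleUp is easy to lose among the elided parts, so here it is isolated as a standalone function. This is a sketch that mirrors the MaxNodesTotal branch shown above; the function name `capNewNodes` and its plain-integer parameters are introduced for this example only.

```go
package main

import (
	"errors"
	"fmt"
)

// capNewNodes limits a scale-up so that existing, upcoming and new nodes
// together do not exceed maxTotal. A maxTotal of 0 or less means "no limit",
// matching the context.MaxNodesTotal > 0 check in ScaleUp.
func capNewNodes(existing, upcoming, wanted, maxTotal int) (int, error) {
	if maxTotal <= 0 || existing+wanted+upcoming <= maxTotal {
		return wanted, nil
	}
	capped := maxTotal - existing - upcoming
	if capped < 1 {
		return 0, errors.New("max node total count already reached")
	}
	return capped, nil
}

func main() {
	// 8 nodes running, 1 still booting, 5 requested, cluster limit 10.
	n, err := capNewNodes(8, 1, 5, 10)
	fmt.Println(n, err)

	// The limit is already reached once the upcoming node is counted.
	n, err = capNewNodes(9, 1, 5, 10)
	fmt.Println(n, err)
}
```

Counting upcoming nodes matters because the scaling group adds nodes asynchronously: without them, two consecutive loops could each request the full remainder and overshoot the limit.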