This article explains how to deal with a MySQL Input/output error (Ext4Error) on GKE Kubernetes. We hope it is a useful reference for anyone who runs into the same problem.
Problem description
I have deployed a MySQL database (as a StatefulSet) on a zonal Kubernetes cluster running on Google Cloud Platform's managed service (GKE).
The zonal cluster consists of 3 instances of machine type e2-medium.
The MySQL container cannot start because of the following error:
kubectl logs mysql-statefulset-0
2022-02-07 05:55:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.35-1debian10 started.
find: '/var/lib/mysql/': Input/output error
Recent events:
4m57s Warning Ext4Error gke-cluster-default-pool-rnfh kernel-monitor, gke-cluster-default-pool-rnfh EXT4-fs error (device sdb): __ext4_find_entry:1532: inode #2: comm mysqld: reading directory lblock 0 40d 8062 gke-cluster-default-pool-rnfh
3m22s Warning BackOff pod/mysql-statefulset-0 spec.containers{mysql} kubelet, gke-cluster-default-pool-rnfh Back-off restarting failed container
Nodes:
kubectl get node -owide
gke-cluster-default-pool-ayqo Ready <none> 54d v1.21.5-gke.1302 So.Me.I.P So.Me.I.P Container-Optimized OS from Google 5.4.144+ containerd://1.4.8
gke-cluster-default-pool-rnfh Ready <none> 54d v1.21.5-gke.1302 So.Me.I.P So.Me.I.P Container-Optimized OS from Google 5.4.144+ containerd://1.4.8
gke-cluster-default-pool-sc3p Ready <none> 54d v1.21.5-gke.1302 So.Me.I.P So.Me.I.P Container-Optimized OS from Google 5.4.144+ containerd://1.4.8
I also noticed that the rnfh node is running out of memory:
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-cluster-default-pool-ayqo 117m 12% 992Mi 35%
gke-cluster-default-pool-rnfh 180m 19% 2953Mi 104%
gke-cluster-default-pool-sc3p 179m 19% 1488Mi 52%
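kubectl top node computes MEMORY% against the node's allocatable memory, not its total capacity. Allocatable is capacity minus what is reserved for the system, the Kubernetes components and the eviction threshold, so values above 100% are possible. Working backwards from the rows above, 2953Mi at 104% implies roughly 2840Mi allocatable. That is consistent with the other two nodes: 992Mi / 0.35 ≈ 2834Mi and 1488Mi / 0.52 ≈ 2862Mi.

The minimal Python sketch below parses that table and flags nodes at or above a threshold. It assumes the whitespace-separated layout shown here and memory reported in Mi.

```python
"""Flag nodes in `kubectl top node` output whose MEMORY% is at or above a threshold."""

from dataclasses import dataclass

# Output copied from the question above.
TOP_NODE_OUTPUT = """\
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-cluster-default-pool-ayqo 117m 12% 992Mi 35%
gke-cluster-default-pool-rnfh 180m 19% 2953Mi 104%
gke-cluster-default-pool-sc3p 179m 19% 1488Mi 52%
"""


@dataclass
class NodeMemory:
    name: str
    used_mi: int
    pct: int

    @property
    def implied_allocatable_mi(self) -> float:
        # MEMORY% = used / allocatable, so allocatable ~= used * 100 / pct.
        # Approximate only: kubectl rounds the percentage.
        return self.used_mi * 100 / self.pct


def parse_top_node(text: str) -> list[NodeMemory]:
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, _cpu, _cpu_pct, mem, mem_pct = line.split()
        if not mem.endswith("Mi"):
            raise ValueError(f"sketch only handles Mi values, got {mem!r}")
        rows.append(NodeMemory(name, int(mem[:-2]), int(mem_pct.rstrip("%"))))
    return rows


def over_threshold(rows: list[NodeMemory], pct: int = 100) -> list[str]:
    return [r.name for r in rows if r.pct >= pct]


if __name__ == "__main__":
    nodes = parse_top_node(TOP_NODE_OUTPUT)
    for n in nodes:
        print(f"{n.name}: {n.used_mi}Mi used, ~{n.implied_allocatable_mi:.0f}Mi allocatable")
    print("at or above 100%:", over_threshold(nodes))
```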
MySQL manifest:
# HEADLESS SERVICE
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless-service
  labels:
    kind: mysql-headless-service
spec:
  clusterIP: None
  selector:
    tier: mysql-db
  ports:
    - name: 'mysql-http'
      protocol: 'TCP'
      port: 3306
---
# STATEFUL SET
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-statefulset
spec:
  selector:
    matchLabels:
      tier: mysql-db
  serviceName: mysql-statefulset
  replicas: 1
  template:
    metadata:
      labels:
        tier: mysql-db
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: my-mysql
          image: my-mysql:latest
          imagePullPolicy: Always
          args:
            - "--ignore-db-dir=lost+found"
          ports:
            - name: 'http'
              protocol: 'TCP'
              containerPort: 3306
          volumeMounts:
            - name: mysql-pvc
              mountPath: /var/lib/mysql
          env:
            - name: MYSQL_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-root-username
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-root-password
            - name: MYSQL_USER
              valueFrom:
                configMapKeyRef:
                  name: mysql-config
                  key: mysql-username
            - name: MYSQL_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: mysql-config
                  key: mysql-password
            - name: MYSQL_DATABASE
              valueFrom:
                configMapKeyRef:
                  name: mysql-config
                  key: mysql-database
  volumeClaimTemplates:
    - metadata:
        name: mysql-pvc
      spec:
        storageClassName: 'mysql-fast'
        resources:
          requests:
            storage: 120Gi
        accessModes:
          - ReadWriteOnce
          - ReadOnlyMany
MySQL StorageClass manifest:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysql-fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
Why does Kubernetes keep trying to schedule the Pod onto a node that is out of memory?
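One relevant detail concerns how kube-scheduler filters nodes. It compares the Pod's resource requests with each node's allocatable resources minus the requests of the Pods already bound there. It does not look at live usage such as the kubectl top node figures. A node at 104% actual memory use can therefore still pass the filter if its Pods request little memory.

The simplified Python model below illustrates only the memory part of that check. The node names and figures are hypothetical, and the real scheduler runs many more filter and scoring plugins.

```python
"""Simplified model of the memory part of kube-scheduler's NodeResourcesFit filter.

Only memory requests are modeled; node names and figures are hypothetical.
"""

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    allocatable_mi: int
    # Memory requests of the Pods already bound to this node.
    pod_requests_mi: list[int] = field(default_factory=list)
    # Live usage as reported by metrics-server; the filter never reads it.
    usage_mi: int = 0


def fits(node: Node, request_mi: int) -> bool:
    # Requested (not used) memory plus the new Pod's request must fit in allocatable.
    return sum(node.pod_requests_mi) + request_mi <= node.allocatable_mi


def feasible(nodes: list[Node], request_mi: int) -> list[str]:
    return [n.name for n in nodes if fits(n, request_mi)]


if __name__ == "__main__":
    nodes = [
        # Few requests but heavy actual use (like rnfh at 104%).
        Node("busy-but-lightly-requested", 2840, [256, 256], usage_mi=2953),
        # Mostly idle but heavily requested.
        Node("idle-but-heavily-requested", 2840, [2048], usage_mi=900),
    ]
    print(feasible(nodes, request_mi=1024))
```

Setting realistic memory requests on every workload is what makes this filter reflect real memory pressure.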
Update
I have added requests and limits to the MySQL manifest to improve the QoS class, which is now Guaranteed.
Unfortunately, Kubernetes still tries to schedule the Pod onto the out-of-memory rnfh node:
kubectl describe po mysql-statefulset-0 | grep node -i
Node: gke-cluster-default-pool-rnfh/So.Me.I.P
kubectl describe po mysql-statefulset-0 | grep qos -i
QoS Class: Guaranteed
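The Guaranteed class reported above follows a simple rule. Every container must set both CPU and memory limits, with requests equal to limits; an omitted request defaults to its limit. A Pod in which no container sets any CPU or memory request or limit is BestEffort, and everything else is Burstable. The QoS class mainly decides how the kubelet treats the Pod under node resource pressure (eviction order and OOM score). It is not what the scheduler's resource check uses.

The Python sketch below encodes the rule. It ignores init containers and compares quantities as plain strings, so it assumes requests and limits use the same notation.

```python
"""Sketch of Pod QoS classification, considering only CPU and memory.

Simplifications: init containers are ignored and quantities are compared as
strings, so "1Gi" and "1024Mi" would wrongly count as different.
"""

RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # API defaulting: a limit without a request gets request == limit.
        requests = {**limits, **c.get("requests", {})}
        if any(r in requests for r in RESOURCES):
            any_set = True
        # Guaranteed needs a limit for both resources, with request == limit.
        if not all(r in limits and requests[r] == limits[r] for r in RESOURCES):
            guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    mysql = {"requests": {"cpu": "500m", "memory": "1Gi"},
             "limits": {"cpu": "500m", "memory": "1Gi"}}
    print(qos_class([mysql]))
```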
Recommended answer
I ran a few more tests but could not reproduce the problem.
To answer this properly we would need more logs, and I'm not sure you still have them. If I had to guess the root cause, I would say it is related to the PersistentVolume.
In one GitHub issue, Volume was remounted as read only after error #752, I found behavior very similar to the OP's.
You created a dedicated StorageClass for MySQL and set reclaimPolicy: Retain, so the PV is not deleted. When the StatefulSet Pod (with the same -0 suffix) is recreated, it tries to claim this volume again. The restart could have been caused by connection errors or by some problem in the database; it is hard to say. The user in that GitHub issue had a very similar situation. They also had an inode #262147: comm mysqld: reading directory lblock error, but further down there was also the entry [ +0.003695] EXT4-fs (sda): Remounting filesystem read-only. Perhaps the permissions changed when the filesystem was remounted?
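There is a reason Pod -0 keeps coming back to the same disk. A StatefulSet creates one PersistentVolumeClaim per ordinal, named <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>, which here is mysql-pvc-mysql-statefulset-0. It does not delete that claim when the Pod is deleted or recreated. Every incarnation of mysql-statefulset-0 therefore mounts the same PD. reclaimPolicy: Retain additionally keeps the PV and its data if the claim itself is ever deleted.

The Python sketch below models only the naming and "create if missing" behavior. The provision callback stands in for dynamic provisioning and is hypothetical.

```python
"""Sketch of how a StatefulSet ties each ordinal to a stable PersistentVolumeClaim.

Claim names follow <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>.
`provision` stands in for dynamic provisioning and is hypothetical.
"""

from typing import Callable


def claim_name(template: str, statefulset: str, ordinal: int) -> str:
    return f"{template}-{statefulset}-{ordinal}"


def reconcile(claims: dict[str, str], template: str, statefulset: str,
              replicas: int, provision: Callable[[str], str]) -> dict[str, str]:
    """Create only the claims that are missing; existing claim -> volume bindings are kept."""
    for i in range(replicas):
        name = claim_name(template, statefulset, i)
        if name not in claims:
            claims[name] = provision(name)
    return claims


if __name__ == "__main__":
    claims: dict[str, str] = {}
    reconcile(claims, "mysql-pvc", "mysql-statefulset", 1, lambda n: "disk-A")
    # Pod mysql-statefulset-0 crashes and is recreated: the claim already exists,
    # so the new Pod is bound to the same disk, damaged or not.
    reconcile(claims, "mysql-pvc", "mysql-statefulset", 1, lambda n: "disk-B")
    print(claims)
```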
Another thing: your volumeClaimTemplates contains
accessModes:
  - ReadWriteOnce
  - ReadOnlyMany
So the PersistentVolume can be mounted read-write by a single node (ReadWriteOnce) or read-only by many nodes (ReadOnlyMany). It is possible that the Pod was recreated on a different node with the volume mounted in read-only access mode.
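GCE persistent disks support ReadWriteOnce and ReadOnlyMany, but not ReadWriteMany. The concern with the template above is only that it lists more than the single-writer mode MySQL needs.

One way to catch this before applying a manifest is a small check like the hypothetical lint below, which expects exactly [ReadWriteOnce] for a single-writer database. It works on a template already parsed into a dict (for example by a YAML loader). It is a sketch, not an existing tool.

```python
"""Hypothetical pre-apply lint for StatefulSet volumeClaimTemplates (not an existing tool)."""

# Access modes a GCE persistent disk volume can offer.
GCE_PD_MODES = {"ReadWriteOnce", "ReadOnlyMany"}


def check_claim_template(template: dict) -> list[str]:
    """Return human-readable problems for one volumeClaimTemplates entry."""
    name = template["metadata"]["name"]
    modes = template["spec"].get("accessModes", [])
    problems = []
    unsupported = [m for m in modes if m not in GCE_PD_MODES]
    if unsupported:
        problems.append(f"{name}: not supported by GCE PD: {unsupported}")
    if modes != ["ReadWriteOnce"]:
        problems.append(f"{name}: single-writer database should request only ReadWriteOnce, got {modes}")
    return problems


if __name__ == "__main__":
    # The template from the question, as it would look after YAML parsing.
    template = {
        "metadata": {"name": "mysql-pvc"},
        "spec": {
            "storageClassName": "mysql-fast",
            "resources": {"requests": {"storage": "120Gi"}},
            "accessModes": ["ReadWriteOnce", "ReadOnlyMany"],
        },
    }
    for problem in check_claim_template(template):
        print(problem)
```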
Kernel log from that GitHub issue:
[ +35.912075] EXT4-fs warning (device sda): htree_dirblock_to_tree:977: inode #2: lblock 0: comm mysqld: error -5 reading directory block
[ +6.294232] EXT4-fs error (device sda): ext4_find_entry:1436: inode #262147: comm mysqld: reading directory lblock ...
[ +0.005226] EXT4-fs error (device sda): ext4_find_entry:1436: inode #2: comm mysqld: reading directory lblock 0
[ +1.666039] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
[ +0.003695] EXT4-fs (sda): Remounting filesystem read-only
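The sequence in this log is the telling part: read errors on directory blocks, then an aborted journal, then a read-only remount. The minimal Python sketch below groups EXT4 messages by device and reports which devices were remounted read-only.

It assumes the input is kernel log lines, such as dmesg output taken from the node. Its regular expressions match the message shapes shown here and in the Ext4Error event from the question, not every ext4 message format.

```python
"""Group EXT4 kernel messages by device and report read-only remounts."""

import re
from collections import defaultdict

# e.g. "EXT4-fs error (device sda): ext4_find_entry:1436: ..."
ERROR_RE = re.compile(r"EXT4-fs (?:error|warning) \(device (\w+)\): (\w+):\d+:")
# e.g. "EXT4-fs (sda): Remounting filesystem read-only"
READ_ONLY_RE = re.compile(r"EXT4-fs \((\w+)\): Remounting filesystem read-only")


def summarize(lines):
    """Return ({device: [kernel functions reporting problems]}, {read-only devices})."""
    problems = defaultdict(list)
    read_only = set()
    for line in lines:
        if m := ERROR_RE.search(line):
            device, func = m.groups()
            problems[device].append(func)
        elif m := READ_ONLY_RE.search(line):
            read_only.add(m.group(1))
    return dict(problems), read_only


if __name__ == "__main__":
    log = """\
[ +35.912075] EXT4-fs warning (device sda): htree_dirblock_to_tree:977: inode #2: lblock 0: comm mysqld: error -5 reading directory block
[ +1.666039] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal
[ +0.003695] EXT4-fs (sda): Remounting filesystem read-only"""
    print(summarize(log.splitlines()))
```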
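This fits the OP's comment:
Two days ago, for reasons I don't know, Kubernetes restarted the container and has kept trying to run it on the rnfh machine. The container may have been evicted from another node.
The node or cluster may also have been upgraded (depending on whether auto-upgrade is enabled), which can force the Pod to restart.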
The '/var/lib/mysql/': Input/output error may point to database corruption, as mentioned here.
Issues like this are usually resolved by cordoning the affected node. For more information about the difference between cordon and drain, see here.
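The difference in short: kubectl cordon only marks the node unschedulable, so new Pods avoid it but Pods already running there stay. kubectl drain cordons the node and then evicts its Pods. Cordoning alone leaves the crash-looping mysql-statefulset-0 where it is, so the Pod also has to be recreated before the StatefulSet controller can place it on another node.

The Python sketch below only builds the command lines and prints them (a dry run) unless execute=True. The Pod name comes from the question; running in kubectl's current namespace is an assumption. If the disk itself is corrupted, moving the Pod will not fix that.

```python
"""Build (and optionally run) the kubectl commands for moving a Pod off a bad node.

Dry run by default: commands are only printed unless execute=True.
Assumes the Pod lives in kubectl's current namespace.
"""

import subprocess


def remediation_plan(node: str, pod: str, drain: bool = False) -> list[list[str]]:
    if drain:
        # drain cordons the node itself, then evicts its Pods; DaemonSet Pods are
        # skipped and emptyDir data on the node is deleted.
        return [["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"]]
    return [
        # Mark the node unschedulable; Pods already running there are not touched.
        ["kubectl", "cordon", node],
        # Delete the Pod so the StatefulSet controller recreates it on another node.
        ["kubectl", "delete", "pod", pod],
    ]


def run(plan: list[list[str]], execute: bool = False) -> None:
    for argv in plan:
        print("$", " ".join(argv))
        if execute:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(remediation_plan("gke-cluster-default-pool-rnfh", "mysql-statefulset-0"))
```

Once the node is healthy again, kubectl uncordon makes it schedulable again.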
In addition, to assign the Pod to a specific node, or to nodes with particular labels, you can use Affinity.
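Node affinity rules live under spec.affinity.nodeAffinity in the Pod template. With requiredDuringSchedulingIgnoredDuringExecution, the nodeSelectorTerms are ORed and the matchExpressions inside one term are ANDed.

The Python sketch below evaluates such a rule for the operators In, NotIn, Exists and DoesNotExist; Gt and Lt are left out. The example rule, which keeps the Pod off rnfh, is hypothetical. It assumes the kubernetes.io/hostname label equals the node name, as it usually does on GKE. Like cordoning, this only changes placement.

```python
"""Sketch of how required node affinity (nodeSelectorTerms) is evaluated against node labels."""

# Mirrors spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
# .nodeSelectorTerms; the rule itself is a hypothetical example.
AVOID_RNFH = [
    {"matchExpressions": [
        {"key": "kubernetes.io/hostname", "operator": "NotIn",
         "values": ["gke-cluster-default-pool-rnfh"]},
    ]},
]


def expr_matches(labels: dict[str, str], expr: dict) -> bool:
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modeled in this sketch: {op}")


def node_matches(labels: dict[str, str], terms: list[dict]) -> bool:
    # Terms are ORed; expressions inside one term are ANDed.
    return any(all(expr_matches(labels, e) for e in t["matchExpressions"]) for t in terms)


if __name__ == "__main__":
    names = ["gke-cluster-default-pool-ayqo", "gke-cluster-default-pool-rnfh",
             "gke-cluster-default-pool-sc3p"]
    print([n for n in names if node_matches({"kubernetes.io/hostname": n}, AVOID_RNFH)])
```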
That's all for this article on the GKE Kubernetes MySQL Input/output error (Ext4Error). We hope the recommended answer helps.