Table of Contents
- 1. Install ansible
- 2. Install k8s
- 3. Check the environment
- 3.1. Check etcd
- 3.2. Check flanneld
- 3.3. Check nginx and keepalived
- 3.4. Check kube-apiserver
- 3.5. Check kube-controller-manager
- 3.6. Check kube-scheduler
- 3.7. Check kubelet
- 3.8. Check kube-proxy
- 4. Check the add-ons
- 4.1. Check CoreDNS
- 4.2. Check dashboard
- 4.3. Check traefik
- 4.4. Check metrics
- 4.5. Check EFK
- 5. Verify the cluster
- 6. Restart all components
1. Install ansible
# Switch the system to the Aliyun yum mirrors, then update
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.$(date +%Y%m%d)
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum clean all && yum makecache && yum update -y
# Install ansible
yum -y install epel-release
yum install ansible -y
ssh-keygen -t rsa
ssh-copy-id xx.xx.xx.xx
## Distribute the SSH key in bulk
#### Write out each machine's IP, SSH port, and login password
cat <<EOF > hostname.txt
192.168.10.11 22 fana
192.168.10.12 22 fana
192.168.10.13 22 fana
192.168.10.14 22 fana
EOF
#### Suppress the interactive "yes" host-key prompt (note this edits the client-side ssh_config, so no sshd restart is needed)
sed -i '/StrictHostKeyChecking/s/^#//; /StrictHostKeyChecking/s/ask/no/' /etc/ssh/ssh_config
#### Then copy the key to every host (requires sshpass, installed below)
cat hostname.txt | while read ip port pawd;do sshpass -p $pawd ssh-copy-id -p $port root@$ip;done
#### Install sshpass (needed by the loop above)
wget http://sourceforge.net/projects/sshpass/files/sshpass/1.06/sshpass-1.06.tar.gz
tar xvzf sshpass-1.06.tar.gz
cd sshpass-1.06
./configure
make
make install
## Kernel upgrade reference: https://www.cnblogs.com/fan-gx/p/11006762.html
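Before letting the sshpass loop touch real hosts, it can help to dry-run it. A minimal sketch, assuming the `hostname.txt` format above (`<ip> <port> <password>` per line); the function only prints the commands it would run:

```shell
# Hypothetical helper: print the ssh-copy-id commands that the distribution
# loop would execute, without contacting any host.
gen_copy_cmds() {
  while read -r ip port pawd; do
    [ -z "$ip" ] && continue           # skip blank lines
    echo "sshpass -p $pawd ssh-copy-id -p $port root@$ip"
  done < "$1"
}

# Sample file standing in for the real hostname.txt
cat > /tmp/hostname.txt <<'EOF'
192.168.10.11 22 fana
192.168.10.12 22 fana
EOF
gen_copy_cmds /tmp/hostname.txt
```

Once the printed commands look right, pipe them to `sh` or fall back to the one-liner above.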
2. Install k8s
## Download the ansible playbooks
# Link: https://pan.baidu.com/s/1VKQ5txJ2xgwUVim_E2P9kA
# Extraction code: 3cq2
## Install k8s with ansible
ansible-playbook -i inventory installK8s.yml
## Versions:
k8s: 1.14.8
etcd: 3.3.18
flanneld: 0.11.0
Docker: 19.03.5
nginx: 1.16.1
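For reference, an `inventory` for a playbook like this typically groups the masters and worker nodes. This is only a sketch: the group names (`masters`, `nodes`) and variables are assumptions, not taken from the downloaded playbooks; adjust them to whatever `installK8s.yml` actually expects:

```ini
; Hypothetical inventory layout -- group names and vars are assumptions
[masters]
192.168.10.11
192.168.10.12
192.168.10.13

[nodes]
192.168.10.14

[all:vars]
ansible_user=root
ansible_port=22
```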
## Self-signed TLS certificates
etcd: ca.pem server.pem server-key.pem
flannel: ca.pem server.pem server-key.pem
kube-apiserver: ca.pem server.pem server-key.pem
kubelet: ca.pem ca-key.pem
kube-proxy: ca.pem kube-proxy.pem kube-proxy-key.pem
kubectl: ca.pem admin.pem admin-key.pem ------ used by the administrator to access the cluster
## Check the certificate lifetime. Upstream guidance is to upgrade the k8s cluster at least once a year, and the certificates are renewed as part of the upgrade
openssl x509 -in ca.pem -text -noout
### Output:
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            51:5c:66:8b:40:24:d7:bb:ea:94:e7:5a:33:fe:44:a2:e2:18:51:b3
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=ShangHai, L=ShangHai, O=k8s, OU=System, CN=kubernetes
        Validity
            Not Before: Dec 14 13:26:00 2019 GMT
            Not After : Dec 11 13:26:00 2029 GMT    # valid for 10 years
        Subject: C=CN, ST=ShangHai, L=ShangHai, O=k8s, OU=System, CN=kubernetes
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:c2:5c:92:dd:36:67:3f:d4:f1:e0:5f:e0:48:40:
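Rather than eyeballing the `Not After` line, the remaining lifetime can be computed directly. A minimal sketch; the throwaway self-signed cert generated here merely stands in for `ca.pem`:

```shell
# Generate a throwaway cert so the helper can be demonstrated offline
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-key.pem \
  -out /tmp/demo-cert.pem -days 365 -subj "/CN=demo" 2>/dev/null

# Print how many whole days remain before the given PEM cert expires
cert_days_left() {
  local end now
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  now=$(date +%s)
  echo $(( ($(date -d "$end" +%s) - now) / 86400 ))
}

cert_days_left /tmp/demo-cert.pem
```

Run it against `ca.pem`, `server.pem`, etc. on the masters; anything under ~30 days should trigger a renewal. (`date -d` is GNU date, which CentOS ships.)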
# Images used
kubelet: 243662875/pause-amd64:3.1
coredns: 243662875/coredns:1.3.1
dashboard: 243662875/kubernetes-dashboard-amd64:v1.10.1
metrics-server: 243662875/metrics-server-amd64:v0.3.6
traefik: traefik:latest
es: elasticsearch:6.6.1
fluentd-es: 243662875/fluentd-elasticsearch:v2.4.0
kibana: 243662875/kibana-oss:6.6.1
3. Check the environment
3.1. Check etcd
etcd reference: https://www.cnblogs.com/winstom/p/11811373.html
systemctl status etcd|grep Active
etcdctl --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem cluster-health
## Output:
member 1af68d968c7e3f22 is healthy: got healthy result from https://192.168.10.12:2379
member 7508c5fadccb39e2 is healthy: got healthy result from https://192.168.10.11:2379
member e8d9a97b17f26476 is healthy: got healthy result from https://192.168.10.13:2379
cluster is healthy
etcdctl --endpoints=https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379 \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem member list
ETCDCTL_API=3 etcdctl \
  -w table --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  --endpoints="https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379" endpoint status
### Output:
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.10.11:2379 | 7508c5fadccb39e2 | 3.3.18 | 762 kB | false | 421 | 287371 |
| https://192.168.10.12:2379 | 1af68d968c7e3f22 | 3.3.18 | 762 kB | true | 421 | 287371 |
| https://192.168.10.13:2379 | e8d9a97b17f26476 | 3.3.18 | 762 kB | false | 421 | 287371 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
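When scripting health checks, the leader can be pulled out of the `endpoint status` table instead of read by eye. A hedged sketch; the heredoc below is the pasted table rows, and on a live cluster you would feed the function the real command's output:

```shell
# Print the ENDPOINT whose IS LEADER column (field 6 of the |-separated
# table) is "true".
find_leader() {
  awk -F'|' '$6 ~ /true/ {gsub(/ /,"",$2); print $2}' "$1"
}

# Sample rows copied from the endpoint status output above
cat > /tmp/endpoint-status.txt <<'EOF'
| https://192.168.10.11:2379 | 7508c5fadccb39e2 | 3.3.18 | 762 kB | false | 421 | 287371 |
| https://192.168.10.12:2379 | 1af68d968c7e3f22 | 3.3.18 | 762 kB | true | 421 | 287371 |
| https://192.168.10.13:2379 | e8d9a97b17f26476 | 3.3.18 | 762 kB | false | 421 | 287371 |
EOF
find_leader /tmp/endpoint-status.txt
```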
# If you hit the error: cannot unmarshal event: proto: wrong wireType = 0 for field Key
# see: https://blog.csdn.net/dengxiafubi/article/details/102627341
# Query the keys through the etcd v3 API
ETCDCTL_API=3 etcdctl --endpoints="https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379" \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem get / --prefix --keys-only
3.2. Check flanneld
systemctl status flanneld|grep Active
ip addr show|grep flannel
ip addr show|grep docker
cat /run/flannel/docker
cat /run/flannel/subnet.env
#### List the directories in the key-value store
etcdctl \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/flanneld.pem \
  --key-file=/etc/kubernetes/ssl/flanneld-key.pem ls -r
## Output:
/kubernetes
/kubernetes/network
/kubernetes/network/config
/kubernetes/network/subnets
/kubernetes/network/subnets/172.30.12.0-24
/kubernetes/network/subnets/172.30.43.0-24
/kubernetes/network/subnets/172.30.9.0-24
#### Check the pod network configuration
etcdctl \
  --endpoints="https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379" \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/flanneld.pem \
  --key-file=/etc/kubernetes/ssl/flanneld-key.pem \
  get /kubernetes/network/config
#### Check the list of allocated pod subnets
etcdctl \
  --endpoints="https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379" \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/flanneld.pem \
  --key-file=/etc/kubernetes/ssl/flanneld-key.pem \
  ls /kubernetes/network/subnets
#### Check the host IP and flannel interface corresponding to a pod subnet
etcdctl \
  --endpoints="https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379" \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/flanneld.pem \
  --key-file=/etc/kubernetes/ssl/flanneld-key.pem \
  get /kubernetes/network/subnets/172.30.74.0-24
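The subnet flanneld assigned to the local node is easiest to read from `/run/flannel/subnet.env`, which is a plain shell-sourceable file. A small sketch; the heredoc values are assumed sample contents consistent with the 172.30.0.0/16 network above, not copied from a real node:

```shell
# Stand-in for the real /run/flannel/subnet.env
cat > /tmp/subnet.env <<'EOF'
FLANNEL_NETWORK=172.30.0.0/16
FLANNEL_SUBNET=172.30.12.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF

# Source the file and print this node's pod subnet
. /tmp/subnet.env
echo "$FLANNEL_SUBNET"
```

On a node, replace `/tmp/subnet.env` with `/run/flannel/subnet.env`; the printed subnet should match one of the keys under `/kubernetes/network/subnets`.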
3.3. Check nginx and keepalived
ps -ef|grep nginx
ps -ef|grep keepalived
netstat -lntup|grep nginx
ip add|grep 192.168 # check the VIP; output:
inet 192.168.10.11/24 brd 192.168.10.255 scope global noprefixroute ens32
inet 192.168.10.100/32 scope global ens32
3.4. Check kube-apiserver
netstat -lntup | grep kube-apiser
# Output:
tcp 0 0 192.168.10.11:6443 0.0.0.0:* LISTEN 115454/kube-apiserv
kubectl cluster-info
# Output:
Kubernetes master is running at https://192.168.10.100:8443
Elasticsearch is running at https://192.168.10.100:8443/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy
Kibana is running at https://192.168.10.100:8443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
CoreDNS is running at https://192.168.10.100:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://192.168.10.100:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Metrics-server is running at https://192.168.10.100:8443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
kubectl get all --all-namespaces
kubectl get cs
# Output:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
#### Print the data kube-apiserver has written to etcd
ETCDCTL_API=3 etcdctl \
  --endpoints="https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379" \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  get /registry/ --prefix --keys-only
#### If you hit the error:
unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/kube-system/bootstrap-token-2z8s62": invalid padding on input
##### Cause: the encryption token differs between the kube-apiserver instances in the cluster; the `secret` value in encryption-config.yaml must be identical on every master
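The `--keys-only` dump can be long; grouping the keys by resource type gives a quick inventory of what the apiserver has stored. A hedged sketch over sample keys (the key names below are illustrative, not real cluster output):

```shell
# Count /registry/<type>/... keys per resource type
count_registry_types() {
  awk -F/ '/^\/registry\// {print $3}' "$1" | sort | uniq -c | sort -rn
}

# Illustrative stand-in for the real keys-only dump
cat > /tmp/registry-keys.txt <<'EOF'
/registry/pods/kube-system/coredns-abc
/registry/pods/kube-system/coredns-def
/registry/services/specs/default/kubernetes
/registry/secrets/kube-system/bootstrap-token-2z8s62
EOF
count_registry_types /tmp/registry-keys.txt
```

On a live cluster, pipe the etcdctl command's output into a file (or straight into the `awk`) instead of the heredoc.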
3.5. Check kube-controller-manager
netstat -lntup|grep kube-control
# Output:
tcp 0 0 127.0.0.1:10252 0.0.0.0:* LISTEN 117775/kube-control
tcp6 0 0 :::10257 :::* LISTEN 117775/kube-control
kubectl get cs
kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml
# Output; you can see that kube12 has become the leader
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube12_753e65bf-1e65-11ea-b9c4-000c293dd01c","leaseDurationSeconds":15,"acquireTime":"2019-12-14T11:32:49Z","renewTime":"2019-12-14T12:43:20Z","leaderTransitions":0}'
creationTimestamp: "2019-12-14T11:32:49Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "8282"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: 753d2be7-1e65-11ea-b980-000c29e3f448
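The leader's identity sits inside the `control-plane.alpha.kubernetes.io/leader` annotation as a JSON string; it can be extracted without jq. A small sketch, using the annotation value from the output above as sample input:

```shell
# Print the holderIdentity field from a leader-election annotation
leader_holder() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p' "$1"
}

# Annotation value copied from the endpoint output above
cat > /tmp/leader.json <<'EOF'
{"holderIdentity":"kube12_753e65bf-1e65-11ea-b9c4-000c293dd01c","leaseDurationSeconds":15,"acquireTime":"2019-12-14T11:32:49Z","renewTime":"2019-12-14T12:43:20Z","leaderTransitions":0}
EOF
leader_holder /tmp/leader.json
```

The same trick works for the kube-scheduler endpoint in the next section; the prefix before the `_` is the hostname of the current leader.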
3.6. Check kube-scheduler
netstat -lntup|grep kube-sche
# Output:
tcp 0 0 127.0.0.1:10251 0.0.0.0:* LISTEN 119678/kube-schedul
tcp6 0 0 :::10259 :::* LISTEN 119678/kube-schedul
kubectl get cs
kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml
# Output; you can see that kube12 has become the leader
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube12_89050e00-1e65-11ea-8f5e-000c293dd01c","leaseDurationSeconds":15,"acquireTime":"2019-12-14T11:33:23Z","renewTime":"2019-12-14T12:45:22Z","leaderTransitions":0}'
creationTimestamp: "2019-12-14T11:33:23Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "8486"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: 899d1625-1e65-11ea-b980-000c29e3f448
3.7. Check kubelet
netstat -lntup|grep kubelet
# Output:
tcp 0 0 127.0.0.1:35173 0.0.0.0:* LISTEN 123215/kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 123215/kubelet
tcp 0 0 192.168.10.11:10250 0.0.0.0:* LISTEN 123215/kubelet
kubeadm token list --kubeconfig ~/.kube/config
# View the bootstrap tokens that were created
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
hf0fa4.ta6haf1wsz1fnobf 22h 2019-12-15T19:33:26+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube11
oftjgn.01tob30h8v9l05lm 22h 2019-12-15T19:33:26+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube12
zuezc4.7kxhmayoue16pycb 22h 2019-12-15T19:33:26+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube13
kubectl get csr
# Already approved:
NAME AGE REQUESTOR CONDITION
node-csr-Oarn7xdWDiq7-CLn7yrE3fkTtmJtoSenmlGj3XL85lM 72m system:bootstrap:zuezc4 Approved,Issued
node-csr-hJrfQXlhIqJTROLD1ExmcXq74J78uu6rjHuh5ZyVlMg 72m system:bootstrap:zuezc4 Approved,Issued
node-csr-s-BAbqc8hOKfDj8xqdJ6fWjwdustqG9LhwbpYxa9x68 72m system:bootstrap:zuezc4 Approved,Issued
kubectl get nodes
# Output:
NAME STATUS ROLES AGE VERSION
192.168.10.11 Ready <none> 73m v1.14.8
192.168.10.12 Ready <none> 73m v1.14.8
192.168.10.13 Ready <none> 73m v1.14.8
systemctl status kubelet
#### 1. Error:
Failed to connect to apiserver: the server has asked for the client to provide credentials
#### Check whether the apiserver itself has a problem; if not, regenerate the kubelet-bootstrap.kubeconfig file and restart kubelet
#### 2. kubelet will not start and logs no error
# Check that "address": "192.168.10.12" in kubelet.config.json is this machine's IP
#### 3. Error:
failed to ensure node lease exists, will retry in 7s, error: leases.coordination.k8s.io "192.168.10.12" is forbidden: User "system:node:192.168.10.11" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-node-lease": can only access node lease with the same name as the requesting node
Unable to register node "192.168.10.12" with API server: nodes "192.168.10.12" is forbidden: node "192.168.10.11" is not allowed to modify node "192.168.10.12"
# Check that "address": "192.168.10.12" in kubelet.config.json is this machine's IP
3.8. Check kube-proxy
netstat -lnpt|grep kube-proxy
# Output:
tcp 0 0 192.168.10.11:10249 0.0.0.0:* LISTEN 125459/kube-proxy
tcp 0 0 192.168.10.11:10256 0.0.0.0:* LISTEN 125459/kube-proxy
tcp6 0 0 :::32698 :::* LISTEN 125459/kube-proxy
tcp6 0 0 :::32699 :::* LISTEN 125459/kube-proxy
tcp6 0 0 :::32700 :::* LISTEN 125459/kube-proxy
ipvsadm -ln
4. Check the add-ons
4.1. Check CoreDNS
kubectl get pods -n kube-system # check that all pods are up
# Verify from inside a container
kubectl run dig --rm -it --image=docker.io/azukiapp/dig /bin/sh
# ping Baidu
ping www.baidu.com
PING www.baidu.com (180.101.49.11): 56 data bytes
64 bytes from 180.101.49.11: seq=0 ttl=127 time=10.772 ms
64 bytes from 180.101.49.11: seq=1 ttl=127 time=9.347 ms
64 bytes from 180.101.49.11: seq=2 ttl=127 time=10.937 ms
64 bytes from 180.101.49.11: seq=3 ttl=127 time=11.149 ms
64 bytes from 180.101.49.11: seq=4 ttl=127 time=10.677 ms
cat /etc/resolv.conf # inspect
nameserver 10.254.0.2
search default.svc.cluster.local. svc.cluster.local. cluster.local.
options ndots:5
nslookup www.baidu.com
# Output:
Server: 10.254.0.2
Address: 10.254.0.2#53
Non-authoritative answer:
www.baidu.com canonical name = www.a.shifen.com.
Name: www.a.shifen.com
Address: 180.101.49.12
Name: www.a.shifen.com
Address: 180.101.49.11
nslookup kubernetes.default # run
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kubernetes.default.svc.cluster.local
Address: 10.254.0.1
nslookup kubernetes # run
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kubernetes.default.svc.cluster.local
Address: 10.254.0.1
4.2. Check dashboard
### Visiting https://192.168.10.13:10250/metrics in Chrome returns Unauthorized because a client certificate is required; generate and import one as follows
# 1. On a Windows machine, install a JDK and use the keytool utility from its bin directory. Copy ca.pem down (here to the E: drive), then import the certificate:
keytool -import -v -trustcacerts -alias appmanagement -file "E:\ca.pem" -storepass password -keystore cacerts # import the certificate
keytool -delete -v -trustcacerts -alias appmanagement -file "E:\ca.pem" -storepass password -keystore cacerts # delete the certificate
# 2. Then, on the Linux side, run:
openssl pkcs12 -export -out admin.pfx -inkey admin-key.pem -in admin.pem -certfile ca.pem
# 3. Import the resulting admin.pfx certificate into the browser, and access will work normally.
# Then open the dashboard:
https://192.168.10.13:32700
#### or
https://192.168.10.100:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
#### A kubeconfig is required: one is generated automatically at /etc/kubernetes/dashboard.kubeconfig
# The login token is saved in {{k8s_home}}/dashboard_login_token.txt; it can also be fetched with:
kubectl -n kube-system describe secret `kubectl -n kube-system get secret|grep dashboard | awk '{print $1}'`
4.3. Check traefik
# One traefik instance is deployed on every node
kubectl get pod,deploy,daemonset,service,ingress -n kube-system | grep traefik
### Output:
pod/traefik-ingress-controller-gl7vs 1/1 Running 0 43m
pod/traefik-ingress-controller-qp26j 1/1 Running 0 43m
pod/traefik-ingress-controller-x99ls 1/1 Running 0 43m
daemonset.extensions/traefik-ingress-controller 3 3 3 3 3 <none> 43m
service/traefik-ingress-service ClusterIP 10.254.148.220 <none> 80/TCP,8080/TCP 43m
service/traefik-web-ui ClusterIP 10.254.139.95 <none> 80/TCP 43m
ingress.extensions/traefik-web-ui traefik-ui 80 43m
# Accessing it returns:
curl -H 'host:traefik-ui' 192.168.10.11
<a href="/dashboard/">Found</a>.
curl -H 'host:traefik-ui' 192.168.10.12
<a href="/dashboard/">Found</a>.
curl -H 'host:traefik-ui' 192.168.10.13
<a href="/dashboard/">Found</a>.
# Check the ports
netstat -lntup|grep traefik
tcp6 0 0 :::8080 :::* LISTEN 66426/traefik
tcp6 0 0 :::80 :::* LISTEN 66426/traefik
# Then open http://192.168.10.11:8080/
4.4. Check metrics
kubectl top node
### Error: Error from server (Forbidden): forbidden: User "system:anonymous" cannot get path "/apis/metrics.k8s.io/v1beta1"
Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope
### Fix (note: binding cluster-admin to system:anonymous is convenient in a lab, but far too permissive for production)
kubectl create clusterrolebinding the-boss --user system:anonymous --clusterrole cluster-admin
### You may also hit: Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
4.5. Check EFK
es: http://192.168.10.11:32698/
Kibana: http://192.168.10.11:32699
5. Verify the cluster
# To deploy glusterfs, see: https://www.cnblogs.com/fan-gx/p/12101686.html
kubectl create ns myapp
kubectl apply -f nginx.yaml
kubectl get pod,svc,ing -n myapp -o wide
### Output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/my-nginx-69f8f65796-zd777 1/1 Running 0 19m 172.30.36.15 192.168.10.11 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/my-nginx ClusterIP 10.254.131.1 <none> 80/TCP 21m app=my-nginx
NAME HOSTS ADDRESS PORTS AGE
ingress.extensions/my-nginx myapp.nginx.com 80 21m
# Verify that access works
curl http://172.30.36.15
curl http://10.254.131.1
curl -H "host:myapp.nginx.com" 192.168.10.11
### Open http://192.168.10.100:8088/ in Chrome
### The deployment already proxies the traefik address through nginx; see /data/nginx/conf/nginx.conf
kubectl exec -it my-nginx-69f8f65796-zd777 -n myapp bash
echo "hello world" >/usr/share/nginx/html/index.html # then http://192.168.10.100:8088/ shows "hello world"
6. Restart all components
systemctl restart etcd && systemctl status etcd
systemctl restart flanneld && systemctl status flanneld
systemctl restart docker && systemctl status docker
systemctl stop nginx && systemctl start nginx && systemctl status nginx
systemctl restart keepalived && systemctl status keepalived
systemctl restart kube-apiserver && systemctl status kube-apiserver
systemctl restart kube-controller-manager && systemctl status kube-controller-manager
systemctl restart kube-scheduler && systemctl status kube-scheduler
systemctl restart kubelet && systemctl status kubelet
systemctl restart kube-proxy && systemctl status kube-proxy
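The restart sequence above can be wrapped in a loop that preserves the dependency order and stops at the first failure. A hedged sketch: `SYSCTL` is indirected so the sequence can be dry-run (it defaults to printing the commands); on a real node set `SYSCTL=systemctl` first:

```shell
# Dry-run by default: "echo systemctl" just prints each command
SYSCTL=${SYSCTL:-echo systemctl}

restart_all() {
  local svc
  for svc in etcd flanneld docker nginx keepalived \
             kube-apiserver kube-controller-manager kube-scheduler \
             kubelet kube-proxy; do
    $SYSCTL restart "$svc" || { echo "FAILED: $svc"; return 1; }
  done
  echo "all components restarted"
}

restart_all
```

Keeping etcd, flanneld, and docker at the front matters: the control-plane and node components depend on them being up.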
Author: Fantasy
Source: http://dwz.date/bWku