1. Installation Overview
1.1 Purpose
MySQL officially provides InnoDB Cluster, which consists of MySQL MGR and MySQL Router. MySQL MGR delivers autonomous high availability at the database layer, while MySQL Router acts as the access proxy. Once deployed, however, MySQL Router itself is a single point of failure: if it goes down, availability of the whole database cluster is affected. To improve overall availability, a high-availability solution for MySQL Router is therefore required.
1.2 MySQL Router HA components
The HA solution in this article is built on two open-source projects, Corosync and Pacemaker, which together provide the communication, synchronization, resource management, and failover services of a high-availability cluster.
1.2.1 Corosync
Corosync is an open-source cluster communication and synchronization service. It handles communication and data synchronization between cluster nodes and provides reliable messaging and membership management, keeping the cluster stable in a distributed environment. Corosync communicates over a reliable UDP multicast protocol and offers a pluggable protocol-stack interface that supports a variety of protocols and network environments. It also exposes an API that lets other applications use its communication and synchronization services.
1.2.2 Pacemaker
Pacemaker is an open-source cluster resource manager and failover tool. It automatically manages resources across cluster nodes (virtual IPs, file systems, databases, and so on) and migrates them when a node or resource fails, keeping the whole system available and continuous. Pacemaker supports multiple resource-management policies that can be configured to match different requirements, and it provides a flexible plugin framework that covers different cluster environments and scenarios, such as virtualization and cloud computing.
Combined, Corosync and Pacemaker form a complete high-availability clustering solution: Corosync handles communication and synchronization between the nodes, while Pacemaker handles resource management and failover. Together they provide reliable communication, synchronization, resource management, and failover services, a solid foundation for building reliable, efficient distributed systems.
1.2.3 ldirectord
ldirectord is a load-balancing tool for Linux. It manages services across multiple servers and distributes client requests to one or more of them, improving service availability and performance. ldirectord is usually used together with cluster software such as Heartbeat or Keepalived. Its main capabilities include:
- Load balancing: requests can be distributed using different algorithms, e.g. round robin, weighted round robin, least connections, or source-address hashing, spreading client requests across one or more back-end servers.
- Health checks: back-end servers are checked periodically, and unavailable ones are removed from the server pool, keeping the service stable and highly available.
- Session persistence: requests can be routed to the same back-end server based on client IP, cookie, or similar identifiers, so that the connection between a client and its back-end server is not interrupted.
- Dynamic configuration: back-end servers and services can be added, removed, or modified on the fly, via the command line or the configuration file.
ldirectord was written specifically to monitor LVS, i.e. the state of the servers in an LVS server pool. It runs on the IPVS node as a daemon and probes every real server in the pool. If a server does not respond to its probes, ldirectord considers it unavailable and runs ipvsadm to remove it from the IPVS table; once a later probe succeeds again, the server is added back via ipvsadm, roughly as sketched below.
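For illustration only, this is approximately what ldirectord does through ipvsadm when a real server fails and later recovers (the addresses are taken from the planning table below; exact flags may vary slightly by ipvsadm version):
## remove a failed real server from the virtual service
# ipvsadm -d -t 172.17.129.1:6446 -r 172.17.140.25:6446
## add it back once a later health check succeeds (-g = direct routing, i.e. "gate"; -w = weight)
# ipvsadm -a -t 172.17.129.1:6446 -r 172.17.140.25:6446 -g -w 1
## inspect the resulting IPVS table
# ipvsadm -Ln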
1.3 Installation planning
MySQL and MySQL Router are both version 8.0.32.
| IP | Hostname | Components | Ports |
| --- | --- | --- | --- |
| 172.17.140.25 | gdb1 | MySQL, MySQL Router, ipvsadm, ldirectord, pcs, pacemaker, corosync | MySQL: 3309; MySQL Router: 6446, 6447; pcs TCP: 13314; pcs UDP: 13315 |
| 172.17.140.24 | gdb2 | MySQL, MySQL Router, ipvsadm, ldirectord, pcs, pacemaker, corosync | MySQL: 3309; MySQL Router: 6446, 6447; pcs TCP: 13314; pcs UDP: 13315 |
| 172.17.139.164 | gdb3 | MySQL, MySQL Router, ipvsadm, ldirectord, pcs, pacemaker, corosync | MySQL: 3309; MySQL Router: 6446, 6447; pcs TCP: 13314; pcs UDP: 13315 |
| 172.17.129.1 | | VIP | 6446, 6447 |
| 172.17.139.62 | | MySQL client | |
The installation, in broad strokes, proceeds through the steps below.
2. HA Setup
2.1 Base environment setup (on all three servers)
- Set the hostname on each server according to the plan
hostnamectl set-hostname gdb1
hostnamectl set-hostname gdb2
hostnamectl set-hostname gdb3
- Append the following entries to /etc/hosts on all three servers
172.17.140.25 gdb1
172.17.140.24 gdb2
172.17.139.164 gdb3
- Disable the firewall on all three servers (an alternative that keeps firewalld enabled is sketched after these commands)
systemctl stop firewalld
systemctl disable firewalld
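If firewalld has to stay enabled in your environment, one option (an assumption, not part of the original procedure; ports taken from the planning table) is to open just the required ports instead:
# firewall-cmd --permanent --add-port=3309/tcp --add-port=6446-6449/tcp --add-port=8443/tcp
# firewall-cmd --permanent --add-port=13314/tcp --add-port=13315/udp
# firewall-cmd --reload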
- Disable SELinux on all three servers. If SELinux is currently enabled, editing its configuration file alone is not enough; the server must be rebooted for the change to take effect. getenforce reporting Disabled confirms that it is off. A typical command sequence is sketched below.
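A typical sequence (a common approach, not taken verbatim from this deployment): make the change persistent in the config file and switch to permissive mode for the current boot:
# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# setenforce 0      ## permissive until the next reboot
# getenforce        ## prints Disabled after a reboot with SELINUX=disabled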
- Run the following commands on all three servers to establish mutual trust (passwordless SSH)
Mutual trust merely makes it easier to copy files between the servers; it is not a prerequisite for building the cluster.
ssh-keygen -t dsa
ssh-copy-id gdb1
ssh-copy-id gdb2
ssh-copy-id gdb3
The run looks like this:
[#19#root@gdb1 ~ 16:16:54]19 ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa): ## press Enter
/root/.ssh/id_dsa already exists.
Overwrite (y/n)? y ## enter y to overwrite an existing SSH key, if there is one
Enter passphrase (empty for no passphrase): ## press Enter
Enter same passphrase again: ## press Enter
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:qwJXgfN13+N1U5qvn9fC8pyhA29iuXvQVhCupExzgTc root@gdb1
The key's randomart image is:
+---[DSA 1024]----+
| . .. .. |
| o . o Eo. .|
| o ooooo.o o.|
| oo = .. *.o|
| . S .. o +o|
| . . .o o . .|
| o . * ....|
| . . + *o+o+|
| .. .o*.+++o|
+----[SHA256]-----+
[#20#root@gdb1 ~ 16:17:08]20 ssh-copy-id gdb1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_dsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@gdb1's password: ## enter the root password of gdb1
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'gdb1'"
and check to make sure that only the key(s) you wanted were added.
[#21#root@gdb1 ~ 16:17:22]21 ssh-copy-id gdb2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_dsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@gdb2's password: ## enter the root password of gdb2
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'gdb2'"
and check to make sure that only the key(s) you wanted were added.
[#22#root@gdb1 ~ 16:17:41]22 ssh-copy-id gdb3
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_dsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@gdb3's password: ## enter the root password of gdb3
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'gdb3'"
and check to make sure that only the key(s) you wanted were added.
[#23#root@gdb1 ~ 16:17:44]23
If you can ssh between any of the servers without being prompted for a password, mutual trust is in place.
[#24#root@gdb1 ~ 16:21:16]24 ssh gdb1
Last login: Tue Feb 21 16:21:05 2023 from 172.17.140.25
[#1#root@gdb1 ~ 16:21:19]1 logout
Connection to gdb1 closed.
[#25#root@gdb1 ~ 16:21:19]25 ssh gdb2
Last login: Tue Feb 21 16:21:09 2023 from 172.17.140.25
[#1#root@gdb2 ~ 16:21:21]1 logout
Connection to gdb2 closed.
[#26#root@gdb1 ~ 16:21:21]26 ssh gdb3
Last login: Tue Feb 21 10:53:47 2023
[#1#root@gdb3 ~ 16:21:22]1 logout
Connection to gdb3 closed.
[#27#root@gdb1 ~ 16:21:24]27
- Synchronize the clocks. Clock synchronization is critical for both distributed and centralized clusters; inconsistent time triggers all kinds of anomalies. A cron entry for periodic resync is sketched after these commands.
yum -y install ntpdate // install the ntpdate client
ntpdate ntp1.aliyun.com // with internet access, use the Aliyun NTP server, or point at an internal NTP server
hwclock -w // write the system time back to the BIOS clock
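To keep the clocks aligned over time, one option (an addition for illustration, not part of the original steps) is a periodic resync via cron:
# crontab -e
## resync every 30 minutes and update the hardware clock
*/30 * * * * /usr/sbin/ntpdate ntp1.aliyun.com && /usr/sbin/hwclock -w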
2.2 Build a read/write-splitting MGR cluster with MySQL Router
For details, see https://gitee.com/GreatSQL/GreatSQL-Doc/blob/master/deep-dive-mgr/deep-dive-mgr-07.md
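For reference, the configuration shown in 2.3 is the kind of file produced by bootstrapping Router against the cluster. A typical bootstrap invocation might look like the following (the account, port, and directory are assumptions matching the paths in the file below, not a command captured from this deployment):
# mysqlrouter --bootstrap root@gdb1:3309 --directory /opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal --user root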
2.3 Deploy and start MySQL Router on each of the three servers. The Router configuration file is as follows; a sketch of the start command follows the file.
# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=root
keyring_path=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/var/lib/mysqlrouter/keyring
master_key_path=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/mysqlrouter.key
connect_timeout=5
read_timeout=30
dynamic_state=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/bin/../var/lib/mysqlrouter/state.json
client_ssl_cert=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/var/lib/mysqlrouter/router-cert.pem
client_ssl_key=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/var/lib/mysqlrouter/router-key.pem
client_ssl_mode=DISABLED
server_ssl_mode=AS_CLIENT
server_ssl_verify=DISABLED
unknown_config_option=error
[logger]
level=INFO
[metadata_cache:bootstrap]
cluster_type=gr
router_id=1
user=mysql_router1_g9c62rk29lcn
metadata_cluster=gdbCluster
ttl=0.5
auth_cache_ttl=-1
auth_cache_refresh_interval=2
use_gr_notifications=0
[routing:bootstrap_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://gdbCluster/?role=PRIMARY
routing_strategy=first-available
protocol=classic
[routing:bootstrap_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://gdbCluster/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=classic
[routing:bootstrap_x_rw]
bind_address=0.0.0.0
bind_port=6448
destinations=metadata-cache://gdbCluster/?role=PRIMARY
routing_strategy=first-available
protocol=x
[routing:bootstrap_x_ro]
bind_address=0.0.0.0
bind_port=6449
destinations=metadata-cache://gdbCluster/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=x
[http_server]
port=8443
ssl=1
ssl_cert=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/var/lib/mysqlrouter/router-cert.pem
ssl_key=/opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/var/lib/mysqlrouter/router-key.pem
[http_auth_realm:default_auth_realm]
backend=default_auth_backend
method=basic
name=default_realm
[rest_router]
require_realm=default_auth_realm
[rest_api]
[http_auth_backend:default_auth_backend]
backend=metadata_cache
[rest_routing]
require_realm=default_auth_realm
[rest_metadata_cache]
require_realm=default_auth_realm
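With the configuration in place, Router can be started on each server. Assuming the directory-style bootstrap layout above (an assumption; adjust paths to your deployment), either the generated start.sh or a direct invocation works:
# /opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/start.sh
## or, equivalently
# mysqlrouter -c /opt/software/mysql-router-8.0.32-linux-glibc2.17-x86_64-minimal/mysqlrouter.conf &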
2.4 Verify connections through all three MySQL Routers (a compact loop version of these checks follows the output)
[#12#root@gdb2 ~ 14:12:45]12 mysql -uroot -pAbc1234567* -h172.17.140.25 -P6446 -N -e 'select now()' 2> /dev/null
+---------------------+
| 2023-03-17 14:12:46 |
+---------------------+
[#13#root@gdb2 ~ 14:12:46]13 mysql -uroot -pAbc1234567* -h172.17.140.25 -P6447 -N -e 'select now()' 2> /dev/null
+---------------------+
| 2023-03-17 14:12:49 |
+---------------------+
[#14#root@gdb2 ~ 14:12:49]14 mysql -uroot -pAbc1234567* -h172.17.140.24 -P6446 -N -e 'select now()' 2> /dev/null
+---------------------+
| 2023-03-17 14:12:52 |
+---------------------+
[#15#root@gdb2 ~ 14:12:52]15 mysql -uroot -pAbc1234567* -h172.17.140.24 -P6447 -N -e 'select now()' 2> /dev/null
+---------------------+
| 2023-03-17 14:12:55 |
+---------------------+
[#16#root@gdb2 ~ 14:12:55]16 mysql -uroot -pAbc1234567* -h172.17.139.164 -P6446 -N -e 'select now()' 2> /dev/null
+---------------------+
| 2023-03-17 14:12:58 |
+---------------------+
[#17#root@gdb2 ~ 14:12:58]17 mysql -uroot -pAbc1234567* -h172.17.139.164 -P6447 -N -e 'select now()' 2> /dev/null
+---------------------+
| 2023-03-17 14:13:01 |
+---------------------+
[#18#root@gdb2 ~ 14:13:01]18
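The six checks above can also be run as a single loop from any client; selecting @@hostname additionally shows which MGR member served each request (credentials as above, a convenience sketch rather than part of the original procedure):
for h in 172.17.140.25 172.17.140.24 172.17.139.164; do
  for p in 6446 6447; do
    mysql -uroot -pAbc1234567* -h$h -P$p -N -e 'select @@hostname' 2>/dev/null
  done
done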
2.5 Install Pacemaker
- Install pacemaker
Installing pacemaker pulls in corosync as a dependency, so installing the single pacemaker package is sufficient.
[#1#root@gdb1 ~ 10:05:55]1 yum -y install pacemaker
- Install the pcs management tool
[#1#root@gdb1 ~ 10:05:55]1 yum -y install pcs
- Set the password of the cluster authentication OS user hacluster (created during installation) to abc123
[#13#root@gdb1 ~ 10:54:13]13 echo abc123 | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
- Start pcsd and enable it at boot
[#16#root@gdb1 ~ 10:55:30]16 systemctl enable pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[#17#root@gdb1 ~ 10:56:03]17 systemctl start pcsd
[#18#root@gdb1 ~ 10:56:08]18 systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2023-02-22 10:56:08 CST; 6s ago
Docs: man:pcsd(8)
man:pcs(8)
Main PID: 27677 (pcsd)
Tasks: 4
Memory: 29.9M
CGroup: /system.slice/pcsd.service
└─27677 /usr/bin/ruby /usr/lib/pcsd/pcsd
Feb 22 10:56:07 gdb1 systemd[1]: Starting PCS GUI and remote configuration interface...
Feb 22 10:56:08 gdb1 systemd[1]: Started PCS GUI and remote configuration interface.
[#19#root@gdb1 ~ 10:56:14]19
- Change the pcsd TCP port to 13314
sed -i '/#PCSD_PORT=2224/a PCSD_PORT=13314' /etc/sysconfig/pcsd
Restart pcsd so that the new port takes effect
[#23#root@gdb1 ~ 11:23:20]23 systemctl restart pcsd
[#24#root@gdb1 ~ 11:23:39]24 systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2023-02-22 11:23:39 CST; 5s ago
Docs: man:pcsd(8)
man:pcs(8)
Main PID: 30041 (pcsd)
Tasks: 4
Memory: 27.3M
CGroup: /system.slice/pcsd.service
└─30041 /usr/bin/ruby /usr/lib/pcsd/pcsd
Feb 22 11:23:38 gdb1 systemd[1]: Starting PCS GUI and remote configuration interface...
Feb 22 11:23:39 gdb1 systemd[1]: Started PCS GUI and remote configuration interface.
[#25#root@gdb1 ~ 11:23:45]25
- Configure cluster authentication, using the OS user hacluster
[#27#root@gdb1 ~ 11:31:43]27 cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
[#28#root@gdb1 ~ 11:32:15]28 pcs cluster auth gdb1:13314 gdb2:13314 gdb3:13314 -u hacluster -p 'abc123'
gdb1: Authorized
gdb2: Authorized
gdb3: Authorized
[#29#root@gdb1 ~ 11:33:18]29
- Create the cluster; run on any one node
## cluster name gdb_ha, UDP transport, mcast port 13315, netmask 24, members gdb1, gdb2, gdb3
[#31#root@gdb1 ~ 11:41:48]31 pcs cluster setup --force --name gdb_ha --transport=udp --addr0 24 --mcastport0 13315 gdb1 gdb2 gdb3
Destroying cluster on nodes: gdb1, gdb2, gdb3...
gdb1: Stopping Cluster (pacemaker)...
gdb2: Stopping Cluster (pacemaker)...
gdb3: Stopping Cluster (pacemaker)...
gdb2: Successfully destroyed cluster
gdb1: Successfully destroyed cluster
gdb3: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'gdb1', 'gdb2', 'gdb3'
gdb2: successful distribution of the file 'pacemaker_remote authkey'
gdb3: successful distribution of the file 'pacemaker_remote authkey'
gdb1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
gdb1: Succeeded
gdb2: Succeeded
gdb3: Succeeded
Synchronizing pcsd certificates on nodes gdb1, gdb2, gdb3...
gdb1: Success
gdb2: Success
gdb3: Success
Restarting pcsd on the nodes in order to reload the certificates...
gdb1: Success
gdb2: Success
gdb3: Success
- Review the complete cluster configuration; any node will do
[#21#root@gdb2 ~ 11:33:18]21 more /etc/corosync/corosync.conf
totem {
version: 2
cluster_name: gdb_ha
secauth: off
transport: udp
rrp_mode: passive
interface {
ringnumber: 0
bindnetaddr: 24
mcastaddr: 239.255.1.1
mcastport: 13315
}
}
nodelist {
node {
ring0_addr: gdb1
nodeid: 1
}
node {
ring0_addr: gdb2
nodeid: 2
}
node {
ring0_addr: gdb3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
}
[#22#root@gdb2 ~ 14:23:50]22
- Start the pacemaker-related services on all cluster nodes; run on any one node
[#35#root@gdb1 ~ 15:30:51]35 pcs cluster start --all
gdb1: Starting Cluster (corosync)...
gdb2: Starting Cluster (corosync)...
gdb3: Starting Cluster (corosync)...
gdb3: Starting Cluster (pacemaker)...
gdb1: Starting Cluster (pacemaker)...
gdb2: Starting Cluster (pacemaker)...
To stop the services, use pcs cluster stop --all, or pcs cluster stop <server> to stop a single node.
- Enable the pacemaker-related services at boot on every node
[#35#root@gdb1 ~ 15:30:51]35 systemctl enable pcsd corosync pacemaker
[#36#root@gdb1 ~ 15:30:53]36 pcs cluster enable --all
- With no STONITH devices present, disable the STONITH component
Note that once STONITH is disabled, the distributed lock manager (DLM) and every service that depends on it, e.g. cLVM2, GFS2, OCFS2, can no longer start; leaving STONITH enabled without devices produces the error messages shown below.
pcs property set stonith-enabled=false
The full sequence looks like this
[#32#root@gdb1 ~ 15:48:20]32 systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2023-02-22 15:35:48 CST; 1min 54s ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
Main PID: 25661 (pacemakerd)
Tasks: 7
Memory: 51.1M
CGroup: /system.slice/pacemaker.service
├─25661 /usr/sbin/pacemakerd -f
├─25662 /usr/libexec/pacemaker/cib
├─25663 /usr/libexec/pacemaker/stonithd
├─25664 /usr/libexec/pacemaker/lrmd
├─25665 /usr/libexec/pacemaker/attrd
├─25666 /usr/libexec/pacemaker/pengine
└─25667 /usr/libexec/pacemaker/crmd
Feb 22 15:35:52 gdb1 crmd[25667]: notice: Fencer successfully connected
Feb 22 15:36:11 gdb1 crmd[25667]: notice: State transition S_ELECTION -> S_INTEGRATION
Feb 22 15:36:12 gdb1 pengine[25666]: error: Resource start-up disabled since no STONITH resources have been defined
Feb 22 15:36:12 gdb1 pengine[25666]: error: Either configure some or disable STONITH with the stonith-enabled option
Feb 22 15:36:12 gdb1 pengine[25666]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Feb 22 15:36:12 gdb1 pengine[25666]: notice: Delaying fencing operations until there are resources to manage
Feb 22 15:36:12 gdb1 pengine[25666]: notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-0.bz2
Feb 22 15:36:12 gdb1 pengine[25666]: notice: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Feb 22 15:36:12 gdb1 crmd[25667]: notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-0.bz2): Complete
Feb 22 15:36:12 gdb1 crmd[25667]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[#33#root@gdb1 ~ 15:37:43]33 pcs property set stonith-enabled=false
[#34#root@gdb1 ~ 15:48:20]34 systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2023-02-22 15:35:48 CST; 12min ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
Main PID: 25661 (pacemakerd)
Tasks: 7
Memory: 51.7M
CGroup: /system.slice/pacemaker.service
├─25661 /usr/sbin/pacemakerd -f
├─25662 /usr/libexec/pacemaker/cib
├─25663 /usr/libexec/pacemaker/stonithd
├─25664 /usr/libexec/pacemaker/lrmd
├─25665 /usr/libexec/pacemaker/attrd
├─25666 /usr/libexec/pacemaker/pengine
└─25667 /usr/libexec/pacemaker/crmd
Feb 22 15:36:12 gdb1 pengine[25666]: notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-0.bz2
Feb 22 15:36:12 gdb1 pengine[25666]: notice: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Feb 22 15:36:12 gdb1 crmd[25667]: notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-0.bz2): Complete
Feb 22 15:36:12 gdb1 crmd[25667]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Feb 22 15:48:20 gdb1 crmd[25667]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Feb 22 15:48:21 gdb1 pengine[25666]: warning: Blind faith: not fencing unseen nodes
Feb 22 15:48:21 gdb1 pengine[25666]: notice: Delaying fencing operations until there are resources to manage
Feb 22 15:48:21 gdb1 pengine[25666]: notice: Calculated transition 1, saving inputs in /var/lib/pacemaker/pengine/pe-input-1.bz2
Feb 22 15:48:21 gdb1 crmd[25667]: notice: Transition 1 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
Feb 22 15:48:21 gdb1 crmd[25667]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[#35#root@gdb1 ~ 15:48:31]35
- Verify that the pcs cluster state is healthy, with no errors reported
[#35#root@gdb1 ~ 15:48:31]35 crm_verify -L
[#36#root@gdb1 ~ 17:33:31]36
2.6 Install ldirectord (on all three servers)
- Download ldirectord
Download page: https://rpm.pbone.net/info_idpl_23860919_distro_centos_6_com_ldirectord-3.9.5-%203.1.x86_64.rpm.html
Open the link in a new tab to obtain the direct RPM URL, then fetch it with a download tool such as Thunder.
- Download the dependency ipvsadm
[#10#root@gdb1 ~ 19:51:20]10 wget http://mirror.centos.org/altarch/7/os/aarch64/Packages/ipvsadm-1.27-8.el7.aarch64.rpm
- Install both packages; if further dependencies are reported during installation, resolve them as needed
[#11#root@gdb1 ~ 19:51:29]11 yum -y install ldirectord-3.9.5-3.1.x86_64.rpm ipvsadm-1.27-8.el7.aarch64.rpm
- Create the configuration file /etc/ha.d/ldirectord.cf with the following content
checktimeout=3
checkinterval=1
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=172.17.129.1:6446
    real=172.17.140.25:6446 gate
    real=172.17.140.24:6446 gate
    real=172.17.139.164:6446 gate
    scheduler=rr
    service=mysql
    protocol=tcp
    checkport=6446
    checktype=connect
    login="root"
    passwd="Abc1234567*"
    database="information_schema"
    request="SELECT 1"
virtual=172.17.129.1:6447
    real=172.17.140.25:6447 gate
    real=172.17.140.24:6447 gate
    real=172.17.139.164:6447 gate
    scheduler=rr
    service=mysql
    protocol=tcp
    checkport=6447
    checktype=connect
    login="root"
    passwd="Abc1234567*"
    database="information_schema"
    request="SELECT 1"
Parameter notes (see the note on checktype after this list)
- checktimeout=3: how long to wait for a health-check response from a back-end server
- checkinterval=1: interval between two consecutive checks
- autoreload=yes: automatically reload the configuration file when it changes
- logfile="/var/log/ldirectord.log": full path of the log file
- quiescent=no: on failure, remove the real server from the IPVS table (breaking its existing connections) rather than just setting its weight to 0
- virtual=172.17.129.1:6446: the VIP
- real=172.17.140.25:6446 gate: a real server; gate means direct routing (DR)
- scheduler=rr: the scheduling algorithm; rr is round robin, wrr is weighted round robin
- service=mysql: the service type ldirectord uses when health-checking the real servers
- protocol=tcp: the service protocol
- checktype=connect: the method the ldirectord daemon uses to monitor the real servers
- checkport=6446 / 6447: the port used for health checks
- login="root": the user name used for health checks
- passwd="Abc1234567*": the password used for health checks
- database="information_schema": the default database used for health checks
- request="SELECT 1": the query executed by health checks
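Worth noting: with checktype=connect, ldirectord only performs a TCP connect to checkport, so the login/passwd/database/request settings are not actually exercised. If the health check should really log in and run the query, the check type would be switched to negotiate inside each virtual block, roughly like this (a sketch, not part of the deployment above):
virtual=172.17.129.1:6446
    service=mysql
    checktype=negotiate
    login="root"
    passwd="Abc1234567*"
    database="information_schema"
    request="SELECT 1"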
Distribute the finished configuration file to the other two servers
[#22#root@gdb1 ~ 20:51:57]22 cd /etc/ha.d/
[#23#root@gdb1 /etc/ha.d 20:52:17]23 scp ldirectord.cf gdb2:`pwd`
ldirectord.cf 100% 1300 1.1MB/s 00:00
[#24#root@gdb1 /etc/ha.d 20:52:26]24 scp ldirectord.cf gdb3:`pwd`
ldirectord.cf 100% 1300 1.4MB/s 00:00
[#25#root@gdb1 /etc/ha.d 20:52:29]25
2.7 Configure the VIP on the loopback interface (on all three servers)
This step is required for the LVS load balancing driven by the pcs cluster: in direct-routing (gate) mode, every real server must carry the VIP on its lo interface with ARP responses suppressed; without it, load balancing cannot work. Save the following script as vip.sh in the mysql_bin directory.
#!/bin/bash
. /etc/init.d/functions
SNS_VIP=172.17.129.1
case "$1" in
start)
ifconfig lo:0 $SNS_VIP netmask 255.255.240.0 broadcast $SNS_VIP
# /sbin/route add -host $SNS_VIP dev lo:0
## suppress ARP for the VIP so that only the director answers ARP requests
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
sysctl -p >/dev/null 2>&1
echo "RealServer Start OK"
;;
stop)
ifconfig lo:0 down
# route del $SNS_VIP >/dev/null 2>&1
## restore the default ARP behaviour
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
echo "RealServer Stopped"
;;
*)
echo "Usage: $0 {start|stop}"
exit 1
esac
exit 0
Enable the configuration (a verification check is sketched below)
# sh vip.sh start
Remove the configuration
# sh vip.sh stop
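After starting, the binding can be verified (an optional check, not in the original steps); lo:0 should show the VIP:
# ifconfig lo:0
## or
# ip addr show lo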
2.8 Add the cluster resources (run on any one node)
- Add the VIP resource in pcs
[#6#root@gdb1 ~ 11:27:30]6 pcs resource create vip --disabled ocf:heartbeat:IPaddr nic=eth0 ip=172.17.129.1 cidr_netmask=24 broadcast=172.17.143.255 op monitor interval=5s timeout=20s
Command breakdown
- pcs resource create: the pcs command that creates a resource object
- vip: the name of the virtual IP (VIP) resource object; choose any name you like
- --disabled: create the resource in disabled state, so that Pacemaker does not start using it before it is fully configured
- ocf:heartbeat:IPaddr: tells Pacemaker to manage this VIP with the IPaddr agent from the heartbeat provider
- nic=eth0: the network interface the VIP will be bound to
- ip=172.17.129.1: the IP address assigned to the VIP
- cidr_netmask=24: the VIP's netmask in CIDR form; 24 is equivalent to 255.255.255.0
- broadcast=172.17.143.255: the broadcast address
- op monitor interval=5s timeout=20s: the monitor operation for this resource. interval=5s makes Pacemaker check the resource every 5 seconds; timeout=20s makes it wait up to 20 seconds for a response, after which the resource is considered unavailable.
The configured resource can be inspected as sketched below.
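To double-check the definition before enabling it, the resource can be displayed (pcs 0.9 syntax, matching the pcs resource show usage elsewhere in this article); the output should list the agent, the parameters (ip, nic, cidr_netmask, broadcast), and the monitor operation:
# pcs resource show vip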
- Add the LVS resource in pcs
[#7#root@gdb1 ~ 11:34:50]7 pcs resource create lvs --disabled ocf:heartbeat:ldirectord op monitor interval=10s timeout=10s
Command breakdown
- pcs resource create: the pcs command that creates a resource object
- lvs: the name of the ldirectord (LVS) resource object; choose any name you like
- --disabled: create the resource in disabled state, so that Pacemaker does not start using it before it is fully configured
- ocf:heartbeat:ldirectord: tells Pacemaker to manage the LVS director with the ldirectord agent from the heartbeat provider, using the /etc/ha.d/ldirectord.cf configured above
- op monitor interval=10s timeout=10s: the monitor operation for this resource. interval=10s makes Pacemaker check the resource every 10 seconds; timeout=10s makes it wait up to 10 seconds for a response, after which the resource is considered unavailable.
- After creation, check the resource status
[#9#root@gdb1 ~ 11:35:42]9 pcs resource show
vip (ocf::heartbeat:IPaddr): Stopped (disabled)
lvs (ocf::heartbeat:ldirectord): Stopped (disabled)
[#10#root@gdb1 ~ 11:35:48]10
- Create a resource group and add the resources to it
[#10#root@gdb1 ~ 11:37:36]10 pcs resource group add dbservice vip
[#11#root@gdb1 ~ 11:37:40]11 pcs resource group add dbservice lvs
[#12#root@gdb1 ~ 11:37:44]12
2.9 Starting and stopping the cluster
Starting the cluster
- Enable the resources
# pcs resource enable vip lvs (or: pcs resource enable dbservice)
If there were earlier failures, clear them first, then enable again
# pcs resource cleanup vip
# pcs resource cleanup lvs
- Confirm the startup state with pcs status
[#54#root@gdb1 /etc/ha.d 15:54:22]54 pcs status
Cluster name: gdb_ha
Stack: corosync
Current DC: gdb1 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Feb 23 15:55:27 2023
Last change: Thu Feb 23 15:53:55 2023 by hacluster via crmd on gdb2
3 nodes configured
2 resource instances configured
Online: [ gdb1 gdb2 gdb3 ]
Full list of resources:
Resource Group: dbservice
lvs (ocf::heartbeat:ldirectord): Started gdb2
vip (ocf::heartbeat:IPaddr): Started gdb3
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[#55#root@gdb1 /etc/ha.d 15:55:27]55
Notes on the output
Cluster name: gdb_ha: the cluster is named gdb_ha.
Stack: corosync: the cluster uses corosync as its communication stack.
Current DC: gdb1 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum: the current designated controller (DC) is gdb1, running version 1.1.23-1.el7_9.1-9acf116022, and its partition has quorum.
Last updated: Thu Feb 23 15:55:27 2023: when the cluster status information was last refreshed.
Last change: Thu Feb 23 15:53:55 2023 by hacluster via crmd on gdb2: when the cluster configuration was last changed, here by user hacluster via crmd on node gdb2.
3 nodes configured: the cluster has 3 configured nodes.
2 resource instances configured: the cluster has 2 configured resource instances.
Online: [ gdb1 gdb2 gdb3 ]: the nodes currently online are gdb1, gdb2, and gdb3.
Full list of resources: all resources in the cluster, with their type, node, and state. dbservice is the resource group; lvs is a resource of type ocf::heartbeat:ldirectord and vip a resource of type ocf::heartbeat:IPaddr.
Daemon Status: the state of the Pacemaker components corosync, pacemaker, and pcsd. All are active/enabled, i.e. running and enabled at boot.
- On the server where pcs status shows vip as Started (gdb3 above), start the ldirectord service
[#19#root@gdb3 ~ 11:50:51]19 systemctl start ldirectord
[#20#root@gdb3 ~ 11:50:58]20
[#20#root@gdb3 ~ 11:50:59]20 systemctl status ldirectord
● ldirectord.service - LSB: Control Linux Virtual Server via ldirectord on non-heartbeat systems
Loaded: loaded (/etc/rc.d/init.d/ldirectord; bad; vendor preset: disabled)
Active: active (running) since Thu 2023-02-23 11:50:58 CST; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 1472 ExecStop=/etc/rc.d/init.d/ldirectord stop (code=exited, status=0/SUCCESS)
Process: 1479 ExecStart=/etc/rc.d/init.d/ldirectord start (code=exited, status=0/SUCCESS)
Tasks: 1
Memory: 15.8M
CGroup: /system.slice/ldirectord.service
└─1484 /usr/bin/perl -w /usr/sbin/ldirectord start
Feb 23 11:50:58 gdb3 ldirectord[1479]: at /usr/sbin/ldirectord line 838.
Feb 23 11:50:58 gdb3 ldirectord[1479]: Subroutine main::unpack_sockaddr_in6 redefined at /usr/share/perl5/vendor_perl/Exporter.pm line 66.
Feb 23 11:50:58 gdb3 ldirectord[1479]: at /usr/sbin/ldirectord line 838.
Feb 23 11:50:58 gdb3 ldirectord[1479]: Subroutine main::sockaddr_in6 redefined at /usr/share/perl5/vendor_perl/Exporter.pm line 66.
Feb 23 11:50:58 gdb3 ldirectord[1479]: at /usr/sbin/ldirectord line 838.
Feb 23 11:50:58 gdb3 ldirectord[1479]: Subroutine main::pack_sockaddr_in6 redefined at /usr/sbin/ldirectord line 3078.
Feb 23 11:50:58 gdb3 ldirectord[1479]: Subroutine main::unpack_sockaddr_in6 redefined at /usr/sbin/ldirectord line 3078.
Feb 23 11:50:58 gdb3 ldirectord[1479]: Subroutine main::sockaddr_in6 redefined at /usr/sbin/ldirectord line 3078.
Feb 23 11:50:58 gdb3 ldirectord[1479]: success
Feb 23 11:50:58 gdb3 systemd[1]: Started LSB: Control Linux Virtual Server via ldirectord on non-heartbeat systems.
[#21#root@gdb3 ~ 11:51:01]21
That completes the cluster startup.
Stopping the cluster
- Disable the resources
# pcs resource disable vip lvs (or: pcs resource disable dbservice)
# systemctl stop corosync pacemaker pcsd ldirectord
Tearing the cluster down
# pcs cluster stop
# pcs cluster destroy
# systemctl stop pcsd pacemaker corosync ldirectord
# systemctl disable pcsd pacemaker corosync ldirectord
# yum remove -y pacemaker corosync pcs ldirectord
# rm -rf /var/lib/pcsd/* /var/lib/corosync/*
# rm -f /etc/ha.d/ldirectord.cf
3. HA and load-balancing tests
- From 172.17.139.62, access the VIP in a loop and observe the load balancing
Note: the VIP cannot be reached from the real servers themselves, so a fourth machine is needed for verification.
# for x in {1..100}; do mysql -uroot -pAbc1234567* -h172.17.129.1 -P6446 -N -e 'select sleep(60)' 2> /dev/null & done
On the server where the pcs lvs resource is running, run ipvsadm -Ln
[#26#root@gdb1 ~ 15:52:28]26 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.129.1:6446 rr
-> 172.17.139.164:6446 Route 1 33 0
-> 172.17.140.24:6446 Route 1 34 0
-> 172.17.140.25:6446 Route 1 33 0
TCP 172.17.129.1:6447 rr
-> 172.17.139.164:6447 Route 1 0 0
-> 172.17.140.24:6447 Route 1 0 0
-> 172.17.140.25:6447 Route 1 0 0
[#27#root@gdb1 ~ 15:52:29]27
The requests are spread evenly across the servers.
On each server, confirm the connections with netstat -alntp | grep 172.17.139.62, where 172.17.139.62 is the client that issued the requests.
[#28#root@gdb1 ~ 15:53:10]28 netstat -alntp| grep 172.17.139.62 | grep 6446
tcp 0 0 172.17.129.1:6446 172.17.139.62:54444 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54606 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54592 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54492 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54580 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54432 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54586 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54552 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54404 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54566 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54516 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54560 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54450 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54480 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54540 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54522 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54462 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54528 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54534 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54598 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54498 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54426 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54510 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54504 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54412 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54612 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54456 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54468 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54474 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54486 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54574 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54438 ESTABLISHED 1902/./mysqlrouter
tcp 0 0 172.17.129.1:6446 172.17.139.62:54546 ESTABLISHED 1902/./mysqlrouter
[#29#root@gdb1 ~ 15:53:13]29
- Stop MySQL Router on gdb3, issue 100 new requests, and observe how they are routed
[#29#root@gdb1 ~ 15:55:02]29 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.129.1:6446 rr
-> 172.17.140.24:6446 Route 1 0 34
-> 172.17.140.25:6446 Route 1 0 33
TCP 172.17.129.1:6447 rr
-> 172.17.140.24:6447 Route 1 0 0
-> 172.17.140.25:6447 Route 1 0 0
[#30#root@gdb1 ~ 15:55:03]30
[#30#root@gdb1 ~ 15:55:21]30 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.17.129.1:6446 rr
-> 172.17.140.24:6446 Route 1 0 34
-> 172.17.140.25:6446 Route 1 0 33
TCP 172.17.129.1:6447 rr
-> 172.17.140.24:6447 Route 1 50 0
-> 172.17.140.25:6447 Route 1 50 0
[#31#root@gdb1 ~ 15:55:21]31
As the output shows, once MySQL Router on gdb3 was stopped its routing entries were removed from the cluster, and the next 100 requests were split evenly between the two remaining servers, as expected.
4. Troubleshooting
- pcs cluster fails to start
# pcs cluster start --all
Error: unable to connect to [node], try setting higher timeout in --request-timeout option
Add the timeout option and start again
# pcs cluster start --all --request-timeout 120000
# pcs cluster enable --all
It may also be that pcsd on one of the other nodes is not running; start pcsd there first, then start the pcs cluster.
- A two-node pcs cluster needs the quorum vote disabled
# pcs property set no-quorum-policy=ignore
- Log files: if startup or operation fails, inspect the two logs below to find the root cause
# tail -n 30 /var/log/ldirectord.log
# tail -n 30 /var/log/pacemaker.log
- pcs status shows an OFFLINE node
[#4#root@gdb1 ~ 11:21:23]4 pcs status
Cluster name: db_ha_lvs
Stack: corosync
Current DC: gdb2 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Mar 2 11:21:27 2023
Last change: Wed Mar 1 16:01:56 2023 by root via cibadmin on gdb1
3 nodes configured
2 resource instances configured (2 DISABLED)
Online: [ gdb1 gdb2 ]
OFFLINE: [ gdb3 ]
Full list of resources:
Resource Group: dbservice
vip (ocf::heartbeat:IPaddr): Stopped (disabled)
lvs (ocf::heartbeat:ldirectord): Stopped (disabled)
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[#5#root@gdb1 ~ 11:21:27]5
- A node drops out of the cluster after running sh vip.sh start
[#28#root@gdb3 /data/dbscale/lvs 10:06:10]28 ifconfig -a
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.139.164 netmask 255.255.240.0 broadcast 172.17.143.255
inet6 fe80::216:3eff:fe07:3778 prefixlen 64 scopeid 0x20<link>
ether 00:16:3e:07:37:78 txqueuelen 1000 (Ethernet)
RX packets 17967625 bytes 2013372790 (1.8 GiB)
RX errors 0 dropped 13 overruns 0 frame 0
TX packets 11997866 bytes 7616182902 (7.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 177401 bytes 16941285 (16.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 177401 bytes 16941285 (16.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:96:cf:dd txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0-nic: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether 52:54:00:96:cf:dd txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The real servers, however, are 172.17.140.24, 172.17.140.25, and 172.17.139.164; with netmask 255.255.240.0 the VIP could not communicate with them. After changing it to 255.255.0.0 and restarting, access worked again. The corrected script:
#!/bin/bash
. /etc/init.d/functions
SNS_VIP=172.17.129.1
case "$1" in
start)
ifconfig lo:0 $SNS_VIP netmask 255.255.0.0 broadcast $SNS_VIP
# /sbin/route add -host $SNS_VIP dev lo:0
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
sysctl -p >/dev/null 2>&1
echo "RealServer Start OK"
;;
stop)
ifconfig lo:0 down
# route del $SNS_VIP >/dev/null 2>&1
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
echo "RealServer Stoped"
;;
*)
echo "Usage: $0 {start|stop}"
exit 1
esac
exit 0