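When mysqld is started directly with the mysqld_safe command, killing the parent process (which is not a session leader) does not cause the child process to exit; restarting the parent then fails with an error saying that a mysqld process already exists. By starting with the mysqld_safe command and adapting the mysqld_safe script, it is possible to start and stop the mysqld_safe process without affecting the mysqld process.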
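1. Background
An issue was reported inside the company:
- kill -9 is run against the mysqld_safe process
- systemd detects that the mysqld_safe process no longer exists and starts it again
- once mysqld_safe is back up, the mysqld process turns out to have been restarted as well
Goal: starting and stopping the mysqld_safe process should not affect the mysqld process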
2. Starting via the systemd service
2.1 Reproducing the problem
1) Check the database service status
[greatsql@greatsql-1 ~]$ sudo systemctl status db-4306
● db-4306.service - db-4306 Server
Loaded: loaded (/usr/lib/systemd/system/db-4306.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2023-07-19 11:15:18 CST; 6h ago
Main PID: 14917 (mysqld_safe)
CGroup: /system.slice/db-4306.service
├─14917 /bin/sh /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf
└─16340 /greatsql/svr/greatsql/bin/mysqld --defaults-file=/greatsql/conf/greatsql4306.cnf --basedir=/greatsql/svr/greatsql --datadir=/greatsql/dbdata/data4306/data -...
Jul 19 11:15:18 greatsql-1 systemd[1]: Started db-4306 Server.
Jul 19 11:15:19 greatsql-1 mysqld_safe[14917]: mysqld_safe Adding '/greatsql/svr/GreatSQL-8.0.32-24-Linux-glibc2.17-x86_64/lib/mysql/libjemalloc.so.1' t...or mysqld
Jul 19 11:15:19 greatsql-1 mysqld_safe[14917]: 2023-07-19T03:15:19.907338Z mysqld_safe Logging to '/greatsql/logs/error4306.log'.
Jul 19 11:15:19 greatsql-1 mysqld_safe[14917]: 2023-07-19T03:15:19.953728Z mysqld_safe Starting mysqld daemon with databases from /greatsql/dbdata/data4306/data
Hint: Some lines were ellipsized, use -l to show in full.
2) Run kill -9 on the mysqld_safe process, then check the database service status again
[greatsql@greatsql-1 ~]$ kill -9 14917
[greatsql@greatsql-1 ~]$ sudo systemctl status db-4306
● db-4306.service - db-4306 Server
Loaded: loaded (/usr/lib/systemd/system/db-4306.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2023-07-19 18:00:33 CST; 43s ago
Main PID: 15195 (mysqld_safe)
Tasks: 50
CGroup: /system.slice/db-4306.service
├─15195 /bin/sh /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf
└─16613 /greatsql/svr/greatsql/bin/mysqld --defaults-file=/greatsql/conf/greatsql4306.cnf --basedir=/greatsql/svr/greatsql --datadir=/greatsql/dbdata/data4306/data -...
Jul 19 18:00:33 greatsql-1 systemd[1]: Started db-4306 Server.
Jul 19 18:00:34 greatsql-1 mysqld_safe[15195]: mysqld_safe Adding '/greatsql/svr/GreatSQL-8.0.32-24-Linux-glibc2.17-x86_64/lib/mysql/libjemalloc.so.1' t...or mysqld
Jul 19 18:00:34 greatsql-1 mysqld_safe[15195]: 2023-07-19T10:00:34.640240Z mysqld_safe Logging to '/greatsql/logs/error4306.log'.
Jul 19 18:00:34 greatsql-1 mysqld_safe[15195]: 2023-07-19T10:00:34.679333Z mysqld_safe Starting mysqld daemon with databases from /greatsql/dbdata/data4306/data
Hint: Some lines were ellipsized, use -l to show in full.
Indeed, after kill -9 on mysqld_safe, both mysqld_safe and mysqld were started anew (their PIDs differ from the earlier ones).
3) Check the database error log
2023-07-19T18:00:31.933020+08:00 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.32-24).
4) Check the service file
[greatsql@greatsql-1 ~]$ cat /usr/lib/systemd/system/db-4306.service
[Unit]
Description=db-4306 Server
After=network.target
[Install]
WantedBy=multi-user.target
[Service]
User=greatsql
Group=greatsql
Type=simple
ExecStart=/greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf
Restart=on-failure
LimitNOFILE=1024000
LimitNPROC=1024000
TimeoutStopSec=15
PrivateTmp=false
2.2 Analyzing the cause
1) Check the process information
[greatsql@greatsql-1 ~]$ ps axj |head -1;ps axj |grep 4306 |grep -v grep
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 15195 15195 15195 ? -1 Ss 986 0:00 /bin/sh /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf
15195 16613 15195 15195 ? -1 Sl 986 0:06 /greatsql/svr/greatsql/bin/mysqld --defaults-file=/greatsql/conf/greatsql4306.cnf --basedir=/greatsql/svr/greatsql --datadir=/greatsql/dbdata/data4306/data --plugin-dir=/greatsql/svr/greatsql/lib/plugin --log-error=/greatsql/logs/error4306.log --open-files-limit=65535 --pid-file=/greatsql/dbdata/data4306/data/mysql.pid --socket=/greatsql/dbdata/data4306/data/mysql.sock --port=4306
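mysqld_safe: PID (process ID) = PGID (process group ID) = SID (session ID), so it is the session leader and also the leader of its process group
PID of mysqld_safe = PPID (parent process ID) of mysqld, so mysqld_safe is the parent of mysqld
After kill -9 on mysqld_safe (the session leader and the service's main process), the other processes of the service are terminated too. They are not hit by the same SIGKILL, though: the error log above shows mysqld receiving SHUTDOWN <via user signal>, that is, a catchable signal followed by a normal shutdown. This matches systemd's default KillMode=control-group, under which systemd stops whatever is left in the service's cgroup once the main process has exited.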
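The way PID, PGID and SID are read off the ps output above can be written down as a small helper. This is only an illustrative sketch: the function name classify_proc is made up for this example, and it does nothing more than compare the three numbers it is given.

```shell
# classify_proc PID PGID SID
# Hypothetical helper (not part of MySQL or GreatSQL): classify a process
# from the PID, PGID and SID columns of `ps axj`.
classify_proc() {
    pid=$1 pgid=$2 sid=$3
    if [ "$pid" = "$sid" ]; then
        echo "session leader"
    elif [ "$pid" = "$pgid" ]; then
        echo "process group leader"
    else
        echo "group member"
    fi
}

# Values taken from the ps axj output above
classify_proc 15195 15195 15195   # mysqld_safe started by systemd
classify_proc 16613 15195 15195   # mysqld started by that mysqld_safe
```

For a live process, the three numbers can be obtained with ps -o pid=,pgid=,sid= -p followed by the PID.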
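2) Overall flow
- mysqld_safe is the session leader and the main process of the service; after kill -9 on it, the remaining processes of the service (mysqld) are terminated as well
- systemd sees mysqld_safe exit abnormally, and Restart=on-failure makes it start mysqld_safe again
- mysqld_safe starts its child process mysqld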
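The flow above depends on how the unit is configured. The unit file shown earlier sets Restart=on-failure and does not set KillMode, so systemd's default should apply. As a rough sketch, the effective values can be read from the KEY=VALUE lines that systemctl show prints; the helper name unit_prop is hypothetical, and the sample input below is typed in by hand rather than captured from the host.

```shell
# unit_prop KEY
# Hypothetical helper: print the value of KEY from KEY=VALUE lines on stdin,
# which is the format printed by `systemctl show`.
unit_prop() {
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

# On a real host (not run here):
#   systemctl show db-4306 -p Restart -p KillMode | unit_prop KillMode
# Hand-written sample input for illustration:
printf 'Restart=on-failure\nKillMode=control-group\n' | unit_prop Restart
```

Parsing the output this way avoids relying on newer systemctl options that may be missing on older distributions.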
3. Starting with the mysqld_safe command
1) Start the database with mysqld_safe
[greatsql@greatsql-1 ~]$ /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf &
[1] 18229
[greatsql@greatsql-1 ~]$ mysqld_safe Adding '/greatsql/svr/GreatSQL-8.0.32-24-Linux-glibc2.17-x86_64/lib/mysql/libjemalloc.so.1' to LD_PRELOAD for mysqld
2023-07-19T14:20:19.135297Z mysqld_safe Logging to '/greatsql/logs/error4306.log'.
2023-07-19T14:20:19.173594Z mysqld_safe Starting mysqld daemon with databases from /greatsql/dbdata/data4306/data
2) Check the process information
[greatsql@greatsql-1 ~]$ ps axj |head -1;ps axj |grep 4306 |grep -v grep
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
17360 18229 18229 17206 pts/7 17360 S 986 0:00 /bin/sh /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf
18229 19658 18229 17206 pts/7 17360 Sl 986 0:02 /greatsql/svr/greatsql/bin/mysqld --defaults-file=/greatsql/conf/greatsql4306.cnf --basedir=/greatsql/svr/greatsql --datadir=/greatsql/dbdata/data4306/data --plugin-dir=/greatsql/svr/greatsql/lib/plugin --log-error=/greatsql/logs/error4306.log --open-files-limit=65535 --pid-file=/greatsql/dbdata/data4306/data/mysql.pid --socket=/greatsql/dbdata/data4306/data/mysql.sock --port=4306
PID ≠ SID, so mysqld_safe is not a session leader
PGID ≠ TPGID, so it belongs to a background process group
PID of mysqld_safe = PPID of mysqld, so mysqld_safe is the parent of mysqld
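The foreground/background reading of the PGID and TPGID columns can be sketched the same way. The helper name job_state is made up for this example; it only compares the two numbers as they appear in ps axj, where a TPGID of -1 means the process has no controlling terminal.

```shell
# job_state PGID TPGID
# Hypothetical helper: compare a process group ID with the terminal's
# foreground process group ID, both as shown by `ps axj`.
job_state() {
    pgid=$1 tpgid=$2
    if [ "$tpgid" = "-1" ]; then
        echo "no controlling terminal"
    elif [ "$pgid" = "$tpgid" ]; then
        echo "foreground"
    else
        echo "background"
    fi
}

# Values taken from the two ps axj outputs in this article
job_state 18229 17360   # mysqld_safe started with & from the shell
job_state 15195 -1      # mysqld_safe started by systemd
```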
3) Run kill -9 on the mysqld_safe process, then check the process information again
[greatsql@greatsql-1 ~]$ kill -9 18229
[greatsql@greatsql-1 ~]$ ps axj |head -1;ps axj |grep 4306 |grep -v grep
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 19658 18229 17206 pts/7 17360 Sl 986 0:07 /greatsql/svr/greatsql/bin/mysqld --defaults-file=/greatsql/conf/greatsql4306.cnf --basedir=/greatsql/svr/greatsql --datadir=/greatsql/dbdata/data4306/data --plugin-dir=/greatsql/svr/greatsql/lib/plugin --log-error=/greatsql/logs/error4306.log --open-files-limit=65535 --pid-file=/greatsql/dbdata/data4306/data/mysql.pid --socket=/greatsql/dbdata/data4306/data/mysql.sock --port=4306
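kill -9 on mysqld_safe (not a session leader) does not affect mysqld, which is in the same process group; the orphaned mysqld is adopted by the init process, so its PPID becomes 1
4) Start the mysqld_safe process again, then check the process information once more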
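The same behavior can be reproduced without a database. In the sketch below a throwaway parent shell stands in for mysqld_safe and sleep stands in for mysqld; the function name orphan_demo and the temporary pid file are inventions of this example. It shows that kill -9 aimed at a single PID reaches only that process.

```shell
# orphan_demo
# Hypothetical demo: a parent shell plays the role of mysqld_safe and
# `sleep` plays the role of mysqld.
orphan_demo() {
    pidfile=$(mktemp)
    # Parent: start the child in the background, record its PID, then wait.
    sh -c 'sleep 30 >/dev/null 2>&1 & echo $! > "$1"; wait' sh "$pidfile" >/dev/null 2>&1 &
    parent=$!
    # Wait until the parent has written the child's PID.
    while [ ! -s "$pidfile" ]; do sleep 1; done
    child=$(cat "$pidfile")

    kill -9 "$parent"                      # the signal goes to this one PID only
    wait "$parent" 2>/dev/null || true     # reap the killed parent

    if kill -0 "$child" 2>/dev/null; then
        echo "child survived"
    else
        echo "child exited"
    fi

    kill "$child" 2>/dev/null || true      # clean up the orphan
    rm -f "$pidfile"
}

orphan_demo
```

Sending the signal to a whole process group would need the negative-PGID form of kill, which is not what was done in the step above.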
[greatsql@greatsql-1 ~]$ /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf &
[1] 31401
[greatsql@greatsql-1 ~]$ mysqld_safe Adding '/greatsql/svr/GreatSQL-8.0.32-24-Linux-glibc2.17-x86_64/lib/mysql/libjemalloc.so.1' to LD_PRELOAD for mysqld
2023-07-19T14:38:42.429733Z mysqld_safe Logging to '/greatsql/logs/error4306.log'.
2023-07-19T14:38:42.493870Z mysqld_safe A mysqld process already exists
[1]+ Exit 1 /greatsql/svr/greatsql/bin/mysqld_safe --defaults-file=/greatsql/conf/greatsql4306.cnf
[greatsql@greatsql-1 ~]$ ps axj |head -1;ps axj |grep 4306 |grep -v grep
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 19658 18229 17206 pts/7 17360 Sl 986 0:09 /greatsql/svr/greatsql/bin/mysqld --defaults-file=/greatsql/conf/greatsql4306.cnf --basedir=/greatsql/svr/greatsql --datadir=/greatsql/dbdata/data4306/data --plugin-dir=/greatsql/svr/greatsql/lib/plugin --log-error=/greatsql/logs/error4306.log --open-files-limit=65535 --pid-file=/greatsql/dbdata/data4306/data/mysql.pid --socket=/greatsql/dbdata/data4306/data/mysql.sock --port=4306
mysqld_safe exits because a mysqld process already exists
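A check of this general kind can be sketched with the pid file and kill -0. This is a deliberately simplified illustration, not a copy of the mysqld_safe script, whose exact logic may differ between versions; the helper name mysqld_running is hypothetical, and the path is the --pid-file value shown in the ps output above.

```shell
# mysqld_running PIDFILE
# Hypothetical helper: succeed if PIDFILE exists and names a process that is
# still alive. Simplified: it does not verify that the process is mysqld.
mysqld_running() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Path taken from the --pid-file option in the ps output above
if mysqld_running /greatsql/dbdata/data4306/data/mysql.pid; then
    echo "A mysqld process already exists"
else
    echo "no running mysqld found"
fi
```

Because the orphaned mysqld keeps running and its pid file stays in place, a freshly started mysqld_safe finds it and gives up instead of starting a second server.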
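4. Summary
- The mysqld_safe process and the mysqld process are parent and child
- Started via the systemd service: killing the parent process (the session leader and the service's main process) causes the child process to exit too
- Started with the mysqld_safe command: killing the parent process (not a session leader) does not cause the child process to exit; restarting the parent fails with an error saying that a mysqld process already exists
- By starting with the mysqld_safe command and adapting the mysqld_safe script, it is possible to start and stop the mysqld_safe process without affecting the mysqld process. In that case, do not also start the database through systemd, and keep in mind that this customized mysqld_safe has to be maintained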