Problems encountered while setting up an MHA architecture

1. Two packages: mha4mysql-manager-0.56-0.el6.noarch.rpm and mha4mysql-node-0.56-0.el6.noarch.rpm

Download: //code.google.com/archive/p/mysql-master-ha/

 

2. Dependency packages

yum install perl-DBD-MySQL
yum install perl-Config-Tiny
yum install perl-Log-Dispatch
yum install perl-Parallel-ForkManager

Install them on every node (manager, master, and slaves); otherwise the checks may fail with errors.
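
Since the same four packages have to go onto every node, it can help to generate the commands in one place. The following is only a sketch: the node list is taken from the IP addresses that appear in the logs further down, and logging in as root over SSH is an assumption. It prints the commands instead of running them, so they can be reviewed first.

```shell
# Node IPs as they appear in the logs below; root SSH access is an assumption.
NODES="192.168.10.10 192.168.10.20 192.168.10.30 192.168.10.40"
PKGS="perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager"

# Print (do not run) one install command per node.
print_install_cmds() {
  for node in $NODES; do
    echo "ssh root@$node yum install -y $PKGS"
  done
}

print_install_cmds
```

Once the output looks right, piping it into `sh` would actually run the installs.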

 

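3. Tools available on the manager node:

masterha_check_ssh: checks the SSH environment that MHA depends on

masterha_check_repl: checks the replication environment

masterha_manager: the main service program

masterha_check_status: checks whether MHA is running

masterha_stop: stops MHA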
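
In practice the two check tools are run before the manager is started. A minimal sketch of that order, assuming the configuration path used throughout these notes; the function name `mha_precheck` is made up for illustration:

```shell
# Configuration path used in the sessions below.
CONF=/etc/mha_master/mha.cnf

# Run the SSH check, then the replication check; stop at the first failure.
# Returns 1 if the SSH check fails, 2 if the replication check fails.
mha_precheck() {
  masterha_check_ssh --conf="$1" || { echo "SSH check failed" >&2; return 1; }
  masterha_check_repl --conf="$1" || { echo "replication check failed" >&2; return 2; }
  echo "prechecks passed"
}

# Example (not executed here):
#   mha_precheck "$CONF" && masterha_manager --conf="$CONF"
```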

 

4. Problem encountered when running masterha_check_ssh:

[root@manager ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf
Can't locate MHA/SSHCheck.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/bin/masterha_check_ssh line 25.

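This looks like an environment variable problem: the module is installed, but its directory is not listed in @INC. Locating the module: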

[root@manager ~]# find / -name SSHCheck.pm

/usr/lib/perl5/vendor_perl/MHA/SSHCheck.pm

Add that path to PERL5LIB (the root cause is that the MHA package does not match the OS version):

export PERL5LIB=$PERL5LIB:/usr/lib/perl5/vendor_perl/
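
The export above only lasts for the current shell and appends the directory again every time it is run. A small sketch of an idempotent variant; the helper name `add_perl5lib` is made up, and where to persist it (for example a profile script) is left open:

```shell
# Append a directory to PERL5LIB only if it is not already listed.
add_perl5lib() {
  dir=$1
  case ":${PERL5LIB:-}:" in
    *":$dir:"*) ;;                                   # already present, nothing to do
    *) PERL5LIB="${PERL5LIB:+$PERL5LIB:}$dir" ;;     # append, no leading colon
  esac
  export PERL5LIB
}

add_perl5lib /usr/lib/perl5/vendor_perl
echo "$PERL5LIB"
```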

 

5. Problem encountered when running masterha_check_repl:

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 12:27:17 2021 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 12:27:17 2021 – [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:27:17 2021 – [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:27:17 2021 – [info] MHA::MasterMonitor version 0.55.
Creating directory /etc/mha_master/app1.. done.
Mon Mar 1 12:27:17 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln255] Got MySQL error when connecting 192.168.10.30(192.168.10.30:3306) :1130:Host ‘192.168.10.10’ is not allowed to connect to this MariaDB server, but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl//MHA/ServerManager.pm line 251.
Mon Mar 1 12:27:18 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln263] Got fatal error, stopping operations
Mon Mar 1 12:27:18 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 300.
Mon Mar 1 12:27:18 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 12:27:18 2021 – [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

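Check that the dependency packages are installed on every node. Error 1130 itself means the MariaDB server has no account that is allowed to connect from 192.168.10.10 (the manager), so the grants for the manager on each MySQL node are worth checking as well.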

 

6. Problem encountered when running masterha_check_repl:

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 12:29:06 2021 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 12:29:06 2021 – [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:29:06 2021 – [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:29:06 2021 – [info] MHA::MasterMonitor version 0.55.
Mon Mar 1 12:29:06 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln255] Got MySQL error when connecting 192.168.10.30(192.168.10.30:3306) :1045:Access denied for user ‘mhaadmin’@’192.168.10.10’ (using password: YES), but this is not mysql crash. Check MySQL server settings.
at /usr/lib/perl5/vendor_perl//MHA/ServerManager.pm line 251.
Mon Mar 1 12:29:07 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ServerManager.pm, ln263] Got fatal error, stopping operations
Mon Mar 1 12:29:07 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 300.
Mon Mar 1 12:29:07 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 12:29:07 2021 – [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Check that every MySQL instance has a grant for the manager node; if the grant already exists, flush the privilege tables.
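
For reference, a sketch of what such a grant could look like. The user name mhaadmin comes from the log above; the `192.168.10.%` host pattern, the `ALL PRIVILEGES` scope, and the `CHANGE_ME` password are placeholders chosen for illustration, not values from the original setup. The `GRANT ... IDENTIFIED BY` form works on the MariaDB 5.5 servers shown in these logs. The helper only prints the SQL so it can be reviewed before being piped into the mysql client.

```shell
# Print the GRANT and FLUSH statements for a given user, host pattern and password.
grant_sql() {
  user=$1; host=$2; pass=$3
  printf "GRANT ALL PRIVILEGES ON *.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
  printf "FLUSH PRIVILEGES;\n"
}

grant_sql mhaadmin '192.168.10.%' 'CHANGE_ME'

# To apply on a MySQL node (not executed here):
#   grant_sql mhaadmin '192.168.10.%' 'CHANGE_ME' | mysql -uroot -p
```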

 

7. Problem encountered when running masterha_check_repl:

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 12:34:06 2021 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 12:34:06 2021 – [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:34:06 2021 – [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 12:34:06 2021 – [info] MHA::MasterMonitor version 0.55.
Mon Mar 1 12:34:07 2021 – [info] Dead Servers:
Mon Mar 1 12:34:07 2021 – [info] Alive Servers:
Mon Mar 1 12:34:07 2021 – [info] 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 – [info] 192.168.10.30(192.168.10.30:3306)
Mon Mar 1 12:34:07 2021 – [info] 192.168.10.40(192.168.10.40:3306)
Mon Mar 1 12:34:07 2021 – [info] Alive Slaves:
Mon Mar 1 12:34:07 2021 – [info] 192.168.10.30(192.168.10.30:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 12:34:07 2021 – [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 – [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 12:34:07 2021 – [info] 192.168.10.40(192.168.10.40:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 12:34:07 2021 – [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 – [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 12:34:07 2021 – [info] Current Alive Master: 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 12:34:07 2021 – [info] Checking slave configurations..
Mon Mar 1 12:34:07 2021 – [info] Checking replication filtering settings..
Mon Mar 1 12:34:07 2021 – [info] binlog_do_db= , binlog_ignore_db=
Mon Mar 1 12:34:07 2021 – [info] Replication filtering check ok.
Mon Mar 1 12:34:07 2021 – [info] Starting SSH connection tests..
Mon Mar 1 12:34:10 2021 – [info] All SSH connection tests passed successfully.
Mon Mar 1 12:34:10 2021 – [info] Checking MHA Node version..
Mon Mar 1 12:34:10 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm, ln122] Got error when getting node version. Error:
Mon Mar 1 12:34:10 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm, ln123]
Can't locate MHA/BinlogManager.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/apply_diff_relay_logs line 24.
BEGIN failed--compilation aborted at /usr/bin/apply_diff_relay_logs line 24.
Mon Mar 1 12:34:10 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm, ln151] node version on 192.168.10.30 not found! Maybe MHA Node package is not installed?
at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 346.
Mon Mar 1 12:34:10 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. node version on 192.168.10.30 not found! Maybe MHA Node package is not installed?
at /usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm line 346.
…propagated at /usr/lib/perl5/vendor_perl//MHA/ManagerUtil.pm line 152.
Mon Mar 1 12:34:10 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 12:34:10 2021 – [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

Solution: create a symlink on every node so that the MHA modules are visible in a directory that is in @INC: ln -s /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl/
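
Because this has to be repeated on every node, a re-runnable version is handy. A minimal sketch; the helper name `link_mha_modules` is made up, and it skips the link when something with that name already exists in the target directory:

```shell
# Link a module directory into a target directory unless it is already there.
link_mha_modules() {
  src=$1; dest_dir=$2
  name=$(basename "$src")
  if [ -e "$dest_dir/$name" ] || [ -L "$dest_dir/$name" ]; then
    echo "already present: $dest_dir/$name"
  else
    ln -s "$src" "$dest_dir/" && echo "linked: $dest_dir/$name -> $src"
  fi
}

# Example with the paths from this note (not executed here):
#   link_mha_modules /usr/lib/perl5/vendor_perl/MHA /usr/lib64/perl5/vendor_perl
```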

 

8. Problem encountered when running masterha_check_repl:

[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Mon Mar 1 15:16:22 2021 – [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 1 15:16:22 2021 – [info] Reading application default configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 15:16:22 2021 – [info] Reading server configurations from /etc/mha_master/mha.cnf..
Mon Mar 1 15:16:22 2021 – [info] MHA::MasterMonitor version 0.55.
Mon Mar 1 15:16:23 2021 – [info] Dead Servers:
Mon Mar 1 15:16:23 2021 – [info] Alive Servers:
Mon Mar 1 15:16:23 2021 – [info] 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 – [info] 192.168.10.30(192.168.10.30:3306)
Mon Mar 1 15:16:23 2021 – [info] 192.168.10.40(192.168.10.40:3306)
Mon Mar 1 15:16:23 2021 – [info] Alive Slaves:
Mon Mar 1 15:16:23 2021 – [info] 192.168.10.30(192.168.10.30:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 15:16:23 2021 – [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 – [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 15:16:23 2021 – [info] 192.168.10.40(192.168.10.40:3306) Version=5.5.68-MariaDB (oldest major version between slaves) log-bin:enabled
Mon Mar 1 15:16:23 2021 – [info] Replicating from 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 – [info] Primary candidate for the new Master (candidate_master is set)
Mon Mar 1 15:16:23 2021 – [info] Current Alive Master: 192.168.10.20(192.168.10.20:3306)
Mon Mar 1 15:16:23 2021 – [info] Checking slave configurations..
Mon Mar 1 15:16:23 2021 – [info] Checking replication filtering settings..
Mon Mar 1 15:16:23 2021 – [info] binlog_do_db= , binlog_ignore_db=
Mon Mar 1 15:16:23 2021 – [info] Replication filtering check ok.
Mon Mar 1 15:16:23 2021 – [info] Starting SSH connection tests..
Mon Mar 1 15:16:25 2021 – [info] All SSH connection tests passed successfully.
Mon Mar 1 15:16:25 2021 – [info] Checking MHA Node version..
Mon Mar 1 15:16:26 2021 – [info] Version check ok.
Mon Mar 1 15:16:26 2021 – [info] Checking SSH publickey authentication settings on the current master..
Mon Mar 1 15:16:26 2021 – [info] HealthCheck: SSH to 192.168.10.20 is reachable.
Mon Mar 1 15:16:27 2021 – [info] Master MHA Node version is 0.54.
Mon Mar 1 15:16:27 2021 – [info] Checking recovery script configurations on the current master..
Mon Mar 1 15:16:27 2021 – [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/mydata/mha_masterapp1/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000002
Mon Mar 1 15:16:27 2021 – [info] Connecting to [email protected](192.168.10.20)..
Creating /mydata/mha_masterapp1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000002
Mon Mar 1 15:16:27 2021 – [info] Master setting check done.
Mon Mar 1 15:16:27 2021 – [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Mar 1 15:16:27 2021 – [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhaadmin' --slave_host=192.168.10.30 --slave_ip=192.168.10.30 --slave_port=3306 --workdir=/mydata/mha_masterapp1 --target_version=5.5.68-MariaDB --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Mon Mar 1 15:16:27 2021 – [info] Connecting to [email protected](192.168.10.30:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info … ok.
Relay log found at /var/lib/mysql, up to mysql-relay-bin.000005
Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000005
Testing mysql connection and privileges..ERROR 1045 (28000): Access denied for user ‘mhaadmin’@’slave1’ (using password: YES)
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 367.
main::check() called at /usr/bin/apply_diff_relay_logs line 486
eval {…} called at /usr/bin/apply_diff_relay_logs line 466
main::main() called at /usr/bin/apply_diff_relay_logs line 112
Mon Mar 1 15:16:28 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln195] Slaves settings check failed!
Mon Mar 1 15:16:28 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln375] Slave configuration failed.
Mon Mar 1 15:16:28 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48.
Mon Mar 1 15:16:28 2021 – [error][/usr/lib/perl5/vendor_perl//MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
Mon Mar 1 15:16:28 2021 – [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

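MHA probably detects the host name automatically and resolves it here (the rejected account is 'mhaadmin'@'slave1', not an IP address); adding the host name entries to /etc/hosts solved it.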
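
A sketch of adding such an entry without creating duplicates. The helper name `add_host_entry` is made up, and the mapping of 192.168.10.30 to slave1 is inferred from the log above rather than stated anywhere, so treat it as an assumption and adjust it to the real host names:

```shell
# Append "ip<TAB>name" to a hosts file unless that ip already maps to that name.
add_host_entry() {
  hosts_file=$1; ip=$2; name=$3
  if awk -v ip="$ip" -v name="$name" '
       $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
       END { exit !found }' "$hosts_file"; then
    return 0                      # entry already there
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (not executed here; inferred mapping):
#   add_host_entry /etc/hosts 192.168.10.30 slave1
```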

 

9. Errors when starting MHA:

Mon Mar 1 16:03:14 2021 – [warning] Got error on MySQL connect: 2003 (Can’t connect to MySQL server on ‘192.168.10.20’ (4))
Mon Mar 1 16:03:14 2021 – [warning] Connection failed 1 time(s)..
Mon Mar 1 16:03:15 2021 – [warning] Got error on MySQL connect: 2003 (Can’t connect to MySQL server on ‘192.168.10.20’ (4))
Mon Mar 1 16:03:15 2021 – [warning] Connection failed 2 time(s)..
Mon Mar 1 16:03:16 2021 – [warning] Got error on MySQL connect: 2003 (Can’t connect to MySQL server on ‘192.168.10.20’ (4))
Mon Mar 1 16:03:16 2021 – [warning] Connection failed 3 time(s)..
Mon Mar 1 16:03:18 2021 – [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.10.20! at /usr/lib/perl5/vendor_perl//MHA/HealthCheck.pm line 298.
Mon Mar 1 16:03:18 2021 – [warning] Master is not reachable from health checker!
Mon Mar 1 16:03:18 2021 – [warning] Master 192.168.10.20(192.168.10.20:3306) is not reachable!
Mon Mar 1 16:03:18 2021 – [warning] SSH is NOT reachable.
Mon Mar 1 16:03:18 2021 – [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha_master/mha.cnf again, and trying to connect to all servers to check server status..

The manager cannot reach the master node here. This was again a host name resolution problem, fixed by adding the entries to /etc/hosts.

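10. After a master failover, the MHA manager process exits. Once the new node has been added, MHA has to be started again. The new slave may not have a grant for the manager yet, so add it if needed and flush the privilege tables.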
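
A sketch of bringing the manager back up after a failover. It assumes that masterha_check_status exits with a non-zero status when the manager is not running; the log file path is a hypothetical choice and the helper name `ensure_manager_running` is made up:

```shell
# Start masterha_manager in the background only if it is not already running.
ensure_manager_running() {
  conf=$1; log=$2
  if masterha_check_status --conf="$conf" >/dev/null 2>&1; then
    echo "manager already running"
  else
    # nohup keeps the manager alive after the terminal is closed.
    nohup masterha_manager --conf="$conf" < /dev/null >> "$log" 2>&1 &
    echo "manager started in background"
  fi
}

# Example (not executed here; the log path is hypothetical):
#   ensure_manager_running /etc/mha_master/mha.cnf /var/log/mha_manager.log
```

Running masterha_check_repl once more before restarting is a cheap way to catch a missing grant on the new slave.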