MariaDB Galera Cluster集群搭建
MariaDB Galera Cluster是什么?
Galera Cluster是由第三方公司Codership所研发的一套免费开源的集群高可用方案,实现了数据零丢失,官网地址为//galeracluster.com/
其在MySQLInnoDB存储引擎基础上打了wrep(虚拟全同步复制),Percona/MariaDB已捆绑在各自的发行版本中
MariaDB Galera Cluster是MariaDB同步多主机集群。它仅支持XtraDB/InnoDB存储引擎(虽然有对MyISAM实验支持,具体看wsrep_replicate_myisam系统变量)
MariaDB Galera Cluster主要功能
1.同步复制
2.真正的multi-master,即所有节点可以同时读写数据库
3. 自动的节点成员控制,失效节点自动被清除
4. 新节点加入数据自动复制
5.真正的并行复制,行级
6.用户可以直接连接集群,使用感受上与MySQL完全一致
优势
- 因为是多主,所以不存在Slavelag(延迟)
- 不存在丢失事务的情况
- 同时具有读和写的扩展能力
- 更小的客户端延迟
- 节点间数据是同步的,而Master/Slave模式是异步的,不同slave上的binlog可能是不同的
缺点
- 加入新节点时开销大,需要复制完整的数据
- 不能有效地解决写扩展的问题,所有的写操作都发生在所有的节点
- 有多少个节点,就有多少份重复的数据
- 由于事务提交需要跨节点通信,即涉及分布式事务操作,因此写入会比主从复制慢很多,节点越多,写入越慢,死锁和回滚也会更加频繁
- 对网络要求比较高,如果网络出现波动不稳定,则可能会造成两个节点失联,Galera Cluster集群会发生脑裂,服务将不可用
存在局限
- 仅支持InnoDB/XtraDB存储引擎,任何写入其他引擎的表,包括mysql.*表都不会被复制。但是DDL语句可以复制,但是insert into mysql.user(MyISAM存储引擎)之类的插入数据不会被复制
- Delete操作不支持没有主键的表,因为没有主键的表在不同的节点上的顺序不同,如果执行select … limit …将出现不同的结果集
- LOCK/UNLOCK TABLES/FLUSH TABLES WITH READ LOCKS不支持单表所锁,以及锁函数GET_LOCK()、RELEASE_LOCK(),但FLUSH TABLES WITH READ LOCK支持全局表锁
- General Query Log日志不能保存在表中,如果开始查询日志,则只能保存到文件中
- 不能有大事务写入,不能操作wsrep_max_ws_rows=131072(行),且写入集不能超过wsrep_max_ws_size=1073741824(1GB),否则客户端直接报错
- 由于集群是乐观锁并发控制,因此,在commit阶段会有事务冲突发生。如果两个事务在集群中的不同节点上对同一行写入并提交,则失败的节点将回滚,客户端返回死锁报错
- XA分布式事务不支持Codership Galera Cluster,在提交时可能会回滚
- 整个集群的写入吞吐量取决于最弱的节点限制,集群要使用同一的配置
服务器环境说明
MariaDB版本 | 主机IP | 节点名称 | |
---|---|---|---|
CentOS 7.9 | mariadb-10.3.27 | 192.168.40.50 | MariaDB-Node1 |
CentOS 7.9 | mariadb-10.3.27 | 192.168.40.60 | MariaDB-Node2 |
CentOS 7.9 | mariadb-10.3.27 | 192.168.40.70 |
环境准备
[root@MariaDB-Node1 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config [root@MariaDB-Node1 ~]# setenforce 0 setenforce: SELinux is disabled [root@MariaDB-Node1 ~]# systemctl stop firewalld.service [root@MariaDB-Node1 ~]# systemctl disable firewalld.service [root@MariaDB-Node1 ~]# vi /etc/security/limits.conf * soft nofile 65536 * hard nofile 65536 [root@MariaDB-Node1 ~]#sysctl -p
安装MariaDB
参考我之前的博客安装,采用二进制安装
//www.cnblogs.com/mike666/p/13666469.html
三台数据库主机同时安装 MariaDB 10.3.27数据库
备注:从MariaDB 10.1.20 版本开始,Galera Cluster就已经包含在MariaDB包里面了,不需要单独部署MariaDB-Galera-server 和galera 包
安装 Galera 依赖包
安装依赖包 [root@MariaDB-Node1 ~]# yum -y install lsof cmake gcc gcc-c++ autoconf automake zlib* libxml* \ ncurses ncurses-devel libtool libtool-ltdl-devel* make bison bison-devel \ openssl-devel libevent-devel \ libaio libaio-devel pam-devel boost boost-devel valgrind-devel \ libnl-devel popt-devel popt-static bzr nmap-* check-devel 安装Scons [root@MariaDB-Node1 ~]# wget http://prdownloads.sourceforge.net/scons/scons-2.5.0.tar.gz [root@MariaDB-Node1 ~]# tar zxvf scons-2.5.0.tar.gz -C /usr/local/ && rm -rf scons-2.5.0.tar.gz [root@MariaDB-Node1 ~]# cd /usr/local/scons-2.5.0/ [root@MariaDB-Node1 /usr/local/scons-2.5.0]# python setup.py install [root@MariaDB-Node1 /usr/local/scons-2.5.0]# cd [root@MariaDB-Node1 ~]# scons -v SCons by Steven Knight et al.: script: v2.5.0.rel_2.5.0:3544:95d356f188a3[MODIFIED], 2016/04/09 14:38:50, by bdbaddog on ubuntu1404-32bit engine: v2.5.0.rel_2.5.0:3544:95d356f188a3[MODIFIED], 2016/04/09 14:38:50, by bdbaddog on ubuntu1404-32bit engine path: ['/usr/lib/scons-2.5.0/SCons'] Copyright (c) 2001 - 2016 The SCons Foundation [root@MariaDB-Node1 ~]#
三台都要安装这个依赖包,不然启动集群可能会报错
修改配置文件my.cnf
下面我们开始配置MariaDB Galera Cluster,分别修改MariaDB Galera集群的每个节点上的/etc/my.cnf文件,具体每个节点的内容如下:
192.168.40.50 配置文件修改:
[root@MariaDB-Node1 ~]# vim /etc/my.cnf [client] port = 3306 socket = /tmp/mysql.sock [mysqld] user = mysql datadir = /data/mysql/data pid-file = /data/mysql/mysql.pid server-id = 50 relay_log =/data/logs/mysql/mysql_relay.log bind-address = 0.0.0.0 tmpdir=/data/tmp init_connect ='SET NAMES utf8' character-set-server = utf8 skip-name-resolve back_log = 300 max_connections = 8019 max_connect_errors = 1024000 open_files_limit = 65535 table_open_cache = 2048 max_allowed_packet = 50M max_heap_table_size = 512M tmp_table_size = 256M read_buffer_size = 2M read_rnd_buffer_size = 16M sort_buffer_size = 8M join_buffer_size = 8M key_buffer_size = 512M thread_cache_size = 128 query_cache_type = 2 query_cache_size = 128M query_cache_limit = 5M thread_stack = 192k ft_min_word_len = 4 log_bin = /data/mysql/mysql_bin.log binlog_format = ROW expire_logs_days = 7 log_error = /data/logs/mysql/mysql_error.log slow_query_log = 1 long_query_time = 1 log_slow_verbosity=query_plan slow_query_log_file = /data/logs/mysql/mysql_slow.log performance_schema = 0 skip-external-locking #跳过外部锁定,避免external locking bulk_insert_buffer_size = 64M myisam_sort_buffer_size = 128M myisam_max_sort_file_size = 10G myisam_repair_threads = 1 default_storage_engine = InnoDB innodb_file_per_table = 1 innodb_open_files = 500 innodb_buffer_pool_size = 2048M innodb_write_io_threads = 8 innodb_read_io_threads = 8 innodb_thread_concurrency = 16 innodb_purge_threads = 1 innodb_flush_log_at_trx_commit = 1 innodb_log_buffer_size = 8M innodb_log_file_size = 256M innodb_log_files_in_group = 3 innodb_max_dirty_pages_pct = 90 innodb_lock_wait_timeout = 10 innodb_autoinc_lock_mode = 2 innodb_doublewrite = 1 ###New ADD innodb_rollback_on_timeout = 1 innodb_force_recovery=0 interactive_timeout = 28800 wait_timeout = 120 [galera] wsrep_provider=/usr/local/mysql/lib/galera/libgalera_smm.so # Galera library 安装包自带有 wsrep_cluster_address="gcomm://192.168.40.50,192.168.40.60,192.168.40.70" # Galera cluster 集群地址 wsrep_cluster_name='Galera_Cluster' # Galera集群名称 wsrep_node_address='192.168.40.50' # 该节点的地址 wsrep_node_name='MariaDB-Node1' # 该节点的主机名 wsrep_slave_threads = 8 #开启并行复制线程,根据CPU核数设置 wsrep_log_conflicts=1 wsrep_on=ON #开启全同步复制模式 wsrep_provider_options="gcache.size=2G" #同步复制缓冲池 wsrep_forced_binlog_format=ROW wsrep_drupal_282555_workaround=1 wsrep_max_ws_size=2147483647 wsrep_sst_method=rsync #拷贝模式 wsrep_sst_auth=galera:galera #galera集群认证用户:密码 query_cache_size=0 #关闭查询缓存 default_storage_engine=InnoDB #Mariadb存储引擎 #wsrep_dirty_reads=0 #当值是1的时候这个节点是只读节点 #wsrep_sst_method=[mysqldump|xtrabackup|xtrabackup-v2|rsync|rsync_wan] #wsrep_sst_method=xtrabackup-v2 [mysqldump] quick max_allowed_packet = 16M [myisamchk] key_buffer_size = 512M sort_buffer_size = 8M read_buffer = 4M write_buffer = 4M [root@MariaDB-Node1 ~]#
192.168.40.60配置文件修改:
[root@MariaDB-Node2 ~]# vim /etc/my.cnf [client] port = 3306 socket = /tmp/mysql.sock [mysqld] user = mysql datadir = /data/mysql/data pid-file = /data/mysql/mysql.pid server-id = 60 relay_log =/data/logs/mysql/mysql_relay.log bind-address = 0.0.0.0 tmpdir=/data/tmp init_connect ='SET NAMES utf8' character-set-server = utf8 skip-name-resolve back_log = 300 max_connections = 8019 max_connect_errors = 1024000 open_files_limit = 65535 table_open_cache = 2048 max_allowed_packet = 50M max_heap_table_size = 512M tmp_table_size = 256M read_buffer_size = 2M read_rnd_buffer_size = 16M sort_buffer_size = 8M join_buffer_size = 8M key_buffer_size = 512M thread_cache_size = 128 query_cache_type = 2 query_cache_size = 128M query_cache_limit = 5M thread_stack = 192k ft_min_word_len = 4 log_bin = /data/mysql/mysql_bin.log binlog_format = ROW expire_logs_days = 7 log_error = /data/logs/mysql/mysql_error.log slow_query_log = 1 long_query_time = 1 log_slow_verbosity=query_plan slow_query_log_file = /data/logs/mysql/mysql_slow.log performance_schema = 0 skip-external-locking #跳过外部锁定,避免external locking bulk_insert_buffer_size = 64M myisam_sort_buffer_size = 128M myisam_max_sort_file_size = 10G myisam_repair_threads = 1 default_storage_engine = InnoDB innodb_file_per_table = 1 innodb_open_files = 500 innodb_buffer_pool_size = 2048M innodb_write_io_threads = 8 innodb_read_io_threads = 8 innodb_thread_concurrency = 16 innodb_purge_threads = 1 innodb_flush_log_at_trx_commit = 1 innodb_log_buffer_size = 8M innodb_log_file_size = 256M innodb_log_files_in_group = 3 innodb_max_dirty_pages_pct = 90 innodb_lock_wait_timeout = 10 innodb_autoinc_lock_mode = 2 innodb_doublewrite = 1 innodb_rollback_on_timeout = 1 innodb_force_recovery=0 interactive_timeout = 28800 wait_timeout = 120 [galera] wsrep_provider=/usr/local/mysql/lib/galera/libgalera_smm.so wsrep_cluster_address="gcomm://192.168.40.50,192.168.40.60,192.168.40.70" wsrep_cluster_name='Galera_Cluster' wsrep_node_address='192.168.40.60' wsrep_node_name='MariaDB-Node2' wsrep_slave_threads = 8 #开启并行复制线程,根据CPU核数设置 wsrep_log_conflicts=1 wsrep_on=ON #开启全同步复制模式 wsrep_provider_options="gcache.size=2G" #同步复制缓冲池 wsrep_forced_binlog_format=ROW wsrep_drupal_282555_workaround=1 wsrep_max_ws_size=2147483647 wsrep_sst_method=rsync #拷贝模式 wsrep_sst_auth=galera:galera #galera集群认证用户:密码 query_cache_size=0 #关闭查询缓存 default_storage_engine=InnoDB #Mariadb存储引擎 #wsrep_dirty_reads=0 #当值是1的时候这个节点是只读节点 #wsrep_sst_method=[mysqldump|xtrabackup|xtrabackup-v2|rsync|rsync_wan] #wsrep_sst_method=xtrabackup-v2 [mysqldump] quick max_allowed_packet = 16M [myisamchk] key_buffer_size = 512M sort_buffer_size = 8M read_buffer = 4M write_buffer = 4M [root@MariaDB-Node2 ~]#
192.168.40.70配置文件修改:
[root@MariaDB-Node3 ~]# vim /etc/my.cnf [client] port = 3306 socket = /tmp/mysql.sock [mysqld] user = mysql datadir = /data/mysql/data pid-file = /data/mysql/mysql.pid server-id = 70 relay_log =/data/logs/mysql/mysql_relay.log bind-address = 0.0.0.0 tmpdir=/data/tmp init_connect ='SET NAMES utf8' character-set-server = utf8 skip-name-resolve back_log = 300 max_connections = 8019 max_connect_errors = 1024000 open_files_limit = 65535 table_open_cache = 2048 max_allowed_packet = 50M max_heap_table_size = 512M tmp_table_size = 256M read_buffer_size = 2M read_rnd_buffer_size = 16M sort_buffer_size = 8M join_buffer_size = 8M key_buffer_size = 512M thread_cache_size = 128 query_cache_type = 2 query_cache_size = 128M query_cache_limit = 5M thread_stack = 192k ft_min_word_len = 4 log_bin = /data/mysql/mysql_bin.log binlog_format = ROW expire_logs_days = 7 log_error = /data/logs/mysql/mysql_error.log slow_query_log = 1 long_query_time = 1 log_slow_verbosity=query_plan slow_query_log_file = /data/logs/mysql/mysql_slow.log performance_schema = 0 skip-external-locking #跳过外部锁定,避免external locking bulk_insert_buffer_size = 64M myisam_sort_buffer_size = 128M myisam_max_sort_file_size = 10G myisam_repair_threads = 1 default_storage_engine = InnoDB innodb_file_per_table = 1 innodb_open_files = 500 innodb_buffer_pool_size = 2048M innodb_write_io_threads = 8 innodb_read_io_threads = 8 innodb_thread_concurrency = 16 innodb_purge_threads = 1 innodb_flush_log_at_trx_commit = 1 innodb_log_buffer_size = 8M innodb_log_file_size = 256M innodb_log_files_in_group = 3 innodb_max_dirty_pages_pct = 90 innodb_lock_wait_timeout = 10 innodb_autoinc_lock_mode = 2 innodb_doublewrite = 1 innodb_rollback_on_timeout = 1 innodb_force_recovery=0 interactive_timeout = 28800 wait_timeout = 120 [galera] wsrep_provider=/usr/local/mysql/lib/galera/libgalera_smm.so wsrep_cluster_address="gcomm://192.168.40.50,192.168.40.60,192.168.40.70" wsrep_cluster_name='Galera_Cluster' wsrep_node_address='192.168.40.70' wsrep_node_name='MariaDB-Node3' wsrep_slave_threads = 8 #开启并行复制线程,根据CPU核数设置 wsrep_log_conflicts=1 wsrep_on=ON #开启全同步复制模式 wsrep_provider_options="gcache.size=2G" #同步复制缓冲池 wsrep_forced_binlog_format=ROW wsrep_drupal_282555_workaround=1 wsrep_max_ws_size=2147483647 wsrep_sst_method=rsync #拷贝模式 wsrep_sst_auth=galera:galera #galera集群认证用户:密码 query_cache_size=0 #关闭查询缓存 default_storage_engine=InnoDB #Mariadb存储引擎 #wsrep_dirty_reads=0 #当值是1的时候这个节点是只读节点 #wsrep_sst_method=[mysqldump|xtrabackup|xtrabackup-v2|rsync|rsync_wan] #wsrep_sst_method=xtrabackup-v2 [mysqldump] quick max_allowed_packet = 16M [myisamchk] key_buffer_size = 512M sort_buffer_size = 8M read_buffer = 4M write_buffer = 4M [root@MariaDB-Node3 ~]#
有几个参数需要修改的:
server-id = 50 wsrep_node_address='192.168.40.50' wsrep_node_name='MariaDB-Node1'
修改对应的 server ID 、IP地址 和 主机名,其他的复制第一份到别的机器修改就可以
初始化集群
在MariaDB一个节点初始化安装 192.168.40.50 (其他节点不用操作)
[root@MariaDB-Node1 ~]# /usr/local/mysql/scripts/mysql_install_db --defaults-file=/etc/my.cnf --user=mysql
在 192.168.40.50 节点上通过bootstrap启动(第一次启动一定要使用 –wsrep-new-cluster,再次启动就不需要)
[root@MariaDB-Node1 ~]# mysqld_safe --defaults-file=/etc/my.cnf --user=mysql --wsrep-new-cluster &
其他节点操作
初始化节点(192.168.40.50)操作启动之后,别的主机只要正常启动就可以了
192.168.40.60操作:
[root@MariaDB-Node2 ~]# /etc/init.d/mysql start
192.168.40.70操作:
[root@MariaDB-Node3 ~]# /etc/init.d/mysql start
192.168.40.50操作:
初始化节点等待其他节点正常启动之后,再关闭带参数启动数据库,再已正常方式启动
[root@MariaDB-Node1 ~]# ps -ef|grep mysql [root@MariaDB-Node1 ~]# kill 8316 [root@MariaDB-Node1 ~]# ps -ef|grep mysql [root@MariaDB-Node1 ~]# /etc/init.d/mysql start
查看集群状态
[root@MariaDB-Node1 ~]# mysql -uroot -p Enter password: Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 17 Server version: 10.3.27-MariaDB-log MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> MariaDB [(none)]> MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.001 sec) MariaDB [(none)]> show global status like 'ws%'; +-------------------------------+----------------------------------------------------------+ | Variable_name | Value | +-------------------------------+----------------------------------------------------------+ | wsrep_applier_thread_count | 8 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 0.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 0.000000 | | wsrep_cert_index_size | 0 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 5 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | a4d25ee0-515c-11eb-b451-66b7a1b050e2 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 0.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0.000308077/0.00394484/0.0110307/0.00501106/3 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_active | false | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_requested | false | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | 4fd60274-523f-11eb-b7dd-476f9c9f0398 | | wsrep_gmcast_segment | 0 | | wsrep_incoming_addresses | 192.168.40.60:3306,192.168.40.70:3306,192.168.40.50:3306 | | wsrep_last_committed | 0 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 18446744073709551615 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 0 | | wsrep_local_index | 2 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | a4d25ee0-515c-11eb-b451-66b7a1b050e2 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.31(r0ede97d) | | wsrep_ready | ON | | wsrep_received | 2 | | wsrep_received_bytes | 306 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 9 | +-------------------------------+----------------------------------------------------------+ 66 rows in set (0.001 sec) MariaDB [(none)]>
可以看到集群正常使用有三个节点
注释:
wsrep_cluster_status为Primary,表示节点为主节点,正常读写
wsrep_ready为ON,表示集群正常运行
wsrep_cluster_size为 3,表示集群有三个节点
遇到问题
如果数据库 全部关机 了,再次启动可以会遇到如下报错
[root@MariaDB-Node1 ~]# cd /data/logs/mysql/ [root@MariaDB-Node1 /data/logs/mysql]# tail -100f mysql_error.log 2021-01-09 13:49:33 0 [Note] WSREP: Read nil XID from storage engines, skipping position init 2021-01-09 13:49:33 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/local/mysql/lib/galera/libgalera_smm.so' 2021-01-09 13:49:33 0 [Note] WSREP: wsrep_load(): Galera 25.3.31(r0ede97d) by Codership Oy <info@codership.com> loaded successfully. 2021-01-09 13:49:33 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration. 2021-01-09 13:49:33 0 [Note] WSREP: Found saved state: a4d25ee0-515c-11eb-b451-66b7a1b050e2:0, safe_to_bootstrap: 0 2021-01-09 13:49:33 0 [Note] WSREP: Passing config to GCS: base_dir = /data/mysql/data/; base_host = 192.168.40.50; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = 2021-01-09 13:49:33 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1 2021-01-09 13:49:33 0 [Note] WSREP: wsrep_sst_grab() 2021-01-09 13:49:33 0 [Note] WSREP: Start replication 2021-01-09 13:49:33 0 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster 2021-01-09 13:49:33 0 [Note] WSREP: Setting initial position to a4d25ee0-515c-11eb-b451-66b7a1b050e2:0 2021-01-09 13:49:33 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 . 2021-01-09 13:49:33 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.40.50,192.168.40.60,192.168.40.70) failed: 7 2021-01-09 13:49:33 0 [ERROR] Aborting
解决办法
[root@MariaDB-Node1 /data/logs/mysql]# cd /data/mysql/data/ [root@MariaDB-Node1 /data/mysql/data]# cat grastate.dat # GALERA saved state version: 2.1 uuid: a4d25ee0-515c-11eb-b451-66b7a1b050e2 seqno: -1 safe_to_bootstrap: 0 [root@MariaDB-Node1 /data/mysql/data]# 把 safe_to_bootstrap: 0 修改为 1 [root@MariaDB-Node1 /data/mysql/data]# vim grastate.dat # GALERA saved state version: 2.1 uuid: a4d25ee0-515c-11eb-b451-66b7a1b050e2 seqno: -1 safe_to_bootstrap: 1 [root@MariaDB-Node1 /data/mysql/data]# mysqld_safe --defaults-file=/etc/my.cnf --user=mysql --wsrep-new-cluster &
把 safe_to_bootstrap: 0 修改为 1,然后再用带参数 启动就可以,其他节点使用正常方法启动
mysql galera 集群常见问题处理
一、mysql HA集群在断网过久或者所有节点都down了之后的恢复有以下的方法:
解决方案1:
1、等三台机器恢复网络通讯后,因为此时的mysql已经异常无法加入集群,因此需要先保证所有的mysql都是down的,再上台执行 /usr/local/mysql/bin/mysqld –wsrep-new-cluster –wsrep-cluster-address=’gcomm://’ & 这条命令,并进入mysql(只有一台机器能够成功执行,其他机器执行了过几秒钟都会异常退出这个进程,我们这里把能够成功执行的机器称为master)
2、此时三台只有一台能够成功进入mysql(即执行mysql这条命令),在非master上的两台上一台一台的执行/etc/init.d/mysql start,必须等一台成功了,另一台才能执行
3、在mysql中执行show status like “wsrep%”;
我们需要保证的第一项为synced,以及第二项必须为三个mysql的ip
4、保证3的结果是想要的说明集群已经恢复了,此时需要将master机器上面的 /usr/local/mysql/bin/mysqld –wsrep-new-cluster –wsrep-cluster-address=’gcomm://’这个进程kill掉,然后再执行 /etc/init.d/mysql start 即可
二、mysql HA集群某个节点无故down了并且有一段时间处于down的情况通过以下方式恢复:
1、 若日志里面出现以下日志
[Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (eb9f50c6-bc95-11e5-a735-9f48e437dc03): 1 (Operation not permitted)
解决方法:删除/var/lib/mysql/grastate.dat 文件(若还存在无法同步的情况则删除galera.cache文件)
2、 若那个down了的节点出现以下日志
(异常情况集群挂了)[ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with –tc-heuristic-recover switch to commit or rollback pending transactions
解决方法:
1、/usr/local/mysql/bin/mysqld start –innodb_force_recovery=6
- (SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页
- (SRV_FORCE_NO_BACKGROUND):阻止主线程的运行,如主线程需要执行full purge操作,会导致crash
- (SRV_FORCE_NO_TRX_UNDO):不执行事务回滚操作
- (SRV_FORCE_NO_IBUF_MERGE):不执行插入缓冲的合并操作
- (SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交
- (SRV_FORCE_NO_LOG_REDO):不执行前滚的操作
如果配置后出现以下情况:
130507 14:14:01 InnoDB: Waiting for the background threads to start
130507 14:14:02 InnoDB: Waiting for the background threads to start
130507 14:14:03 InnoDB: Waiting for the background threads to start
130507 14:14:04 InnoDB: Waiting for the background threads to start
130507 14:14:05 InnoDB: Waiting for the background threads to start
130507 14:14:06 InnoDB: Waiting for the background threads to start
130507 14:14:07 InnoDB: Waiting for the background threads to start
130507 14:14:08 InnoDB: Waiting for the background threads to start
130507 14:14:09 InnoDB: Waiting for the background threads to start
需要在galera.cfg中添加这一下:
如果在设置 innodb_force_recovery >2 的同时innodb_purge_thread = 0
2、mysqld –tc-heuristic-recover=ROLLBACK
3、删除/var/lib/mysql/ib_logfile*
4、当某个mysql节点挂了,并且存在三个mysql所在host有不同的网段,当mysql想重新加入需要一个sst的过程,sst时会需要知道集群中某个节点的ip因此需要制定参数–wsrep-sst-receive-address否则可能出现同步的ip不在三台机器所共有的网段
三、一个mysql节点若down了一段时间。重新启动的时候需要一些时间去同步数据,服务的启动超时时间不够,导致服务无法启动,解决方法如下:
The correct way to adjust systemd settings so they don’t get overwritten is to create a directory and file as such:
/etc/systemd/system/mariadb.service.d/timeout.conf
[Service]
TimeoutStartSec=12min
或者直接修改/usr/lib/systemd/system/mariadb.service
[Service]
TimeoutStartSec=12min
这里的时间最少要大于90s,默认是90s之后执行 systemctl daemon-reload再重启服务即可
四、日志中出现类似如下错误:
160428 13:54:49 [ERROR] Slave SQL: Error ‘Table ‘manage_operations’ already exists’ on query. Default database: ‘horizon’. Query: ‘CREATE TABLE `manage_operations` (
`id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
`name` varchar(50) NOT NULL,
`type` varchar(20) NOT NULL,
`operation` varchar(20) NOT NULL,
`status` varchar(20) NOT NULL,
`time` date NOT NULL,
`operator` varchar(50) NOT NULL
) default charset=utf8′, Error_code: 1050
160428 13:54:49 [Warning] WSREP: RBR event 1 Query apply warning: 1, 28585
160428 13:54:49 [Warning] WSREP: Ignoring error for TO isolated action: source: 752eecd1-0ce0-11e6-83fc-3e0502d0bdd2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 24053 trx_id: -1 seqnos (l: 28668, g: 28585, s: 28584, d: 28584, ts: 80224119986850)
导致进程异常关闭,
此时可以通过执行mysqladmin flush-tables来刷新表项,这个问题的原因是三个节点之间的表同步存在问题,刷新一下表即可
五、日志出现以下错误:
160820 3:13:41 [ERROR] Error in accept: Too many open files
160820 3:19:42 [ERROR] Error in accept: Too many open files
160827 3:16:24 [ERROR] Error in accept: Too many open files
160831 17:20:52 [ERROR] Error in accept: Too many open files
160831 19:54:29 [ERROR] Error in accept: Too many open files
160831 20:21:53 [ERROR] Error in accept: Too many open files
160901 11:25:57 [ERROR] Error in accept: Too many open files
解决方法
vim /usr/lib/systemd/system/mariadb.service
[Service]
LimitNOFILE=10000
默认的mysql的open_file_limits是1024将该项增大,并且修改vim /etc/my.cnf.d/server.cnf该文件的open_files_limit值
systemctl daemon-reload
systemctl restart mysqld
查看mysql的open_file_limits值是否调整成功
cat /proc/$pid/limit
其中$pid为mysql进程的pid看看值是否调整成功,并看看日志是否还会出现上述错误
总结一下:
- 当所有节点都宕机的情况下,再次启动节点,Mariadb-Galera集群会出现无法启动的现象
- Mariadb-Galera集群启动是有顺序的,遵循一个原则:最后宕机的最先启动,因为集群认为这个节点的数据是最新的
- 在生产环境下应该避免使用大事务,不建议在高并发写入场景下使用Galera Cluster架构,会导致集群限流,从而引起整个集群hang住,出现生产故障。针对这种情况可以考虑主从,实现读写分离等手段
- 对数据一致性要求较高,并且数据写入不频繁,数据库容量也不大(50GB左右),网络状况良好的情况下,可以考虑使用Galera方案
本文分享完毕,感谢支持点赞~~