在Azure云上实现postgres主备切换

  • 2019 年 10 月 3 日
  • 笔记

  以下是工作上实现postgres主备切换功能所用到的代码和步骤,中间走了不少弯路,在此记录下。所用到的操作系统为centos 7.5,安装了两台服务器,hostname为VM7的为Master,VM8则为Slave。

 

  1、安装pg10

  vm7(Mater),vm8(Slave)均需安装:

[[email protected] ~]# yum install –y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm   [[email protected] ~]# yum install postgresql10 -y  [[email protected]-vm7 ~]# yum install postgresql10-server -y  [[email protected]-vm7 ~]# systemctl enable postgresql-10  [[email protected]-vm7 ~]# /usr/pgsql-10/bin/postgresql-10-setup initdb  [[email protected]-vm7 ~]# systemctl start postgresql-10  [[email protected]-vm7 ~]# systemctl status postgresql-10

 

  其他环境设置

  vm7(Mater),vm8(Slave)均需安装:

# hosts  [[email protected]-vm7 ~]# vi /etc/hosts  ...  10.0.0.14 springcloud-vm7 vm7  10.0.0.15 springcloud-vm8 vm8    # disable selinux  [[email protected]-vm7 ~]# sed -i '7s/=.*$/=disabled/' /etc/selinux/config  [[email protected]-vm7 ~]# setenforce 0    # set timezone  [[email protected]-vm7 ~]# timedatectl set-timezone Asia/Shanghai  # postgres⽤用户环境  [[email protected]-vm7 ~]# su - postgres  -bash-4.2$ vi ~/.bash_profile  。。。  PATH=$PATH:/usr/pgsql-10/bin;export PATH  -bash-4.2$ . ~/.bash_profile    #注意:~/.bash_profile改后为:  PGDATA=/usr/local/pgsql/data  PATH=/usr/local/pgsql/bin:$PATH  export PGDATA PATH

 

  安装nginx充当给azure做探测的服务(probe):

[[email protected] ~]# yum install -y epel-release  [[email protected]-vm7 ~]# yum install -y nginx  [[email protected]-vm7 ~]# vi /etc/nginx/nginx.conf  ...  listen 5999 default_server;   #修改监听端口在5999上

 

  由于Azure上LB的floatingIP限制,它会将流量量导向probe成功的后端机器器,并且只能有⼀一台probe成功。

  参考官⽅方例例⼦子,它是⽤用iptable来阻⽌止probe:https://github.com/Azure/azure-quickstart-templates/tree/master/haproxy-redundant-floatingip-ubuntu

 

  azureLSB设置:

  注意:负载均衡的probe端⼝口设置成5999,floatingIP设置启⽤用

 

  2、配置master

[[email protected] ~]# su – postgres    # 创建复制⽤用户  -bash-4.2$ psql -c "create role repl replication login password 'postgres'"    # 创建⼀一个slot  -bash-4.2$ psql -c "select pg_create_physical_replication_slot('slot_vm7')"    pg_create_physical_replication_slot  -------------------------------------  (slot_vm7,)  (1 row)    # 配置参数  -bash-4.2$ cd $PGDATA  -bash-4.2$ vi postgresql.conf  。。。  listen_addresses = '*'  archive_mode = on  archive_command = 'cp -n %p $PGDATA/arch/%f'  log_timezone = 'Asia/Shanghai'  timezone = 'Asia/Shanghai'  。。。  # 创建arch⽬目录  -bash-4.2$ mkdir $PGDATA/arch/    # 配置pg_hba.conf  -bash-4.2$ vi pg_hba.conf  。。。  host replication repl 10.0.0.0/24 md5    # 重启  -bash-4.2$ pg_ctl restart      注意:如果是首次安装,需要在防火墙中开放5432端口  #查看各端口网络连接情况  [[email protected]-vm7 ~]# netstat –na    #安装iptables防火墙  [[email protected]-vm7 ~]# yum install iptables-services    #编辑iptables防火墙配置  [[email protected]-vm7 ~]# vi /etc/sysconfig/iptables  。。。  -A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT

 

  3、配置slave

 [[email protected] ~]# su – postgres    # 直接⽤用repl⽤用户备份到$PGDATA⽬目录  -bash-4.2$ rm -rf /var/lib/pgsql/10/data  -bash-4.2$ /usr/pgsql-10/bin/pg_basebackup -R -Pv -h vm7 -U repl -D $PGDATA  Password: #postgres    pg_basebackup: initiating base backup, waiting for checkpoint to complete  pg_basebackup: checkpoint completed  pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1  pg_basebackup: starting background WAL receiver  24421/24421 kB (100%), 1/1 tablespace  pg_basebackup: write-ahead log end point: 0/20000F8  pg_basebackup: waiting for background process to finish streaming ...  pg_basebackup: base backup completed    # 修改recovery.conf⽂文件  -bash-4.2$ vi $PGDATA/recovery.conf    standby_mode = 'on'  primary_conninfo = 'user=repl password=postgres host=vm7 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'  primary_slot_name = 'slot_vm7'  restore_command = 'cp $PGDATA/arch/%f %p'  archive_cleanup_command = 'pg_archivecleanup $PGDATA/arch %r'  recovery_target_timeline = 'latest'    # 重启  -bash-4.2$ pg_ctl restart

 

  4、检查

#主库上检查下:  -bash-4.2$ psql -xc "select * from pg_stat_replication"  -[ RECORD 1 ]----+------------------------------  pid | 90512  usesysid | 16384  usename | repl  application_name | walreceiver  client_addr | 10.0.0.11  client_hostname |  client_port | 44464  backend_start | 2019-07-09 18:36:25.005202+08  backend_xmin |  state | streaming  sent_lsn | 0/F0004A0  write_lsn | 0/F0004A0  flush_lsn | 0/F0004A0  replay_lsn | 0/F0004A0  write_lag |  flush_lag |  replay_lag |  sync_priority | 0  sync_state | async  -bash-4.2$ pg_controldata  ...  -bash-4.2$ psql -xc "select * from pg_replication_slots"  ...    #被库上检查:  -bash-4.2$ psql -xc "select * from pg_stat_wal_receiver"  ...  -bash-4.2$ /usr/pgsql-10/bin/pg_controldata  ...  -bash-4.2$ psql -xc "select pg_is_in_recovery()"  -[ RECORD 1 ]-----+--  pg_is_in_recovery | t

 

  5、安装keepalived

  vm7(Mater),vm8(Slave)均需安装:

[[email protected] data]# yum install keepalived -y  [[email protected]-vm7 data]# systemctl enable keepalived  [[email protected]-vm7 data]# systemctl start keepalived  [[email protected]-vm7 data]# cd /etc/keepalived/  [[email protected]-vm7 ~]# vi /etc/keepalived/keepalived.conf    ! Configuration File for keepalived  global_defs {  notification_email {  [email protected]om  }  notification_email_from [email protected]  smtp_server 127.0.0.1  smtp_connect_timeout 30  router_id PG_HA     #主备库需要一致  }    vrrp_script chk_pg_alived {  script "/sbin/ss -ntlp4 | grep :5432 > /dev/null" # 探测端⼝口判断数据库存活,1分钟失败则认为失败  interval 10  weight 20  fall 6  }    vrrp_instance VI_1 {  state MASTER # 主库上填MASTER, 备库上为BACKUP  interface eth0  # 填写当前网卡名称,可以用IP Ad命令查看  virtual_router_id 61  priority 100 #备库的优先级设为90  advert_int 3  !nopreempt  preempt_delay 60  unicast_src_ip 10.0.0.14 # 云主机只能使⽤用单播⽅方式,这⾥里里填本机ip  unicast_peer {  10.0.0.15 #另⼀台ip  }    authentication {  auth_type PASS  auth_pass 1111  }    virtual_ipaddress {  139.217.92.247  #虚拟ip  }    track_script {  chk_pg_alived  }    notify_master "/etc/keepalived/master.sh"  notify_backup "/etc/keepalived/backup.sh"  }    # 切换成master时会执⾏行行的脚本,通过判断数据库状态决定是否promote  [[email protected]-vm7 keepalived]# vi master.sh    #!/bin/bash  export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin    # allow probe from azure  systemctl restart nginx  dbstate=`su - postgres -c "psql -Atc 'select pg_is_in_recovery()'"`  if [ $dbstate != "t" ]; then  exit 0  fi    # promote the slave to master  su - postgres -c "/usr/pgsql-10/bin/pg_ctl promote"  sleep 5    echo "select pg_create_physical_replication_slot('slot_vm7')" | su – postgres -c "psql" #注意slot的名字主备库要相应修改⼀⼀对应(Slave中改为slot_vm8)      #保存后修改脚本文件权限  [[email protected]-vm7 keepalived]# chmod 777 master.sh    #切换成slave时会执⾏行行的脚本  [[email protected]-vm7 keepalived]# vi backup.sh    #!/bin/bash  export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin  # block probe from azure  systemctl stop nginx  # check pg state  dbstate=`su - postgres -c "psql -Atc 'select pg_is_in_recovery()'"`  if [ $dbstate = "t" ]; then  exit 0  fi  # change master to slave  if [ ! -f /var/lib/pgsql/10/data/recovery.conf ] ; then  cat > /var/lib/pgsql/10/data/recovery.conf << EOF  standby_mode = 'on'  primary_conninfo = 'user=repl password=postgres host=vm8 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'  #注意slot的名字主备库要相应修改⼀⼀对应(Slave中改为vm7)  primary_slot_name = 'slot_vm8' #注意slot的名字主备库要相应修改⼀⼀对应(Slave中改为slot_vm7)  restore_command = 'cp $PGDATA/arch/%f %p'  archive_cleanup_command = 'pg_archivecleanup $PGDATA/arch %r'  recovery_target_timeline = 'latest'  EOF  fi  sleep 60  su - postgres -c "/usr/pgsql-10/bin/pg_ctl stop"  systemctl start postgresql-10    #注意slot的名字主备库要相应修改⼀⼀对应    #保存后修改脚本文件权限  [[email protected]-vm7 keepalived]# chmod 777 backup.sh

 

  6、测试

  6.1、  关闭Master服务器

 

 

  6.2、 在Slave服务器中监控keepalived:  journalctl -f -u keepalived

 

  6.3、  在Slave服务器中查询postgres日志:tail -fn20 /var/lib/pgsql/10/data/log/postgresql-Tue.log

 

  6.4、 在Slave服务器中监控IP是否发生了漂移: ip a

 

 

  6.5、  在Slave服务器中查看主备状态:  psql -xc “select pg_is_in_recovery()”

  

 

  6.6、在Slave服务器中创建新表:psql -c ‘create table t2 (id integer)’

 

 

  6.7、  Master服务器启动后再看主备状态:psql -xc “select pg_is_in_recovery()”