OpenStack HA Cluster, Part 3: Pacemaker
Hostnames must resolve between the nodes:

[root@controller1 ~]# cat /etc/hosts
192.168.17.149 controller1
192.168.17.141 controller2
192.168.17.166 controller3
192.168.17.111 demo.open-stack.cn

The nodes must trust each other so that passwordless SSH login works:

[root@controller1 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
20:79:d4:a4:9f:8b:75:cf:12:58:f4:47:a4:c1:29:f3 root@controller1
The key's randomart image is:
+--[ RSA 2048]----+
|  .o.  ...oo     |
|  o ...o.o+      |
| o + .+o .       |
| o o + E.        |
|     S o         |
|    o o +        |
|   . . . o       |
|        .        |
|                 |
+-----------------+
[root@controller1 ~]# ssh-copy-id controller2
[root@controller1 ~]# ssh-copy-id controller3

Configure the YUM repository:

# vim /etc/yum.repos.d/ha-clustering.repo
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS-7)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/
gpgcheck=0
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/repodata/repomd.xml.key
enabled=1

This repository may conflict with the base repositories, so set enabled=0 first; if the crmsh package is the only one left to install, switch it back to enabled=1 and install it.

Corosync download location (the latest version is currently 2.4.2):
http://build.clusterlabs.org/corosync/releases/
http://build.clusterlabs.org/corosync/releases/corosync-2.4.2.tar.gz

[root@controller1 ~]# ansible controller -m copy -a "src=/etc/yum.repos.d/ha-clustering.repo dest=/etc/yum.repos.d/"

Install the packages:

# yum install -y pacemaker pcs corosync resource-agents fence-agents-all lvm2 cifs-utils quota psmisc
# yum install -y crmsh

Enable and start pcsd, and confirm it is running properly:

# systemctl enable pcsd
# systemctl enable corosync
# systemctl start pcsd
# systemctl status pcsd

[root@controller2 ~]# pacemakerd --version
Pacemaker 1.1.15-11.el7_3.2
Written by Andrew Beekhof
[root@controller1 ~]# ansible controller -m command -a "pacemakerd --version"

Set the hacluster password (on all nodes):

# echo zoomtech | passwd --stdin hacluster
[root@controller1 ~]# ansible controller -m shell -a "echo zoomtech | passwd --stdin hacluster"
# passwd hacluster

Edit corosync.conf:

[root@controller3 ~]# vim /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: openstack-cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: controller1
        nodeid: 1
    }
    node {
        ring0_addr: controller2
        nodeid: 2
    }
    node {
        ring0_addr: controller3
        nodeid: 3
    }
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
quorum {
    provider: corosync_votequorum
}

[root@controller1 ~]# scp /etc/corosync/corosync.conf controller2:/etc/corosync/
[root@controller1 ~]# scp /etc/corosync/corosync.conf controller3:/etc/corosync/
[root@controller1 corosync]# ansible controller -m copy -a "src=corosync.conf dest=/etc/corosync"

Create the cluster

Use pcs to authenticate the cluster nodes:

[root@controller1 ~]# pcs cluster auth controller1 controller2 controller3 -u hacluster -p zoomtech --force
controller3: Authorized
controller2: Authorized
controller1: Authorized
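If a node is not authorized here, it is usually because pcsd on that node is unreachable or the hacluster password differs. Below is a minimal connectivity-check sketch, not part of the original procedure: it assumes the node names from /etc/hosts above and relies only on bash and coreutils; TCP 2224 is the port pcsd listens on.

# Check, from controller1, that every node resolves and that pcsd answers on TCP 2224.
for node in controller1 controller2 controller3; do
    getent hosts "$node" >/dev/null || echo "$node: hostname does not resolve"
    if timeout 3 bash -c "cat < /dev/null > /dev/tcp/$node/2224" 2>/dev/null; then
        echo "$node: pcsd reachable on TCP 2224"
    else
        echo "$node: pcsd NOT reachable on TCP 2224"
    fi
done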
Now create the cluster and add the nodes. Note that the cluster name must not exceed 15 characters.

[root@controller1 ~]# pcs cluster setup --force --name openstack-cluster controller1 controller2 controller3
Destroying cluster on nodes: controller1, controller2, controller3...
controller3: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Stopping Cluster (pacemaker)...
controller2: Successfully destroyed cluster
controller1: Successfully destroyed cluster
controller3: Successfully destroyed cluster
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
controller3: Succeeded
Synchronizing pcsd certificates on nodes controller1, controller2, controller3...
controller3: Success
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller3: Success
controller2: Success
controller1: Success

Start the cluster:

[root@controller1 ~]# pcs cluster enable --all
controller1: Cluster Enabled
controller2: Cluster Enabled
controller3: Cluster Enabled
[root@controller1 ~]# pcs cluster start --all
controller2: Starting Cluster...
controller1: Starting Cluster...
controller3: Starting Cluster...

Check the cluster status:

[root@controller1 corosync]# ansible controller -m command -a "pcs cluster status"
[root@controller1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: controller3 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
 Last updated: Fri Feb 17 10:39:38 2017
 Last change: Fri Feb 17 10:39:29 2017 by hacluster via crmd on controller3
 3 nodes and 0 resources configured
PCSD Status:
  controller2: Online
  controller3: Online
  controller1: Online

[root@controller1 corosync]# ansible controller -m command -a "pcs status"
[root@controller1 ~]# pcs status
Cluster name: openstack-cluster
Stack: corosync
Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Mar 2 17:07:34 2017
Last change: Thu Mar 2 01:44:44 2017 by root via cibadmin on controller1
3 nodes and 1 resource configured
Online: [ controller1 controller2 controller3 ]
Full list of resources:
 vip (ocf::heartbeat:IPaddr2): Started controller2
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Check the cluster status with crm_mon:

[root@controller1 corosync]# ansible controller -m command -a "crm_mon -1"
[root@controller1 ~]# crm_mon -1
Stack: corosync
Current DC: controller2 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Mar 1 17:54:04 2017
Last change: Wed Mar 1 17:44:38 2017 by root via cibadmin on controller1
3 nodes and 1 resource configured
Online: [ controller1 controller2 controller3 ]
Active resources:
 vip (ocf::heartbeat:IPaddr2): Started controller1

Check the Pacemaker processes:

[root@controller1 ~]# ps aux | grep pacemaker
root     75900  0.2  0.5 132632  9216 ?     Ss 10:39 0:00 /usr/sbin/pacemakerd -f
haclust+ 75901  0.3  0.8 135268 15376 ?     Ss 10:39 0:00 /usr/libexec/pacemaker/cib
root     75902  0.1  0.4 135608  7920 ?     Ss 10:39 0:00 /usr/libexec/pacemaker/stonithd
root     75903  0.0  0.2 105092  5020 ?     Ss 10:39 0:00 /usr/libexec/pacemaker/lrmd
haclust+ 75904  0.0  0.4 126924  7636 ?     Ss 10:39 0:00 /usr/libexec/pacemaker/attrd
haclust+ 75905  0.0  0.2 117040  4560 ?     Ss 10:39 0:00 /usr/libexec/pacemaker/pengine
haclust+ 75906  0.1  0.5 145328  8988 ?     Ss 10:39 0:00 /usr/libexec/pacemaker/crmd
root     75997  0.0  0.0 112648   948 pts/0 R+ 10:40 0:00 grep --color=auto pacemaker
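Beyond reading the process list on one node, it helps to confirm that the three cluster services are active everywhere. A small sketch (not from the original procedure) using the passwordless SSH set up earlier and the service names as packaged on CentOS 7:

# Ask each node whether corosync, pacemaker and pcsd are active.
for node in controller1 controller2 controller3; do
    for svc in corosync pacemaker pcsd; do
        printf '%s / %s: ' "$node" "$svc"
        ssh "$node" systemctl is-active "$svc"
    done
done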
Check the Corosync ring status on each node:

[root@controller1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
 id     = 192.168.17.132
 status = ring 0 active with no faults
[root@controller2 corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
 id     = 192.168.17.146
 status = ring 0 active with no faults
[root@controller3 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 3
RING ID 0
 id     = 192.168.17.138
 status = ring 0 active with no faults

[root@controller1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.17.132)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.17.146)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.17.138)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined

Check the Corosync membership on each node:

[root@controller1 ~]# pcs status corosync
Membership information
----------------------
    Nodeid Votes Name
         1     1 controller1 (local)
         3     1 controller3
         2     1 controller2
[root@controller2 corosync]# pcs status corosync
Membership information
----------------------
    Nodeid Votes Name
         1     1 controller1
         3     1 controller3
         2     1 controller2 (local)
[root@controller3 ~]# pcs status corosync
Membership information
----------------------
    Nodeid Votes Name
         1     1 controller1
         3     1 controller3 (local)
         2     1 controller2

Verify the configuration. Because no STONITH resources have been defined yet, crm_verify reports errors, so disable STONITH and relax the quorum policy for now:

[root@controller1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@controller1 ~]# pcs property set stonith-enabled=false
[root@controller1 ~]# pcs property set no-quorum-policy=ignore
[root@controller1 ~]# crm_verify -L -V
[root@controller1 corosync]# ansible controller -m command -a "pcs property set stonith-enabled=false"
[root@controller1 corosync]# ansible controller -m command -a "pcs property set no-quorum-policy=ignore"
[root@controller1 corosync]# ansible controller -m command -a "crm_verify -L -V"

Configure the VIP:

[root@controller1 ~]# crm
crm(live)# configure
crm(live)configure# show
node 1: controller1
node 2: controller2
node 3: controller3
property cib-bootstrap-options: have-watchdog=false dc-version=1.1.15-11.el7_3.2-e174ec8 cluster-infrastructure=corosync cluster-name=openstack-cluster stonith-enabled=false no-quorum-policy=ignore
crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.17.111 cidr_netmask=24 nic=ens37 op start interval=0s timeout=20s op stop interval=0s timeout=20s op monitor interval=30s meta priority=100
crm(live)configure# show
node 1: controller1
node 2: controller2
node 3: controller3
primitive vip IPaddr2 params ip=192.168.17.111 cidr_netmask=24 nic=ens37 op start interval=0s timeout=20s op stop interval=0s timeout=20s op monitor interval=30s meta priority=100
property cib-bootstrap-options: have-watchdog=false dc-version=1.1.15-11.el7_3.2-e174ec8 cluster-infrastructure=corosync cluster-name=openstack-cluster stonith-enabled=false no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# exit
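The same VIP resource can also be created with pcs instead of crmsh. A sketch of the equivalent command, assuming the resource does not already exist in the CIB:

# Equivalent pcs command for the vip resource created above with crmsh.
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=192.168.17.111 cidr_netmask=24 nic=ens37 \
    op monitor interval=30s \
    meta priority=100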
Verify that the VIP is now bound to the ens37 interface:

[root@controller1 ~]# ip a
4: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:ff:8b:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.17.141/24 brd 192.168.17.255 scope global dynamic ens37
       valid_lft 2388741sec preferred_lft 2388741sec
    inet 192.168.17.111/24 brd 192.168.17.255 scope global secondary ens37
       valid_lft forever preferred_lft forever

The NIC name specified above must be identical on all three nodes; otherwise the VIP cannot fail over and will not move to another node.

[root@controller1 ~]# crm status
Stack: corosync
Current DC: controller1 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Wed Feb 22 11:42:07 2017
Last change: Wed Feb 22 11:22:56 2017 by root via cibadmin on controller1
3 nodes and 1 resource configured
Online: [ controller1 controller2 controller3 ]
Full list of resources:
 vip (ocf::heartbeat:IPaddr2): Started controller1
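To confirm that failover actually works, the node currently holding the VIP can be put into standby and the resource watched as it moves. A sketch using pcs, assuming controller1 currently holds the VIP:

# Put controller1 in standby; the vip resource should move to another node.
pcs cluster standby controller1
crm_mon -1        # vip should now be "Started" on controller2 or controller3
# Bring controller1 back online afterwards.
pcs cluster unstandby controller1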
Check that the Corosync engine started correctly:

[root@controller1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
[51405] controller1 corosyncnotice [MAIN ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
Mar 01 17:35:20 [51425] controller1 cib: info: retrieveCib: Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)
Mar 01 17:35:20 [51425] controller1 cib: warning: cib_file_read_and_verify: Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
Mar 01 17:35:20 [51425] controller1 cib: warning: cib_file_read_and_verify: Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
Mar 01 17:35:20 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.Apziws (digest: /var/lib/pacemaker/cib/cib.0ZxsVW)
Mar 01 17:35:21 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.ObYehI (digest: /var/lib/pacemaker/cib/cib.O8Rntg)
Mar 01 17:35:42 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.eqrhsF (digest: /var/lib/pacemaker/cib/cib.6BCfNj)
Mar 01 17:35:42 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.riot2E (digest: /var/lib/pacemaker/cib/cib.SAqtzj)
Mar 01 17:35:42 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.Q8H9BL (digest: /var/lib/pacemaker/cib/cib.MBljlq)
Mar 01 17:38:29 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.OTIiU4 (digest: /var/lib/pacemaker/cib/cib.JnHr1v)
Mar 01 17:38:36 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.2cK9Yk (digest: /var/lib/pacemaker/cib/cib.JSqEH8)
Mar 01 17:44:38 [51425] controller1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.aPFtr3 (digest: /var/lib/pacemaker/cib/cib.E3Ve7X)

Check that the initial membership notifications were sent correctly:

[root@controller1 ~]# grep TOTEM /var/log/cluster/corosync.log
[51405] controller1 corosyncnotice [TOTEM ] Initializing transport (UDP/IP Unicast).
[51405] controller1 corosyncnotice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
[51405] controller1 corosyncnotice [TOTEM ] The network interface [192.168.17.149] is now up.
[51405] controller1 corosyncnotice [TOTEM ] adding new UDPU member {192.168.17.149}
[51405] controller1 corosyncnotice [TOTEM ] adding new UDPU member {192.168.17.141}
[51405] controller1 corosyncnotice [TOTEM ] adding new UDPU member {192.168.17.166}
[51405] controller1 corosyncnotice [TOTEM ] A new membership (192.168.17.149:4) was formed. Members joined: 1
[51405] controller1 corosyncnotice [TOTEM ] A new membership (192.168.17.141:12) was formed. Members joined: 2 3

Check whether any errors occurred during startup:

[root@controller1 ~]# grep ERROR: /var/log/cluster/corosync.log
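The same startup-error check is worth running on every node, not just controller1. A small sketch over SSH, assuming the same log path on each node:

# Look for errors or warnings in the Corosync log on every node.
for node in controller1 controller2 controller3; do
    echo "== $node =="
    ssh "$node" "grep -i -e 'ERROR:' -e 'WARN' /var/log/cluster/corosync.log | tail -n 20"
done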