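Case Study: Oracle 10g RAC Cluster Fails to Start

  • December 16, 2019
  • Notes

Environment: RHEL 5.7 + Oracle 10.2.0.5 RAC

This is a test environment that was built many years ago. Today I found that the cluster would not start. Starting CRS manually did not help, and nothing at all was written to the cluster logs. I went on to check the cluster configuration: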

    [oracle@rac1-server rac1-server]$ ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          2
             Total space (kbytes)     :      96144
             Used space (kbytes)      :       3852
             Available space (kbytes) :      92292
             ID                       : 1953645605
             Device/File Name         : /dev/raw/raw14
                                        Device/File integrity check succeeded
             Device/File Name         : /dev/raw/raw15
                                        Device/File integrity check succeeded

             Cluster registry integrity check succeeded

    [oracle@rac1-server rac1-server]$ crsctl query css votedisk
     0.     0    jy2

    located 1 votedisk(s).

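The OCR check passes, so the problem is with the voting disk. I do not know where the entry jy2 came from, but either way there is no valid voting disk configured. Based on the actual environment, I added valid voting disks, after which the cluster recovered: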
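The check that exposed the problem can be sketched as a small filter. This is only an illustration, not part of the original procedure: the function name invalid_votedisks is hypothetical, and it assumes the three-column layout shown in the query output above (index, a number, location), treating any location that is not an absolute path as suspect.

```shell
# Read `crsctl query css votedisk` output on stdin and print every listed
# location that is not an absolute path (for example "jy2").
# Assumption: entries look like "<index>. <number> <location>", as above.
invalid_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && NF >= 3 && $3 !~ /^\// { print $3 }'
}

# Sample input copied from the output above. On a live node this would be:
#   crsctl query css votedisk | invalid_votedisks
invalid_votedisks <<'EOF'
 0.     0    jy2

located 1 votedisk(s).
EOF
# prints: jy2
```

An empty result only means that every location looks like a path; it does not prove that the devices themselves are usable.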

    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11
    Cluster is not in a ready state for online disk addition
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11 -f
    unrecognized parameter -f.
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11 -force
    Now formatting voting disk: /dev/raw/raw11
    successful addition of votedisk /dev/raw/raw11.
    [root@rac1-server ~]#
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw12 -force
    Now formatting voting disk: /dev/raw/raw12
    successful addition of votedisk /dev/raw/raw12.
    [root@rac1-server ~]#
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw13 -force
    Now formatting voting disk: /dev/raw/raw13
    Write failed: Broken pipe

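I reach this test environment through an SSH jump host, and the session dropped at this point. After logging in again, I queried the voting disks: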

    [oracle@rac1-server ~]$ crsctl query css votedisk
     0.     0    /dev/raw/raw13
     1.     0    /dev/raw/raw11
     2.     0    /dev/raw/raw12
     3.     0    /dev/raw/raw13

/dev/raw/raw13 is listed twice, so I tried deleting it:

    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl delete css votedisk /dev/raw/raw13 -force
    successful deletion of votedisk /dev/raw/raw13.
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
     0.     0    /dev/raw/raw11
     1.     0    /dev/raw/raw12
     2.     0    /dev/raw/raw13

    located 3 votedisk(s).
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl delete css votedisk /dev/raw/raw13 -force
    successful deletion of votedisk /dev/raw/raw13.
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
     0.     0    /dev/raw/raw11
     1.     0    /dev/raw/raw12

    located 2 votedisk(s).
    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw13 -force
    Now formatting voting disk: /dev/raw/raw13
    Write failed: Broken pipe

    [root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
     0.     0    /dev/raw/raw13
     1.     0    /dev/raw/raw11
     2.     0    /dev/raw/raw12
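The duplicate entry was spotted by eye here. A minimal sketch of the same check as a filter is given below; the function name duplicate_votedisks is hypothetical, and the parsing assumes the same three-column query output as in the transcripts above. It only reports duplicates; removing them is still the manual crsctl delete step shown above.

```shell
# Read `crsctl query css votedisk` output on stdin and print each location
# that is listed more than once.
# Assumption: entries look like "<index>. <number> <location>".
duplicate_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && NF >= 3 { count[$3]++ }
         END { for (p in count) if (count[p] > 1) print p }'
}

# Sample input copied from the query taken right after the session dropped.
duplicate_votedisks <<'EOF'
 0.     0    /dev/raw/raw13
 1.     0    /dev/raw/raw11
 2.     0    /dev/raw/raw12
 3.     0    /dev/raw/raw13
EOF
# prints: /dev/raw/raw13
```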

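I am not sure whether the "Write failed: Broken pipe" message here has any hidden impact; in practice, both querying and using the voting disks work normally. Starting CRS again now succeeds, and the cluster alert logs show that the voting disks we added are being used: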

    -- Node 1 cluster alert log:
    2019-12-12 13:27:37.806
    [cssd(7734)]CRS-1603:CSSD on node rac1-server shutdown by user.
    2019-12-12 13:28:15.035
    [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
    2019-12-12 13:28:15.048
    [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
    2019-12-12 13:28:15.058
    [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
    2019-12-12 13:28:22.162
    [cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server .
    2019-12-12 13:28:22.610
    [evmd(12526)]CRS-1401:EVMD started on node rac1-server.
    2019-12-12 13:28:22.678
    [crsd(12662)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870592 to 169870592. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
    2019-12-12 13:28:22.679
    [crsd(12662)]CRS-1012:The OCR service started on node rac1-server.
    2019-12-12 13:28:23.757
    [crsd(12662)]CRS-1201:CRSD started on node rac1-server.
    2019-12-12 13:28:24.172
    [crsd(12662)]CRS-1205:Auto-start failed for the CRS resource ora.rac2-server.ASM2.asm. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
    2019-12-12 13:28:24.199
    [crsd(12662)]CRS-1205:Auto-start failed for the CRS resource ora.jy.jy2.inst. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
    2019-12-12 13:28:36.180
    [cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .

    -- Node 2 cluster alert log:
    2019-12-12 13:30:23.828
    [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
    2019-12-12 13:30:23.845
    [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
    2019-12-12 13:30:23.870
    [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
    2019-12-12 13:30:24.768
    [cssd(6736)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .
    2019-12-12 13:30:25.463
    [crsd(6199)]CRS-1012:The OCR service started on node rac2-server.
    2019-12-12 13:30:25.478
    [evmd(6116)]CRS-1401:EVMD started on node rac2-server.
    2019-12-12 13:30:27.101
    [crsd(6199)]CRS-1201:CRSD started on node rac2-server.
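To compare the two nodes quickly, the CRS-1605 lines can be reduced to a list of voting files. The following is a rough sketch under stated assumptions: the function name online_voting_files is hypothetical, the message wording is taken from the log excerpts above, and the alert log is read from stdin because its file path is not given here.

```shell
# Read a cluster alert log on stdin and print the distinct voting files
# that CSSD reported as online (message CRS-1605), one per line.
# Assumption: the message text matches the excerpts above.
online_voting_files() {
    sed -n 's/.*CRS-1605:CSSD voting file is online: \([^ ]*\)\. Details.*/\1/p' |
        sort -u
}

# Sample input shortened from the node 1 excerpt above.
online_voting_files <<'EOF'
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
[cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server .
EOF
# prints /dev/raw/raw11 and /dev/raw/raw13, one per line
```

Running the same filter on each node's log and comparing the two lists should show the same three devices on both nodes, as in the excerpts above.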

Finally, check the cluster status to confirm that everything is normal:

    [oracle@rac1-server ~]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora.jy.db      application    ONLINE    ONLINE    rac2-server
    ora....y1.inst application    ONLINE    ONLINE    rac1-server
    ora....y2.inst application    ONLINE    ONLINE    rac2-server
    ora....SM1.asm application    ONLINE    ONLINE    rac1-server
    ora....ER.lsnr application    ONLINE    ONLINE    rac1-server
    ora....ver.gsd application    ONLINE    ONLINE    rac1-server
    ora....ver.ons application    ONLINE    ONLINE    rac1-server
    ora....ver.vip application    ONLINE    ONLINE    rac1-server
    ora....SM2.asm application    ONLINE    ONLINE    rac2-server
    ora....ER.lsnr application    ONLINE    ONLINE    rac2-server
    ora....ver.gsd application    ONLINE    ONLINE    rac2-server
    ora....ver.ons application    ONLINE    ONLINE    rac2-server
    ora....ver.vip application    ONLINE    ONLINE    rac2-server
    [oracle@rac1-server ~]$
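With this many resources, the final status check can also be done mechanically. The sketch below is illustrative only: the function name not_online_resources is hypothetical, and it assumes the column order of the crs_stat -t output above (Name, Type, Target, State, Host).

```shell
# Read `crs_stat -t` output on stdin and print the name of every resource
# whose Target or State column is not ONLINE.
# Assumption: columns are Name, Type, Target, State, Host, as shown above.
not_online_resources() {
    awk '$1 ~ /^ora\./ && ($3 != "ONLINE" || $4 != "ONLINE") { print $1 }'
}

# Sample input shortened from the output above; it prints nothing because
# every listed resource is ONLINE/ONLINE.
not_online_resources <<'EOF'
Name           Type           Target    State     Host
------------------------------------------------------------
ora.jy.db      application    ONLINE    ONLINE    rac2-server
ora....y1.inst application    ONLINE    ONLINE    rac1-server
ora....y2.inst application    ONLINE    ONLINE    rac2-server
EOF
```

On a live node the equivalent would be crs_stat -t | not_online_resources, where an empty result corresponds to the healthy state shown above.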