Case: Oracle 10g RAC cluster fails to start
- December 16, 2019
- Notes
Environment: RHEL 5.7 + Oracle 10.2.0.5 RAC
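This is a test environment built many years ago, and today I found that its cluster would not start. Starting CRS manually produced no output at all in the cluster logs. Next, I checked the cluster configuration: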
[oracle@rac1-server rac1-server]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 96144
Used space (kbytes) : 3852
Available space (kbytes) : 92292
ID : 1953645605
Device/File Name : /dev/raw/raw14
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw15
Device/File integrity check succeeded
Cluster registry integrity check succeeded
[oracle@rac1-server rac1-server]$ crsctl query css votedisk
0. 0 jy2
located 1 votedisk(s).
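The only voting disk listed above is `jy2`, which is not a device path. The Python sketch below shows one way to flag such entries mechanically. It assumes the 10.2 listing format shown here: an index with a trailing dot, a numeric column, then the path. The `/dev/raw/` prefix is an assumption that matches this environment's raw devices, and the helper names are hypothetical.

```python
import re

# Assumed 10.2 entry format: "<index>. <number> <path>", e.g. "0. 0 /dev/raw/raw11".
ENTRY = re.compile(r"^[ \t]*\d+\.[ \t]+\d+[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def parse_votedisks(output: str) -> list[str]:
    """Return voting disk paths in the order `crsctl query css votedisk` lists them."""
    return ENTRY.findall(output)


def suspicious_votedisks(output: str, prefix: str = "/dev/raw/") -> list[str]:
    """Return entries that do not look like a device under `prefix` (an assumed convention)."""
    return [path for path in parse_votedisks(output) if not path.startswith(prefix)]


sample = """0. 0 jy2
located 1 votedisk(s).
"""
print(suspicious_votedisks(sample))  # ['jy2']
```

The summary line `located 1 votedisk(s).` does not start with an index, so the pattern skips it.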
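This confirms that the problem is with the voting disks. I don't know where this `jy2` entry came from, but either way no valid voting disk is configured. Based on this environment's actual storage layout, I added valid voting disks, after which the cluster returned to normal: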
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11
Cluster is not in a ready state for online disk addition
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11 -f
unrecognized parameter -f.
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11 -force
Now formatting voting disk: /dev/raw/raw11
successful addition of votedisk /dev/raw/raw11.
[root@rac1-server ~]#
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw12 -force
Now formatting voting disk: /dev/raw/raw12
successful addition of votedisk /dev/raw/raw12.
[root@rac1-server ~]#
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw13 -force
Now formatting voting disk: /dev/raw/raw13
Write failed: Broken pipe
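The last addition was cut off when the SSH session dropped. Re-running a batch of additions blindly after a disconnect may add a disk twice.

The sketch below is a minimal Python illustration that guards against this. It takes the current `crsctl query css votedisk` output and builds `crsctl add css votedisk <device> -force` commands only for devices that are not yet listed. It prints the commands for review rather than executing them.

A few assumptions apply:
- `-force` is used only because it is what worked here with CRS down, after the online addition was refused. It is not a general recommendation.
- The CRS home path is the one from this environment.
- `build_add_commands` is a hypothetical helper name.

```python
import re
import shlex

# CRS home path taken from this environment; adjust for other installations.
CRSCTL = "/s01/oracle/product/10.2.0/crs_1/bin/crsctl"
# Assumed 10.2 entry format: "<index>. <number> <path>".
ENTRY = re.compile(r"^[ \t]*\d+\.[ \t]+\d+[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def build_add_commands(query_output: str, devices: list[str], crsctl: str = CRSCTL) -> list[list[str]]:
    """Return argv lists that add each device not already present in the votedisk listing."""
    present = set(ENTRY.findall(query_output))
    return [[crsctl, "add", "css", "votedisk", dev, "-force"] for dev in devices if dev not in present]


current = """0. 0 /dev/raw/raw11
1. 0 /dev/raw/raw12
"""
wanted = ["/dev/raw/raw11", "/dev/raw/raw12", "/dev/raw/raw13"]
for argv in build_add_commands(current, wanted):
    # Printed for an operator to review and run; nothing is executed here.
    print(shlex.join(argv))
```

With the listing above, only the `/dev/raw/raw13` command is printed, because raw11 and raw12 are already present.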
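My test environment is reached through an SSH jump host, so the session dropped. I logged in again and queried the voting disks: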
[oracle@rac1-server ~]$ crsctl query css votedisk
0. 0 /dev/raw/raw13
1. 0 /dev/raw/raw11
2. 0 /dev/raw/raw12
3. 0 /dev/raw/raw13
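A repeated entry like the one in this listing is easy to miss by eye. The Python sketch below finds repeats by counting paths. It assumes the same 10.2 listing format (index, a numeric column, path), and `duplicate_votedisks` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed 10.2 entry format: "<index>. <number> <path>".
ENTRY = re.compile(r"^[ \t]*\d+\.[ \t]+\d+[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def duplicate_votedisks(output: str) -> dict[str, int]:
    """Map each voting disk path listed more than once to its number of occurrences."""
    counts = Counter(ENTRY.findall(output))
    return {path: n for path, n in counts.items() if n > 1}


listing = """0. 0 /dev/raw/raw13
1. 0 /dev/raw/raw11
2. 0 /dev/raw/raw12
3. 0 /dev/raw/raw13
"""
print(duplicate_votedisks(listing))  # {'/dev/raw/raw13': 2}
```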
/dev/raw/raw13 is listed twice, so I tried deleting it:
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl delete css votedisk /dev/raw/raw13 -force
successful deletion of votedisk /dev/raw/raw13.
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
0. 0 /dev/raw/raw11
1. 0 /dev/raw/raw12
2. 0 /dev/raw/raw13
located 3 votedisk(s).
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl delete css votedisk /dev/raw/raw13 -force
successful deletion of votedisk /dev/raw/raw13.
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
0. 0 /dev/raw/raw11
1. 0 /dev/raw/raw12
located 2 votedisk(s).
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw13 -force
Now formatting voting disk: /dev/raw/raw13
Write failed: Broken pipe
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
0. 0 /dev/raw/raw13
1. 0 /dev/raw/raw11
2. 0 /dev/raw/raw12
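I'm not sure whether the "Write failed: Broken pipe" here has any hidden side effects, but in practice both queries and normal operation work fine. Starting CRS again now succeeds, and the cluster alert logs show that the voting disks we added are in use: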
-- Node 1 clusterware alert log:
2019-12-12 13:27:37.806 [cssd(7734)]CRS-1603:CSSD on node rac1-server shutdown by user.
2019-12-12 13:28:15.035 [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:15.048 [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:15.058 [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:22.162 [cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server .
2019-12-12 13:28:22.610 [evmd(12526)]CRS-1401:EVMD started on node rac1-server.
2019-12-12 13:28:22.678 [crsd(12662)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870592 to 169870592. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
2019-12-12 13:28:22.679 [crsd(12662)]CRS-1012:The OCR service started on node rac1-server.
2019-12-12 13:28:23.757 [crsd(12662)]CRS-1201:CRSD started on node rac1-server.
2019-12-12 13:28:24.172 [crsd(12662)]CRS-1205:Auto-start failed for the CRS resource ora.rac2-server.ASM2.asm. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
2019-12-12 13:28:24.199 [crsd(12662)]CRS-1205:Auto-start failed for the CRS resource ora.jy.jy2.inst. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
2019-12-12 13:28:36.180 [cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .
-- Node 2 clusterware alert log:
2019-12-12 13:30:23.828 [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:23.845 [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:23.870 [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:24.768 [cssd(6736)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .
2019-12-12 13:30:25.463 [crsd(6199)]CRS-1012:The OCR service started on node rac2-server.
2019-12-12 13:30:25.478 [evmd(6116)]CRS-1401:EVMD started on node rac2-server.
2019-12-12 13:30:27.101 [crsd(6199)]CRS-1201:CRSD started on node rac2-server.
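To run the same check on each node's alert log without reading it line by line, you can extract the `CRS-1605` (voting file online) and `CRS-1601` (reconfiguration complete) messages. The Python sketch below does this. It assumes the exact message wording seen in these logs, and the helper names are hypothetical.

```python
import re

# e.g. "... CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in ..."
VOTING_ONLINE = re.compile(r"CRS-1605:CSSD voting file is online: (\S+?)\.\s")
# e.g. "... CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server ."
RECONFIG = re.compile(r"CRS-1601:CSSD Reconfiguration complete\. Active nodes are (.*?) \.")


def online_voting_files(alert_log: str) -> list[str]:
    """Voting files reported online, in log order."""
    return VOTING_ONLINE.findall(alert_log)


def last_active_nodes(alert_log: str) -> list[str]:
    """Node list from the most recent CSSD reconfiguration, or [] if none is logged."""
    matches = RECONFIG.findall(alert_log)
    return matches[-1].split() if matches else []


node2_log = """\
2019-12-12 13:30:23.828 [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:23.845 [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:23.870 [cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:24.768 [cssd(6736)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .
"""
print(online_voting_files(node2_log))  # ['/dev/raw/raw13', '/dev/raw/raw11', '/dev/raw/raw12']
print(last_active_nodes(node2_log))    # ['rac1-server', 'rac2-server']
```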
Finally, check the cluster status to confirm that everything is normal:
[oracle@rac1-server ~]$ crs_stat -t
Name             Type          Target    State     Host
------------------------------------------------------------
ora.jy.db        application   ONLINE    ONLINE    rac2-server
ora....y1.inst   application   ONLINE    ONLINE    rac1-server
ora....y2.inst   application   ONLINE    ONLINE    rac2-server
ora....SM1.asm   application   ONLINE    ONLINE    rac1-server
ora....ER.lsnr   application   ONLINE    ONLINE    rac1-server
ora....ver.gsd   application   ONLINE    ONLINE    rac1-server
ora....ver.ons   application   ONLINE    ONLINE    rac1-server
ora....ver.vip   application   ONLINE    ONLINE    rac1-server
ora....SM2.asm   application   ONLINE    ONLINE    rac2-server
ora....ER.lsnr   application   ONLINE    ONLINE    rac2-server
ora....ver.gsd   application   ONLINE    ONLINE    rac2-server
ora....ver.ons   application   ONLINE    ONLINE    rac2-server
ora....ver.vip   application   ONLINE    ONLINE    rac2-server
[oracle@rac1-server ~]$
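With many resources, comparing the Target and State columns by eye is error-prone. The Python sketch below parses `crs_stat -t` output and reports resources whose State differs from Target.

It assumes the column layout shown above (Name, Type, Target, State, Host) and also accepts rows without a Host value. `crs_stat -t` truncates long names (for example `ora....y1.inst`), so the reported names may be abbreviated. `mismatched_resources` is a hypothetical helper name.

```python
def mismatched_resources(crs_stat_t: str) -> list[tuple[str, str, str]]:
    """Return (name, target, state) for each resource whose State differs from its Target."""
    mismatches = []
    for line in crs_stat_t.splitlines():
        fields = line.split()
        # Resource rows start with "ora."; this skips the prompt, the header and the dashed separator.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _rtype, target, state = fields[:4]
        if target != state:
            mismatches.append((name, target, state))
    return mismatches


status = """\
Name Type Target State Host
------------------------------------------------------------
ora.jy.db application ONLINE ONLINE rac2-server
ora....y1.inst application ONLINE ONLINE rac1-server
ora....y2.inst application ONLINE ONLINE rac2-server
"""
print(mismatched_resources(status))  # []
```

An empty list means every listed resource has reached its target state, which matches the output above.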