What are the ways to synchronize time in an Oracle cluster (RAC)?
- October 10, 2019
- Notes
Today 小麥苗 covers the two ways to synchronize time in an Oracle cluster (RAC): NTP and CTSS.
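Answer: You can use the operating system's NTP service or Oracle's built-in CTSS service. If NTP is not enabled, Oracle's own ctssd process automatically takes over time synchronization.
Starting with Oracle 11gR2 RAC, the Cluster Time Synchronization Service (CTSS) can synchronize the time across nodes. If the installer finds that NTP is inactive, CTSS is installed in active mode and synchronizes the time on all nodes. If NTP is found to be configured, CTSS starts in observer mode and Oracle Clusterware performs no active time synchronization in the cluster.
In RAC, the node clocks should stay synchronized; otherwise many problems can follow. For example, time-dependent applications can produce incorrect data, and log entries can appear out of order, which hampers diagnosis. In severe cases the cluster can go down, or a node can fail to rejoin the cluster when the cluster restarts.
Before Oracle 11gR2, cluster time was synchronized by NTP. Oracle 11gR2 introduced the CTSS component: if NTP is not configured on the system, CTSS synchronizes the cluster time.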
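NTP and CTSS can coexist, and NTP takes precedence over CTSS. That is, if both are present, NTP synchronizes the cluster time and CTSS stays in Observer mode; CTSS switches to Active mode only when NTP is shut down on every node of the cluster. As long as NTP is active on even one node, CTSS on all nodes of the cluster stays in Observer mode.
Note that to put CTSS into Active mode it is not enough to stop the ntpd service (/sbin/service ntpd stop); you must also remove or rename the /etc/ntp.conf file (mv /etc/ntp.conf /etc/ntp.conf.bak). Otherwise CTSS will not become active.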
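The rule above can be written down as a small decision function. This is only a sketch of the behavior described in this article, not Oracle's implementation: the `NodeNtpState` class and its field names are hypothetical, and treating "ntpd running OR /etc/ntp.conf present" as "NTP detected" is an assumption drawn from the two conditions just listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeNtpState:
    """Hypothetical per-node snapshot; the field names are illustrative."""
    name: str
    ntpd_running: bool       # is the ntpd service running?
    ntp_conf_present: bool   # does /etc/ntp.conf still exist?


def node_detects_ntp(node: NodeNtpState) -> bool:
    # Assumption: either a running daemon or a leftover config file
    # is enough for CTSS to treat NTP as configured on that node.
    return node.ntpd_running or node.ntp_conf_present


def expected_ctss_mode(nodes: List[NodeNtpState]) -> str:
    # NTP detected on any single node keeps CTSS in Observer mode everywhere.
    if any(node_detects_ntp(n) for n in nodes):
        return "Observer"
    return "Active"


cluster = [
    NodeNtpState("raclhr-11gr2-n1", ntpd_running=False, ntp_conf_present=False),
    NodeNtpState("raclhr-11gr2-n2", ntpd_running=False, ntp_conf_present=True),
]
print(expected_ctss_mode(cluster))  # Observer: node 2 still has /etc/ntp.conf
```

The example cluster shows the common pitfall: ntpd is stopped everywhere, but one forgotten /etc/ntp.conf is enough to keep CTSS out of Active mode.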
CTSS synchronization mode
Disable NTP:
/sbin/service ntpd stop
mv /etc/ntp.conf /etc/ntp.conf.bak
service ntpd status
chkconfig ntpd off
[root@raclhr-11gR2-N2 ~]# ps -ef|grep ctss
root 19678 1 0 19:22 ? 00:00:02 /u01/app/11.2.0/grid/bin/octssd.bin reboot
root 20970 20623 0 19:35 pts/4 00:00:00 grep ctss
[root@raclhr-11gR2-N2 ~]#
[root@raclhr-11gR2-N2 ~]# crsctl stat res -t -init
——————————————————————————–
NAME TARGET STATE SERVER STATE_DETAILS
——————————————————————————–
Cluster Resources
——————————————————————————–
ora.asm
1 ONLINE ONLINE raclhr-11gr2-n2 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE raclhr-11gr2-n2
ora.crf
1 ONLINE ONLINE raclhr-11gr2-n2
ora.crsd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.cssd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.cssdmonitor
1 ONLINE ONLINE raclhr-11gr2-n2
ora.ctssd
1 ONLINE ONLINE raclhr-11gr2-n2 ACTIVE:0
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.gipcd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.gpnpd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.mdnsd
1 ONLINE ONLINE raclhr-11gr2-n2
[root@raclhr-11gR2-N2 ~]#
CTSS status on node 1:
[root@raclhr-11gR2-N1 ~]# crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
[root@raclhr-11gR2-N1 ~]#
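When the same check has to be repeated on many nodes, the output of `crsctl check ctss` can be parsed by a script. The sketch below works on captured output text only; the function name `parse_crsctl_check_ctss` is hypothetical, and the regular expressions assume the message wording shown in this article (CRS-4700/4701/4702).

```python
import re
from typing import Optional, Tuple


def parse_crsctl_check_ctss(output: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (mode, offset_ms) extracted from `crsctl check ctss` output text."""
    mode = None
    offset_ms = None
    for line in output.splitlines():
        mode_match = re.search(r"is in (\w+) mode", line)
        if mode_match:
            mode = mode_match.group(1)
        offset_match = re.search(r"Offset \(in msec\):\s*(-?\d+)", line)
        if offset_match:
            offset_ms = int(offset_match.group(1))
    return mode, offset_ms


active_output = (
    "CRS-4701: The Cluster Time Synchronization Service is in Active mode.\n"
    "CRS-4702: Offset (in msec): 0\n"
)
observer_output = (
    "CRS-4700: The Cluster Time Synchronization Service is in Observer mode.\n"
)

print(parse_crsctl_check_ctss(active_output))    # ('Active', 0)
print(parse_crsctl_check_ctss(observer_output))  # ('Observer', None)
```

In Observer mode no offset line is printed, so the offset comes back as `None` rather than 0.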
octssd log on node 1:
/u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log
2018-06-30 19:25:56.369: [ CTSS][899475200]sclsctss_gvss2: NTP default pid file not found
2018-06-30 19:25:56.369: [ CTSS][899475200]sclsctss_gvss8: Return [0] and NTP status [1].
2018-06-30 19:25:56.369: [ CTSS][899475200]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].
2018-06-30 19:25:57.002: [ CTSS][916338432]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xcc], offset[0 ms]}, length=[8].
2018-06-30 19:26:01.263: [ CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2018-06-30 19:26:01.264: [ CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg
2018-06-30 19:26:01.264: [ CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [2] hostname [raclhr-11gr2-n2] )
2018-06-30 19:26:09.267: [ CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
The octssd.log on node 1 records that no NTP service was detected, so CTSS is in Active mode.
CTSS status on node 2:
[root@raclhr-11gR2-N2 ~]# crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
[root@raclhr-11gR2-N2 ~]#
octssd log on node 2:
/u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log
2018-06-30 19:28:49.539: [ CTSS][839321344]sclsctss_gvss2: NTP default pid file not found
2018-06-30 19:28:49.539: [ CTSS][839321344]sclsctss_gvss8: Return [0] and NTP status [1].
2018-06-30 19:28:49.539: [ CTSS][839321344]ctss_check_vendor_sw: Vendor time sync software is not detected. status [1].
2018-06-30 19:29:05.544: [ CTSS][839321344]ctsselect_msm: CTSS mode is [0xc4]
2018-06-30 19:29:05.544: [ CTSS][839321344]ctssslave_swm1_2: Ready to initiate new time sync process.
2018-06-30 19:29:05.545: [ CTSS][839321344]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
2018-06-30 19:29:05.546: [ CTSS][845625088]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
2018-06-30 19:29:05.546: [ CTSS][845625088]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm2_3: Received time sync message from master.
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm: The system time difference is too small [243] usec. Not adjusting time.
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm17: LT [1530358145sec 546888usec], MT [1530358145sec 140655884523349usec], Delta [2314usec]
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm19: The offset is [243 usec] and sync interval set to [1]
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm: Received from master (mode [0xcc] nodenum [1] hostname [raclhr-11gr2-n1] )
2018-06-30 19:29:05.547: [ CTSS][839321344]ctsselect_msm: Sync interval returned in [1]
2018-06-30 19:29:05.547: [ CTSS][845625088]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
2018-06-30 19:29:07.910: [ CTSS][860387072]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xc4], offset[0 ms]}, length=[8].
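The octssd.log on node 2 also records that no NTP service was detected and that CTSS is in Active mode. The time-sync master is node 1. The log reports a time difference between the nodes (243 usec), but because it is so small no adjustment is made.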
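To follow how the offset evolves over time, the `ctssslave_swm19` lines can be pulled out of octssd.log. This is a minimal sketch: the helper name `extract_offsets` is hypothetical, and the pattern assumes the line layout seen in the excerpts in this article, which may differ in other versions.

```python
import re
from typing import List, Tuple

# Timestamp at the start of the line, then the offset reported by ctssslave_swm19.
OFFSET_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*"
    r"ctssslave_swm19: The offset is \[(-?\d+) usec\]"
)


def extract_offsets(log_text: str) -> List[Tuple[str, int]]:
    """Return (timestamp, offset_usec) pairs found in octssd.log text."""
    results = []
    for line in log_text.splitlines():
        match = OFFSET_RE.match(line.strip())
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


sample_log = """\
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm: The system time difference is too small [243] usec. Not adjusting time.
2018-06-30 19:29:05.547: [ CTSS][839321344]ctssslave_swm19: The offset is [243 usec] and sync interval set to [1]
2018-06-30 20:57:27.613: [ CTSS][839321344]ctssslave_swm19: The offset is [19748 usec] and sync interval set to [1]
"""

for timestamp, offset_usec in extract_offsets(sample_log):
    print(timestamp, offset_usec)
```

Only the `swm19` lines are matched, so the "too small ... Not adjusting time" message is deliberately ignored here.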
Verify the cluster time:
cluvfy comp clocksync -n all -verbose
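Although the node clocks are not perfectly identical, the check passes in this situation, and slight differences within the allowed range are corrected automatically.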
[grid@raclhr-11gR2-N1 ~]$ cluvfy comp clocksync -n all -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes…
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes…
Check: CTSS Resource running on all nodes
Node Name Status
———————————— ————————
raclhr-11gr2-n2 passed
raclhr-11gr2-n1 passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes…
Result: Query of CTSS for time offset passed
Check CTSS state started…
Check: CTSS state
Node Name State
———————————— ————————
raclhr-11gr2-n2 Active
raclhr-11gr2-n1 Active
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes…
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
Node Name Time Offset Status
———— ———————— ————————
raclhr-11gr2-n2 0.0 passed
raclhr-11gr2-n1 0.0 passed
Time offset is within the specified limits on the following set of nodes:
"[raclhr-11gr2-n2, raclhr-11gr2-n1]"
Result: Check of clock time offsets passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.
NTP synchronization mode
Enable NTP:
mv /etc/ntp.conf.bak /etc/ntp.conf
service ntpd status
/sbin/service ntpd start
# chkconfig ntpd off
ps -ef|grep ntp
Node 1:
[root@raclhr-11gR2-N1 ~]# crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
[root@raclhr-11gR2-N1 ~]# crsctl stat res -t -init
ora.ctssd
1 ONLINE ONLINE raclhr-11gr2-n1 OBSERVER
CTSS log on node 1:
/u01/app/11.2.0/grid/log/raclhr-11gr2-n1/ctssd/octssd.log
2018-06-30 20:51:29.388: [ CTSS][899475200]sclsctss_gvss1: NTP default config file found
2018-06-30 20:51:29.389: [ CTSS][899475200]sclsctss_gvss8: Return [0] and NTP status [2].
2018-06-30 20:51:29.389: [ CTSS][899475200]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2018-06-30 20:51:29.389: [ CTSS][899475200]ctss_check_vendor_sw: Ctssd is switching to observer role
2018-06-30 20:51:29.389: [ CTSS][899475200]clsctsselect_update_mbrdata: Updating pridata: { version[1] node[1] swversion[186647296] mode[0xee] }.
2018-06-30 20:51:29.639: [ CRSCCL][671086336]clsCclGetPriMemberData: Detected pridata change for node[1]. Retrieving it to the cache.
2018-06-30 20:51:31.434: [ CTSS][916338432]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xee], offset[0 ms]}, length=[8].
2018-06-30 20:51:35.258: [ CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2018-06-30 20:51:35.258: [ CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg
2018-06-30 20:51:35.259: [ CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc4] nodenum [2] hostname [raclhr-11gr2-n2] )
2018-06-30 20:51:35.656: [ CRSCCL][671086336]clsCclGetPriMemberData: Detected pridata change for node[2]. Retrieving it to the cache.
2018-06-30 20:51:43.240: [ CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2018-06-30 20:51:43.240: [ CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg
2018-06-30 20:51:43.240: [ CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )
2018-06-30 20:51:51.217: [ CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2018-06-30 20:51:51.217: [ CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg
2018-06-30 20:51:51.218: [ CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )
2018-06-30 20:51:59.194: [ CTSS][901576448]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [1].
2018-06-30 20:51:59.194: [ CTSS][901576448]ctsscomm_msg_hndlr: Received sync msg
2018-06-30 20:51:59.195: [ CTSS][901576448]ctsscomm_msg_hndlr: Received from slave ( mode [0xc6] nodenum [2] hostname [raclhr-11gr2-n2] )
The octssd.log on node 1 records that NTP was detected, and CTSS automatically switched to Observer mode.
octssd log on node 2:
2018-06-30 20:57:27.608: [ CTSS][839321344]ctsselect_msm: CTSS mode is [0xc6]
2018-06-30 20:57:27.608: [ CTSS][839321344]ctssslave_swm1_2: Ready to initiate new time sync process.
2018-06-30 20:57:27.609: [ CTSS][839321344]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
2018-06-30 20:57:27.612: [ CTSS][845625088]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
2018-06-30 20:57:27.613: [ CTSS][845625088]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
2018-06-30 20:57:27.613: [ CTSS][839321344]ctssslave_swm2_3: Received time sync message from master.
2018-06-30 20:57:27.613: [ CTSS][839321344]ctssslave_swm17: LT [1530363447sec 613028usec], MT [1530363447sec 140655884569984usec], Delta [4410usec]
2018-06-30 20:57:27.613: [ CTSS][839321344]ctssslave_swm19: The offset is [19748 usec] and sync interval set to [1]
2018-06-30 20:57:27.613: [ CTSS][839321344]ctssslave_swm: Received from master (mode [0xee] nodenum [1] hostname [raclhr-11gr2-n1] )
2018-06-30 20:57:27.613: [ CTSS][839321344]ctsselect_msm: Sync interval returned in [1]
2018-06-30 20:57:27.613: [ CTSS][845625088]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
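The octssd.log on node 2 likewise shows CTSS in Observer mode, with node 1 as the time-sync master.
Simulating inconsistent cluster time
What happens if the cluster clocks drift apart on a production system, and how should we troubleshoot it? The following scenario simulates inconsistent cluster time.
Change the time on node 2, moving it forward by 2 days:
The following command sets the system date to July 2, 2018: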
#date -s 07/02/2018
The following command sets the system time to 23:23:06:
#date -s 23:23:06
[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init
ora.ctssd
1 ONLINE ONLINE raclhr-11gr2-n2 ACTIVE:172768000
[root@raclhr-11gR2-N2 ctssd]# crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 172768000
172768000 ms is approximately 2 days:
SYS@lhrrac11> select 172768000/1000/24/60/60 from dual;
172768000/1000/24/60/60
———————–
1.99962963
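The same conversion can be done outside the database. The sketch below mirrors the SQL above (divide by 1000, then by the number of seconds in a day); the function name `ms_to_days` is only illustrative.

```python
MS_PER_SECOND = 1000
SECONDS_PER_DAY = 24 * 60 * 60


def ms_to_days(offset_ms: float) -> float:
    """Convert a CTSS offset reported in milliseconds to days."""
    return offset_ms / MS_PER_SECOND / SECONDS_PER_DAY


# Offset reported by `crsctl check ctss` on node 2.
print(round(ms_to_days(172768000), 8))  # 1.99962963
```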
After the time on node 2 was changed, the ASM and DB alert logs contained the following warning:
Time drift detected. Please check VKTM trace file for more details.
Here "drift" means that the system clock has deviated from the expected time.
[grid@raclhr-11gR2-N2 trace]$ pwd
/u01/app/grid/diag/asm/+asm/+ASM2/trace
[grid@raclhr-11gR2-N2 trace]$ ll -lrt *vktm*
-rw-r----- 1 grid oinstall 136 May 17 14:09 +ASM2_vktm_29999.trm
-rw-r----- 1 grid oinstall 1847 May 17 14:09 +ASM2_vktm_29999.trc
-rw-r----- 1 grid oinstall 529 Jun 4 14:52 +ASM2_vktm_32504.trm
-rw-r----- 1 grid oinstall 7238 Jun 4 14:52 +ASM2_vktm_32504.trc
-rw-r----- 1 grid oinstall 78 Jun 4 14:59 +ASM2_vktm_14800.trm
-rw-r----- 1 grid oinstall 1079 Jun 4 14:59 +ASM2_vktm_14800.trc
-rw-r----- 1 grid oinstall 90 Jun 4 17:26 +ASM2_vktm_14991.trm
-rw-r----- 1 grid oinstall 1200 Jun 4 17:26 +ASM2_vktm_14991.trc
-rw-r----- 1 grid oinstall 89 Jun 29 10:05 +ASM2_vktm_17961.trm
-rw-r----- 1 grid oinstall 1200 Jun 29 10:05 +ASM2_vktm_17961.trc
-rw-r----- 1 grid oinstall 191 Jul 2 21:35 +ASM2_vktm_19774.trm
-rw-r----- 1 grid oinstall 3171 Jul 2 21:35 +ASM2_vktm_19774.trc
[grid@raclhr-11gR2-N2 trace]$ cat +ASM2_vktm_19774.trc
*** 2018-06-30 19:22:12.650
VKTM running at (1)millisec precision with DBRM quantum (100)ms
[Start] HighResTick = 1530357732650537
kstmrmtickcnt = 0 : ksudbrmseccnt[0] = 1530357732
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at 1530363888044519
*** 2018-06-10 20:04:00.000
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(172844812599)usec at (1528632240000738) whereas (1000000) is allowed
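usec means microseconds and ms means milliseconds: 1 s = 1000 ms = 1,000,000 us.
The VKTM process noticed that the system time had changed, and the alert log recorded the corresponding warning. The trace file shows that the system time jumped forward by 172844812599 microseconds, about 2 days, which matches the simulated change; the allowed difference is 1 second (1000000 usec).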
SYS@lhrrac11> select 172844812599/1000/1000/24/60/60 from dual;
172844812599/1000/1000/24/60/60
——————————-
2.00051866
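The jump and the allowed drift can also be extracted from the VKTM trace text directly. This is a sketch under the assumption that the message keeps the wording shown in the trace above (including the line break after "by"); `parse_vktm_jump` is a hypothetical helper name.

```python
import re
from typing import Optional

# \s* after "by" tolerates the line break seen in the trace file.
JUMP_RE = re.compile(
    r"Time jumped (\w+) by\s*\((\d+)\)usec at \((\d+)\) whereas \((\d+)\) is allowed"
)

USEC_PER_DAY = 24 * 60 * 60 * 1_000_000


def parse_vktm_jump(trace_text: str) -> Optional[dict]:
    """Return direction, jump size and allowed drift from a VKTM trace, or None."""
    match = JUMP_RE.search(trace_text)
    if match is None:
        return None
    direction, jump_usec, _at_usec, allowed_usec = match.groups()
    return {
        "direction": direction,
        "jump_usec": int(jump_usec),
        "allowed_usec": int(allowed_usec),
        "jump_days": int(jump_usec) / USEC_PER_DAY,
    }


trace = (
    "kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by\n"
    "(172844812599)usec at (1528632240000738) whereas (1000000) is allowed\n"
)
info = parse_vktm_jump(trace)
print(info["direction"], info["jump_usec"], round(info["jump_days"], 8))
```

Comparing `jump_usec` with `allowed_usec` makes it obvious how far outside the 1-second tolerance the simulated change is.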
Both the octssd.log on node 2 and the CTSS status record the time offset:
2018-07-02 21:54:39.330: [ CTSS][1400497920]ctsselect_msm: CTSS mode is [0x84]
2018-07-02 21:54:39.330: [ CTSS][1400497920]ctssslave_swm1_2: Ready to initiate new time sync process.
2018-07-02 21:54:39.330: [ CTSS][1400497920]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
2018-07-02 21:54:39.331: [ CTSS][1404700416]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
2018-07-02 21:54:39.331: [ CTSS][1404700416]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm2_3: Received time sync message from master.
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm: The magnitude [172757997797] of the offset [172757997797 usec] is larger than [86400000000 usec] sec which is the CTSS limit.
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm: The magnitude of the systime diff is larger than max adjtime limit. Offset [172757997797] usec will be changed to max adjtime limit [+/- 131071].
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm15: The CTSS master is behind this node. The local time offset [-131071 usec] is being adjusted. Sync method [2]
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm17: LT [1530539679sec 331583usec], MT [1530366921sec 139882790197210usec], Delta [1267usec]
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm19: The offset is [131071 usec] and sync interval set to [4]
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctssslave_swm: Received from master (mode [0x8c] nodenum [1] hostname [raclhr-11gr2-n1] )
2018-07-02 21:54:39.331: [ CTSS][1400497920]ctsselect_msm: Sync interval returned in [4]
2018-07-02 21:54:39.331: [ CTSS][1404700416]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
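The cluster clock synchronization check also fails and reports that node 2 needs to be synchronized. Because the offset exceeds the CTSS limit of 86400000000 usec (24 hours) shown in the log above, the synchronization service cannot correct it; the clock has to be fixed manually.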
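The two thresholds that appear in the log can be modeled with a few lines of code. This is an illustration only: the constants are copied from the log messages above, while the functions are hypothetical and make no claim about how ctssd actually schedules its corrections.

```python
# Values taken from the octssd.log messages above.
CTSS_LIMIT_USEC = 86_400_000_000   # "larger than [86400000000 usec] ... CTSS limit" (24 hours)
MAX_ADJTIME_USEC = 131_071         # "max adjtime limit [+/- 131071]"


def exceeds_ctss_limit(offset_usec: int) -> bool:
    """True when the offset magnitude is beyond what CTSS reports it can handle."""
    return abs(offset_usec) > CTSS_LIMIT_USEC


def clamp_to_adjtime(offset_usec: int) -> int:
    """Clamp an offset to the +/- max adjtime limit reported in the log."""
    return max(-MAX_ADJTIME_USEC, min(MAX_ADJTIME_USEC, offset_usec))


offset = 172_757_997_797  # offset logged on node 2 after the date change
print(exceeds_ctss_limit(offset))  # True
print(clamp_to_adjtime(offset))    # 131071
```

With an offset of roughly two days, each adjustment is capped at about 0.13 seconds, which is why waiting for the service to converge is not a practical fix here.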
Verify cluster clock synchronization:
[grid@raclhr-11gR2-N2 ~]$ cluvfy comp clocksync -n all -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes…
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes…
Check: CTSS Resource running on all nodes
Node Name Status
———————————— ————————
raclhr-11gr2-n2 passed
raclhr-11gr2-n1 passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes…
Result: Query of CTSS for time offset passed
Check CTSS state started…
Check: CTSS state
Node Name State
———————————— ————————
raclhr-11gr2-n2 Active
raclhr-11gr2-n1 Active
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes…
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
Node Name Time Offset Status
———— ———————— ————————
raclhr-11gr2-n2 1.727568E8 failed
raclhr-11gr2-n1 0.0 passed
Result: PRVF-9661 : Time offset is greater than acceptable limit on node "raclhr-11gr2-n2" [actual = "1.727568E8", acceptable = "1000.0" ]
PRVF-9652 : Cluster Time Synchronization Services check failed
Verification of Clock Synchronization across the cluster nodes was unsuccessful.
Checks did not pass for the following node(s):
raclhr-11gr2-n2
1.727568E8 is scientific notation for 1.727568 × 10^8, that is, 172756800 ms, or about 2 days.
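The per-node rows of the "Reference Time Offset" table can be checked against the limit programmatically, which also takes care of the scientific notation. A sketch, assuming the row layout shown in the cluvfy output above (node name, offset in ms, status); `nodes_over_limit` is a hypothetical helper and the default limit of 1000.0 ms is the "Reference Time Offset Limit" printed by cluvfy in this article.

```python
import re
from typing import Dict

# node name, numeric offset (plain or scientific notation), status
ROW_RE = re.compile(
    r"^(\S+)\s+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+(passed|failed)$"
)


def nodes_over_limit(report: str, limit_ms: float = 1000.0) -> Dict[str, float]:
    """Return {node: offset_ms} for rows whose offset exceeds limit_ms."""
    over = {}
    for line in report.splitlines():
        match = ROW_RE.match(line.strip())
        if match and abs(float(match.group(2))) > limit_ms:
            over[match.group(1)] = float(match.group(2))
    return over


report = """\
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
Node Name Time Offset Status
raclhr-11gr2-n2 1.727568E8 failed
raclhr-11gr2-n1 0.0 passed
"""
print(nodes_over_limit(report))  # {'raclhr-11gr2-n2': 172756800.0}
```

Python's `float()` accepts the `1.727568E8` form directly, so no manual expansion of the exponent is needed.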
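Until the time is corrected, node 2 cannot start normally after a restart. The output below shows that startup fails at the ctss step; the cluster starts normally only after the clock is set back to the correct time.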
[root@raclhr-11gR2-N2 ~]# crsctl stat res -t -init
——————————————————————————–
NAME TARGET STATE SERVER STATE_DETAILS
——————————————————————————–
Cluster Resources
——————————————————————————–
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip
1 ONLINE ONLINE raclhr-11gr2-n2
ora.crf
1 ONLINE ONLINE raclhr-11gr2-n2
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.cssdmonitor
1 ONLINE ONLINE raclhr-11gr2-n2
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.gpnpd
1 ONLINE ONLINE raclhr-11gr2-n2
ora.mdnsd
1 ONLINE ONLINE raclhr-11gr2-n2
Check the cluster alert log:
/u01/app/11.2.0/grid/log/raclhr-11gr2-n2/alertraclhr-11gr2-n2.log
2018-07-02 22:05:36.344
[ctssd(30350)]CRS-2405:The Cluster Time Synchronization Service on host raclhr-11gr2-n2 is shutdown by user
2018-07-02 22:05:40.689
[ctssd(30358)]CRS-2407:The new Cluster Time Synchronization Service reference node is host raclhr-11gr2-n1.
2018-07-02 22:05:40.689
[ctssd(30358)]CRS-2401:The Cluster Time Synchronization Service started on host raclhr-11gr2-n2.
2018-07-02 22:05:42.704
[ctssd(30358)]CRS-2404:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log.
2018-07-02 22:05:43.395
[ctssd(30358)]CRS-2402:The Cluster Time Synchronization Service aborted on host raclhr-11gr2-n2. Details at in /u01/app/11.2.0/grid/log/raclhr-11gr2-n2/ctssd/octssd.log.
2018-07-02 22:05:44.404
[ohasd(29989)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2018-07-02 22:05:44.405
[ohasd(29989)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2018-07-02 22:05:44.405
[ohasd(29989)]CRS-2807:Resource 'ora.ctssd' failed to start automatically.
2018-07-02 22:05:44.405
[ohasd(29989)]CRS-2807:Resource 'ora.evmd' failed to start automatically.
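On a busy system the alert log is long, so it helps to reduce it to daemon and CRS message code before reading the details. The sketch below assumes the `[daemon(pid)]CRS-nnnn:message` layout seen in the excerpt above; `summarize_alerts` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import List, Tuple

ALERT_RE = re.compile(r"^\[(\w+)\((\d+)\)\](CRS-\d+):(.*)$")


def summarize_alerts(alert_text: str) -> List[Tuple[str, str, str]]:
    """Return (daemon, code, message) for each CRS message line."""
    events = []
    for line in alert_text.splitlines():
        match = ALERT_RE.match(line.strip())
        if match:
            daemon, _pid, code, message = match.groups()
            events.append((daemon, code, message.strip()))
    return events


alert_sample = """\
2018-07-02 22:05:42.704
[ctssd(30358)]CRS-2404:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time.
2018-07-02 22:05:44.404
[ohasd(29989)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2018-07-02 22:05:44.405
[ohasd(29989)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
"""

events = summarize_alerts(alert_sample)
code_counts = Counter(code for _daemon, code, _message in events)
print(code_counts)
```

Seeing a ctssd message such as CRS-2404 immediately before the CRS-2807 start failures points the investigation at time synchronization first.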
Check octssd.log:
2018-07-02 22:05:42.702: [ CTSS][1805252352]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [3].
2018-07-02 22:05:42.702: [ CTSS][1805252352]ctsscomm_recv_cb4_2: Receive active version change msg. Old active version [186647296] New active version [186647296].
2018-07-02 22:05:42.702: [ CTSS][1805252352]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
2018-07-02 22:05:42.702: [ CTSS][1805252352]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm2_3: Received time sync message from master.
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm: sendtime{sec[1530540340], usec[690191]}, receivetime{sec[1530540342], usec[702977]}.
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm: The RTT of sync msg [2012786] is too large for time sync to be accurate. Recommends retry. Returns [17].
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm: Received from master (mode [0x8c] nodenum [1] hostname [raclhr-11gr2-n1] )
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctsselect_monitor_steysync_mode: Failed in clsctssslave_sync_with_master [17]. Retries [0/3].
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm1_1: Waiting for last time sync process to finish. sync_state[6].
2018-07-02 22:05:42.703: [ CTSS][1805252352]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm1_2: Ready to initiate new time sync process.
2018-07-02 22:05:42.703: [ CTSS][1798948608]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
2018-07-02 22:05:42.704: [ CTSS][1805252352]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
2018-07-02 22:05:42.704: [ CTSS][1805252352]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
2018-07-02 22:05:42.704: [ CTSS][1798948608]ctssslave_swm2_3: Received time sync message from master.
2018-07-02 22:05:42.704: [ CTSS][1798948608]ctssslave_swm: The magnitude [172752141259 usec] of the offset [172752141259 usec] is larger than [86400000000 usec] sec which is the CTSS limit.
2018-07-02 22:05:42.704: [ CTSS][1798948608]ctsselect_monitor_steysync_mode: Failed in clsctssslave_sync_with_master [12]: Time offset is too much to be corrected
2018-07-02 22:05:42.704: [ CTSS][1805252352]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
2018-07-02 22:05:43.395: [ CTSS][2023593728]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xd0], offset[172752141 ms]}, length=[8].
2018-07-02 22:05:43.395: [ CTSS][1798948608]ctsselect_monitor_steysync_mode: CTSS daemon exiting [12].
2018-07-02 22:05:43.395: [ CTSS][1798948608]CTSS daemon aborting
2018-07-02 22:05:44.398: [ CTSS][2023593728]ctss_checkcb: clsdm requested check alive. checkcb_data{mode[0xd0], offset[172752141 ms]}, length=[8].
Now repair the system:
The following command sets the system date to June 30, 2018:
#date -s 06/30/2018
The following command sets the system time to 22:14:06:
#date -s 22:14:06
Then restart the CRS service:
crsctl stop crs -f
crsctl start crs
CTSS then synchronizes the time automatically:
[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init
——————————————————————————–
NAME TARGET STATE SERVER STATE_DETAILS
——————————————————————————–
Cluster Resources
——————————————————————————–
ora.ctssd
1 ONLINE ONLINE raclhr-11gr2-n2 ACTIVE:100
[root@raclhr-11gR2-N2 ctssd]# crsctl stat res -t -init
ora.ctssd
1 ONLINE ONLINE raclhr-11gr2-n2 ACTIVE:0