Problem background:
After a fresh Oracle 11gR2 RAC deployment, both nodes were rebooted. Afterwards, the cluster resources on one node failed to start, and rebooting that node again did not help. When the healthy node was then rebooted, the resources on the originally failing node came up; but once that reboot completed, the previously healthy node's resources would no longer start.
Cluster environment:
OS: Red Hat Enterprise Linux 5.8 x86_64
DB: Oracle Database Enterprise Edition 11.2.0.4.0 x86_64
GRID: Oracle Grid Infrastructure 11.2.0.4 x86_64
Both the private interconnect (heartbeat) and public NICs are bonded (bond1);
The storage is IBM, so RDAC is used for multipath aggregation;
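Since a flaky bond or multipath layer is a common culprit when nodes fail alternately after reboots, it is worth confirming both on each node before digging into the Clusterware logs. A minimal sketch (the bond name bond1 comes from the setup above; /proc/mpp is the proc interface the RDAC/MPP driver exposes when loaded):

```shell
# Sketch: confirm bonding and RDAC multipath state on each node.
# bond1 is the bond name from this environment; adjust if it differs.
BOND=/proc/net/bonding/bond1
if [ -r "$BOND" ]; then
    # Bonding mode, link status, and slave NICs
    grep -E 'Bonding Mode|MII Status|Slave Interface' "$BOND"
else
    echo "bond1 not configured on this host"
fi

# RDAC exposes its virtual targets under /proc/mpp when loaded:
ls /proc/mpp 2>/dev/null || echo "RDAC (mppUpper/mppVhba) not loaded"
```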
Problem analysis:
Check the OS system logs and the cluster logs.
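For 11.2 Clusterware, the relevant logs live under $GRID_HOME/log/<hostname>/. As a sketch of where to look (the Grid home /DBSoft/11.2.4/grid is taken from the agent path in the crsd log, and db01 is one of the node names in this environment):

```shell
# Sketch: print the Clusterware log files worth checking on each node.
# GRID_HOME and NODE are taken from this environment (assumptions).
GRID_HOME=${GRID_HOME:-/DBSoft/11.2.4/grid}
NODE=${NODE:-db01}

for f in "alert$NODE.log" "cssd/ocssd.log" "crsd/crsd.log" \
         "ohasd/ohasd.log"; do
    echo "$GRID_HOME/log/$NODE/$f"
done

# On a live node, also check the stack and resource states:
#   crsctl check crs
#   crsctl stat res -t -init
# and correlate timestamps with the OS side in /var/log/messages.
```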
Grid alert log on db01:
2014-09-29 15:36:36.587:
[ctssd(12616)]CRS-2405:The Cluster Time Synchronization Service on host db01 is shutdown by user
2014-09-29 15:36:36.589:
[mdnsd(6173)]CRS-5602:mDNS service stopping by request.
[client(17411)]CRS-10001:29-Sep-14 15:36 ACFS-9290: Waiting for ASM to shutdown.
2014-09-29 15:36:46.395:
[cssd(8958)]CRS-1603:CSSD on node db01 shutdown by user.
2014-09-29 15:36:46.509:
[ohasd(12463)]CRS-2767:Resource state recovery not attempted for 'ora.cssdmonitor' as its target state is OFFLINE
2014-09-29 15:36:46.509:
[ohasd(12463)]CRS-2769:Unable to failover resource 'ora.cssdmonitor'.
2014-09-29 15:36:46.608:
[cssd(8958)]CRS-1660:The CSS daemon shutdown has completed
ocssd.log:
2014-09-29 15:36:46.052: [ CSSD][1122224448]clssnmSendingThread: sending status msg to all nodes
2014-09-29 15:36:46.053: [ CSSD][1122224448]clssnmSendingThread: sent 4 status msgs to all nodes
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmExitGrock: client 1 (0x1261ee30), grock haip.cluster_interconnect, member 0
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmUnregisterPrimary: Unregistering member 0 (0x1261ace0) in global grock haip.cluster_interconnect
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 0 refcount is 0
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmAllocateRPCIndex: allocated rpc 357 (0x2aaaaaee8a58)
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmRPC: rpc 0x2aaaaaee8a58 (RPC#357) tag(165002a) sent to node 1
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmHandleMasterMemberExit: [s(1) d(1)]
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmRemoveMember: grock haip.cluster_interconnect, member number 0 (0x1261ace0) node number 1 state 0x4 grock type 2
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmGrantLocks: 0-> new master (1/2) group haip.cluster_interconnect
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmRPCDone: rpc 0x2aaaaaee8a58 (RPC#357) state 6, flags 0x100
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmDelMemCmpl: rpc 0x2aaaaaee8a58, ret 0, client 0x1261ee30 member 0x1261ace0
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmFreeRPCIndex: freeing rpc 357
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmAllocateRPCIndex: allocated rpc 358 (0x2aaaaaee8b00)
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmDiscEndpcl: gipcDestroy 0x1a370
2014-09-29 15:36:46.388: [ CSSD][1079531840]clssgmRPCBroadcast: rpc(0x166002a), status(1), sendcount(1), filtered by specific properties:
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmDeadProc: proc 0x1262a8a0
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmDestroyProc: cleaning up proc(0x1262a8a0) con(0x1a341) skgpid ospid 12515 with 0 clients, refcount 0
2014-09-29 15:36:46.388: [ CSSD][1089472832]clssgmDiscEndpcl: gipcDestroy 0x1a341
2014-09-29 15:36:46.389: [ CSSD][1079531840]clssgmRPCDone: rpc 0x2aaaaaee8b00 (RPC#358) state 4, flags 0x402
2014-09-29 15:36:46.389: [ CSSD][1079531840]clssgmBroadcastGrockRcfgCmpl: RPC(0x166002a) of grock(haip.cluster_interconnect) received all acks, grock update sequence(4)
2014-09-29 15:36:46.389: [ CSSD][1079531840]clssgmFreeRPCIndex: freeing rpc 358
2014-09-29 15:36:46.395: [ CSSD][1089472832]clssgmExecuteClientRequest: MAINT recvd from proc 7 (0x1264c8e0)
2014-09-29 15:36:46.395: [ CSSD][1089472832]clssgmShutDown: Received explicit shutdown request from client.
2014-09-29 15:36:46.395: [ CSSD][1089472832]clssgmClientShutdown: total iocapables 0
2014-09-29 15:36:46.395: [ CSSD][1089472832]clssgmClientShutdown: graceful shutdown completed.
2014-09-29 15:36:46.395: [ CSSD][1089472832]clssgmClientShutdown: signaling to the agent that resource should remain down
2014-09-29 15:36:46.395: [ CSSD][1089472832]clssgmCompareSwapEventValue: changed CmInfo State val 0, from 11, changes 21
2014-09-29 15:36:46.395: [ CSSD][1079531840]clssgmPeerListener: terminating at incarn(307480783)
2014-09-29 15:36:46.395: [ CSSD][1079531840]clssgmPeerDeactivate: node 2 (db02), death 0, state 0x1 connstate 0xf
2014-09-29 15:36:46.395: [ CSSD][1079531840]clssgmCleanFuture: discarded 0 future msgs for 2
2014-09-29 15:36:46.395: [ CSSD][1079531840]clssgmDiscEndppl: gipcDestroy 0x26027
2014-09-29 15:36:46.496: [ CSSD][1089472832]clssnmSendManualShut: Notifying all nodes that this node has been manually shut down
2014-09-29 15:36:46.497: [GIPCHAUP][1091049792] gipchaUpperDisconnect: initiated discconnect umsg 0x12ac30d0 { msg 0x12ad9f08, ret gipcretRequestPending (15), flags 0x2 }, msg 0x12ad9f08 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-00026038, dstCid 00000000-00000d5b }, endp 0x12ac12d0 [0000000000026038] { gipchaEndpoint : port '5996-da20-cc7e-b119', peer 'db02:gm2_db-cluster/2991-10ef-6fca-054c', srcCid 00000000-00026038, dstCid 00000000-00000d5b, numSend 0, maxSend 100, groupListType 2, hagroup 0x12a9f4e0, usrFlags 0x4000, flags 0x21c }
2014-09-29 15:36:46.497: [GIPCHAUP][1091049792] gipchaUpperCallbackDisconnect: completed DISCONNECT ret gipcretSuccess (0), umsg 0x12ac30d0 { msg 0x12ad9f08, ret gipcretSuccess (0), flags 0x2 }, msg 0x12ad9f08 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-00026038, dstCid 00000000-00000d5b }, hendp 0x12ac12d0 [0000000000026038] { gipchaEndpoint : port '5996-da20-cc7e-b119', peer 'db02:gm2_db-cluster/2991-10ef-6fca-054c', srcCid 00000000-00026038, dstCid 00000000-00000d5b, numSend 0, maxSend 100, groupListType 2, hagroup 0x12a9f4e0, usrFlags 0x4000, flags 0x21c }
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x2aaaac8b3180) proc (0x2aaaac937fd0), iocapables 1.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x2aaaac937fd0), pid (6300), iocapables 1, client (0x2aaaac8b3180)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x2aaaac9773e0) proc (0x2aaaac937fd0), iocapables 2.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x2aaaac937fd0), pid (6300), iocapables 2, client (0x2aaaac9773e0)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x2aaaac958d50) proc (0x2aaaac99d570), iocapables 3.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x2aaaac99d570), pid (6208), iocapables 3, client (0x2aaaac958d50)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x12645650) proc (0x1264c8e0), iocapables 4.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x1264c8e0), pid (12518), iocapables 4, client (0x12645650)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x126171b0) proc (0x1264c8e0), iocapables 5.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x1264c8e0), pid (12518), iocapables 5, client (0x126171b0)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x1264cef0) proc (0x1264c8e0), iocapables 6.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x1264c8e0), pid (12518), iocapables 6, client (0x1264cef0)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x125f0340) proc (0x1260f910), iocapables 7.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x1260f910), pid (12520), iocapables 7, client (0x125f0340)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x125e98b0) proc (0x1260f910), iocapables 8.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x1260f910), pid (12520), iocapables 8, client (0x125e98b0)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: Aborting client (0x12ac2790) proc (0x1260f910), iocapables 9.
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmSendShutdown: I/O capable proc (0x1260f910), pid (12520), iocapables 9, client (0x12ac2790)
2014-09-29 15:36:46.506: [ CSSD][1089472832]clssgmClientShutdown: sending shutdown, fence_done 1
2014-09-29 15:36:46.608: [ default][1089472832]kgzf_fini: called
2014-09-29 15:36:46.608: [ default][1089472832]kgzf_fini1: completed. kgzf layer has quit.
crsd.log:
2014-09-29 15:36:13.419: [UiServer][1180916032]{1:44501:180} Sending message to PE. ctx= 0x8a97b60
2014-09-29 15:36:13.420: [ CRSPE][1178814784]{1:44501:180} Cmd : 0x2aaab018b580 : flags: FORCE_TAG
2014-09-29 15:36:13.420: [ CRSPE][1178814784]{1:44501:180} Processing PE command id=238. Description: [Server Shutdown {} : pass=0 : 0x2aaab018b580]
2014-09-29 15:36:13.420: [ CRSPE][1178814784]{1:44501:180} Prepared shutdown cmd for: db01
2014-09-29 15:36:13.420: [ CRSPE][1178814784]{1:44501:180} Server [db01] has changed state from [ONLINE] to [LEAVING]
2014-09-29 15:36:13.420: [ CRSOCR][1176713536]{1:44501:180} Multi Write Batch processing...
2014-09-29 15:36:13.420: [ CRSRPT][1180916032]{1:44501:180} Published to EVM CRS_SERVER_STATE_CHANGE for db01
2014-09-29 15:36:13.437: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab02145f0 has 3 WOs
2014-09-29 15:36:13.437: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab0215980 has 4 WOs
2014-09-29 15:36:13.438: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab02765c0 has 5 WOs
2014-09-29 15:36:13.438: [UiServer][1180916032]{1:44501:180} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'db01']
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[db01]
WAIT:
TextMessage[0]
]
2014-09-29 15:36:13.438: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab0217fb0 has 11 WOs
2014-09-29 15:36:13.438: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab0214150 has 9 WOs
2014-09-29 15:36:13.438: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab0213ee0 has 3 WOs
2014-09-29 15:36:13.439: [ CRSPE][1178814784]{1:44501:180} Op 0x2aaab018ab50 has 14 WOs
2014-09-29 15:36:13.439: [ CRSPE][1178814784]{1:44501:180} RI [ora.cvu 1 1] new internal state: [STOPPING] old value: [STABLE]
2014-09-29 15:36:13.439: [ CRSPE][1178814784]{1:44501:180} Sending message to agfw: id = 1238
2014-09-29 15:36:13.439: [ AGFW][1168308544]{1:44501:180} Agfw Proxy Server received the message: RESOURCE_STOP[ora.cvu 1 1] ID 4099:1238
2014-09-29 15:36:13.439: [ CRSPE][1178814784]{1:44501:180} CRS-2673: Attempting to stop 'ora.cvu' on 'db01'
2014-09-29 15:36:13.439: [ AGFW][1168308544]{1:44501:180} Agfw Proxy Server forwarding the message: RESOURCE_STOP[ora.cvu 1 1] ID 4099:1238 to the agent /DBSoft/11.2.4/grid/bin/scriptagent_grid
2014-09-29 15:36:13.440: [UiServer][1180916032]{1:44501:180} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2673: Attempting to stop 'ora.cvu' on 'db01']
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[ora.cvu]
WAIT:
TextMessage[0]
]
………….