
Oracle 11gR2 RAC Node Crash Failure Analysis

    Source: Internet  Published: 2017-06-06


Environment: AIX 7100
Oracle 11gR2 RAC
 Detailed version: 11.2.0.4
 
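Symptom:
 CRS on node 2 hung and crsctl commands did not respond at all. After the CRS processes were killed directly and the host rebooted, the VIP still did not fail over to node 1.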
 
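Analysis approach:
 1. Check the DB alert log and the related trace files.
 2. Check the output of "errpt -a" on all nodes.
 3. Check the GI logs on all nodes from the time of the problem:
 /log//alert*.log
 /log//crsd/crsd.log
 /log//cssd/ocssd.log
 /log//agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
 /log//agent/ohasd/oracssdagent_root/oracssdagent_root.log
 /etc/oracle/lastgasp/*, or /var/opt/oracle/lastgasp/* (if present)
 Note: if CRS initiated the host reboot, a record is added to a file under /etc/oracle/lastgasp/.
 4. Check the LMON, LMS*, and LMD0 trace files on all nodes from the time of the problem.
 5. Check all OSW (OSWatcher) output on all nodes from the time of the problem.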
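
The lastgasp check in step 3 can be scripted so it runs the same way on every node. The following is a minimal sketch, assuming only the two directories named above. The format of the lastgasp files is not described in this article, so the script only reports which files are non-empty and when they were last modified. The function name `find_lastgasp_records` is made up for illustration.

```python
from pathlib import Path
import time

# Directories named in the checklist above; the second one may not exist on every host.
LASTGASP_DIRS = ["/etc/oracle/lastgasp", "/var/opt/oracle/lastgasp"]


def find_lastgasp_records(dirs=LASTGASP_DIRS):
    """Return (path, size, mtime) for every non-empty file in the lastgasp directories."""
    records = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue  # the directory is optional, skip it when absent
        for f in sorted(base.iterdir()):
            if not f.is_file():
                continue
            st = f.stat()
            if st.st_size > 0:
                records.append((str(f), st.st_size, st.st_mtime))
    return records


if __name__ == "__main__":
    for path, size, mtime in find_lastgasp_records():
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(f"{stamp}  {size:>8} bytes  {path}")
```

A file whose modification time is close to the reboot time is worth opening by hand. An empty result does not by itself prove that CRS was not involved.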

 
The detailed analysis follows:
 
Alert log of the node 1 DB:
 Tue Mar 25 12:59:07 2014
 Thread 1 advanced to log sequence 245 (LGWR switch)
  Current log# 2 seq# 245 mem# 0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709
  Current log# 2 seq# 245 mem# 1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727
 Tue Mar 25 12:59:20 2014
 Archived Log entry 315 added for thread 1 sequence 244 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 13:14:54 2014
 IPC Send timeout detected. Sender: ospid 6160700 [oracle@dbrac1 (LMS0)]
 Receiver: inst 2 binc 291585594 ospid 11010320
 IPC Send timeout to 2.1 inc 50 for msg type 65518 from opid 12
 Tue Mar 25 13:14:59 2014
 Communications reconfiguration: instance_number 2
 Tue Mar 25 13:15:01 2014
 IPC Send timeout detected. Sender: ospid 12452050 [oracle@dbrac1 (LMS1)]
 Receiver: inst 2 binc 291585600 ospid 11534636
 IPC Send timeout to 2.2 inc 50 for msg type 65518 from opid 13
 Tue Mar 25 13:15:22 2014
 IPC Send timeout detected. Sender: ospid 10682630 [oracle@dbrac1 (TNS V1-V3)]
 Receiver: inst 2 binc 50 ospid 6095056
 Tue Mar 25 13:15:25 2014
 Detected an inconsistent instance membership by instance 1
 Evicting instance 2 from cluster
 Waiting for instances to leave: 2
 Tue Mar 25 13:15:26 2014
 Dumping diagnostic data in directory=[cdmp_20140325131526], requested by (instance=2, osid=8192018 (LMD0)), summary=[abnormal instance termination].
 Tue Mar 25 13:15:42 2014
 Reconfiguration started (old inc 50, new inc 54)
 List of instances:
 1 (myinst: 1)
 ...
 Tue Mar 25 13:15:52 2014
 Archived Log entry 316 added for thread 2 sequence 114 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 13:15:53 2014
 ARC3: Archiving disabled thread 2 sequence 115
 Archived Log entry 317 added for thread 2 sequence 115 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 13:16:37 2014
 Thread 1 advanced to log sequence 246 (LGWR switch)
  Current log# 3 seq# 246 mem# 0: +SYSDG/dbracdb/onlinelog/group_3.266.840562735
  Current log# 3 seq# 246 mem# 1: +SYSDG/dbracdb/onlinelog/group_3.267.840562747
 Tue Mar 25 13:16:46 2014
 Decreasing number of real time LMS from 2 to 0
 Tue Mar 25 13:16:51 2014
 Archived Log entry 318 added for thread 1 sequence 245 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 13:20:50 2014
 IPC Send timeout detected. Sender: ospid 9306248 [oracle@dbrac1 (PING)]
 Receiver: inst 2 binc 291585377 ospid 2687058
 Tue Mar 25 13:30:08 2014
 Thread 1 advanced to log sequence 247 (LGWR switch)
  Current log# 1 seq# 247 mem# 0: +SYSDG/dbracdb/onlinelog/group_1.262.840562653
  Current log# 1 seq# 247 mem# 1: +SYSDG/dbracdb/onlinelog/group_1.263.840562689
 Tue Mar 25 13:30:20 2014
 Archived Log entry 319 added for thread 1 sequence 246 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 13:45:23 2014
 Thread 1 advanced to log sequence 248 (LGWR switch)
  Current log# 2 seq# 248 mem# 0: +SYSDG/dbracdb/onlinelog/group_2.264.840562709
  Current log# 2 seq# 248 mem# 1: +SYSDG/dbracdb/onlinelog/group_2.265.840562727
 
 Alert log of the node 2 DB:
 Tue Mar 25 12:07:15 2014
 Archived Log entry 309 added for thread 2 sequence 112 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 12:22:22 2014
 Dumping diagnostic data in directory=[cdmp_20140325122222], requested by (instance=1, osid=7012828), summary=[incident=384673].
 Tue Mar 25 12:45:21 2014
 Thread 2 advanced to log sequence 114 (LGWR switch)
  Current log# 6 seq# 114 mem# 0: +SYSDG/dbracdb/onlinelog/group_6.274.840563009
  Current log# 6 seq# 114 mem# 1: +SYSDG/dbracdb/onlinelog/group_6.275.840563017
 Tue Mar 25 12:45:22 2014
 Archived Log entry 313 added for thread 2 sequence 113 ID 0xffffffff82080958 dest 1:
 Tue Mar 25 13:14:57 2014
 IPC Send timeout detected. Receiver ospid 11010320
 Tue Mar 25 13:14:57 2014
 Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms0_11010320.trc:
 IPC Send timeout detected. Receiver ospid 11534636 [
 Tue Mar 25 13:15:01 2014
 Errors in file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_lms1_11534636.trc:
 Tue Mar 25 13:15:25 2014
 LMS0 (ospid: 11010320) has detected no messaging activity from instance 1
 LMS0 (ospid: 11010320) issues an IMR to resolve the situation
 Please check LMS0 trace file for more detail.
 Tue Mar 25 13:15:25 2014
 Suppressed nested communications reconfiguration: instance_number 1
 Detected an inconsistent instance membership by instance 1
 Tue Mar 25 13:15:25 2014
 Received an instance abort message from instance 1
 Please check instance 1 alert and LMON trace files for detail.
 LMD0 (ospid: 8192018): terminating the instance due to error 481
 Tue Mar 25 13:15:26 2014
 ORA-1092 : opitsk aborting process
 Tue Mar 25 13:15:29 2014
 System state dump requested by (instance=2, osid=8192018 (LMD0)), summary=[abnormal instance termination].
 System State dumped to trace file /oraclelog/diag/rdbms/dbracdb/dbracdb2/trace/dbracdb2_diag_9699724_20140325131529.trc
 Instance terminated by LMD0, pid = 8192018
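
Reading the two alert logs side by side is easier once their key messages are merged into one timeline. The sketch below is one way to do that, assuming the timestamp layout seen in the excerpts above ("Tue Mar 25 13:14:54 2014") and an English locale for parsing day and month names. The keyword list is taken from the messages in these two logs and is not exhaustive. The function names and the command-line form are hypothetical.

```python
import re
from datetime import datetime

# Alert-log timestamp lines look like "Tue Mar 25 13:14:54 2014".
TS_RE = re.compile(r"^([A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})$")

# Message fragments taken from the two logs above; extend as needed.
KEYWORDS = (
    "IPC Send timeout",
    "Communications reconfiguration",
    "inconsistent instance membership",
    "Evicting instance",
    "instance abort message",
    "terminating the instance",
    "Reconfiguration started",
)


def key_events(lines, node):
    """Yield (timestamp, node, message) for lines containing one of KEYWORDS."""
    current = None
    for raw in lines:
        line = raw.strip()
        m = TS_RE.match(line)
        if m:
            current = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        elif current is not None and any(k in line for k in KEYWORDS):
            yield current, node, line


def merged_timeline(logs):
    """logs maps a node label to an iterable of alert-log lines."""
    events = []
    for node, lines in logs.items():
        events.extend(key_events(lines, node))
    # sorted() is stable, so messages sharing a timestamp keep their per-node order
    return sorted(events, key=lambda e: e[0])


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python timeline.py node1=alert_1.log node2=alert_2.log
    logs = {}
    for arg in sys.argv[1:]:
        node, path = arg.split("=", 1)
        with open(path, errors="replace") as fh:
            logs[node] = fh.readlines()
    for ts, node, msg in merged_timeline(logs):
        print(f"{ts:%H:%M:%S}  {node:<6} {msg}")
```

Applied to the excerpts above, the merged view shows the first IPC send timeout on node 1 at 13:14:54, the matching message on node 2 at 13:14:57, and both the eviction decision and the LMD0 termination at 13:15:25.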
 

OSW PRVTNET log of node 1:
 zzz ***Tue Mar 25 13:12:19 BEIST 2014
 trying to get source for 192.168.100.1
 source should be 192.168.100.1
 traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms
 trying to get source for 192.168.100.2
 source should be 192.168.100.1
 traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  dbrac2-priv (192.168.100.2)  1 ms  0 ms *
 zzz ***Warning. Traceroute response is spanning snapshot intervals.
 zzz ***Tue Mar 25 13:12:31 BEIST 2014
 trying to get source for 192.168.100.1
 source should be 192.168.100.1
 traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms
 trying to get source for 192.168.100.2
 source should be 192.168.100.1
 traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  * * *
 2  * * *
 3  * dbrac2-priv (192.168.100.2)  0 ms *
 zzz ***Warning. Traceroute response is spanning snapshot intervals.
 zzz ***Tue Mar 25 13:13:17 BEIST 2014
 trying to get source for 192.168.100.1
 source should be 192.168.100.1
 traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms
 trying to get source for 192.168.100.2
 source should be 192.168.100.1
 traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  * * *
 2  * * *
 3  dbrac2-priv (192.168.100.2)  0 ms * *
 zzz ***Warning. Traceroute response is spanning snapshot intervals.
 zzz ***Tue Mar 25 13:14:04 BEIST 2014
 trying to get source for 192.168.100.1
 source should be 192.168.100.1
 traceroute to 192.168.100.1 (192.168.100.1) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  dbrac1-priv (192.168.100.1)  1 ms  0 ms  0 ms
 trying to get source for 192.168.100.2
 source should be 192.168.100.1
 traceroute to 192.168.100.2 (192.168.100.2) from 192.168.100.1 (192.168.100.1), 30 hops max
 outgoing MTU = 1500
 1  * * *
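
In the excerpt above, the peer address 192.168.100.2 answers on hop 1 at 13:12:19, while from 13:12:31 onward the first hops show only "*" (no reply). Counting those unanswered probes per snapshot makes the change easy to spot in a long file. The following is a rough sketch, assuming the output layout shown above (a "zzz ***" header per snapshot, then "traceroute to" lines followed by numbered hop lines). The function name `probe_loss` is made up for illustration.

```python
import re

SNAPSHOT_RE = re.compile(r"^zzz \*\*\*(?!Warning)(.+)$")  # snapshot header, not the warning line
TARGET_RE = re.compile(r"^traceroute to (\S+)")
HOP_RE = re.compile(r"^\d+\s+(.*)$")


def probe_loss(lines):
    """Return a list of (snapshot, target, hops, lost_probes) tuples.

    hops is the number of hop lines printed for the target, lost_probes
    the number of '*' (no reply) probes among them.
    """
    results = []
    snapshot = target = None
    hops = lost = 0

    def flush():
        # Record the counters of the traceroute that just ended, if any.
        if target is not None:
            results.append((snapshot, target, hops, lost))

    for raw in lines:
        line = raw.strip()
        m = SNAPSHOT_RE.match(line)
        if m:
            flush()
            snapshot, target, hops, lost = m.group(1), None, 0, 0
            continue
        m = TARGET_RE.match(line)
        if m:
            flush()
            target, hops, lost = m.group(1), 0, 0
            continue
        m = HOP_RE.match(line)
        if m and target is not None:
            hops += 1
            lost += m.group(1).split().count("*")
    flush()
    return results


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python prvtnet_loss.py <oswprvtnet output file>
    with open(sys.argv[1], errors="replace") as fh:
        for snap, tgt, n_hops, n_lost in probe_loss(fh):
            if n_lost:
                print(f"{snap}  {tgt}  hops={n_hops}  lost_probes={n_lost}")
```

The counts are only a pointer to the time window worth examining. They do not say whether the loss came from the network, the adapter, or a stalled host.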


    
 
 
 