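Today a single-instance 11.2.0.2 database on CentOS 5.8 x86-64 ran into trouble.
The symptom: every time we ran alter database open, the instance crashed on its own within a dozen or so seconds.
The alert log is as follows: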
------------------------------------------------------------------------------------------------------------------------
SMON: slave died unexpectedly, downgrading to serial recovery
Errors in file /cvms/app/oracle/diag/rdbms/hncvms/HNCVMS/trace/HNCVMS_smon_11203.trc (incident=187832):
ORA-00600: internal error code, arguments: [17182], [0x2B7C8E7CD7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /cvms/app/oracle/diag/rdbms/hncvms/HNCVMS/incident/incdir_187832/HNCVMS_smon_11203_i187832.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Doing block recovery for file 12 block 470173
Resuming block recovery (PMON) for file 12 block 470173
Block recovery from logseq 31163, block 68 to scn 70921364289
Recovery of Online Redo Log: Thread 1 Group 1 Seq 31163 Reading mem 0
Mem# 0: /cvms/app/oracle/oradata/HNCVMS/redo01.log
Block recovery completed at rba 31163.135.16, scn 16.2201887554
ORACLE Instance HNCVMS (pid = 14) - Error 600 encountered while recovering transaction (11, 3) on object 75050.
Errors in file /cvms/app/oracle/diag/rdbms/hncvms/HNCVMS/trace/HNCVMS_smon_11203.trc:
ORA-00600: internal error code, arguments: [17182], [0x2B7C8E7CD7A0], [], [], [], [], [], [], [], [], [], []
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x90890C3, kghrst()+1567] [flags: 0x0, count: 1]
Errors in file /cvms/app/oracle/diag/rdbms/hncvms/HNCVMS/trace/HNCVMS_smon_11203.trc (incident=187833):
ORA-07445: exception encountered: core dump [kghrst()+1567] [SIGSEGV] [ADDR:0x0] [PC:0x90890C3] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x2B7C8E7CD7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /cvms/app/oracle/diag/rdbms/hncvms/HNCVMS/incident/incdir_187833/HNCVMS_smon_11203_i187833.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Feb 17 10:44:11 2014
PMON (ospid: 11177): terminating the instance due to error 474
------------------------------------------------------------------------------------------------------------------------------
Looking through the log, the key part is this:
Doing block recovery for file 12 block 470173
Resuming block recovery (PMON) for file 12 block 470173
Block recovery from logseq 31163, block 68 to scn 70921364289
Recovery of Online Redo Log: Thread 1 Group 1 Seq 31163 Reading mem 0
Mem# 0: /cvms/app/oracle/oradata/HNCVMS/redo01.log
Block recovery completed at rba 31163.135.16, scn 16.2201887554
ORACLE Instance HNCVMS (pid = 14) - Error 600 encountered while recovering transaction (11, 3) on object 75050.
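Combined with how recovery works, this pins the problem to instance recovery, specifically the transaction recovery in which SMON rolls back uncommitted transactions when the database opens. The errors indicate that block 470173 of file 12, which appears to belong to the object with object_id 75050, is damaged (a corrupt block).
SMON hit ORA-00600 [17182] while recovering transaction (11, 3) on that object and then died (the SMON trace also shows ORA-07445 in kghrst()). PMON then terminated the whole instance with error 474 (ORA-00474: SMON process terminated with error), which is why the instance goes down.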
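To make this localization step repeatable, the relevant numbers can be pulled out of the alert log mechanically. The following is a minimal Python sketch, assuming the exact 11.2.0.2 message wording quoted above; the names `parse_alert_log` and `RecoveryFailure` are hypothetical and not part of any Oracle tool, and only the first match of each message type is recorded.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Patterns follow the alert log wording quoted above (assumed 11.2.0.2 format).
BLOCK_RE = re.compile(r"Doing block recovery for file (\d+) block (\d+)")
TXN_RE = re.compile(
    r"Error (\d+) encountered while recovering transaction "
    r"\((\d+), (\d+)\) on object (\d+)"
)
TERM_RE = re.compile(r"terminating the instance due to error (\d+)")


@dataclass
class RecoveryFailure:
    file_id: Optional[int] = None
    block_id: Optional[int] = None
    ora_error: Optional[int] = None                 # e.g. 600 for ORA-00600
    transaction: Optional[Tuple[int, int]] = None   # pair exactly as printed
    object_id: Optional[int] = None                 # look up in dba_objects
    terminating_error: Optional[int] = None         # error PMON reports


def parse_alert_log(text: str) -> RecoveryFailure:
    """Return the IDs from the first matching line of each message type."""
    result = RecoveryFailure()
    m = BLOCK_RE.search(text)
    if m:
        result.file_id, result.block_id = int(m.group(1)), int(m.group(2))
    m = TXN_RE.search(text)
    if m:
        err, first, second, obj = map(int, m.groups())
        result.ora_error = err
        result.transaction = (first, second)
        result.object_id = obj
    m = TERM_RE.search(text)
    if m:
        result.terminating_error = int(m.group(1))
    return result


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python parse_alert.py < alert_HNCVMS.log
    print(parse_alert_log(sys.stdin.read()))
```

Run against the excerpt above, this yields file 12, block 470173, ORA-00600 on transaction (11, 3), object 75050, and PMON error 474, which are exactly the values fed into the queries in step 1.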
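We handled this in two steps:
Step 1: Start the instance and find out what type of object file 12 block 470173 and object 75050 belong to.
1. Start the instance
Opening the database triggers this recovery, and the recovery crashes the instance again, so we first have to turn off SMON's transaction recovery temporarily.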
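Create a pfile from the current spfile, append the following line to the end of it, and then start the instance with that pfile (event 10513 at level 2 stops SMON from performing transaction recovery):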
event="10513 trace name context forever, level 2"
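Editing the pfile by hand is easy, but the step can also be scripted so it is not forgotten or duplicated. A minimal Python sketch, assuming the pfile was already written with a command such as `CREATE PFILE='/tmp/init_recover.ora' FROM SPFILE;` and will be used with `STARTUP PFILE='/tmp/init_recover.ora'` (the path is a hypothetical example); `add_event_to_pfile` and `patch_pfile` are hypothetical helper names.

```python
from pathlib import Path

# The event line from the article, used to keep SMON from performing
# transaction recovery while the database is opened.
EVENT_LINE = 'event="10513 trace name context forever, level 2"'


def add_event_to_pfile(pfile_text: str, event_line: str = EVENT_LINE) -> str:
    """Return the pfile text with event_line appended once at the end.

    Assumption: the pfile has no other event= setting; this helper does not
    try to merge with an existing one.
    """
    lines = pfile_text.splitlines()
    if event_line not in (line.strip() for line in lines):
        lines.append(event_line)
    return "\n".join(lines) + "\n"


def patch_pfile(path: str) -> None:
    """Rewrite the pfile at `path` in place (e.g. the hypothetical /tmp/init_recover.ora)."""
    p = Path(path)
    p.write_text(add_event_to_pfile(p.read_text()))
```

The helper is idempotent, so running it twice leaves a single event line; remember to remove the line again (or go back to the spfile) once the damaged object has been dealt with.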
2. Query the objects (the instance no longer crashes, but the database is not in a normal state, because the pending transaction recovery has not been carried out)
1) SQL> SELECT owner, object_name, object_type FROM dba_objects WHERE object_id = 75050;
2) SQL> select segment_type,owner,segment_name from dba_extents where file_id = &file_id and &block between block_id and block_id+blocks -1;
Use the output of these queries to determine the object type.
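The second query relies on the fact that an extent starting at block_id and spanning blocks blocks covers block_id through block_id + blocks - 1 inclusive, which is why the "- 1" is needed. A small Python sketch of the same predicate, using a hypothetical in-memory `Extent` record in place of real rows from dba_extents:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Extent:
    """Hypothetical stand-in for one row of dba_extents."""
    owner: str
    segment_name: str
    segment_type: str
    file_id: int
    block_id: int  # first block of the extent
    blocks: int    # number of blocks in the extent


def find_segment(extents: Iterable[Extent], file_id: int, block: int) -> Optional[Extent]:
    """Mirror of: file_id = &file_id AND &block BETWEEN block_id AND block_id + blocks - 1."""
    for ext in extents:
        last_block = ext.block_id + ext.blocks - 1  # inclusive upper bound
        if ext.file_id == file_id and ext.block_id <= block <= last_block:
            return ext
    return None
```

With invented values, an extent with block_id 470016 and 256 blocks covers blocks 470016 through 470271, so block 470173 falls inside it, while block 470272 would belong to the next extent.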