朋友那一开发用的Oracle系统,linux平台,版本10.2.0.1.0。
一次在做rman备份时,提示ORA-00600错误。退出后,只要一查询DBA_JOBS,数据库就提示ORA-00600错误,然后数据库就自动关闭了。
一、故障现象
将alert.log拿来看了看,如下:
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Mon Oct 15 14:55:00 2012
ALTER DATABASE OPEN
Mon Oct 15 14:55:00 2012
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=16, OS id=4306
Mon Oct 15 14:55:00 2012
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=4308
Mon Oct 15 14:55:00 2012
Thread 1 opened at log sequence 29
Current log# 1 seq# 29 mem# 0: /opt/ora10g/product/10.2.0/oradata/tftdb/REDO01.LOG
Successful open of redo thread 1
Mon Oct 15 14:55:00 2012
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Oct 15 14:55:00 2012
ARC0: STARTING ARCH PROCESSES
Mon Oct 15 14:55:00 2012
SMON: enabling cache recovery
Mon Oct 15 14:55:00 2012
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Mon Oct 15 14:55:00 2012
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARC2 started with pid=18, OS id=4310
Mon Oct 15 14:55:00 2012
Successfully onlined Undo Tablespace 1.
Mon Oct 15 14:55:00 2012
SMON: enabling tx recovery
Mon Oct 15 14:55:00 2012
Database Characterset is ZHS16GBK
Mon Oct 15 14:55:00 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_smon_4289.trc:
ORA-00600: internal error code, arguments: [4000], [2426], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Mon Oct 15 14:55:00 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/udump/tftdb_ora_4304.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:01 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Mon Oct 15 14:55:01 2012
Starting background process QMNC
QMNC started with pid=19, OS id=4312
Mon Oct 15 14:55:01 2012
db_recovery_file_dest_size of 2048 MB is 48.96% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Oct 15 14:55:01 2012
Completed: ALTER DATABASE OPEN
Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:01 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:02 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:06 2012
Errors in file /opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_cjq0_4293.trc:
ORA-00600: internal error code, arguments: [4000], [2411], [], [], [], [], [], []
Mon Oct 15 14:55:06 2012
可以看到,在数据库启动时,SMON进程就报错了:
二、故障分析
主要故障有两个:ORA-00600 [4000], [2426], [], [], [], [], [], [] 和ORA-00600 [4000], [2411], [], [], [], [], [], []
1)ORA-00600 [4000], [2426], [], [], [], [], [], []
打开tftdb_smon_4289.trc文件,可看到如下信息:
/opt/ora10g/product/10.2.0/admin/tftdb/bdump/tftdb_smon_4289.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/ora10g/product/10.2.0/db_1
System name: Linux
Node name: node1
Release: 2.6.18-194.el5
Version: #1 SMP Tue Mar 16 21:52:43 EDT 2010
Machine: i686
Instance name: tftdb
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 4289, image: oracle@node1 (SMON)
*** SERVICE NAME:() 2012-10-15 14:55:00.738
*** SESSION ID:(164.1) 2012-10-15 14:55:00.738
*** 2012-10-15 14:55:00.738
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [2426], [], [], [], [], [], []
Current SQL statement for this session:
select smontabv.cnt, smontab.time_mp, smontab.scn, smontab.num_mappings, smontab.tim_scn_map, smontab.orig_thread
from smon_scn_time smontab,
(select max(scn) scnmax,count(*)+sum(NVL2(TIM_SCN_MAP,NUM_MAPPINGS,0)) cnt
from smon_scn_time where thread=0) smontabv
where smontab.scn = smontabv.scnmax and thread=0
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+27 call ksedst1() 0 ? 1 ?
ksedmp()+557 call ksedst() 0 ? D ? CBD2D20 ? 2A ?
CBD2D20 ? 2A ?
看样子是smon_scn_time表出了问题。这个表smon进程约每6秒更新一次,写入scn与time的map信息。创建语句如下:
create cluster smon_scn_to_time (
thread number /* thread, compatibility */
)
/
create index smon_scn_to_time_idx on cluster smon_scn_to_time
/
create table smon_scn_time (
thread number, /* thread, compatibility */
time_mp number, /* time this recent scn represents */
time_dp date, /* time as date, compatibility */
scn_wrp number, /* scn.wrp, compatibility */
scn_bas number, /* scn.bas, compatibility */
num_mappings number,
tim_scn_map raw(1200),
scn number default 0, /* scn */
orig_thread number default 0 /* for downgrade */
) cluster smon_scn_to_time (thread)
/
create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp)
/
create unique index smon_scn_time_scn_idx on smon_scn_time(scn)
/
执行查询:
select count(*) from sys.smon_scn_time;
ERROR at line 1:
ORA-00600: internal error code, arguments: [4000], [2521], [], [], [], [], [],
[]
但是一下语句却可以正确执行:
select * from smon_scn_time where rownum