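Checking the logs showed that both crashes were preceded by the Oracle internal error ORA-00600 [15709], [29], [1]. This error is raised when Oracle Bug 6954722 is triggered by the database performing a parallel rollback. The alert log and the database's rollback information confirmed that a large amount of data was indeed being rolled back.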
Log analysis:
--First Crash
Wed Jan 8 12:08:51 2014
Errors in file /oracle/products/admin/szdm/bdump/szdm_smon_762530.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found; product=RDBMS; facility=ORA
Wed Jan 8 12:08:58 2014
Fatal internal error happened while SMON was doing active transaction recovery.
Wed Jan 8 12:08:58 2014
Errors in file /oracle/products/admin/szdm/bdump/szdm_smon_762530.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found; product=RDBMS; facility=ORA
SMON: terminating instance due to error 474
Termination issued to instance processes. Waiting for the processes to exit
--Second Crash
Wed Jan 8 16:11:13 2014
Completed checkpoint up to RBA [0xc7b41.2.10], SCN: 319187804449
Wed Jan 8 16:11:24 2014
Errors in file /oracle/products/admin/szdm/bdump/szdm_smon_816146.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found; product=RDBMS; facility=ORA
Wed Jan 8 16:11:30 2014
Fatal internal error happened while SMON was doing active transaction recovery.
Wed Jan 8 16:11:30 2014
Errors in file /oracle/products/admin/szdm/bdump/szdm_smon_816146.trc:
ORA-00600: internal error code, arguments: [15709], [29], [1], [], [], [], [], []
ORA-30319: Message 30319 not found; product=RDBMS; facility=ORA
SMON: terminating instance due to error 474
Termination issued to instance processes. Waiting for the processes to exit
Wed Jan 8 16:11:40 2014
metalink:
Solution :
To implement solution for unpublished Bug: 6954722, please execute one of the following steps:
1. Use the following workaround
Set fast_start_parallel_rollback=false and recovery_parallelism=0
Setting fast_start_parallel_rollback=false and recovery_parallelism=0 simply tells Oracle to recover failed/aborted transactions in serial mode. There is no harm in setting these, as that should not be a common operation.
A query confirmed that a large number of transactions were being rolled back:
select * from V$FAST_START_TRANSACTIONS;
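A narrower query makes it easier to see how far each rollback has progressed and how many recovery servers are working on it. This is a sketch: the column names (USN, STATE, UNDOBLOCKSDONE, UNDOBLOCKSTOTAL, RCVSERVERS) come from the documented V$FAST_START_TRANSACTIONS view, but they should be checked against the Oracle Reference for the release in use.

```sql
-- Rollback progress for transactions that are still being recovered.
-- STATE is one of 'TO BE RECOVERED', 'RECOVERING' or 'RECOVERED'.
-- RCVSERVERS > 1 indicates the transaction is being rolled back in parallel.
SELECT usn,
       state,
       undoblocksdone,
       undoblockstotal,
       ROUND(100 * undoblocksdone / NULLIF(undoblockstotal, 0), 2) AS pct_done,
       rcvservers
FROM   v$fast_start_transactions
WHERE  state <> 'RECOVERED'
ORDER  BY undoblockstotal DESC;
```

Running it a few times shows whether UNDOBLOCKSDONE is still advancing. It also shows whether parallel recovery servers are involved, which is the situation tied to the bug above.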
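The analysis above confirms that the abnormal crashes were caused by a large volume of user data being rolled back in parallel, which triggered Oracle Bug 6954722.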
The current database parameter settings were:
recovery_parallelism=20
fast_start_parallel_rollback=LOW
Following the Metalink recommendation, the database parameters were changed as follows to avoid hitting the bug:
recovery_parallelism=0
fast_start_parallel_rollback=false
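This sketch shows one way to apply the change from SQL*Plus. It assumes the instance runs from an SPFILE; with a text init.ora, edit the file instead. It also assumes that FAST_START_PARALLEL_ROLLBACK is dynamic and RECOVERY_PARALLELISM is static (not modifiable with ALTER SYSTEM at runtime), as documented in the releases we are aware of. Verify both against the Reference manual for your version.

```sql
-- FAST_START_PARALLEL_ROLLBACK is dynamic: change it now and persist it in the SPFILE.
ALTER SYSTEM SET fast_start_parallel_rollback = FALSE SCOPE = BOTH;

-- RECOVERY_PARALLELISM is static: record it in the SPFILE; it takes effect after a restart.
ALTER SYSTEM SET recovery_parallelism = 0 SCOPE = SPFILE;

-- SQL*Plus commands to check the values
-- (recovery_parallelism shows the new value only after the restart).
SHOW PARAMETER fast_start_parallel_rollback
SHOW PARAMETER recovery_parallelism
```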