环境:
平台:RedHat EnterPrise 5.8 X86_X64
数据库:Oracle EnterPrise 11.2.0.3
集群软件:Oracle grid 11.2.0.3
故障现象:
数据库出现了归档失败,其中有一个节点的实例出现HANG死的状况。
日志信息如下:
Fri Feb 28 19:49:04 2014
ARC1: Error 19504 Creating archive log file to '+DATA02'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16038: log 14 sequence# 68244 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
Archiver process freed from errors. No longer stopped
Fri Feb 28 19:50:22 2014
ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_3
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16014: log 14 sequence# 68244 not archived, no available destinations
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
ARC0: Archive log rejected (thread 1 sequence 68240) at host 'orclsh'
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance orcl1 - Archival Error. Archiver continuing.
分析:
由于归档失败发生在ASM磁盘上,首先检查ASM磁盘空间以及DB_RECOVERY_FILE_DEST_SIZE,ASM磁盘空间是足够的,而且由于只有一个节点出现出现了无法归档的问题,也可以排除是空间不足造成的。确认两个节点的DB_RECOVERY_FILE_DEST_SIZE参数设置都是0,基本上可以判断问题和当前节点的ASM实例状态不正常有关。
检查ASM实例的错误信息:
Fri Feb 28 19:41:23 2014
Dumping diagnostic data in directory=[cdmp_20130702164115], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165521].
Fri Feb 28 19:49:19 2014
Dumping diagnostic data in directory=[cdmp_20130702164845], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165522].
Fri Feb 28 19:55:56 2014
Dumping diagnostic data in directory=[cdmp_20130702165517], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
当前节点ASM实例出现了的这个信息,说明报错发生在实例2上:
Fri Feb 28 18:34:25 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186256):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Insufficient shared pool to allocate a GES object (ospid 2032294)
Fri Feb 28 18:29:53 2014
Sweep [inc][186256]: completed
Fri Feb 28 18:36:49 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186257):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Insufficient shared pool to allocate a GES object (ospid 2032294)