景:先前在生产库上配置了Oracle监控,每5分钟尝试连接一次数据库,若连接失败则通过nagios+fetion自动报警,此配置参考文章:
早晨收到报警信息后,登陆数据库执行ps -ef查看oracle的后台进程都在,使用conn /as sysdba的方式登陆数据库,提示连接到空闲的实例,使用easy connect 方式连接则报oracle实例无法分配内存,从报错提示上看,就像oracle数据库实例未打开的状态!分析alert日志不断出现如下错误信息:
Process J002 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_cjq0_18577.trc:
trace文件摘要信息如下:
*** 2012-07-26 10:20:31.068
Process J002 is dead (pid=13857 req_ver=1136 cur_ver=1136 state=KSOSP_SPAWNED).
*** 2012-07-26 10:20:32.069
Process J002 is dead (pid=13876 req_ver=1594 cur_ver=1594 state=KSOSP_SPAWNED).
google查询一番后,发现大部分描述和oracle的进程数设置有关,又或者是内存不足引起!于是在oracle 10g环境下测试,线上数据库环境为11.2.0.3
一:设置processes初始化参数值为20,重启数据库后,已经占用19个进程
SQL> select count(*) from v$process; COUNT(*) ---------- 19 SQL> show parameter process; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ aq_tm_processes integer 0 db_writer_processes integer 1 gcs_server_processes integer 0 job_queue_processes integer 10 log_archive_max_processes integer 2 processes integer 20
新的会话连接,则报连接到空闲的实例,alert日志则出现相应的报错
[root@db1 ~]# su - oracle [oracle@db1 ~]$ sqlplus /nolog SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jul 4 13:50:22 2012 Copyright (c) 1982, 2005, Oracle. All rights reserved. SQL> conn /as sysdba Connected to an idle instance. [oracle@db1 dbs]$ tail -f /u01/app/oracle/admin/db1/bdump/alert_db1.log Wed Jul 4 13:52:23 2012 ksvcreate: Process(q000) creation failed Wed Jul 4 13:52:35 2012 Process q001 died, see its trace file Wed Jul 4 13:52:35 2012 ksvcreate: Process(q001) creation failed Wed Jul 4 13:52:37 2012 Process m000 died, see its trace file Wed Jul 4 13:52:37 2012 ksvcreate: Process(m000) creation failed