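Just after I got to work this morning, a colleague told me the database was unreachable and was returning "ORA-12516". I tried a remote connection with PL/SQL Developer and, sure enough, got "ORA-12516: TNS:listener could not find available handler with matching protocol stack". I then logged on to the server over Remote Desktop and tried to connect as sys, and got the same error. Strange, since everything had been fine when I left work yesterday.
A quick search showed that this error is usually caused by the database running out of available sessions, and that two parameters are involved: processes and sessions. I wanted to check both, but could not log in even as sys, which was frustrating. On a friend's advice I took the following steps, which solved the problem: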
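a. Stop the listener to block new connections;
b. Kill some or all of the LOCAL=NO processes (depending on how important each business application is), just enough that sys can log in;
c. Log in, find out which application is misbehaving, and kill that user's processes;
d. Check the database;
e. Start the listener.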
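On Unix-like systems, dedicated server processes serving remote (listener) connections usually carry `(LOCAL=NO)` in their command line, which is what step b refers to. The following is a minimal Python sketch, assuming Linux-style `ps -ef` columns (UID PID PPID C STIME TTY TIME CMD) as input, that picks out those PIDs so they can be reviewed before anything is killed. The function name `remote_server_pids` and the instance name are hypothetical. On Windows, as in the environment below, Oracle runs as a single multi-threaded ORACLE.EXE process, so this filter does not apply there.

```python
import re

# The "(LOCAL=NO)" marker that remote dedicated server processes
# usually carry in their command line on Unix-like systems.
LOCAL_NO = re.compile(r"\(LOCAL=NO\)")


def remote_server_pids(ps_output: str, sid_name: str) -> list[int]:
    """Return PIDs of remote server processes for one instance.

    ps_output is assumed to be Linux `ps -ef` text; sid_name is the
    instance name (a hypothetical value in the usage below).
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        cmd = " ".join(fields[7:])
        # Server processes are typically named "oracle<SID> (...)";
        # background processes such as ora_pmon_<SID> are skipped.
        if f"oracle{sid_name}" in cmd and LOCAL_NO.search(cmd):
            pids.append(int(fields[1]))
    return pids
```

The sketch only lists candidates. Deciding which of them to kill is still the judgment call that step b describes.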
My environment:
Operating system: Windows Server 2008 R2
Database: Oracle 10g
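First, I stopped the listener with lsnrctl stop to block new connections, so that the second step could succeed;
Second, I closed two applications that were connected to the database and then tried logging in as sys, which worked;
Third, I checked the processes and sessions initialization parameters: 150 and 170 respectively, both default values;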
SQL>
SQL> show parameter processes
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes                      integer     0
db_writer_processes                  integer     3
gcs_server_processes                 integer     0
job_queue_processes                  integer     10
log_archive_max_processes            integer     2
processes                            integer     150
SQL> show parameter sessions
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
java_max_sessionspace_size           integer     0
java_soft_sessionspace_limit         integer     0
license_max_sessions                 integer     0
license_sessions_warning             integer     0
logmnr_max_persistent_sessions       integer     1
sessions                             integer     170
shared_server_sessions               integer
SQL>
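Fourth, I listed all current sessions with select sid,serial#,program,terminal from v$session;. The result had well over a hundred rows, enough to use up the 150 available processes (with dedicated server connections, each session is served by its own server process). Apart from a dozen or so of Oracle's own sessions, the remaining hundred-plus sessions all came from the same terminal. That pinpointed the fault: the terminal belonged to a device that had been installed only the previous evening.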
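Grouping sessions by terminal makes a single runaway client stand out immediately; in SQL this is roughly `select terminal, count(*) from v$session group by terminal order by 2 desc;`. The Python sketch below does the same over rows already fetched as `(sid, serial#, program, terminal)` tuples. How the rows are fetched is left out, and the function names and the 50% threshold are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

# One row of `select sid, serial#, program, terminal from v$session`.
Row = tuple[int, int, Optional[str], Optional[str]]


def sessions_per_terminal(rows: Iterable[Row]) -> list[tuple[str, int]]:
    """Count sessions per terminal, busiest terminal first."""
    counts = Counter((terminal or "<none>") for _, _, _, terminal in rows)
    return counts.most_common()


def suspicious_terminals(rows: Iterable[Row], share: float = 0.5) -> list[str]:
    """Return terminals holding more than `share` of all sessions."""
    ranked = sessions_per_terminal(list(rows))
    total = sum(n for _, n in ranked)
    return [t for t, n in ranked if total and n / total > share]
```

In this incident, one terminal held the large majority of the sessions, so it would be the only one returned.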
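Fifth, I shut down the application on the faulty device and ran select sid,serial#,program,terminal from v$session; again. Only twenty-odd sessions remained, which looked normal given Oracle's own dozen or so sessions plus the few applications running at the same time;
Sixth, I started the listener and tried connecting from other clients. Everything worked, and the outage was over.
Next, I wanted to find out what had actually caused it, so I kept digging.
Seventh, I checked the alert log and found a large number of Process m000 died messages: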
Wed Apr 29 21:27:31 2015
ksvcreate: Process(m000) creation failed
Wed Apr 29 21:28:32 2015
Process m000 died, see its trace file
Wed Apr 29 21:28:32 2015
ksvcreate: Process(m000) creation failed
Wed Apr 29 21:29:33 2015
Process m000 died, see its trace file
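To gauge how long the failures had been going on, you can count the `ksvcreate` failures and `Process m000 died` messages in the alert log and note the first and last timestamps. m000 is an MMON slave process, so its failure to start is a symptom of the exhausted process limit, not the cause. The following is a minimal Python sketch, assuming the 10g alert-log layout shown above (a timestamp line followed by message lines). `summarize_alert` is a hypothetical name.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Wed Apr 29 21:28:32 2015"
PATTERNS = {
    "died": "Process m000 died",
    "create_failed": "ksvcreate: Process(m000) creation failed",
}


def summarize_alert(text: str) -> dict:
    """Count m000 failure messages and record first/last timestamps."""
    counts = {key: 0 for key in PATTERNS}
    current_ts = None
    first = last = None
    for raw in text.splitlines():
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue  # a timestamp line; its messages follow
        except ValueError:
            pass
        for key, needle in PATTERNS.items():
            if needle in line:
                counts[key] += 1
                if current_ts is not None:
                    first = first or current_ts
                    last = current_ts
    return {"counts": counts, "first": first, "last": last}
```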
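Eighth, I located the trace file for the matching time and found "ORA-00020: maximum number of processes 150 exceeded" followed by "Died during process startup with error 20 (seq=5413)". So the number of connections had exceeded the process limit, the database could not create any new connections, and that is why the error appeared.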
Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Wed Apr 29 21:28:31 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3339M/8181M, Ph+PgF:10815M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=5413)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded
Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Thu Apr 30 00:19:05 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3347M/8181M, Ph+PgF:10813M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=5582)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded
Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Thu Apr 30 01:27:31 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3350M/8181M, Ph+PgF:10812M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=5650)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded
Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Thu Apr 30 09:54:12 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3857M/8181M, Ph+PgF:11421M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
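The trace file keeps appending a new dump for every failed process startup. Pulling out the timestamp and `seq=` value of each `Died during process startup with error 20` line shows how often the limit was hit overnight. The following is a minimal Python sketch, assuming the dump layout above. `process_limit_hits` is a hypothetical name.

```python
import re
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"
DIED = re.compile(r"Died during process startup with error (\d+) \(seq=(\d+)\)")


def process_limit_hits(trace_text: str) -> list[tuple[datetime, int]]:
    """Return (timestamp, seq) for each error-20 startup failure."""
    hits = []
    current_ts = None
    for raw in trace_text.splitlines():
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue  # remember the dump's timestamp
        except ValueError:
            pass
        match = DIED.search(line)
        # Error 20 corresponds to ORA-00020 (maximum number of processes).
        if match and int(match.group(1)) == 20 and current_ts is not None:
            hits.append((current_ts, int(match.group(2))))
    return hits
```

Applied to the excerpt above, it would report the three complete dumps. The last dump (09:54:12) is cut off before its error lines, so it would not be counted.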
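As for why the new device opened so many connections, I still have not figured it out. I suspect the operating system: the device runs a stripped-down build of Windows XP Embedded, and its installation reportedly did not go smoothly. When I started the application on the faulty device and monitored sessions in real time with select sid,serial#,program,terminal from v$session;, the session count kept climbing until it hit the limit and the database raised the error. The problem was reproduced.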
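The reproduction amounted to re-running the `v$session` query and watching the count climb toward the limit. Below is a minimal sketch of that watch loop. It assumes a hypothetical `fetch_count` callable that returns the current session count, for example by running `select count(*) from v$session` through whatever Oracle client is available. The loop reports the first sample that crosses a chosen fraction of the limit, and the 90% default is an arbitrary illustration.

```python
import time
from typing import Callable, Optional


def watch_until_near_limit(
    fetch_count: Callable[[], int],
    limit: int,
    fraction: float = 0.9,
    interval_s: float = 5.0,
    max_samples: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[tuple[int, int]]:
    """Poll fetch_count() until it reaches fraction * limit.

    Returns (sample_index, count) for the first sample at or above the
    threshold, or None if max_samples polls pass without reaching it.
    """
    threshold = fraction * limit
    for i in range(max_samples):
        count = fetch_count()
        if count >= threshold:
            return i, count
        sleep(interval_s)
    return None
```

The `sleep` parameter is injectable so the loop can be exercised without waiting.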
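We then tested another device with the same hardware configuration and the same operating system, and it did not show the problem. In the end, the only option was to reinstall the operating system on the faulty device.
To sum up, ways to resolve ORA-12516: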
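1. The error is usually caused by the database running out of available sessions. Depending on business needs, increase the processes and sessions parameters; in 10g the default relationship between them is sessions = (1.1 * processes) + 5, which matches the 150/170 values seen here;
2. If there are rogue connections like the ones in this case, follow the steps above to find the offending sessions and kill the related processes directly.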
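Both remedies can be sketched in a few lines of Python. For the first, the sketch computes the 10g default for sessions and builds the corresponding `ALTER SYSTEM ... SCOPE=SPFILE` statements. processes is a static parameter, so the change takes effect only after an instance restart. The integer rounding for values of processes that are not multiples of 10 is an assumption, and later releases use a different default formula. For the second, it builds `ALTER SYSTEM KILL SESSION 'sid,serial#';` statements for one terminal from `(sid, serial#, program, terminal)` rows so they can be reviewed before being run in SQL*Plus. All function names are hypothetical, and fetching the rows is left out.

```python
from typing import Iterable, Optional

# One row of `select sid, serial#, program, terminal from v$session`.
Row = tuple[int, int, Optional[str], Optional[str]]


def default_sessions(processes: int) -> int:
    """Oracle 10g default: sessions = (1.1 * processes) + 5.

    Integer arithmetic avoids floating-point noise; rounding down when
    processes is not a multiple of 10 is an assumption of this sketch.
    """
    return (11 * processes) // 10 + 5


def resize_statements(processes: int) -> list[str]:
    """Spfile changes for a larger process limit (restart required)."""
    return [
        f"ALTER SYSTEM SET processes={processes} SCOPE=SPFILE;",
        f"ALTER SYSTEM SET sessions={default_sessions(processes)} SCOPE=SPFILE;",
    ]


def kill_statements(rows: Iterable[Row], terminal: str) -> list[str]:
    """One KILL SESSION statement per session opened from `terminal`."""
    return [
        f"ALTER SYSTEM KILL SESSION '{sid},{serial}';"
        for sid, serial, _program, term in rows
        if term == terminal
    ]


print(default_sessions(150))  # 170, matching the values in this case
```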