Oracle jobs为Oracle开发人员和数据库管理员提供了数据库层面维护的极大便利性。对于Oracle jobs在Oracle 9i之前,是由dbms_jobs来实现,而到了10g之后,多出了dbms_scheduler方式。两者同样可以添加Oracle job,只不过dbms_scheduler的功能更为强大。在使用Oracle jobs时,我们不得不关注job_queue_processes参数,用于设定job队列可以启动的进程数。本文即是围绕此展开。
1、job_queue_processes参数
alter system set job_queue_processes= 0,,,1000
下面是11g reference的描述:
JOB_QUEUE_PROCESSES specifies the maximum number of job slaves per instance that can be created for the execution of DBMS_JOB jobs and Oracle Scheduler (DBMS_SCHEDULER) jobs. DBMS_JOB and Oracle Scheduler share the same job coordinator and job slaves, and they are both controlled by the JOB_QUEUE_PROCESSES parameter.
If the value of JOB_QUEUE_PROCESSES is set to 0, then DBMS_JOB jobs and Oracle Scheduler jobs will not run on the instance.If JOB_QUEUE_PROCESSES is set to a value in the range of 1 to 1000, then DBMS_JOB jobs and Oracle Scheduler jobs will run. The actual number of job slaves created for Oracle Scheduler jobs is auto-tuned by the Scheduler depending on several factors, including available resources, Resource Manager settings, and currently running jobs. However, the combined total number of job slaves running DBMS_JOB jobs and Oracle Scheduler jobs on an instance can never exceed the value of JOB_QUEUE_PROCESSES for that instance. The number of job slaves running Oracle Scheduler jobs is additionally limited to the value of the MAX_JOB_SLAVE_PROCESSES Scheduler attribute.
Advanced replication uses Oracle Scheduler for data refreshes. Oracle Streams Advanced Queuing uses Oracle Scheduler for message propagation. Materialized views use Oracle Scheduler for automatic refreshes. Setting JOB_QUEUE_PROCESS to 0 will disable these features as well as any other features that use Oracle Scheduler or DBMS_JOB.
a、从上面的描述可知,对于Oracle job进程,包含协调进程(主进程)以及奴隶进程(子进程)。
b、job_queue_processes取值范围为0到1000,总共可创建多少个job进程由job_queue_processes参数来决定。
c、当job_queue_processes大于1时,且并行执行job时,至少一个为协调进程。其总数不会超出job_queue_processes的值。
d、job_queue_processes参数的值为且DBMS_JOB与DBMS_SCHEDULER共享。
e、job_queue_processes参数,当设定该值为0的时候则任意方式创建的job都不会运行。
f、非零值的job_queue_processes,其job子进程数依赖于可用资源,资源配置方式以及当前运行的job数来自行调整。
g、此外对于Scheduler jobs方式还受限制于scheduler属性MAX_JOB_SLAVE_PROCESSES的设置。
h、可以通过DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE来设置max_job_slave_processes
2、测试参数job_queue_processes为1的情形
-->演示环境 SQL> select * from v$version where rownum<2; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production -->创建测试用表 CREATE TABLE tb_job ( job_name VARCHAR2 (5), update_dt VARCHAR2 (20) ); -->添加多个Oracle job来并发执行 SQL> ho more add_job.sql DECLARE job_name VARCHAR2 (20); BEGIN DBMS_OUTPUT.put_line ('Current sysdate is ' || TO_CHAR (SYSDATE, 'yyyymmdd hh24:mi:ss')); FOR i IN 1 .. 5 LOOP job_name := 'JOB_' || TO_CHAR (i); sys.DBMS_SCHEDULER.create_job ( job_name => job_name, start_date => sysdate+1/1440, repeat_interval => 'freq = minutely; interval=1', end_date => NULL, job_class => 'DEFAULT_JOB_CLASS', job_type => 'PLSQL_BLOCK', job_action => ' begin INSERT INTO tb_job SELECT ''' || job_name || ''', TO_CHAR (SYSDATE, ''yyyymmdd hh24:mi:ss'') FROM DUAL; dbms_lock.sleep(60); commit; end; ', enabled => true, comments => 'my test job'); END LOOP; END; / SQL> @add_job PL/SQL procedure successfully completed. -->查看job_queue_processes参数的值 SQL> show parameter job NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ job_queue_processes integer 1 -->此时和Job相关的进程一个,为ora_j000 SQL> ho ps -ef | grep rac11g | grep ora_j | grep -v grep oracle 9692 1 4 12:03 ? 00:00:01 ora_j000_rac11g1 -->查看刚刚添加的job -->下面的NEXT_RUN_DATE在秒级别上稍有差异,其实在定义job时,这个是由于前面的for循环造成的延迟 -->在此忽略这个细微的差异 SQL> @jobs OWNER JOB_NAME ENABL JOB_CLASS NEXT_RUN_DATE ------------ -------------- ----- ------------------------------ ----------------------------------- SCOTT JOB_1 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.21.33.000000 PM +08:00 SCOTT JOB_2 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.21.33.000000 PM +08:00 SCOTT JOB_3 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.21.33.000000 PM +08:00 SCOTT JOB_4 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.21.34.000000 PM +08:00 SCOTT JOB_5 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.21.34.000000 PM +08:00 -->job执行的情况,可以看到5个job被逐一执行 -->尽管我们定义时的NEXT_RUN_DATE相差1秒,而此时job的执行后则每一个相差1分钟 -->job_1与job_5相差4分多钟,这是由于我们定义了dbms_lock.sleep(60)为1分钟 -->其次可以看出由于只有一个job进程,因此每一个job是一个一个被执行 SQL> select * from tb_job; JOB_N UPDATE_DT ----- -------------------- JOB_1 20130319 12:21:33 JOB_2 20130319 12:22:35 JOB_3 20130319 12:23:37 JOB_4 20130319 12:24:39 JOB_5 20130319 12:25:41
3、测试参数job_queue_processes大于1的情形
-->首先移除之前的job SQL> ho more remove_job.sql DECLARE job_name VARCHAR2 (10); BEGIN FOR i IN 1 .. 5 LOOP job_name := 'JOB_' || TO_CHAR (i); sys.DBMS_SCHEDULER.drop_job (job_name, force => TRUE); END LOOP; END; / SQL> @remove_job PL/SQL procedure successfully completed. -->此时设置job_queue_processes的值为6 SQL> alter system set job_queue_processes=6; System altered. -->清空测试用表 SQL> truncate table tb_job; Table truncated. -->此时Oracle为job启动了2个进程 SQL> ho ps -ef | grep rac11g | grep ora_j | grep -v grep oracle 3477 1 9 12:29 ? 00:00:01 ora_j000_rac11g1 oracle 3491 1 4 12:29 ? 00:00:00 ora_j001_rac11g1 -->添加多个job SQL> @add_job PL/SQL procedure successfully completed. --> Author : Robinson --> Blog : http://blog.csdn.net/robinson_0612 SQL> @jobs OWNER JOB_NAME ENABL JOB_CLASS NEXT_RUN_DATE -------------------- ----------- ----- --------------------- ----------------------------------- SCOTT JOB_1 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.31.55.000000 PM +08:00 SCOTT JOB_2 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.31.56.000000 PM +08:00 SCOTT JOB_3 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.31.56.000000 PM +08:00 SCOTT JOB_4 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.31.56.000000 PM +08:00 SCOTT JOB_5 TRUE DEFAULT_JOB_CLASS 19-MAR-13 12.31.56.000000 PM +08:00 -->片刻后可以看到job进程总数达到6个 SQL> ho ps -ef | grep rac11g | grep ora_j | grep -v grep oracle 7668 1 1 11:57 ? 00:00:01 ora_j000_rac11g1 oracle 7678 1 0 11:57 ? 00:00:01 ora_j001_rac11g1 oracle 7700 1 1 11:57 ? 00:00:01 ora_j002_rac11g1 oracle 9230 1 0 11:57 ? 00:00:00 ora_j003_rac11g1 oracle 9257 1 2 11:58 ? 00:00:01 ora_j005_rac11g1 oracle 9353 1 7 11:59 ? 00:00:00 ora_j004_rac11g1 -->查看表tb_job的情形 SQL> select * from tb_job order by 1,2; JOB_N UPDATE_DT ----- -------------------- JOB_1 20130319 12:31:57 JOB_1 20130319 12:32:58 JOB_1 20130319 12:33:59 JOB_2 20130319 12:31:58 JOB_2 20130319 12:32:59 JOB_2 20130319 12:34:00 JOB_3 20130319 12:31:58 JOB_3 20130319 12:32:59 JOB_3 20130319 12:34:00 JOB_4 20130319 12:31:59 JOB_4 20130319 12:33:00 JOB_4 20130319 12:34:01 JOB_5 20130319 12:31:58 JOB_5 20130319 12:32:59 JOB_5 20130319 12:34:00 -->从上面的查询结果可知每一个job的上一次与
版本11gR2中引入cursor sharing游标共享和mutex互斥锁增强的一些特性,而这些特性也带来了一些问题(主要体现在版本11.2.0.1和11.2.0.2上,11.2.0.3上基本已经修复)。
Cursor Obsolescence游标废弃是一种SQL Cursor游标管理方面的增强特性,该特性启用后若parent cursor父游标名下的子游标child cursor总数超过一定的数目,则该父游标parent cursor将被废弃,同时一个新的父游标将被开始。 这样做有2点好处:
- 避免进程去扫描长长的子游标列表child cursor list以找到一个合适的子游标child cursor
- 废弃的游标将在一定时间内被age out,其占用的内存可以被重新利用
实际在版本10g中就引入了该Cursor Obsolescence游标废弃特性,当时child cursor 的总数阀值是1024, 但是这个阀值在11g中被移除了,这导致出现一个父游标下大量child cursor即high version count的发生;由此引发了一系列的版本11.2.0.3之前的cursor sharing 性能问题,主要症状是版本11.2.0.1和11.2.0.2上出现大量的Cursor: Mutex S 和 library cache lock等待事件。
增强补丁Enhancement patch《Bug 10187168 – Enhancement to obsolete parent cursors if VERSION_COUNT exceeds a threshold》就该问题引入了新的隐藏参数_cursor_obsolete_threshold(Number of cursors per parent before obsoletion.),该”_cursor_obsolete_threshold”参数用以指定子游标总数阀值,若一个父游标的child cursor count<=>version count高于”_cursor_obsolete_threshold”,则触发Cursor Obsolescence游标废弃特性。
http://www.askmaclean.com/archives/11-2-obsolete-parent-cursors-_cursor_features_enabled-106001-event.html
注意版本11.2.0.3中默认就有”_cursor_obsolete_threshold”了,而且默认值为100。
对于版本11.1.0.7、11.2.0.1和11.2.0.2则都有该Bug 10187168的bug backport存在,从2011年5月开始就有相关针对的one-off backport补丁存在。 但是这些one-off backport补丁不使用”_cursor_obsolete_threshold”参数。在版本11.1.0.7、11.2.0.1和11.2.0.2上需要设置合适的”_cursor_features_enabled”(默认值为2)参数,并设置必要的106001 event,该event的level值即是child cursor count的阀值,必须设置该106001事件后该特性才生效。
但是请注意 ”_cursor_features_enabled”参数需要重启实例方能生效。而”_cursor_obsolete_threshold”参数和106001 event则可以在线启用、禁用。
对于不同的版本而言,一般推荐打上最新的PSU补丁,并根据补丁的README提示或者咨询Oracle Support获得关于该版本上Cursor Obsolescence问题的信息:
针对不同版本设置 ”_cursor_features_enabled”+106001 event的方法:
版本11.1.0.7 SQL> alter system set "_cursor_features_enabled"=18 scope=spfile; System altered. SQL> alter system set event='106001 trace name context forever,level 1024' scope=spfile; System altered. 并重启实例 版本11.2.0.1 SQL> alter system set "_cursor_features_enabled"=34 scope=spfile; System altered. SQL> alter system set event='106001 trace name context forever,level 1024' scope=spfile; System altered. 版本11.2.0.2 SQL> alter system set "_cursor_features_enabled"=1026 scope=spfile; System altered. SQL> alter system set event='106001 trace name context forever,level 1024' scope=spfile; System altered. 11gR2 Experience -> If using cursor_sharing = “FORCE” or “SIMILAR” 1) ORA-600 errors as workload increases [kkspsc0: basehd] or [kglLockOwnersListAppend-ovf] - applied patches to address 2) AWR showing -> cursor: mutex S and library cache lock 1. Download and apply the 11.2.0.2.3PSU Patch 11724916 2. Enable event 106001 to address Bug 10187168. To enable the fix "_cursor_features_enabled" needs to be set 3) Oracle 11.2.0.2.2 PSU (Patch Set Update) includes new parameters that you can tweak based on workload characteristics. Even more fixes have been added Note: 10411618 - Enhancement to add different "Mutex" wait schemes [ID 10411618.8] 4) 11.2.0.3 Has many Mutex enhancement’s 106001 level The level is used to specify the maximum number of child cursors that a parent can have before we obsolete the parent cursor and create a new parent. Doing the above can help reduce mutex waits, memory consumption and other side effects seen when we see many child cursors for a given parent.
题外话是11.2.0.3中的Mutex增强了很多,不要再跟着初学者论坛的那帮家伙一起愚蠢地大喊:”虽然版本升级 8i=>9i=>10g=>11g=>12c,但是我觉得oracle里面基础、核心的东西一直都没变了”这种神话了, 你一直浮游在Oracle的表面怎么可能知道Kernel到底有多大的变化?!!
版本10.2.0.4和11.1.0.6中”_library_cache_advice”=TRUE的情况下可能出现高latch:shared pool、latch: shared pool simulator等latch争用等待事件,默认情况下_library_cache_advice受到参数”statistics_level”的影响为TRUE,当_library_cache_advice=TRUE时他启用library cache simulator特性。
该library cache simulator特性负责估算shared pool LRU的表现,simulator模拟器收集heap内存堆大小以及load载入、pin、unpin的次数信息;通过这些数据来估算出若我们有更大的shared pool,我们可以由更大的共享池来缓存更多的SQL、PLSQL在共享池中,以此来节约加载时间。若我们设置更小的shared pool size,则又会对加载时间有何等的影响?
题外话:另一个对ASMM 下shared pool有作用的参数:
- If 0, will not try to shrink shared pool or Java pool
- If greater than zero, will wait this many seconds after failed shrink request to ask again
禁用library cache simulator设置”_library_cache_advice”=false”可能”(具体仍需要诊断)解决高latch:shared pool、latch: shared pool simulator、Library Cache – Mutex X具体等的问题,禁用library cache simulator会导致AWR中”shared pool advisory”和 “java pool advisory”2个环节不可用,但是这些特性实际可有可无。
但是”_library_cache_advice”=false”时且启用了ASMM(sga_target>0)的情况,注意为shared_pool_size设置一个合理的最小值!
分别在10.2.0.4和11.1.0.6上进行了针对解析的压力测试:
10204 no change baseline Executes/second = 3,610, DB Time = 12,349s, DB CPU = 8,938s, latch:library cache wait = 598s, avg.wait = 34ms 10204 – _library_cache_advice=off Executes/second = 3,843, DB Time = 16,208s, DB CPU = 9,402s, latch:library cache wait = 616s, avg. wait = 50ms 11106- no change -baseline Executes/second = 3,529, DB Time = 14,148s, DB CPU = 9,286s, library cache: mutex X wait = 2,725s, avg. wait = 1ms 11106 -session_cache=500, instantiation=150 Executes/second = 3,436, DB Time = 13,396s, DB CPU = 9,040s, library cache: mutex X wait = 2,383s avg. wait = 1ms 11106 – _library_cache_advice=off Executes/second = 6,059, DB Time = 75,134s, DB CPU = 17,321s, library cache: mutex X wait = 38,892s,avg. wait = 1ms
针对高latch:shared pool、latch: shared pool simulator、Library Cache – Mutex X解析类等待事件,解决的思路包括: