Today I ran a MERGE to update one large table from another, each table roughly 10 GB. The server uses a PCI-E flash card, so I/O is quite fast. The statement looked roughly like this:
merge into TEST_EVENT a
using
TEST_EVENT_1 b
on(a.MAIN_TARGET=b.MAIN_TARGET and a.EVENT_TYPE=b.EVENT_TYPE)
when matched then
update
set
a.CARD_NO=b.CARD_NO,a.EVENT_ACCOUNT=b.EVENT_ACCOUNT,a.EXTEND_PROPERTIES=b.EXTEND_PROPERTIES
where
b.EVENT_NAME= 'WITHDRAW' AND b.EVENT_TYPE IN ('SYNCHRONIZE','ASYNCHRONIZE')
AND b.raw_add_time >= to_date('2014-01-01','yyyy-mm-dd')
AND b.raw_add_time < to_date('2015-01-01','yyyy-mm-dd');
Two problems came up; the analysis and explanation of each follows.
1. The MERGE join is not stable
The statement raised:
ORA-30926: unable to get a stable set of rows in the source tables
MERGE is a deterministic statement. You therefore need to evaluate the data returned by
the USING clause to ensure that there are no duplicate values in the join. Modify
the merge statement to include a deterministic where clause.
An example:
SQL> select * from testmerg;
ID NAME
----------- --------------------
1 gaopeng
2 gaopeng
3 gaopeng
SQL> select * from testmerg2;
ID NAME
----------- --------------------
1 yanlei
2 yanlei
2 yanlei1
SQL>
SQL> merge into testmerg a
2 using testmerg2 b
3 on (a.id=b.id)
4 when matched then update
5 set a.name=b.name;
merge into testmerg a
using testmerg2 b
on (a.id=b.id)
when matched then update
set a.name=b.name
ORA-30926: unable to get a stable set of rows in the source tables
As you can see, we join on ID (a.id = b.id), but the source table has two rows with ID = 2, so MERGE cannot decide how to update: should the target row '2 gaopeng' become 'yanlei' or 'yanlei1'?
This is the constraint to keep in mind with MERGE: if the join cannot return a stable (unique) set of source rows per target row, the MERGE cannot run.
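One common fix, in line with the "deterministic where clause" advice quoted above, is to make the USING clause return at most one row per join key. A minimal sketch against the test tables (the ORDER BY inside ROW_NUMBER is an assumption; choose whichever duplicate should win):
merge into testmerg a
using (
        -- keep exactly one row per id; the order by decides which duplicate wins
        select id, name
          from (select id, name,
                       row_number() over (partition by id order by name) rn
                  from testmerg2)
         where rn = 1
      ) b
on (a.id = b.id)
when matched then update
set a.name = b.name;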
2. The MERGE consumes a huge amount of temp tablespace, accompanied by many direct path read temp wait events
Explanation:
============
We only consume more temp space on disk when the build side (the smaller
side in the hash join) can't fit entirely in memory. If it fits in memory, we
don't need any buffering for the probe side (the right side).
If the staging table (the smaller table participating in the hash join)
is too big to fit into memory as the build side,
the right-hand side's hash partitions need to be buffered as well, hence the
huge temp segment requirement.
If the hash area size is large enough to accommodate the build side in memory, you
should not see this type of consumption.
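Following that logic, one mitigation (a sketch, assuming automatic PGA memory management; the 8G figure is purely illustrative) is to give workareas more room so the build side can stay in memory:
show parameter workarea_size_policy
show parameter pga_aggregate_target
-- a single workarea only gets a fraction of the target, so the target must be
-- well above the build-side size for the hash join to avoid spilling to temp
alter system set pga_aggregate_target = 8G;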
Back to the case at hand, checking the temp usage:
SQL> select SQL_ID,BLOCKS*8/1024/1024,SEGTYPE from v$sort_usage;
SQL_ID BLOCKS*8/1024/1024 SEGTYPE
------------- ------------------ ---------
08mh0r0xsj3h6 0.0009765625 DATA
08mh0r0xsj3h6 0.0009765625 LOB_DATA
08mh0r0xsj3h6 0.0009765625 INDEX
dyk4dprp70d74 7.0634765625 HASH
As shown, the segment of type HASH is about 7 GB. Now look at the execution plan:
PLAN_TABLE_OUTPUT
------------------------------------------------------------
Plan hash value: 3625956359
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
------------------------------------------------------------
| 0 | MERGE STATEMENT | | 2680K| 7953M|
| 1 | MERGE | SYN_EVENT | | |
| 2 | VIEW | | | |
|* 3 | HASH JOIN | | 2680K| 4605M|
| 4 | TABLE ACCESS FULL| SYN_EVENT_1 | 2680K| 2259M|
| 5 | TABLE ACCESS FULL| SYN_EVENT | 21M| 18G|
------------------------------------------------------------
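For reference, a plan like this can be pulled from the cursor cache by the sql_id seen in v$sort_usage (a sketch; dbms_xplan format options vary by version):
select * from table(dbms_xplan.display_cursor('dyk4dprp70d74', null, 'TYPICAL'));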
You can see that the temp tablespace growth comes from the HASH JOIN. A hash join, however, is not the only join method MERGE can use; since a hash join does not use an index on the join columns, you can force index access with a USE_NL hint, as follows:
merge /*+ leading( b ) use_nl( a b ) */ into SYN_EVENT a
using SYN_EVENT_1 b
on (a.id = b.id )
when matched then
update
set a.CARD_NO = b.CARD_NO,
a.EVENT_ACCOUNT = b.EVENT_ACCOUNT,
a.EXTEND_PROPERTIES = b.EXTEND_PROPERTIES
where b.EVENT_NAME = 'WITHDRAW'
AND b.EVENT_TYPE IN ('SYNCHRONIZE', 'ASYNCHRONIZE')
AND b.raw_add_time >= to_date('2014-01-01', 'yyyy-mm-dd')
AND b.raw_add_time < to_date('2015-01-01', 'yyyy-mm-dd');
But this merely replaces the HASH JOIN with a NESTED LOOP. The driven (inner) table of a nested loop can use an index, and the test confirms that the temp space consumption did indeed come from the HASH JOIN.
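Of course, the nested-loop plan is only reasonable if the driven table has an index on the join column; a sketch (the index name is made up for illustration):
create index idx_syn_event_id on SYN_EVENT(id);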
On my machine I tested with syn_event at about 5 million rows (~6 GB) and SYN_EVENT_1 at about 2.5 million rows (~3 GB), the update touching a CLOB column. After changing the execution plan, the run took about 750 seconds:
ELAPSED_SECONDS TIME_REMAINING SQL_ID SOFAR TOTALWORK
--------------- -------------- ------------- ---------- ----------
750 0 90ccdwpdtaq81 322615 322615
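The progress figures above read like v$session_longops output; a sketch of such a monitoring query (filtering by the running statement's sql_id is an assumption):
select elapsed_seconds, time_remaining, sql_id, sofar, totalwork
  from v$session_longops
 where sql_id = '90ccdwpdtaq81';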
Below is the related resource consumption after the switch to NESTED LOOP:
SID NAME VALUE
---------- ---------------------------------------------------------------- ----------
20 session logical reads 5110391
20 physical read total IO requests 218114
20 physical read total multi block requests 2520
20 physical read total bytes 4402323456
20 logical read bytes from cache 4186432307
20 physical reads 537393
20 physical reads cache 537393
20 physical read IO requests 218114
20 physical read bytes 4402323456
20 physical reads cache prefetch 319289
20 redo entries 816088
20 redo size 396696100
20 redo buffer allocation retries 1
20 redo log space requests 2
20 redo log space wait time 5
20 redo ordering marks 33766
20 redo subscn max counts 33781
20 redo synch time (usec) 5
20 redo synch writes 2
20 Batched IO vector read count 8
20 undo change vector size 263713896
20 data blocks consistent reads - undo records applied 206356
20 no work - consistent read gets 404311
20 cleanouts only - consistent read gets 1
20 cleanouts and rollbacks - consistent read gets 103178
20 IMU undo allocation size 7624
20 sorts (memory) 17
20 sorts (rows) 68
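The listing above looks like v$sesstat joined to v$statname for the test session; a sketch of such a query (SID 20 taken from the output):
select s.sid, n.name, s.value
  from v$sesstat s
  join v$statname n on n.statistic# = s.statistic#
 where s.sid = 20
   and s.value > 0
 order by n.name;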