Using multi-column composite indexes: working around a defect in Microsoft SQL Server


Source: Internet. Published: 2014-10-08


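Microsoft SQL Server has a significant defect in how it handles multi-column composite indexes: an operation that should be compiled into an index seek is compiled into an index scan instead, which can cause a severe performance drop.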

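Here is an example. Suppose a table T has an index on (cityid, sentdate, userid), and a paged list feature needs a query that fetches a number of records whose composite index value V is at or beyond some bookmark value V0. The simplest way to express the intent is V >= V0. Written out column by column, that is:
cityid > @cityid0 or (cityid = @cityid0 and (sentdate > @sentdate0 or (sentdate = @sentdate0 and userid >= @userid0)))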
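The column-by-column condition is a lexicographic comparison of two three-part keys. The following Python sketch checks that claim by comparing the expanded condition against Python's built-in tuple ordering over a small grid of made-up values. The function name `ge_expanded` and the sample values are illustrative assumptions, and the check says nothing about how SQL Server plans the query.

```python
from itertools import product

def ge_expanded(row, bookmark):
    """V >= V0 written out column by column, as in the condition above."""
    cityid, sentdate, userid = row
    cityid0, sentdate0, userid0 = bookmark
    return cityid > cityid0 or (
        cityid == cityid0
        and (sentdate > sentdate0 or (sentdate == sentdate0 and userid >= userid0))
    )

# Exhaustively compare against lexicographic tuple comparison
# on a small grid of made-up values (hypothetical data).
values = range(3)
rows = list(product(values, repeat=3))
mismatches = [
    (row, bm) for row in rows for bm in rows
    if ge_expanded(row, bm) != (row >= bm)
]
print(len(mismatches))
```

An empty `mismatches` list means the expanded condition and the tuple comparison agree on every pair in the grid.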

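When you write the query above, you expect SQL Server to recognize it as a boundary condition of the form V >= V0 and to carry it out with an index seek. However, Microsoft SQL Server (the 2005 version; I do not know how other versions behave) has a significant defect: when it meets SQL like this, it uses an index scan. As a result the index you built is effectively not used for seeking, and if the table is large the performance drop is very large.
I once reported this problem to people at Microsoft. They asked me to file the defect through an official site, and I never got around to doing it.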

There is, however, a way around this defect. If you reshape the condition given above, SQL Server goes back to using an index seek rather than the slow index scan. The details are in my original English note below:
A seek predicate of the form "x > bookmark_of_x" is needed in paging queries. The compiler has no difficulty parsing it correctly if x is a single-column or two-column index. If x is an index of three or more columns, however, the compiler has a hard time recognizing it. When that happens, the seek predicate ends up as a residual predicate, which produces a much worse execution plan.
To illustrate the point, take an example:
Create table A( a int, b int, c int, d float, primary key (a, b, c))
Now check the plan for this query:
select c, d from A where (a> 111 or a= 111 and
(b > 222 or b = 222 and c > 333))
You can see that a table scan operator is used and that the WHERE clause ends up as a residual predicate.
However, if you rewrite the query in an equivalent form:
select c, d from A where a> 111 or a= 111 and b > 222 or a= 111 and b= 222 and c >333
then the compiler can choose an index seek operator, which is what we want.
The problem is that the compiler should be able to recognize the first form of the seek predicate on a multi-column index. That would save the user the extra time needed to figure out a workaround, not to mention that the first form is the more efficient way to write the same expression.
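The two forms should return exactly the same rows; only the plan chosen by SQL Server differs. The sketch below checks the row-level equivalence using Python's built-in `sqlite3` module rather than SQL Server, so it does not reproduce the seek-versus-scan behavior described above. The sample rows around the bookmark (111, 222, 333) are made up for illustration.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute("create table A (a int, b int, c int, d float, primary key (a, b, c))")
# Made-up sample rows straddling the bookmark (111, 222, 333).
keys = product((110, 111, 112), (221, 222, 223), (332, 333, 334))
conn.executemany("insert into A values (?, ?, ?, 0.0)", keys)

# First form: nested predicate.
nested = """select a, b, c from A
            where (a > 111 or a = 111 and (b > 222 or b = 222 and c > 333))
            order by a, b, c"""
# Second form: flattened predicate (the workaround).
flattened = """select a, b, c from A
               where a > 111 or a = 111 and b > 222 or a = 111 and b = 222 and c > 333
               order by a, b, c"""

nested_rows = conn.execute(nested).fetchall()
flattened_rows = conn.execute(flattened).fetchall()
print(nested_rows == flattened_rows, len(nested_rows))
```

To see the plan difference itself, you would need to run both statements against SQL Server and compare the execution plans there.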
The problem above can be said to be partly worked around, but there are cases where the workaround does not help. Read on:
It looks like SQL Server lacks a concept of a vector bookmark, or vector comparison, or whatever you like to call it.
The workaround is not a perfect one. If SQL Server understood the concept of a vector bookmark, then the following two would have the same execution plan and performance:
1. select top(n) * from A where vectorIndex >= @vectorIndex
2. select * from A where vectorIndex >= @vectorIndex and vectorIndex [...]
[...] 60 or a= 60 and b > 20
4. select a, b, c, d from A where (a> 60 or a= 60 and b > 20) and (a< 60 or a= 60 and b [...]
[...] 60 or a= 60 and (b > 20 or b = 20 and c >= 31)) and (a< 60 or a= 60 and (b < 20 or b = 20 and c [...]
[...] 60 or a= 60 and b > 20 or a= 60 and b= 20 and c >= 31
select a, b, c, d from A where (a> 60 or a= 60 and b > 20 or a= 60 and b= 20 and c >= 31) and (a< 60 or a= 60 and b < 20 or a= 60 and b= 20 and c [...]
[...] 60 or a= 60 and b > 20
select a, b, c, d from A where (a> 60 or a= 60 and b > 20) and (a< 60 or a= 60 and b [...]
[...] 60 or a= 60 and b > 20
select a, b, c, d from A where (a> 60 or a= 60 and b > 20) and (a< 60 or a= 60 and b [...]
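Form 1 above, a top(n) fetch starting from a bookmark, is the usual shape of a paging query. The following is a minimal sketch of that pattern using the flattened predicate. It runs on Python's built-in `sqlite3` module, with `limit` standing in for `top(n)`, so it illustrates the query shape only and not SQL Server's plan choice. The sample data, the `fetch_pages` helper, and the starting bookmark of -1 are assumptions made for illustration. The last column uses a strict `>` so that the bookmark row itself is not returned again, as in "x > bookmark_of_x" earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table A (a int, b int, c int, d float, primary key (a, b, c))")
# Made-up data: 2 * 3 * 2 = 12 rows.
conn.executemany(
    "insert into A values (?, ?, ?, 0.0)",
    [(a, b, c) for a in (60, 61) for b in (19, 20, 21) for c in (30, 31)],
)

# Flattened bookmark predicate, parameterised; "limit" stands in for top(n).
NEXT_PAGE = """
    select a, b, c from A
    where a > :a or a = :a and b > :b or a = :a and b = :b and c > :c
    order by a, b, c
    limit :n
"""

def fetch_pages(page_size):
    """Yield successive pages, resuming after the last key of the previous page."""
    bookmark = {"a": -1, "b": -1, "c": -1}  # assumed to sort before every real key
    while True:
        page = conn.execute(NEXT_PAGE, {**bookmark, "n": page_size}).fetchall()
        if not page:
            return
        yield page
        # The last row of this page becomes the bookmark for the next one.
        bookmark = dict(zip("abc", page[-1]))

pages = list(fetch_pages(5))
print([len(p) for p in pages])
```

Each page resumes from the last key of the previous page, so no row is skipped or repeated as long as the key columns are unique, which the primary key guarantees here.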
