3/29/2012 12:22:03 PM
Amit Bansal -
I had started the index internal series sometime back. The first part is here:http://www.sqlservergeeks.com/blogs/AmitBansal/sql-server-bi/487/sql-server-index-internals-part-1-%E2%80%93dynamic-index-seek-operation
Apologies for making it late on Part 2 – this part is more about some food for thought in index internals. So here is the question:
In AdventureWorks database, there is a table called Person.Contact and there is a clustered index on COntactID. So here are the scenarios:
When you issue out the following query, the optimizer rightly does a clustered index scan:
When you issue out the following query, the optimizer rightly does a clustered index seek. The output gives you one record, which means SQL Server has done one seek.
where ContactID = 1
But when you issue out the following query, it seeks again. The output returns 17995 records. Does it mean that SQL Server is seeking 17995 times???
where ContactID BETWEEN 1 AND 18000
I hope the scenario is clear. I am looking for answers from all of you. Let me know what you think. Why is the optimizer SEEKING instead of scanning, as you know, in this case, scanning could be faster. Awaiting your responses !
If you liked this post, do like us on FaceBook at http://www.FaceBook.com/SQLServerGeeks
Have a SQL Server question? Join the fastest growing SQL Server facebook group at: http://www.facebook.com/groups/458103987564477/
Visit my FaceBook page at http://www.facebook.com/AmitRSBansal
Contribute on SQLServerGeeks.com: visit http://www.sqlservergeeks.com/default-category/write-for-us
Amit Bansal (Member since: 3/12/2011 4:59:54 PM)
Follow Amit at Twitter @A_Bansal : Amit Bansal is the CTO of eDominer Systems & Peopleware India. He is a consultant, trainer, writer, speaker & evangelist on SQL Server & Business Intelligence. A seasoned speaker; he speaks at major summits and conferences of Microsoft worldwide including TechED, MCT Summits, MSDN & TechNet conferences. Over the last 8 years, he has consulted, trained & mentored more than 6000 IT professionals on SQL Server & Business Intelligence and worked with top notch blue chip companies worldwide including Microsoft, Infosys, Wipro, RBS, HCL, HP, Siemens, IBM, Accenture, etc. He has delivered more than 400 workshops on SQL Server & Business Intelligence. Apart from holding many Microsoft credentials, he is also a Microsoft Most Valuable Professional (MVP) awardee on SQL Server. He is also one of the first Microsoft Certified Trainer in India. Amit has worked with Microsoft in India and US as a Subject Matter Expert in various capacities participating in OD sessions, technical reviewing, etc. He is also a very active community lead in Asia Pacific. Visit www.amitbansal.net to know more. Subscribe to Amit’s blog and be a member of his technical forum at www.SQLServerGeeks.com - an exclusive SQL Server portal. Visit www.peoplewareindia.com for Corporate Training solutions.
FaceBook – http://www.facebook.com/people/Amit-Bansal/525339346
LinkedIn – http://www.linkedin.com/pub/amit-bansal/7/121/755
View Amit Bansal 's profile
ask. To keep my answer short, I’d say (it loud) that Index seek is a confusing
term. It is either a seek or a seek over a range scan as it depends upon the predicate specified.
For example, in this case when you specify “where ContactID BETWEEN 1 AND 18000” the clustered index PK_Contact_ContactID is actually
doing a seek staring from the root but at the leaf level it scan for a range
which satisfies ContactID>1 and ContactID<18000 in an
If you look
at the XML plan (or properties of the CI seek operator in graphical plan) you
will see the same
<RelOp AvgRowSize="4387" EstimateCPU="0.0199511" EstimateIO="0.380903" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="17994.6" LogicalOp="Clustered Index Seek" NodeId="0" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.400854" TableCardinality="19972">
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
This is one of the common mistakes peole make. They believe that seek "always" seek for a specific row and thats all. To add to the complexity, they believe that scan means 'scanning the entire set of pages (either an index or table). Both these assumptions/believes are wrong. Seek could seek more than one row at a single pass and scan can be a short scan too.
The decision of scan or seek depends only on one point: Does the query has a reference point to start? If it does, it will be a seek operation. Otherwise it will be a scan operation.
I wrote an article some times pack on this: "Understanding scans and seeks" you can read about it more here: http://www.sql-server-performance.com/2009/understanding-scans-seeks/
In your case, you will see only one seek operation, but returning multiple rows.
May be the stastics are out of date & query analyzer thought it is better to have index seek than index scan.
By using between keyword it is searching between two predicate's as shown in execution plan & query analyzer is unaware of how many values its going to return.
I guess I don't see the problem...a seek is more direct than a scan. I've always considered a clustered index scan like a full table scan since a clustered index is really nothing more than the physical table re-ordered to have those columns you specify in order. In this case you're using a specific range of ContactIDs, a column on which this table is physically ordered. So specifying a range of 1-18000 doesn't mean SQL has to do 18000 seeks, it means it can efficiently seek to the start of the range and seek to the end of the range, and pull only those pages where those values reside and nothing more. On the other hand, a SCAN would dig through every row in that column saying "are you 1-18000? nope. Are you 1-18000? nope. Are you 1-18000? yep. are you..." and so on.
So, I'd much rather see a seek than a scan in this case. It'll mean only touching the pages where those 18000 records reside instead of the pages where ALL ContactIDs reside.
This is good one.
Below is my view:
Total Number of Rows in Person.Contact table: 19972
Query which we are going to perform is going to fetch rows: 17995 which is 90.10% of the rows of the table.
Ideally speaking yes it should go for a Cluster Index Scan rather than Cluster Index Seek but optimizer chooses Cluster Index Seek WHY.....?
Well if we look at tha STATISTICS TIME ON and IO ON for the SELECT * FROM PERSON.CONTACT table which goes for a Index Scan took CPUTIME : 31 ms and ELAPSED TIME: 1162 ms
set statistics io on
set statistics time on
SELECT * FROM Person.Contact
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
CPU time = 0 ms, elapsed time = 26 ms.
(19972 row(s) affected)
Table 'Contact'. Scan count 1, logical reads 561, physical reads 2, read-ahead reads 559, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
CPU time = 31 ms, elapsed time = 1162 ms.
But when we ran 2nd query i.e. BETWEEN 1 and 18000 which returns 17995 rows the values turn out to be:
SELECT * FROM Person.Contact where ContactID BETWEEN 1 AND 18000
CPU time = 0 ms, elapsed time = 1 ms.
(17995 row(s) affected)
Table 'Contact'. Scan count 1, logical reads 506, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 78 ms, elapsed time = 442 ms.
Above we can see that ELAPSE TIME to run this query with INDEX SEEK is 442 MS but yes CPU time is 78 ms (which is almost double than a index scan when we performed SELECT * operation).
So Optimizer thinks that if it runs this query by performing INDEX SEEK it will complete fast rather than the INDEX SCAN.
Gurpreet Singh Sethi
Since, ContactID is the key in the Clustered Index, the optmizer seeks the 1st ocurrence of ContactID (1).
Since the Index/table is physically ordered on ContactID, All ContactID BETWEEN 1 AND 18000 wiil be returned by seeking the last position of the
18000 wiil be returned by seeking the last position of the
ordered range. Almost as when one uses low level file operations with (FOPEN, FSEEK, FREAD etc). Let's say we had a non clustered index based on lastname
and firstname. If we were to look for ("john Smith"). The first thing the optmizer would do is an index seek to position itself in the first ocurrence of "Smith". Then an Index Scan within the "Smith" dataset that woilukd be ordered by name, to then look for "John".
When whole table data is searched on or say no where clause is applied in your select statement SQL server scans whole table to fetch the results.
When there is a where clause or something like range is given to select result from the table it will always perform a cluster seek...
So the secret lies in Range in where clause.
This is because Optimizer is beliving that it does not have to scan the entire table and there is an available index on ContactID. It therefore uses this Index along with partial scan. Hope this answers...Thanks
Amit, How are you.
Interesting question. would like to know your point of view reg. this
This questions my understanding about SCAN and SEEK.
Leave a comment