SQL Server Index Internals Part 2 – Why is the Optimizer SEEKING instead of SCANNING

Who is online?  133 guests and 0 members
home  »  blogs  »  Amit Bansal  »  SQL Server Index Internals Part 2 – Why is the Optimizer SEEKING instead of SCANNING
  Rate This Blog Entry:  register  or  login

Author

AmitBansal Amit Bansal (Member since: 3/12/2011 4:59:54 PM)
Follow Amit at Twitter @A_Bansal : Amit Bansal is the CTO of eDominer Systems & Peopleware India. He is a consultant, trainer, writer, speaker & evangelist on SQL Server & Business Intelligence. A seasoned speaker; he speaks at major summits and conferences of Microsoft worldwide including TechED, MCT Summits, MSDN & TechNet conferences. Over the last 8 years, he has consulted, trained & mentored more than 6000 IT professionals on SQL Server & Business Intelligence and worked with top notch blue chip companies worldwide including Microsoft, Infosys, Wipro, RBS, HCL, HP, Siemens, IBM, Accenture, etc. He has delivered more than 400 workshops on SQL Server & Business Intelligence. Apart from holding many Microsoft credentials, he is also a Microsoft Most Valuable Professional (MVP) awardee on SQL Server. He is also one of the first Microsoft Certified Trainer in India. Amit has worked with Microsoft in India and US as a Subject Matter Expert in various capacities participating in OD sessions, technical reviewing, etc. He is also a very active community lead in Asia Pacific. Visit www.amitbansal.net to know more. Subscribe to Amit’s blog and be a member of his technical forum at www.SQLServerGeeks.com - an exclusive SQL Server portal. Visit www.peoplewareindia.com for Corporate Training solutions. FaceBook – http://www.facebook.com/people/Amit-Bansal/525339346 LinkedIn – http://www.linkedin.com/pub/amit-bansal/7/121/755

View Amit Bansal 's profile

Comments (10)

Prashant
3/30/2012 8:40:01 PM Prashant said:

Hi Amit, 

Interesting ask. To keep my answer short, I’d say (it loud) that Index seek is a confusing term. It is either a seek or a seek over a range scan as it depends upon the  predicate specified. For example, in this case when you specify “where ContactID BETWEEN 1 AND 18000” the clustered index PK_Contact_ContactID is actually doing a seek staring from the root but at the leaf level it scan for a range which satisfies ContactID>1 and ContactID<18000 in an orderly fashion.

 

If you look at the XML plan (or properties of the CI seek operator in graphical plan) you will see the same

<RelOp AvgRowSize="4387" EstimateCPU="0.0199511" EstimateIO="0.380903" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="17994.6" LogicalOp="Clustered Index Seek" NodeId="0" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.400854" TableCardinality="19972">

           

<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">

by
prithiviraj Kulasingham
3/31/2012 2:18:26 AM prithiviraj Kulasingham said:

Hi Amit,

This is one of the common mistakes peole make.  They believe that seek "always" seek for a specific row and thats all. To add to the complexity, they believe that scan means 'scanning the entire set of pages (either an index or table).  Both these assumptions/believes are wrong. Seek could seek more than one row at a single pass and scan can be a short scan too. 

The decision of scan or seek depends only on one point:  Does the query has a reference point to start? If it does,  it will be a seek operation.  Otherwise it will be a scan operation.  

I wrote an article some times pack on this:  "Understanding scans and seeks" you can read about it more here:  http://www.sql-server-performance.com/2009/understanding-scans-seeks/

In your case, you will see only one seek operation, but returning multiple rows.  

by
Deepak
3/31/2012 9:08:22 AM Deepak said:

May be the stastics are out of date &  query analyzer thought it is better to have index seek than index scan.

or

By using between keyword it is searching between two predicate's  as shown in execution plan & query analyzer is unaware of how many values its going to return.

by
Number2
3/31/2012 4:31:54 PM Number2 said:

I guess I don't see the problem...a seek is more direct than a scan.   I've always considered a clustered index scan like a full table scan since a clustered index is really nothing more than the physical table re-ordered to have those columns you specify in order. In this case you're using a specific range of ContactIDs, a column on which this table is physically ordered.  So specifying a range of 1-18000 doesn't mean SQL has to do 18000 seeks, it means it can efficiently seek to the start of the range and seek to the end of the range, and pull only those pages where those values reside and nothing more.  On the other hand, a SCAN would dig through every row in that column saying "are you 1-18000? nope.  Are you 1-18000?  nope.  Are you 1-18000? yep. are you..."   and so on.  

So, I'd much rather see a seek than a scan in this case.  It'll mean only touching the pages where those 18000 records reside instead of the pages where ALL ContactIDs reside.

John Nelson

by
Gurpreet Singh Sethi
3/31/2012 5:48:28 PM Gurpreet Singh Sethi said:

Hi Amit,

This is good one.

Below is my view:

Total Number of Rows in Person.Contact table: 19972

Query which we are going to perform is going to fetch rows: 17995 which is 90.10% of the rows of the table.

Ideally speaking yes it should go for a Cluster Index Scan rather than Cluster Index Seek but optimizer chooses Cluster Index Seek WHY.....?

Well if we look at tha STATISTICS TIME ON and IO ON for the SELECT * FROM PERSON.CONTACT table which goes for a Index Scan took CPUTIME : 31 ms and ELAPSED TIME: 1162 ms

set statistics io on

set statistics time on

 

SELECT *  FROM Person.Contact

 

/*

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 26 ms.

 

(19972 row(s) affected)

Table 'Contact'. Scan count 1, logical reads 561, physical reads 2, read-ahead reads 559, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(1 row(s) affected)

 

 SQL Server Execution Times:

   CPU time = 31 ms,  elapsed time = 1162 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

 

*/

But when we ran 2nd query i.e. BETWEEN 1 and 18000 which returns 17995 rows the values turn out to be:

SELECT *  FROM Person.Contact  where ContactID BETWEEN 1 AND 18000

 

/*

 

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

(17995 row(s) affected)

Table 'Contact'. Scan count 1, logical reads 506, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(1 row(s) affected)

 

 SQL Server Execution Times:

   CPU time = 78 ms,  elapsed time = 442 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

 

*/

 

Above we can see that ELAPSE TIME to run this query with INDEX SEEK is 442 MS but yes CPU time is 78 ms (which is almost double than a index scan when we performed SELECT * operation).

 

So Optimizer thinks that if it runs this query by performing INDEX SEEK it will complete fast rather than the INDEX SCAN.

 

 

 

by
Gurpreet Singh Sethi
3/31/2012 5:49:58 PM Gurpreet Singh Sethi said:

Hi Amit,

This is good one.

Below is my view:

Total Number of Rows in Person.Contact table: 19972

Query which we are going to perform is going to fetch rows: 17995 which is 90.10% of the rows of the table.

Ideally speaking yes it should go for a Cluster Index Scan rather than Cluster Index Seek but optimizer chooses Cluster Index Seek WHY.....?

Well if we look at tha STATISTICS TIME ON and IO ON for the SELECT * FROM PERSON.CONTACT table which goes for a Index Scan took CPUTIME : 31 ms and ELAPSED TIME: 1162 ms

set statistics io on

set statistics time on

 

SELECT *  FROM Person.Contact

 

/*

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 26 ms.

 

(19972 row(s) affected)

Table 'Contact'. Scan count 1, logical reads 561, physical reads 2, read-ahead reads 559, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(1 row(s) affected)

 

 SQL Server Execution Times:

   CPU time = 31 ms,  elapsed time = 1162 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

 

*/

But when we ran 2nd query i.e. BETWEEN 1 and 18000 which returns 17995 rows the values turn out to be:

SELECT *  FROM Person.Contact  where ContactID BETWEEN 1 AND 18000

 

/*

 

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

(17995 row(s) affected)

Table 'Contact'. Scan count 1, logical reads 506, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(1 row(s) affected)

 

 SQL Server Execution Times:

   CPU time = 78 ms,  elapsed time = 442 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

 

*/

 

Above we can see that ELAPSE TIME to run this query with INDEX SEEK is 442 MS but yes CPU time is 78 ms (which is almost double than a index scan when we performed SELECT * operation).

 

So Optimizer thinks that if it runs this query by performing INDEX SEEK it will complete fast rather than the INDEX SCAN.

 

 

Regards

Gurpreet Singh Sethi

by
Jorge Serres
3/31/2012 6:38:15 PM Jorge Serres said:

Since, ContactID is the key in the Clustered Index, the optmizer seeks the 1st ocurrence of ContactID (1).

Since the Index/table is physically ordered on ContactID, All ContactID BETWEEN 1 AND 18000 wiil be returned by seeking the last position of the

ordered range. Almost as when one uses low level file operations with (FOPEN, FSEEK, FREAD etc). Let's say we had a non clustered index based on lastname

and firstname. If we were to look for ("john Smith"). The first thing the optmizer would do is an index seek to position itself in the first ocurrence of "Smith". Then an Index Scan within the "Smith" dataset that woilukd be ordered by name, to then look for "John".

by
Sandeep Prajapati
4/1/2012 10:58:10 AM Sandeep Prajapati said:

Hello,

When whole table data is searched on or say no where clause is applied in your select statement SQL server scans whole table to fetch the results.

When there is a where clause or something like range is given to select result from the table it will always perform a cluster seek...

So the secret lies in Range in where clause.

by
Ritz
4/2/2012 8:22:51 AM Ritz said:

This is because Optimizer is beliving that it does not have to scan the entire table and there is an available index on ContactID. It therefore uses this Index along with partial scan. Hope this answers...Thanks

by
Ramkumar
4/23/2012 5:30:03 PM Ramkumar said:

Amit, How are you.

Interesting question. would like to know your point of view reg. this

This questions my understanding about SCAN and SEEK.

Thanks
Ram

by

Leave a comment

Name:
  *  
Email:  (your email is kept secure)
 *  
Website:

example: "http://www.SQLServerGeeks.com"

 
Type the characters:
 *
 
   

SQLServerGeeks FaceBook group

SQLServerGeeks Events

Training on Microsoft Products & Technologies

Email Subscriptions

   Get the Most Recent Blogs in your inbox
Blogs RSS Feed

Amit Bansal's latest blog posts

Blogs RSS Feed

Latest community blog posts

  • Hi Friends, In this blog post, let’s have an insight on how Auditing can be done in SSIS packages? The basic auditing in package includes measures like How many rows were inserted, updated or de...
  • Hi SQL Geeks, Here are the blog posts by Piyush Bajaj for the month of May. We would expect more of his informative blogs this month. Happy Learning! If you like our blogs do like us on Facebook at ht...
  • Hi SQL Geeks, Here are the blog posts by Ahmad Osama for the month of May. We would expect more of his informative blogs this month. Happy Learning! If you like our blogs do like us on Facebook at htt...
  • Hi Friends, Are you facing any issue on SQL Server 2014 while running the same query which is running fine on previous version of SQL Server? If answer is yes then this may be due to the change in SQL...
  • I have written a SQL Server Change Data Capture GUI interface to facilitate for CDC operations. The project is hosted here https://sqlcdcapp.codeplex.com/ . Feel free to download, review and suggest c...
  • Recently I came across an interesting deadlock scenario I wasn’t aware off. I didn’t have idea that foreign key constraint can also result in deadlocks. The detailed steps to replicate the...
  • Hi Friends, In my earlier blog, I just explain about the importance of filtered indexes. Link for that blog is mention below: http://sqlservergeeks.com/blogs/princerastogi/personal/599/filtered-indexe...
  • A few words about the technology of our site. It's easy to take technology for granted, to focus on content and pay no attention to how it is being delivered. Typically we would encourage this focus. ...