How is SQL Server Statistics Histogram Computation Used by The Query Optimizer – Part 1

In some of our past blogs, we have discussed how SQL Server Statistics are created, automatically maintained & updated demonstrating how the different computations of the histogram are done. With that said, today’s blog will be dealing with how this computed data is used by the SQL optimizer for Cardinality Estimation.

Histogram includes a lot of data such as Density Factor, RANGE_HI_KEYS, RANGE_ROWS, EQ_ROWS, DISTINCT_ROWS, and AVG_RANGE_ROWS.

The prime focus of this blog will be on RANGE_ROWS & EQ_ROWS.

We will be using AdventureWorks2008 for this demo.

The table SalesOrderHeader contains a TotalDue column (as shown below). This column comprises a variety of values ranging from a single-digit to multiple digits.

SELECT * FROM Sales.SalesOrderHeader

Histogram1

Now, let’s create statistics on TotalDue and then view the histogram using the DBCC SHOW_STATISTICS command.

CREATE STATISTICS TotalDue ON Sales.SalesOrderHeader(TotalDue)
GO
DBCC SHOW_STATISTICS (N'Sales.SalesOrderHeader', TotalDue)
GO

Histogram2

The histogram data is comprised of all the different columns mentioned earlier. For now, we will look into the most basic column EQ_ROWS.

On selecting a particular value from the boundaries of the histogram (RANGE_HI_KEY column), the EQ_ROWS tells the optimizer the exact number of rows in TotalDue column that match the value. But, in the case of a value that is not an exact match, but is present in-between two distinct values in the RANGE_HI_KEY column, it returns the AVG_RANGE_ROWS value instead. This is where the computed data from the histogram comes into play.

   

On selecting a value between 30.6859 & 32.0229, the optimizer always returns a value of 4.6, as computed in the AVG_RANGE_ROWS column.

Histogram3

The query shown below returns 59 rows having a value of 32.0229, which is the same as the value provided to us in the EQ_ROWS histogram statistics as seen earlier.

SELECT * FROM Sales.SalesOrderHeader
WHERE TotalDue = 32.0229

In the Execution Plan, upon hovering over the Filter operator, it is seen that the Actual Number of Rows & Estimated Number of Rows is the same. This is considered to be 100% or perfect Cardinality Estimation.

Histogram4

Changing the value at the WHERE clause to 31, the same query returns an empty result set since there are no values in the RANGE_HI_KEY that is an exact match.

SELECT * FROM Sales.SalesOrderHeader
WHERE TotalDue = 31

However, the Execution Plan shows an estimation of 4.6, similar to the computed value in the AVG_RANGE_ROWS column.

Histogram5

This operation can be performed with other values that do not directly reside in the RANGE_HI_KEY column, only to obtain similar results. This demonstrates how Statistics are used by the optimizer for Cardinality Estimation.

   

About Amit Bansal

Amit Bansal is always brainstorming around SQL Server. Despite working with SQL since 1997, he is amazed that he keeps learning new things every single day. SQL Server is AB's first love, and his wife does not mind that. He tries to share as much and spreads the SQL goodness. Internals and Performance Tuning excites him, and also gives him sleepless nights at times, simply because he is not a genius, but quite a hard worker and does not give up. It has been a long and exciting journey since 1997, you can read here: http://sqlmaestros.com/amit-bansal/ He is on Twitter: https://www.twitter.com/A_Bansal

View all posts by Amit Bansal →

Leave a Reply

Your email address will not be published.