SQL Server Statistics Density – What is the purpose?

Hi Friends,

There is a table in AdventureWorks2008R2 called SalesOrderHeader which has a column called TotalDue. Let us create statistics dennsity for this column

USE [AdventureWorks2008R2]
GO
 
CREATE STATISTICS [st_TotalDue] ON [Sales].[SalesOrderHeader]([TotalDue])
GO

Now, let us display the statistics that we have just created:

DBCC Show_Statistics('Sales.SalesOrderHeader',st_TotalDue)
GO

We all know the purpose of statistics, right? It helps the SQL Query Optimizer in cardinality estimation. In other words, it helps the optimizer to produce an efficient query execution plan.

What exactly is the purpose of DENSITY factor here?

1_SQL_Server_What_is_the_purpose_of_DENSITY_in_STATISTICS

Density can be calculated as 1/ distinct values and can be cross verified using the below query.

Select [DistinctValue]=Count(Distinct TotalDue),
[Density]=1/CAST(Count(Distinct TotalDue) AS Float) 
from Sales.SalesOrderHeader 
GO

2_SQL_Server_What_is_the_purpose_of_DENSITY_in_STATISTICS

   

But what does this mean and where does it help really?
Density, just like other items in statistics describes the data. It is a calculation of uniqueness of data in that column. It answers the question: “How often duplicate values occur in TotalDue column?”

High density would mean less unique data. And low density would mean greater uniqueness. Density can only be between 0 and 1. So what if density is 1? It means 100% duplicate records.

Density information is vital to improve the Query Optimizer’s estimates for GROUP BY operations, and that’s just one of the uses.

Let us consider the same example as above, the number of distinct values in TotalDue column of Sales.SalesOrderHeader table is 4754. Density can be computed, as mentioned earlier, as 1 / “number of distinct values,” which will be 1 / 4754, which is 0.000210349179638199 as shown in the above image.

The Query Optimizer can benefit from already available density information to estimate the cardinality of GROUP BY queries (estimated number of distinct values). Since this density information is already available, now all the optimizer needs to do is to calculate the estimated number of distinct values by calculating the reciprocal of the density value. As shown earlier in the verification code, to estimate the cardinality of the following query using GROUP BY TotalDue, we can calculate the reciprocal of the TotalDue density which will be 1 / 0.000210349179638199, that gives us 4754, and that is the estimated number of rows as shown below:

[Turn on Actual Execution Plan]
SELECT TotalDue from Sales.SalesOrderHeader
GROUP BY TotalDue

3_SQL_Server_What_is_the_purpose_of_DENSITY_in_STATISTICS

In my next post I shall show how density information can be used to estimate the cardinality of a query using local variables.

 

 

   

About Amit Bansal

Amit Bansal is always brainstorming around SQL Server. Despite working with SQL since 1997, he is amazed that he keeps learning new things every single day. SQL Server is AB's first love, and his wife does not mind that. He tries to share as much and spreads the SQL goodness. Internals and Performance Tuning excites him, and also gives him sleepless nights at times, simply because he is not a genius, but quite a hard worker and does not give up. It has been a long and exciting journey since 1997, you can read here: http://sqlmaestros.com/amit-bansal/ He is on Twitter: https://www.twitter.com/A_Bansal

View all posts by Amit Bansal →

2 Comments on “SQL Server Statistics Density – What is the purpose?”

  1. So density is equl to cardinality?!?! Why change the name? Microsoft and Oracle need to use common terms very frustrating when they change it.

  2. Thank you Amit. Great write up on this aspect of SQL Server. Very clear and concise. Excellent visuals and examples.

Leave a Reply

Your email address will not be published.