Hi Friends,

We all know that in SQL Server, resource allocation for a query execution depends on the number of rows to be processed here for number of rows you can say Cardinality Estimation. But how sql server knows about this number of rows to be processed without scanning the tables or indexes before the execution? Actually, SQL Server decides it on the basis of statistics where statistics are like an object which stores the information about data distribution. Each statistics contain three sections Stat Header, Density Vector and Histogram.

1-      Header: Contain the information about statistics last update date, sample rows, filtered or not etc.
2-      Density Vector: Here it shows the density for some combinations of columns. Which is calculated on the basis of density=1/ distinct number of rows for column(s).
3-      Histogram: Here sql server histogram store much more detailed information about data distribution for leading column.

Let me show you this practically.

stats_1

In the above figure, you can see all the three sections of stats. Let me explain the meaning of each column under histogram:

RANGE_HI_KEY: Values shown under Leading column of stats i.e. bal.
RANGE_ROWS: number of rows falls under that step excluding upper boundary value.
EQ_ROWS: number of rows for upper boundary value.
DISTINCT_RANGE_ROWS: number of distinct rows falls under that step excluding upper boundary value.
AVG_RANGE_ROWS: average number of rows falls under that step excluding upper boundary value.

Now let me run the below query on this table with actual execution plan.

stats_2

Here estimated numbers of rows have been calculated based on stats histogram. Let me show you.

stats_3

Under that query we select all the rows where bal<30. From histogram we will

1-      Select all number of rows under RANGE_ROWS because it represents number of rows falls under that range excluding upper boundary. Highlighted as RED.

2-       As specified earlier, RANGE_ROWS will not provide number of rows for upper boundary so we will also add values under EQ_ROWS because it represents number of rows for upper boundary. Highlighted as BLUE.

3-      Here, we will exclude the EQ_ROWS (number of rows for upper boundary) for value 30 because we are trying to select columns which fall below 30 only. Highlighted as BLACK.

So RANGE_ROWS= 0 + 1 + 2 + 2 + 2 + 3 + 2 + 5 = 17

EQ_ROWS= 1 + 2 + 3 + 3 + 2 + 2 + 1 = 14

Total = 17 + 14 = 31 which is equals to estimated number of rows.

HAPPY LEARNING!

Regards:
Prince Kumar Rastogi

Like us on FaceBook | Join the fastest growing SQL Server group on FaceBook

Follow Prince Rastogi on Twitter | Follow Prince Rastogi on FaceBook