I really do not know what motivates me or drives me to write this blog post at 12 midnight, just landed from Singapore and waiting for my BA flight to Redmond – I guess it you, the readers and audience of SQLServerGeeks.com that motivates each author who is blogging here – when they see thousands of hits and comments, that really motivates :). I am so sleepy, yet blogging on query hint; if I make mistakes, pardon me 🙂
There is something called parameter sniffing – the query optimizer sniffs the values/constants specified in your query predicates and produces and optimized query plan based on the sniffed value.
USE AdventureWorks GO DBCC FREEPROCCACHE; Go SELECT SalesOrderId, OrderDate from Sales.SalesOrderHeader WHERE SalesPersonID=288
When you run the above code snippet with parameter value 288 (turn on actual execution plan), you will observe Index Seek, bookmark lookup and nested loop join. Note that the parameter value 288 produces only 16 rows.
When you run the following code, you will see in the execution plan, Seek, idx_date scan & hash match. Note that this query produces 473 records.
SELECT SalesOrderId, OrderDate from Sales.SalesOrderHeader WHERE SalesPersonID=277
The difference between the above two queries is just the constant value. Thus, the optimizer sniffs the values, uses statistics, computes the correct cardinality estimation and produces an optimized plan, which in our case is different.
Now run the same queries again, but this time using a variable and observe the execution plan.
DECLARE @SalesPersonID int; SET @SalesPersonID = 288; SELECT SalesOrderId, OrderDate from Sales.SalesOrderHeader WHERE SalesPersonID= @SalesPersonID DECLARE @SalesPersonID int; SET @SalesPersonID = 277; SELECT SalesOrderId, OrderDate from Sales.SalesOrderHeader WHERE SalesPersonID= @SalesPersonID
You will observe that both the above queries produce the same hash match plan. Simply because now the optimizer does not get the variable value until runtime and uses default selectivity. In my case, the default selectivity was 223.882 and thus a hash match was a better plan for this value. You can see this value when you compare estimated number of rows with actual number of rows. Both the above queries produce different actual number of rows, but the same estimated number of rows; and that is because of default selectivity.
Now, if you know (from the past usage of the table) that most of the times users punch in a SalesPersonID that produces a Nested Loop plan, you can use OPTIMIZE FOR hint. (I am using the term ‘nested loop plan’ and ‘hash plan’ just to differentiate between the two plans). And it might make business sense that there will be few sales persons who would be star performers and will produce records more than 224; majority will have much lesser sales 🙂 (I don’t deny it could be otherwise)
So, this is how I construct the query:
DECLARE @SalesPersonID int; SET @SalesPersonID = 277; SELECT SalesOrderId, OrderDate from Sales.SalesOrderHeader WHERE SalesPersonID= @SalesPersonID OPTION (OPTIMIZE FOR (@SalesPersonID = 288));
Now, whatever be your sales person id, you will always get a nested loop plan. OPTIMIZE FOR hint now instructs the query optimizer to use 288 for @SalesPersonID when the query is compiled and optimized. This value is used only during query optimization, and not during query execution.
Advantage: you get optimized plan for most of your queries.
Downside: some queries that could benefit from the hash loop plan, will have to live with the nested loop plan.
There is always a tradeoff.
Many of you who are reading this now will love to talk more about other solutions and techniques but as you know the concept gets bigger and more complicated when we look into other hints like RECOMPILE, re-usage of plans, plan caching & recompilation, etc. You are most welcome to put in your comments. My intention here was to just introduce the OPTIMIZE FOR hint. And I hope you enjoyed reading this 🙂