In this blog we’ll find how SQL Server works when we do join operation. SQL Server optimizer chooses one of the below physical operator to perform logical join operation.
- Hash Match
- Merge Join
- Nested Loop Join
As far as performance is concern let me tell you, we can’t say which one will best.Each operator has its own advantages and disadvantages. Most of the time query optimizer picks the correct physical join operator to perform join operation. Here we’ll find out how these operators works and in which scenario which operator is suitable.
Two processes together make hash join i.e. Build and Probe. Build process create the build input. Optimizer chooses the smaller input to create the build input. During build process, it creates a build hash table by computing a hash value for each row from the build input. Then from the probe input it creates the hash value for applicable rows using same hash function and looks in build hash table for match. In case of multiple join on the same join column, these operations are grouped into a hash team. There are 3 types of hash join i.e. in-memory hash join, grace hash join, and recursive hash join. To know more on Hash Join refer Hash Join.
Apart from join operation, this operator is used for other operations which includes intersection, union, difference, grouping , distinct. Hash join algorithm is suitable for large unsorted or non indexed input.
Lets try the below example:
Create two tables, Table1 and Table1. Insert some records to both the tables.
Create Table Table1(Col1 int identity, Col2 varchar(20)) Go Create Table Table2(Col1 int identity, Col2 varchar(20)) Go Declare @i int =0 While(@i<10000) Begin Insert Table1 values( 'A'+cast(@i as varchar)) if(@i%2=0) Insert Table2 values( 'A'+cast(@i as varchar)) Set @i+=1 End
Press Ctrl+M to show the Actual Execution Plan and execute the below query
Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1
You can see below the Actual Execution Plan for the above query where Optimizer uses Hash Match Operator to join Table1 and Table2. Here Optimizer chooses Hash Match operator because of high volume of unsorted input rows.
Move hover your mouse pointer to Hash Match operator , you can see the tool tip as below. See in the below image, Table2 is used as Build input because Table2 has less number of records compared to Table1 and Table1 is used as probe input.
If we significantly reduce the number of input rows in the above query by applying filter , Even though there is no Index defined, Optimizer will use Nested Loop Join operator instead of Hash Match Operator.
Lets try the below example,
Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1 Where T1.Col1 = 1
Below is the Actual Execution Plan for the above query, where optimizer chooses Nested Loop Join.
Most of the time optimizer chooses Merge Join operator during Join operation when both the inputs are large and sorted on the join column. If Optimizer uses Merge Join, it scans an index if exist on the join column else it applies a sort operator before Merge Join Operator. During Merge Join operation it reads row from each input and compare them. To Know more on Merge Join refer Merge join.
Merge Join is possible when the inputs are sorted and not small . The Merge Join operator performs the inner join, left outer join, left semi join, left anti semi join, right outer join, right semi join, right anti semi join, and union logical operations.
Lets try the below example:
Case 1: Now we know that Merge Join required both inputs should be sorted. Here we’ll apply Merge Join Hint to join Table1 and Table2 to see how optimizer applies sort operator if inputs are not sorted.
Select T1.Col2 From Table1 T1 Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1
In case you are using Inner Join Operation with out any join hint and you see your plan is look like above then in that case it is advisable to create an index on join columns to enhance the performance of join operation.
Case 2: Lets create an Unique Clustered Index on join columns and then will try to join the tables.
Create Unique Clustered Index IX_Table1_Col1 ON Table1(Col1) Go Create Unique Clustered Index IX_Table2_Col1 ON Table2(Col1) Go Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1
Even though we are not using any hint in the above query, Optimizer will use Merge Join operator to perform join operation because both inputs are already sorted when we defined index on the join columns. See below the Actual Execution Plan for the above select query.
This Join is most suitable when outer input is small and inner input is large and the join column is indexed. In this kind of join operation it process each row from outer input and loop through all rows of inner input to search for matching row based on join column. To know more refer Nested Loop Join.
Nested loops joins perform a search on the inner table for each row of the outer table, typically using an index. The Nested Loops operator performs the inner join, left outer join, left semi join, and left anti semi join logical operations.
Lets try the below example:
Here we reduce the input rows by applying where clause.
Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 36
Below is the Actual Execution Plan for the above query.
Now lets compare how it will perform if we apply merge join for the above query. Here I use a range where cost of both the query is same. But if you compare the Nested Loop operator cost and Merge Join operator cost, Nested Loop operator cost is very much smaller then Merge Join operator cost.
For the above query we can’t say which operator is efficient. Now lets reduce the number of input rows and will see how it perform?
Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15 Select T1.Col2 From Table1 T1 Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15
In the above screenshot we can see when the number of input is small Nested Loop Join perform better.
All of the above join operators perform well based on the criteria so we can’t say which one is better.
Can you answer to the below question?
To demonstrate merge join Why I created Unique Clustered Index? What will happen If I create Clustered Index without Unique keyword?