SQL Server – Join Operation

In this blog we’ll find how SQL Server works when we do join operation. SQL Server optimizer chooses one of the below physical operator to perform logical join operation.

join-algorithm

  • Hash Match
  • Merge Join
  • Nested Loop Join

As far as performance is concern let me tell you, we can’t say which one will best.Each operator has its own advantages and disadvantages. Most of the time query optimizer picks the correct physical join operator to perform join operation. Here we’ll find out how these operators works and in which scenario which operator is suitable.

Hash Match hash-match

Two processes together make hash join i.e. Build and Probe. Build process create the build input. Optimizer chooses the smaller input to create the build input. During build process, it creates a build hash table by computing a hash value for each row from the build input. Then from the probe input it creates the hash value for applicable rows using same hash function and looks in build hash table for match.   In case of multiple join on the same join column, these operations are grouped into a hash team. There are 3 types of hash join i.e. in-memory hash join, grace hash join, and recursive hash join. To know more on Hash Join refer Hash Join.

Apart from join operation, this operator is used for other operations which includes  intersection, union, difference, grouping , distinct.  Hash join algorithm is suitable for large unsorted or non indexed  input.

Lets try the below example:

Create two tables,  Table1 and Table1. Insert some records to both the tables.

Create Table Table1(Col1 int identity, Col2 varchar(20))
Go
Create Table Table2(Col1 int identity, Col2 varchar(20))
Go

Declare @i int =0

While(@i<10000)
Begin
    Insert Table1 values( 'A'+cast(@i as varchar))
    if(@i%2=0)
    Insert Table2 values( 'A'+cast(@i as varchar))
    Set @i+=1
End

Press Ctrl+M to show the Actual Execution Plan and execute the below query

Select T1.Col2
From Table1 T1 
Inner Join Table2 T2 ON T1.Col1 = T2.Col1
 

You can see below the Actual Execution Plan for the above query where Optimizer uses Hash Match Operator to join Table1 and Table2. Here Optimizer chooses Hash Match operator because of high volume of unsorted input rows.

hashmatchexample1

Move hover your mouse pointer to Hash Match operator , you can see the tool tip as below. See in the below image, Table2 is used as Build input because Table2 has less number of records compared to Table1 and  Table1 is used as probe input.

hashmatchexample2-png

If we significantly reduce the number of input rows in the above query by applying filter ,  Even though there is no Index defined, Optimizer will use Nested Loop Join operator instead of Hash Match Operator.

Lets try the below example,

Select  T1.Col2 
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 Where T1.Col1 = 1

Below is the Actual Execution Plan for the above query, where optimizer chooses Nested Loop Join.

hashmatchexample3

Merge Join mergejoin

Most of the time optimizer chooses Merge Join operator during Join operation when both the inputs are large and sorted on the join column. If Optimizer uses Merge Join, it scans an index if exist on the join column else it applies a sort operator before Merge Join Operator. During Merge Join operation it reads row from each input and compare them. To Know more on Merge Join refer Merge join.

Merge Join is possible when the inputs are sorted and not small . The Merge Join operator performs the inner join, left outer join, left semi join, left anti semi join, right outer join, right semi join, right anti semi join, and union logical operations.

Lets try the below example:

Case 1: Now we know that Merge Join required both inputs should be sorted. Here we’ll apply Merge Join Hint to join Table1 and Table2 to see how optimizer applies sort operator if inputs are not sorted.

Select  T1.Col2 
From Table1 T1
Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1 

mergejoin1

In case you are using Inner Join Operation with out any join hint and  you see your plan is look like above then in that case it is advisable to create an index on join columns to enhance the performance of join operation.

Case 2: Lets create an Unique Clustered Index on join columns and then will try to join the tables.

Create   Unique Clustered Index IX_Table1_Col1 ON Table1(Col1)
Go
Create   Unique Clustered Index IX_Table2_Col1 ON Table2(Col1)
Go

Select  T1.Col2 
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1

Even though we are not using any hint in the above query, Optimizer will use Merge Join operator to perform join operation because both inputs are already sorted when we defined  index on the join columns. See below the Actual Execution Plan for the above select query.

mergejoin2

Nested Loop Join  nestedloopjoin

This Join is most suitable when outer input is small  and inner input is large and  the join column is indexed. In this kind of join operation it process each row from outer input and loop through all rows of inner input to search for matching row based on join column. To know more refer Nested Loop Join.

Nested loops joins perform a search on the inner table for each row of the outer table, typically using an index. The Nested Loops operator performs the inner join, left outer join, left semi join, and left anti semi join logical operations.

 

Lets try the below example:

Here we reduce the input rows by applying where clause.

Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 36

Below is the Actual Execution Plan for the above query.

 

nestedloopjoin1

Now lets compare how it will perform if we apply merge join for the above query. Here I use a range where cost of both the query is same. But if you  compare the Nested Loop operator cost and Merge Join operator cost, Nested Loop operator cost is very much smaller then Merge Join operator cost.

nestedloopjoin2

For the above query we can’t say which operator is efficient. Now lets reduce the number of input rows and will see how it perform?

Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15

Select T1.Col2
From Table1 T1
Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15

Below is the Actual Execution Plan for the above query.nestedloopjoin3

In the above screenshot we can see when the number of input is small Nested Loop Join perform better.

All of the above join operators perform well based on the criteria so we can’t say which one is better.

Can you answer to the below question?

To demonstrate merge join Why I created Unique Clustered Index? What will happen If I create Clustered Index without Unique keyword?

Avatar

About Sandip Pani

Sandip Pani is a Database/BI developer and speaker and is located in Bangalore, India. He has been working with SQL Server over 11 years. He is MCITP for SQL Server 2008 and specializes in SQL Server for application developers and performance and query tuning. His specialization is into Query Tuning and performance troubleshooting. He is working as Senior Technical Specilist for NextGen Healthcare. He is active in SQL community world. He share and enhance his knowledge of SQL Server by spending time at newsgroups and forums, reading and writing blogs, and attending and speaking at conferences.

View all posts by Sandip Pani →

Leave a Reply

Your email address will not be published. Required fields are marked *