In this blog we’ll find how SQL Server works when we do join operation. SQL Server optimizer chooses one of the below physical operator to perform logical join operation.

join-algorithm

  • Hash Match
  • Merge Join
  • Nested Loop Join

As far as performance is concern let me tell you, we can’t say which one will best.Each operator has its own advantages and disadvantages. Most of the time query optimizer picks the correct physical join operator to perform join operation. Here we’ll find out how these operators works and in which scenario which operator is suitable.

Hash Match hash-match

Two processes together make hash join i.e. Build and Probe. Build process create the build input. Optimizer chooses the smaller input to create the build input. During build process, it creates a build hash table by computing a hash value for each row from the build input. Then from the probe input it creates the hash value for applicable rows using same hash function and looks in build hash table for match.   In case of multiple join on the same join column, these operations are grouped into a hash team. There are 3 types of hash join i.e. in-memory hash join, grace hash join, and recursive hash join. To know more on Hash Join refer Hash Join.

Apart from join operation, this operator is used for other operations which includes  intersection, union, difference, grouping , distinct.  Hash join algorithm is suitable for large unsorted or non indexed  input.

Lets try the below example:

Create two tables,  Table1 and Table1. Insert some records to both the tables.

Press Ctrl+M to show the Actual Execution Plan and execute the below query

You can see below the Actual Execution Plan for the above query where Optimizer uses Hash Match Operator to join Table1 and Table2. Here Optimizer chooses Hash Match operator because of high volume of unsorted input rows.

hashmatchexample1

Move hover your mouse pointer to Hash Match operator , you can see the tool tip as below. See in the below image, Table2 is used as Build input because Table2 has less number of records compared to Table1 and  Table1 is used as probe input.

hashmatchexample2-png

If we significantly reduce the number of input rows in the above query by applying filter ,  Even though there is no Index defined, Optimizer will use Nested Loop Join operator instead of Hash Match Operator.

Lets try the below example,

Below is the Actual Execution Plan for the above query, where optimizer chooses Nested Loop Join.

hashmatchexample3

Merge Join mergejoin

Most of the time optimizer chooses Merge Join operator during Join operation when both the inputs are large and sorted on the join column. If Optimizer uses Merge Join, it scans an index if exist on the join column else it applies a sort operator before Merge Join Operator. During Merge Join operation it reads row from each input and compare them. To Know more on Merge Join refer Merge join.

Merge Join is possible when the inputs are sorted and not small . The Merge Join operator performs the inner join, left outer join, left semi join, left anti semi join, right outer join, right semi join, right anti semi join, and union logical operations.

Lets try the below example:

Case 1: Now we know that Merge Join required both inputs should be sorted. Here we’ll apply Merge Join Hint to join Table1 and Table2 to see how optimizer applies sort operator if inputs are not sorted.

mergejoin1

In case you are using Inner Join Operation with out any join hint and  you see your plan is look like above then in that case it is advisable to create an index on join columns to enhance the performance of join operation.

Case 2: Lets create an Unique Clustered Index on join columns and then will try to join the tables.

Even though we are not using any hint in the above query, Optimizer will use Merge Join operator to perform join operation because both inputs are already sorted when we defined  index on the join columns. See below the Actual Execution Plan for the above select query.

mergejoin2

Nested Loop Join  nestedloopjoin

This Join is most suitable when outer input is small  and inner input is large and  the join column is indexed. In this kind of join operation it process each row from outer input and loop through all rows of inner input to search for matching row based on join column. To know more refer Nested Loop Join.

Nested loops joins perform a search on the inner table for each row of the outer table, typically using an index. The Nested Loops operator performs the inner join, left outer join, left semi join, and left anti semi join logical operations.

 

Lets try the below example:

Here we reduce the input rows by applying where clause.

Below is the Actual Execution Plan for the above query.

 

nestedloopjoin1

Now lets compare how it will perform if we apply merge join for the above query. Here I use a range where cost of both the query is same. But if you  compare the Nested Loop operator cost and Merge Join operator cost, Nested Loop operator cost is very much smaller then Merge Join operator cost.

nestedloopjoin2

For the above query we can’t say which operator is efficient. Now lets reduce the number of input rows and will see how it perform?

Below is the Actual Execution Plan for the above query.nestedloopjoin3

In the above screenshot we can see when the number of input is small Nested Loop Join perform better.

All of the above join operators perform well based on the criteria so we can’t say which one is better.

Can you answer to the below question?

To demonstrate merge join Why I created Unique Clustered Index? What will happen If I create Clustered Index without Unique keyword?