Today we are going to discuss on SQL Server hash match join operator but before we do that, we need to understand the concept of hashing and a hash table.
Hashing is a programming technique where data is converted into symbolic form that makes it easy to be searched quickly and is often used in SQL Server to convert data into a format that is efficient to work with or that make searching more efficient. A Hash Table is a data structure which divides all elements in equal sized categories known as buckets that allows quick access to elements.
Hashing function defines the algorithm as to which bucket the data goes into and when SQL Server joins two tables by hashing the rows from the smaller of the two tables (Build input) to be joined then inserting them into a hash table and then processing the larger table (Probe input) one row at a time against the smaller table searching for matches where rows needs to be joined.
Let us understand this operator using following query statement.
USE [AdventureWorks2012] SELECT SD.SalesOrderID, SD.CarrierTrackingNumber, SD.ProductID, TH.TransactionID FROM Sales.SalesOrderDetail SD INNER JOIN Production.TransactionHistory TH ON SD.SalesOrderID = TH.ReferenceOrderID
In the example, Sales.SalesOrderDetail table is used as Probe input and Production.TransactionHistory table is used a Build input. Build and Probe inputs can be verified by executing following queries.
-- Record count for Production.TransactionHistory (Build table) SELECT COUNT(*) AS 'Count' FROM Production.TransactionHistory -- Record count for Sales.SalesOrderDetail (Probe input) SELECT COUNT(*) AS 'Count' FROM Sales.SalesOrderDetail
Hash match joins are efficient with large data sets when one of the tables is considerably smaller than the other. This operator in execution plan indicates missing indexe(s), missing where clause or could be a calculation or conversion in WHERE clause that makes it non-sargeable.