Probably, a zillion blog posts and articles must have been written on SQL Server UNION vs UNION ALL and the performance consideration between them.
Let this one be one of them too. And treat this post as a reminder
I was at a customer site today reviewing some T-SQL code and I came across a couple of scripts that used UNION operator instead of UNION ALL.
Combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the union. The UNION operation is different from using joins that combine columns from two tables.
The following are basic rules for combining the result sets of two queries by using UNION:
- The number and the order of the columns must be the same in all queries.
- The data types must be compatible.
- Specifies that multiple result sets are to be combined and returned as a single result set.
- Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
When I asked the developer, why has he used UNION and if he specifically wanted the duplicates to be removed, he looked at me as if I just spoke in Greek. I was stunned too.
So with this simple example I explained him that UNION ALL performs faster than UNION, simply because UNION ALL does not remove duplicates. So if has no need of explicitly removing duplicates, he can use UNION ALL instead.
Another thing that bothered me was (after I left the customer site), did the developer knew that duplicates were being removed? Was he sure that his output was functionally correct?