SQL Server OPENJSON for selecting and comparing values by Tomaž Kaštrun

This article first appeared in the SQLServerGeeks Magazine.
Author: Tomaž Kaštrun
Subscribe to get your copy.

OPENJSON was introduced in SQL Server 2016 and is a table-valued function that parses JSON formatted text and returns objects and properties in form of key:value pairs. These pairs can be used presented as rows and columns, or as a rowset view over JSON file.
This ability to extract the objects and parameters (or keys and values) in a rowset view, opens up a lot of potential useful T-SQL techniques that will go beyond reading JSON files or JSON formats.
Daily wrangling and engineering data will challenge you with variety of tasks, that usually end up too complex for later maintenance or might pose a performance issue. OPENJSON table-valued function has been many times overlooked (among those are also CROSS APPLY, STRING_ESCAPE, STRING_AGG, STRING_SPLIT, TRY_CONVERT, CUME_DIST, LAG, LEAD, FIRST_VALUE) not because people would not heard about it, but – from what I have seen – it is immediately associated with JSON format and people simply ignore it.
Showing two examples that have proven really helpful over past years and has helped me and other data analysts, developers, scientists many times. First example will be on selecting values and second one on comparing values. Both cases can be used in different scenarios, different industries, but it’s simplicity can be really helpful. Both demos are using Master database for the simplicity and brevity but I would propose using your own database

Selecting Values
Many times, you want to have a set of values (as a string with separator) introduced into query. You can either hard-code the values (which I would not recommend), you can iterate through the list, use XML FOR PATH clause, create a temporary object and many other solutions. Since OPENJSON is a table-valued function, you can simply use it with JOIN statement to pass the parameters

USE  [Master];

DECLARE @TableID VARCHAR(100) = '[20,21,22,23,24]';

FROM sys.objects AS o
JOIN sys.schemas AS s
ON s.schema_id = o.schema_id
JOIN OPENJSON(@TableID) AS d ON o.Object_ID =  d.value
-- returns same result set if used explicit SELECT value FROM statement
-- INNER JOIN (SELECT value FROM OPENJSON(@TableID)) as d ON o.Object_ID =  d.value

This is a simple but effective way to get the list of e.g.: invoiceID, customerID, locationID, codeID into your query. OPENJSON in this does create a temporary table of identifiers and filters the data based on the values.

For further reading use Microsoft Docs resources as this link.


Comparing Values
OPENJSON can also be used to compare the set of values, given the same key. Imagine a JSON file with the following keys and values:

    "ColA": 10,
    "ColD": "2021/04/28",
    "Name": "Table1"
}, {
    "ColA": 20,
    "ColD": "2021/04/28",
"Name": "Table2"

}, {
    "ColA": 30,
    "ColD": "2021/04/29",
"Name": "Table3"


This data would be represented as rowset view as:
ColA      ColD                       Name
10         “2021/04/28”   Table1
20         “2021/04/28”   Table2
30         “2021/04/29”   Table3

And your case is to find all the differences between Table1 and Table3 on any given attribute.
OPENJSON would give you the capability to intricately pivot the data (or values) over the same key and either show all the data or simply use/show where there are differences or match.

USE  [Master];

   ,master_db.[value] AS master_values
   ,model_db.[value] AS model_values 
   ,msdb_db.[value] AS msdb_values 

FROM OPENJSON ((SELECT * FROM sys.databases WHERE database_id = 1 FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER)) AS master_db 
ON master_db.[key] = model_db.[key] 
ON master_db.[key] = msdb_db.[key]

In this case I am taking three different databases and joining all the column names (key) and pivoting the data. You would not get the same result with running this query:

SELECT * FROM sys.databases
    database_id IN (1,3,4)

Since the OPENJSON function pivots the keys and values, it is much easier to filter out the rows (that are columns in SELECT * FROM sys.databases statement) by applying WHERE clause or filter out values in ON clause.
I have seen this concept first by my friend and fellow MVP, Miloš Radivojević and he introduced it in the book “SQL Server 2016 developer’s Guide” and from that time, I have been using it in many reports, queries, from Sales data (comparing complaints or searching out differences among customers) to parameter sweeping for machine learning models. Truly simple, yet powerful approach.

This article first appeared in the SQLServerGeeks Magazine.
Author: Tomaž Kaštrun
Subscribe to get your copy.


Leave a Reply

Your email address will not be published.