A TIP ABOUT LINKED SERVER

This article first appeared in the SQLServerGeeks Magazine.
Author: Erland Sommarskog .
Subscribe to get your copy.

Working with linked servers often means hassle. One hassle is with connectivity and authentication. Another is that performance sometimes can very poor. In this month’s article I will discuss some tricks to avoid performance disasters with linked servers with help of dynamic SQL.

If you want to run the examples in this article, you need to install a copy of the small demo database Northwind. If you don’t have it already, you can download it from Microsoft’s Github or my web site. You will also need a linked server. If you have access to two SQL Server instances, you can install Northwind on one instance and run the examples from the other instance. If you only have one instance available, you can still work with the examples by setting up a loopback server this way:

EXEC sp_addlinkedserver LNKSRV, '', 'SQLNCLI', @datasrc = @@servername

The easiest way to query a table a remote server is to use four-part notation like this:

SELECT * FROM LNKSRV.Northwind.dbo.Orders WHERE OrderID = 11000

Many times, this works fine, and you get a query plan akin to this:

The main operator is a Remote Query and it returns one single row. That is, the local instance sends over the parameter value 11000 to the remote server which applied this filter locally. The technical term that optimizer people use is that the predicate is remoted. However, remoting does not always happen. Here is one example:

DECLARE @lookfor nvarchar(20) = 'snabb'
SELECT * FROM LNKSRV.Northwind.dbo.Orders 
WHERE patindex('%' + @lookfor + '%', ShipName) > 0

Look at the query plan:

The remote query returns all 830 orders in Northwind and the filter is applied locally. The patindex function cannot be remoted, because the optimizer does not know whether the remote server supports the patindex function. Keep in mind that the remote data source could be running something else than SQL Server.

In this particular situation, you can easily avoid the issue by using LIKE instead, as LIKE conditions generally are remoted. However, it is far from always there exists a rewrite, so let’s work with the example with patindex from a general perspective.

As you can imagine, dragging over all rows from the remote table can lead to enormous performance problems if the remote table has hundreds of millions of rows. While this particular query always will require a full scan, it is a lot more efficient if this happens on the remote server.

An easy way out, or so it may seem at first glance, is to use OPENQUERY:

SELECT *
FROM   OPENQUERY (LNKSRV, 'SELECT * FROM Northwind.dbo.Orders 
                           WHERE charindex(''%snabb%'', ShipName) > 0')

OPENQUERY simply sends the query you pass to the remote server, so you know that it will be evaluated entirely on the remote server.

You may note here that I have replaced the variable with the search string itself. But in many situations, we want to use a variable and not a hard-coded string. And this is where our trouble starts. You cannot replace snabb with @lookfor in the call to OPENQUERY, because the remote server has no knowledge about this variable. Alas, OPENQUERY does not provide a means to accept parameters. Nor does OPENQUERY accept a variable for the query string – it must be a string literal. (For good reasons: SQL Server needs to know the shape of the result set at compile time.) This often sends you down a rabbit hole of complicated dynamic SQL. There is all reason to avoid this if you can, so let’s look at alternatives first.

A great solution is to use sp_executesql on the remote server and send a parameterised query like this:

DECLARE @lookfor nvarchar(20) = 'snabb'
DECLARE @sql nvarchar(MAX) = 
   		'SELECT * FROM Northwind.dbo.Orders
   		 WHERE patindex(''%'' + @lookfor + ''%'', ShipName) > 0'
EXEC LNKSRV.Northwind.sys.sp_executesql 
     	@sql, N'@lookfor nvarchar(20)', @lookfor

Note that the query string does not include the database name. Because the call to sp_executesql includes the database name, it will execute in the context of Northwind anyway. (If you are not acquainted with dynamic SQL and sp_executesql, see the beginning of my article The Curse and Blessings of Dynamic SQL.)

This works only if the remote server also runs SQL Server. However, there is a generic alternative, to wit, EXEC() AT, that works with any remote data source:

DECLARE @lookfor nvarchar(20) = 'snabb'
DECLARE @sql nvarchar(MAX) = 
'SELECT * FROM Northwind.dbo.Orders WHERE patindex(''%'' + ? + ''%'' > 0'
EXEC(@sql, 'snabb') AT LNKSRV 

Here the question mark (?) serves as a parameter placeholder, and you need to supply an actual value for each ? to EXEC() AT. (For more details on EXEC() AT, see section 2.4 in my article on dynamic SQL.)

If you want to join the result from the remote table with local tables, you can use INSERT-EXEC to receive the result in a temp table and then use that temp table in your query:

INSERT #temp (....)
   	EXEC LNKSRV.db.sys.sp_executesql @query, @params, @par1, @par2, ...

If you think this looks complicated, let’s turn to OPENQUERY and see how fun that is. If you start using OPENQUERY without too much thought behind it, you may end up writing something like this:

CREATE OR ALTER PROCEDURE get_remote_data @lookfor nvarchar(20) AS
DECLARE @sql nvarchar(MAX)
SELECT @sql = 'SELECT * FROM OPENQUERY(LNKSRV, ' +
              '''SELECT * FROM Northwind.dbo.Orders ' +
              'WHERE patindex(N''''%' + @lookfor + '%'''', ShipName) > 0'')'
PRINT @sql
EXEC sp_executesql @sql
go
EXEC get_remote_data 'snabb'

I can only recommend you to take this path, if you already have made a conscious decision to lose your mind. Do you really fancy the idea of staring at something like this for a whole day, trying to under-stand if the reason why it does not work is because you have one single quote too many or too few somewhere? Or whether those per-cent characters should be elsewhere? The above does work – but I can assure you that I did not get it right in my first attempt. And while it may be working, this is open for SQL injection – both on the remote server and the local server. On top of that, because search value is inlined into the query string, the query contributes to cache pollution on the remote server.

   

When you work with OPENQUERY, you need to take a systematic approach when you build your dynamic query. First, we need a helper function:

CREATE FUNCTION quotestring_n(@str nvarchar(MAX)) RETURNS nvarchar(MAX) AS

 

BEGIN
  	 DECLARE @ret nvarchar(MAX),
           	@sq  nchar(1) = ''''
   	SELECT @ret = replace(@str, @sq, @sq + @sq)
   	RETURN('N' + @sq + @ret + @sq)
END

This function accepts as string as a parameter, and returns it wrapped in N” to make it an nvarchar literal. Furthermore, all single quotes withing string are doubled to abide with the SQL syntax.

Now when we have this function, we can build the query in two steps. First, we build the remote query string and then we wrap that in a local query:

CREATE OR ALTER PROCEDURE get_remote_data @lookfor nvarchar(20) AS
DECLARE @remotesql nvarchar(MAX),
        	@localsql  nvarchar(MAX)

SELECT @remotesql = 
 	'SELECT * FROM Northwind.dbo.Orders
 WHERE patindex(' + dbo.quotestring_n('%' + @lookfor + '%') + ', ShipName) > 0'
SELECT @localsql = 'SELECT * FROM OPENQUERY(LNKSRV, ' + 
                    dbo.quotestring_n(@remotesql) + ')'
PRINT @localsql
EXEC sp_executesql @localsql
go
EXEC get_remote_data 'snabb'

We will now move to another situation where performance can be a challenge. Consider a query like this:

SELECT ...
FROM   LNKSRV.db.dbo.bigremotetbl r
JOIN   dbo.tinylocaltbl l ON r.somecol = l.somecol

As the names suggest, bigremotetbl is a big table, with millions of rows or more, while tinylocaltbl has just a handful. For good performance you want the optimizer to settle for a plan where the contents of tinylocaltbl.somecol is sent over to the remote server, and the part of the query against bigremotetbl is evaluated there. The optimizer is certainly able to produce such a plan as testified by this example.

DROP TABLE IF EXISTS #orderids

 

CREATE TABLE #orderids (id int NOT NULL PRIMARY KEY)
INSERT #orderids (id) VALUES(10876), (11000), (11035)

SELECT O.* 
FROM   LNKSRCV.Northwind.dbo.Orders O
WHERE  O.OrderID IN (SELECT t.id FROM #orderids t)

This is the query plan I get:

However, this may not always happen. As we have seen, the query could have a condition that cannot be remoted. With a join, there is a second risk: the optimizer may for one reason or another get the estimates wrong. In either situation, this could lead to the remote table to be dragged over to the local server resulting in a performance disaster.

I will have to admit that I am not inclined to take my chances that the optimizer will get things right. When I encounter a situation like this, I rather pass the data in the local table to the remote server and run a query there as a disaster prevention. My preferred method is to package the data in the local table in an XML string which I pass to the remote server with sp_executesql. In the remote query, I shred the XML data into a local temp table which I then join to the main table on the remote server.

Here is a rewrite of the query below to achieve this:

DECLARE @xmlstr nvarchar(MAX),
        @query  nvarchar(MAX)
SELECT @xmlstr = (SELECT id FROM #orderids FOR XML RAW, ROOT('root'))
SELECT @query = '
   DECLARE @xml xml = cast(@xmlstr AS xml)
   CREATE TABLE #xmldata (id int NOT NULL PRIMARY KEY)
   INSERT #xmldata(id)
      SELECT T.c.value(''@id'', ''int'')
FROM   @xml.nodes(''/root/row'') AS T(c)

 

   SELECT *
   FROM   dbo.Orders O
   WHERE  O.OrderID IN (SELECT x.id FROM #xmldata x)'
EXEC LNKSRV.Northwind.sys.sp_executesql @query, N'@xmlstr nvarchar(MAX)',              
                                        @xmlstr

When forming the XML string, I use FOR XML RAW, which is the easiest to use for this purpose. You may note that the data type of @xmlstr is nvarchar and not xml. This is because the xml data type is not supported in calls to linked servers. Instead, I cast @xmlstr to xml on the remote side, so that I can shred it into #xmldata.

You may ask whether the temp table on remote server is needed; couldn’t we use @xml.nodes directly? Indeed, we can, but then optimizer would not have any clue about the data, and make a blind assumption which could lead to poor performance. For this reason, it is generally a good idea to shred XML data into a temp table, so that the optimizer gets some statistics to work from.

You may also ask if we couldn’t use a table-valued parameter instead. The answer is the same as for the xml data type: not supported with linked servers. What is a viable alternative on SQL 2016 and later is to use JSON instead if you so fancy. Since JSON is handled as nvarchar in SQL Server anyway, this removes the initial cast in @query above

An alternative is to use OPENQUERY and transform the local temp table to an IN list:

DECLARE @getlist      nvarchar(MAX),
        @remotequery  nvarchar(MAX),
        @localquery   nvarchar(MAX)
-- SQL 2016 and earlier.
--SELECT @getlist = 
--   (SELECT concat(id, ', ')
--    FROM   #orderids
--    FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)')
--SELECT @getlist = substring(@getlist, 1, len(@getlist) - 1)  
SELECT @getlist = string_agg(convert(varchar(10), id), ',') FROM #orderids
SELECT @remotequery = 
    N'SELECT *
      FROM   Northwind.dbo.Orders O
      WHERE  O.OrderID IN (' + @getlist + ')'
SELECT @localquery = 
   N'SELECT *
     FROM   OPENQUERY(LNKSRV, ' + dbo.quotestring_n(@remotequery) + ')'
PRINT @localquery

 

EXEC sp_executesql @localquery

The comma-separated list is saved into @getlist, which we construct with help of the string_agg function. (On older versions of SQL Server you can use the funny quirk with FOR XML PATH commented out above for the same purpose). Once we have @getlist, we build the query in two steps as we did above: we first build @remotequery, and then we embed it into @localquery.

I am less fond of this solution. As I noted above, it causes litter in the plan cache on the remote server, since the values in #getlist appear as constants in the query. However, if you are accessing another data source than SQL Server, the solution with sending over XML/JSON may not work with that data source.

Finally, I should reinforce that this is a path you would take only when you are very sure that the local table is a lot smaller than the remote table. If you apply it when the relations are the opposite, you are certainly in for bad performance.

This article first appeared in the SQLServerGeeks Magazine.
Author: Erland Sommarskog .
Subscribe to get your copy.

   

Leave a Reply

Your email address will not be published.