SQL Server with R Services

With SQL Server 2016, you can run R scripts to retrieve data directly from SQL Server Databases. This has come with new feature with SQL Server 2016.

To work with R Services, Microsoft has introduced a system stored procedure namely Sys.sp_execute_external_script.  This stored procedure provides the structure necessary to run R Script against SQL Server data on the SQL Server Environment.  In the SQL Server 2016, it supports only the R Language where Phython is supported  in the SQL Server 2016.

Before you use this stored procedure, you must enable your SQL Server instance to support external scripts.  First, you need to check whether R Services is installed correctly and running.  If you have not installed the R Services or not enabled the external scripts and not sure how to go ahead, please refer to MSDN article Set up SQL Server R Services (In-Database).

Once you have R Services ready, you should be able to try with the sp_execute_external_script stored procedure.  The following syntax show the procedure basic elements:

sp_execute_external_script

@language = N’language’,

@script = N’script’,

[ @input_data_1 = ‘SelectStatement’ [,] ]

[ @input_data_1_name =  N’InputDataSet’ [,] ]

[ @output_data_1_name = N’OutputDataSet’ [,] ]

[ @params =  N’@ParameterName DataType [ OUT | OUTPUT ] [ ,…n ]’ [,] ]

[ @ParameterName = ‘value’ [ OUT | OUTPUT ] [,] [ ,…n ] ]

[ WITH <execute_option> ] [;]

<execute_option>::=

{

{ RESULT SETS UNDEFINED }

| { RESULT SETS NONE }

| { RESULT SETS ( <result_sets_definition> ) }

}

 

Using SQL Server data in R scripts

As fun as it might be to run R scripts alone in SQL Server, the real power that comes with R Services is the ability to incorporate SQL Server data into those scripts. For that, we need to include the procedure’s @input_data_1 parameter, which specifies the SELECT statement to use to retrieve data from our database, as shown in the following example:

DECLARE @rscript NVARCHAR(MAX);

SET @rscript = N’OutputDataSet <- InputDataSet;’;

DECLARE @sqlscript NVARCHAR(MAX);

SET @sqlscript = N’

SELECT FirstName, LastName, SalesYTD

FROM Sales.vSalesPerson

WHERE SalesYTD > 2000000

ORDER BY SalesYTD DESC;’;

EXEC sp_execute_external_script

@language = N’R’,

@script = @rscript,

@input_data_1 = @sqlscript;

GO

 

As with the R script, I’m using a T-SQL variable (@sqlscript) to store the SELECT statement. I then assign that variable to the @input_data_1 parameter.

I’ve also updated the R script itself to assign the InputDataSet value to the OutputDataSet variable. InputDataSet is the default name of the variable used to represent the query specified in the @input_data_1 parameter. In other words, I’m assigning the data returned by the SELECT statement to the OutputDataSet variable, giving us the results shown in the following table:

 

(No column name) (No column name) (No column name)
Linda Mitchell 4251368.5497
Jae Pak 4116871.2277
Michael Blythe 3763178.1787
Jillian Carson 3189418.3662
Ranjit Varkey Chudukatil 3121616.3202
José Saraiva 2604540.7172
Shu Ito 2458535.6169
Tsvi Reiter 2315185.611

 

All we’ve done here is to pass our T-SQL query results to the R script, which returns those same results as that of the SELECT statement. Of course, this is something we could have done without using an R script; however, by being able to pass our query results to the R script, we can then use the analytical power built into R to do something

with that data, which is the whole point of SQL Server R Services and the sp_execute_external_script stored procedure.

For example, the following R script divides the SalesYTD totals by 7 and then rounds that figure to two decimal points:

DECLARE @rscript NVARCHAR(MAX);

SET @rscript = N’

OutputDataSet <- InputDataSet;

OutputDataSet[,3] <- round(InputDataSet$SalesYTD / 7, 2);’;

DECLARE @sqlscript NVARCHAR(MAX);

SET @sqlscript = N’

SELECT FirstName, LastName, SalesYTD

FROM Sales.vSalesPerson

WHERE SalesYTD > 2000000

ORDER BY SalesYTD DESC;’;

EXEC sp_execute_external_script

@language = N’R’,

@script = @rscript,

@input_data_1 = @sqlscript;

GO

The idea here is it find the average monthly sales based on the assumption that seven months have passed so far this year and then round that figure. To do so, I’ve added a second line to the R script. To the left of the assignment operator, I’ve specified OutputDataSet[,3] to indicate that we want to update the third column within data frame.

To the right of the assignment operator, we reference the InputDataSet variable again, but this time add a dollar sign ($) and the name of the source column (SalesYTD). We then divide this amount by 7and use the round function to round the amount to two decimal points, giving us the results shown in the following table:

(No column name) (No column name) (No column name)
Linda Mitchell 607338.36
Jae Pak 588124.46
Michael Blythe 537596.88
Jillian Carson 455631.2
Ranjit Varkey Chudukatil 445945.19
José Saraiva 372077.25
Shu Ito 351219.37
Tsvi Reiter 330740.8

 

 

Data Platform Virtual Summit 2020

Subscribe to SQLServerGeeks YouTube channel. If you want more learning content in your inbox, subscribe to SQLServerGeeks Bulletin.

SQLServerGeeks YouTube | SQLServerGeeks Bulletin | SQLServerGeeks Twitter

Avatar

About srikanth manda

I have got more than 9 years of experience in SQL Server and Microsoft Business Intelligence. Currently am working as Technical Architect in TCS (Tata Consultancy Services), Hyderabad. Very well versed in trouble shooting, Performance tuning, resolving the issues. Presently, exploring on the Bigdata and Hadoop, Spark and Scala.

View all posts by srikanth manda →

Leave a Reply

Your email address will not be published. Required fields are marked *