SQL Server Optimizing Update Queries for Large Data Volumes

Updating very large tables can be a time taking task and sometimes it might take hours to finish. In addition to this, it might also cause blocking issues.

Here are few tips to SQL Server Optimizing the updates on large data volumes.

  1. Removing index on the column to be updated.
  2. Executing the update in smaller batches.
  3. Disabling Delete triggers.
  4. Replacing Update statement with a Bulk-Insert operation.

With that being said, let’s apply the above points to optimize an update query.

The code below creates a dummy table with 200,000 rows and required indexes.

CREATE TABLE tblverylargetable 
  ( 
     sno  INT IDENTITY, 
     col1 CHAR(800), 
     col2 CHAR(800), 
     col3 CHAR(800) 
  ) 
GO 
DECLARE @i INT=0 
WHILE( @i < 200000 ) 
  BEGIN
      INSERT INTO tblverylargetable 
      VALUES     ('Dummy', 
                  Replicate('Dummy', 160), 
                  Replicate('Dummy', 160)) 
      SET @i=@i + 1 
  END
GO
CREATE INDEX ix_col1 
  ON tblverylargetable(col1) 
GO 
CREATE INDEX ix_col2_col3 
  ON tblverylargetable(col2) 
  INCLUDE(col3)

Consider the following update query which is to be optimized. It’s a very straight forward query to update a single column.

UPDATE tblverylargetable 
SET    col1 = 'D'
WHERE  col1 = 'Dummy'

The query takes 2:19 minutes to execute.

Let’s look at the execution plan of the query shown below. In addition to the clustered index update, the index ix_col1 is also updated. The index update and Sort operation together take 64% of the execution cost.

1_SQL_Server_Optimizing_Update_Queries_for_Large_Data_Volumes

1. Removing index on the column to be updated

The same query takes 14-18 seconds when there isn’t any index on col1. Thus, an update query runs faster if the column to be updated is not an index key column. The index can always be created once the update completes.

2.  Executing the update in smaller batches 

 The query can be further optimized by executing it in smaller batches. This is generally faster. The code below updates the records in batches of 20000.

DECLARE @i INT=1 
WHILE( @i <= 10 ) 
  BEGIN
      UPDATE TOP(20000) tblverylargetable 
      SET    col1 = 'D'
      WHERE  col1 = 'Dummy'
      SET @i=@i + 1 
  END

The above query takes 6-8 seconds to execute. When updating in batches, even if the update fails or it needs to be stopped, only rows from the current batch are rolled back.

3. Disabling Delete triggers

 Triggers with cursors can extremely slow down the performance of a delete query. Disabling After delete triggers will considerably increase the query performance.

4. Replacing Update statement with a Bulk-Insert operation

 An update statement is a fully logged operation and thus it will certainly take considerable amount of time if millions of rows are to be updated.The fastest way to speed up the update query is to replace it with a bulk-insert operation. It is a minimally logged operation in simple and Bulk-logged recovery model. This can be done easily by doing a bulk-insert in a new table and then rename the table to original one. The required indexes and constraint can be created on a new table as required.

The code below shows how the update can be converted to a bulk-insert operation. It takes 4 seconds to execute.

SELECT sno, 
       CASE col1 
         WHEN 'Dummy' THEN 'D'
         ELSE col1 
       END AS col1, 
       col2, 
       col3 
INTO   tblverylargetabletemp 
FROM   tblverylargetable

The bulk-insert can then be further optimized to get additional performance boost.

Hope this helps!!!

 
Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

14 Comments on “SQL Server Optimizing Update Queries for Large Data Volumes”

  1. Good General practices…

    I don’t think you need to delete the index in question, just DISABLE it. This way you do not have to backup the defination.

    Also consider the effect of Transaction Log, with large data modifcations or inserts. Consider the performance impact of Autocommit vs Explicit Commit transaction models. Have a look at http://blogs.msdn.com/b/sqlsakthi/archive/2011/04/17/what-is-writelog-waittype-and-how-to-troubleshoot-and-fix-this-wait-in-sql-server.aspx. Even though his talking about bottle neck on writelog, it still shows interseting point on affects of autocommit on large inserts.

  2. Thanks Mohit – yes..disable will be a better option, missed it. I didn’t find significant performance gain with Explicit commit in case of update although it works great for inserts.

    Thanks,

    Ahmad

  3. I believe you will see a gain in performance if you sort the incoming data by the primary key of the data being updated. I hope this helps.

  4. Thanks for this post.

    For the batch approach, if I need to update a couple of million rows, I then :
    – store only the PRIMARY KEY of the tableto get UPDATEd, T1, in another table, T2
    – cluster-index that column in T2
    – then proceed by taking the TOP X thousand rows ORDER BY the T2 PRIMARY KEY column, UPDATE those in T1, then DELETE those from T2.

    This way I can get a constant speed for the UPDATEs 😉

  5. your welcome Nicolas.. only for couple of million rows.. however update+delete will generate lot of t-logs..

  6. want to update a huge table with more than 2000000 row .

    here is my code :
    DECLARE @i INTEGER
    DECLARE @TotalNum Int
    DECLARE @Step int
    select @Step = round((select count(*) from T1) / 500000,0)
    set @i=1
    WHILE( @i <=@Step)
    BEGIN
    Update T2
    set SellingPrice = A.maxprice
    from T2 inner join
    (SELECT MAX(SellingPrice) AS maxprice, Product, Color, Size, DimensionCode, WarehouseCode
    FROM T1
    GROUP BY Product, Color, Size, DimensionCode, WarehouseCode) as A
    on A.Product = T2.Product
    and A.color = T2.Color
    and A.Size=T2.Size
    and A.DimensionCode = T2.DimensionCode
    and A.WarehouseCode = T2.WarehouseCode
    where T2.SellingPrice =0 or T2.SellingPrice is null
    SET @i=@i + 1
    END

    but this code is still slow
    do you have any idea

  7. Hi Nadia – Please share the execution plan for the given query.. also check how much time does it take to update a single record using the query you mentioned, if it takes time then the query needs to be optimized.

    Thanks,
    Ahmad Osama

  8. Hi Ahmad,

    I need to update two columns in three tables, and aprroximate records will be 20K.

    So, which approach is better? to use a Stored Procedure or Batch Process?

  9. Hi Megha – you can use any of the above method whether in a procedure or as adhoc parameterised sql query. the batch process can be easily wrapped into a procedure.

  10. I need to update target table that is around 200 million rows. Source table has around 2 million rows. Both table are matched on the basis of primary key and then update all columns. Total column count is 75 those are being updated.
    What can be the best approach to optimize this operation?

Leave a Reply

Your email address will not be published. Required fields are marked *