Deleting Duplicate Rows in SQL Server

Dear All,

There are many ways to delete duplicate rows in a table. The choice depends on many factors like; density of duplicates; whether the whole row has to be identical to be considered duplicate or is it based on one or many attributes, etc.

Technique 1: when the density of duplicates is very high and you do not have any unique identifier in the table:

SELECT DISTINCT * INTO dbo.TableTmp FROM dbo.TableSource;
DROP TABLE dbo.TableSource;
EXEC sp_rename 'dbo.TableTmp', 'TableSource';
-- Add constraints, indexes, triggers, etc

Technique 2: When there is a unique identifier available, say KeyColumn and attribute like IDCol can be used to identify the duplicates:

DELETE FROM dbo.TableSource
WHERE EXISTS
  (SELECT *
   FROM dbo.TableSource AS T2
   WHERE T2.IDCol = dbo.TableSource.IDCol
     AND T2.KeyColumn > dbo.TableSource.KeyColumn);

Technique 3: Using CTE & Row_NUMBER function; no dependency on unique identifier or complete row to be duplicate.

WITH TableDup AS

(

SELECT *,

ROW_NUMBER() OVER(PARTITION BY IDCol ORDER BY IDCol) AS RN_col

FROM dbo.TableSource

)

DELETE FROM TableDup WHERE RN_col > 1;

 

Regards

Rahul Sharma

Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

Follow me on TwitterFollow me on FaceBook

Data Platform Virtual Summit 2020

Subscribe to SQLServerGeeks YouTube channel. If you want more learning content in your inbox, subscribe to SQLServerGeeks Bulletin.

SQLServerGeeks YouTube | SQLServerGeeks Bulletin | SQLServerGeeks Twitter

One Comment on “Deleting Duplicate Rows in SQL Server”

Leave a Reply

Your email address will not be published. Required fields are marked *