Deleting Duplicate Rows in SQL Server

Dear All,

There are many ways to delete duplicate rows in a table. The choice depends on many factors like; density of duplicates; whether the whole row has to be identical to be considered duplicate or is it based on one or many attributes, etc.

Technique 1: when the density of duplicates is very high and you do not have any unique identifier in the table:

SELECT DISTINCT * INTO dbo.TableTmp FROM dbo.TableSource;
DROP TABLE dbo.TableSource;
EXEC sp_rename 'dbo.TableTmp', 'TableSource';
-- Add constraints, indexes, triggers, etc

Technique 2: When there is a unique identifier available, say KeyColumn and attribute like IDCol can be used to identify the duplicates:

DELETE FROM dbo.TableSource
WHERE EXISTS
  (SELECT *
   FROM dbo.TableSource AS T2
   WHERE T2.IDCol = dbo.TableSource.IDCol
     AND T2.KeyColumn > dbo.TableSource.KeyColumn);

Technique 3: Using CTE & Row_NUMBER function; no dependency on unique identifier or complete row to be duplicate.

   

WITH TableDup AS

(

SELECT *,

ROW_NUMBER() OVER(PARTITION BY IDCol ORDER BY IDCol) AS RN_col

FROM dbo.TableSource

)

DELETE FROM TableDup WHERE RN_col > 1;

 

Regards

Rahul Sharma

Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

Follow me on TwitterFollow me on FaceBook

   

One Comment on “Deleting Duplicate Rows in SQL Server”

Leave a Reply

Your email address will not be published.