How to delete duplicate rows in SQL Server?

Fearghal picture Fearghal · Aug 22, 2013 · Viewed 909.1k times · Source

How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries but I think they depend on having a row id as I don't get the desired result. For example:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

Answer

Tim Schmelter picture Tim Schmelter · Aug 22, 2013

I like CTEs and ROW_NUMBER as the two combined allow us to see which rows are deleted (or updated), therefore just change the DELETE FROM CTE... to SELECT * FROM CTE:

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part)

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2

This example determines duplicates by a single column col1 because of the PARTITION BY col1. If you want to include multiple columns simply add them to the PARTITION BY:

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)