Delete Duplicate Rows in MS SQL Server

Last Updated : 06 Sep, 2024

In MS SQL Server, managing duplicate rows is a common task that can affect the integrity and performance of a database. To address this issue, SQL Server provides several methods for identifying and deleting duplicate rows.

In this article, we will explore three effective approaches: the GROUP BY and HAVING clauses, Common Table Expressions (CTE), and the RANK() function.

How to Delete Duplicate Rows in SQL Server?

This article covers the following methods for finding and deleting duplicate rows in MS SQL Server:

Using GROUP BY and HAVING Clause
Using Common Table Expressions (CTE)
Using RANK() Function

The most common pattern numbers the rows in each group of duplicates with ROW_NUMBER() inside a CTE, then deletes every row numbered above 1. The general syntax is:

WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [column1], [column2], ... 
                      ORDER BY [column1], [column2], ...) AS RowNumber
    FROM [table_name]
)
DELETE FROM CTE
WHERE RowNumber > 1;

Different Ways To Find And Delete Duplicate Rows From A Table In SQL Server

Let us work through an example of deleting duplicate rows in MS SQL Server. First, create a table named Geek and insert some duplicate rows into it:

CREATE TABLE Geek(
    Name NVARCHAR(100) NOT NULL,
    Email NVARCHAR(255) NOT NULL,
    City NVARCHAR(100) NOT NULL
);

INSERT INTO Geek (Name, Email, City) VALUES
    ('Nisha', 'nisha@gfg.com', 'Delhi'),
    ('Megha', 'megha@gfg.com', 'Noida'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Hina', 'hina@gfg.com', 'Kanpur'),
    ('Hina', 'hina@gfg.com', 'Kanpur'),
    ('Misha', 'misha@gfg.com', 'Gurugram'),
    ('Misha', 'misha@gfg.com', 'Gurugram'),
    ('Neha', 'neha@gfg.com', 'Pilani');

1. Using `GROUP BY` and `HAVING` Clause

Problem Statement:

Find the duplicate rows in the Geek table, that is, every combination of Name, Email, and City that appears more than once.

SQL Query:

-- List each Name/Email/City combination that occurs more than once
SELECT Name, Email, City
FROM Geek
GROUP BY Name, Email, City
HAVING COUNT(*) > 1;

Output:

Name	Email	City
Khushi	khushi@gfg.com	Jaipur
Hina	hina@gfg.com	Kanpur
Misha	misha@gfg.com	Gurugram

Explanation:

The GROUP BY clause groups rows by Name, Email, and City.
The HAVING COUNT(*) > 1 clause keeps only the groups that contain more than one row, which are the duplicated combinations.
This query identifies duplicates but does not delete them.
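The GROUP BY query only lists duplicates. One way to delete them with GROUP BY alone is to copy one row per group into a temporary table, empty the original table, and copy the rows back. The sketch below is self-contained. It runs against a temporary table #Geek that holds a few illustrative rows from the sample data (the names #Geek and #GeekDistinct are assumptions for this example). To apply it to the real table, replace #Geek with Geek.

```sql
-- Illustrative setup: a temporary table with a few of the sample rows
CREATE TABLE #Geek (
    Name  NVARCHAR(100) NOT NULL,
    Email NVARCHAR(255) NOT NULL,
    City  NVARCHAR(100) NOT NULL
);

INSERT INTO #Geek (Name, Email, City) VALUES
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Hina',   'hina@gfg.com',   'Kanpur'),
    ('Hina',   'hina@gfg.com',   'Kanpur'),
    ('Neha',   'neha@gfg.com',   'Pilani');

BEGIN TRANSACTION;

-- GROUP BY collapses each Name/Email/City combination to a single row
SELECT Name, Email, City
INTO #GeekDistinct
FROM #Geek
GROUP BY Name, Email, City;

-- Empty the original table, then copy the de-duplicated rows back
DELETE FROM #Geek;

INSERT INTO #Geek (Name, Email, City)
SELECT Name, Email, City
FROM #GeekDistinct;

DROP TABLE #GeekDistinct;

COMMIT TRANSACTION;

-- 3 rows remain: one each for Khushi, Hina and Neha
SELECT Name, Email, City FROM #Geek;
```

This approach rewrites every row, so it is best suited to small tables that have no foreign keys or triggers depending on the existing rows. The ROW_NUMBER() approach deletes only the surplus rows.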

2. Using Common Table Expressions (CTE)

Problem Statement:

Remove duplicate rows from the Geek table, keeping exactly one copy of each unique combination of Name, Email, and City.

SQL Query:

WITH CTE AS (
    SELECT 
        Name,
        Email,
        City,
        ROW_NUMBER() OVER (PARTITION BY Name, Email, City ORDER BY (SELECT NULL)) AS rn
    FROM Geek
)
-- Delete duplicates where the row number is greater than 1
DELETE FROM CTE
WHERE rn > 1;

Output:

This query returns no result set. Against the original sample data it deletes 4 rows (two extra Khushi rows, one extra Hina row, and one extra Misha row), leaving 6.

Explanation:

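The ROW_NUMBER() function numbers the rows within each partition 1, 2, 3, and so on.
PARTITION BY Name, Email, City defines which rows count as duplicates of each other.
ORDER BY (SELECT NULL) satisfies the ORDER BY that ROW_NUMBER() requires without imposing any order, so which copy receives number 1 is arbitrary. This is harmless here because the copies are identical.
Deleting from the CTE deletes the underlying rows in Geek, so the DELETE with rn > 1 removes every copy except one per group.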
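Before running the DELETE, it is often worth running the same CTE with SELECT instead, to preview exactly which rows would be removed. The sketch below is a self-contained illustration. It assumes a temporary table #Geek holding a few of the sample rows and does not modify any data.

```sql
-- Illustrative setup: a temporary table with a few of the sample rows
CREATE TABLE #Geek (
    Name  NVARCHAR(100) NOT NULL,
    Email NVARCHAR(255) NOT NULL,
    City  NVARCHAR(100) NOT NULL
);

INSERT INTO #Geek (Name, Email, City) VALUES
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Misha',  'misha@gfg.com',  'Gurugram'),
    ('Misha',  'misha@gfg.com',  'Gurugram'),
    ('Neha',   'neha@gfg.com',   'Pilani');

-- Same CTE as the DELETE version, but SELECT shows the rows it would remove
WITH CTE AS (
    SELECT
        Name,
        Email,
        City,
        ROW_NUMBER() OVER (PARTITION BY Name, Email, City ORDER BY (SELECT NULL)) AS rn
    FROM #Geek
)
SELECT Name, Email, City, rn
FROM CTE
WHERE rn > 1;   -- returns 3 rows: two of the Khushi copies and one Misha copy
```

Once the preview looks right, swapping the final SELECT for DELETE FROM CTE performs the deletion.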

3. Using `RANK()` Function

Problem Statement:

Remove duplicate rows from the Geek table using RANK(), keeping only the row ranked 1 for each unique combination of Name, Email, and City.

SQL Query:

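-- RANK() can only tell rows apart through its ORDER BY, and the duplicate
-- rows are identical, so add a temporary identity column as a tie-breaker
ALTER TABLE Geek ADD TempID INT IDENTITY(1, 1);
GO

WITH RankedCTE AS (
    SELECT 
        Name,
        Email,
        City,
        RANK() OVER (PARTITION BY Name, Email, City ORDER BY TempID) AS rnk
    FROM Geek
)
-- Delete duplicates where the rank is greater than 1
DELETE FROM RankedCTE
WHERE rnk > 1;
GO

-- Remove the helper column again
ALTER TABLE Geek DROP COLUMN TempID;

Output:

This script returns no result set. Against the original sample data, the DELETE removes 4 rows, and the final statement drops the TempID column.

Explanation:

ALTER TABLE Geek ADD TempID INT IDENTITY(1, 1) gives every existing row a distinct number. GO ends the batch, so the following statements are compiled only after TempID exists.
The RANK() function ranks the rows within each partition of Name, Email, and City by TempID. Rows that tie on the ORDER BY expression share the same rank, so ordering by (SELECT NULL) would give every row rank 1 and nothing would be deleted.
The DELETE through RankedCTE removes rows where rnk > 1, keeping the row with the lowest TempID in each group. The final ALTER TABLE drops the helper column.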
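To see why RANK() needs a column that differs between duplicates, the following self-contained sketch compares ROW_NUMBER() and RANK() over three identical rows, with both ordered by (SELECT NULL). The temporary table #Dupes is an assumption made for this illustration.

```sql
-- Illustrative data: three identical rows, like the Khushi group in Geek
CREATE TABLE #Dupes (
    Name  NVARCHAR(100) NOT NULL,
    Email NVARCHAR(255) NOT NULL,
    City  NVARCHAR(100) NOT NULL
);

INSERT INTO #Dupes (Name, Email, City) VALUES
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur'),
    ('Khushi', 'khushi@gfg.com', 'Jaipur');

-- Same partition and the same non-distinguishing ORDER BY for both functions
SELECT
    Name,
    ROW_NUMBER() OVER (PARTITION BY Name, Email, City ORDER BY (SELECT NULL)) AS rn,
    RANK()       OVER (PARTITION BY Name, Email, City ORDER BY (SELECT NULL)) AS rnk
FROM #Dupes;
-- rn takes the values 1, 2 and 3; rnk is 1 for all three rows,
-- so a filter of rnk > 1 would match nothing
```

This is why ROW_NUMBER() is the usual choice when duplicates are exact copies. RANK() becomes useful when a real column, such as a hypothetical CreatedAt timestamp, decides which copy to keep. Even then, rows that tie on that column would still share rank 1 and would all be kept.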

Conclusion

Deleting duplicate rows in SQL Server is important for keeping datasets clean and accurate. GROUP BY with HAVING quickly identifies which combinations are duplicated. A CTE with ROW_NUMBER() deletes all but one copy of each. RANK() works the same way, provided its ORDER BY includes a column that distinguishes otherwise identical rows. Together, these tools cover identifying duplicates, removing them while keeping one copy, and choosing which copy to keep based on a ranking.

khushboogoyal499
