How to Delete Duplicate Rows in MySQL?
Duplicate rows can cause problems with data accuracy and integrity. They can also make it difficult to query and analyze data. In this article, we will explain several methods to remove duplicate rows from your MySQL tables, ensuring your data stays clean and accurate.
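Note: Some features used in this article, such as common table expressions (the WITH clause) and window functions like ROW_NUMBER(), are only supported in MySQL 8.0 and later. On older versions of MySQL, queries that use them will fail with a syntax error.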
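Before trying the examples, you can check which version your server runs. A version string starting with 8.0 or higher means CTEs and window functions are available.

```sql
-- Returns the server version string, for example '8.0.36'
SELECT VERSION();
```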
Common Ways to Delete Duplicate Rows in MySQL
There are multiple strategies for handling and removing duplicate rows in MySQL:
- Using the DELETE Statement
- Using the DISTINCT Keyword
- Using the GROUP BY Clause
- Using the HAVING Clause
Demo MySQL Database
Let's look at different ways to delete duplicate rows from a table in MySQL with some practical examples. We will first create a sample customers table and insert duplicate values into it.
CREATE TABLE customers (
customer_id INT,
customer_name VARCHAR(255),
email VARCHAR(255)
);
INSERT INTO customers (customer_id, customer_name, email)
VALUES
(1, 'John Doe', 'john.doe@example.com'),
(2, 'Jane Doe', 'jane.doe@example.com'),
(3, 'Muzamil Amin', 'Muzamilaminitoo@gmail.com'),
(1, 'John Doe', 'john.doe@example.com'),
(4, 'Alice Johnson', 'alice.johnson@example.com'),
(2, 'Jane Doe', 'jane.doe@example.com');
1. Remove Duplicate Rows Using the DELETE Statement
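The DELETE statement removes rows from the table itself. In the customers table, the duplicate rows are identical in every column, so nothing distinguishes one copy from another. The following example therefore adds a temporary AUTO_INCREMENT column, row_id, to give each row a unique number. It then keeps one occurrence of each customer_id and deletes the remaining copies.
-- Add a temporary unique identifier so identical rows can be told apart
ALTER TABLE customers ADD COLUMN row_id INT AUTO_INCREMENT PRIMARY KEY;
WITH CTE AS (
SELECT row_id,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY row_id) AS row_num
FROM customers
)
DELETE FROM customers
WHERE row_id IN (
SELECT row_id
FROM CTE
WHERE row_num > 1
);
-- Remove the helper column once the duplicates are gone
ALTER TABLE customers DROP COLUMN row_id;
Explanation:
- Helper column: The first ALTER TABLE adds row_id, an AUTO_INCREMENT primary key that MySQL fills in for every existing row, so each copy of a duplicate gets its own number.
- CTE (Common Table Expression): The ROW_NUMBER() function numbers the rows within each customer_id group in row_id order. The first occurrence gets row_num = 1. CTEs and ROW_NUMBER() both require MySQL 8.0 or later.
- DELETE Statement: Deletes the rows whose row_num is greater than 1, which are the duplicates, and keeps the first occurrence of each customer_id. The match is made on row_id rather than customer_id, because filtering by customer_id would delete every copy of a duplicated customer, including the one you want to keep.
- Cleanup: The final ALTER TABLE drops row_id and restores the table's original structure.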
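On MySQL versions earlier than 8.0, which lack ROW_NUMBER(), a common alternative is a multi-table DELETE that joins the table to itself. Because the demo table has no column that uniquely identifies a row, this sketch adds a temporary row_id helper column (an assumption of this example, not part of the original table). It deletes every row for which another row with the same customer_id has a smaller row_id, so the row with the lowest row_id in each group survives.

```sql
-- Hypothetical helper column so identical rows can be told apart
ALTER TABLE customers ADD COLUMN row_id INT AUTO_INCREMENT PRIMARY KEY;

-- c1 is deleted whenever a row c2 with the same customer_id has a smaller row_id
DELETE c1
FROM customers AS c1
JOIN customers AS c2
  ON c1.customer_id = c2.customer_id
WHERE c1.row_id > c2.row_id;

-- Restore the original table structure
ALTER TABLE customers DROP COLUMN row_id;
```

A self-join is used instead of a subquery because MySQL allows a multi-table DELETE to read from the same table it modifies through a join.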
2. Remove Duplicate Rows Using the DISTINCT Keyword
The DISTINCT keyword removes duplicate rows from a query result. It does not change the table itself: the duplicates remain stored in customers. The following query returns each customer_id only once:
SELECT DISTINCT customer_id
FROM customers;
Output: The query returns each unique customer ID in the customers table (1, 2, 3, and 4) exactly once, as shown in Table 2.
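Because DISTINCT only filters query output, using it to actually remove duplicates means rebuilding the table from a SELECT DISTINCT result. The sketch below copies one instance of each distinct row into a new table, customers_dedup (a hypothetical name chosen for this example), and then swaps it in place of the original. Note that DISTINCT compares entire rows, so two rows with the same customer_id but different email values would both be kept.

```sql
-- Create an empty table with the same structure as customers
CREATE TABLE customers_dedup LIKE customers;

-- Copy exactly one instance of each distinct row
INSERT INTO customers_dedup (customer_id, customer_name, email)
SELECT DISTINCT customer_id, customer_name, email
FROM customers;

-- Swap the tables in a single RENAME TABLE statement, then discard the old copy
RENAME TABLE customers TO customers_old,
             customers_dedup TO customers;
DROP TABLE customers_old;
```

Check the row count of the new table before running DROP TABLE, since that step cannot be undone.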
3. Remove Duplicate Rows Using the GROUP BY Clause
The GROUP BY clause groups rows that share the same values in one or more columns. The following query groups the rows of the customers table by customer ID and counts how many rows fall into each group:
SELECT customer_id, COUNT(*) AS row_count
FROM customers
GROUP BY customer_id;
This query returns each unique customer ID once, along with the number of rows stored for it. With the demo data, customer IDs 1 and 2 each have a count of 2. Like DISTINCT, GROUP BY only shapes the query result and does not delete any rows.
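GROUP BY can also drive an actual deletion: group the rows by customer_id, keep the row with the smallest identifier in each group, and delete the rest. Because the demo table has no column that uniquely identifies a row, this sketch adds a temporary row_id helper column (an assumption of this example). The inner query is wrapped in a derived table (keepers) because MySQL rejects a DELETE whose subquery selects directly from the table being modified (error 1093, "You can't specify target table ... for update in FROM clause"); an aggregated derived table is materialized first, which avoids that restriction.

```sql
-- Hypothetical helper column so identical rows can be told apart
ALTER TABLE customers ADD COLUMN row_id INT AUTO_INCREMENT PRIMARY KEY;

-- Keep the lowest row_id for each customer_id and delete every other row
DELETE FROM customers
WHERE row_id NOT IN (
  SELECT keep_id
  FROM (
    SELECT MIN(row_id) AS keep_id
    FROM customers
    GROUP BY customer_id
  ) AS keepers
);

-- Restore the original table structure
ALTER TABLE customers DROP COLUMN row_id;
```

NOT IN is safe here because row_id is a NOT NULL primary key, so the subquery can never return NULL (a NULL in a NOT IN list would make the condition match no rows). This version does not use window functions, so it also runs on MySQL versions before 8.0.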
4. Remove Duplicate Rows Using the HAVING Clause
The HAVING clause filters the groups produced by GROUP BY. The following query keeps only the groups that contain more than one row, which makes it a quick way to find duplicates before deleting them:
SELECT customer_id
FROM customers
GROUP BY customer_id
HAVING COUNT(*) > 1;
This query returns every customer ID that appears in more than one row. With the demo data, those are customer IDs 1 and 2.
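Grouping by customer_id alone treats rows as duplicates whenever the IDs match, even if the names or emails differ. To find rows that are identical in every column, and to see how many copies of each exist, you can group by all of the columns instead:

```sql
-- List each fully duplicated row once, together with its number of copies
SELECT customer_id, customer_name, email, COUNT(*) AS copies
FROM customers
GROUP BY customer_id, customer_name, email
HAVING COUNT(*) > 1
ORDER BY customer_id;
```

On the demo data as originally inserted, this returns two rows: John Doe (customer_id 1) and Jane Doe (customer_id 2), each with copies = 2.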
Conclusion
Duplicate rows can be a problem in MySQL databases, and as discussed above, there are several ways to deal with them. Only the DELETE statement actually removes rows from a table; DISTINCT, GROUP BY, and HAVING change only what a query returns, which makes them useful for finding and previewing duplicates before you delete them. The best method depends on the situation, such as the number of duplicate rows, the size of the table, the performance of the MySQL server, and the desired result. In general, a DELETE statement that targets a unique row identifier is the most direct way to remove duplicate rows from a table.