SQL Query: Delete Duplicates from a table

Spread the love

Question

We need to remove the duplicate rows of data from our database.

Data

For this exercise, I have created dummy data. The SQL file can be found here.

Solution

Let’s first explore our data

SELECT *
FROM Employee;

Employee_ID	FirstName	LastName	Phone	Email
1	Adam	Owens	1112223334	adamowens@gmail.com
2	Michelle	Wilis	2223334445	michellewilis@gmail.com
3	Natasha	Lee	3334445556	natashalee@gmail.com
4	Riley	Jones	4445556667	rileyjones@gmail.com
5	Adam	Owens	1112223334	adamowens@gmail.com
6	Natasha	Lee	3334445556	natashalee@gmail.com

We can see that there are two rows (ID = 1 & 5, 3 & 6) with the same data in all the columns except Employee ID. Let’s figure out a way to remove those rows. We can break the query into two parts.

Write a query that identifies the duplicate rows by grouping the duplicating columns.
- We will use ROW_NUMBER() for this.
We can then use the above query as a CTE to join our main table and remove the duplicate data.

Let’s write a query to determine the duplicate data

SELECT 
    *,
    ROW_NUMBER() OVER( PARTITION BY FirstName, LastName, Phone, Email ORDER BY Employee_ID DESC) AS row_num
FROM Employee;

Employee_ID	FirstName	LastName	Phone	Email	row_num
5	Adam	Owens	1112223334	adamowens@gmail.com	1
1	Adam	Owens	1112223334	adamowens@gmail.com	2
2	Michelle	Wilis	2223334445	michellewilis@gmail.com	1
6	Natasha	Lee	3334445556	natashalee@gmail.com	1
3	Natasha	Lee	3334445556	natashalee@gmail.com	2
4	Riley	Jones	4445556667	rileyjones@gmail.com	1

ROW_NUMBER outputs 1 for the first unique value and then increments by 1 for a duplicate value.
In our query, we partition the data by FirstName, LastName, Phone, and Email columns since those should tell us which row is duplicated.
We then order our Employee_ID in descending order to get the latest ID at the top.

We can now simply use the above query as a CTE and remove the rows where row_num is greater than 1.

-- Let's remove the duplicate data
WITH CTE AS (
    SELECT 
        *,
        ROW_NUMBER() OVER( PARTITION BY FirstName, LastName, Phone, Email ORDER BY Employee_ID DESC) AS row_num
    FROM Employee
)
DELETE FROM CTE
WHERE row_num > 1;

-- Let's view the data again
SELECT *
FROM Employee;

Employee_ID	FirstName	LastName	Phone	Email
2	Michelle	Wilis	2223334445	michellewilis@gmail.com
4	Riley	Jones	4445556667	rileyjones@gmail.com
5	Adam	Owens	1112223334	adamowens@gmail.com
6	Natasha	Lee	3334445556	natashalee@gmail.com

Hopefully, this series helps you with the most common questions asked in a SQL Interview. Feel free to drop any questions you have encountered and don’t find an article in the series.

You can find the SQL file here.

Spread the love