October 15, 2024

SQL Query: Delete Duplicates from a table

Spread the love

Question

We need to remove the duplicate rows of data from our database.

Data

For this exercise, I have created dummy data. The SQL file can be found here.

Solution

Let’s first explore our data

SELECT *
FROM Employee;
Employee_IDFirstNameLastNamePhoneEmail
1AdamOwens1112223334adamowens@gmail.com
2MichelleWilis2223334445michellewilis@gmail.com
3NatashaLee3334445556natashalee@gmail.com
4RileyJones4445556667rileyjones@gmail.com
5AdamOwens1112223334adamowens@gmail.com
6NatashaLee3334445556natashalee@gmail.com

We can see that there are two rows (ID = 1 & 5, 3 & 6) with the same data in all the columns except Employee ID. Let’s figure out a way to remove those rows. We can break the query into two parts.

  • Write a query that identifies the duplicate rows by grouping the duplicating columns.
    • We will use ROW_NUMBER() for this.
  • We can then use the above query as a CTE to join our main table and remove the duplicate data.

Let’s write a query to determine the duplicate data

SELECT 
    *,
    ROW_NUMBER() OVER( PARTITION BY FirstName, LastName, Phone, Email ORDER BY Employee_ID DESC) AS row_num
FROM Employee;
Employee_IDFirstNameLastNamePhoneEmailrow_num
5AdamOwens1112223334adamowens@gmail.com1
1AdamOwens1112223334adamowens@gmail.com2
2MichelleWilis2223334445michellewilis@gmail.com1
6NatashaLee3334445556natashalee@gmail.com1
3NatashaLee3334445556natashalee@gmail.com2
4RileyJones4445556667rileyjones@gmail.com1
  • ROW_NUMBER outputs 1 for the first unique value and then increments by 1 for a duplicate value.
  • In our query, we partition the data by FirstName, LastName, Phone, and Email columns since those should tell us which row is duplicated.
  • We then order our Employee_ID in descending order to get the latest ID at the top.

We can now simply use the above query as a CTE and remove the rows where row_num is greater than 1.

-- Let's remove the duplicate data
WITH CTE AS (
    SELECT 
        *,
        ROW_NUMBER() OVER( PARTITION BY FirstName, LastName, Phone, Email ORDER BY Employee_ID DESC) AS row_num
    FROM Employee
)
DELETE FROM CTE
WHERE row_num > 1;

-- Let's view the data again
SELECT *
FROM Employee;
Employee_IDFirstNameLastNamePhoneEmail
2MichelleWilis2223334445michellewilis@gmail.com
4RileyJones4445556667rileyjones@gmail.com
5AdamOwens1112223334adamowens@gmail.com
6NatashaLee3334445556natashalee@gmail.com

Hopefully, this series helps you with the most common questions asked in a SQL Interview. Feel free to drop any questions you have encountered and don’t find an article in the series.


You can find the SQL file here.


Spread the love

Leave a Reply

Your email address will not be published. Required fields are marked *