Skip to main content

Use a Common Table Expression (CTE) in SQL Server to group duplicate rows using OVER PARTITION BY and then version each row of group using ROW_NUMBER(). Later on, delete all the duplicates in the groups except the first row i.e., with ROW_NUMBER() = 1. This way all duplicates will be deleted. This approach is much cleaner and readable and on top of it, there is no need to touch schema of the table at all.

-- ****************************************************************************
-- Article 1: SQL Server - Deleting Duplicate Records
-- http://sqlauthentic.blogspot.com/2010/09/sql-server-deleting-duplicate-records.html
-- ****************************************************************************

-- An easy method for deleting the duplicate records from a table is by using the
-- Common table expressions.
--
-- Let us create a table - State with columns StateID and StateName of data types
-- Int and Varchar(20) respectively.

Create table State (StateID Int,StateName varchar(10))
Go
Insert into State
Select 1, 'XXXX'
Union All
Select 2, 'YYYY'
Union All
Select 1, 'XXXX'
Union All
Select 3, 'ZZZZ'
Union All
Select 2, 'YYYY'
Union All
Select 1, 'XXXX'
Union All
Select 3, 'ZZZZ'
Union All
Select 1, 'XXXX'
Union All
Select 2, 'YYYY'
Union All
Select 2, 'YYYY'
Union All
Select 1, 'XXXX'
Union All
Select 3, 'ZZZZ'

-- Now if we check the data in the State table, it will return 12 records with
-- duplicate values as shown below:

select * from State;

-- Now our intention is to delete the duplicate records so that only 3 distinct
-- records must exist. For this we can use the Common table expression as below:

with DistinctStates (
    StateID, StateName, RecCount
) as (
    select StateId, StateName,
        row_number() over (
            partition by StateId, StateName
            order by StateId
        ) as RecCount
    from State
) delete from DistinctStates
    where RecCount > 1;

-- Once we execute the above script all the duplicate Records will be deleted from
-- the table resulting 3 distinct records:

select * from State;

-- ****************************************************************************
-- Article 2: Delete Duplicate Records using CTE in TSQL
-- http://www.intstrings.com/ramivemula/articles/delete-duplicate-records-using-cte-in-tsql/
-- ****************************************************************************

-- ============================================================================
-- Delete Duplicate Records using CTE in TSQL
--
-- In this tutorial, a much more cleaner approach without touching the schema will
-- be demonstrated. We use Common Table Expression (CTE) and we group duplicate
-- rows using OVER PARTITION BY and then version each row of group using
-- ROW_NUMBER(). Later we delete all the duplicates in the groups except the first
-- row i.e., with ROW_NUMBER() = 1. This way we can delete all duplicates. This
-- approach is much cleaner and readable and on top of it, there is no need to
-- touch schema of the table at all.
--
-- Code is self explanatory and comments are provided for ease of understanding.
--
-- Source:
-- http://www.intstrings.com/ramivemula/articles/delete-duplicate-records-using-cte-in-tsql/
-- ============================================================================

/*-------------------------------------------------------------------------
*************************** Create TestDB Database ************************
-------------------------------------------------------------------------*/

if not exists (select name from master.dbo.sysdatabases where name = 'TestDB')
    begin
        create database TestDB;
    end;
go

use [TestDB];
go

/*-------------------------------------------------------------------------
*************************** Create Employee Table *************************
-------------------------------------------------------------------------*/

if exists (select * from sys.sysobjects where type = 'U' and name = 'Employee')
    begin
        drop table dbo.Employee;
    end;
go

create table dbo.Employee(
    EmpId int not null,
    EmpName nvarchar(32) not null
);
go

/*-------------------------------------------------------------------------
********************** Enter duplicate Employee Details *******************
-------------------------------------------------------------------------*/

insert into [dbo].[Employee] ([EmpId],[EmpName]) values (1,'Rami');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (2,'Venu');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (3,'Todd');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (4,'Young');

insert into [dbo].[Employee] ([EmpId],[EmpName]) values (1,'Rami');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (2,'Venu');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (3,'Todd');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (4,'Young');

insert into [dbo].[Employee] ([EmpId],[EmpName]) values (1,'Rami');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (2,'Venu');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (3,'Todd');
insert into [dbo].[Employee] ([EmpId],[EmpName]) values (4,'Young');
go

/*-----------------------------------------------------------------------------
 ************************ Delete Duplicates using CTE **********************

We use OVER PARTITION BY to group duplicate records, where we suplly EmpId and
EmpName and grouping the records based on their values. ROW_NUMBER() is used as
to version each row of the group. Then using a DELETE query on CTE, the
underlying Employee Table duplicates are removed.

****************************************************************************
-----------------------------------------------------------------------------*/

with DeleteRecords as (
    select EmpId,
        EmpName,
        ROW_NUMBER() over(partition by EmpId, EmpName order by EmpId) as DeleteNumber
    from dbo.Employee
)
delete from DeleteRecords where DeleteNumber > 1;

/*-------------------------------------------------------------------------
************************* Select from Employee Table **********************
-------------------------------------------------------------------------*/

select * from dbo.Employee order by EmpId;

/*-------------------------------------------------------------------------
**************************** Drop TestDB Database *************************
-------------------------------------------------------------------------*/

use master;
go

if exists (select name from master.dbo.sysdatabases where name = 'TestDB')
    begin
        drop database TestDB;
    end;
go