Saturday, April 6, 2013

How can I remove duplicate rows?

How can I remove duplicate rows?



How can I remove duplicate rows?




Some Days ago , was searching for the best way to remove duplicate rows from a fairly large table (i.e. 300,000+ rows).

The rows of course will not be perfect duplicates because of the existence of the RowID identity field. and i found two good solutions.


Lets say a Table

MyTable
-----------
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null




Solution 1 :


--Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) 
--RowId as the row to keep. Then, just delete everything that didn't have a row id:


DELETE MyTable 
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3 
   FROM MyTable 
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL

--Use
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn))) 
--instead of MIN(RowId) if you have a GUID instead of an integer




Solution 2 :


delete from dupes
from MyTable dupes, MyTable fullTable
where dupes.dupField = fullTable.dupField 
  and dupes.secondDupField  = fullTable.secondDupField 
  and dupes.uniqueField > fullTable.uniqueField






Hope you have enjoyed to read this post,
let me know if you have more suitable suggestions.

you might be more interested in MSSQL, database, SQL, TIPS AND TRICSMySQL

Good Day [/ Night] ! Happy Google + ing ! 




0 comments:

Post a Comment

Any Questions or Suggestions ?

About

Professional & Experienced Freelance Developer From India, Technologist, Software Engineer, internet marketer and Open Sources Developer with experience in Finance, Telecoms and the Media. Contact Me for freelancing projects.

Enter your email address:

Delivered by FeedBurner