Some Days ago , was searching for the best way to remove duplicate rows from a fairly large table (i.e. 300,000+ rows).
The rows of course will not be perfect duplicates because of the existence of the RowID identity field. and i found two good solutions.
Lets say a Table
MyTable ----------- RowID int not null identity(1,1) primary key, Col1 varchar(20) not null, Col2 varchar(2048) not null, Col3 tinyint not null
Solution 1 :
--Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) --RowId as the row to keep. Then, just delete everything that didn't have a row id: DELETE MyTable FROM MyTable LEFT OUTER JOIN ( SELECT MIN(RowId) as RowId, Col1, Col2, Col3 FROM MyTable GROUP BY Col1, Col2, Col3 ) as KeepRows ON MyTable.RowId = KeepRows.RowId WHERE KeepRows.RowId IS NULL --Use CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn))) --instead of MIN(RowId) if you have a GUID instead of an integer
Solution 2 :
delete from dupes from MyTable dupes, MyTable fullTable where dupes.dupField = fullTable.dupField and dupes.secondDupField = fullTable.secondDupField and dupes.uniqueField > fullTable.uniqueField
Hope you have enjoyed to read this post,
let me know if you have more suitable suggestions.
you might be more interested in MSSQL, database, SQL, TIPS AND TRICS, MySQL
Good Day [/ Night] ! Happy Google + ing !