Transact SQL :: Removing Duplicates Rows With OVER Clause
Jun 2, 2015
I have an existing stored table with duplicate rows that I want to delete.Using a cte gives me
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY employeeid, dateofincident, typeid, description
ORDER BY Id ASC), *
FROM dbo.TableName
)
DELETE FROM cte
WHERE rn > 1
This is what I want to do basically. But this is only deleting in my CTE, is there anyway I can update my existing table "TableName" with this, without using temp tables?
I'm trying to pull records from a source/staging table where there is a duplicate row in it.I don't need that as the requirement is to garbage in /garbage out.when I do that from mart and use joins btw fact and dimensions, Im not getting this duplicate record as Im using distinct/group by. If I removed it, then it returns more than 3000 rows which is not correct. Is there a way I can keep these duplicates without removing group by...Im using correct joins and filters.
I have a table with 22 million Business records. I can see that there are duplicates when I group by BusinessName and Address and Phone. I'd like to place only the duplicates into a table, with a ranking, oldest business key gets a ranking of 1.
As a bonus I'd like each group to have a distinct group name (although not necessary, just want to know how to do this)
Later after I run more verifications to make sure these are not referenced elsewhere I'll delete everything with a matchRank > 1 out of the main Business table.
DROP TABLE [dbo].[TestBusiness]; GO CREATE TABLE [dbo].[TestBusiness]( [Business_pk] INT IDENTITY(1,1) NOT NULL, [BusinessName] VARCHAR (200) NOT NULL, [Address] VARCHAR(MAX) NOT NULL,
I have a members table and have added an extra few thoushand members to it. Now I need to remove the duplicates.
It doesnt matter which duplicate i remove as long as there are unique email addresses.
so here is the format of the table:
id email firstname lastname datebirth
if i do a:
SELECT COUNT(DISTINCT Email) AS Expr1 FROM Customer
it returns 21345
and
SELECT Count(Email) FROM Customer
returns 28987
I can get the unique email addresses into another table by going:
SELECT DISTINCT emailaddress INTO DistinctCustomer FROM Customer
but this will only return unique email addresses. How do i select distinct email address and all other fields into a new table? or just remove duplicates where email address appears more then once?
HiI have inherited a web app with the following table structure, and need toproduce a table without any duplicates. Email seems like the best uniqueidentifier - so only one of each e-mail address should be in the table.Following http://www.sqlteam.com/item.asp?ItemID=3331 I have been able toget a duplicate count working:select Email, count(*) as UserCountfrom dbo.Membersgroup by Emailhaving count(*) > 1order by UserCount descBut the methods for create a new table without duplicates fail. My code forthe 2nd method is:sp_rename 'Members', 'temp_Members'select distinct *into Membersfrom temp_MembersTable....CREATE TABLE [dbo].[Members] ([MemberID] [int] IDENTITY (1, 1) NOT NULL ,[Username] [varchar] (10) COLLATE Latin1_General_CI_AS NOT NULL ,[Password] [varchar] (10) COLLATE Latin1_General_CI_AS NOT NULL ,[Email] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,[Title] [varchar] (10) COLLATE Latin1_General_CI_AS NOT NULL ,[FirstName] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,[Surname] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,[Address1] [varchar] (35) COLLATE Latin1_General_CI_AS NOT NULL ,[Address2] [varchar] (35) COLLATE Latin1_General_CI_AS NOT NULL ,[City] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,[Country] [varchar] (25) COLLATE Latin1_General_CI_AS NOT NULL ,[Profession] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,[Publication] [varchar] (40) COLLATE Latin1_General_CI_AS NOT NULL ,[DateAdded] [smalldatetime] NOT NULL ,[SendMail] [smallint] NOT NULL) ON [PRIMARY]GOThanks B.
I was wondering if someone could help me with the results on this query, at the moment I am getting values repeated and I was wondering if it was possible to have some of the columns grouped, I have tried to have grouping at the end of the query but this still did not group the rows.
Thanks in advance for your answer - Sean
The structure that i'm trying to acheive is like the following: with each colour having multiple quantitys for each size:
colourdesc| sizedesc | xs | s | m | l ----------- black |qoh| | 0 | 2 | 0 | 7 ----------- white |qoh| | 0 | 0 | 0 | 0 -----------
The database has Name,Email, and skill. Though the name is distinct it is repeated as it has different skills. I would like to remove duplicate names and add the corresponding skill to the only one row.
From the stored procedure, combining 3 tables I got the output as:
NameemaildepartmentSkill ArunemailidTech teamTechnical ArunemailidTech teamLeadership ArunemailidTech teamDecision Making BinayemailidMarketingTechnical BinayemailidMarketingDecision Making
I would like to remove the duplicate Name fields and combine the Skill in a single row as other fields are same.
So the output should be
NameemaildepartmentSkill ArunemailidTech teamTechnical, Leadership, Decision Making BinayemailidMarketingTechnical,Decision Making
Hi, I'm in the midst of an Access 2003 to SQL server 2000 upsizing project and have come across a table on Sql Server that has a field that looks like it's supposed to be the PK but it contains duplicates. What I'd like to do is to have a cursor start at the first value and increment the next value by 1. Could someone explain how I'd go about this?
I have a very large table that can contain up 3 to 5 duplicate records. Every month around 100,000 new records come in. Sometimes it's an ammended record, other times is just duplicated by error.
Is it possible to keep the latest record dumped into the table and delete the others? Does SQL track the order of the data being dropped into the table?
The layout would look like this. There are 10-15 other columns in the table where adjustments can also be made.
Lease# Year Month Production 12345 2008 10 1,231 12345 2008 10 1,250 12345 2008 10 1,250
-- declared variables declare @database_name varchar(100), @table_name varchar(100), @primary_key_field varchar(100) declare @list varchar(8000) -- set values to variables set @list = '' set @database_name = 'data200802_dan' set @table_name = 'other02' set @primary_key_field = 'callid'
use database
select @list = @list + column_name + ', ' from information_schema.columns where table_name = @table_name --table name and column_name != @primary_key_field --unique identifier select @list = substring(@list, 1, len(rtrim(@list)) - 1)
--above 5 lines btw came from a helper in the msdn forum. thanks
SELECT DISTINCT @list INTO '#' + @table_name FROM @table_name @table_name + ':' IF (SELECT COUNT(*) FROM @database_name + '.dbo.' + @table_name) = 0 BEGIN INSERT INTO @database_name + '.dbo.' + @table_name + '(' + @list + ')' SELECT @list FROM '#' + @table_name END ELSE BEGIN DELETE @database_name + '.dbo.' + @table_name +' ( ' + @list + ')' GOTO @table_name END DROP TABLE '#' + @table_name
the query above is basically.. selecting all the fields from a table in database W/OUT their primary key. then putting them in a temp table.. delete all the records in the original table. then paste the records from the temp table into the original table.
is there a way for this to work? i don't know how to use the variables w/ this script. please help me correcting this query..
I have a query which finds duplicate spec_items linked to a work order. What I want to do it remove the duplicates (and in some cases there will be more than one) leaving only the record with the highest [sr.id]
select sr.id, sr.linked_to_worknumber, sr.spec_checklist_id from spec_checklist_remind sr inner join spec_checklist_remind sc on sc.linked_to_worknumber = sr.linked_to_worknumber group by sr.id,sr.linked_to_worknumber, sr.spec_checklist_id Having sr.spec_checklist_id = 30 and count(*)>1 order by sr.linked_to_worknumber
;WITH ctePreAgg AS ( select top 500 act_reference "ActivityRef", row_number() over (partition by act_reference order by act_reference) as rowno, t3.s_initials "Initials" from mytablestuff order by act_reference
[code]...
But what I would love to do next is take each of the above rows - and return the initials either in one column with all the nulls and duplicate values removed, separated by a comma ..
OR the above but using variable number of columns based on the maximum number of different initials for each row.this is not strictly required, but maybe neater for further work on the view
I am making a website where users go to a page that lists every Program in their area. The first time the page loads they see all the Programs, then then can filter it down with drop down lists. I have everything working except for the Category because some programs have more than one category. The select is working good but I get duplicates.
WHERE p.ProgramCountyID IS NOT NULL AND p.ProgramCity IS NOT NULL AND p.ProgramHours IS NOT NULL AND p.ProgramGrades IS NOT NULL AND p.ProgramTransportation IS NOT NULL AND p.ProgramID = pc.ProgramID AND pc.CategoryID IS NOT NULL
GROUP BY p.ProgramID
ORDER BY p.ProgramName ASC
When I have just p.ProgramID in the GROUP BY clause, I get the error:
"column name" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
But when I put all the column names in the GROUP BY clause, I still get duplicates. What can I do to stop this. When the user selects a category the pc.CategoryID IS NOT NULL changes to pc.CategoryID = 3 (or whatever they select) and everything works the way its supposed to. I just want each individual program to show only once when the page first loads.
I have a quite big SQL query which would be nice to be used using UNION betweern two Select and Where clauses. I noticed that if both Select clauses have Where part between UNION other is ignored. How can I prevent this?
I found a article in StackOverflow saying that if UNION has e.g. two Selects with Where conditions other one will not work. [URL] ....
I have installed SQL Server 2014 and I tried to use tricks mentioned in StackOverflow's article but couldn't succeeded.
Any example how to write two Selects with own Where clauses and those Selects are joined with UNION?
Serial Count 001 2 the count is 2 because Serial 001 has an MSDSID of 20 and 22 002 1 the count is 1 because Serial 002 only has MSDSID 21 003 2 the count is 2 because Serial 003 has an MSDSID of 21 and 22 004 1 the count is 1 because Serial 002 only has MSDSID 23
It would be even better if the results just showed where the count is greater than 1.
I am doing some audit and i have below query, how can i get rid of duplicates from the below query any T SQL to get rid of duplicates...
I am using SP_Who2 and sql server Audit for auditing all data happening on sql server databases and dumping them to tables Audit_DBAudit abd Audit_sp_who2 and from then i am trying to get data which is not repeating/duplicate ...
SELECT A.ProgramName ,a.HostName,[Server_principal_name],[Server_instance_name],[Database_name],[Object_name],F.Statement FROM Audit_DBAudit as F Join [Audit_sp_who2] AS a on LTRIM(RTRIM(F.server_principal_name))=LTRIM(RTRIM(A.Login))
How do I only select rows with duplicate dates for each person (id)? (The actual table has approximately 13000 rows with approximately 3000 unique ids)
I have table with columns as ID, DupeID1, DupeID2. ID column is unique. DupeID1 and DupeID2 -- the combination should only be there once. I don't want reverse combination of duplicates, i.e. DupeID2, DupeID1 in the table. How can I delete the reverse duplicates from this table?
I need to make a selection on join datasets with 2 conditions and populate the results in another dataset(Report).It is working with the fist condition "AccountingTypeCharacteristicCodeId = 3"...
Please give the DML to SELECT the rows avoiding the duplicate rows. Since there is a text column in the table, I couldn't use aggregate function, group by (OR) DISTINCT for processing.
Table :
create table test(col1 int, col2 text) go insert into test values(1, 'abc') go insert into test values(2, 'abc') go insert into test values(2, 'abc') go insert into test values(4, 'dbc') go
I am trying to remove rows of data when the value in a textbox in the row is 0.
My expression for a value of a textbox in my report layout is:
=Fields!salary_FTE.Value+Fields!Leave_FTE.Value
Could you please show me how I edit this expression so a row of data does not show when the value in this textbox is 0, or, should I create this filter in my query rather than in the report layout?
Hi, everyone This is my data comming from Long StroredProcedure.In this Procedure will have 5 CTE's. I want to do with other function. i.e Removing Repetetive data form ACTION column. This ACTION column consist with BUY/SELL. Whenever two or More BUY/SELL i want to remove that one.
I have a table employee_test having the sample data. The rows with EmployeeID=6 are duplicate rows. I want to delete the duplicates retaining one row for the employeeid=6. Note :- I don't want to use a temporary table. I want to do this using a single query or at the most in a SP query batch. Please advise.
I have a report that is built using the report builder and the report model that was created. With the fields that are displayed in the report, it could be possible for the database to have multiple rows with the same values for all the fields (that are displayed in the report). The PK will be different, but this is not displayed in the report as the PK won't make sense in the report. So what happens is that the report displays just 1 record even though the database has multiple records because all the fields (that are displayed in the report) are identical. Is there a way to make the report display all the rows irrespective of duplicates?