T-SQL (SS2K8) :: Rank Duplicates But Only Rows Involved In Duplicates?
Oct 22, 2014
I have a table with 22 million Business records. I can see that there are duplicates when I group by BusinessName and Address and Phone. I'd like to place only the duplicates into a table, with a ranking, oldest business key gets a ranking of 1.
As a bonus I'd like each group to have a distinct group name (although not necessary, just want to know how to do this)
Later after I run more verifications to make sure these are not referenced elsewhere I'll delete everything with a matchRank > 1 out of the main Business table.
DROP TABLE [dbo].[TestBusiness];
GO
CREATE TABLE [dbo].[TestBusiness](
[Business_pk] INT IDENTITY(1,1) NOT NULL,
[BusinessName] VARCHAR (200) NOT NULL,
[Address] VARCHAR(MAX) NOT NULL,
;WITH ctePreAgg AS ( select top 500 act_reference "ActivityRef", row_number() over (partition by act_reference order by act_reference) as rowno, t3.s_initials "Initials" from mytablestuff order by act_reference
[code]...
But what I would love to do next is take each of the above rows - and return the initials either in one column with all the nulls and duplicate values removed, separated by a comma ..
OR the above but using variable number of columns based on the maximum number of different initials for each row.this is not strictly required, but maybe neater for further work on the view
I have this query below that I created to do a count, but I don't think this is what I needed.
I need to find the duplicates. Example, if
CLI_ID1 12345 has 4 CLIP records, each CLIP record should have a different CLIP rank. I need to find scenarios where 2 (or more) of the CLIP records have the same CLIP RANK. If there are duplicate CLIP_RANKs within the same CLI_ID,
Select Distinct cli_id1, count(clip_rank) countrank FROM impact.dbo.CLI LEFT JOIN impact.dbo.CLIO ON CLI.CLI_ID1 = CLIO.clio_id1
left join impact.dbo.clip ON cli_id1 = clip_id1 Where (clio_trm = '' or clio_trm = NULL or clio_trm is null) group by cli_id1 order by cli_id1
I have the piece of sql code here below that keeps giving out duplicates. How to resolve this.
isnull((select distinct (SUM(a1.ActualDebit) - SUM(a1.ActualCredit) ) from #MainAccount a1 LEFT OUTER JOIN #BudgetAccount bb ON aa.AccountID = bb.AccountID AND a1.PeriodStartdate = bb.PeriodStartDate and a1.DateMonth=bb.DateMonth and a1.Budget = bb.Budget WHERE a1.AccountID = aa.AccountID and a1.Refdate >= @FROMDATE and a1.Refdate <= @TODATE GROUP BY a1.group1, a1.Group2),0) As Actual_CurrentMonth,
WITH cte_OrderProjectType AS ( select Orderid, min(TypeID) , min(CTType) , MIN(Area) from tableA A inner join tableB B ON A.PID = B.PID left join tableC C ON C.TypeID = B.TypeID LEFT JOIN tableD D ON D.AreaID = B.ID group by A.orderid )
This query uses min to eliminate duplicates. It takes 1.30 seconds to complete..
Is there any way I can improve the query performance ?
I have tried and attached the computed results and also expecting results.
IF OBJECT_ID('tempdb..#tmpExam1')IS NOT NULL DROP TABLE #tmpExam1 IF OBJECT_ID('tempdb..#tmpExam2')IS NOT NULL DROP TABLE #tmpExam2 IF OBJECT_ID('tempdb..#tmpExam3')IS NOT NULL DROP TABLE #tmpExam3
Auto_ID Account_ID Account_Name Account_Contact Priority 1 3453463 Tire Co Doug 1 2 4363763 Computers Inc Sam 1 3 7857433 Safety First Heather 1 4 2326743 Car Dept Clark 1 5 2342567 Sales Force Amy 1 6 4363763 Computers Inc Jamie 2 7 2326743 Car Dept Jenn 2
I'm trying to delete all duplicate Account_IDs, but only for the highest priority (in this case it would be the lowest number).
I know the following would delete duplicate Account_IDs:
DELETE FROM staging_account WHERE auto_id NOT IN (SELECT MAX(auto_id) FROM staging_account GROUP BY account_id)
The problem is this doesn't take into account the priority; in the above example I would want to keep auto_ids 2 and 4 because they have a higher priority (1) than auto_ids 6 and 7 (priority 2).
How can I take priority into account and still remove duplicates in this scenario?
I have a bunch of contacts that I've scored how well their names match to other contacts in the same business. I can programmatically figure out how to parse the results, but would like to know how to do this via SQL. My problem is for Business_fk 968976 I have 7 contacts. In the end I should have 4 contacts based on name match. For the business key listed Gerardo Lopez is in the ContactScore table twice for Contact keys 7355719 and 57028145. I then have two rows like so:
Each reference each other, and 2 is a good case, a more difficult case would have key 1 listed 10 times showing a ContactMatch_fk of 2 - 11, and then Contact_fk 2 listed 10 times with a ContactMatch_fk of 1, 3-11.I know 57028145 maps to 7355719 from the first row in the ContactScore table, so when Contact_fk of 7355719 comes up I should be able to skip it and not process that match. Hopefully that makes sense. Anyway here is the test data:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ContactScore]') AND type in (N'U')) DROP TABLE [dbo].[ContactScore]; GO CREATE TABLE [dbo].[ContactScore] ( [ContactScore_pk]INT NOT NULL, [Contact_fk]INT NOT NULL,
How do I only select rows with duplicate dates for each person (id)? (The actual table has approximately 13000 rows with approximately 3000 unique ids)
I have a patient record and emergency contact information. I need to find duplicate phone numbers in emergency contact table based on relationship type (RelationType0 between emergency contact and patient. For example, if patient was a child and has mother listed twice with same number, I need to filter these records. The case would be true if there was a father listed, in any cases there should be one father or one mother listed for patient regardless. The link between patient and emergency contact is person_gu. If two siblings linked to same person_gu, there should be still one emergency contact listed.
Below is the schema structure:
Person_Info: PersonID, Person Info contains everyone (patient, vistor, Emergecy contact) First and last names Patient_Info: PatientID, table contains patient ID and other information Patient_PersonRelation: Person_ID, patientID, RelationType Address: Contains address of all person and patient (key PersonID) Phone: Contains phone # of everyone (key is personID)
The goal to find matching phone for same person based on relationship type (If siblings, then only list one record for parent because the matching phones are not duplicates).
I have a table employee_test having the sample data. The rows with EmployeeID=6 are duplicate rows. I want to delete the duplicates retaining one row for the employeeid=6. Note :- I don't want to use a temporary table. I want to do this using a single query or at the most in a SP query batch. Please advise.
I have a report that is built using the report builder and the report model that was created. With the fields that are displayed in the report, it could be possible for the database to have multiple rows with the same values for all the fields (that are displayed in the report). The PK will be different, but this is not displayed in the report as the PK won't make sense in the report. So what happens is that the report displays just 1 record even though the database has multiple records because all the fields (that are displayed in the report) are identical. Is there a way to make the report display all the rows irrespective of duplicates?
I have an existing stored table with duplicate rows that I want to delete.Using a cte gives me
WITH CTE AS ( SELECT rn = ROW_NUMBER() OVER( PARTITION BY employeeid, dateofincident, typeid, description ORDER BY Id ASC), * FROM dbo.TableName ) DELETE FROM cte WHERE rn > 1
This is what I want to do basically. But this is only deleting in my CTE, is there anyway I can update my existing table "TableName" with this, without using temp tables?
I need some help. I have created a database that looks like the following: FirstName Table link to Main Table. I have created a Stored procedure that looks like this: Create procedure dbo.StoredProcedure ( @FirstName varchar(20) ) Declare FirstNameID int Insert Into Main Table ( FirstName ) Values ( @FirstName ) Select @FirstNameID = Scope_Identity() How could I redesign this to check if a value exists and if it exists then simply use that value instead of creating a new duplicate value?
I have a dilema..... I have a databas eof about 60,000 users and i need to get rid of those users where there is a duplicate email address. I have written an asp utilty that works but is far too taxing on our little server and i thinkk itwill kill it. what it does is for each email address it compares it against all the others.... so for each address it checks against 60,000 other records 60,000 times.... you know what i mean. its pretty phucked.... i tested it on just one record and took about 5mins.
anyway ive been trying to do it in SQL with no luck
i'm trying to get duplicates out of the my database
SELECT COUNT(*) AS Amount, Firstname, surname, Internalextension FROM iac.dbo.sf_profil GROUP BY FirstName, surname, internalextension HAVING COUNT(*) > 1 order by firstname, surname
How do i alter the query just retrieve records which have firstname and lastname which are similar but different extension numbers ?
Hi, This is the query which shows me the duplicates Some of the records have more than one records I would like to know how to delete the extra records so that I will end up with one record per row.
select Pricing_Source, VaR_Identifier, Price_Date, PX_Last, Count(*) as 'count' from tblPricesClean group by Pricing_Source, VaR_Identifier, Price_Date, PX_Last having count(*) > 1 order by count desc
Is there a way to find duplicates in one field? For example my query has person_nbr and for each person_nbr on one day they could have used multiple payer_names. I want to be able to count each person_nbr one time but also I want to group by description(which is the name of the provider) and by payer name to see how many person's that the provider seen with each payer. My problem is that if the person had more than one payer they are counted twice. Is there some type of aggregate function to use the first payer in the list??
With PersonMIA (person_id,person_nbr,first_name,last_name,date_of_birth) as ( select distinct person_id,person_nbr,first_name,last_name,date_of_birth from (select count(*) as countenc,a.person_id,a.person_nbr, a.first_name,a.last_name, a.date_of_birth from person a join patient_encounter b on a.person_id = b.person_id group by a.person_id,a.person_nbr,a.first_name,a.last_name,a.date_of_birth )tmp where tmp.countenc <=1 ) select person_nbr,payer_name,first_name,last_name,description,year(create_timestamp),create_timestamp from ( select distinct c.description,tmp.person_id,tmp.person_nbr,tmp.first_name, tmp.last_name,tmp.date_of_birth,d.payer_name,b.create_timestamp from PersonMIA tmp join person a on a.person_id = tmp.person_id join patient_encounter b on a.person_id = b.person_id join provider_mstr c on b.rendering_provider_id = c.provider_id join person_payer d on tmp.person_id = d.person_id where c.description = 'Leon MD, Enrique' group by c.description,tmp.person_id,tmp.person_nbr,tmp.first_name,tmp.last_name, tmp.date_of_birth,d.payer_name,b.create_timestamp )tmp2 where year(create_timestamp) IN (2005,2006) group by person_nbr,payer_name,first_name,last_name,description,create_timestamp
Hi, I'll see if I can explain this clearly. The query below selects rows from the "hdr_ctl_nbr_status" table if the value in the field "tcn" from that table is found in the table "temp_tcn". I want all fields from the "hdr_ctl_nbr_status" table to be selected BUT only one row. In other words for a tcn with a value "12345678" there are 10 rows returned from the hdr_ctl_nbr_status table, I want only 1. Is there a way I can use SELECT DISTINCT to do this ? I know this usually functions on one or more fields but I want the DISTINCT to be on tcn only BUT return all fields in the query.
Select h.*,'' from hdr_ctl_nbr_status as h WITH (NOLOCK) where h.tcn in (select tcn from temp_tcn)
I have two columns of int data in the a table, as my example data shows below.
I want my data returned to be something like those in #test3, but my question is this, how can I do it without using #test2 and #test3?
By the way, the business requirement doesn't care it's min/max or any ID when one side has duplicated values.
Thanks!
Use tempdb Go
if object_ID ('#test') is not null drop table #test
create table #test (col1 int, col2 int) insert into #test Select 123, 222 union Select 124, 222 union Select 125, 222 union Select 111, 223 union Select 111, 224
if object_ID ('#test2') is not null drop table #test2 create table #test2 (col1 int, col2 int) Insert into #test2 Select distinct col1, min(col2) from #test group by col1
if object_ID ('#test3') is not null drop table #test3 create table #test3 (col1 int, col2 int) Insert into #test3 Select min(col1), col2 from #test2 group by col2
I am attempting to execute the Stored Procedure at the foot of thismessage. The Stored Procedure runs correctly about 1550 times, butreceive the following error three times:Server: Msg 512, Level 16, State 1, Procedure BackFillNetworkHours,Line 68Subquery returned more than 1 value. This is not permitted when thesubquery follows =, !=, <, <= , >, >= or when the subquery is used asan expression.I've done some digging, and the error message is moderatelyself-explanatory.The problem is that there is no Line 68 in the Stored Procedure. It'sthe comment line:-- Need to find out how many hours the employee is scheduled etc.Also, there are no duplicate records in the Employee table nor theWeeklyProfile table. At least I assume so - if the following SQL todetect duplicates is correct!SELECT E.*FROMEmployee Ejoin(select EmployeeIDfromEmployeeGroup by EmployeeIDhaving count(*) > 1) as E2On(E.EmployeeID = E2.EmployeeID)SELECTW.*FROMWeekProfile Wjoin(SelectWeekProfileIDFROMWeekProfileGROUP BYEmployeeID, MondayHours, WeekProfileIDHAVING COUNT(*) > 1) AS W2ONW.WeekProfileID = W2.WeekProfileIDNOTE: In the second statement, I have tried for MondayHours thruFridayHours.Anyone got any ideas? The TableDefs are set up in this thread:<http://groups-beta.google.com/group/comp.databases.ms-sqlserver/browse_frm/thread/fff4ef21e9964ab8/f5ce136923ebffc3?q=teddysnips&rnum=1&hl=en#f5ce136923ebffc3>The Stored Procedure that causes the error is here:--************************************************** ***********CREATE PROCEDURE BackFillNetworkHoursASDECLARE @EmployeeID intDECLARE @TimesheetDate DateTimeDECLARE @NumMinutes intDECLARE @NetworkCode int-- Get the WorkID corresponding to Project Code 2002SELECT@NetworkCode = WorkIDFROM[Work]WHERE(WorkCode = '2002')-- Open a cursor on a SELECT for all Network Support Employees whereany single workday comprises fewer than 7.5 hoursDECLARE TooFewHours CURSOR FORSELECTEmployeeID,CONVERT(CHAR(8), Start, 112) AS TimesheetDate,SUM(NumMins) AS TotalMinsFROM(SELECTTI.EmployeeID,W.WorkCode,TI.Start AS Start,SUM(TI.DurationMins) AS NumMinsFROMTimesheetItem TI LEFT JOIN[Work] W ON TI.WorkID = W.WorkIDWHERE EXISTS(SELECT*FROMEmployee EWHERE((TI.EmployeeID = E.EmployeeID) AND(E.DepartmentID = 2)))GROUP BY TI.EmployeeID, TI.Start, W.WorkCode) AS xGROUP BYEmployeeID,CONVERT(char(8), Start, 112)HAVINGSUM(NumMins) < 450ORDER BYEmployeeID,CONVERT(CHAR(8), Start, 112)-- Get the EmployeeID, Date and Number of Minutes from the cursorOPEN TooFewHoursFETCH NEXT FROM TooFewHours INTO @EmployeeID, @TimesheetDate,@NumMinutesWHILE (@@FETCH_STATUS=0)BEGINDECLARE @NewWorkTime datetimeDECLARE @TimesheetString varchar(50)DECLARE @Duration intDECLARE @RequiredDuration int-- Set the correct date to 08:30 - by default the cast from thecursor's select statement is middaySET @TimesheetString = @TimesheetDate + ' 08:30'SET @NewWorkTime = CAST(@TimesheetString AS Datetime)-- Need to find out how many hours the employee is scheduled to workthat day.SET @RequiredDuration = CASE (DATEPART(dw, @NewWorkTime))WHEN 1 THEN(SELECT CAST((60 * SundayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))WHEN 2 THEN(SELECT CAST((60 * MondayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))WHEN 3 THEN(SELECT CAST((60 * TuesdayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))WHEN 4 THEN(SELECT CAST((60 * WednesdayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))WHEN 5 THEN(SELECT CAST((60 * ThursdayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))WHEN 6 THEN(SELECT CAST((60 * FridayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))WHEN 7 THEN(SELECT CAST((60 * SaturdayHours) AS int) FROM WeekProfile WHERE(EmployeeID = @EmployeeID))ENDIF @NumMinutes < @RequiredDurationBEGIN-- Set the Start for the dummy work block to 08:30 + the number ofminutes the employee has already worked that daySET @NewWorkTime = DateAdd(minute, @NumMinutes, @NewWorkTime)-- Set the duration for the dummy work block to be required durationless the amount they've already workedSET @Duration = @RequiredDuration - @NumMinutes-- Now we have the correct data - insert into table.INSERT INTO TimesheetItem(EmployeeID,Start,DurationMins,WorkID)VALUES(@EmployeeID,@NewWorkTime,@Duration,@NetworkCode)ENDFETCH NEXT FROM TooFewHours INTO @EmployeeID, @TimesheetDate,@NumMinutesENDCLOSE TooFewHoursDEALLOCATE TooFewHoursGO--************************************************** ***********ThanksEdward
I have a table, TEST_TABLE, with 6 columns (COL1, COL2, COL3, COL4,COL5, COL6).... I need to be able to select all columns/rows whereCOL3, COL4, and COL5 are unique....I have tried using DISTINCT and GROUP BY, but both will only allow meto access columns COL3, COL4, and COL5..... i need access to allcolumns...I just want to get rid of duplicate rows (duplicates ofCOL3, COL4, and COL5)...Thanks in advance.Joe
Hello! Just looking for advise on dealing with duplicates in database. I have a contact table that have a bunch of duplicated customer records. My goal is to combine all duplicated records into one record. This involves couple tables:contact,contact history ,calendar. All tables related by common column "accountno". What would be the best approach for this?