T-SQL (SS2K8) :: Delete And Merge Duplicate Records From Joined Tables?
Oct 21, 2014
I'm trying to delete duplicate records from the output of the query below; if they also meet certain conditions, i.e. a 'different address type', then I would merge the records. From the following query, how do I go about achieving one and/or the other, either from the output or as an extension of the query itself?
Select a1z103acno AccountNumber, a1z103frnm FirstName, a1z103lanm LastName, a1z103ornm OrgName,
       a3z103adr1 AddressLine1, A3z103city City, A3z103st State, A3z103zip Zip,
       a6z103area AreaCode, a6z103phon PhoneNumber, a8z103mail Email
from proddta.fz103a1 with (nolock)
inner join proddta.fz103a2 with (nolock) ON a1z103acno = a2z103acno
INNER JOIN proddta.fz103a3 with (nolock) ON a2z103adid = a3z103adid and a2z103actv = 'Y' and a2z103prim = 'Y'
LEFT OUTER JOIN proddta.fz103a5 with (nolock) ON a1z103acno = a5z103acno and a5z103actv = 'y' and a5z103prim = 'Y'
INNER JOIN proddta.fz103a6 with (nolock) ON a5z103phid = a6z103phid
LEFT OUTER JOIN proddta.fz103a8 with (nolock) ON a1z103acno = a8z103acno and a8z103actv = 'Y' and a8z103prim = 'Y'
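One possible approach, sketched here on the assumption that the output of the query above has first been captured into a temp table (e.g. SELECT ... INTO #qryOut FROM ...): number the rows within each group of identical output columns with ROW_NUMBER() and delete everything after the first, then handle the 'different address type' case separately by grouping on the columns that must match and picking one value for the rest.

;WITH src AS
(
    SELECT AccountNumber, FirstName, LastName, OrgName, AddressLine1, City, State, Zip,
           AreaCode, PhoneNumber, Email,
           ROW_NUMBER() OVER (PARTITION BY AccountNumber, AddressLine1, City, State, Zip,
                                           AreaCode, PhoneNumber, Email
                              ORDER BY AccountNumber) AS rn
    FROM #qryOut
)
DELETE FROM src
WHERE rn > 1;    -- removes exact duplicates from the captured output

For the merge case, a follow-up GROUP BY AccountNumber with MAX()/MIN() over the address columns collapses the remaining near-duplicates; the right aggregation depends on which address should win, which the post doesn't specify.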
Hi All, I've the following table with a PK defined on an IDENTITY column (INSERT_SEQ):

CREATE TABLE MYDATA (
MID NUMERIC(19,0) NOT NULL,
MYVALUE FLOAT NOT NULL,
TIMEKEY INTEGER NOT NULL,
TIMEKEY_DTTM DATETIME NULL,
IID NUMERIC(19,0) NOT NULL,
EID NUMERIC(19,0) NOT NULL,
INSERT_SEQ NUMERIC(19,0) IDENTITY(1,1) NOT NULL)
GO
ALTER TABLE MYDATA
ADD CONSTRAINT PK_MYDATA
PRIMARY KEY (INSERT_SEQ)
GO

The TIMEKEY_DTTM field is generated, from the value actually inserted into the TIMEKEY field, by the following trigger:

CREATE TRIGGER TIMEKEY1
ON MYDATA
FOR INSERT AS
BEGIN
DECLARE @M_TIMEKEY_DTTM DATETIME
SELECT @M_TIMEKEY_DTTM = DATEADD(SECOND, INS.TIMEKEY + EP.GMT_OFFSET * 0, '1970-01-01 00:00:00.000')
FROM INSERTED INS, LOCATIONINFO EP
WHERE INS.EID = EP.EID
UPDATE MYDATA
SET TIMEKEY_DTTM = @M_TIMEKEY_DTTM
FROM INSERTED INS, MYDATA MD
WHERE MD.INSERT_SEQ = INS.INSERT_SEQ
END
GO

There is also a composite, non-unique, index defined on the tuple (MID,IID,TIMEKEY,EID):

CREATE INDEX IX_METDATA ON MYDATA (MID,IID,TIMEKEY,EID)
GO

As a consequence of an application design change, I would also change this index to be UNIQUE, but when I try to drop and create it I get an error, because the table stores some duplicated rows...

In order to successfully upgrade the index definition, I wrote some DML statements to look up and remove the duplicated rows, keeping only the first record inserted, i.e. the one with the lowest INSERT_SEQ:

--
-- This table stores the number of duplicated records eventually discovered
-- in the MYDATA table; the initial value for the NUM_DUPLICATES field is
-- 0 (no duplicated record)
--
DROP TABLE DUPLICATES
GO
CREATE TABLE DUPLICATES (
TABLENAME VARCHAR(17),
NUM_DUPLICATES NUMERIC(19,0) )
GO
INSERT INTO DUPLICATES VALUES ('MYDATA',0)
GO
INSERT INTO DUPLICATES VALUES ('CATEGORIESDATA',0)
GO
--
-- ///////// CLEAN UP OF MYDATA TABLE
--
DROP TABLE TMP_MYDATA
GO
CREATE TABLE TMP_MYDATA (
MID NUMERIC(19,0) NOT NULL,
TIMEKEY INTEGER NOT NULL,
IID NUMERIC(19,0) NOT NULL,
EID NUMERIC(19,0) NOT NULL,
INSERT_SEQ NUMERIC(19,0) )
GO
--
-- Insert into the TMP_MYDATA table all the duplicated records for
-- the tuple (MID,IID,TIMEKEY,EID) and NULL for the INSERT_SEQ field
--
INSERT INTO TMP_MYDATA (MID,IID,TIMEKEY,EID)
SELECT MID,IID,TIMEKEY,EID
FROM MYDATA
GROUP BY MID,IID,TIMEKEY,EID
HAVING COUNT(*)>1
GO
--
-- Updates the INSERT_SEQ field to the lowest value in the group
-- of duplicated records
--
UPDATE TMP_MYDATA
SET TMP_MYDATA.INSERT_SEQ = (SELECT MIN(INSERT_SEQ)
                             FROM MYDATA
                             WHERE TMP_MYDATA.MID = MYDATA.MID AND
                                   TMP_MYDATA.IID = MYDATA.IID AND
                                   TMP_MYDATA.TIMEKEY = MYDATA.TIMEKEY AND
                                   TMP_MYDATA.EID = MYDATA.EID )
GO
--
-- Updates the value of NUM_DUPLICATES for the MYDATA table.
--
UPDATE DUPLICATES
SET NUM_DUPLICATES = (SELECT COUNT(*) FROM TMP_MYDATA)
WHERE TABLENAME = 'MYDATA'
GO
--
-- Delete from the MYDATA table all the duplicated records,
-- keeping only the row with the lowest INSERT_SEQ.
-- The delete is performed only if there are duplicated records;
-- this is achieved using a "short circuit" AND on the number of records
-- stored in the NUM_DUPLICATES field of the DUPLICATES table for
-- the MYDATA table...
--
DELETE FROM MYDATA
WHERE ( SELECT NUM_DUPLICATES FROM DUPLICATES WHERE TABLENAME = 'MYDATA') > 0 AND
EXISTS ( SELECT 1
         FROM TMP_MYDATA
         WHERE MYDATA.MID = TMP_MYDATA.MID AND
               MYDATA.IID = TMP_MYDATA.IID AND
               MYDATA.TIMEKEY = TMP_MYDATA.TIMEKEY AND
               MYDATA.EID = TMP_MYDATA.EID AND
               MYDATA.INSERT_SEQ > TMP_MYDATA.INSERT_SEQ )
GO

This technique works fine on a normal table (1M recs) but is not very performant on huge tables (>10M records)! Do you know a better way to achieve the task of removing all the duplicate records, preserving the lowest INSERT_SEQ between the duplicates and also preserving the sequence seed, so that a new record inserted at time t1 > t0 is enumerated with an INSERT_SEQ|t1 > max(INSERT_SEQ)|t0?

Thanks a lot for your help!
Patrizio

PS. Sorry for such a large post!
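If the server is SQL Server 2005 or later, a simpler and usually much faster alternative (a sketch, untested against this schema) is a single set-based delete driven by ROW_NUMBER(): number the rows within each (MID, IID, TIMEKEY, EID) group by INSERT_SEQ and delete everything after the first. Nothing touches the IDENTITY seed, so a row inserted later still gets an INSERT_SEQ greater than the existing maximum.

-- Keep the lowest INSERT_SEQ per (MID, IID, TIMEKEY, EID); delete the rest in one pass.
;WITH dups AS
(
    SELECT INSERT_SEQ,
           ROW_NUMBER() OVER (PARTITION BY MID, IID, TIMEKEY, EID
                              ORDER BY INSERT_SEQ) AS rn
    FROM MYDATA
)
DELETE FROM dups
WHERE rn > 1;
-- On a >10M row table the delete can be batched (DELETE TOP (100000) FROM dups WHERE rn > 1
-- in a loop until @@ROWCOUNT = 0) to keep the transaction log manageable.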
I have a script that is supposed to run thru 2 joined tables and update a field in the 3rd table. The script works but takes approx. 4 hours to run against 250k records.
UPDATE a
SET Con_Mailings = STUFF((SELECT '; ' + c.ListName
                          FROM [server].[xxxxx_MSCRM].[dbo].ListBase c with (nowait)
                          INNER JOIN [server].[xxxxxx_MSCRM].[dbo].[ListMemberBase] b with (nowait)
                              ON b.ListID = c.ListID
                          WHERE b.EntityID = a.TmpContactID
                          FOR XML PATH('')),1,1,'')
FROM [xx_Temp].[dbo].[Lyris_CombinedTest] a
I should end up with something like this in the con_mailings field:
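One change that often helps this kind of correlated FOR XML update across a linked server (a sketch, reusing the names above; the bracketed server/database parts are left as the placeholders they already are) is to copy the two remote tables into local temp tables once, then run the same STUFF/FOR XML update entirely locally, so the remote round trip isn't repeated per row.

-- Pull only the needed columns across the linked server once.
SELECT ListID, ListName
INTO   #ListBase
FROM   [server].[xxxxx_MSCRM].[dbo].ListBase;

SELECT ListID, EntityID
INTO   #ListMemberBase
FROM   [server].[xxxxxx_MSCRM].[dbo].[ListMemberBase];

CREATE INDEX IX_lmb ON #ListMemberBase (EntityID, ListID);

-- Same update, but against the local copies; STUFF(...,1,2,'') strips the leading '; '.
UPDATE a
SET Con_Mailings = STUFF((SELECT '; ' + c.ListName
                          FROM #ListBase c
                          JOIN #ListMemberBase b ON b.ListID = c.ListID
                          WHERE b.EntityID = a.TmpContactID
                          FOR XML PATH('')), 1, 2, '')
FROM [xx_Temp].[dbo].[Lyris_CombinedTest] a;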
Table A has columns CompressedProduct, Tool, Operation
Table B in a different database has columns ID, Product, Tool, Operation.
I cannot edit table A. I can select records from A and insert into B. And I can select only the records that are in both tables.
But I want to be able to select any records that are in table A but not in Table B.
i.e. I want to select records from A where the combination of Product, Tool and Operation does not appear in Table B, even if all 3 on their own do appear.

This code returns all the records from A. I need to filter out the records found in Table B.
SELECT ID, CompressedProduct, oq.Tool, oq.Operation
FROM OPENQUERY (Lisa_Link,
    'SELECT DISTINCT CompressedProduct, Tool, Operation
     FROM tblToolStatus ts
     JOIN tblProduct p ON ts.ProductID = p.ProductID
     JOIN tblTool t ON ts.ToolID = t.ToolID
     JOIN tblOperation o ON ts.OperationID = o.OperationID
     WHERE ts.ToolID=66 ') oq
LEFT JOIN Family f ON oq.CompressedProduct = f.Product
                  AND oq.Tool = f.Tool
                  AND oq.Operation = f.Operation
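Since the query already LEFT JOINs to Family (Table B), the rows that exist only in Table A are exactly the ones where that join found no match, so a WHERE test on a Family column filters them. A sketch built on the query above (note that ID comes from Family and is NULL for unmatched rows, so it drops out of the select list):

SELECT oq.CompressedProduct, oq.Tool, oq.Operation
FROM OPENQUERY (Lisa_Link,
    'SELECT DISTINCT CompressedProduct, Tool, Operation
     FROM tblToolStatus ts
     JOIN tblProduct p ON ts.ProductID = p.ProductID
     JOIN tblTool t ON ts.ToolID = t.ToolID
     JOIN tblOperation o ON ts.OperationID = o.OperationID
     WHERE ts.ToolID=66 ') oq
LEFT JOIN Family f ON oq.CompressedProduct = f.Product
                  AND oq.Tool = f.Tool
                  AND oq.Operation = f.Operation
WHERE f.Product IS NULL    -- keep only combinations with no match in Family

NOT EXISTS against Family with the same three-column correlation does the same job and reads a little more explicitly.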
Hello, I have the following query:

declare @StartDate char(8)
declare @EndDate char(8)
set @StartDate = '20070601'
set @EndDate = '20070630'
SELECT Initials, [Position], DATEDIFF(mi,[TimeOn],[TimeOff]) AS ProTime
FROM LogTable
WHERE [TimeOn] BETWEEN @StartDate AND @EndDate AND
      [TimeOff] BETWEEN @StartDate AND @EndDate
ORDER BY [Position],[Initials] ASC

The query returns the following data:

Position Initials ProTime
-------- -------- -------
ACAD     JJ       127
ACAD     JJ       62
ACAD     KK       230
ACAD     KK       83
ACAD     KK       127
ACAD     TD       122
ACAD     TJ       127

What I'm having trouble with is the fact that I need to return a result that has the totals for each set of initials for each position. For example, the final output that I'm looking to get is the following:

Position Initials ProTime
ACAD     JJ       189
ACAD     KK       440
ACAD     TD       122
ACAD     TJ       127

Any assistance greatly appreciated.
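A grouped aggregate over the same filtered rows should produce exactly that; a sketch based on the query above:

declare @StartDate char(8)
declare @EndDate char(8)
set @StartDate = '20070601'
set @EndDate = '20070630'

SELECT [Position], Initials,
       SUM(DATEDIFF(mi, [TimeOn], [TimeOff])) AS ProTime   -- total minutes per person per position
FROM LogTable
WHERE [TimeOn] BETWEEN @StartDate AND @EndDate
  AND [TimeOff] BETWEEN @StartDate AND @EndDate
GROUP BY [Position], Initials
ORDER BY [Position], Initials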
I use a table for storing log data from a mail server. I noticed that I'm getting duplicate records; is there a way to delete the second and/or third entry so I don't have any duplicates?
Userid is auto number, lastname and emailaddress are PK.
I want to delete duplicate records. If lastname and emailaddress are the same, only keep the record whose createdate is the newest. In the example above I only want to keep the record whose userid is 3. I have already created the code attached below, but it keeps the record whose userid is 1.
Anybody can help me to solve this problem? Thanks.
============== My current code ====================

delete from userprofile
where userprofile.userid in
--list all rows that have duplicates
(select p.userid
 from userprofile as p
 where exists (select lastname, emailaddress
               from userprofile
               where lastname = p.lastname
                 and emailaddress = p.emailaddress
               group by lastname, emailaddress
               having count (userid)>1))
and userprofile.userid not in
--list one row from each set of duplicates
(select min(p.userid)
 from userprofile as p
 where exists (select lastname, emailaddress
               from userprofile
               where lastname = p.lastname
                 and emailaddress = p.emailaddress
               group by lastname, emailaddress
               having count (userid)>1)
 group by lastname, emailaddress)
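To keep the newest row instead of the oldest, one option (a sketch, assuming the table really is userprofile(userid, lastname, emailaddress, createdate) and SQL Server 2005 or later) is to rank the rows per lastname/emailaddress by createdate descending and delete everything ranked after the first; swapping min(userid) for max(userid) in the code above only works if the newest row always happens to have the highest userid.

;WITH ranked AS
(
    SELECT userid,
           ROW_NUMBER() OVER (PARTITION BY lastname, emailaddress
                              ORDER BY createdate DESC, userid DESC) AS rn
    FROM userprofile
)
DELETE FROM ranked
WHERE rn > 1;   -- keeps only the most recently created row per lastname/emailaddress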
I have a little dilemma. I have a table ALLTABLE that has duplicate records and I want to delete them. ALLTABLE has these columns with these values for example:
As you can see there are some duplicates in the first 3 rows and the final 2 (the entity number is the only difference). I want the table to look like this:

Policy           Premium Class State Entity Number
ADC-WC-0010005-0 25476   63    31    1
ADC-WC-0010005-0 1457    63    29    4
ADC-WC-0010092-1 2322    63    37    1
ADC-WC-0010344-0 515     63    01    1
Thank you so much for the help. It is really appreciated.
I have the following table:

customerid customername
------------------------
1          AAA
1          AAA
2          BBB
2          BBB
2          BBB
3          CCC
3          CCC

Here, I need to delete duplicate records from the above table. After deleting the duplicate records the table should be like this:

customerid customername
------------------------
1          AAA
2          BBB
3          CCC
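Because the copies are completely identical and there is no key column to tell them apart, one simple approach (a sketch, assuming the table is named customers and a brief outage is acceptable) is to copy the distinct rows aside and reload the table; on SQL Server 2005+ a ROW_NUMBER() delete avoids the copy.

SELECT DISTINCT customerid, customername
INTO   #dedup
FROM   customers;

TRUNCATE TABLE customers;

INSERT INTO customers (customerid, customername)
SELECT customerid, customername
FROM   #dedup;

DROP TABLE #dedup;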
We have a data warehouse staging database in which we capture change history for hundreds of tables from a source system. In the source system, records are updated in place, but in our data warehouse we capture these changes by "terminating" the existing record and adding a new record reflecting the changes. In the data warehouse we add two columns to every table -- effective_date and expiration_date -- which indicate the dates the record was in effect in the source system. By convention, an expiration_date of 6/6/2079 means the record is currently still active in the source system. Each day we simply compare yesterday's version of the record (in the data warehouse) against today's version (in the source system). If differences are found in any of the columns, we terminate the record and add a new one, setting those dates appropriately.
In this example, the employee_id column is the natural key in the source system. We add the effective_date and expiration_date in the data warehouse, so those three columns together make up the key in the data warehouse. The employee_name, employee_dept, and last_login_date columns all come from the source system as well.
In the select output, you can follow the trail of changes for each of these three employees. Bob moved from dept 7 to 8 at some point; Frank didn't change departments at all; Cheryl moved from dept 6 to 9 and later back to 6. However, the last_login_date was updated frequently for all these employees.
We've tracked hundreds of tables this way for years, some with hundreds of columns. For optimization purposes, I'm now interested in trimming the fat a bit. That is, we track changes in many columns that we don't really need in our data warehouse. Some of these columns are rapidly-changing, causing all sorts of unnecessary terminate/inserts in the data warehouse. My goal is to remove these columns, reclaim the disk space and increase the ETL speed. So in this example, let's get rid of the last_login_date column.
alter table mytbl drop column last_login_date

select * from mytbl order by employee_id, effective_date
Now in the select output, you can see we have many "effective duplicate" records. For example, nothing changed for Bob between 1/1/2014 and 1/31/2014 -- those really should be one record, not three. Here's the challenge: I'm looking for an efficient way to merge these "effective duplicates" together, through set-based sql updates/deletes/inserts (hoping to avoid any RBAR operations). Here's what the table ultimately should look like (cheating to get there):
Note that Bob only has two records (he changed department), Frank only has one record (no changes), and Cheryl has three records (two department changes).
My inclination would be to drop the unwanted columns, then GROUP BY all the remaining columns from the source system, and taking the MIN effective_date and MAX expiration_date. However, this doesn't work for cases like Cheryl's -- she moved to another department, then back again, so that change history needs to be retained.
As I mentioned, we have hundreds of tables, and I'd like to strip out dozens (maybe hundreds) of unused columns, so ultimately there will be millions of these pseudo-duplicates that need to be merged together. These are huge tables, so I really need to find an efficient set-based approach to this.
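One set-based way to collapse these (a sketch, assuming the example table is mytbl(employee_id, employee_name, employee_dept, effective_date, expiration_date) and SQL Server 2008, so no LAG) is the classic 'islands' trick: two ROW_NUMBER() sequences whose difference stays constant within each unbroken run of identical attribute values, then one GROUP BY per island taking MIN(effective_date) and MAX(expiration_date). Cheryl's return to dept 6 starts a new island, so that history survives.

;WITH numbered AS
(
    SELECT employee_id, employee_name, employee_dept, effective_date, expiration_date,
           ROW_NUMBER() OVER (PARTITION BY employee_id
                              ORDER BY effective_date) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY employee_id, employee_name, employee_dept
                              ORDER BY effective_date) AS rn_grp
    FROM mytbl
)
SELECT employee_id, employee_name, employee_dept,
       MIN(effective_date)  AS effective_date,
       MAX(expiration_date) AS expiration_date
FROM numbered
GROUP BY employee_id, employee_name, employee_dept, rn_all - rn_grp   -- one group per island
ORDER BY employee_id, MIN(effective_date);

The result can be written to a staging table and swapped for the original, or used to drive a DELETE of the redundant rows plus an UPDATE of the survivors' expiration_date, which keeps the whole cleanup set-based.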
I have inherited a database with no primary keys in it. :(
There are, of course, duplicate records by the "logical key", i.e., the list of columns that should have been used in a primary key, if they only had one. I can easily identify when I have duplicate records using GROUP BY and HAVING clauses. That's the easy part. I can identify duplicate records by "key columns" and the subset of those records that are "identical duplicates on all columns". I need to delete the "extra copies".

I can't do it by hand using SQL Server manager; it won't let me, because it recognizes that more than one row matches that row. In Oracle, I would use the pseudo-column "rowid", which exposes the internal identifier for the row. SQL Server doesn't appear to have this concept. Is there another way, other than creating an empty shadow table, copying all the distinct duplicate records into the shadow table, deleting all the duplicates in the old table, then copying the distinct duplicates back into the old table?
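SQL Server doesn't expose a rowid, but on 2005 and later ROW_NUMBER() in a CTE plays the same role, and deleting from the CTE deletes the underlying rows, so the shadow-table shuffle isn't needed. A sketch with hypothetical names, since the real logical key isn't shown:

-- keycol1/keycol2 stand in for the real "logical key" columns.
;WITH numbered AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY keycol1, keycol2
                              ORDER BY (SELECT NULL)) AS rn   -- arbitrary order: the copies are identical
    FROM dbo.MyTable
)
DELETE FROM numbered
WHERE rn > 1;   -- removes every copy except one per logical key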
I uploaded some data about 2 or 3 times and it kept appending it to the table. Now I want to keep only the first of each set of duplicates and delete the rest. Suppose part number 123 has been added 3 times; I want to keep only 1 record. Thanks
I was importing records via the DTS Wizard, and I was having problems so I turned off Enforce Replication and Enforce FK Constraints on a couple of fields. I'm new with SQL Server so I'm not sure if this even caused the problem. (Do I need to turn these back on, or is this a developer switch of some kind?)
The end result left me with duplicate records in the table, and I'm not able to delete any of them. This is the Error I got...
A problem occurred attempting to delete row 1. Error Source: Microsoft.VisualStudio.DataTools. Error Message: The row value(s) updated or deleted either do not make the row unique or they alter multiple rows (2 rows).
If someone could tell me what I need to do so I can delete the records I'd really appreciate it.
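Because every copy of the row is identical, the grid designer has no way to target just one of them, but T-SQL can: a DELETE TOP removes a single copy. A sketch with placeholder table and column names, since the real ones aren't shown:

-- Deletes exactly one of the identical rows; run once per extra copy,
-- or change TOP (1) to the number of copies minus one.
DELETE TOP (1)
FROM dbo.MyImportedTable
WHERE Col1 = 'duplicated value 1'
  AND Col2 = 'duplicated value 2';   -- list enough columns to pin down the duplicated row

Once the data is cleaned up, re-enabling the constraints that were switched off for the import is generally a good idea, since they are what prevent this situation in the first place.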
Hello friends... Can anybody answer this question: how do I delete duplicate records from a table? I know that with the check option and also with a unique constraint we can avoid entering duplicate records into a table, but how do I delete them from a table which does not have any constraints?

Are there any useful SQL queries that might be used to identify lists of potential duplicate records in a table?
For example I have Client Database that includes a table dbo.Clients. This table contains various columns which could be used to identify possible duplicate records, such as Surname | Forenames | DateOfBirth | NINumber | PostalCode etc. . The data contained in these columns is not always exactly the same due to differences caused by user data entry; so some records may have missing data from some of the columns and there could be spelling differences too. Like the following examples:
1 | Smith | John Raymond | NULL       | NI990946B     | SW12 8TQ
2 | Smith | John         | 06/03/1967 | NULL          | SW12 8TQ
3 | Smith | Jon Raymond  | 06/03/1967 | NI 99 09 46 B | SW12 8TQ
The problem is that whilst it is easy for a human being to review these 3 entries and conclude that they are most likely the same Client entered in to the database 3 times; I cannot find a reliable way of identifying them using a SQL Query.
I've considered using some sort of concatenation to a new column, minus white space and then using a "WHERE column_name LIKE pattern" query, but so far I can't get anything to work well enough. Fuzzy Logic maybe?
the results would produce a grid something like this for the example above:
ID | Surname | Forenames    | DuplicateID | DupSurname | DupForenames
1  | Smith   | John Raymond | 2           | Smith      | John
1  | Smith   | John Raymond | 3           | Smith      | Jon Raymond
9  | Brown   | Peter David  | 343         | Brown      | Pete D
next batch of duplicates etc etc . . . .
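There's no single reliable answer to fuzzy matching in plain T-SQL, but a self-join that scores candidate pairs on a few normalized columns gets reasonably far. A sketch against the dbo.Clients columns named above; the scoring rules are assumptions to tune:

-- Pair up clients that share a postcode, NI number (whitespace stripped) or date of birth
-- and whose surnames sound alike; forenames are compared loosely.
SELECT a.ID, a.Surname, a.Forenames,
       b.ID AS DuplicateID, b.Surname AS DupSurname, b.Forenames AS DupForenames
FROM dbo.Clients a
JOIN dbo.Clients b
  ON b.ID > a.ID                                    -- report each pair once
 AND SOUNDEX(a.Surname) = SOUNDEX(b.Surname)
 AND (   REPLACE(a.PostalCode, ' ', '') = REPLACE(b.PostalCode, ' ', '')
      OR REPLACE(a.NINumber,  ' ', '')  = REPLACE(b.NINumber,  ' ', '')
      OR a.DateOfBirth = b.DateOfBirth )
WHERE DIFFERENCE(a.Forenames, b.Forenames) >= 3     -- 4 = strongest match, 0 = weakest
ORDER BY a.ID, b.ID;

For messier data, SSIS Fuzzy Grouping or an edit-distance (Levenshtein) function usually does a better job than SOUNDEX, at the cost of more setup.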
Hi, I have a SQL database that has the primary key set to three fields, but it has not been set as unique (I didn't create the table). I have 1 record that has 2 duplicates and I am unable to delete the duplicate entries. If I try to delete any of the three records (they are identical) I get the message 'key column is insufficient or incorrect. Too many rows were affected by update'. I am trying to do this within Enterprise Mgr. Any suggestion? Thanks much
For rows having no start and end time, assume regular intervals.

So I need to show available one-hour appointments against the available schedule, which is in five-minute increments.

For example, my first appointment for today starts at 8:30, but 8:00-8:30 is unblocked, so there could be an appointment there; but as this chunk is less than 60 minutes I still need to create it.

For the Schedule table, below is what I have to create on a temporary basis, as this will be available in a nightly load into a table.
DECLARE @num int = 5,
        @LASTtime TIME = CAST('23:55' as TIME),
        @Time TIME = CAST('00:00' as TIME),
        @Timeprev TIME = CAST('00:00' as TIME)

WHILE (@Time <> @LASTtime)
BEGIN
    SET @Timeprev = @Time
    SET @Time = DATEADD(MINUTE, @num, @Time)
    -- each pass yields one five-minute slot (@Timeprev to @Time); insert it into the temporary schedule table here
END
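As an aside, the five-minute grid can also be generated without a loop; a numbers-table sketch, writing into an assumed #Schedule temp table:

-- One row per five-minute slot start time, 00:00 through 23:55 (288 rows).
SELECT DATEADD(MINUTE, 5 * n.number, CAST('00:00' AS TIME)) AS SlotStart
INTO   #Schedule
FROM   master.dbo.spt_values n
WHERE  n.type = 'P'
  AND  n.number BETWEEN 0 AND 287;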
Can we insert into multiple tables using a MERGE statement? I'm using SQL Server 2008 R2 and below is my MERGE query...

-> I'm checking if the record exists in the Contact table or not. If it exists then I will insert into the Employee table, else I will insert into the Contact table and then the Employee table.
WITH Cont as (
    Select ContactID from Contact where ContactID = @ContactID
)
MERGE Employee as NewEmp
Using Cont as con
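MERGE can only modify its single target table, so one MERGE statement can't insert into both Contact and Employee. A common workaround (a sketch, with the column lists left as comments since the full statement isn't shown) is to insert the missing Contact row first and then insert the Employee row unconditionally:

BEGIN TRAN;

-- Create the contact only if it doesn't exist yet.
IF NOT EXISTS (SELECT 1 FROM Contact WHERE ContactID = @ContactID)
    INSERT INTO Contact (ContactID /*, other contact columns */)
    VALUES (@ContactID /*, ... */);

-- Either way, the employee row references the (now existing) contact.
INSERT INTO Employee (ContactID /*, other employee columns */)
VALUES (@ContactID /*, ... */);

COMMIT TRAN;

If the MERGE against Contact is kept, its OUTPUT clause can feed an INSERT...SELECT into Employee (composable DML is available on 2008 R2), but the two-step insert above is usually easier to read and debug.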
How can I delete duplicate entries from tables in my database using Query Analyzer? There are many duplicate entries in my tables, and I want to delete them.
I have around 3 tables holding around 20 to 30 GB of data. My table A is related to table B by a FK, and in the same way table B is related to table C by a FK. I would like to delete all rows satisfying a certain condition from table A, and all corresponding related records from tables B and C. I have created a query to delete from the grandchild first, followed by the child table and finally the parent. I have used an inner join in my delete query. As you all know, inner-join delete operations are going to be extremely resource-intensive, especially on bigger tables.
What is the best approach to delete all these rows? There are many constraints, triggers on these tables. Also, there might be some FK relations to other tables as well.
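One common pattern for deletes of this size (a sketch, with assumed names TableA(ID), TableB(ID, A_ID), TableC(ID, B_ID) and an assumed filter) is to capture the parent keys once, then delete in batches, grandchild first, so each transaction and its log stay small:

-- Capture the parent keys that match the condition once.
SELECT a.ID
INTO   #doomed
FROM   TableA a
WHERE  a.SomeCondition = 1;          -- assumed filter

CREATE UNIQUE CLUSTERED INDEX IX_doomed ON #doomed (ID);

-- Delete grandchildren, then children, then parents, in small batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) c
    FROM TableC c
    JOIN TableB b  ON b.ID = c.B_ID
    JOIN #doomed d ON d.ID = b.A_ID;
    IF @@ROWCOUNT = 0 BREAK;
END

WHILE 1 = 1
BEGIN
    DELETE TOP (10000) b
    FROM TableB b
    JOIN #doomed d ON d.ID = b.A_ID;
    IF @@ROWCOUNT = 0 BREAK;
END

WHILE 1 = 1
BEGIN
    DELETE TOP (10000) a
    FROM TableA a
    JOIN #doomed d ON d.ID = a.ID;
    IF @@ROWCOUNT = 0 BREAK;
END

Disabling non-essential indexes and triggers on the child tables for the duration, and running in a quiet window, usually helps more than any syntax change.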
I managed to find the 'Deleting Duplicate Records' from SQLTeam.com (thanks, by the way!!).. I managed to modify it for one of my tables (one of 14).
-- Add a new column
Alter table dbo.tblMyDocsSize add NewPK int NULL
go

-- populate the new Primary Key
declare @intCounter int
set @intCounter = 0
update dbo.tblMyDocsSize SET @intCounter = NewPK = @intCounter + 1

-- ID the records to delete and get one primary key value also
-- We'll delete all but this primary key
select strComputer, strATUUser, RecCount=count(*), PKtoKeep = max(NewPK)
into #dupes
from dbo.tblMyDocsSize
group by strComputer, strATUUser
having count(*) > 1
order by count(*) desc, strComputer, strATUUser

-- delete dupes except one Primary key for each dup record
delete dbo.tblMyDocsSize
from dbo.tblMyDocsSize a
join #dupes d on
    d.strComputer = a.strComputer and
    d.strATUUser = a.strATUUser
where
    a.NewPK not in (select PKtoKeep from #dupes)

-- remove the NewPK column
ALTER TABLE dbo.tblMyDocsSize DROP COLUMN NewPK
go
drop table #dupes
Now that I've got that figured out, I need to write the same thing to fix the other 13 tables (with different column info)- and I'll need to run this daily.
Basically I've put together some vbscript that gathers inventory data and drops it into an MSDE db (sorry - goin for 'free' stuff right now). Problem is it has to run daily so that I'm sure to capture computers that turned on at different times etc which ever-increases my database 'till I bounce off the 2GB limit of MSDE.
So the question is, what would be the best way to do this? Can I put the code into a stored procedure that I can execute each day?
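Yes, the whole sequence can live in a stored procedure and be run on a schedule. One wrinkle: because NewPK is added and dropped inside the procedure, the statements that reference it have to run as dynamic SQL, otherwise the CREATE PROCEDURE won't compile against the existing table. A rough sketch for this one table (the other 13 would repeat the pattern with their own key columns):

CREATE PROCEDURE dbo.usp_DedupeMyDocsSize
AS
BEGIN
    SET NOCOUNT ON

    EXEC ('ALTER TABLE dbo.tblMyDocsSize ADD NewPK int NULL')

    EXEC ('
        DECLARE @intCounter int
        SET @intCounter = 0
        UPDATE dbo.tblMyDocsSize SET @intCounter = NewPK = @intCounter + 1

        SELECT strComputer, strATUUser, PKtoKeep = MAX(NewPK)
        INTO   #dupes
        FROM   dbo.tblMyDocsSize
        GROUP BY strComputer, strATUUser
        HAVING COUNT(*) > 1

        DELETE a
        FROM   dbo.tblMyDocsSize a
        JOIN   #dupes d ON d.strComputer = a.strComputer AND d.strATUUser = a.strATUUser
        WHERE  a.NewPK NOT IN (SELECT PKtoKeep FROM #dupes)
    ')

    EXEC ('ALTER TABLE dbo.tblMyDocsSize DROP COLUMN NewPK')
END

If the SQL Server Agent service is available it can run the procedure as a daily job; otherwise a Windows Scheduled Task calling osql -E -Q "EXEC dbo.usp_DedupeMyDocsSize" does the same.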
I have tried joining several tables and the result displays duplicate rows of virtually every line/row. I have tried using DISTINCT but this didn't work. I know it could be because several columns from some of the tables are named the same.
I have a database with many tables. I would like to Delete all rows with practiceID=55 from all Parents tables and all corresponding rows from its child tables. Tables are linked with foreign key constraints (but there is no ON DELETE CASCADE).
How to write a generalized code for removing rows from both parent and child tables.
Query should pick parent table one by one and delete rows with practiceID=55 and all corresponding rows from its child tables
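There's no built-in "delete with children" short of adding ON DELETE CASCADE to the constraints, but the foreign-key metadata can drive a generic script. A sketch that generates the child deletes for one parent table (it assumes single-column foreign keys and only one level of children; deeper hierarchies would need the same idea applied bottom-up, and 'ParentTableName' is a placeholder):

DECLARE @sql nvarchar(max);
SET @sql = N'';

-- Build one DELETE per child table that references the parent, then delete the parent rows.
SELECT @sql = @sql +
    N'DELETE c FROM ' + QUOTENAME(SCHEMA_NAME(ct.schema_id)) + N'.' + QUOTENAME(ct.name) + N' c' +
    N' JOIN ' + QUOTENAME(SCHEMA_NAME(pt.schema_id)) + N'.' + QUOTENAME(pt.name) + N' p' +
    N' ON c.' + QUOTENAME(cc.name) + N' = p.' + QUOTENAME(pc.name) +
    N' WHERE p.practiceID = 55;' + CHAR(10)
FROM sys.foreign_key_columns fkc
JOIN sys.tables  ct ON ct.object_id = fkc.parent_object_id          -- referencing (child) table
JOIN sys.columns cc ON cc.object_id = fkc.parent_object_id  AND cc.column_id = fkc.parent_column_id
JOIN sys.tables  pt ON pt.object_id = fkc.referenced_object_id      -- referenced (parent) table
JOIN sys.columns pc ON pc.object_id = fkc.referenced_object_id AND pc.column_id = fkc.referenced_column_id
WHERE pt.name = 'ParentTableName';

SET @sql = @sql + N'DELETE FROM ParentTableName WHERE practiceID = 55;';

PRINT @sql;              -- review the generated statements first
-- EXEC sp_executesql @sql;

Wrapping that in a loop over the list of parent tables gives the "pick parent tables one by one" behaviour described above.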
RID, RType, GID
001, m, g01
002, m, g01
002, m, g02
002, m, g03
003, m, g01
003, m, g03
a, T, g01
a, T, g02
a, T, g03
b, T, g02
b, T, g03
b, T, g04
4. Group
GID
g01
g02
g03
g04
I'd like to find the record in table #1 "Matter" which has exactly the same set of "GID" records in table #3 "Security Assignment" as a record in table #2 "Category".

In this case, it is record "002", because "002" in table #1 "Matter" and record "a" in table #2 "Category" both have exactly the same GID records (g01, g02, g03) in table #3 "Security Assignment".

How can I create a query to find all the possible records in table #2?
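This is exact relational division: a Matter and a Category match when they are assigned exactly the same set of GIDs. A sketch, assuming the assignments shown above live in a single SecurityAssignment(RID, RType, GID) table with RType 'm' for matters and 'T' for categories, and no duplicate (RID, GID) rows:

-- Pair every matter with every category that shares all of its GIDs and has none extra.
SELECT m.RID AS MatterRID, c.RID AS CategoryRID
FROM (SELECT RID, GID FROM SecurityAssignment WHERE RType = 'm') m
JOIN (SELECT RID, GID FROM SecurityAssignment WHERE RType = 'T') c
  ON m.GID = c.GID
GROUP BY m.RID, c.RID
HAVING COUNT(*) = (SELECT COUNT(*) FROM SecurityAssignment s1 WHERE s1.RType = 'm' AND s1.RID = m.RID)
   AND COUNT(*) = (SELECT COUNT(*) FROM SecurityAssignment s2 WHERE s2.RType = 'T' AND s2.RID = c.RID);

With the sample data this returns the single pair (002, a); dropping the second HAVING condition would instead list categories whose GID set merely covers the matter's.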
One table must be deleted based on a filter (I mean the table is not deleted completely, but only some records), and I would like to delete the same records in the second table. For example: table 1 pk: 1,2,3,4,5; table 2 pk: 1,2,3,4,5. If 1, 2 and 3 are deleted from table 1, then pk 1, 2 and 3 must also be deleted in table 2. At the end of the process Table1 and Table2 must have the same records (always, also in the case of failures, errors and so on).

The target is to avoid using triggers. OUTPUT is not useful because it only writes out what is deleted (or maybe it is useful, but how do I use it?).
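OUTPUT actually fits well here: capture the deleted keys from the first table into a table variable and use them to delete from the second, all inside one transaction so both tables stay in sync even when something fails. A sketch with assumed names Table1(pk)/Table2(pk) and an assumed filter:

BEGIN TRY
    BEGIN TRAN;

    DECLARE @deleted TABLE (pk int PRIMARY KEY);

    -- Delete from the first table and remember which keys went away.
    DELETE FROM Table1
    OUTPUT deleted.pk INTO @deleted
    WHERE SomeFilterColumn = 'some value';     -- assumed filter

    -- Remove exactly the same keys from the second table.
    DELETE t2
    FROM Table2 t2
    JOIN @deleted d ON d.pk = t2.pk;

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;          -- neither table is changed on error
    DECLARE @msg nvarchar(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR (@msg, 16, 1);
END CATCH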
I need to get the replacement records between the 2 tables. I have table A and table B with the same structure. I have 5 fields. Table A has 50,000 records and table B has 20,000 records. The fields are id, name, address, meter_flag, end_date.
Some of the records in Table B are just replacement records of table A. I mean for example I have records like this in Table A
id name address meter_flag end_date
23 john 1201 salt lake dr no 2011-12-28
24 tom 1222 gibson ln yes 2011-12-16
25 alex 1334 qayak dr no 2011-12-17
In Table B
23 john 1344 mc kinney st yes 2011-12-18
24 tom 1222 gibson ln yes 2011-12-16
56 gary 1335 pruitt rd no 2011-12-18
25 alex 1334 qayak dr no 2011-12-17
So here in Table B I have an update for john with id 23 from Table A: the address field has changed and the meter_flag has changed to yes. There is also a new record, with id 56, in Table B that is not in Table A. So I need to find all these difference records by querying these 2 tables.
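One compact way to pull both kinds of difference (a sketch, assuming the two tables are literally named TableA and TableB with the five columns listed above):

-- Rows in B that are new or changed relative to A.
SELECT b.id, b.name, b.address, b.meter_flag, b.end_date,
       CASE WHEN a.id IS NULL THEN 'new' ELSE 'changed' END AS change_type
FROM TableB b
LEFT JOIN TableA a ON a.id = b.id
WHERE a.id IS NULL                                            -- id exists only in B
   OR EXISTS (SELECT b.name, b.address, b.meter_flag, b.end_date
              EXCEPT
              SELECT a.name, a.address, a.meter_flag, a.end_date);   -- some column differs (NULL-safe)

Rows that are identical in both tables (like tom and alex) drop out automatically; a plain SELECT ... FROM TableB EXCEPT SELECT ... FROM TableA gives the same changed/new rows without the change_type label.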