Merging Duplicate Rows
Jul 23, 2005
Hello All,
I have an issue with duplicate Contact data. Here it is:
I have a Contacts table:
CREATE TABLE CONTACTS
(
SSN int,
fname varchar(40),
lname varchar(40),
address varchar(40),
city varchar(40),
state varchar(2),
zip int
)
Here is some sample data:
Row 1:
SSN: 1112223333
FNAME: FRANK
LNAME: WHALEY
ADDRESS: NULL
CITY: NULL
STATE: NY
ZIP: 10033
Row 2:
SSN: 1112223333
FNAME: NULL
LNAME: WHALEY
ADDRESS: 100 MADISON AVE
CITY: NEW YORK
STATE: NY
ZIP: NULL
How do I merge the two rows into one row like the following, using SQL or T-SQL?
SSN: 1112223333
FNAME: FRANK
LNAME: WHALEY
ADDRESS: 100 MADISON AVE
CITY: NEW YORK
STATE: NY
ZIP: 10033
Pointers appreciated.
Thanks
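A common answer is to group on the key and take MAX() of every other column: aggregate functions skip NULLs, so each column picks up whichever row supplied a value. The sketch below runs that query against SQLite through Python's built-in sqlite3 module so it is self-contained; the SELECT itself is plain SQL and should also run in T-SQL. It assumes SSN identifies the person and that duplicate rows never hold two different non-NULL values for the same column (if they do, MAX simply keeps the larger one).

```python
import sqlite3


def merge_contacts(conn):
    # MAX() ignores NULLs, so for each SSN it returns the non-NULL value
    # contributed by any of the duplicate rows.
    return conn.execute(
        """
        SELECT SSN, MAX(fname), MAX(lname), MAX(address),
               MAX(city), MAX(state), MAX(zip)
        FROM CONTACTS
        GROUP BY SSN
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CONTACTS (SSN int, fname varchar(40), lname varchar(40), "
    "address varchar(40), city varchar(40), state varchar(2), zip int)"
)
conn.executemany(
    "INSERT INTO CONTACTS VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1112223333, "FRANK", "WHALEY", None, None, "NY", 10033),
        (1112223333, None, "WHALEY", "100 MADISON AVE", "NEW YORK", "NY", None),
    ],
)
print(merge_contacts(conn))
# [(1112223333, 'FRANK', 'WHALEY', '100 MADISON AVE', 'NEW YORK', 'NY', 10033)]
```

To replace the originals, one option is to insert this result into a new table and swap it in for CONTACTS.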
Apr 21, 2014
We have a data warehouse staging database in which we capture change history for hundreds of tables from a source system. In the source system, records are updated in place, but in our data warehouse we capture these changes by "terminating" the existing record and adding a new record reflecting the changes. In the data warehouse we add two columns to every table -- effective_date and expiration_date -- which indicate the dates the record was in effect in the source system. By convention, an expiration_date of 6/6/2079 means the record is currently still active in the source system. Each day we simply compare yesterday's version of the record (in the data warehouse) against today's version (in the source system). If differences are found in any of the columns, we terminate the record and add a new one, setting those dates appropriately.
In this example, the employee_id column is the natural key in the source system. We add the effective_date and expiration_date in the data warehouse, so those three columns together make up the key in the data warehouse. The employee_name, employee_dept, and last_login_date columns all come from the source system as well.
drop table mytbl
create table mytbl (
effective_date smalldatetime,
expiration_date smalldatetime,
employee_id int,
employee_name varchar(30),
[code]....
In the select output, you can follow the trail of changes for each of these three employees. Bob moved from dept 7 to 8 at some point; Frank didn't change departments at all; Cheryl moved from dept 6 to 9 and later back to 6. However, the last_login_date was updated frequently for all these employees.
We've tracked hundreds of tables this way for years, some with hundreds of columns. For optimization purposes, I'm now interested in trimming the fat a bit. That is, we track changes in many columns that we don't really need in our data warehouse. Some of these columns change rapidly, causing all sorts of unnecessary terminate/inserts in the data warehouse. My goal is to remove these columns, reclaim the disk space and speed up the ETL. So in this example, let's get rid of the last_login_date column.
alter table mytbl
drop column last_login_date
select *
from mytbl
order by employee_id, effective_date
Now in the select output, you can see we have many "effective duplicate" records. For example, nothing changed for Bob between 1/1/2014 and 1/31/2014 -- those really should be one record, not three. Here's the challenge: I'm looking for an efficient way to merge these "effective duplicates" together, through set-based SQL updates/deletes/inserts (hoping to avoid any RBAR operations). Here's what the table ultimately should look like (cheating to get there):
create table mytbl2 (
effective_date smalldatetime,
expiration_date smalldatetime,
employee_id int,
employee_name varchar(30),
employee_dept int
[code]...
Note that Bob only has two records (he changed department), Frank only has one record (no changes), and Cheryl has three records (two department changes).
My inclination would be to drop the unwanted columns, then GROUP BY all the remaining columns from the source system, taking the MIN effective_date and MAX expiration_date. However, this doesn't work for cases like Cheryl's: she moved to another department, then back again, so that change history needs to be retained.
As I mentioned, we have hundreds of tables, and I'd like to strip out dozens (maybe hundreds) of unused columns, so ultimately there will be millions of these pseudo-duplicates that need to be merged together. These are huge tables, so I really need to find an efficient set-based approach to this.
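This is usually called a "gaps and islands" problem: flag each row whose tracked columns differ from the employee's previous version, turn the flags into a running group number, then collapse each group with MIN(effective_date) and MAX(expiration_date). Unlike a plain GROUP BY, this keeps Cheryl's move back to dept 6 as its own record. The sketch runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) with invented sample dates. SQL Server 2012 and later also have LAG and running SUM() OVER (... ORDER BY ...), but SQLite's NULL-safe `IS NOT` comparison would need rewriting there (SQL Server 2022 adds `IS DISTINCT FROM`). It has not been tried at the poster's scale.

```python
import sqlite3

# Hypothetical history rows (dates invented) mirroring the post: Bob moves
# from dept 7 to 8, Frank never changes, Cheryl goes 6 -> 9 -> 6.
# last_login_date is already gone, so runs of identical rows remain.
ROWS = [
    ("2014-01-01", "2014-01-10", 1, "Bob", 7),
    ("2014-01-10", "2014-01-20", 1, "Bob", 7),
    ("2014-01-20", "2014-01-31", 1, "Bob", 7),
    ("2014-01-31", "2079-06-06", 1, "Bob", 8),
    ("2014-01-01", "2014-01-15", 2, "Frank", 5),
    ("2014-01-15", "2079-06-06", 2, "Frank", 5),
    ("2014-01-01", "2014-01-05", 3, "Cheryl", 6),
    ("2014-01-05", "2014-01-12", 3, "Cheryl", 9),
    ("2014-01-12", "2014-01-20", 3, "Cheryl", 9),
    ("2014-01-20", "2079-06-06", 3, "Cheryl", 6),
]

MERGE_SQL = """
WITH flagged AS (
    SELECT *,
           -- 1 when a tracked column differs from the previous version
           -- (IS NOT is SQLite's NULL-safe "not equal"); normally also 1
           -- for each employee's first row, where LAG() returns NULL
           CASE WHEN (LAG(employee_name) OVER (PARTITION BY employee_id
                                               ORDER BY effective_date)) IS NOT employee_name
                  OR (LAG(employee_dept) OVER (PARTITION BY employee_id
                                               ORDER BY effective_date)) IS NOT employee_dept
                THEN 1 ELSE 0 END AS is_new
    FROM mytbl
),
islands AS (
    -- running total of the flags numbers each unbroken run of identical rows
    SELECT *,
           SUM(is_new) OVER (PARTITION BY employee_id
                             ORDER BY effective_date) AS island
    FROM flagged
)
SELECT MIN(effective_date), MAX(expiration_date),
       employee_id, employee_name, employee_dept
FROM islands
GROUP BY employee_id, island, employee_name, employee_dept
ORDER BY employee_id, MIN(effective_date)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytbl (effective_date TEXT, expiration_date TEXT, "
    "employee_id INTEGER, employee_name TEXT, employee_dept INTEGER)"
)
conn.executemany("INSERT INTO mytbl VALUES (?, ?, ?, ?, ?)", ROWS)
merged = conn.execute(MERGE_SQL).fetchall()
for row in merged:
    print(row)
```

The query only produces the merged set; loading it into a new table and swapping it in is one way to apply it without row-by-row work.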
Jun 28, 2006
Say I have a table with the columns (and example data):
CustomerNo, ContactNo, ActivityNo
null, null, 1
100, null, 1
null, 666, 1
null, null, 2
200, null, 2
null, 777, 2
From this I would like to get the result:
CustomerNo, ContactNo, ActivityNo
100, 666, 1
200, 777, 2
How do I solve this? I'm getting grey hair here...
Jan 20, 2008
Hi All,
I have a query that I'm working on, but instead of giving the query, I wanted to ask a basic syntax question. If more info is needed, let me know. If you have 2 rows that have a common relationship, but differing information in some fields, can you merge them all onto one row? I've done this with Sum(case) expressions, but I don't want to 'add' anything. In the following example, the ActivityID refers to a break. ActivityID can be:
0=Pick up
1=Drop Off
2=Lunch
3=Break
So if I wanted to see two breaks on one row, as in the following example, would this be possible?
Current rows:
Veh ActID ArrTime DepTime
1 3 7:00 8:00
1 3 10:00 11:00
Desired result:
Veh ActID ArrTime DepTime ArrTime DepTime
1 3 7:00 8:00 10:00 11:00
Thanks in advance for your help!
Craig
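Conditional aggregation still works here, with MAX instead of SUM so nothing gets added: number each break per vehicle with ROW_NUMBER(), then pick the first and second into separate columns. A minimal sketch on SQLite via Python's sqlite3 module; the table name stops is hypothetical, and it assumes at most two breaks per vehicle (a third would need two more columns). ROW_NUMBER() is also available in SQL Server 2005 and later.

```python
import sqlite3

PIVOT_SQL = """
WITH numbered AS (
    SELECT Veh, ActID, ArrTime, DepTime,
           -- 1 for the earliest activity of this type per vehicle, 2 for the next
           ROW_NUMBER() OVER (PARTITION BY Veh, ActID ORDER BY ArrTime) AS n
    FROM stops
)
SELECT Veh, ActID,
       MAX(CASE WHEN n = 1 THEN ArrTime END) AS ArrTime1,
       MAX(CASE WHEN n = 1 THEN DepTime END) AS DepTime1,
       MAX(CASE WHEN n = 2 THEN ArrTime END) AS ArrTime2,
       MAX(CASE WHEN n = 2 THEN DepTime END) AS DepTime2
FROM numbered
GROUP BY Veh, ActID
"""

conn = sqlite3.connect(":memory:")
# Times are zero-padded 'HH:MM' text so they sort correctly as strings;
# a real TIME/DATETIME column avoids that concern.
conn.execute("CREATE TABLE stops (Veh INTEGER, ActID INTEGER, ArrTime TEXT, DepTime TEXT)")
conn.executemany("INSERT INTO stops VALUES (?, ?, ?, ?)",
                 [(1, 3, "07:00", "08:00"), (1, 3, "10:00", "11:00")])
rows = conn.execute(PIVOT_SQL).fetchall()
print(rows)  # [(1, 3, '07:00', '08:00', '10:00', '11:00')]
```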
Oct 20, 2006
Hello everyone!
Maybe you can help me out:
I have two tables, A and B. Table A has two fields, t1 and t2, each holding a number that refers to the primary key of Table B.
What I want is a query that returns:
t1, t2, B.description_of_t1, B.description_of_t2
Can anyone help me?
Thanks in advance...
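The usual approach is to join B twice under two different aliases, once per foreign-key column. A minimal sketch on SQLite via Python's sqlite3 module; the column names id and description in B are hypothetical. Switching to LEFT JOIN keeps rows where t1 or t2 is NULL or has no match in B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names: B(id, description), A(t1, t2).
conn.executescript("""
CREATE TABLE B (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE A (t1 INTEGER, t2 INTEGER);
INSERT INTO B VALUES (1, 'first item'), (2, 'second item');
INSERT INTO A VALUES (1, 2);
""")

rows = conn.execute("""
    SELECT A.t1, A.t2,
           b1.description AS description_of_t1,  -- B joined once for t1
           b2.description AS description_of_t2   -- and again for t2
    FROM A
    JOIN B AS b1 ON b1.id = A.t1
    JOIN B AS b2 ON b2.id = A.t2
""").fetchall()
print(rows)  # [(1, 2, 'first item', 'second item')]
```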
Mar 6, 2008
Hi all,
I'm facing the following problem:
TextData ObjectID SPID StartTime EndTime
------------------------------------------------------------------------------------------------------------
Select 1 111111111 52 2008-03-06 11:19:51.250 NULL
Select 1 111111111 52 NULL 2008-03-06 11:19:51.250
I want to achieve this result with either an UPDATE statement or a SELECT query:
TextData ObjectID SPID StartTime EndTime
------------------------------------------------------------------------------------------------------------
Select 1 111111111 52 2008-03-06 11:19:51.250 2008-03-06 11:19:51.250
Is this possible? There is no primary key.
Thanks!
Rgds,
Worf
Jul 15, 2007
It doesn't appear possible to merge cells by column in RS 2005. Am I missing some more advanced feature, or is it a great big oversight on MS's part?
I basically want a three-column table, with the rows in the first column merged and the text turned on its side, reading bottom to top.
Pretty ordinary kind of thing to do, I would have thought.
Mar 5, 2012
I'm using a shipping program called Endicia Professional that allows database manipulation to make my processing easier. I've managed to fix the database here and there, but I've had an issue combining orders from a single customer when they buy more than one item. Ideally I would like it to combine rows when a customer purchases items going to the same address. To avoid combining different customers whose address line is the same (e.g. two people who live in the same apartment complex), I thought we could use additional qualifiers: each purchase has a name and an order ID, which should be unique enough.
Order-id name address sku
1234 John 46 easy ln. A27
1234 John 46 easy ln. B32
Results:
Order-id name address sku
1234 John 46 easy ln. A27,b32
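This is string aggregation: group on the qualifiers and concatenate the SKUs. SQLite calls the aggregate group_concat; in SQL Server the equivalent is STRING_AGG (2017 and later) or the older STUFF(... FOR XML PATH('')) pattern. A minimal sketch on SQLite via Python's sqlite3 module; the table and column names (orders, order_id, name, address, sku) are assumptions standing in for the Endicia schema. group_concat does not guarantee the order of the SKUs within the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, name TEXT, address TEXT, sku TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1234, "John", "46 easy ln.", "A27"),
    (1234, "John", "46 easy ln.", "B32"),
])

# Grouping on order id AND name AND address keeps two customers who share
# an address apart, as long as their order ids differ.
rows = conn.execute("""
    SELECT order_id, name, address, group_concat(sku, ',') AS skus
    FROM orders
    GROUP BY order_id, name, address
""").fetchall()
print(rows)
```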
Apr 29, 2004
Hi, I'm having trouble with this. It seems simple enough, but it's not!
I have a source table called Access_table, for example:
Name Role1 Role2 Role3 Role4 Role5
a 1 0 0 0 0
a 0 0 1 0 0
b 1 0 0 0 0
c 0 1 0 0 0
d 0 0 0 0 1
e 0 0 1 0 0
e 0 1 0 0 0
f 1 0 0 0 0
g 0 0 1 0 0
I need to create a view that basically finds all the names with double Roles and merge the results into 1 row example.
Name Role1 Role2 Role3 Role4 Role5
a 1 0 1 0 0
e 0 1 1 0 0
I cannot change the information in the source table and the results need to be in a view as the roles will change. Every time I try and do this I duplicate the row again. Can anybody suggest a solution.
Thanks in advance.
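Grouping by Name and taking MAX() of each role flag collapses the rows (1 beats 0), and HAVING COUNT(*) > 1 keeps only names that appear more than once. Because a view reads the source table each time it is queried, it follows role changes without touching Access_table. A minimal sketch on SQLite via Python's sqlite3 module using the sample rows from the post; the view name double_roles is made up, and the CREATE VIEW itself should also work in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Access_table (Name TEXT, Role1 INT, Role2 INT, "
             "Role3 INT, Role4 INT, Role5 INT)")
conn.executemany("INSERT INTO Access_table VALUES (?, ?, ?, ?, ?, ?)", [
    ("a", 1, 0, 0, 0, 0), ("a", 0, 0, 1, 0, 0), ("b", 1, 0, 0, 0, 0),
    ("c", 0, 1, 0, 0, 0), ("d", 0, 0, 0, 0, 1), ("e", 0, 0, 1, 0, 0),
    ("e", 0, 1, 0, 0, 0), ("f", 1, 0, 0, 0, 0), ("g", 0, 0, 1, 0, 0),
])

conn.execute("""
    CREATE VIEW double_roles AS
    SELECT Name,
           MAX(Role1) AS Role1, MAX(Role2) AS Role2, MAX(Role3) AS Role3,
           MAX(Role4) AS Role4, MAX(Role5) AS Role5
    FROM Access_table
    GROUP BY Name
    HAVING COUNT(*) > 1   -- only names that appear on more than one row
""")
rows = conn.execute("SELECT * FROM double_roles ORDER BY Name").fetchall()
print(rows)  # [('a', 1, 0, 1, 0, 0), ('e', 0, 1, 1, 0, 0)]
```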
Jul 20, 2005
I need to populate a table from several sources of raw data. For a given security (stock) it is possible to receive only PARTS of information from each of the different sources. It is also possible to have conflicting data.
I am looking to make a composite picture of a given security using the following rules:
1) The goal is to replace all NULL and Blank values with data.
2) Order of precedence (from highest to lowest) is Non-NULL Non-Blank --> Blank --> NULL.
3) In the case of Non-NULL Non-Blank values that conflict (are different), leave the existing value (even if NULL or Blank).
For example, given the following rows:
Symbol  Identity    IdSource  Exchange  Type    SubType  Name
------  ----------  --------  --------  ------  -------  -----------------
TZA     901145102   CUSIP     XNYS      Stock   NULL     TV AZTECA
TZA     901145102   NULL      NULL      NULL    NULL
WSM     969904101   CUSIP     XNYS      Stock   NULL     WILLIAMSSONOMA
WSM     969904101   NULL      XNYS      Stock   NULL     WILLIAMS-SONOMA
WSM                 CUSIP     XNYS      Stock   Common   NULL
WSM     NULL        CUSIP     XASE      Stock   NULL     WILLIAMSSONOMA
TYC     902124106   CUSIP     XNYS      Stock   NULL     TYCO
TYC     902124106   CUSIP     XNYS      Stock   NULL     TYCOINTERNATIONAL
I am looking for the following results ('*' indicates a changed value):
Symbol  Identity    IdSource  Exchange  Type    SubType  Name
------  ----------  --------  --------  ------  -------  -----------------
TZA     901145102   CUSIP     XNYS      Stock   NULL     TV AZTECA
TZA     901145102   *CUSIP    *XNYS     *Stock  NULL     *TV AZTECA
WSM     969904101   CUSIP     XNYS      Stock   *Common  WILLIAMSSONOMA
WSM     969904101   *CUSIP    XNYS      Stock   *Common  WILLIAMS-SONOMA
WSM     *969904101  CUSIP     NULL      Stock   Common   NULL
WSM     *969904101  CUSIP     XASE      Stock   *Common  WILLIAMSSONOMA
TYC     902124106   CUSIP     XNYS      Stock   NULL     TYCO
TYC     902124106   CUSIP     XNYS      Stock   NULL     TYCOINTERNATIONAL
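One set-based way to apply rules 1 and 3 is one UPDATE per column: fill a NULL or blank cell only when the rows for that Symbol hold exactly one distinct non-blank value, so conflicting values (the two WSM names, the two WSM exchanges) are left alone. A minimal sketch on SQLite via Python's sqlite3 module using the input rows above; the table name securities is assumed, and the "Blank beats NULL" part of rule 2 (turning a NULL into a blank when no real value exists) is not handled. The correlated subqueries are standard SQL, so a T-SQL UPDATE could follow the same shape.

```python
import sqlite3

# Columns that may be completed from other rows with the same Symbol.
# Fixed whitelist, so building the SQL with an f-string is safe here.
FILL_COLUMNS = ["Identity", "IdSource", "Exchange", "Type", "SubType", "Name"]


def fill_from_siblings(conn):
    for col in FILL_COLUMNS:
        conn.execute(f"""
            UPDATE securities
            SET {col} = (SELECT MAX(NULLIF(t.{col}, ''))
                         FROM securities AS t
                         WHERE t.Symbol = securities.Symbol)
            WHERE ({col} IS NULL OR {col} = '')
              -- rule 3: fill only if exactly one distinct non-blank value exists
              AND (SELECT COUNT(DISTINCT NULLIF(t.{col}, ''))
                   FROM securities AS t
                   WHERE t.Symbol = securities.Symbol) = 1
        """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE securities (Symbol TEXT, Identity TEXT, IdSource TEXT, "
             "Exchange TEXT, Type TEXT, SubType TEXT, Name TEXT)")
conn.executemany("INSERT INTO securities VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("TZA", "901145102", "CUSIP", "XNYS", "Stock", None, "TV AZTECA"),
    ("TZA", "901145102", None, None, None, None, ""),
    ("WSM", "969904101", "CUSIP", "XNYS", "Stock", None, "WILLIAMSSONOMA"),
    ("WSM", "969904101", None, "XNYS", "Stock", None, "WILLIAMS-SONOMA"),
    ("WSM", "", "CUSIP", "XNYS", "Stock", "Common", None),
    ("WSM", None, "CUSIP", "XASE", "Stock", None, "WILLIAMSSONOMA"),
    ("TYC", "902124106", "CUSIP", "XNYS", "Stock", None, "TYCO"),
    ("TYC", "902124106", "CUSIP", "XNYS", "Stock", None, "TYCOINTERNATIONAL"),
])
fill_from_siblings(conn)
rows = conn.execute("SELECT * FROM securities ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```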
Dec 15, 2014
I am still fairly new to SQL, and I have been tasked with creating a CSV file from data now that someone else has left.
I can do the CSV export using sqlcmd, and I have the query sorted and am pulling out the right data, but it generates two rows, because one of the tables has multiple records per cardholder. See the query below. I think there is a way of doing this with XML PATH, but it has got me slightly confused.
set nocount on
select dbo.card.EncodedNumber,
dbo.card.IsEnabled,
dbo.Cardholder.FirstName,
dbo.cardholder.LastName,
dbo.card.ExpiryTime,
dbo.PersonalDataString.Value
[Code] .....
Aug 18, 2015
I have 2 columns (ID, Msg_text) in a table where I need to combine every 3 rows into a single row. What would be the best option? I know that by using STUFF and XML PATH I can convert all the rows into a single row, but here I'm looking to turn every 3 rows into a single row.
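One way is to number the rows with ROW_NUMBER() and group on that number divided by 3 (integer division), then concatenate within each group. A minimal sketch on SQLite via Python's sqlite3 module with made-up rows; the table name messages is hypothetical. In SQL Server the per-group concatenation would be STRING_AGG (2017 and later) or the STUFF/FOR XML PATH pattern the post mentions. group_concat does not guarantee the order inside each string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (ID INTEGER, Msg_text TEXT)")
# Made-up rows; the gaps in ID (no 4 or 7) show why ROW_NUMBER() is used
# instead of dividing the ID itself.
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (5, "d"), (6, "e"), (8, "f"), (9, "g")])

rows = conn.execute("""
    WITH numbered AS (
        SELECT ID, Msg_text,
               ROW_NUMBER() OVER (ORDER BY ID) - 1 AS rn   -- 0, 1, 2, ...
        FROM messages
    )
    SELECT MIN(ID) AS first_id,
           group_concat(Msg_text, ' ') AS combined
    FROM numbered
    GROUP BY rn / 3          -- integer division: rows 0-2, 3-5, 6-8, ...
    ORDER BY first_id
""").fetchall()
print(rows)
```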
Jan 14, 2015
I have this query and it works, except that I am getting duplicate primary keys with different column values. I want to combine them so that I have one row per primary key but keep all the columns. Example:
Key column 1 column 2 column 3 column 4
A 1 1
A 2 2
B 2 3
B 5 5
It should look like:
A 1 1 2 2
B 2 3 5 5
Here is my query:
SELECT *
FROM [TLC Inventory].dbo.['2014 new$']
WHERE [TLC Inventory].dbo.['2014 new$'].mis_key LIKE '2%'
AND dbo_Product_Info#description NOT LIKE 'NR%'
AND dbo_Line_Info#description NOT LIKE 'OBSOLETE%'
Do I use a sum function?
Jul 8, 2015
I have table data as shown below.
IDFNLN
1x
1y
2a
2b
3g
4t
I want output as shown below.
IDFNLN
1xy
2ab
3g
4t
I want the two duplicate rows to be merged into one. How can I achieve this?
Aug 18, 2014
SQL 2012
I have a source table in the staging database stg.fact and it needs to be merged into the warehouse table whs.Fact.
stg.fact is not a delta feed; it is basically an intra-day refresh.
Both tables have a last updated date, so it's easy to see which rows have changed.
It is only new (insert) or changed (update) data that I am interested in; there are no deletions.
As this could be millions of inserted or updated rows, this needs to be efficient.
I expect whs.Fact to go to >150 million rows.
When I have done this before, I started with a T-SQL MERGE statement, and that did not perform well once I got to this size.
My original option was to do this in SSIS with a Lookup transformation that marks the inserts and updates, and then deal with them separately. However I set up the Lookup transformation, the reference data set needs a package variable in its SQL command, and this does not seem possible with the Lookup in 2012! I am currently looking at the Merge Join transformation and at any clever basic T-SQL that could work, as this will need to be fast, and that's where I think T-SQL may be the better route.
Both tables will have >100,000,000 rows
Both tables have the last updated date
The Tables are in different databases but on the same SQL Instance
Each table holds 5 integer columns, one varchar and one datetime.
Last time I used Merge it was a wider table with lots of columns so don't know if this would be an option.
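A common alternative to MERGE is to split the work in two: UPDATE the rows whose staging last-updated date is newer, then INSERT the keys the warehouse does not have yet. The sketch runs on SQLite via Python's sqlite3 module with hypothetical narrow stand-ins for stg.fact and whs.Fact (fact_id, amount, last_updated). In T-SQL the update would normally be written as UPDATE w SET ... FROM whs.Fact AS w JOIN ... AS s ON ..., using three-part names because the tables live in different databases on the same instance. Whether this beats MERGE at 150 million rows depends on indexing the key and last-updated date and on batching, which the sketch does not address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for stg.fact and whs.Fact.
conn.executescript("""
CREATE TABLE stg_fact (fact_id INTEGER PRIMARY KEY, amount INTEGER, last_updated TEXT);
CREATE TABLE whs_fact (fact_id INTEGER PRIMARY KEY, amount INTEGER, last_updated TEXT);
INSERT INTO whs_fact VALUES (1, 10, '2014-08-01'), (2, 20, '2014-08-01');
INSERT INTO stg_fact VALUES (1, 10, '2014-08-01'), (2, 25, '2014-08-18'),
                            (3, 30, '2014-08-18');
""")


def upsert(conn):
    with conn:  # commit both steps together
        # 1) Update rows present in both tables that are newer in staging.
        conn.execute("""
            UPDATE whs_fact
            SET amount = (SELECT s.amount FROM stg_fact AS s
                          WHERE s.fact_id = whs_fact.fact_id),
                last_updated = (SELECT s.last_updated FROM stg_fact AS s
                                WHERE s.fact_id = whs_fact.fact_id)
            WHERE EXISTS (SELECT 1 FROM stg_fact AS s
                          WHERE s.fact_id = whs_fact.fact_id
                            AND s.last_updated > whs_fact.last_updated)
        """)
        # 2) Insert staging rows whose key is not in the warehouse yet.
        conn.execute("""
            INSERT INTO whs_fact (fact_id, amount, last_updated)
            SELECT s.fact_id, s.amount, s.last_updated
            FROM stg_fact AS s
            WHERE NOT EXISTS (SELECT 1 FROM whs_fact AS w WHERE w.fact_id = s.fact_id)
        """)


upsert(conn)
print(conn.execute("SELECT * FROM whs_fact ORDER BY fact_id").fetchall())
# [(1, 10, '2014-08-01'), (2, 25, '2014-08-18'), (3, 30, '2014-08-18')]
```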
Nov 28, 2007
Dear Gurus,
I have a table with the following entries:
Table name = Customer
Name     Weight
-------  ------
Sanjeev  85
Sanjeev  75
Rajeev   80
Rajeev   45
Sandy    35
Sandy    30
Harry    15
Harry    45
I need output as follows:
Name     Weight
-------  ------
Sanjeev  85
Rajeev   80
Sandy    30
Harry    45
OR
Name     Weight
-------  ------
Sanjeev  75
Rajeev   45
Sandy    35
Harry    15
i.e. each distinct Name should be displayed only once, with only one value of Weight.
I tried 'group by' on the Name column, but it shows me all rows.
Could anyone help me with the above?
Thanking in advance.
Regards
Sanjeev
Jun 25, 2001
I used the following SELECT statement to get duplicate records on the case_number column:
select cases.distinct case_link, cases.case_number
from cases
group by case_link
having case_number > 1
I got these error messages:
'cases.warrant_number' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
AND
'cases.case_number' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
Any ideas for a better statement to use? Thanks for your help!
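Both errors come from the same rules: every non-aggregated column in the SELECT list must appear in GROUP BY, and HAVING filters groups, so it needs an aggregate such as COUNT(*). DISTINCT also belongs directly after SELECT, not after a table prefix. A minimal sketch of the usual duplicate check, run on SQLite via Python's sqlite3 module with made-up case numbers; the query itself should also run in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_link INTEGER, case_number TEXT)")
# Made-up rows: case_number C-100 appears twice.
conn.executemany("INSERT INTO cases VALUES (?, ?)",
                 [(1, "C-100"), (2, "C-100"), (3, "C-200")])

dupes = conn.execute("""
    SELECT case_number, COUNT(*) AS occurrences
    FROM cases
    GROUP BY case_number          -- one group per case number
    HAVING COUNT(*) > 1           -- keep groups with more than one row
""").fetchall()
print(dupes)  # [('C-100', 2)]
```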
Jun 29, 2001
Hi,
I have a table, and this is what I did to get the desired result:
Select A.col1,count(A.col1)
from Tab1 A
group by col1
having count(A.Col1) > 1
I tried this, but it didn't work; it returned col1 as blanks:
Select A.col1,B.Col2,count(A.col1)
from Tab1 A, Tab2 B
where A.col1 = B.col1
group by A.col1 , b.col2
having count(A.Col1) > 1
I was looking for all the rows that appear more than once.
Now the problem:
I have to join this table to another table Tab2 to get the other details.
My Tab2 is a table from which I have to pull the customer details, like name, address, etc.
How should I write this query?
Any thoughts?
TIA
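One approach is to find the duplicates in a derived table first and then join that small result to Tab2, so the rows of Tab2 cannot change the counts. A minimal sketch on SQLite via Python's sqlite3 module; the Tab2 columns name and address and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns: Tab2 holds the customer details keyed by col1.
conn.executescript("""
CREATE TABLE Tab1 (col1 TEXT);
CREATE TABLE Tab2 (col1 TEXT, name TEXT, address TEXT);
INSERT INTO Tab1 VALUES ('X'), ('X'), ('Y');
INSERT INTO Tab2 VALUES ('X', 'Alice', '1 Main St'), ('Y', 'Bob', '2 Elm St');
""")

rows = conn.execute("""
    SELECT d.col1, d.occurrences, t2.name, t2.address
    FROM (SELECT col1, COUNT(*) AS occurrences   -- count in Tab1 only
          FROM Tab1
          GROUP BY col1
          HAVING COUNT(*) > 1) AS d
    JOIN Tab2 AS t2 ON t2.col1 = d.col1           -- then fetch the details
""").fetchall()
print(rows)  # [('X', 2, 'Alice', '1 Main St')]
```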
Jul 20, 2006
Hi,
I want to know: how can I check whether I have duplicate rows in my table?
thanks
May 16, 2007
Hi. I'm a SQL Server newbie, very experienced with Access, developing an ASP.NET database editor web app. I query the database with a statement more or less in the following form:
SELECT organisations.OrgID, organisations.Name, organisations.whatever FROM services INNER JOIN servicegrouping ON services.serviceID=servicegrouping.serviceID INNER JOIN organisations ON servicegrouping.OrgID = organisations.OrgID WHERE services.service=x OR services.service=y
In other words, I have a database of organisations. The services offered by the organisations are in a separate table, and I only want to return organisations that offer services X or Y.
Okay, now if I did this in Access, this query would return just one record for each organisation that meets the condition, unless I was to include a field from the services table in the SELECT clause, in which case of course I would get one record for each organisation and unique service offered.
But in MS SQL, the query returns duplicate rows if the organisation offers more than one service that meets the WHERE condition (=x or =y). Why is this, and what do I need to do to my SQL statement to ensure I only get unique rows?
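An inner join returns one row per matching combination, so an organisation offering both X and Y comes back twice. SELECT DISTINCT hides that, but testing membership with EXISTS avoids producing the duplicates in the first place. A minimal sketch on SQLite via Python's sqlite3 module with made-up rows, where 'x' and 'y' stand in for the real service values; the EXISTS query should work unchanged in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisations (OrgID INTEGER, Name TEXT);
CREATE TABLE services (serviceID INTEGER, service TEXT);
CREATE TABLE servicegrouping (serviceID INTEGER, OrgID INTEGER);
INSERT INTO organisations VALUES (1, 'Org A'), (2, 'Org B');
INSERT INTO services VALUES (10, 'x'), (11, 'y'), (12, 'z');
-- Org A offers both x and y; Org B offers only z.
INSERT INTO servicegrouping VALUES (10, 1), (11, 1), (12, 2);
""")

# Plain join: one row per matching (organisation, service) pair.
joined = conn.execute("""
    SELECT organisations.OrgID, organisations.Name
    FROM services
    JOIN servicegrouping ON services.serviceID = servicegrouping.serviceID
    JOIN organisations ON servicegrouping.OrgID = organisations.OrgID
    WHERE services.service = 'x' OR services.service = 'y'
""").fetchall()

# EXISTS only asks whether at least one match exists, so each organisation appears once.
unique = conn.execute("""
    SELECT o.OrgID, o.Name
    FROM organisations AS o
    WHERE EXISTS (SELECT 1
                  FROM servicegrouping AS sg
                  JOIN services AS s ON s.serviceID = sg.serviceID
                  WHERE sg.OrgID = o.OrgID
                    AND s.service IN ('x', 'y'))
""").fetchall()
print(joined)  # [(1, 'Org A'), (1, 'Org A')]
print(unique)  # [(1, 'Org A')]
```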
Feb 6, 2008
Hi,
I've a query which gets a set of data from multiple tables -
select *
FROM A
inner JOIN q
ON (RIGHT(q.name,CHARINDEX('-',REVERSE(q.name))-1)= a.id)
inner JOIN t
ON (t.id = q.id)
inner JOIN s
ON (q.name = s.name )
inner join l
on (s.name = l.name
and t.name = l.name)
WHERE A.id = 764
and s.name = '764'
I get a repeated set of rows for each id. I have some 136 rows for each q.id (there are 6 q.ids, hence I get 816 rows instead of 136). These 136 rows are actually divided among these q.ids as follows:
id=5, 4 rows
id=6, 8 rows
id=7, 24 rows
id=8, 40 rows
id=10, 60 rows
total=136 rows
Let me know what I'm missing here
Thanks for your help!
Subha
Feb 1, 2008
Hello,
I have a question: what does a statement look like that finds duplicate rows and combines them?
I have a table named PRODUCTS with 3 columns: Cost, Stock, Part_number.
I need to find all Part_numbers that are duplicated, combine those rows into one, add their stock together in the new row, and take the average of their cost as the cost of the new row.
Please help me; I am stalled. I looked all over the internet and could not find anything, and I really need this for a project I cannot finish otherwise.
I have the following SQL statement:
SELECT part_number,
COUNT(part_number) AS NumOccurrences
FROM Products
GROUP BY Part_number
HAVING COUNT(part_number) > 1
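Grouping by Part_number with SUM(Stock) and AVG(Cost) produces the combined rows; replacing the originals then means saving that result, deleting the old rows and inserting the combined ones back. A minimal sketch on SQLite via Python's sqlite3 module with made-up rows (in T-SQL the first step would typically be SELECT ... INTO a temp table). AVG here is a plain average of the listed costs; if cost should be weighted by quantity, SUM(Cost * Stock) / SUM(Stock) is the usual alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Cost REAL, Stock INTEGER, Part_number TEXT)")
# Made-up rows: P1 is listed twice.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [(2.0, 5, "P1"), (4.0, 3, "P1"), (1.5, 10, "P2")])

conn.execute("""
    CREATE TEMP TABLE merged AS
    SELECT Part_number,
           SUM(Stock) AS Stock,   -- total stock across the duplicates
           AVG(Cost) AS Cost      -- simple (unweighted) average cost
    FROM Products
    GROUP BY Part_number
""")
with conn:  # commit the delete and the re-insert together
    conn.execute("DELETE FROM Products")
    conn.execute("INSERT INTO Products (Cost, Stock, Part_number) "
                 "SELECT Cost, Stock, Part_number FROM merged")

result = conn.execute("SELECT Part_number, Stock, Cost FROM Products "
                      "ORDER BY Part_number").fetchall()
print(result)  # [('P1', 8, 3.0), ('P2', 10, 1.5)]
```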
May 6, 2008
I have a CSV file that I need to import daily into a SQL Server 2005 table. Much of the table contents could just be overwritten with the new CSV file; however, there is a set of rows within the table that needs to be appended to rather than overwritten.
There is no Primary Key in the csv file that can be used.
I'm not sure this is the best approach, but what I have been trying to do, is append the entire csv file to the existing table, and then go back and delete the duplicates.
When I run the Delete, it does delete the majority of the records, but leaves a couple hundred behind. The number left behind varies with each run, can't seem to identify a pattern here. Running the Delete a second time does clean up the rows left behind in the first execution of the Delete, and gives the result I want.
Any thoughts as to why this needs to be run twice? Or is a better approach available?
Here is my code -
SELECT [Pkg ID], [Elm (s)], [Type Name (s)], [End Exec Date], [End Exec Time], dupcount=count(*)
INTO temppkgactions
FROM pkgactions
GROUP BY [Pkg ID], [Elm (s)], [Type Name (s)], [End Exec Date], [End Exec Time]
HAVING count(*) > 1
DELETE TOP (SELECT COUNT(*) -1 FROM dbo.temppkgactions WHERE dupcount > 1 )
FROM dbo.pkgactions
DROP TABLE temppkgactions
Thanks
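DELETE TOP (n) FROM dbo.pkgactions removes n arbitrary rows from the whole table rather than the extra copies within each duplicate group, which would explain why the leftovers vary between runs. One fix is to keep the first copy in each group and delete the rest. SQLite exposes a built-in rowid that makes this a single statement, so the sketch below uses it (via Python's sqlite3 module, with made-up rows). SQL Server has no rowid; there the usual equivalent is a CTE with ROW_NUMBER() OVER (PARTITION BY the five columns ORDER BY (SELECT NULL)) and a DELETE from that CTE where the row number is greater than 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite accepts the [bracketed] identifiers used in the post.
conn.execute("""CREATE TABLE pkgactions ([Pkg ID] INTEGER, [Elm (s)] TEXT,
                [Type Name (s)] TEXT, [End Exec Date] TEXT, [End Exec Time] TEXT)""")
row_a = (1, "E1", "Copy", "2008-05-05", "10:00")
row_b = (2, "E2", "Copy", "2008-05-05", "11:00")
conn.executemany("INSERT INTO pkgactions VALUES (?, ?, ?, ?, ?)",
                 [row_a, row_a, row_b, row_a, row_b])   # made-up duplicates

conn.execute("""
    DELETE FROM pkgactions
    WHERE rowid NOT IN (
        SELECT MIN(rowid)              -- first copy of each distinct row
        FROM pkgactions
        GROUP BY [Pkg ID], [Elm (s)], [Type Name (s)],
                 [End Exec Date], [End Exec Time])
""")
rows = conn.execute("SELECT * FROM pkgactions ORDER BY [Pkg ID]").fetchall()
print(rows)
```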
Apr 6, 2001
hi,
I want to delete duplicate rows in a table. Can anyone write the SQL for doing that?
Please help me with this.
Yours,
Vj
Aug 9, 2000
Hi,
I have a table with four columns: id, lastname, firstname, acctname. I have duplicate values in the three columns other than the id column, like:
ID FirstName LastName Acctname
1 john hopkins jh
2 john hopkins Jh
3 david webb dw
4 david webb dw
5 david webb dw
6 Dan Kennedy DK
I want to eliminate the duplicate rows. id can be any one of them.
Can anyone suggest a query with which I can do this?
Thanks in advance
Mohan
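Keep one ID per group (MIN(ID) here) and delete every other row. Rows 1 and 2 differ only in case (jh vs Jh): SQL Server's default collations are typically case-insensitive, so a plain GROUP BY there would already treat them as equal, whereas SQLite compares case-sensitively by default, so this sketch lowercases explicitly. It runs on SQLite via Python's sqlite3 module with the rows from the post; the table name accounts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table name "accounts" is hypothetical; rows are the ones from the post.
conn.execute("CREATE TABLE accounts (ID INTEGER, FirstName TEXT, LastName TEXT, Acctname TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", [
    (1, "john", "hopkins", "jh"), (2, "john", "hopkins", "Jh"),
    (3, "david", "webb", "dw"), (4, "david", "webb", "dw"),
    (5, "david", "webb", "dw"), (6, "Dan", "Kennedy", "DK"),
])

conn.execute("""
    DELETE FROM accounts
    WHERE ID NOT IN (
        SELECT MIN(ID)                 -- keep the lowest ID of each group
        FROM accounts
        -- lower() makes 'jh' and 'Jh' count as the same value
        GROUP BY lower(FirstName), lower(LastName), lower(Acctname))
""")
remaining = [r[0] for r in conn.execute("SELECT ID FROM accounts ORDER BY ID")]
print(remaining)  # [1, 3, 6]
```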
Jun 25, 2000
I have a table which looks as follow:
field1 field2 field3 field4 field5 ......
A B C A X ......
A B C B Y ......
A B C C Z ......
A B C A Y ......
. . . . . ......
I want to delete all the rows except one. Can anybody help?
Thank you very much.
Jan 25, 2000
I have a large table that consists of the columns zip, state, city, county. The primary key "zip" has duplicates but the rows are unique.
How do I filter out only the duplicate zips?
Randy Garland
Jan 20, 2000
How do you delete duplicate rows in a table so that only one copy of each is left, using T-SQL?
Sep 14, 1999
Hi,
I am encountering a problem. There are lots of duplicate rows in the COBOL flat files (due to improper data entry and missing column values) from which I am transforming data into SQL 7.0 tables using DTS. After the transformation, can I somehow mark the duplicate rows? The purpose is not to eliminate them, but to enter the missing values and make all the rows complete and unique.
I have the transformed table as a temporary table. Can I add a column such as 'status' and have its value set to '1' for the repeating rows?
Can anyone suggest any possible way of implementing it?
Thanx
Nisha
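Yes: add a status column and set it with a correlated count, so every row whose key occurs more than once gets flagged without anything being deleted. A minimal sketch on SQLite via Python's sqlite3 module; the table name staging and the columns rec_key and city are hypothetical, with rec_key standing for whatever columns should identify one real-world record. The correlated UPDATE is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table; rec_key stands for the identifying columns.
conn.execute("CREATE TABLE staging (rec_key TEXT, city TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [("A1", "Boston"), ("A1", None), ("B2", "Denver")])

conn.execute("ALTER TABLE staging ADD COLUMN status INTEGER DEFAULT 0")
# Flag every row whose key occurs more than once; nothing is deleted.
conn.execute("""
    UPDATE staging
    SET status = 1
    WHERE (SELECT COUNT(*) FROM staging AS t
           WHERE t.rec_key = staging.rec_key) > 1
""")
rows = conn.execute("SELECT rec_key, city, status FROM staging ORDER BY rowid").fetchall()
print(rows)  # [('A1', 'Boston', 1), ('A1', None, 1), ('B2', 'Denver', 0)]
```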
Feb 11, 1999
Hi,
I have a problem deleting duplicate rows. I have an identity column in my table, and if I try to use a correlated subquery with the DELETE command, it gives an error.
The other problem I have is that I have a date column in my table, which I update with the current date and time. If I use a query to fetch the records for a particular day, it does not return any rows:
select * from rates where ch_date >='02/11/99' and ch_date<='02/11/99'
If I use CONVERT, there are other problems. Is there any way to force date comparisons to ignore the time?
Thanks
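A datetime literal with no time part means midnight, so ch_date <= '02/11/99' only matches rows stamped exactly 00:00 on that day. The usual fix is a half-open range: greater than or equal to the day, strictly less than the next day. In SQL Server, writing the literals as '19990211' and '19990212' also avoids date-format ambiguity. A minimal sketch on SQLite via Python's sqlite3 module with made-up timestamps stored as ISO text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up timestamps stored as ISO text.
conn.execute("CREATE TABLE rates (ch_date TEXT)")
conn.executemany("INSERT INTO rates VALUES (?)", [
    ("1999-02-10 23:59:00",), ("1999-02-11 09:30:00",),
    ("1999-02-11 17:45:00",), ("1999-02-12 00:00:00",),
])

# Mirrors the original query: the upper bound is the very start of 11 Feb,
# so every row with a later time that day is excluded.
closed = conn.execute("SELECT COUNT(*) FROM rates "
                      "WHERE ch_date >= '1999-02-11' AND ch_date <= '1999-02-11'").fetchone()[0]

# Half-open range: from the start of the day up to, not including, the next day.
half_open = conn.execute("SELECT COUNT(*) FROM rates "
                         "WHERE ch_date >= '1999-02-11' AND ch_date < '1999-02-12'").fetchone()[0]
print(closed, half_open)  # 0 2
```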
Mar 30, 1999
Can anybody answer the following questions? I want to delete duplicate rows in my table without using a transaction table. And one more question: how do I get yesterday's date using the ISQL window?
Thanks
JK
Jul 9, 1998
Hello,
I have a table (mytable) with the following structure
docs int
field1 varchar(20)
......
the information in the table may look like this
docs field1
1 hello
2 hello
3 test
4 test
5 problem
6 problem
The docs column auto-increments and there is a unique constraint on it. The field1 column does not have any constraints on it.
How does one delete the duplicates without deleting both?
I can write a SQL statement to tell me which docs are dups and what the field1 values are, but I cannot delete just one.
Do I write a cursor? or is there an sql statement that would delete just one?
thanks
Steve Power
Nov 17, 1998
This is a hypothetical problem that came up while discussing ROWID in Oracle.
Consider a table without a primary key, unique key, or unique index.
A row has been inserted into the table many times.
I want to delete all but one of the duplicated rows. With any WHERE clause, all the (duplicated) rows will be deleted. In Oracle I can achieve this using ROWID as follows:
Delete from Table_name
where < all column values >
and ROWID <> ( Select max(rowid) from Table_name where < all column values > )
How can this be achieved in MS SQL Server 6.5 ?
According to Dr. Codd's golden rules for an RDBMS, one rule is that
one should be able to reach each data value in the database by using the
table name, row identification value and column name.
Does MS SQL Server 6.5 satisfy this requirement?
Also, how many of Dr. Codd's 13 golden rules for an RDBMS does MS SQL Server 6.5
satisfy? Which does it not satisfy?
Any discussion about Codd's Rules is welcome.
- Gunvant Patil
gunvantp@yahoo.com