T-SQL (SS2K8) :: Ranking Duplicate Contacts Using Multiple Columns
Jun 8, 2015
I'm in the process of trying to identify duplicate contacts. I doing this for millions of contacts and have gotten stuck and could use some elegant solutions!
The business rule is this:
Any contact that has the same name, phone and email address are the same contact
Any contact that has the same name, and email address are the same contact
Any contact that has the same name, email address, but different phone are a different contact.
Any contact that has the same name, email address, and a blank phone can be the same contact as one that has the same name, email address, and has an email address
Rank by the DataSource_fk. 1 being the highest
Put another way:
If 3 contacts have the same name, 2 have phone '1112223344' and all three have the email address 'johndoe@gmail.com' they are the same contact and the lowest DataSource_fk should be ranked the highest.
I've used the Row_number over (Partition by) in the past, but am unsure how to deal with the blanks in email and phone.
DROP TABLE [dbo].[TestBusinessContact];
GO
CREATE TABLE [dbo].[TestBusinessContact]
(
[TestBusinessContact_pk] INT IDENTITY(1,1)NOT NULL,
[Business_fk]INT NOT NULL CONSTRAINT DF_TestBusinessContact_Business_fk DEFAULT(0),
[Code] ......
View 7 Replies
ADVERTISEMENT
Jan 3, 2015
rewrite the below two queries (so that i can avoid duplicates) i need to send email to everyone not the dup[right][/right]licated ones)?
Create table #MyPhoneList
(
AccountID int,
EmailWork varchar(50),
EmailHome varchar(50),
EmailOther varchar(50),
[Code] ....
--> In this table AccountID is uniquee
--> email values could be null or repetetive for work / home / Other (same email can be used more than one columns for accountid)
-- a new column will be created with name as Sourceflag( the value could be work, Home, Other depend on email coming from) then removes duplicates
SELECT AccountID , Email, SourceFlag, ROW_NUMBER() OVER(PARTITION BY AccountID, Email ORDER BY Sourceflag desc) AS ROW
INTO #List
from (
SELECTAccountID
, EmailWorkAS EMAIL
, 'Work'AS SourceFlag
FROM#MyPhoneList (NoLock) eml
WHEREIsOffersToWorkEmail= 1
[code]....
View 9 Replies
View Related
Sep 23, 2005
I'm new to SQL with 2 weeks under my belt....lol, so this may be a simple edit:
When I run the following query, I can get a list of all dups in the contact field:
++++++++++++++++++++++++++++++++++
SELECT full_name,
COUNT(full_name) AS NumOccurrences
FROM contact
GROUP BY full_name
HAVING ( COUNT(full_name) > 1 )
++++++++++++++++++++++++++++++++++
However:
I need to make sure I am de-activating (active = 0) only the contacts where they are listed more then once within the same company table (company.company_id) and the condition is that phone is NULL. I can't seem to make it work. Does anyone have any suggestions for an UPDATE I can use?
View 2 Replies
View Related
Aug 27, 2014
I'd like to first figure out the count of how many rows are not the Current Edition have the following:
Second I'd like to be able to select the primary key of all the rows involved
Third I'd like to select all the primary keys of just the rows not in the current edition
Not really sure how to describe this without making a dataset
CREATE TABLE [Project].[TestTable1](
[TestTable1_pk] [int] IDENTITY(1,1) NOT NULL,
[Source_ID] [int] NOT NULL,
[Edition_fk] [int] NOT NULL,
[Key1_fk] [int] NOT NULL,
[Key2_fk] [int] NOT NULL,
[Code] .....
Group by fails me because I only want the groups where the Edition_fk don't match...
View 4 Replies
View Related
Apr 15, 2014
I am facing a problem in writing the stored procedure for multiple search criteria.
I am trying to write the query in the Procedure as follows
Select * from Car
where Price=@Price1 or Price=@price2 or Price=@price=3
and
where Manufacture=@Manufacture1 or Manufacture=@Manufacture2 or Manufacture=@Manufacture3
and
where Model=@Model1 or Model=@Model2 or Model=@Model3
and
where City=@City1 or City=@City2 or City=@City3
I am Not sure of the query but am trying to get the list of cars that are to be filtered based on the user input.
View 4 Replies
View Related
May 21, 2008
Hello,
I have a question regarding duplicate records, the thing is I'm able to query for duplicated records if I type the following:
select ColumnName from TableName
where ColumnName in
(
select ColumnName from TableName
group by ColumnName having count(*) > 1
)
That gives me duplicate records for one column, but I need find duplicate records in more than one column (4 columns to be exact), but the way I need to find these records is they all have to be duplicate, what I'm trying to say is I don't don't want to find the following:
First Last Age Email
John Smith 25 jsmith@hotmail.com
John Smith 26 jsmith@hotmail.com
John Smith 25 jsmith4@hotmail.com
I need to find the following:
First Last Age Email
John Smith 25 jsmith@hotmail.com
John Smith 25 jsmith@hotmail.com
John Smith 25 jsmith@hotmail.com
So all the columns must be exactly the same, that's the only condition I want to show the records, is there any way to do this?
For the record, I'm using MS SQL Server 2000, thank you.
View 12 Replies
View Related
Jan 29, 2004
i got this error message in my browser
SELECT permission denied on object 'Contacts', database 'Contacts', owner
'dbo'.
this is the code where it stopped
sqlDataAdapter1.Fill(dataSet11);
//Update the data grid
DataGrid1.DataBind();
Now i finally i grant the access to the database for user ASPNET but it
seems not working yet.
I am running SQL Server
What i am missing ?
Thanks
View 3 Replies
View Related
May 21, 2014
I keep thinking this can be done in one query:UPDATE H
SET Phone1 = NULL
FROM Dialer.dbo.Agentless_Hold H
INNER JOIN Dialer.dbo.PhoneLookup D ON H.Phone1 = D.Phone;
UPDATE H
SET Phone2 = NULL
[Code] ....
I need to NULL any phone number in the Hold table where the number is in the Lookup table.
View 5 Replies
View Related
Jan 13, 2015
How to get the lowest U.Price along with FX and Rate.
ID-U.Price-FX-Rate
1280 19.1196 EUR 3.85
1280 46.2462 USD 3.63
1280 6.32 RM 1.00
Required output.
ID-U.Price-FX-Rate
1280 6.32 RM 1.00
View 4 Replies
View Related
Sep 29, 2015
Can you update data from multiple tables in the same UPDATE statement, by joining those tables in a CTE ?
For example, this fails:
DECLARE @UPDCATE_COUNT AS int = 100000;
WITH COMBINED_TABLES AS (
SELECT TOP (@UPDATE_COUNT) T.UpdateID, T.IS_UPDATED, U.[Description]
FROM dbo.Table1 AS U
INNER JOIN dbo.Table2 AS T
[code]....
View 7 Replies
View Related
Jul 25, 2014
I have a case where if the Id field is a specific value, I don't want to allow null in another field, but if the Id value <> a specific value, null is ok.
In the example below, inserting the first record should succeed, the second should succeed, and the 3rd should fail. Right now the 2nd two fail. I gotta be missing something easy, but I can't figure it out.
USE tempdb
GO
IF OBJECT_ID('tempdb.dbo.CheckConstraintTest') IS NOT NULL
DROP TABLE tempdb.dbo.CheckConstraintTest;
CREATE TABLE CheckConstraintTest
[Code] .....
View 4 Replies
View Related
Apr 25, 2014
I have this query
SELECT
'Type'[Type]
,CASE WHEN code='09' THEN SUM(Amt/100) ELSE 0 END
,CASE WHEN code='10' THEN SUM(Amt/100) ELSE 0 END
,CASE WHEN code='11' THEN SUM(Amt/100) ELSE 0 END
,CASE WHEN code='12' THEN SUM(Amt/100) ELSE 0 END
FROM Table1 WHERE (Code BETWEEN '09' AND '12')
GROUP BY Code
and the output
Column 1 Column 2 Column 3 Column 4
Type 14022731.60 0.00 0.00 0.00
Type 0.00 4749072.19 0.00 0.00
Type 0.00 0.00 149214.04 0.00
Type 0.00 0.00 0.00 792210.10
How can I modify the query to come up with output below,
Column 1 Column 2 Column 3 Column 4
Type 14022731.60 4749072.19 149214.04 792210.10
View 9 Replies
View Related
Jul 17, 2014
Following is my table structure
IDRowCount PagID
1448
2267
3297
4216
5405
6254
[Code] ....
PageId is currently set to 0
I have a user input, @IntNoOfRowsPerPage = 800 Means 800 rows per page. So following is the output I require.
AIDRowCountPageId
1448 1
2267 1
3297 2
4216 2
5405 3
6254 3
[Code] ....
The values of PageID are such that summation of RowCount for PageID is <= @IntNoOfRowsPerPage (i.e, 800)
If NTILE function can be used in such scenarios.
View 5 Replies
View Related
Jan 13, 2015
I have multiple databases in the server and all my databases have tables: stdVersions, stdChangeLog. The stdVersions table have field called DatabaseVersion which stored the version of the database. The stdChangeLog table have a field called ChangedOn which stored the date of any change made in the database.
I need to write a query/stored procedure/function that will return all the database names, version and the date changed on. The results should look something like this:
DatabaseName DatabaseVersion DateChangedOn
OK5_AAGLASS 5.10.1.2 2015/01/12
OK5_SHOPRITE 5.9.1.6 2015/01/10
OK5_SALDANHA 5.10.1.2 2014/12/23
The results should be ordered by DateChangedOn.
View 4 Replies
View Related
Jun 3, 2010
I am trying to find a way to add into a table a flattened (comma seperated list) of email addresses based on the multiple columns of nformation in another table (joined by customer_full_name and postcode.
This is to highlight duplicate email addresses for people under the same customer_full_name and Postcode.
I have done this using a loop which loops through concatenating the email addresses but it takes 1minute to do 1000. The table is 19,000 so this isn't really acceptable. I have tried temp tables, table variables and none of this seems to make any difference. I think that it is becuase i am joining on text columns?
Create table #tempa
(
customer_Full_Name varchar(100),
Customer_Email varchar(100),
Postcode varchar(100),
AlternateEmail varchar(max)NULL
)
insert into #tempa (customer_full_name,customer_email,postcode)
[code]....
View 9 Replies
View Related
Jan 21, 2015
I have a table with score info for each group, and the table also contains historical data, I need to get the ranking for the current week and previous week, here is what I did and the result is apparently wrong:
select CurRank = row_number() OVER (ORDER BY cr.CurScore desc) , cr.group_name,cr.CurScore
, lastWeek.PreRank, lastWeek.group_name,lastWeek.PreScore
from
(select group_name,
Avg(case when datediff(day, asAtDate, getdate()) <= 7 then sumscore else 0 end) as CurScore
[Code] ....
The query consists two parts: from current week and previous week respectively. Each part returns correct result, the final merged result is wrong.
View 3 Replies
View Related
Jun 11, 2015
Basically, I'm given a daily schedule on two separate rows for shift 1 and shift 2 for the same employee, I'm trying to align both shifts in one row as shown below in 'My desired results' section.
Sample Data:
;WITH SampleData ([ColumnA], [ColumnB], [ColumnC], [ColumnD]) AS
(
SELECT 5060,'04/30/2015','05:30', '08:30'
UNION ALL SELECT 5060, '04/30/2015','13:30', '15:30'
UNION ALL SELECT 5060,'05/02/2015','05:30', '08:30'
UNION ALL SELECT 5060, '05/02/2015','13:30', '15:30'
[Code] ....
The results from the above are as follows:
columnAcolumnB SampleTitle1 SampleTitle2 SampleTitle3 SampleTitle4
506004/30/201505:30 NULL NULL NULL
506004/30/201513:30 15:30 NULL NULL
506005/02/201505:30 NULL NULL NULL
506005/02/201513:30 15:30 NULL NULL
My desired results with desired headers are as follows:
PERSONSTARTDATE STARTIME1 ENDTIME1 STARTTIME2 ENDTIME2
506004/30/2015 05:30 08:30 13:30 15:30
506005/02/2015 05:30 08:30 13:30 15:30
View 3 Replies
View Related
Aug 17, 2007
I've begun to get the above error from my package. The error message refers to two output columns.
Anyone know how this could happen from within the Visual Studio 2005 UI? I've seen the other posts on this subject, and they all seemed to be creating the packages in code.
Is there any way to see all of the columns in the data flow? Or is there any other way to find out which columns it's referring to?
Thanks!
View 3 Replies
View Related
May 1, 2014
I found some duplicate data as I was going thru the logic of a data pump. The entire row is not duplicated however.I would like to delete only the one row.
This is a sample of the data:
DECLARE @SomeData TABLE
(
FirstName varchar(25)
, MiddleName varchar(25)
, LastName varchar(25)
, StreetAddress varchar(25)
, Suite varchar(25)
, City varchar(25)
, [State] varchar(25)
, PostalCode varchar(10)
[code]...
As you can see, Joe Smith has two rows, but only one of the rows is complete. I would like to delete only the row that has a NULL value in the phone and area code for Joe Smith. There are a few thousand rows that are like this. They have duplicates all but the area code and phone number.I am used to using a CTE to remove duplicates, but I am a little lost on this one. The things that I have tried, have not worked exactly as I planned.
View 4 Replies
View Related
Apr 21, 2014
We have a data warehouse staging database in which we capture change history for hundreds of tables from a source system. In the source system, records are updated in place, but in our data warehouse we capture these changes by "terminating" the existing record and adding a new record reflecting the changes. In the data warehouse we add two columns to every table -- effective_date and expiration_date -- which indicate the dates the record was in effect in the source system. By convention, an expiration_date of 6/6/2079 means the record is currently still active in the source system. Each day we simply compare yesterday's version of the record (in the data warehouse) against today's version (in the source system). If differences are found in any of the columns, we terminate the record and add a new one, setting those dates appropriately.
In this example, the employee_id column is the natural key in the source system. We add the effective_date and expiration_date in the data warehouse, so those three columns together make up the key in the data warehouse. The employee_name, employee_dept, and last_login_date columns all come from the source system as well.
drop table mytbl
create table mytbl (
effective_date smalldatetime,
expiration_date smalldatetime,
employee_id int,
employee_name varchar(30),
[code]....
In the select output, you can follow the trail of changes for each of these three employees. Bob moved from dept 7 to 8 at some point; Frank didn't change departments at all; Cheryl moved from dept 6 to 9 and later back to 6. However, the last_login_date was updated frequently for all these employees.
We've tracked hundreds of tables this way for years, some with hundreds of columns. For optimization purposes, I'm now interested in trimming the fat a bit. That is, we track changes in many columns that we don't really need in our data warehouse. Some of these columns are rapidly-changing, causing all sorts of unnecessary terminate/inserts in the data warehouse. My goal is to remove these columns, reclaim the disk space and increase the ETL speed. So in this example, let's get rid of the last_login_date column.
alter table mytbl
drop column last_login_date
select *
from mytbl
order by employee_id, effective_date
Now in the select output, you can see we have many "effective duplicate" records. For example, nothing changed for Bob between 1/1/2014 and 1/31/2014 -- those really should be one record, not three. Here's the challenge: I'm looking for an efficient way to merge these "effective duplicates" together, through set-based sql updates/deletes/inserts (hoping to avoid any RBAR operations). Here's what the table ultimately should look like (cheating to get there):
create table mytbl2 (
effective_date smalldatetime,
expiration_date smalldatetime,
employee_id int,
employee_name varchar(30),
employee_dept int
[code]...
Note that Bob only has two records (he changed department), Frank only has one record (no changes), and Cheryl has three records (two department changes).
My inclination would be to drop the unwanted columns, then GROUP BY all the remaining columns from the source system, and taking the MIN effective_date and MAX expiration_date. However, this doesn't work for cases like Cheryl's -- she moved to another department, then back again, so that change history needs to be retained.
As I mentioned, we have hundreds of tables, and I'd like to strip out dozens (maybe hundreds) of unused columns, so ultimately there will be millions of these pseudo-duplicates that need to be merged together. These are huge tables, so I really need to find an efficient set-based approach to this.
View 2 Replies
View Related
Oct 8, 2015
any useful SQL Queries that might be used to identify lists of potential duplicate records in a table?
For example I have Client Database that includes a table dbo.Clients. This table contains various columns which could be used to identify possible duplicate records, such as Surname | Forenames | DateOfBirth | NINumber | PostalCode etc. . The data contained in these columns is not always exactly the same due to differences caused by user data entry; so some records may have missing data from some of the columns and there could be spelling differences too. Like the following examples:
1 | Smith | John Raymond | NULL | NI990946B | SW12 8TQ
2 | Smith | John | 06/03/1967 | NULL | SW12 8TQ
3 | Smith | Jon Raymond | 06/03/1967 | NI 99 09 46 B | SW12 8TQ
The problem is that whilst it is easy for a human being to review these 3 entries and conclude that they are most likely the same Client entered in to the database 3 times; I cannot find a reliable way of identifying them using a SQL Query.
I've considered using some sort of concatenation to a new column, minus white space and then using a "WHERE column_name LIKE pattern" query, but so far I can't get anything to work well enough. Fuzzy Logic maybe?
the results would produce a grid something like this for the example above:
ID | Surname | Forenames | DuplicateID | DupSurname | DupForenames
1 | Smith | John Raymond | 2 | Smith | John
1 | Smith | John Raymond | 3 | Smith | Jon Raymond
9 | Brown | Peter David | 343 | Brown | Pete D
next batch of duplicates etc etc . . . .
View 7 Replies
View Related
Oct 21, 2014
Im trying to delete duplicate records from the output of the query below, if they also meet certain conditions ie 'different address type' then I would merge the records. From the following query how do I go about achieving one and/or the other from either the output, or as an extension of the query itself?
SELECT
a1z103acno AccountNumber
, a1z103frnm FirstName
, a1z103lanm LastName
, a1z103ornm OrgName
, a3z103adr1 AddressLine1
, A3z103city City
, A3z103st State
[code]...
View 1 Replies
View Related
Apr 9, 2007
Hello,
I have a table with say 45 columns.
I have a business requirement that requires me to fetch the rows for which col1 , col2, ....col 11 are same and rest can be different. there is an identity column, in the table so I can have duplicate rows also.
how can I effectively write a query that will fetch me all those rows for which my 11 columns are same.
Thanks,
Vishakha
View 6 Replies
View Related
Apr 29, 2015
I have a business need to create a report by query data from a MS SQL 2008 database and display the result to the users on a web page. The report initially has 6 columns of data and 2 out of 6 have JSON data so the users request to have those 2 JSON columns parse into 15 additional columns (first JSON column has 8 key/value pairs and the second JSON column has 7 key/value pairs). Here what I have done so far:
I found a table value function (fnSplitJson2) from this link [URL]. Using this function I can parse a column of JSON data into a table. So when I use the function above against the first column (with JSON data) in my query (with CROSS APPLY) I got the right data back the but I got 8 additional rows of each of the row in my table. The reason for this side effect is because the function returned a table of 8 row (8 key/value pairs) for each json string data that it parsed.
1. First question: How do I modify my current query (see below) so that for each row in my table i got back one row with 19 columns.
SELECT A.ITEM1,A.ITEM2,A.ITEM3,A.ITEM4, B.*
FROM PRODUCT A
CROSS APPLY fnSplitJson2(A.ITEM5,NULL) B
If updated my query (see below) and call the function twice within the CROSS APPLY clause I got this error: "The multi-part identifier "A.ITEM6" could be be bound.
2. My second question: How to i get around this error?
SELECT A.ITEM1,A.ITEM2,A.ITEM3,A.ITEM4, B.*, C.*
FROM PRODUCT A
CROSS APPLY fnSplitJson2(A.ITEM5,NULL) B, fnSplitJson2(A.ITEM6,NULL) C
I am using Microsoft SQL Server 2008 R2 version. Windows 7 desktop.
View 14 Replies
View Related
Nov 23, 2006
Hello,
I have a table T1 and on this table I have an insert trigger. In the trigger I need to check if T1.ID and T1.Type=’xyz’ together are duplicated or not (duplicate dhcek on two columns), if yes return error from trigger. There might be T1.ID and T1.Type=’abc’ duplicated, that is fine.
View 1 Replies
View Related
Feb 17, 2004
I have to write a query which extracts everyone from a table who has the same surname and forenames as someone else but different id's.
The query should have a surname column, a forenames column, and two id columns (from the person column of the table).
I need to avoid duplicates i.e. the first table id should only be returned in the first id column and not in the second - which is what i am getting at the mo.
This is what i have done
select first.surname, first.forenames, first.person, second.person
from shared.people first, shared.people second
where first.surname= second.surname
and first.forenames = second.forenames
and not first.person = second.person
order by first.surname, first.forenames
and i get results like this
Porter Sarah Victoria 9518823 9869770
Porter Sarah Victoria 9869770 9518823 - i.e. duplicates
cheers
View 5 Replies
View Related
Jul 23, 2005
When I add this code in a view and try to save . . .SELECT TOP 610 *FROM dbo.Master INNER JOINdbo.TypeByCase ON dbo.TypeByCase.CaseNum = dbo.Master.CaseNumIt gives the error:ODBC error: [Microsoft][ODBC SQL Server Driver]Column names in eachview or function must be unique. Column name 'CaseNum' in view orfunction 'dbo.BobView1' is specified more than once.Any idea why?Thanks,RBollinger
View 1 Replies
View Related
Jul 23, 2005
Hi there,I would like to know how to get rows with duplicate values in certaincolumns. Let's say I have a table called "Songs" with the followingcolumns:artistalbumtitlegenretrackNow I would like to show the duplicate songs to the user. I considersongs that have the same artist and the same title to be the same song.Note: All columns do not have to be the same.How would I accomplish that with SQL in SQL Server?Thanks to everyone reading this. I hope somebody has an answer. I'vealready searched the whole newsgroups, but couldn't find the solution.
View 2 Replies
View Related
Aug 21, 2007
Hello,Suppose I have the following table...name employeeId email--------------------------------------------Tom 12345 Join Bytes!Hary 54321Hary 54321 Join Bytes!I only want unique employeeIds return. If I use Distinct it will stillreturn all of the above as the email is different/missing. Is there away to query in SQL so that only distinct employeeId is returned? noduplicates.I wouuld like to say WHERE no blank fields are present to get theright row to return.Many thanksYas
View 6 Replies
View Related
Aug 10, 2015
Here is my requirement, How to handle using SSIS.
My flatfile will have multiple columns like :
ID key1 key2 key3 key 4
I have SP which accept 3 parameters ID, Key, Date
NOTE: Key is the coulm name from the Excel. So my sp call look like
sp_insert ID, Key1, date
sp_insert ID, Key2,date
sp_insert ID, Key3,date
View 7 Replies
View Related
Mar 3, 2008
Please can anyone help me for the following?
I want to merge multiple rows (eg. 3rows) into a single row with multip columns.
for eg:
data
Date Shift Reading
01-MAR-08 1 879.880
01-MAR-08 2 854.858
01-MAR-08 3 833.836
02-MAR-08 1 809.810
02-MAR-08 2 785.784
02-MAR-08 3 761.760
i want output for the above as:
Date Shift1 Shift2 Shift3
01-MAR-08 879.880 854.858 833.836
02-MAR-08 809.810 785.784 761.760
Please help me.
View 8 Replies
View Related
Apr 21, 2015
I have a table with single row like below
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Column0 | Column1 | Column2 | Column3 | Column4|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Value0 | Value1 | Value2 | Value3 | Value4 |
Am looking for a query to convert above table data to multiple rows having column name and its value in each row as shown below
_ _ _ _ _ _ _ _
Column0 | Value0
_ _ _ _ _ _ _ _
Column1 | Value1
_ _ _ _ _ _ _ _
Column2 | Value2
_ _ _ _ _ _ _ _
Column3 | Value3
_ _ _ _ _ _ _ _
Column4 | Value4
_ _ _ _ _ _ _ _
View 6 Replies
View Related
Apr 3, 2000
I have 4 rows which are exactly the same. I want to delete one row but i do not have any unique identifing columns. How should i delete that row ?
View 1 Replies
View Related