Transact SQL :: Comparing Records - Finding Matches / Duplicates
Nov 20, 2015
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile).
MATCHED: (email, firstname, mobile)
But not MATCH: (lastname, phone)
MATCHED: (email, firstname, lastname)
But not MATCH: (phone, mobile)
View 9 Replies
ADVERTISEMENT
Sep 29, 2015
I need to compare records between two tables. There is no ID in the tables to do a simple join between them. So, what I'm looking for is: get the first record from table1 and read all record from table2 and give me back the most similar record. The String Distance is a predefined function.
Select a.table1
,b.table2
from table1 a, table2 b
where StringDistance (''a.table1,'b.table2') >90
View 4 Replies
View Related
Aug 21, 2015
I have a scenario to compare previous records based on each ID columns. For each ID, there would be few records, I have a column called "compare", We have to compare all Compare 1 records with Compare 0 Records. If Dt is lesser or equal to comparing DT, then show 0. Else 1
We always only one Compare 0 records in my table, so all compare 1 columns will compare with only one row per ID
My tables look like
Declare @tab1 table (ID Varchar(3), Dt Date, Compare Int)
Insert Into @tab1 values ('101','2015-07-01',0)
Insert Into @tab1 values ('101','2015-07-02',1)
Insert Into @tab1 values ('101','2015-07-03',1)
Insert Into @tab1 values ('101','2015-07-01',1)
Insert Into @tab1 values ('101','2015-06-30',1)
Insert Into @tab1 values ('102','2015-07-01',0)
Insert Into @tab1 values ('102','2015-07-02',1)
Insert Into @tab1 values ('102','2015-07-01',1)
select * from @tab1
1.) In the above scenario for ID = '101', we have 5 records, first record has Compare value 0, which mean all other 4 records need to compare with this record only
2.) If Compare 1 record's Dt is less or equal to Compare 0's DT, then show 0 in next column
3.) If Compare 1 record's Dt is greater than Compare 0's DT, then show 1 in next column
My expected result set should be like ....
View 10 Replies
View Related
Jun 10, 2015
I have a problem where I have 2 compare 2 records from the same table. This part looks easy but the problem is for a User there can be multiple records and I have 2 compare each record with its previous instance based on the timestamp. Not only I have to compare I have to perform some analysis. Below is the Table script and sample output.
Givens: All SQL Server 2008 or 2012 tools at your disposal.
Production database contains the following tables (simplified for example: constraints ignored, etc.) associated with a racing video game’s server.
-- A player of our game
-- Table greater than 10 million rows
CREATE
TABLE [dbo].[User]
(
[UserId]
[bigint] NOT
NULL
,[country]
[int] NULL
-- User’s home country
,[name]
[nvarchar](15)
NULL -- User’s displayable name (‘John’, ‘Bill’)
,[subscriptionTier]
[int] NULL
)
-- 0 == free, 1 == paid, for instance
Assume that rows get written into the event tables at a rate of 1,000 a minute,are never updated once written and currently are only read on a replica/reporting server.
Question Background: Write up a single query that would return the following: List of users and whose “TotalMoneyEarned” value ever grew (between logon events) at a rate of more than 1,000 per minute (we’d consider these suspicious and flag them for later investigation).
For instance, if the sample data were:
-- example of [Events.UserLogon] data -- not the query output we want
EventId UserId TotalMoneyEarned LogonDate
----------- -------------------- ---------------- -----------------------
1 1 1000 2010-10-16 00:19:56.460
2 1 1500 2010-10-16 00:20:56.460
3 1 3000 2010-10-16 00:21:56.460
4 1 10000 2010-10-16 00:29:56.460
Event 1 is okay because there’s nothing to compare it against
Event 2 is okay because the TotalMoneyEarned only grew 500 in a minute
Event 3 should be flagged, as the value grew 1500 in a minute
Event 4 is okay, as it grew 7,000 in 8 minutes (< 1000 per minute)
Query Output (your query should return data in a format like this):
User Flagged Logon Time Rate Since Last Logon (money/minute)
John 2010-10-16 00:21:56 1500
Dave 2010-10-16 00:30:50 3200
Bill 2010-10-16 00:35:23 1000
It is likely that you will need to create sample data for both the User and [Events.Logon] tables. We are looking for a single query that returns data like what is represented in Query Output.
View 3 Replies
View Related
Nov 6, 2015
I have 2 tables A, B with 1 to many relationship
Table A(ProductID), TableB(ProductID, FileID)
I need to find only the records in Table A that may have more than one FileIDs in Table B, since some ProductIDS have multiple FileIDs in Table B...
View 8 Replies
View Related
Nov 2, 2015
INSERT
INTO [Table2Distinct]
([CLAIM_NUMBER]
,[ACCIDENT_DATE]
[code]....
I used the above query and it still inserts all the duplicate records. What is wrong with my statement?
View 5 Replies
View Related
Jun 15, 2007
How do I only select rows with duplicate dates for each person (id)? (The actual table has approximately 13000 rows with approximately 3000 unique ids)
p_key id date pulse
--------------------------------------
1 32 5/25/2006 80
2 32 5/25/2006 85
3 32 4/26/2006 81
4 32 6/15/2006 82
5 15 1/20/2006 75
6 15 3/25/2006 80
7 15 3/25/2006 83
Result table I am looking for:
p_key id date pulse
------------------------------------
1 32 5/25/2006 80
2 32 5/25/2006 85
6 15 3/25/2006 80
7 15 3/25/2006 83
Thanks.
View 8 Replies
View Related
Jun 2, 2015
If it possible to have an if statement match multiple results, as to not have to use the OR multiple times.
Example: I want to say, if Description equals red or blue or green or yellow or orange or black or white or pink, without having to use OR and OR and OR. Description can match 10 different values.
View 9 Replies
View Related
Jul 26, 2002
I have a customer database with the following structure:
Dept ID (int)
Section (varchar)
I need to find only occurrences of a section (eg Admin) where the section name has a record in Dept 1,2 and 3 - only return the result if the record for Admin is associated will these depts.
Example Data:
Dept Section
1 Admin
1 Finance
1 Marketing
2 Admin
2 Sales
2 Marketing
3 Admin
3 Sales
3 Finance
The query would only return 'Admin' since this is the only Section that is represented in Departments 1,2 + 3.
I am not in control of the data design and can't create a bitwise field to achieve this.
tia,
Steve
View 1 Replies
View Related
Jul 7, 2004
I am trying to complete an insert from query but the problem is I have duplicates, so I'm getting an error message. So to correct it I am creating a Find Duplicates statement in the Query analyzer but Its not working can someone tell me whats wrong with this statement (by the way I'm in SQL 2000 Server)
thank you
SELECT EmployeeGamingLicense [TM#]AS [TM# Field], Count([TM#])AS NumberOfDups
FROM TERMINATION
GROUP BY [TM#]
HAVING Count([TM#])>1;
GO
View 14 Replies
View Related
May 16, 2007
I've done a search and I THINK I've got my head round this, but I'd be very grateful if someone could reassure me:
SELECT Email FROM List1 WHERE EXISTS (
SELECT Email FROM List2 WHERE List2.Email= List1.Email
) AND List1.Email <> '44'
That will give me every email address from list one that (a) appears in list two, and (b) isn't '44'. Right?
And to find all the emails from List1 that DON'T occur in List2 (and aren't '44'), I just put "NOT" in front of "EXISTS". Right?
Sorry for asking an obvious question but I'm having a real mental block here. :o
View 6 Replies
View Related
Jun 5, 2008
how can i find duplicates in my database? i want to find them in the website and expressemail field
View 5 Replies
View Related
Jun 16, 2008
Hi guys,
I was just wondering if any one knows how to find duplicate keys using more than one field. I used the below key to find those people who exists in list1 but don't exists in list2. I realized that the results had some duplicates which was expected but how do I then find all those duplicate people. I know how to do it if there was a primary key present I would have done a count (distinct cardnumber) > 1 and i would have done the select statement like this distinct cardnumber, but how do I do it with more that one key??
View 1 Replies
View Related
Jul 20, 2005
I have a company table and I would like to write a query that will return tome any duplicate companies. However, it is a little more complicated thenjust matching on exact company names. I would like it to give me duplicateswhere x number of letters at the beginning of the company name match AND xnumber of letters of the address match AND x number of letters of the citymatch. I will be doing this in batches based on the first letter of thecompany name. So for example I will first process all companies that startwith the letter "A".So for all "A" companies I want to find companies where the first 5 lettersin the company name match and the first 5 characters of the address fieldmatch and the first 5 characters of the city match. THANKS!!!
View 3 Replies
View Related
Oct 9, 2007
I'm trying to find duplicates. Any help will be greatly appreciated! Here's my query:
I'm trying to find only the records where the "Sales_Header_Your_Reference" field is used more then once with a different "Sales Header Document ID" So it looks like this
Sales Header Document ID Sales_Header_Your_Reference
1955718 0002377729
2082721 0002478945
2082728 0002478976
2093598 0002487318
2093599 0002487318
2093601 0002487332
I hope im clear enough, here is my query. I get then listed right, but the count is all wrong, I only want to show the ones , like the one i have highlighted.
Code Block
Select COUNT(Sales_Header_Your_Reference),Tbl_Sales_Header.Sales_Header_Document_Id,Tbl_Sales_Header.Sales_Header_Your_Reference
FROM dbo.Tbl_Sales_Header
INNER JOIN dbo.Tbl_Sales_Invoice_Header
ON dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = dbo.Tbl_Sales_Invoice_Header.Sales_Invoice_Header_Bill_Customer_No AND
dbo.Tbl_Sales_Header.Sales_Header_Order_DateTime = dbo.Tbl_Sales_Invoice_Header.Sales_Invoice_Header_Order_Datetime
WHERE (dbo.Tbl_Sales_Header.Sales_Header_Order_DateTime > CONVERT(DATETIME, '2007-09-15 00:00:00', 102)) AND
(dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'Butler') OR
(dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'MWI') OR
(dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'NLS') OR
(dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'HSI') OR
(dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'NLSMVD') OR
(dbo.Tbl_Sales_Header.Customer_Bill_Customer_No = 'RNEXT')
GROUP by Tbl_Sales_Header.Sales_Header_Document_Id,Tbl_Sales_Header.Sales_Header_Your_Reference
HAVING COUNT(Sales_Header_Your_Reference) > 1
order by Tbl_Sales_Header.Sales_Header_Document_Id, Tbl_Sales_Header.Sales_Header_Your_Reference
View 9 Replies
View Related
Jan 1, 2008
Problem is that if the [Receiving] table doesn't have a match then no records are return. I want all matches from the [Orders Subtable] and any matches from the [Receiving] Table. If no [Receiving] table matches then I still want all matches from the [Orders Subtable]. Attached is the query.
Note: The query has to run in Access 2000 and I will be coding it in VB.
SELECT Orders.[Orders ID],
[Orders Subtable].ID,
[Orders Subtable].Quantity,
Receiving.Quantity,
Receiving.[Component #]
FROM (Orders
LEFT JOIN Receiving ON Orders.[Orders ID] = Receiving.[Orders ID])
INNER JOIN [Orders Subtable] ON Orders.[Orders ID] = [Orders Subtable].[Orders ID]
GROUP BY Orders.[Orders ID], [Orders Subtable].ID,
[Orders Subtable].Quantity, Receiving.Quantity,
Orders.[Project #], [Orders Subtable].On_Order,
[Orders Subtable].[Component #],
Receiving.[Component #]
HAVING (((Orders.[Project #])="Speed1aaaaa") AND
(([Orders Subtable].On_Order)=True) AND
(([Orders Subtable].[Component #])="R02101A") AND
((Receiving.[Component #])="R02101A"));
View 2 Replies
View Related
Jan 26, 2015
Is there a query or a way to convert duplicates value in a column to non duplicates.
View 14 Replies
View Related
Nov 22, 2005
I am trying to move distinct data from one table to another based on two columns regardless of special characters such as : ( ) ; ' " / - _ = >< etc. The two columns that are the defining factors that I need to match as duplicates are Col1 & Col3, the rest I want to get the maximum data from each column for that row (I hope this made sense).
Here is what I have tried, but it does not seem to work:
Code:
select
ID,
Col1,
max(Col2)as Col2,
Col3,
max(Col4)as Col4,
max(Col5)as Col5,
max(Col6)as Col6,
max(Col7)as Col7,
max(Col8)as Col8,
max(Col9)as Col9
where patindex('%[^-:]%',Col1) = 0 AND patindex('%[^-:]%',Col3) = 0
into Newtable
from dbo.CLEANING_bk
group by Col1,Col3
Basically if there are 10 rows of duplicate data based on Col1 & Col3 (regardless of special characters), I want to move the distinct info to a new table with the maximum amount of info from the other columns of the duplicate rows.
If row 3, 4,5, & 6 are all considered duplicate and Row3/Col2 has info in it that row 4 & 5/Col2 does not, I want to combine it with the single row I export to the new table retaining as much information from the rows of the duplicates as possible.
Can anyone help and let me know why my script is not wanting to play nice?
View 14 Replies
View Related
Aug 15, 2006
I have two lists of contacts. They're similar. I want a list of all the contacts whose email address occurs only in the first list.
SELECT COUNT(DISTINCT EMAIL) FROM List1
returns 13460
SELECT COUNT(DISTINCT EMAIL) FROM List2
returns 13220
SELECT EMAIL FROM List1 WHERE EMAIL NOT IN (SELECT DISTINCT EMAIL FROM List2)
returns 0 rows
How can it be returning no rows? What am I failing to take into consideration?
:confused:
View 6 Replies
View Related
Mar 27, 2008
Finding and replacing duplicates
I have this project; I am trying to work on but cant because of duplicated data in that table.
I tried to make the culume not accept duplicates, but I need to clean it up first and I can just delete them.
Currently the values are 11 digit numbers and I need to replace them with 00000000001
incrementing, making my last one 00000003000
Any idea on where to even begin? I am new to this.
What I am trying to do.
Find replicates numbers in TRAN_DATA
Replace each with a new number but increment the new number starting with 00000000001 and ending with 00000003000
With this I knew I had 3000 plus.
ELECT TRAN_LCTR_NR,
COUNT(TRAN_LCTR_NR) AS NumOccurrences
FROM TRAN_DATA
GROUP BY TRAN_LCTR_NR
HAVING ( COUNT(TRAN_LCTR_NR) > 1 )
View 5 Replies
View Related
Feb 27, 2008
Hi all,
I have two tables - Planning and Appointments:
Planning - contains a list of planned items. Used to define boundaries for a work day and defines based on type what can be done for each item.
Id,
TypeId - the type of the planned items
BeginTime DateTime - begin date and time of the planned item
EndTime DateTime - end date and time for the planned item
In the Planning table we can have as many records per day as we need:
1, First Meeting, 1 Jan 2008 09:00, 1 Jan 2008 11:00
2, First Meeting, 1 Jan 2008 11:00, 1 Jan 2008 12:00
3, First Meeting, 1 Jan 2008 13:00, 1 Jan 2008 15:00
4, First Meeting, 1 Jan 2008 15:00, 1 Jan 2008 18:00
Appointments - contanis a list with appointments
Id,
BeginTime DateTime
EndTime DateTime
1, 1 Jan 2008 09:00, 1 Jan 2008 09:30
2, 1 Jan 2008 10:00, 1 Jan 2008 11:00
3, 1 Jan 2008 11:00, 1 Jan 2008 11:30
4, 1 Jan 2008 14:00, 1 Jan 2008 15:30
What is needed?
What I need is to a find a way to compare the planned items with the appointments and to return all the periods for which a planned time exists:
Free planned time:
1, 1 Jan 2008 09:30, 1 Jan 2008 10:00
2, 1 Jan 2008 11:30, 1 Jan 2008 12:00
3, 1 Jan 2008 13:00, 1 Jan 2008 14:00
4, 1 Jan 2008 15:30, 1 Jan 2008 18:00
So, having two multitudes of periods,where the one specifies the planning templates and the other real used time, I need to find all the periods which can be used for another appointments.
I've tried several aproaches, but I always faced performance problems.
Thanks in advance.
View 1 Replies
View Related
Feb 22, 2008
Hi all,
I have two tables - Planning and Appointments:
Planning - contains a list of planned items. Used to define boundaries for a work day and defines based on type what can be done for each item.
Id,
TypeId - the type of the planned items
BeginTime DateTime - begin date and time of the planned item
EndTime DateTime - end date and time for the planned item
In the Planning table we can have as many records per day as we need:
1, First Meeting, 1 Jan 2008 09:00, 1 Jan 2008 11:00
2, First Meeting, 1 Jan 2008 11:00, 1 Jan 2008 12:00
3, First Meeting, 1 Jan 2008 13:00, 1 Jan 2008 15:00
4, First Meeting, 1 Jan 2008 15:00, 1 Jan 2008 18:00
Appointments - contanis a list with appointments
Id,
BeginTime DateTime
EndTime DateTime
1, 1 Jan 2008 09:00, 1 Jan 2008 09:30
2, 1 Jan 2008 10:00, 1 Jan 2008 11:00
3, 1 Jan 2008 11:00, 1 Jan 2008 11:30
4, 1 Jan 2008 14:00, 1 Jan 2008 15:30
What is needed?
What I need is to a find a way to compare the planned items with the appointments and to return all the periods for which a planned time exists:
Free planned time:
1, 1 Jan 2008 09:30, 1 Jan 2008 10:002, 1 Jan 2008 11:30, 1 Jan 2008 12:00
3, 1 Jan 2008 13:00, 1 Jan 2008 14:00
4, 1 Jan 2008 15:30, 1 Jan 2008 18:00
So, having two multitudes of periods,where the one specifies the planning templates and the other real used time, I need to find all the periods which can be used for another appointments.
I've tried several aproaches, but I always faced performance problems.
Thanks in advance.
View 1 Replies
View Related
Jul 20, 2005
What is the best way to compare two entries in a single table wherethe two fields are "almost" the same?For example, I would like to write a query that would compare thefirst two words in a "company" field. If they are the same, I wouldlike to output them.For example, "20th Century" and "20th Century Fox" in the companyfield would be the same.How do I do this? Do I need to use a cursor? Is it as simple as using"Like?"
View 2 Replies
View Related
Mar 10, 2008
Hi
I am trying to compare two flat files and extract new entry into new file.But in my case there is no key column in both flat files. is any way to find the new entry by checksum with out Key matching?.
Thanks
Guru
View 3 Replies
View Related
Nov 3, 2014
There are many duplicate records on my data table because users constantly register under two accounts. I have a query that identify the records that have a duplicate, but it only shows one of the two records, and I need to show the two records so that I can reconcile the differences.The query is taken from a post on stack overflow. It gives me 196, but I need to see the 392 records.
How to identify the duplicates and show the tow records without having to hard code any values, so I can use the query in a report, and anytime there are new duplicates, the report shows them.
SELECT
[groom_first_name]
,[groom_last_name]
,[bride_first_name]
,[bride_last_name]
[code]....
View 5 Replies
View Related
Oct 24, 2007
Hi,
I have a table with 50 + columns....
I have a single record and i want to basically say are there any that are the same as this one.. but without using 50 where clauses...
Any Suggestions
View 7 Replies
View Related
Jul 9, 2007
HI all I have this table below and i need to get records from col1 >1 and col2 >1
col1 col2 language
1 a s
1 b p
3 c e
3 c f
SELECT col1, col2, language COUNT(*) AS Expr1
FROM table1
GROUP BY col1, col2
HAVING (COUNT(*) > 1)
with the above query it gives me this
col1 col2 expr1
3 c 2
the language column is not displayed what i want is not the exact count but all the records which is duplicate and must resemble the result below
col1 col2 expr1 language
3 c 2 e
3 c 2 f
I need the result above inorder to normalize the table and bring out the language part into another table
thanks
View 2 Replies
View Related
May 9, 2008
Hey Guys, I have a contacts table that contains ID, First Name, Last Name, and Phone Number, Date Entered, Changed. Every time, the data is modified and saved, it will insert a new record in the table. So, Ill create a new record for a contact named Ryan, and then come back a day later and update the last name and phone number. So theSQL table would look like...1 Ryan Scott 818-550-0000 05/08/2008 Null2 Ryan Peters 000-000-0000 05/09/2008 Null How do I write a sql query that will run an update after the insert of the second record to fill in the Changed field with the data that changed?So I want to have record 2, end up looking like this... 2 Ryan Peters 000-000-0000 05/09/2008 LastName,PhoneNumberAny ideas?
View 8 Replies
View Related
Jul 3, 2001
Thanks for your help...
I have two databases in two different servers, I am running this script which shows customers not in the second server. I am getting an error shown below. any idea of how to solve this issue.
Ali
CREATE view v_show_customers_not_in_GP as
select customer_id,company_name,contact_fname,contact_lna me,phone,alt_phone,fax,email,street_1,street_2,cit y,c.name,s.code as state,zip_code
FROM customer v,country c, state s
WHERE v.country_id =c.country_id
and v.state_id = s.state_id
and convert(char(15),customer_id ) NOT IN (select custnmbr from servername.dbname.dbo.RM00101 )
Server: Msg 18452, Level 14, State 1, Line 1
Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
View 1 Replies
View Related
Aug 4, 2015
I have a date comparison situation in which I will have a column with a date and will have another value containing a GETDATE() minus two weeks. I want to be able to compare both dates and get the MIN date between the two. I'm not sure how to use the MIN(Date) in this scenario since the comparison won't be between two different columns, but between one column and a random date generated by the GETDATE() minus two weeks.
View 6 Replies
View Related
Oct 27, 2015
I have a table that has Serial numbers and MSDSID numbers and many other fields in the table.
What I need to figure out is if the table contains a distinct Serial number with different MSDSIDs.
So If I have in the table
Serial MSDSID Date Size...
001 20 1/1/2015 5
002 21 1/1/2015 3
001 22 2/1/2015 1
003 21 3/1/2015 1
004 23 1/15/2015 5
003 22 1/20/2015 6
004 23 2/21/2015 5
002 21 4/25/2015 4
I would like the results to show:
Serial Count
001 2 the count is 2 because Serial 001 has an MSDSID of 20 and 22
002 1 the count is 1 because Serial 002 only has MSDSID 21
003 2 the count is 2 because Serial 003 has an MSDSID of 21 and 22
004 1 the count is 1 because Serial 002 only has MSDSID 23
It would be even better if the results just showed where the count is greater than 1.
View 5 Replies
View Related
May 6, 2015
I'm trying to pull records from a source/staging table where there is a duplicate row in it.I don't need that as the requirement is to garbage in /garbage out.when I do that from mart and use joins btw fact and dimensions, Im not getting this duplicate record as Im using distinct/group by. If I removed it, then it returns more than 3000 rows which is not correct. Is there a way I can keep these duplicates without removing group by...Im using correct joins and filters.
View 5 Replies
View Related
Jun 8, 2015
I am doing some audit and i have below query, how can i get rid of duplicates from the below query any T SQL to get rid of duplicates...
I am using SP_Who2 and sql server Audit for auditing all data happening on sql server databases and dumping them to tables Audit_DBAudit abd Audit_sp_who2 and from then i am trying to get data which is not repeating/duplicate ...
SELECT
A.ProgramName
,a.HostName,[Server_principal_name],[Server_instance_name],[Database_name],[Object_name],F.Statement
FROM Audit_DBAudit as F
Join [Audit_sp_who2] AS a
on LTRIM(RTRIM(F.server_principal_name))=LTRIM(RTRIM(A.Login))
View 11 Replies
View Related