Identifying Duplicate Data
Mar 9, 2001
Hi everybody,
I'm migrating a table that has above 20,000 records and lot of duplication.Let's say an Employee table with multiple records having slight
diference in the EmployeeName field.Now nobody would like to sit and manually identify them with such hugh number of records.
Is there any way which would help me identify most of them and
reduce the redundancy.
Thanx
Aby...
View 2 Replies
ADVERTISEMENT
Oct 8, 2015
any useful SQL Queries that might be used to identify lists of potential duplicate records in a table?
For example I have Client Database that includes a table dbo.Clients. This table contains various columns which could be used to identify possible duplicate records, such as Surname | Forenames | DateOfBirth | NINumber | PostalCode etc. . The data contained in these columns is not always exactly the same due to differences caused by user data entry; so some records may have missing data from some of the columns and there could be spelling differences too. Like the following examples:
1 | Smith | John Raymond | NULL | NI990946B | SW12 8TQ
2 | Smith | John | 06/03/1967 | NULL | SW12 8TQ
3 | Smith | Jon Raymond | 06/03/1967 | NI 99 09 46 B | SW12 8TQ
The problem is that whilst it is easy for a human being to review these 3 entries and conclude that they are most likely the same Client entered in to the database 3 times; I cannot find a reliable way of identifying them using a SQL Query.
I've considered using some sort of concatenation to a new column, minus white space and then using a "WHERE column_name LIKE pattern" query, but so far I can't get anything to work well enough. Fuzzy Logic maybe?
the results would produce a grid something like this for the example above:
ID | Surname | Forenames | DuplicateID | DupSurname | DupForenames
1 | Smith | John Raymond | 2 | Smith | John
1 | Smith | John Raymond | 3 | Smith | Jon Raymond
9 | Brown | Peter David | 343 | Brown | Pete D
next batch of duplicates etc etc . . . .
View 7 Replies
View Related
May 25, 2000
Hello,
This probably has been addressed before but I was unable to get the search to work properly on this site.
I am needing a script/way of deleting all rows from a DB with the exception of one record left for each row that has duplicate column data. Example :
Row 1
Field1 = 12345 Field2 =xxxxx Field 3=yyyyy Field4=zzzzz etc.
Row 2
Field1 = 12345 Field2 =zzzzzz Field 3=xxxxxx Field4=yyyyyy etc.
Row3
Field1 = 12345 Field2 =20202 Field 3=11111 Field4=zzzzz etc.
Row 4
Field1 = 54321 Field2 =xxxxx Field 3=yyyyy Field4=zzzzz etc.
Etc. Etc.
I want to be able to find the duplicates for Field1 and then delete all but 1 of those rows.( I don't care which one I keep just so only one is left.) The data in the other fields may or may not be unique.
I know how to find the duplicates it's just the deleting part I am having problems with. Any help would be much appreciated. Thanks,
Kerry
View 3 Replies
View Related
Dec 11, 2006
Hello guys! Is it possible to duplicate a primary key?
I would like my database to accept data with the same primary key.
Is it possible?
How do you declare ON DUPLICATE KEY UPDATE?
Please help me. Thanks in advance.
View 1 Replies
View Related
Jul 1, 2004
I want to be able to duplicate a row of data in sql....Does anyone know if there is a sql command that will do that. I have a table with an auto increment primary key and I want to duplicate everything except the key into a new record.
Thanks.
View 2 Replies
View Related
Nov 17, 1998
I am new to SQL server 6.5.
I will need to stress test a sql server 6.5 test database by duplicating
data. To do so, I need to know how to modify the primary key(which are
numbers in character data type) to duplicate the data several times over.
The primary key is character data.
What I have done ion the past is use insert statments -let's say 20 -- and then copy them and change the date in a text editor and change the primary key column and the other coluimns with thea replace of different letters and numbers. This is slow and tedious.
Does any one have a script I can run to do so?
Can I do this sql or do I need some sort of stored procedure? Any help
would be appreciated. Thanks.
DAvid Spaisman
would be greatly appreciated. THanks.
David Spaisman
View 3 Replies
View Related
Aug 28, 2007
Hello all,
I have recently been working on a project that requires one simple table to insert data into. The problem here is that all the data inserted must only access the database via stored procedure and I want to ensure that no duplicate data is inserted in the database.
I have done quite a bit of research for many ways to perform duplicate data testing from building temp tables and on, but nothing has really stood out to me yet. I would really like to find some information on how to perform duplicate data testing using a stored procedure that allows to test the data being inserted before it is saved to the database; therefore, when the user inserts the fields and clicks the insert button, the fields will be tested against the existing data (via stored procedure) within the database before being added.
Can anyone help?
Thanks
View 10 Replies
View Related
Apr 24, 2007
I have a table called emails with a field named emailaddress and some emailaddress are entered more than once. I want to be able to list all emailaddress just once. is there an sql statement that I could use to generate this.
View 3 Replies
View Related
Sep 27, 2007
Hello people, not sure how to do this. I have a unique Claim ID and when joined with the code table has 4 different Code ID and the Claim Amt is also unique to the Claim ID (One Claim Amnt per one Claim ID). I need all of these data for the report my problem is in getting the correct summation of the claim amt since this looks like I have 4 amounts of $1773.31 when in actuality there is only one. The Code ID is also part of the report parameter so I tried assigining the $ amount to just one of the Code ID but that will not work incase that particular Code ID is not selected in the parameter, the claim would read as a zero. Any help would be appreciated. Thanks
Claim ID: Code ID Claim Amt
07089000296 757.39 1773.31
07089000296 V05.3 1773.31
07089000296 V30.01 1773.31
07089000296 V72.19 1773.31
View 6 Replies
View Related
Jun 13, 2000
i have 2 tables not connected in any way
but both have orderid filed (same filed).
In one table this filed (and onther one) are keys,
the second table dose not hace a key at all.
The same order_id CAN repeat itself in each table.
When i try to join the tables (some rows just in one table and some in both):
Select tab1.name, tab1.orderid, tab2.sku
from tab1 inner join tab2 on tab1.orderid=tab2.orderid
The result i get is duplicate.
each row is multiple.
What I'm doing wrong?
View 3 Replies
View Related
Dec 16, 2004
I have a query that for one reason or another produces duplicate information in the result set. I have tried using DISTINCT and GROUP BY to remove the duplicates but because of the nature of the data I cannot get this to work, here is an example fo the data I am working with
ID Name Add1 Add2
1 Matt 16 Nowhere St Glasgow
1 Matt 16 Nowhere St Glasgow, Scotland
2 Jim 23 Blue St G65 TX
3 Bill 45 Red St
3 Bill 45 red St London
The problem is that a user can have one or more addresses!! I would like to be able to remove the duplicates by keeping the first duplicate ID that appears and getting rid of the second one. Any ideas?
Cheers
View 4 Replies
View Related
Apr 30, 2008
How to fetch that replicated records?
Anyone can share the query??
View 2 Replies
View Related
May 7, 2015
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sectionexpenses]
(@sectionname varchar(30),
@ExpensesName varchar(max),
[code]....
View 3 Replies
View Related
Jun 15, 2007
Hi. Not sure which section this request needs to be put in, but i'm relatively new to SQL.
I have 1 table which contains an user_id (autonumber), user_name, and user_profile.
I have 2 other tables:
config_version (cv) and config (c)
These two tables both access the user table (us) to view the user_id (which is required).
I want to be able to view the user_names for both "cv" and "c" on a seperate page. Using the code below, i'm lost. I created what i wanted in MS Access with the use of a 2nd table (this might be easier to understand than my rant above). However, I don't want a 2nd table.
Can someone provide me with a function, or the "answer" to my problem?
Highlighted Blue - just there to fill in the front page with data (not wanted)
Highlighted Green - doesn't work, but was my first attempt
Code:
public function ShowQuotes
Dim objConn
Dim objADORS
Dim strSQL
Dim row
Set objConn = Server.CreateObject("ADODB.Connection")
objConn.ConnectionString = strConn
objConn.Open
Set objADORS = CreateObject("ADODB.RecordSet")
strSQL = "Select top 50 "
strSQL = strSQL & " cv.config_version_id as cvid, cv.configuration_id as cid, cv.version_number as cnum"
strSQL = strSQL & ", cv.status as cstat, cv.total_price as cprice, cv.modification_date as cmod, cv.version_description as cvrem"
strSQL = strSQL & ", c.remarks as qrem"
strSQL = strSQL & ", p.prod_description as pdesc, p.version as pvers, p.status as pstat"
strSQL = strSQL & ", cust.customer_name as custname"
strSQL = strSQL & ", us.user_nt_login as modname"
strSQL = strSQL & ", us.user_name as usname"
' strSQL = strSQL & ", cv.user_id as modname"
strSQL = strSQL & " from tbl_config_version as cv, tbl_configuration as c, tbl_product as p, tbl_user as us, tbl_customer as cust"
strSQL = strSQL & strWhere 'SETS RESULTS TO BE UNIQUE
strSQL = strSQL & " and us.user_id = c.user_id"
strSQL = strSQL & " and cv.configuration_id = c.configuration_id"
strSQL = strSQL & " and c.product_id = p.product_id"
strSQL = strSQL & " and c.customer_id = cust.customer_id"
strSQL = strSQL & strOrderBy & ";"
Image: www.mcdcs.co.uk/TT.jpg
View 1 Replies
View Related
Jul 20, 2005
This is probably a silly question to most of you, but I'm in the processof splitting off years from a large DB to several smaller ones. Some ofthe existing smaller DBs already have most of the data for theirrespective years. But some of the same data is also on the source DB.If I simply do an insert keying on the year column, and a row beinginserted from the source DB already exists in the target DB, will aduplicate row be created?And if so, how can I avoid that?Thanks,John Steen*** Sent via Developersdex http://www.developersdex.com ***Don't just participate in USENET...get rewarded for it!
View 2 Replies
View Related
May 2, 2008
Hello,
I have a dataset which I would like to remove data from, but I can't seem to find out how to do this.
The dataset contains the following columns:
SCode, NIn, SIn, AIn, NOut, SOut, AOut and TagNumber.
I would like to remove data from the dataset when the following occurs:
("SC123", "NIn123", "s-in-323", "a-in-342", "NOut43", "s-out-231", "a-out-45", "tagnumber12")
("SC123", "NIn123", "s-in-xyz", "a-in-xws", NULL, NULL, NULL, "tagnumber12")
This is when NIn occurs with the same value more than once, and I would like to remove (or ignore, filter) the row when NOut, SOut and AOut are null.
I am new to SSIS and can't see how I could do this (although I'm sure it's possible).
If anyone could show me how I would appreciate it.
Thanks.
View 11 Replies
View Related
Sep 20, 2006
I have a table with 68 columns. If all the columns hold the same value except for one which is a datetime column I want to delete all but one of the duplicate rows. Preferably the latest one but that is not important. Can someone show me how to accomplish this?
View 5 Replies
View Related
Feb 12, 2007
Cannot find this anywhere while creating a report. Is this an option?
thanks.
View 2 Replies
View Related
Oct 2, 2007
Hello Everyone:
I am using the Import/Export wizard to import data from an ODBC data source. This can only be done from a query to specify the data to transfer.
When I try to create the tables, for the query, I am getting the following error:
Msg 2714, Level 16, State 4, Line 12
There is already an object named 'UserID' in the database.
Msg 1750, Level 16, State 0, Line 12
Could not create constraint. See previous errors.
I have duplicated this error with the following script:
USE [testing]
IF OBJECT_ID ('[testing].[dbo].[users1]', 'U') IS NOT NULL
DROP TABLE [testing].[dbo].[users1]
CREATE TABLE [testing].[dbo].[users1] (
[UserID] bigint NOT NULL,
[Name] nvarchar(25) NULL,
CONSTRAINT [UserID] PRIMARY KEY (UserID)
)
IF OBJECT_ID ('[testing].[dbo].[users2]', 'U') IS NOT NULL
DROP TABLE [testing].[dbo].[users2]
CREATE TABLE [testing].[dbo].[users2] (
[UserID] bigint NOT NULL,
[Name] nvarchar(25) NULL,
CONSTRAINT [UserID] PRIMARY KEY (UserID)
)
IF OBJECT_ID ('[testing].[dbo].[users3]', 'U') IS NOT NULL
DROP TABLE [testing].[dbo].[users3]
CREATE TABLE [testing].[dbo].[users3] (
[UserID] bigint NOT NULL,
[Name] nvarchar(25) NULL,
CONSTRAINT [UserID] PRIMARY KEY (UserID)
)
I have searched the "2714 duplicate error msg," but have found references to duplicate table names, rather than multiple field names or column name duplicate errors, within a database.
I think that the schema is only allowing a single UserID primary key.
How do I fix this?
TIA
View 4 Replies
View Related
Dec 31, 2006
I have a table storing only 2 FKs, let's say PID, MID
Is there any way that I can check distinct data before row is added to this table?
For example, current data is
PID MID------------100 2001100 2005101 3002102 1009102 7523102 2449
If my query is about to insert PID 100, MID 2001, since it's existing data, i don't want to add it. Can I use trigger to solve this issue?
Thanks.
View 2 Replies
View Related
Jun 11, 2007
I want to import data from a live site database into a development database (SQL Server 2005 Express) using the DTSWizard. Once I copy/paste the live database to my dev machine, I cant attach the live site database because it has the same name as the database on the dev site.A simple solution I would assume is to change one of the names. But I can't seem to change the "orignal file name". A backup/restore won't work for me because I made table/field changes to the dev database. Thanks --Dietrich
View 2 Replies
View Related
Apr 20, 2008
I'm working on a web app that needs to be able to take a row in the database and duplicate it, creating a new row in the same table with the same data except for the ID field and reference field.So basically: table1.row1 references table2.row1. I need to duplicate the data in table1.row1 (creating table1.row2) with the same reference to table2.row1.Is there any easy way to do this in SQL? I'm just looking for some ideas or a framework to accomplish this.
View 1 Replies
View Related
Apr 11, 2001
Our programs have and would like to continue keep the same data in multiple database. Example phone numbers. Us DBAs are tring to convince them not to.
Other than keeping the data current, what avantages are there to keep data in one location for multiple application to access.
View 3 Replies
View Related
Jun 25, 2000
Hi, I am new to SQL 7.0 and I have a large database but with some duplicate
problems.
For example, in a table tblA we have columns as:
userid, lname, fname, street, state... etc,
The problem is: The same user sometimes use different styles for his/her street and state (suppose he/she always use same userid, lname and fname),
such as,
street State
7531 Plum Drive Maryland
7531 Plum Dr. MD
So a same user can produce a lot of duplicate user infomation. Can anybody
help to elimenate this kind of duplicate data?
Thanks.
Allan
View 2 Replies
View Related
Sep 22, 2004
I just converted an old non-relational database into something that MS SQL likes. The old primary keys were broken up into two columns, one being useful. The column I need to use has some rows with the same values in them.
I am looking for some way in a SQL script to look for the duplicate rows and add "_X" to the data where X is a value incremented by 1 for each duplicate row found.
For example, 3 duplicate rows with "5443aa" would return "5443aa", "5443aa_1","5443aa_2".
Any ideas?
--MartinZ
View 1 Replies
View Related
Feb 12, 2007
I'm trying to merge two Access databases into one SQL server database. I have 3 tables that are all related with primary and foreign keys.
When I try to import my second set of 3 tables I get errors about the keys already existing in the database. Is there any way to force SQL server to assign new keys while preserving my existing relationships? Thanks!
View 10 Replies
View Related
Oct 25, 2013
I'm designing an app for stock keeping. In my DB, I have a field called "ItemSerialNo" and I made it unique(but not the table's primary key). On my front end, I have a text box for Item Serial No and a combo box loaded with the item brands and also a save button. I know that if I try to save a serial no already existing in my DB, the app won't allow me because of the unique property of the field named "ItemSerialNo". But I want to be able to save a serial no already existing in my DB but with a different brand name.
For example, I want to be able to save information like:
1. ITEM SERIAL NO = 12345
BRAND NAME = AA
2. ITEM SERIAL NO = 12345
BRAND NAME = BB.
View 2 Replies
View Related
Dec 10, 2013
We need to move duplicate data from this sheet to other table also having issue that sometime verifiedmemberID is null as well as verifiermember name is null and also having the values in BCP authorisationcode as well as FPoveridecode but transactionmode/bcpmode is 'n' and also having condition that transactionmode/bcpmode is 'y' but bcpauthorisationcode is blank.
MemberStatecodeMemberDistrictCodeURNCompanyCodeHeadMemberIDHeadMemberNamePatientID
PatientNamePatientGenderPatientAgeTerminalIDHospitalCodeRegistrationNoBlockingUserDateUnblocking
InvoiceNoDischargeInvoiceNoDischargeDescDischargeUserDateAmoutClaimedTransactionMode/BCPMode
UnspecifiedAuthCodeUnspecifiedAuthDateBCPAuthorizationCodeBCPAuthorizationDateFPOverideCode
[code]....
View 2 Replies
View Related
Sep 16, 2014
I have heard from my client that they are facing duplicate data issue on one of the fact table.
Basically there is a view built on fact table and client access the data through the views.
There warehouse is loaded daily through SSIS packages. The duplicate records issue is only when the views are queried during the data loading process. The duplicates are gone when the data load is completed.
View 1 Replies
View Related
Apr 10, 2007
Hi All
i want the output like this
date poid sales ref unit cost ordered received sold shrinkage sale type postage delivery payment type
23/3/2007 12345 test - - 1 - tel 20 shipping credit card
for that i have written two sql queries
qry1 =
///
"SELECT im_products_stock_logs.orderid,im_products_stock_logs.log_type,
im_products_stock_logs.log_date, im_products_stock_logs.poid,
products.lead_time,products.cost_price,orders.sales_type,
isnull(orders.totalamt,0) as totalamt, isnull(orders.shippingamt,0)
as shippingamt, orders.delivery_method, orders.payment_method
FROM im_products_stock_logs LEFT OUTER JOIN orders ON
im_products_stock_logs.orderid = orders.orderid LEFT OUTER JOIN Products on
im_products_stock_logs.productid= products.productid WHERE
(im_products_stock_logs.productid = 790) and poid=14 order by log_date desc "
///
qry2=
///
SELECT im_products_stock_logs.log_type, SUM(im_products_stock_logs.qty)
AS qty, im_products_stock_logs.poid, DAY(im_products_stock_logs.log_date)
AS Expr2, YEAR(im_products_stock_logs.log_date) AS Expr3,
MONTH(im_products_stock_logs.log_date) AS Expr4,
{ fn MINUTE(im_products_stock_logs.log_date) } AS Expr5,
{ fn HOUR(im_products_stock_logs.log_date) }
AS Expr6 FROM im_products_stock_logs LEFT OUTER JOIN orders ON
im_products_stock_logs.orderid = orders.orderid LEFT OUTER JOIN
products ON im_products_stock_logs.productid = products.productid WHERE
( im_products_stock_logs.productid = 790 and im_products_stock_logs.colorid = 2 )
GROUP BY im_products_stock_logs.log_type, im_products_stock_logs.poid,
DAY(im_products_stock_logs.log_date), YEAR(im_products_stock_logs.log_date),
MONTH(im_products_stock_logs.log_date), { fn HOUR(im_products_stock_logs.log_date)
}, { fn MINUTE(im_products_stock_logs.log_date) }
ORDER BY YEAR(im_products_stock_logs.log_date) DESC,
MONTH(im_products_stock_logs.log_date) DESC,
DAY(im_products_stock_logs.log_date) DESC,
{ fn HOUR(im_products_stock_logs.log_date) }
DESC, { fn MINUTE(im_products_stock_logs.log_date) } DESC
///
the table use in are
im_products_stock_logs-orderid,log_type,log_date,poid,
products-lead_time,cost_price
orders-sales_type,delivery_method,payment_method
the sample data is
orders
orderid sales_type delivery_method payment_method
1025 tel v shipping 1
products
productid lead_time cost_price
13 4 45.00
im_products_stock_logs
logid productid orderid log_type log_date poid
40 13 1025 sold 23/3/2007 8
I have written the query for the same is
in my query i m getting the duplicate values .
How can I solve it please help me.
thanks
View 20 Replies
View Related
Oct 5, 2007
Hi,
I want to know the different sources, mediums and ways using which the duplicate recoprds or adta can make an entry in our database tables. I found 4 ways for this from many articles on net that are as follows:
Duplicate data might arrive at your database via an interface to another system
Data is loaded into table from other sources because during data loads, the integrity constraints are disabled
Merging data from disparate systems
Inheriting a poorly designed databaseBut is there any other ways also present????? Please tell me???
Thanks,
View 1 Replies
View Related
Sep 27, 2006
I have a table that contains more than 10,000 rows of
duplicate data. The script below copies the data to a temp table then
deletes from the original table. My problem is that after it runs, I now
have 122 rows of triplicate data (but dups are gone). If I rerun the script, it doesn't see the
triplicate data and returns 0 rows. I've use three different versions of
delete dup row scripts with the same result. There are no triggers or
constraints on the table, not even a primary key. What am I missing?-------------------------------------------------------------------
/**********************************************
Delete Duplicate Data
**********************************************/
--Create temp table to hold duplicate data
CREATE TABLE #tempduplicatedata
(
[student_test_uniq] [bigint] NULL,
[test_uniq] [int] NULL,
[concept_id] [smallint] NULL,
[test_id] [varchar](12) NULL,
[questions_correct] [smallint] NULL,
[questions_count] [smallint] NULL,
[percentage_correct] [decimal](6, 3) NULL,
[concept_response_count] [smallint] NULL
)
--Identify and save dup data into temp table
INSERT INTO #tempduplicatedata
SELECT * FROM crt_concept_score
GROUP BY student_test_uniq,
test_uniq,
concept_id,
test_id,
questions_correct,
questions_count,
percentage_correct,
concept_response_count
HAVING COUNT(*) > 1
--Confirm number of dup rows
SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'
--Delete dup from original table
DELETE FROM crt_concept_score
FROM crt_concept_score
INNER JOIN #tempduplicatedata
ON crt_concept_score.student_test_uniq = #tempduplicatedata.student_test_uniq
AND crt_concept_score.test_uniq = #tempduplicatedata.test_uniq
AND crt_concept_score.concept_id = #tempduplicatedata.concept_id
AND crt_concept_score.test_id = #tempduplicatedata.test_id
AND crt_concept_score.questions_correct = #tempduplicatedata.questions_correct
AND crt_concept_score.questions_count = #tempduplicatedata.questions_count
AND crt_concept_score.percentage_correct = #tempduplicatedata.percentage_correct
AND crt_concept_score.concept_response_count = #tempduplicatedata.concept_response_count
--Insert the delete data back
INSERT INTO crt_concept_score
SELECT * FROM #tempduplicatedata
--Check for dup data.
SELECT * FROM crt_concept_score
GROUP BY student_test_uniq,
test_uniq,
concept_id,
test_id,
questions_correct,
questions_count,
percentage_correct,
concept_response_count
HAVING COUNT(*) > 1
--Check table
-- SELECT * FROM crt_concept_score
--Drop temp table
DROP TABLE #tempduplicatedata
GO
View 1 Replies
View Related
Oct 22, 2007
my table structure is
id field1
1 i am from india
2 i am from usa
3 i am from delhi
So i want to remove common data from field1 , means after run the query table should be like
id field1
1 india
2 usa
3 delhi
thanks in advance
saumitra tamrakar
View 8 Replies
View Related