Identifying Duplicates - Find Where 2 Or More Records Have Same Rank
Sep 26, 2014
I have this query below that I created to do a count, but I don't think this is what I needed.
I need to find the duplicates. Example, if
CLI_ID1 12345 has 4 CLIP records, each CLIP record should have a different CLIP rank. I need to find scenarios where 2 (or more) of the CLIP records have the same CLIP RANK. If there are duplicate CLIP_RANKs within the same CLI_ID,
Select Distinct
cli_id1, count(clip_rank) countrank
FROM impact.dbo.CLI
LEFT JOIN impact.dbo.CLIO ON CLI.CLI_ID1 = CLIO.clio_id1
left join
impact.dbo.clip ON cli_id1 = clip_id1
Where (clio_trm = '' or clio_trm = NULL or clio_trm is null)
group by cli_id1
order by cli_id1
I have a table with 22 million Business records. I can see that there are duplicates when I group by BusinessName and Address and Phone. I'd like to place only the duplicates into a table, with a ranking, oldest business key gets a ranking of 1.
As a bonus I'd like each group to have a distinct group name (although not necessary, just want to know how to do this)
Later after I run more verifications to make sure these are not referenced elsewhere I'll delete everything with a matchRank > 1 out of the main Business table.
DROP TABLE [dbo].[TestBusiness]; GO CREATE TABLE [dbo].[TestBusiness]( [Business_pk] INT IDENTITY(1,1) NOT NULL, [BusinessName] VARCHAR (200) NOT NULL, [Address] VARCHAR(MAX) NOT NULL,
I need help flagging duplicate records in ome tables I have.For example if I have Table1 which conatins Field1, Field2 and Field3like belowField1 Field2 Field3 Field4Paul 18 Null NullPaul 18 Null NullJohn 19 Null NullHow would I;1. put a 'Y' in Field3 to mark the two records which are duplicates.2. put a 'Y' in Field4 to mark ONLY ONE of the duplicate records.Regards,Ciarán
I am trying to test some data handling between two different versions of an application.
I have restored the database schema twice, once as DB_old and once as DB_new.
I import a transaction using the new application into DB_new and I import the SAME transaction into the DB_old using the old version of application.
I then have to eyeball the data in SQL Query Analyzer to try to identify problems where the fields have received different values.
I have done this by running a select statement twice telling it to use both of the databases and then viewing it in two grids. There are a lot of columns so I have to do a lot of scrolling across the screen to do the comparison, and since the view is in two separate grids I have to hop back and forth and click the scroll bars, etc.
It seems like there has to be a better way. I don't suppose there is a way to lock the two grids so they both scroll together is there?
I was thinking maybe I could insert each of the selects into a temporary table and then do some kind of comparison to identify which values were different in each column. Some of the columns will have differences, like the timestamp, but if I could somehow identify which columns were different then I could eyeball them to identify which of those were okay to be different and which of them were actually bugs from the changed application version.
I have no idea how to identify those individual columns with different data values or even where to start.
Just so you understand better what I am doing now here is the query I am running that I then eyeball: use DB_new select * from claim where claim_id = 35144 use DB_old select * from claim where claim_id = 35144
I have a series of records based on empid where I want to identify the empid that may have discrepancies listed. I have some empids that are listed more than once and have different DOB's. In the example I am trying to Create a DOB_ERROR column and either say yes if the DOB doesn't match the other records in the file with the same empid.
SELECT Empid, DOB, CASE WHEN DOB = DOB THEN 'No' ELSE 'Yes' END AS DOB_ERROR, City, St, Gender FROM Emp WHERE EMPID IN
any useful SQL Queries that might be used to identify lists of potential duplicate records in a table?
For example I have Client Database that includes a table dbo.Clients. This table contains various columns which could be used to identify possible duplicate records, such as Surname | Forenames | DateOfBirth | NINumber | PostalCode etc. . The data contained in these columns is not always exactly the same due to differences caused by user data entry; so some records may have missing data from some of the columns and there could be spelling differences too. Like the following examples:
1 | Smith | John Raymond | NULL | NI990946B | SW12 8TQ 2 | Smith | John | 06/03/1967 | NULL | SW12 8TQ 3 | Smith | Jon Raymond | 06/03/1967 | NI 99 09 46 B | SW12 8TQ
The problem is that whilst it is easy for a human being to review these 3 entries and conclude that they are most likely the same Client entered in to the database 3 times; I cannot find a reliable way of identifying them using a SQL Query.
I've considered using some sort of concatenation to a new column, minus white space and then using a "WHERE column_name LIKE pattern" query, but so far I can't get anything to work well enough. Fuzzy Logic maybe?
the results would produce a grid something like this for the example above:
ID | Surname | Forenames | DuplicateID | DupSurname | DupForenames 1 | Smith | John Raymond | 2 | Smith | John 1 | Smith | John Raymond | 3 | Smith | Jon Raymond 9 | Brown | Peter David | 343 | Brown | Pete D next batch of duplicates etc etc . . . .
Hi, I have table which stores the fund name and its data. We get quarterly information from the fund co. Suppose if the user wants to add a fund thats not in our database we let then add a ClientFundId and a FundName. But may be after sometime the fund company may add that fund in the next quarter.. So how do i get rid of Duplicated Data.. In the ClientFundId column we can a 9 letter Aplhanumeric or a 5 letter character but if the fund co.. provides those values the 5 letter characters are stored in Ticker column and the 9 letter words are stored in Cusip column.. So i just wrote this query hoping i could retrieve the duplicate values but it didnt list any..but i found one this is my query.. Select FundId, Cusip, Ticker, ClientFundId, FundName, ShortName From Fund Where
ClientFundId = Ticker or ClientFundId = Cusip Any help will appreciated Thanks Karen
I am trying to find when a name has been entered more than once into 1 database table.
I'm currently doing something like this (can't remember exactly, not at work)
SELECT COUNT(*) AS Cnt, Name FROM tblTable GROUP BY Name ORDER BY Cnt Desc
This brings back all the Names in the database and tells me which are duplicates but I want to just have the results of the duplicate values and not the single values.
does someone have a querry to display the duplicate records in a table.
Table: zipcode dma
My data upload is failing because there is a primary key on zipcode and the source data (42k records) has about 50 duplicate zipcode records in it. It is possible that there is a unique combo of zipcode / dma but I need to identify the duplicate records to determine that.
I have this script bellow which does what it is supposed to. However it only outputs the cust_id. I want it to show all the columns in the table. How would I do this?
SELECT cust_id FROM cust_table WHERE cust_name in ('Billy','John') and rownum < 100 GROUP BY cust_id HAVING COUNT(*) > 1;
Hi everybody I need help on finding duplicates and deleting the duplicate record depending on name and fname , deleting the duplicates and leaving only the first one.
my PERSON table is this below:
ID name fname ownerid id2
1 a b 2 c c 3 e f 4 a b 1 10 5 c c 2 11
I have this query below that returns records 1 and 4 and 2 and 5 since they have the same name and fname
select * from ( Select name ,fname, count(1) as cnt from PERSON group by name,Fname ) where cnt > 1
ID name fname ownerid id2
1 a b 4 a b 1 10
2 c c 5 c c 2 11
With this result I need to delete the second record of each group but update the first records with the ownerid and id2 of the second record that would be deleted... I don't know how to proceed with this..
I have a table that has Serial numbers and MSDSID numbers and many other fields in the table.
What I need to figure out is if the table contains a distinct Serial number with different MSDSIDs.
So If I have in the table
Serial         MSDSID         Date         Size... 001                20            1/1/2015       5 002                 21            1/1/2015       3 001                22            2/1/2015       1 003                21           3/1/2015       1 004                23           1/15/2015      5 003                22            1/20/2015     6 004                23           2/21/2015      5 002                21           4/25/2015     4
I would like the results to show:
Serial           Count 001                 2          the count is 2 because Serial 001 has an MSDSID of 20 and 22 002                 1           the count is 1 because Serial 002 only has MSDSID 21 003                  2           the count is 2 because Serial 003 has an MSDSID of 21 and 22 004                 1          the count is 1 because Serial 002 only has MSDSID 23
It would be even better if the results just showed where the count is greater than 1.
I have a table that exists of Serial, MSDSID and other fields.
What I need to check is to see if there are Serial numbers that multiple MSDSID numbers.
So if I have
Serial         MSDSID 001               23 002               24 003               25 004               23 005               26 006                24
The results would be
Serial            MSDSID          Count 001               23                  2 004               23                  2 002               24                  2 006                24                  2
I need to make a selection on join datasets with 2 conditions and populate the results in another dataset(Report).It is working with the fist condition "AccountingTypeCharacteristicCodeId = 3"...
INSERT INTO SurveyInterface.tblLoadISFNotification (OperatingEntityNumber, SDDS, SurveyCodeId, QuestionnaireTypeCodeId, ReferencePeriod, DataReplacementIndicator, PrecontactFlag, SampledUnitPriority) SELECT ISF.OperatingEntityNumber       ,[SDDS]       ,[SurveyCodeId]       ,[QuestionnaireTypeCodeId]       ,[ReferencePeriod]       ,[DataReplacementIndicator]       [code]....
Know I also want to add in that new dataset(report) all the duplicates of concatenated variables
ISF.OperatingEntityNumber/ISF.QuestionnaireTypeCodeId GROUP BY ISF.OperatingEntityNumber, ISF.QuestionnaireTypeCodeId
Every sunday, new data will be loaded from temptable to main table. I have to make sure that, duplicates does not get loaded from temptable to maintable.
For example, if last sunday a record gets loaded from temp to main. If this sunday also the same record is present then it means that is a duplicate.
The duplicate is decided on below scenario
select 'CodeChanges: ', count(*) from CodeChanges a, CodeChanges_Temp b where a.AccountNumber = b.AccountNumber and a.HexaNumber = b.HexaNumber and a.HexaEffDate = b.HexaEffDate and a.HexaId = b.HexaId and
[Code] ...
Yesterday (Sunday) , data from temp got loaded onto maintable but with duplicates.
There is a log which just displays number of duplicates.
Yesterday the log displayed 8 duplicates found. I need to find out the 8 duplicates which got loaded yesterday and delete it off from main table.
There is a column in both tables which is 'creation date and time'. Every Sunday when the load happens this column will have that day's date .
Now i need to find out what are all the duplicates which got loaded on this sunday.
The total rows in temp table is : 363 No of duplicates present is : 8
I used below query to find out the duplicates but it is returning all the 363 rows from the maintable instead of the 8 duplicates.
Select 'CodeChanges: ', * from CodeChanges a where exists ( Select 1 from CodeChanges_Temp b where a.HexaNumber = b.HexaNumber and a.HexaEffDate = b.HexaEffDate and
[Code] ...
Need finding the duplicate records which has creation date time as '2015-11-01 00:00:00.000' and all the above columns mentioned in the query matches.
My basic situation is this - I ONLY want duplicates, so the oppositeof DISTINCT:I have two tables. Ordinarily, Table1ColumnA corresponds in a one toone ratio with Table2ColumnB through a shared variable. So if I queryTableB using the shared variable, there really should only be onrecord returned. In essence, if I run this and return TWO rows, it isvery bad:select * from TableB where SharedVariable = 1234I know how to join the tables on a single record to see if this is thecase with one record, but I need to find out how many, among possiblymillions of records this affects.Every record in Table1ColumnA (and also the shared variable) will beunique. There is another column in Table1 (I'll call itTable1ColumnC) that will be duplicated if the record in Table2 is aduplicate, so I am trying to use that to filter my results in Table1.I am looking to see how many from Table1 map to DUPLICATE instances inTable2.I need to be able to say, in effect, "how many unique records inTable1ColumnA that have a duplicate in Table1ColumnC also have aduplicate in Table2ColumnB?"Thanks if anyone can help!-- aknoch
I have a patient record and emergency contact information. I need to find duplicate phone numbers in emergency contact table based on relationship type (RelationType0 between emergency contact and patient. For example, if patient was a child and has mother listed twice with same number, I need to filter these records. The case would be true if there was a father listed, in any cases there should be one father or one mother listed for patient regardless. The link between patient and emergency contact is person_gu. If two siblings linked to same person_gu, there should be still one emergency contact listed.
Below is the schema structure:
Person_Info: PersonID, Person Info contains everyone (patient, vistor, Emergecy contact) First and last names Patient_Info: PatientID, table contains patient ID and other information Patient_PersonRelation: Person_ID, patientID, RelationType Address: Contains address of all person and patient (key PersonID) Phone: Contains phone # of everyone (key is personID)
The goal to find matching phone for same person based on relationship type (If siblings, then only list one record for parent because the matching phones are not duplicates).
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile). MATCHED: (email, firstname, mobile) But not MATCH: (lastname, phone) MATCHED: (email, firstname, lastname) But not MATCH: (phone, mobile)
We all were new at one point.... any help is appreciated.
Objective:
Combining two 49,000 row tables and remove records where there is only 1 column difference. (keeping the specified column value removing the one with a blank.)
Reason:
I have 2 people going through a list, coding a specific column with a single letter value. They both have different progress on each sheet. Hence I am trying to UNION them and have a result of their combined efforts without duplicates.
My progress/where I'm stuck:
Here is my first query/union:
SELECT * FROM [Eds table] UNION SELECT * FROM [Vickis table];
As shown above, I have unioned these 2 tables and my results removed th obvious whole record duplicates, but since 1 column is different on these, a union without criteria considers them unique.....
an example of duplicates that I must remove are as follows:
Hi, I have had this problem for a while and have not been able solve it.
What im looking at doing is looping thru my patient table and trying to organise the patients in to there admission sequence, so when patient "A" comes in and is treated at my hospital and is discharged and admitted to another Hospital within one day then patient "A" will get a code of 1 being there first admission.
then if patient "A" is admitted again but there admission date is greater than one day they get a code of 2 being for there second admission but will need to loop thru table looking for other admissions and discharges.
The table name is Adm_disc_Match_tbl
Basically what i have 4 fields. Index_key = which is the patient common link (text) ur_episode = this wil change for each Hospital (text) Admission_datetime = patient admission date and time (datetime) Discharge_datetime = patient discharge date and time (datetime)
I'm looking for a way of taking a query which returns a set of date time fields (probable maximum of 20 rows) and looping through each value to see if it exists in a separate table.
E.g.
Query 1
Select ID, Person, ProposedEvent, DayField, TimeField from MyOptions where person = 'me'
Table
Select Person, ExistingEvent, DayField, TimeField from MyTimetable where person ='me'
Loop through Query 1 and if it finds ANY matching Dayfield AND Timefield in Query/Table 2, return the ProposedEvent (just as a message, the loop could stop there), if no match a message saying all is fine can proceed to process form blah blah.
I'm essentially wanting somebody to select a bunch of events in a form, query 1 then finds all the days and times those events happen and check that none of them exist in the MyTimetable table.
I have the table below, and I am trying to retrieve TileIDs that have the same ModelIDs.
ModelID TileID HP DL380 G3 120v Dual15400 HP DL380 G3 120v Dual15400 HP DL380 G3 120v Dual15400 HP DL380 G3 120v Dual15400 HP DL380 G3 120v Dual15400 HP DL380 G3 120v Dual15400 Sun SF 280R 120v 15401 Sun SF 280R 120v 15401 Sun SF 280R 120v 15401 Sun SF 280R 120v 15401 Lantronix MSS4 15401
So TileID ‘15400’ would be a keeper, since all ModelIDs are the same.
I have a large table of customers. I would like to add a column that contains an integer, unique to that customer. The trick is that this file contains many duplicate customers, so I want the duplicates to all have the same number between them.the numbers dont have to be sequential or anything, just like customers having the same one.
Hi All, I am trying to find records when searchProductCount = 2 AND when searchProductCount < 2 BUT productID not in (select pid from TableB) ... I have the query below ... but is there any other better way to do this? TableB has IDs: 100, 700 ...etc eg: searchProduct ID is 50,100 -- means returns everything when we found ALL productID (50,100) from TableA, count =2 Select productName, productID from TableA where searchProductCount = 2 AND productID IN (50,100) union all -- when not all productID found in TableA, we only return productsID from TableA which ID found in TableB, count < 2 Select productName, productID from TableA where searchProductCount < 2 and productID IN (select pid from TableB) -- in this case, pid found in TableB from searchProductID will be 100 ------------------------------------------------------------------ it comes out there are duplicates results (when first query is valid, we union all second query, so we have duplicates records). How can we eliminate the duplicates? Or is there better way to acheive this without using union all?