I know of several methods to remove duplicate records but I recently encountered a unique situation where some duplicate records were actually acceptable.
Here is my situation:
I have a table that contains records of individuals who have children, so one person can have three children with different birthdates; there is also a field that holds a specified language. The challenge arises when an individual has only one child but has entered the same record twice: once with a specified language and once without, which produces a default value of UNKNOWN.
I need to be able to remove this record without affecting individuals who legitimately have two records. Someone with two children, for instance, may have a specified language in one record but the default value of UNKNOWN in the second.
So I can't eliminate the unwanted duplicates by filtering out records that have UNKNOWN because I would also remove individuals that I need.
EX:
firstN | lastN | address | lang | childs birthdate
John | Doe | 210 Somewhere Ave | ENG | 1993-10-09
John | Doe | 210 Somewhere Ave | UNK | 1993-10-09
Jane | Doe | 210 Anywhere Ave | ENG | 1969-12-23
Jane | Doe | 210 Anywhere Ave | UNK | 1958-04-15
How could you remove the duplicate for John in this example without affecting Jane's second record? Hers is actually fine, because she apparently has two children with different birthdates, whereas John's duplicate was obviously created because the record was entered twice: once without a language and once with the language specified.
I have tried a number of things short of creating a cursor which isn't really the best way to resolve this issue since there are millions of rows.
Does anyone out there have any input that would be helpful? Or has anyone ever had a similar issue? I would be interested in knowing how you addressed the problem.
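One set-based way to approach this, with no cursor, is to delete an UNK row only when a matching non-UNK row exists for the same person and the same birthdate. The sketch below is only an illustration: the temp table name #People and the column name childDob are made up, the rows are the four from the example, and it assumes SQL Server 2008 or later.

```sql
-- Hypothetical table and column names; sample rows are the ones from the question.
CREATE TABLE #People (
    firstN   varchar(50),
    lastN    varchar(50),
    address  varchar(100),
    lang     varchar(10),
    childDob date
);

INSERT INTO #People VALUES
    ('John', 'Doe', '210 Somewhere Ave', 'ENG', '1993-10-09'),
    ('John', 'Doe', '210 Somewhere Ave', 'UNK', '1993-10-09'),
    ('Jane', 'Doe', '210 Anywhere Ave',  'ENG', '1969-12-23'),
    ('Jane', 'Doe', '210 Anywhere Ave',  'UNK', '1958-04-15');

-- Delete an UNK row only if a non-UNK row exists for the same person AND the same child.
DELETE u
FROM #People AS u
WHERE u.lang = 'UNK'
  AND EXISTS (
        SELECT 1
        FROM #People AS k
        WHERE k.firstN   = u.firstN
          AND k.lastN    = u.lastN
          AND k.address  = u.address
          AND k.childDob = u.childDob
          AND k.lang    <> 'UNK'
      );

SELECT * FROM #People;  -- John keeps only his ENG row; both of Jane's rows remain
DROP TABLE #People;
```

On a table with millions of rows, an index covering the name, address and birthdate columns would probably matter more than the exact form of the DELETE.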
I have the following script that selects tables from my database with the same column name and then I delete data that falls within a specified condition. However what I need to be able to do is just select these tables that meet the condition and then just delete the data because at the moment it's also returning tables that I don't need.
So I just want to use a cursor on a list of tables that meet these criteria:
1) have a qid column
2) qid >= 5000000 and qid < 1500000000
Example
declare @strqry varchar(1000)
declare @DelTable sysname
declare dailyYear cursor for
    SELECT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE 'qid' = COLUMN_NAME order by table_name asc
open dailyYear
fetch next from dailyYear into @DelTable
while @@FETCH_STATUS = 0
begin
    Set @strqry = 'Delete from '+@DelTable+' where qid >= 5000000 and qid < 1500000000 '
    exec (@strqry)
    fetch next from dailyYear into @DelTable
end
close dailyYear
deallocate dailyYear
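To limit the work to tables that actually hold rows in the qid range, the loop can test each candidate with EXISTS before building the DELETE. The following is a sketch under some assumptions: every qid column is numeric, views should be skipped, and the cursor and variable names are invented. It only PRINTs the DELETE statements so they can be reviewed before anything is removed.

```sql
-- Sketch: find tables whose qid column has rows in range and PRINT the DELETE
-- for each one instead of executing it.
DECLARE @Schema sysname, @DelTable sysname, @sql nvarchar(1000), @hasRows bit;

DECLARE qidTables CURSOR LOCAL FAST_FORWARD FOR
    SELECT c.TABLE_SCHEMA, c.TABLE_NAME
    FROM INFORMATION_SCHEMA.COLUMNS AS c
    INNER JOIN INFORMATION_SCHEMA.TABLES AS t
        ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
       AND t.TABLE_NAME   = c.TABLE_NAME
    WHERE c.COLUMN_NAME = 'qid'
      AND t.TABLE_TYPE  = 'BASE TABLE'   -- skip views
    ORDER BY c.TABLE_NAME;

OPEN qidTables;
FETCH NEXT FROM qidTables INTO @Schema, @DelTable;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Ask the table itself whether any row falls inside the range.
    SET @sql = N'SELECT @found = CASE WHEN EXISTS (SELECT 1 FROM '
             + QUOTENAME(@Schema) + N'.' + QUOTENAME(@DelTable)
             + N' WHERE qid >= 5000000 AND qid < 1500000000) THEN 1 ELSE 0 END';
    EXEC sp_executesql @sql, N'@found bit OUTPUT', @found = @hasRows OUTPUT;

    IF @hasRows = 1
        PRINT N'DELETE FROM ' + QUOTENAME(@Schema) + N'.' + QUOTENAME(@DelTable)
            + N' WHERE qid >= 5000000 AND qid < 1500000000;';

    FETCH NEXT FROM qidTables INTO @Schema, @DelTable;
END

CLOSE qidTables;
DEALLOCATE qidTables;
```

Replacing the PRINT with EXEC sp_executesql would run the deletes; QUOTENAME is used so that unusual table names do not break the dynamic SQL.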
I have two audit tables, AUDIT_ORDERS and AUDIT_ORDER_LINES.
AUDIT_ORDERS has these columns: AUDIT_ID, ORDER_ID, AUDIT_DATE and some others.
AUDIT_ORDER_LINES has these columns: AUDIT_ID, ORDER_ID, ORDER_LINE_ID, AUDIT_DATE and some others.
I need to join these two tables in order to select for each order line row the first order having the related audit date lower than or equal to the audit date of the related order line.
I don't want to use the TOP 1 clause or a subquery. I am thinking of completing a statement such as this:
SELECT OL.Order_Line_ID, O.Order_ID, OL.Audit_Date, O.Audit_Date
FROM AUDIT_ORDER_LINES as OL
INNER JOIN AUDIT_ORDERS as O
    on OL.Order_ID = O.Order_ID and O.Audit_Date <= OL.Audit_Date
...
I'd like to get the first row of Audit_Orders whose audit_date is <= the audit_date of the Audit_Order_Lines row, using the join clause.
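One possible shape for this keeps the join condition as written and ranks the matching order rows with ROW_NUMBER() inside a common table expression. This is a sketch, not a definitive answer. It assumes "first" means the most recent order audit row at or before the line's audit date, and that AUDIT_ID identifies an order-line audit row. The table variables and sample rows are invented, and a CTE may or may not count as a "subquery" for the asker's purposes.

```sql
-- Hypothetical sample data in table variables standing in for the real audit tables.
DECLARE @AUDIT_ORDERS TABLE (Audit_ID int, Order_ID int, Audit_Date datetime);
DECLARE @AUDIT_ORDER_LINES TABLE (Audit_ID int, Order_ID int, Order_Line_ID int, Audit_Date datetime);

INSERT INTO @AUDIT_ORDERS VALUES (1, 10, '20120101'), (2, 10, '20120105'), (3, 10, '20120109');
INSERT INTO @AUDIT_ORDER_LINES VALUES (7, 10, 100, '20120106');

WITH ranked AS (
    SELECT OL.Order_Line_ID,
           O.Order_ID,
           OL.Audit_Date AS Line_Audit_Date,
           O.Audit_Date  AS Order_Audit_Date,
           -- rank the candidate order rows per line audit row, latest first
           ROW_NUMBER() OVER (PARTITION BY OL.Audit_ID
                              ORDER BY O.Audit_Date DESC) AS rn
    FROM @AUDIT_ORDER_LINES AS OL
    INNER JOIN @AUDIT_ORDERS AS O
        ON OL.Order_ID = O.Order_ID
       AND O.Audit_Date <= OL.Audit_Date
)
SELECT Order_Line_ID, Order_ID, Line_Audit_Date, Order_Audit_Date
FROM ranked
WHERE rn = 1;   -- the line audited on 2012-01-06 pairs with the order audit of 2012-01-05
```

If "first" instead means the earliest qualifying order row, changing the ORDER BY inside the window to ascending would give that.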
SELECT row_number() over (ORDER by a.employeeID) as rec_num, a.* FROM EmployeeA a
UNION
SELECT row_number() over (ORDER by a.employeeID) as rec_num, a.* FROM EmployeeB a
1 777 Mike HR
2 888 Susy HR
1 111 Smith TECH
2 222 John TECH
3 333 Lenny TECH
How do I get a sequence number across all of these records? The rec_num resets for every statement. I want the numbering for the second statement to continue from the first, so that rec_num runs from 1 to 5 across the combined result.
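A common way to get one continuous sequence is to combine the rows first and number them afterwards, in an outer query. In the sketch below the column names name and dept and the src marker column are assumptions; the sample rows are taken from the output shown in the question, assuming the HR rows come from EmployeeA.

```sql
-- Hypothetical stand-ins for EmployeeA and EmployeeB.
DECLARE @EmployeeA TABLE (employeeID int, name varchar(50), dept varchar(10));
DECLARE @EmployeeB TABLE (employeeID int, name varchar(50), dept varchar(10));

INSERT INTO @EmployeeA VALUES (777, 'Mike', 'HR'), (888, 'Susy', 'HR');
INSERT INTO @EmployeeB VALUES (111, 'Smith', 'TECH'), (222, 'John', 'TECH'), (333, 'Lenny', 'TECH');

-- Number the rows once, over the combined set; src keeps EmployeeA rows first.
SELECT ROW_NUMBER() OVER (ORDER BY u.src, u.employeeID) AS rec_num,
       u.employeeID, u.name, u.dept
FROM (
    SELECT 1 AS src, a.* FROM @EmployeeA AS a
    UNION ALL
    SELECT 2 AS src, b.* FROM @EmployeeB AS b
) AS u
ORDER BY rec_num;
```

UNION ALL is used here because the rows are tagged with src anyway; plain UNION would also work but adds a duplicate-removal step.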
I know how to query up duplicate records, here is what I am using to do so:
Select PhoneNum, Count(PhoneNum) as NumOccurances
From Tester
Group By PhoneNum
Having (Count(PhoneNum) > 1)
When duplicates arise, I expect them to have unique CallResultCode values, and I would like to set a priority for which value stays and which one gets dumped. Keep in mind that I am a SQL noob.
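One way to express "keep the highest-priority row per phone number" is to rank the rows within each PhoneNum using a CASE expression and delete everything ranked below first. The CallResultCode values and their ordering below are invented for illustration, since the question does not list the real codes.

```sql
-- Hypothetical sample data; the real CallResultCode values are not given in the question.
DECLARE @Tester TABLE (PhoneNum varchar(20), CallResultCode varchar(20));

INSERT INTO @Tester VALUES
    ('555-0100', 'NoAnswer'),
    ('555-0100', 'Sale'),
    ('555-0101', 'Busy');

WITH ranked AS (
    SELECT PhoneNum, CallResultCode,
           ROW_NUMBER() OVER (
               PARTITION BY PhoneNum
               ORDER BY CASE CallResultCode      -- lower number = higher priority (assumed order)
                            WHEN 'Sale'     THEN 1
                            WHEN 'NoAnswer' THEN 2
                            ELSE 3
                        END) AS rn
    FROM @Tester
)
DELETE FROM ranked
WHERE rn > 1;          -- removes the lower-priority duplicates

SELECT * FROM @Tester; -- 555-0100 keeps 'Sale'; 555-0101 keeps 'Busy'
```

Running the CTE with a SELECT instead of the DELETE first is a cheap way to check which rows would be removed.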
I want to write an INSERT statement, but before it runs I need to check a condition: whether the same record already exists or not. How do I do this using Transact-SQL?
I have a table called Employee with this definition:
ColumnName | Datatype
EmployeeID | INT
Name | Varchar(255)
with records like this:
EmployeeID | Name
1 | John E Mathew
2 | Ethel Elizabeth
Before I insert a new record, I need to check whether an employee named 'John E Mathew' is already in the table. My insert statement should execute only if the name 'John E Mathew' does not exist.
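A minimal sketch of the check-then-insert pattern uses IF NOT EXISTS. The temp table mirrors the Employee definition above; the new EmployeeID value 3 is made up, and SQL Server 2008 or later is assumed for the inline DECLARE initialiser.

```sql
-- Temp table standing in for Employee, loaded with the two rows from the question.
CREATE TABLE #Employee (EmployeeID int, Name varchar(255));
INSERT INTO #Employee VALUES (1, 'John E Mathew'), (2, 'Ethel Elizabeth');

DECLARE @Name varchar(255) = 'John E Mathew';

-- Insert only when no row with this name is present.
IF NOT EXISTS (SELECT 1 FROM #Employee WHERE Name = @Name)
BEGIN
    INSERT INTO #Employee (EmployeeID, Name)
    VALUES (3, @Name);   -- EmployeeID 3 is a made-up value
END

SELECT * FROM #Employee;  -- still two rows, because the name already exists
DROP TABLE #Employee;
```

Two sessions running this at the same moment could both pass the check, so a unique constraint on Name is probably the more reliable guard if duplicates must never occur.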
I want to update the Category column of all records in the Master table with the value of LastUpdate from the CategoryTable, where the ID of both tables is the same.
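In T-SQL this kind of update is often written with UPDATE ... FROM and a join on the shared ID. The sketch below uses table variables as stand-ins; the data types are guesses, since the question does not give them.

```sql
-- Hypothetical stand-ins; column types are assumptions.
DECLARE @Master TABLE (ID int, Category datetime);
DECLARE @CategoryTable TABLE (ID int, LastUpdate datetime);

INSERT INTO @Master VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO @CategoryTable VALUES (1, '20150101'), (2, '20150215');

-- Copy LastUpdate into Category for every row whose ID matches.
UPDATE m
SET m.Category = c.LastUpdate
FROM @Master AS m
INNER JOIN @CategoryTable AS c
    ON c.ID = m.ID;

SELECT * FROM @Master;  -- ID 3 has no match and keeps its NULL
```

If CategoryTable can hold several rows per ID, the value that ends up in Category is not deterministic, so the join would need narrowing in that case.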
Table2 contains fields Group, Name,Category, Dimension (Group and Name are not in Table1)
So basically I need to read the records in Table1 by Groupid, and for each Groupid select the records from Table2 where Table2.Category in (Select Category from Table1) and Table2.Dimension in (Select Dimension from Table1).
In Table1 there might be 10 Groupid records, all of which are different.
Here we need to consider patient dates that fall between sdate and edate of the patientref table, and then rank the status values in this priority order: 2 is the highest, 4 is the second highest, 3 is the third highest, and 1 is the fourth highest.
If the date falls between several different sdate and edate ranges with the same status value, then we need to take the latest sdate and extract the values from that entire record.
Here, the pn=2 rows have dates which fall between sdate and edate of the patientref table. The highest-priority status is 2, and two records have status 2, so we go for the max sdate (latest sdate). For pn=2 the latest sdate is 2015-02-10, and we need to retrieve the corresponding edate and status values.
pn = 4 does not get sdate, edate and status values, because its date does not fall within the condition.
select p.pn, p.code, p.[date], p.doctorcode, pr.sdate, pr.edate, pr.[status]
from patient p
outer apply (
    select top 1 pr.pn, pr.code, pr.sdate, pr.edate, pr.[status]
    from patientref pr
    where pr.pn = p.pn and pr.code = p.code and p.date between pr.sdate and pr.edate
    order by case when pr.status=2 then 1 when pr.status=4 then 2 when pr.status=3 then 3 when pr.status=1 then 4 end, pr.sdate
) pr
But this query does not give the expected result. When the date does not fall between sdate and edate, those records are not returned by the above query. I need those records as well: if the between condition is not met, the record should still be returned, with empty values for those columns.
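A possible adjustment is to sort sdate in descending order inside the OUTER APPLY, so that the latest sdate wins within the same status. The sketch below uses invented sample rows (only the 2015-02-10 date comes from the description) in table variables named after the real tables.

```sql
-- Hypothetical sample data; only the 2015-02-10 sdate is taken from the question.
DECLARE @patient TABLE (pn int, code int, [date] date, doctorcode int);
DECLARE @patientref TABLE (pn int, code int, sdate date, edate date, [status] int);

INSERT INTO @patient VALUES (2, 10, '2015-03-01', 1), (4, 10, '2015-03-01', 1);
INSERT INTO @patientref VALUES
    (2, 10, '2015-01-01', '2015-12-31', 2),
    (2, 10, '2015-02-10', '2015-12-31', 2),
    (2, 10, '2015-02-20', '2015-12-31', 4),
    (4, 10, '2014-01-01', '2014-12-31', 2);   -- range does not cover the patient date

SELECT p.pn, p.code, p.[date], p.doctorcode, pr.sdate, pr.edate, pr.[status]
FROM @patient AS p
OUTER APPLY (
    SELECT TOP (1) r.sdate, r.edate, r.[status]
    FROM @patientref AS r
    WHERE r.pn = p.pn
      AND r.code = p.code
      AND p.[date] BETWEEN r.sdate AND r.edate
    ORDER BY CASE r.[status] WHEN 2 THEN 1 WHEN 4 THEN 2 WHEN 3 THEN 3 WHEN 1 THEN 4 ELSE 5 END,
             r.sdate DESC            -- latest sdate first within the same status
) AS pr;
-- pn=2 returns sdate 2015-02-10 with status 2; pn=4 is kept with NULLs
```

OUTER APPLY on its own keeps patients that have no matching range, so if such rows disappear in the real query, a later WHERE clause on the pr columns may be what filters them out.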
Could you please give me any advice on how to filter out records in the data flow by a particular condition? E.g., in my case, I want to filter out rows with a null id (that is, get rid of the rows that were not matched in the lookup component). I hope this is clear. I am looking forward to hearing from you, and thank you very much.
Above there are 6 file entries for client id 22784 and LOAN_SANCTION_DATE 2014-02-03, out of which 3 are rejected.
Now I want to write a query to select the distinct client_id and LOAN_SANCTION_DATE pairs from Client_Master where all files have been rejected.
That is, grouping by client ID and LOAN_SANCTION_DATE, all the files in the group are rejected.
I have written the query below. It gives the right result, but I am not satisfied with it:
SELECT DISTINCT CLIENT_ID, LOAN_SANCTION_DATE,
       COUNT(FILE_ID) AS No_Of_Files,
       COUNT(DISTINCT CASE WHEN IS_REJECT=1 THEN FILE_ID END) AS No_Of_Rejected
FROM dbo.FILE_MASTER
GROUP BY CLIENT_ID, LOAN_SANCTION_DATE
HAVING COUNT(FILE_ID) = COUNT(DISTINCT CASE WHEN IS_REJECT=1 THEN FILE_ID END)
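A somewhat shorter formulation counts the files that are not rejected and keeps only the groups where that count is zero. The sketch below assumes IS_REJECT is a bit column; the rows for client 30001 are invented so that one group qualifies.

```sql
-- Hypothetical stand-in for dbo.FILE_MASTER.
DECLARE @FILE_MASTER TABLE (FILE_ID int, CLIENT_ID int, LOAN_SANCTION_DATE date, IS_REJECT bit);

INSERT INTO @FILE_MASTER VALUES
    (1, 22784, '2014-02-03', 1), (2, 22784, '2014-02-03', 1), (3, 22784, '2014-02-03', 1),
    (4, 22784, '2014-02-03', 0), (5, 22784, '2014-02-03', 0), (6, 22784, '2014-02-03', 0),
    (7, 30001, '2014-02-05', 1), (8, 30001, '2014-02-05', 1);   -- made-up client

SELECT CLIENT_ID, LOAN_SANCTION_DATE, COUNT(*) AS No_Of_Files
FROM @FILE_MASTER
GROUP BY CLIENT_ID, LOAN_SANCTION_DATE
-- a group qualifies when it contains no file that is not rejected
HAVING SUM(CASE WHEN IS_REJECT = 1 THEN 0 ELSE 1 END) = 0;
-- returns only client 30001
```

GROUP BY already yields one row per client and date, so DISTINCT is not needed; a NULL IS_REJECT is treated here as "not rejected".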
I have a database which has a field called fldTimes. Basically, this field records the number of hits a file gets. How can I select the 5 most popular files, i.e. those with the most hits? Thanks
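Assuming the database is SQL Server, TOP combined with a descending ORDER BY is the usual approach. The table variable and the fldFileName column below are invented; only fldTimes comes from the question.

```sql
-- Hypothetical table; only the fldTimes column is named in the question.
DECLARE @Files TABLE (fldFileName varchar(100), fldTimes int);

INSERT INTO @Files VALUES
    ('a.zip', 40), ('b.zip', 12), ('c.zip', 95),
    ('d.zip', 7),  ('e.zip', 63), ('f.zip', 51), ('g.zip', 3);

-- The five files with the highest hit counts.
SELECT TOP (5) fldFileName, fldTimes
FROM @Files
ORDER BY fldTimes DESC;
```

TOP (5) WITH TIES would also return any files tied with the fifth one, if that matters.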
This might be a simple question. I have a LIKE statement that is working fine, however I am not sure if something else is possible.
I can pull all records on a query for a person's name with a parameter value of "MARTIN". It will also pull records for "LYNN MARTIN". However, what if I would like to have that search also pull "LYNN M MARTIN"? Currently "LYNN MARTIN" is not finding "LYNN M MARTIN".
When the end user wants to search on LYNN MARTIN and that is what they input, I want SQL to find all records that match LYNN MARTIN and also find records that HAVE LYNN % MARTIN.
I hope this makes sense. I guess I need to build my select statement using a WHERE ... LIKE clause, but I am not sure of the syntax.
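One simple approach is to turn each space in the search text into a % wildcard before applying LIKE. The table and column names below are invented for illustration, and SQL Server 2008 or later is assumed.

```sql
-- Hypothetical table of names.
DECLARE @People TABLE (FullName varchar(100));
INSERT INTO @People VALUES ('MARTIN'), ('LYNN MARTIN'), ('LYNN M MARTIN'), ('LYNN SMITH');

DECLARE @search varchar(100) = 'LYNN MARTIN';

-- 'LYNN MARTIN' becomes the pattern '%LYNN%MARTIN%'
SELECT FullName
FROM @People
WHERE FullName LIKE '%' + REPLACE(@search, ' ', '%') + '%';
-- matches 'LYNN MARTIN' and 'LYNN M MARTIN'
```

This pattern is fairly loose: it would also match a value such as 'LYNNE MARTINEZ'. A stricter variant could combine LIKE '%LYNN MARTIN%' with LIKE '%LYNN % MARTIN%' using OR.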
Hi, I'm just wondering if someone can help me with some SQL syntax stuff. I want to take this SQL statement: "SELECT TOP 50 tblProfile.chName, tblProfile.intCount FROM tblProfile, tblLinks WHERE (tblLinks.MemberID = tblProfile.MemberID) ORDER BY tblLinks.dtDateAdded DESC;" and select only records with unique chName values.
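One way to return each chName once is to group by the profile columns and order by the most recent dtDateAdded per group. This is a sketch in SQL Server syntax with invented sample rows; it assumes each chName has a single intCount.

```sql
-- Hypothetical stand-ins for tblProfile and tblLinks.
DECLARE @tblProfile TABLE (MemberID int, chName varchar(50), intCount int);
DECLARE @tblLinks TABLE (MemberID int, dtDateAdded datetime);

INSERT INTO @tblProfile VALUES (1, 'alice', 5), (2, 'bob', 7);
INSERT INTO @tblLinks VALUES (1, '20080101'), (1, '20080105'), (2, '20080103');

-- One row per profile, newest link first.
SELECT TOP (50) p.chName, p.intCount
FROM @tblProfile AS p
INNER JOIN @tblLinks AS l
    ON l.MemberID = p.MemberID
GROUP BY p.chName, p.intCount
ORDER BY MAX(l.dtDateAdded) DESC;
-- returns alice, then bob
```

If the same chName can appear with different intCount values, the name would still repeat and the grouping would need to change.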
Hello everyone, and thanks for your help in advance. I have a SQL Server table with approximately 100,000 records. I need to determine whether there are duplicate records in this table. My problem is that a unique ID column was added to each row, so I'm not exactly sure how to filter the rows. Any help on this would be greatly appreciated. Thanks.
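Because the ID column makes every row unique, a common approach is to group by all the other columns and look for groups with more than one row. The table and column names below are invented, since the question does not describe the table.

```sql
-- Hypothetical table with an added identity column.
DECLARE @Contacts TABLE (ID int IDENTITY(1,1), FirstName varchar(50), LastName varchar(50), Email varchar(100));

INSERT INTO @Contacts (FirstName, LastName, Email) VALUES
    ('Ann', 'Lee', 'ann@example.com'),
    ('Ann', 'Lee', 'ann@example.com'),
    ('Bob', 'Ray', 'bob@example.com');

-- Group on every column except ID; any group with more than one row is a duplicate set.
SELECT FirstName, LastName, Email,
       COUNT(*) AS Copies,
       MIN(ID)  AS FirstID
FROM @Contacts
GROUP BY FirstName, LastName, Email
HAVING COUNT(*) > 1;
-- returns Ann Lee with Copies = 2
```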
Hi again, I just want to ask how I can randomly select 5 distinct records from a table with hundreds of records every time I execute a stored procedure. Thanks
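In SQL Server, ordering by NEWID() is a widely used way to pick random rows. The sketch below uses the system catalog view sys.all_objects only because it exists in every database and holds many rows; the real table would take its place inside the procedure body.

```sql
-- Five different rows on each execution; NEWID() gives every row a fresh random sort key.
SELECT TOP (5) object_id, name
FROM sys.all_objects
ORDER BY NEWID();
```

Each source row can appear at most once in the result, so the five rows are distinct as long as the rows of the table are. For a few hundred rows the cost of sorting by NEWID() is negligible.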
I have a reviews table where all reviews are submitted. On the main page I want to display the 10 most reviewed products. I have a Product_ID column in this table which identifies the product. How can I write a query which will select the Product_IDs that occur most frequently?
I came up with something like this: "Select Top 10 Product_ID, COUNT(*) AS Occurances FROM reviews GROUP BY Product_ID ORDER BY occurances DESC"
But it does not work. It gives the error "Declaration expected".
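The query itself looks valid for SQL Server, as the sketch below suggests with invented sample rows. "Declaration expected" reads like a Visual Basic compiler message rather than a SQL Server error, so the cause may lie in how the query string is placed in the page code (for example, a statement sitting outside any Sub or Function) rather than in the SQL.

```sql
-- Hypothetical stand-in for the reviews table.
DECLARE @reviews TABLE (Review_ID int IDENTITY(1,1), Product_ID int);
INSERT INTO @reviews (Product_ID) VALUES (1), (2), (2), (3), (3), (3);

-- Same query as in the question, run against the sample rows.
SELECT TOP (10) Product_ID, COUNT(*) AS Occurances
FROM @reviews
GROUP BY Product_ID
ORDER BY Occurances DESC;
-- product 3 comes first with 3 reviews, then product 2, then product 1
```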
Hi, my SQL is not too hot, so I hope someone can help me. I need to select all the records from one table that do not exist in 2 other tables. I know it sounds simple enough, but for some reason I cannot get it working. It may have something to do with the fact that the fields I am searching on are datetime fields. Here is a shortened version of my code.
SELECT DateOfStats
FROM table1
WHERE (DateOfStats NOT IN (SELECT dateofstats FROM table2))
  and (DateOfStats NOT IN (SELECT dateofstats FROM table3))
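One thing worth ruling out is a NULL in either lookup table: NOT IN returns no rows at all when the subquery yields a NULL, whereas NOT EXISTS does not have that problem. The sketch below uses invented dates in table variables named after the real tables.

```sql
-- Hypothetical sample data; table2 deliberately contains a NULL.
DECLARE @table1 TABLE (DateOfStats datetime);
DECLARE @table2 TABLE (dateofstats datetime);
DECLARE @table3 TABLE (dateofstats datetime);

INSERT INTO @table1 VALUES ('20080101'), ('20080102'), ('20080103');
INSERT INTO @table2 VALUES ('20080101'), (NULL);
INSERT INTO @table3 VALUES ('20080102');

-- NOT EXISTS is unaffected by the NULL; the NOT IN version returns nothing for this data.
SELECT t1.DateOfStats
FROM @table1 AS t1
WHERE NOT EXISTS (SELECT 1 FROM @table2 AS t2 WHERE t2.dateofstats = t1.DateOfStats)
  AND NOT EXISTS (SELECT 1 FROM @table3 AS t3 WHERE t3.dateofstats = t1.DateOfStats);
-- returns 2008-01-03
```

Since these are datetime columns, differing time portions can also prevent values from matching; comparing on the date part alone may be needed if the times vary.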
I need to select records from table "A" only if the "PK" of "A" exists in table "B". I need to return a result set, not just a single record. The problem is that table "B" is not a table in the database but a user-supplied table, which can be a DataTable in memory.
What I need is the start and end time of each task, but the issue is that there is no unique task number to bind them together. So, for instance, the task starts with 'Open-Submitted' and ends with 'Task Approved'. The issue is that there can be multiple occurrences in the same file number. I need to be able to split these into multiple tasks with the associated start and stop times.
File ID | Datetimes | Task Event Status | Task Event Name | Task ID | Event ID
File 1 | 6/3/13 16:33 | Open-Submitted | Task is retrieved | TSK-123456 | 12345
File 1 | 6/3/13 16:44 | Open-Approved | Task Approved | TSK-123456 | 23456
File 1 | 6/20/13 18:11 | Open-Submitted | Task is retrieved | TSK-123456 | 34567
File 1 | 6/21/13 14:42 | Open-Approved | Task Approved | TSK-123456 | 45678
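One way to pair the events is to take every 'Open-Submitted' row as a start and look up the earliest 'Open-Approved' row that follows it in the same file. The sketch below loads the four sample rows into a table variable with invented column names, reading the dates as month/day/year.

```sql
-- Sample rows from the question; column names are made up.
DECLARE @Events TABLE (FileID int, EventTime datetime, TaskEventStatus varchar(30), TaskID varchar(20), EventID int);

INSERT INTO @Events VALUES
    (1, '2013-06-03T16:33:00', 'Open-Submitted', 'TSK-123456', 12345),
    (1, '2013-06-03T16:44:00', 'Open-Approved',  'TSK-123456', 23456),
    (1, '2013-06-20T18:11:00', 'Open-Submitted', 'TSK-123456', 34567),
    (1, '2013-06-21T14:42:00', 'Open-Approved',  'TSK-123456', 45678);

SELECT s.FileID, s.TaskID, s.EventTime AS StartTime, e.EndTime
FROM @Events AS s
OUTER APPLY (
    -- earliest approval after this start, within the same file
    SELECT MIN(a.EventTime) AS EndTime
    FROM @Events AS a
    WHERE a.FileID = s.FileID
      AND a.TaskEventStatus = 'Open-Approved'
      AND a.EventTime > s.EventTime
) AS e
WHERE s.TaskEventStatus = 'Open-Submitted'
ORDER BY s.EventTime;
-- two tasks: 16:33 to 16:44 on 3 June, and 20 June 18:11 to 21 June 14:42
```

A start with no later approval comes back with a NULL EndTime, which may be useful for spotting tasks that are still open.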
I am trying to create a select query similar to the following but the problem I am having is that I want to only select one record where there may be several with the same dw_order_no. I have tried various ways using SQL developer but without success
SELECT VE_EZP_ORDER_TRANS.EZP_BILL_STATUS AS EZP_BILL_STATUS1, VE_EZP_AGED_CUSTOMER_DEBT.SURNAME, VE_EZP_AGED_CUSTOMER_DEBT.DEBT_AGE_CATEGORY, VE_EZP_AGED_CUSTOMER_DEBT.DEBT_AGE, VE_ORDERLINE.DW_ORDER_NO,
Is there a way to see a list of duplicate records? E.g., there is a field named "Invoice" in a table named "Orders", and I want to see only the records where the same invoice shows up more than once.
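A typical pattern is to find the Invoice values that occur more than once and then list the full rows for those values. The OrderID column and the sample rows below are invented.

```sql
-- Hypothetical stand-in for the Orders table.
DECLARE @Orders TABLE (OrderID int, Invoice varchar(20));

INSERT INTO @Orders VALUES (1, 'INV-100'), (2, 'INV-101'), (3, 'INV-100'), (4, 'INV-102');

-- Show every row whose Invoice value appears more than once.
SELECT o.OrderID, o.Invoice
FROM @Orders AS o
WHERE o.Invoice IN (
        SELECT Invoice
        FROM @Orders
        GROUP BY Invoice
        HAVING COUNT(*) > 1
      )
ORDER BY o.Invoice, o.OrderID;
-- returns orders 1 and 3, both with INV-100
```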
Good Morning, I have a view that contains rate information, contractIDs, and effectivedates. I need to select the rate info based on contractID and date. I can provide a date and contractID, and I need to select the rate info for that contract where the effective date is <= dateprovided. I need the 1 record that is closest to that date. I am thinking something with max() perhaps. Any ideas? The <= effectivedate will return several rows, I just need the one closest to the date I provide. Thanks for any advice, CK
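One straightforward option is to filter on the contract and the date, sort by effective date descending, and keep the first row. The view is represented below by a table variable with invented names and values.

```sql
-- Hypothetical stand-in for the rate view.
DECLARE @Rates TABLE (ContractID int, EffectiveDate date, Rate decimal(10,2));

INSERT INTO @Rates VALUES
    (1, '2007-01-01', 10.00),
    (1, '2007-06-01', 12.00),
    (1, '2008-01-01', 15.00);

DECLARE @ContractID int = 1, @DateProvided date = '2007-09-15';

-- Latest effective date that is not after the date provided.
SELECT TOP (1) ContractID, EffectiveDate, Rate
FROM @Rates
WHERE ContractID = @ContractID
  AND EffectiveDate <= @DateProvided
ORDER BY EffectiveDate DESC;
-- returns the 2007-06-01 row
```

The max() idea works as well: select the row whose EffectiveDate equals the MAX(EffectiveDate) that satisfies the same WHERE clause.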
I have a table that contains information related to sales:
SO number | Order Date | Customer | SellingPerson
1001 | 2012/07/02 | ABC | Andy
1002 | 2012/07/02 | XYZ | Alan
1003 | 2012/07/02 | EFG | Almelia
1004 | 2012/07/02 | ABC | John
1005 | 2012/07/02 | XYZ | Oliver
1006 | 2012/07/02 | HIJ | Dorthy
1007 | 2012/07/02 | KLM | Andy
1008 | 2012/07/02 | NOP | Rowan
1009 | 2012/07/02 | QRS | David
1010 | 2012/07/02 | ABC | Joey
Now I want to write a query using a CTE that gives me the first five distinct customers in the result set:
SO number | Order Date | Customer | SellingPerson
1001 | 2012/07/02 | ABC | Andy
1002 | 2012/07/02 | XYZ | Alan
1003 | 2012/07/02 | EFG | Almelia
1006 | 2012/07/02 | HIJ | Dorthy
1007 | 2012/07/02 | KLM | Andy
I wrote this query:
With t(so_number,order date,customer, SellingPerson) as
(select top 5 so_number,order date,customer, SellingPerson from t)
select distinct billingcontactperson from t order by so_id
And I am getting this error:
Msg 252, Level 16, State 1, Line 1 Recursive common table expression 't' does not contain a top-level UNION ALL operator.
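Msg 252 appears because the CTE named t selects from t, which SQL Server reads as a recursive reference. Giving the CTE a name different from the base table avoids that. A sketch of one way to get the first row per customer and then the first five of those follows; the table variable and the underscore column names are stand-ins for the real ones.

```sql
-- Sample rows from the question in a hypothetical table variable.
DECLARE @Sales TABLE (so_number int, order_date date, customer varchar(10), SellingPerson varchar(20));

INSERT INTO @Sales VALUES
    (1001, '2012-07-02', 'ABC', 'Andy'),   (1002, '2012-07-02', 'XYZ', 'Alan'),
    (1003, '2012-07-02', 'EFG', 'Almelia'),(1004, '2012-07-02', 'ABC', 'John'),
    (1005, '2012-07-02', 'XYZ', 'Oliver'), (1006, '2012-07-02', 'HIJ', 'Dorthy'),
    (1007, '2012-07-02', 'KLM', 'Andy'),   (1008, '2012-07-02', 'NOP', 'Rowan'),
    (1009, '2012-07-02', 'QRS', 'David'),  (1010, '2012-07-02', 'ABC', 'Joey');

WITH firstPerCustomer AS (
    SELECT so_number, order_date, customer, SellingPerson,
           -- rn = 1 marks each customer's earliest sales order
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY so_number) AS rn
    FROM @Sales
)
SELECT TOP (5) so_number, order_date, customer, SellingPerson
FROM firstPerCustomer
WHERE rn = 1
ORDER BY so_number;
-- returns 1001 ABC, 1002 XYZ, 1003 EFG, 1006 HIJ, 1007 KLM
```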