SQL 2012 :: Multiple Joining Tables - Duplicate Records
Jul 14, 2014
I have tried joining several tables and the result displays duplicate rows of virtually every line/row. I have tried using distinct but this didn't work. I know it could because there's several columns from some of the tables named the same.
We have the below query which is pulling in Sales and Revenue information. Since the sale is recorded in just one month and the revenue is recorded each month, we need to have the results of this query to only list the Sales amount once, but still have all the other revenue amounts listed for each month. In this example, the sale is record in year 2014 and month 10, but there are revenues in every month as well for the rest of 2014 and the start of 2015 but we only want to the sales amount to appear once on this results set.
I would like to know if it's possible to return a single record by joining the tables below. [Persons] PersonID [int] | PageViewed [int] =============== ================= 1 10 2 5 3 2 4 12
[PersonNames] - PersonID JOINS Persons.PersonID PersonID [int] | NameID [int] | PersonName [nvarchar] | PopularVotes [int] =============== ============== ======================= =================== 1 1 Samantha Brown 5 1 2 Samantha Green 10 2 3 Richard T 10 3 4 Riko T 0 4 5 Sammie H 0
[AltNames] - backup for searches caused by common spelling mistakes AltNameID [int] | AltNames [nvarchar] ================ ============================= 1 Sam, Samantha, Sammie, Sammy 2 Riko, Rico
[PersonAllNames] - JOINS [PersonNames.NameID] ON [AltNames.AltNameID] NameID [int] | AltNameID [int] ============= ================ 1 1 4 1 3 2 This is ideally what I'd like to have returned: PersonID | PageViewed | MostPopularName | NameSearch ========= ============ ================= ================= 1 10 Samantha Green Samantha Brown, Samantha Green, Sam, Samantha, Sammie, Sammy 2 5 Richard T Richard T 3 2 Riko T Riko T, Riko, Rico 4 12 Sammie H Sammie H, Sam, Samantha, Sammie, Sammy
[MostPopularName] is [PersonNames.PopularVotes DESC].[NameSearch] combines all records from [PersonNames.PersonName] and [AltNames.AltNames].
The purpose for this is that I'd like to cache the results table so that all searches can just perform a lookup against the NameSearch field. Any help would be greatly appreciated. Thanks, Pete.
OrderID ControlName 1 Row1COlumn1 (It Means Pant in Red Color is selected by user(relation with Child2 Table)) 1 Row3Column1 (It Means Gown in Blue Color is selected by user(relation with Child2 Table)) 1 Row4Column3 (It Means T Shirt in White Color is selected by user(relation with Child2 Table)) 2 Row1Column2 (It Means Tie in Green Color is selected by user(relation with Child2 Table)) 2 Row3Column1 (It Means Bow in Red Color is selected by user(relation with Child2 Table))
Child2 Table
PackageID Product Color1 Color2 Color3 1 Pant Red Green Blue 1 Shirt Blue Pink Purple 1 Gown Blue Black Yellow 1 T Shirt Red Green White 2 Tie Red Green White 2 Socks Red Green White 2 Bow Red Green White
We want to have result like
OrderID PackageID CustomerName Pant Gown T Shirt Tie Bow
1 1 ABC Red Blue White x x Blue 2 2 Bcd x x x Green Red
I have tried
;with mycte as ( select ms.OrderID,ms.PackageID ,ms.CustomerName , Replace(stuff([ControlName], charindex('Column',ControlName),len(ControlName),''),'Row','') rowNum ,Replace(stuff([ControlName], 1, charindex('Column',ControlName)-1 ,''),'Column','') columnNum From child1 c inner join MasterTable ms on c.Orderid=ms.orderid)
[code]....
it works if we have a product in one color only. like if we have pant in red and blue then its showing just first record
writing the query for the following, I need to collapse the continuity. If the termdate for an ID is one day less than the effdate of the next id (for the same ID) i need to collapse the records. See below example .....how should i write the query which will give me the desired output. i.e., get min(effdate) and max(termdate) if termdate is one day less than the effdate of next record.
PhoneType is an auxiliary table that has 5 records in it Home phone, Cell phone, Work phone, Pager, and Fax. Is there a way to do a join or maybe make a view of a view that would allow me to ultimately end up with…
StudnetID: 1 Name: John HomePhone: 123-456-7890 WorkPhone: 123-456-7890 CellPhone: Pager: 123-456-7890 Fax: Memo: This is one student record.
Some students will have no phone number, some will have all 5 most will have one or two. If possible I would like to do a setup like this in my database to keep from having to have null fields for 4 phone numbers that the majority of records won’t have. Thanks in advanced, Nathan Rover
Hi, This seems like a basic problem but I can't figure out how to resolve it.
I have a query :
SELECT PR.WBS2, SUM(LedgerAR.Amount * - 1) AS Expr5, LB.AmtBud AS budget FROM PR LEFT OUTER JOIN LedgerAR ON PR.WBS1 = LedgerAR.WBS1 AND PR.WBS2 = LedgerAR.WBS2 AND LedgerAR.WBS3 = PR.WBS3 LEFT OUTER JOIN LB ON LB.WBS1 = PR.WBS1 AND LB.WBS2 = PR.WBS2 AND PR.WBS3 = LB.WBS3 WHERE (PR.WBS2 <> '9001') AND (PR.WBS2 <> 'zzz') AND (PR.WBS2 <> '98') AND (PR.WBS3 <> 'zzz') AND (PR.WBS2 <> '') AND (PR.WBS1 = '001-298') GROUP BY PR.WBS2, LB.AmtBud ORDER BY PR.WBS2
I want to sum up the middle column and last column grouping by wbs2. However, when I do SUM(lb.amtbud) the budget column is not summing correctly it is summing the column as if the data appeared like this:
OrderID ControlName 1 Row1COlumn1 (It Means Pant in Red Color is selected by user(relation with Child2 Table)) 1 Row3Column1 (It Means Gown in Blue Color is selected by user(relation with Child2 Table)) 1 Row4Column3 (It Means T Shirt in White Color is selected by user(relation with Child2 Table)) 2 Row1Column2 (It Means Tie in Green Color is selected by user(relation with Child2 Table)) 2 Row3Column1 (It Means Bow in Red Color is selected by user(relation with Child2 Table))
Child2 Table
PackageID Product Color1 Color2 Color3 1 Pant Red Green Blue 1 Shirt Blue Pink Purple 1 Gown Blue Black Yellow 1 T Shirt Red Green White 2 Tie Red Green White 2 Socks Red Green White 2 Bow Red Green White
We want to have result like
OrderID PackageID CustomerName Pant Gown T Shirt Tie Bow
I have two tables a and b, where I want to add columns from b to a with a criteria. The columns will be added by month criteria. There is a column in b table called stat_month which ranges from 1 (Jan) to 12 (Dec). I want to keep all the records in a, and join columns from b for each month. I do not want to loose any row from a if there is no data for that row in b.
I do not know how to have the multiple joins for 12 different months and what join I have to use. I used left join but still I am loosing not all but few rows in a, I would also like to know how in one script I can columns separately from stat_mont =’01’ to stat_month =’12’
/****** Script for SelectTopNRows command from SSMS ******/ SELECT a.[naics] ,a.[ust_code] ,a.[port] ,a.[all_qty_1_yr] ,a.[all_qty_2_yr]
[Code] ....
output should have all columns from a and join columns from b when the months = '01' (for Jan) , '02' (for FEB), ...'12' (for Dec): Output table should be something like
* columns from a AND JAN_Cum_qty_1_mo JAN_Cum_qty_2_mo JAN_Cum_all_val_mo JAN_Cum_air_val_mo JAN_Cum_air_wgt_mo JAN_Cum_ves_val_mo FEB_Cum_qty_1_mo FEB_Cum_qty_2_mo FEB_Cum_all_val_mo FEB_Cum_air_val_mo FEB_Cum_air_wgt_mo FEB_Cum_ves_val_mo .....DEC_Cum_qty_1_mo DEC_Cum_qty_2_mo DEC_Cum_all_val_mo DEC_Cum_air_val_mo DEC_Cum_air_wgt_mo DEC_Cum_ves_val_mo (FROM TABLE b)
I have a straight-forward select query to show work orders for a particular customer as below. I want to add a field value from another table, deltickitem diwhich contains contract records. I need to include the field di.weekchg to show the weekly hire rate, but the joined query must ensure that the both the contract number matches that in the original select and that the item number matches that in the actual select. Additionally, there is the problem that the item can appear more than once in the deltickitem table against a particular contract (if item has been off-hired and then re-hired on the same contract number) - in this case the query must select the record with the highest di.counter number, which I haven't worked out how to put in my query.
This is my basic code, but I keep ending up with duplicate work order lines in my result set.
Select wh.worknumber, wh.custnum, wh.contract, wh.sitename, wh.itemcode, wh.regnum, m.name, di.weekchg, wh.date_created, wh.task_descr, wh.actual_labour_sale+wh.actual_parts_sale as [Repair Cost] From worksorderhdr wh Left Join inventory iv On iv.item = wh.itemcode inner Join models m On m.id = iv.model_id left join deltickitem di on di.dticket = wh.contract where wh.custnum = 'BARRATNE' and wh.rejected <> 1 and wh.charge_to_cust = 1 order by wh.date_created
This shows a banking history of transactions and includes a field called TransType and a field called PaymentID.
I also have two other tables called Suppliers and SubContractors.
For each record in the bank, I need to match up a record in either the suppliers or subcontractors tbl based on the PaymentID value. I know if the record relates to either a Supplier or Subcontractor based on the value of the TransType field which will be either SUPPLIER or SUBCONTRACTOR or Null (in which case a match doesn't matter)
I have a working query based on joining just the Supplier tbl.. but how do I do the join to the other tbl aswell?
So overall, for each record in the bank, if the transtype is SUPPLIER I need to look in the supplier tbl for a match for that paymentID, and if the transtype is SUBCONTRACTOR, I need to do the same but in SUBCONTRACTOR tbl.
I have a question regarding duplicate records, the thing is I'm able to query for duplicated records if I type the following:
select ColumnName from TableName where ColumnName in ( select ColumnName from TableName group by ColumnName having count(*) > 1 )
That gives me duplicate records for one column, but I need find duplicate records in more than one column (4 columns to be exact), but the way I need to find these records is they all have to be duplicate, what I'm trying to say is I don't don't want to find the following:
First Last Age Email John Smith 25 jsmith@hotmail.com John Smith 26 jsmith@hotmail.com John Smith 25 jsmith4@hotmail.com
I need to find the following:
First Last Age Email John Smith 25 jsmith@hotmail.com John Smith 25 jsmith@hotmail.com John Smith 25 jsmith@hotmail.com
So all the columns must be exactly the same, that's the only condition I want to show the records, is there any way to do this?
For the record, I'm using MS SQL Server 2000, thank you.
I think I got all my create table statements are correct.
I need to Find the number of agents for each supplier that has at least one agent. The result should be tuples of the form (sid, sName, number of agents)
-Select Sid, sName, count(Aid) from Agent A join Supplier S on (S.Sid = A.Sid) group by S.Sid, S.sName, Aid; But it gives me this error: no such column: A.Sid
Im thinking I might have a problem with my create table statement and/or primary key statements?
I had a problem before of not been able to find the rows with 0 values. I've now managed this although it's brought up duplicate rows due to the discounts been different on the same Mfr_part_number. I tried using the max function on the isnull (Exhibit_Discount.Discount, 0.00) AS Discount instead but to no success. i think i maybe something to do with PK keys not been used in the set-up of the database.
Use Sales_Builder Go SELECT DISTINCT GBPriceList.[Mfr_Part_Num],
Have a table that is used during a conversion. It is hit pretty heavily with inserts by mulitple session ids. For performance reasons I cannot add a unique constraint on the column that is getting duplicates (it is an encrypted cell in varbinary).
I do not want duplicates in this Encrypted Column. So before each insert the insert programs reads the table and verifies that the Encrypted Column value does not already exist. But unfortunately several duplicates are falling through the cracks. What are my options for not allowing duplicates?
I managed to find the 'Deleting Duplicate Records' from SQLTeam.com (thanks, by the way!!).. I managed to modify it for one of my tables (one of 14).
-- Add a new column
Alter table dbo.tblMyDocsSize add NewPK int NULL go
-- populate the new Primary Key declare @intCounter int set @intCounter = 0 update dbo.tblMyDocsSize SET @intCounter = NewPK = @intCounter + 1
-- ID the records to delete and get one primary key value also -- We'll delete all but this primary key select strComputer, strATUUser, RecCount=count(*), PktoKeep = max(NewPK) into #dupes from dbo.tblMyDocsSize group by strComputer, strATUUser having count(*) > 1 order by count(*) desc, strComputer, strATUUser
-- delete dupes except one Primary key for each dup record deletedbo.tblMyDocsSize fromdbo.tblMyDocsSize a join #dupes d ond.strComputer = a.strComputer andd.strATUUser = a.strATUUser wherea.NewPK not in (select PKtoKeep from #dupes)
-- remove the NewPK column ALTER TABLE dbo.tblMyDocsSize DROP COLUMN NewPK go
drop table #dupes
Now that I've got that figured out, I need to write the same thing to fix the other 13 tables (with different column info)- and I'll need to run this daily.
Basically I've put together some vbscript that gathers inventory data and drops it into an MSDE db (sorry - goin for 'free' stuff right now). Problem is it has to run daily so that I'm sure to capture computers that turned on at different times etc which ever-increases my database 'till I bounce off the 2GB limit of MSDE.
So the question is, what would be the best way to do this? Can I put the code into a stored procedure that I can execute each day?
Hi All,I've the following table with a PK defined on an IDENTITY column(INSERT_SEQ):CREATE TABLE MYDATA (MID NUMERIC(19,0) NOT NULL,MYVALUE FLOAT NOT NULL,TIMEKEY INTEGER NOT NULL,TIMEKEY_DTTM DATETIME NULL,IID NUMERIC(19,0) NOT NULL,EID NUMERIC(19,0) NOT NULL,INSERT_SEQ NUMERIC(19,0) IDENTITY(1,1) NOT NULL)GOALTER TABLE MYDATAADD CONSTRAINT PK_MYDATAPRIMARY KEY (INSERT_SEQ)GOThe TIMEKEY_DTTM field is generated, from the value actually insertedinto theTIMEKEY field, by the following trigger:CREATE TRIGGER TIMEKEY1ON MYDATAFOR INSERT ASBEGINDECLARE @M_TIMEKEY_DTTM DATETIMESELECT @M_TIMEKEY_DTTM = DATEADD(SECOND, INS.TIMEKEY +EP.GMT_OFFSET * 0 ,'1970-01-01 00:00:00.000')FROM INSERTED INS, LOCATIONINFO EPWHERE INS.EID = EP.EIDUPDATE MYDATASET TIMEKEY_DTTM = @M_TIMEKEY_DTTMFROM INSERTED INS, MYDATA MDWHERE MD.INSERT_SEQ = INS.INSERT_SEQENDGOThere is also a composite, non unique, index defined on thetuple:(MID,IID,TIMEKEY,EID)CREATE INDEX IX_METDATA ON MYDATA (MID,IID,TIMEKEY,EID)GOAs a consequence of an application design change, I would also changethis index to be UNIQUE, but when I try to drop and create it I get anerror, because the tables stores some duplicated rows...In order to succesfully upgrade the index definition, I wrote some DMLstaementsto lookup and remove the duplicated rows, keeping only the firstrecord inserted, i.e. the one with the lowest INSERT_SEQ:---- This table stores then umber of duplicated records eventuallydiscovered-- into the MYDATA table; the initial value for the NUM_DUPLICATESfield is-- 0 (no duplicated record)--DROP TABLE DUPLICATESGOCREATE TABLE DUPLICATES (TABLENAME VARCHAR(17),NUM_DUPLICATES NUMERIC(19,0) )GOINSERT INTO DUPLICATES VALUES ('MYDATA',0)GOINSERT INTO DUPLICATES VALUES ('CATEGORIESDATA',0)GO---- ///////// CLEAN UP OF MYDATA TABLE--DROP TABLE TMP_MYDATAGOCREATE TABLE TMP_MYDATA (MID NUMERIC(19,0) NOT NULL,TIMEKEY INTEGER NOT NULL,IID NUMERIC(19,0) NOT NULL,EID NUMERIC(19,0) NOT NULL,INSERT_SEQ NUMERIC(19,0) )GO---- Insert into the TMP_MYDATA table all the duplicated records for-- the tuple (MID,IID,TIMEKEY,EID) and NULL for the INSERT_SEQ field--INSERT INTO TMP_MYDATA (MID,IID,TIMEKEY,EID)SELECT MID,IID,TIMEKEY,EIDFROM MYDATAGROUP BY MID,IID,TIMEKEY,EIDHAVING COUNT(*)>1GO---- Updates the INSERT_SEQ field to the lowest value in the group-- of duplicated records--UPDATE TMP_MYDATASET TMP_MYDATA.INSERT_SEQ = (SELECT MIN(INSERT_SEQ)FROM MYDATAWHERE TMP_MYDATA.MID = MYDATA.MID ANDTMP_MYDATA.IID = MYDATA.IID ANDTMP_MYDATA.TIMEKEY = MYDATA.TIMEKEY ANDTMP_MYDATA.EID = MYDATA.EID )GO---- Updates the value of NUM_DUPLICATES for the MYDATA table.--UPDATE DUPLICATESSET NUM_DUPLICATES = (SELECT COUNT(*) FROM TMP_MYDATA)WHERE TABLENAME = 'MYDATA'GO---- Delete from the MYDATA table all the duplicated records,-- keeping only the row with the lowest INSERT_SEQ-- The delete is performed only if there are duplicated recors;-- this is achieved using a "short circuit" AND on the number ofrecords-- stored into the NUM_DUPLICATES field of the DUPLICATES table for-- the MYDATA table...--DELETE FROM MYDATAWHERE ( SELECT NUM_DUPLICATES FROM DUPLICATES WHERE TABLENAME ='MYDATA') > 0 ANDEXISTS ( SELECT 1FROM TMP_MYDATAWHERE MYDATA.MID = TMP_MYDATA.MID ANDMYDATA.IID = TMP_MYDATA.IID ANDMYDATA.TIMEKEY = TMP_MYDATA.TIMEKEY ANDMYDATA.EID = TMP_MYDATA.EID ANDMYDATA.INSERT_SEQ > TMP_MYDATA.INSERT_SEQ )GOThis tecnique works fine on a normal table (1M recs) but is not veryperformanton huge tables (>10M records)!Do you know a better way to achieve the task of removing all theduplicates records, preserving the lowest INSERT_SEQ betwee theduplicates and also preserving the sequence seed, so that a new recordinserted at time t1>t0 is enumerated with an INSERT_SEQ|t1 >max(INSERT_SEQ)|t0 ?Thanks a lot for your help!PatrizioPS. sorry for such a large post!
I've inherited a table of members that has the following structure:
CREATE TABLE [dbo].[dimMember]( [dimMemberId] [int] IDENTITY(1,1) NOT NULL, [dimSourceSystemId] [int] NOT NULL CONSTRAINT [DF_dimMember_dimSourceSystemId] DEFAULT ((-1)), [MemberCode] [nvarchar](50) NOT NULL, [FirstName] [nvarchar](250) NOT NULL, [LastName] [nvarchar](250) NOT NULL,
[Code] ....
Based on the way the data loads into the table there's a possibility of some records being near duplicates of each other. For example, we can have a member that has records that have the same first name, last name, SSN, but different addresses, membercodes, subscribercode etc... This can happen in pretty much any variation thereof.
What I want to do, is add a new column and use that to group the similar records under based on comparing on several columns. By this I mean, if a member matches 4 of the 7 values below with another member, we would group these:
First Name (1st 3 characters) Last Name DOB CurrentAddress1 MemberCode SSN SubscriberCode
I'm at a loss of how to structure the SQL to update the new column in the table.
Im trying to delete duplicate records from the output of the query below, if they also meet certain conditions ie 'different address type' then I would merge the records. From the following query how do I go about achieving one and/or the other from either the output, or as an extension of the query itself?
I am getting output two times but my table has single data then how to remove duplicate my SQL query is:
select nonregister.email as firstname,nonregister.email ,appliedjobstatuswithresumes.description,appliedjobstatuswithresumes.applieddate from nonregister , appliedjobstatuswithresumes where appliedjobstatuswithresumes.nonregid IN (select nonregid from nonregister where nonregid IN (select nonregid from [appliedjobstatuswithresumes]
I have searched in length and cant seem to find a specific answer.I have a tmptable to hold user "shoppingcart" ( internal supplies)What i want is to take when the user clicks order to take that table pull the records with that user and populate the order and order details tablesThe order table has a PK of orderID and the orderdetails has a FK of orderIDI know how to insert to the main table, but dont know how to populate the details at the same timeI have this.Insert into supplyordersselect requestor from tmpordercart where requestor = &name so how do i also take from the tmpordercart the itemno and quanity and put them into the orderdetails so that it links back to order table?
I have two tables that can be created with sample data using the DDL at the bottom of this post. What I'm looking to do is update the QtyReceived column in tblPurchaseOrderLineDetail from the Qty column in tblReceivedItems. However, the tricky part that I can't figure out is splitting these quantities out over multiple lines. I should only be allowed to receive up to the QtyOrdered column in tblPurchaseOrderLineDetail.
For a specific example from the sample data we'll look at PurchaseOrderDetailID 28526. From the tblReceivedItems, there are three records with quantities of 48, 48, and 20. From the tblPurchaseOrderLineDetail there are three records of QtyOrdered of 55, 45, and 20. What I would like to happen is fulfill the records in the tblPurchaseOrderLineDetail sequentially (essentially in order of ExpectedDate). So, the QtyReceived would be 55, 45, and 16 for the corresponding records. If there is already a quantity in the QtyReceived column, but it's less than the QtyOrdered column, the quantity needs to be added to the column (not overwritten).
DDL To CREATE Sample Tables and Data:
CREATE TABLE [dbo].[tblReceivedItems]( [ID] [int] IDENTITY(1,1) NOT NULL, [PurchaseOrderDetailID] [int] NULL, [Qty] [int] NULL) SET IDENTITY_INSERT [dbo].[tblReceivedItems] ON INSERT [dbo].[tblReceivedItems] ([ID], [PurchaseOrderDetailID], [Qty]) VALUES (1, 28191, 48)
Using SSE 2012 64-bit.I need to insert records from multiple Access Tables into 1 Table in SSE and ensure no duplicates are inserted.This is executing, but is very slow, is there a faster way?
Code: INSERT INTO dbTarget.dbo.tblTarget (All fields) SELECT (All Fields) FROM dbSource.dbo.tblSource WHERE RecordID NOT IN (SELECT RecordID FROM dbTarget.dbo.tblTarget)
I'm new to relational database concepts and designs, but what i've learned so far has been helpful. I now know how to select certain records from multiple tables using joins, etc. Now I need info on how to do complete deletes. I've tried reading articles on cascading deletes, but the people writing them are so verbose that they are confusing to understand for a beginner. I hope someone could help me with this problem.
I have sql server 2005. I use visual studio 2005. In the database I've created the following tables(with their column names):
Table 1: Classes --Columns: ClassID, ClassName
Table 2: Roster--Columns: ClassID, StudentID, Student Name
What I can't seem to figure out is how can I delete a class (ClassID) from Classes and as a result of this one deletion, delete all students in the Roster table associated with that class, delete all assignments associated with that class, delete all scores associated with all assignments associated with that class in one DELETE sql statement.
What I tried to do in sql server management studio is set the ClassID in Classes as a primary key, then set foreign keys to the other three tables. However, also set AssignmentID in Table 4 as a foreign key to Table 3.
The stored procedure I created was
DELETE FROM Classes WHERE ClassID=@classid
I thought, since I established ClassID as a primary key in Classes, that by deleting it, it would also delete all other rows in the foreign tables that have the same value in their ClassID columns. But I get errors when I run the query. The error said:
The DELETE statement conflicted with the REFERENCE constraint "FK_Roster_Classes1". The conflict occurred in database "database", table "dbo.Roster", column 'ClassID'. The statement has been terminated.
What are reference constraints? What are they talking about? Plus is the query correct? If not, how would I go about solving my problem. Would I have to do joins while deleting?
I thought I was doing a cascade delete. The articles I read kept insisting that cascade deletes are deletes where if you delete a record from a parent table, then the rows in the child table will also be deleted, but I get the error.
Did I approach this right? If not, please show me how, and please, please explain it like I'm a four year old.
Further, is there something else I need to do besides assigning primary keys and foreign keys?