I've a dtsx package which runs nightly to do following:
1. select data from a SQL replicated table
2. do some lookups (Lookup, Derived Column, Multicast, Conditional Split, etc.)
3. insert into another SQL table on another server using "Table or view - fast load", rows per batch = 10000, maximum insert commit size = 10000, and "redirect row" on error output on destination to an error log text file.
Once in a while, I found duplicate records in the error log; these rows cannot be inserted into destination table due to primary constraint. For example, transaction_id=111000 appears twice in the error log but it is a unique key in the source table.
My questions:
1. What could be the cause of duplicating rows during ETL in SSIS? I've asked this before and have spent so much time research but still could not find the reason. This link is from my previous post:
2. For a daily extract data with over millions of rows, what would be best to set rows per batch, maximum insert commit size, etc? I've read some posts on this forum and decide to use 10000 for both, but once in a while there's just one duplicate rows that causes the whole batch of 10000 rows not committed.
I have one ssis package moving the data from staging to destination. In stating table we have the duplicate data. But in destination table 4 columns have primary key. How to handle the duplicate records in oldedb source.
When I try to debug the break points will always say the source code is different from the current version, but the custom component in the GAC has the new version number. The other strange thing is the toolbox will not reset to the original version meaning it will not remove the custom components. The funny thing is after I compile the custom components and restart VS the custom component runs with the new code changes. I can see the new features I added, but the debugger and toolbox still seem to be broken.
I have tried the following 1) Reset the tool box. 2) uninstall all my custom dll from the GAC €śC:WINDOWSassembly€? 3) remove all my custom dll from €śC:Program FilesMicrosoft SQL Server90DTSPipelineComponents€? 4) restart VS 2005 5) reselect the custom components. 6) reboot my computer.
It seem like VS has another cache. For the tool box or something.
Yes, I know this subject has been exhausted, but I need help in locating the discussion which took place a few months ago. Sharon relayed to the group a piece of software (expensive) which would help in my particular situation. I grabbed a demo and have gotten the approval for purchase. Unfortunately, I don't have the thread with me at work.
The problem:
Number Fname Lname Age ID 123 John Franklin 43 1 123 Jane Franklin 40 2 123 Jeff Franklin 12 3 124 Jean Simmons 39 4 125 Gary Bender 37 5 126 Fred Johnson 29 6 126 Fred Johnson 39 7 127 Gene Simmons 47 8
The idea would be to get only unique records from the Number column. I don't care about which information I grab from the other columns, but I must have those fields included. If my resultant result set looked as follows, that would be fine. Or any other way, as long as all of the fields had information and there were only unique values in the Number field.
Number Fname Lname Age ID 123 Jeff Franklin 12 3 124 Jean Simmons 39 4 125 Gary Bender 37 5 126 Fred Johnson 39 7 127 Gene Simmons 47 8
If anyone remembers this discussion, mainly the date, I would really appreciate it.
I have two tables, one contains all work orders, the second contains records on work orders that are linked to customoer orders. I'm trying to create a query that will return specific fields from the table that contains orders in the linked order table, and only the work orders in the all order table that (work_order) do not exist in the linked order table (demand_supply_link). I have tried several queries and cannot get the results I desire. Here is the query I am currently trying.
SELECT DISTINCT WORK_ORDER.DESIRED_WANT_DATE as 'Want Date', DEMAND_SUPPLY_LINK.SUPPLY_BASE_ID as 'WO Id', WORK_ORDER.DESIRED_QTY as 'End Qty', DEMAND_SUPPLY_LINK.SUPPLY_PART_ID as 'Part Id', CUST_ORDER_LINE.CUSTOMER_PART_ID as 'Cust Part', OPERATION.RESOURCE_ID as Resource, PART.DESCRIPTION as Description, CUSTOMER.NAME as Name FROM ((((DEMAND_SUPPLY_LINK INNER JOIN CUST_ORDER_LINE ON DEMAND_SUPPLY_LINK.DEMAND_BASE_ID = CUST_ORDER_LINE.CUST_ORDER_ID) INNER JOIN WORK_ORDER ON DEMAND_SUPPLY_LINK.SUPPLY_BASE_ID = WORK_ORDER.BASE_ID) INNER JOIN OPERATION ON WORK_ORDER.BASE_ID = OPERATION.WORKORDER_BASE_ID) INNER JOIN PART ON WORK_ORDER.PART_ID = PART.ID) INNER JOIN (CUSTOMER INNER JOIN CUSTOMER_ORDER ON CUSTOMER.ID = CUSTOMER_ORDER.CUSTOMER_ID) ON CUST_ORDER_LINE.CUST_ORDER_ID = CUSTOMER_ORDER.ID WHERE WORK_ORDER.DESIRED_WANT_DATE Is Not Null AND OPERATION.RESOURCE_ID in ('ASSY','FAB 1','PLAY TRK') AND WORK_ORDER.STATUS='R'
UNION SELECT distinct work_order.desired_want_date as 'Want Date', work_order.BASE_id as 'WO Id', work_order.desired_qty as 'End Qty', work_order.part_id as 'Part Id', operation.resource_id as Resource, part.description as Description FROM WORK_ORDER INNER JOIN PART ON PART_ID=WORK_ORDER.PART_ID INNER JOIN OPERATION ON WORK_ORDER.BASE_ID=OPERATION.WORKORDER_BASE_ID WHERE WORK_ORDER.DESIRED_WANT_DATE IS NOT NULL AND OPERATION.RESOURCE_ID IN ('ASSY','FAB 1', 'PLAY TRK') AND WORK_ORDER.STATUS='R'
This is the error I receive: Server: Msg 205, Level 16, State 1, Line 1 All queries in an SQL statement containing a UNION operator must have an equal number of expressions in their target lists.
The all orders table (work_order) will not have the other fields to link to as there is no customer order linked to them.
Can someone tell me the best procedure when trying to find duplicate records within a table(s)?
I'm new using SQL server and I have been informed that there maybe some DUPS within unknown tables. I need to find these DUPS.
If someone can tell me how to perform this procedure I would apprciate it. And if you reply can also include examples that i could follow for my records.
Table1 has shop# and shop_id. Every shop# should have only one shop_ID. There has been a few data entry errors where a shop# has duplicate a shop_id. How to write a query for shop#s that have more than one shop_id?
Not so sure how simple this question is but here is what happened. I installed SQL Server 2005 on a new Win Server 2003. I exported the tables and their data from the old machine to the newly established database on the new machine.
It looks like all my records were duplicated. When I try to delete one of the duplicates it won't work because both rows are effected. I can't set my primary key now and if I try to create a new database with the primary key already set than the import fails.
Any one run into this before or know what's going on?
Hi,I have written a web application using dreamweaver MX, asp.net, and MSsql server 2005.The problem I am having occurs when I attempt to edit a record. I have setup a datagrid with freeform fields so that the user can click on edit, make the required changes within the data grid then click update. The data is then saved to the database. All this was created using dreameaver and most of the code was automatically generated for me.The problem is that, not everytime, but sometimes when I go to edit a record once I hit the update button to save the changes the record is duplicated 1 or more times. This doesnt happen everytime but when it does it duplicates the record between 1 and about 5 times. I have double checked everything but cannot find anything obvious that may be causing this issue. Does anyone have any suggestions as to what I should look for? Is this a coding error or something wrong with MSsql? Any ideas?Thanks in advance-Mitch
hi all, How do i avoid duplicate records on my database? i have 4 textboxes that collect user information and this information is saved in the database. when a user fills the textboxes and clicks the submit button, i want to check through the database if the exact records exist in the database before the data is saved. if the user is registered on the database, he wont be allowed to login. how can i acheive this? i thought of using the comparevalidator but i'm not sure how to proceed. thanks
How do i remove duplicate records from a table with a single query without using cursors or anything like that.Sample :tempCol11221P.S The table has only one column
I use a tabel for storin log data from a mail server. I noticed that I'm getting duplicate records, is there a way to delete the socond and/or third entry so I dont have any duplicates?
I am trying to fetch data from 2 tables, say TABLE1 and TABLE2, both of which got columns like id and num. Then i want all the rows from TABLE1 where id1=id2 and num1 != num2. but it is showing all the rows for an id1 twice, if there are two records in TABLE2 with same id and num. is there any way to filter those records without using the distinct keyword.
Can anyone help me to write a query to show customers who have duplicate accounts with Email address, first name, and last name. this is the table structure is Customer table
customerid(PK) accountno fname lname
Records will be
like this
customerid accountno fname lastname 1 2 lori taylor 2 2 lori taylor 3 1 randy dave
Userid is auto number, lastname and emailaddress are PK.
I want to delete duplicate records. If lastname and emailaddress are the same, only keep a record which createdate is the most newest date. See above example I only want to the record which userid is 3. I have alreday created a code which I attached below. This code onle keep a record which userid is 1.
Anybody can help me to solve this problem? Thanks.
============== My current code ==================== delete from userprofile where userprofile.userid in --list all rows that have duplicates (select p.userid from userprofile as p where exists (select lastname, emailaddress from userprofile where lastname = p.lastname and emailaddress = p.emailaddress group by lastname, emailaddress having count (userid)>1)) and userprofile.userid not in --list on row from each set of duplicate (select min(p.userid) from userprofile as p where exists (select lastname, emailaddress from userprofile where lastname = p.lastname and emailaddress = p.emailaddress group by lastname, emailaddress having count (userid)>1) group by lastname, emailaddress)
Just like Unique/Distinct command, is these some way I could list just the duplicate records from a table . The field is numeric. Thanks a lot for you help.
How can I made a query to show only my duplicate records ? For some reason that i do not know, i have duplicate entries in my clustered index 21 duplicate records in a table how can i query to know those 21 duplicate records ?
I was wondering if anyone can tell me an easy way to find duplicate records on sql. The thing is this, at work we have a database (table) which includes tracking numbers, I need a easy way to be able to search this table for duplicate tracking numbers and print them out. I currently access this table to edit some data by using the following path “Start > Programs > Microsoft SQL Server > Enterprise Manager” then work my down the tree to “Databases > Master > Tables” on tables I do a right click and “open table/query”. Any help would be most appreciated. Believe me I’m very “SQL illiterate”
I need a sql statement to delete duplicate records.
I have a college table with all colleges in the nation. I noticed that all of the colleges were listed twice. How do I delete all of the duplicate records.
Here is my table. Colleges ------------------- schoolID - smallint NOT NULL, schoolName - varchar(60) NULL
Can someone help me out with the sql statement??? I'm running SQL Server 6.5.
and generating a report from an SQL table, and need to know how to exclude records that are "duplicates". Not duplicates in a sense that every field is identical, but duplicates in a sense where everything except the unique identifier is identical. Is there a quick and easy way to do this?
I am importing data into a SQL table and there is a potential for duplicate records to be coming in. How do I simply ignore the duplicates and add only the records that do not violate the keys?
Hi All, I am having one table named MyTable and this table contains only one column MyCol. Now i m having 10 records in it and all the records are duplicate ie value is 7 for all 10 records.
It is something like this,
MyCol 7 7 7 7 7 7 7 7 7 7
Now i m trying to delete 10th record or any record then it gives me error "Key column information is insufficient or incorrect. Too many rows were affected by update."
What should i do if i want only 4 records insted 10 records in my table? How do i delete the 6 records from table?