First post here. Anyway, I have a question regarding SSIS. I'm currently given a task that requires reading a flat file, applying duplicate removal as well as invalid data removal, processing it, and finally writing it to a SQL Server 2005 DB.
Part of the processing requires checking for partial duplicates in the batches of records provided in the text file. For example, the record contains a a phone number, status, timestamp of creation and various other entries. If a phone number is repeated (meaning, duplicate entry), a column called 'Status' must be checked, and only entries with the status of 'C' is allowed through.
Another part of the processing requires that if the phone number is repeated along with various other entries including status, the timestamp of creation is checked and only the entry with the most recent timestamp is accepted.
I would like to know how to implement this in SSIS without using table objects and scripts, as my experience tells me that doing this in a script can really take a hit on system performance. The task is expected to handle tens of thousands of records in a day.
Hi,I need to enforce that a table does not have "duplicates" for aspecific status type in the table.If the column "STATUS" = 2, then there can not be more than one rowwith a specific "ID" column.I can not use a unique key constraint because duplicate values for thiscombo of columns is valid for the status = 1.Just when the status = 2, there can not be any other rows with the sameID and status = 2.Any ideas?-Paul
I have two connections in a package pointing to two different databases on the same server. I have to insert records from 'DB1' table 'Gender1' to 'DB2' table 'Gender2'. Before I do that though, I have to make sure the minimum value (of all the Gender Keys that are going to be inserted) of 'DB1' 'GenderKey' (which is an identity field) is greater than the maximum value of DB2-GenderKey (which is a primary key but not an identity field). How can I do this simple check? I have to do this process for many different tables ....... Gender table is just an example. If someone can give an detailed explanation on which tasks to use and how to use them (as I am relatively new to SSIS) that'd be great.
If DTSGlobalVariables("gsPrevProcessCount").Value > 0 Then MsgBox "The data for one or more of your members has already been processed. Please review your data files and remove the processed files from the data directory." Main = DTSTaskExecResult_Failure Else Main = DTSTaskExecResult_Success End If
End Function
In SSIS, I have an Execute SQL Task that returns a variable PrevProcessedCount. Now, I am stuck. How do I display a message to the user (or maybe just put it in the audit log) and how to I check the variable and stop processing my package?
I have a condition where if column5 is equal to 1 then put column6 into the destination column "dest6", if it is not equal to 1 then put column6 in destination column "dest7"
What is the best way to do this in SSIS?
If I have to use the conditional split then do I have to copy my complete mappings, exact change this one column?
Thank for the help this mapping will take me a long time!
Right the answer is probably simple but the Internet and books and everything has been no joy to me whatsoever.
I want to split my data stream based on the date. So I want to use a conditional split object to do this.
I entered the following as my case date_created > (DT_DBTIMESTAMP)"01/10/2000"
When I move off the line it stays black so appears to be okay, yet when I run my package it says it is not a boolean result and fails. Can anyone please tell me what I am doing wrong.
Also I cannot filter in the source call due to the sheer amount of work being done on the data before the split.
HI, How to create package in SSIS by applying the business Logic like if the record already exist it should be and update else it should be an insert in the destination table. how to achive this funcality in SQL SERVER 2005 (Business Intelligence studion).
HI i need to write the Condition for Insert and Update Reccord depending upon the Prod_ID. How to write the Follwing condtion in the Condition Split? pls Anyone give me the Solution?
" if Prod_ID Exist then UPDATE Records
if Prod_ID Not EXIST then INSERT Records "
how to write the above conditon in the Condional Split?
I am using Conditinal split in my package. I need to remove certain rows which are matching my criteria. The criteria requires using wild card characters like, first_name = '%john%'.
Can anybody help me out in 1) implementing cursors in SSIS. I want to process each row at a time from a dataset. I was trying to use Foreachloop container but in vain. Can you please answer in detail.
my few other questions are: 1) Can i do nested inner join in SSIS. If yes, how? ( I have three table i need to join Tab1 to table 2 and get join the table 3 to get the respective data) 2) I have a resultsets. I want to split the data according to data in a col. Say for instance: Col1 Col2 A 1 A 2 B 3 C 4 C 5 i want to split the data according A, B and C . i.e., if Col1= A then do this, if Col1= B then do this..etc. How can i do this using conditional split task in SSIS
I have a CSV file which contains some duplicate record and i have to load this file in SQL server database using SSIS package .
What i have to do is read the file and if the same record entry is occur more than 10 times for a particular unique combination ( like ID , Date , Time ) then i need to take only one record for that occurance.
I am having a problem where duplicate log statements are being written to a log file (as defined by a log provider).
I believe that this is because in the logging dialog box, I have ticked the checkbox next to a child task to override the logging functionality. I need to do this because it is a script task and I want to capture "ScriptTaskLogEntry" events (something that I cannot do at the parent level). However by doing this I seem to get the script events written at the parent, as well as at the Script Task level.
Is there any way of avoiding this, but still capturing the log events from the script task?
Another issue that is possibly linked is that I am getting an error from the log provider:
The SSIS logging provider "SSIS log provider for Text files" failed with error code 0x800700EA ((null)). This indicates a logging error attributable to the specified log provider.
Could this be because of the parent and child task are both attempting to write to the same log provider?
I am importing the values for field Atype from a .csv file as DT_STR, 13 and I need to fit them into a bit type CType field.
When I write the conditional split ((ISNULL(Atype)?"a":Atype)!=(ISNULL(CType)?"9":CType)) it says that the DT_WSTR and DT_I4 types are incompatible and that I need to explicitly cast with a cast operator. I haven't been able to make it work, how to explicitly cast?
We are building a dataload application where parameters are store in a table. And there are multiple packages for each load.There is a column IsChecked column if it is 1 then only the child package should execute.Created a master package. In which i have taken execute SQL task in that storing a results in variable and based on the result the child package should execute. But In executesql task i selected result set as full result set. I am getting the below error.
[Execute SQL Task] Error: Executing the query "SELECT isnull(ID ,0) AS ID FROM DataLoadParameter..." failed with the following error: "The type of the value (DBNull) being assigned to variable "User::LoadValue" differs from the current variable type (Int32). Variables may not change type during execution. Variable types are strict, except for variables of type Object.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
I have 3 source for IS flow. One is flat file, one is DB table and one is output bad data. It might be a situation when I could have duplicate primary key since records come from 3 sources (flat file, db table, reject (output) table). Can any one give me suggestion how to handle duplicate primary key problem in this situation.
public static void CreateDestDFC1() { destinationDataFlowComponent1 = dataFlowTask.ComponentMetaDataCollection.New(); destinationDataFlowComponent1.Name = "SQL Server Destination 1"; destinationDataFlowComponent1.ComponentClassID = "{5244B484-7C76-4026-9A01-00928EA81550}"; managedOleInstance1 = destinationDataFlowComponent1.Instantiate(); managedOleInstance1.ProvideComponentProperties(); managedOleInstance1.SetComponentProperty("BulkInsertTableName", "Employee"); managedOleInstance1.AcquireConnections(null); managedOleInstance1.ReinitializeMetaData(); managedOleInstance1.ReleaseConnections(); } //Second one here.. public static void CreateDestDFC2() { destinationDataFlowComponent2 = dataFlowTask.ComponentMetaDataCollection.New(); destinationDataFlowComponent2.Name = "SQL Server Destination 2"; destinationDataFlowComponent2.ComponentClassID = "{5244B484-7C76-4026-9A01-00928EA81550}"; managedOleInstance2 = destinationDataFlowComponent2.Instantiate(); managedOleInstance2.ProvideComponentProperties(); managedOleInstance2.SetComponentProperty("BulkInsertTableName", "Customer"); managedOleInstance2.AcquireConnections(null); managedOleInstance2.ReinitializeMetaData(); managedOleInstance2.ReleaseConnections(); } And its giving a error.can anyone say why? or can anyone change this? The package contains two objects with the duplicate name of "component "SQL Server Destination" (50)" and "component "SQL Server Destination" (22)".
I've a dtsx package which runs nightly to do following:
1. select data from a SQL replicated table 2. do some lookups (Lookup, Derived Column, Multicast, Conditional Split, etc.) 3. insert into another SQL table on another server using "Table or view - fast load", rows per batch = 10000, maximum insert commit size = 10000, and "redirect row" on error output on destination to an error log text file. Once in a while, I found duplicate records in the error log; these rows cannot be inserted into destination table due to primary constraint. For example, transaction_id=111000 appears twice in the error log but it is a unique key in the source table.
My questions: 1. What could be the cause of duplicating rows during ETL in SSIS? I've asked this before and have spent so much time research but still could not find the reason. This link is from my previous post:
2. For a daily extract data with over millions of rows, what would be best to set rows per batch, maximum insert commit size, etc? I've read some posts on this forum and decide to use 10000 for both, but once in a while there's just one duplicate rows that causes the whole batch of 10000 rows not committed.
I recently encountered an error when I created several copies of one package.
It's always nearly the same package with small modifications. I call this packages from a parent package which is part of our datawarehouseing-framework.
The problem is, when copying a packages or using a packages as template the packages' IDs and Task's-IDs are the same. And this isn't only an issue concerning logging!! :
When the parent package calls one of the copied packages the first task is executed in every package parallely. Furthermore ... when I for example set a breakpoint on a data transformation task in one of the packages, the breakpoint is set in all packages on the same task! This is resulting in strange errors because the tasks-states and variable values seem to get mixed up.
Unfortunately there is only a possibility to change the package's ID, but the IDs of tasks are readonly!
One solution is, to create a new package and copy all the tasks to the new package which creates new IDs, but doing so, I have to manually recreate a long list of variables, all the configurations, all the connection-managers once again. Furthermore I loose the layout of tasks.
I found some posts about it here
http://groups.google.de/group/microsoft.public.sqlserver.dts/browse_thread/thread/6f85a31ea190608a/0eae312aa8440cf8?lnk=gst&q=pitfall&rnum=1&hl=de#0eae312aa8440cf8 or
In my SSIS package, i have a field test_method_number coming from OLE DB Source. I used Derived transformation to trim test_method_number: TRIM(test_method_number)
Now in the next Derived Transformation, i see duplicate test_method_number. How to get rid of this duplicate?
I have one ssis package moving the data from staging to destination. In stating table we have the duplicate data. But in destination table 4 columns have primary key. How to handle the duplicate records in oldedb source.
Is there a way in order to execute a subscribed report based on a certain criteria?
For example, let's say send a report to users when data exist on the report else if no data is returned by the query executed by the report then it will not send the report to users.
My current situation here is that users tend to say that this should not happen, since no pertinent information is contained in the report, why would they receive email with blank data in it.
I'm doing a group by in an aggregate transformation. I have say 6 columns in the output and I'm grouping on all of them - how can I get duplicate rows in the output? If I do the same select and group by in SQL on the source data I don't get any duplicate rows. In fact out of 6000+ rows I only get 2 duplicates.
I want to import a data file into a sql table. The table has a primary key but the data could have a duplicate value in the PK column (error in the source data). How can I "trap" for this type of error in SSIS?
I am using the Import/Export wizard to import data from an ODBC data source. This can only be done from a query to specify the data to transfer.
When I try to create the tables, for the query, I am getting the following error:
Msg 2714, Level 16, State 4, Line 12
There is already an object named 'UserID' in the database.
Msg 1750, Level 16, State 0, Line 12
Could not create constraint. See previous errors.
I have duplicated this error with the following script:
USE [testing]
IF OBJECT_ID ('[testing].[dbo].[users1]', 'U') IS NOT NULL
DROP TABLE [testing].[dbo].[users1]
CREATE TABLE [testing].[dbo].[users1] (
[UserID] bigint NOT NULL,
[Name] nvarchar(25) NULL,
CONSTRAINT [UserID] PRIMARY KEY (UserID)
)
IF OBJECT_ID ('[testing].[dbo].[users2]', 'U') IS NOT NULL
DROP TABLE [testing].[dbo].[users2]
CREATE TABLE [testing].[dbo].[users2] (
[UserID] bigint NOT NULL,
[Name] nvarchar(25) NULL,
CONSTRAINT [UserID] PRIMARY KEY (UserID)
)
IF OBJECT_ID ('[testing].[dbo].[users3]', 'U') IS NOT NULL
DROP TABLE [testing].[dbo].[users3]
CREATE TABLE [testing].[dbo].[users3] (
[UserID] bigint NOT NULL,
[Name] nvarchar(25) NULL,
CONSTRAINT [UserID] PRIMARY KEY (UserID)
)
I have searched the "2714 duplicate error msg," but have found references to duplicate table names, rather than multiple field names or column name duplicate errors, within a database.
I think that the schema is only allowing a single UserID primary key.
I have the following code in the color property of a textbox. However, when I run my report all of the values in this column display in green regardless of their value.
I've begun to get the above error from my package. The error message refers to two output columns.
Anyone know how this could happen from within the Visual Studio 2005 UI? I've seen the other posts on this subject, and they all seemed to be creating the packages in code.
Is there any way to see all of the columns in the data flow? Or is there any other way to find out which columns it's referring to? Thanks!
I want to know how I can create conditional FROM WHERE clauses like below ..
SELECT X,X,X FROM CASE @intAltSQL > 0 Then Blah Blah Blah END CASE @intAltSQL = 0 Then Blah END WHERE CASE @intAltSQL > 0 Then Blah Blah Blah END CASE @intAltSQL = 0 Then Blah END