Do We Have an IF-ELSE Construct Similar to Flow-Chart Execution in SSIS?
Aug 10, 2007
Hi All,
Just needed some insight into the IF-ELSE construct with respect to its implementation in SSIS, or a similar methodology that could be adopted while executing a package.
Scenario: I want to start by importing data from different sources into a SQL Server destination. For this, I define three Data Flow Tasks, each importing data from one external source into SQL Server:
1] Text File Inbound Task : Source - Flat File Source ; Destination - SQL Server Destination
2] Excel Inbound Task : Source - Excel Data Source ; Destination - SQL Server Destination
3] Xml Inbound Task : Source - XML Data Source ; Destination - SQL Server Destination
Finally, I want to execute the package with an IF-ELSE scenario that checks which external source is present (text file, Excel, or XML) and runs only the corresponding Data Flow Task.
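SSIS has no literal IF-ELSE block; the usual substitute is precedence constraints with expressions. A minimal sketch, assuming a Script Task that inspects an inbound folder and sets a hypothetical User::SourceType variable (the folder path and variable names are assumptions; C# Script Tasks require SSIS 2008 or later, while SSIS 2005 scripts use VB.NET):

// Inside the Script Task's generated ScriptMain class (using System.IO at the top).
// Records which kind of source file is present; names are illustrative only.
public void Main()
{
    string folder = Dts.Variables["User::InboundFolder"].Value.ToString();
    string sourceType = "None";

    if (Directory.GetFiles(folder, "*.txt").Length > 0) sourceType = "Text";
    else if (Directory.GetFiles(folder, "*.xls").Length > 0) sourceType = "Excel";
    else if (Directory.GetFiles(folder, "*.xml").Length > 0) sourceType = "Xml";

    Dts.Variables["User::SourceType"].Value = sourceType;
    Dts.TaskResult = (int)ScriptResults.Success;
}

Each of the three Data Flow Tasks then hangs off the Script Task via a precedence constraint whose evaluation operation is set to Expression, e.g. @[User::SourceType] == "Excel", which gives the IF-ELSE routing of a flow chart.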
I have a package that loads staging tables from an Oracle source DB. In the data flow tab I have 30+ read-table/write-table task combinations. When I run the package, 3-4 of the read/write combos execute at a time. What I'm trying to control is the priority order of the combo execution. My goal is to minimize the total load time by having the larger table transfers run first and the smaller table transfers fill in until they are all complete. Currently, the largest table (16 million rows) transfers last (because it was the last combo that I created?).
Has anyone come up with or determined a generic way to capture and log indicative information within a data flow in SSIS - e.g., the number of rows selected from the source, transformed, rejected, loaded, various timestamps around these events, etc.? I am trying to avoid having to build a custom solution for each of the packages that I will have (of which there will be dozens). Ideally, I'd like to have some sort of generic component (such as a custom transformation) that will hide the implementation details and provide a generic interface to the package.
It is not too difficult to achieve something similar on the control flow level, but once you get into data flows things get complicated.
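For the row counts, a hedged sketch of one generic building block: a Script Component transformation that counts the rows flowing through it and publishes the total to a package variable in PostExecute, which an Execute SQL Task can then write to a log table together with timestamps. The RowsProcessed variable name is an assumption, the surrounding class scaffolding is generated by SSIS, and the stock Row Count transformation does the same job without code:

// Script Component (transformation). "RowsProcessed" must be listed in the
// component's ReadWriteVariables; UserComponent and Input0Buffer are generated.
public class ScriptMain : UserComponent
{
    private int rowCount;

    // Called once per row flowing through the component.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        rowCount++;
    }

    // Runs after the last buffer; read-write variables may only be set here.
    public override void PostExecute()
    {
        base.PostExecute();
        Variables.RowsProcessed = rowCount;
    }
}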
Is there documentation somewhere about multiple execution paths in SSIS control flow? I didn't find it anywhere. I have a situation where two tasks take considerable time but could be executed in parallel (to speed things up), and I was wondering whether SSIS supports parallelism.
To illustrate the issues in simultaneous execution, I created a test SSIS package. In the package I have five tasks; let's call them T1, T2, T3, T4 and T5. The tasks are connected with "green arrows" like this:
T1->T2
T1->T3
T2->T4
T3->T4
T5 is not connected. The tasks could be, e.g., Send Mail Tasks; that's not relevant to this issue. I put a breakpoint in each task and execute the package.
When I execute the package, T1 and T5 become active, i.e. the arrow that displays where package execution currently is appears in two tasks simultaneously. Now F10 (step over) doesn't seem to work: "Unable to step. Not implemented". If I press F5 nothing happens. After I press F5 a second time, tasks T1 and T5 are executed. Why don't they execute on the first press of F5? I would additionally like to know whether these two tasks are executed in parallel or sequentially, i.e. in the same thread or in two threads? Is there documentation on this?
Execution then stops at T2 and T3. Again, pressing F5 doesn't do anything, but the second time I press F5, T2 and T3 are executed.
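On the parallel-or-sequential question: tasks with no precedence constraint between them (T1 and T5 here) are eligible to run in parallel on separate threads, bounded by the package's MaxConcurrentExecutables property, whose default of -1 means the number of logical processors plus two. A small sketch, with the package path as an assumption, that reads and adjusts the property through the SSIS runtime API:

// Console utility; requires a reference to Microsoft.SqlServer.ManagedDTS.
using Microsoft.SqlServer.Dts.Runtime;

class ShowConcurrency
{
    static void Main()
    {
        Application app = new Application();
        Package pkg = app.LoadPackage(@"C:\packages\Test.dtsx", null); // path is an assumption

        // -1 resolves to "logical processors + 2" at execution time.
        System.Console.WriteLine("MaxConcurrentExecutables = " + pkg.MaxConcurrentExecutables);

        pkg.MaxConcurrentExecutables = 4;  // allow up to four executables at once
        app.SaveToXml(@"C:\packages\Test.dtsx", pkg, null);
    }
}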
I'm building a package wherein I perform a SQL task (A) if the error log is not empty. This same SQL task (A) is also used by another data flow task (B). The precedence constraint points from B to A, bottom to top. When I execute, all the tasks in the downward direction (precedence pointing downward/sideways) execute, but this one doesn't, as it points upwards. I could copy and paste task A and make B point to the copy downward, but I don't want to duplicate A in the same package.
Is there any other approach?
If the above is unclear, see the flow:
X (SQL task) ----on success----> A (SQL task)
      |                              ^
      |                              |
...(sequence of steps)               |
      |                              |
      v                              |
B (Data Flow Task) -----failure------+
Execution flow doesn't move from B to A, even though it's a failure condition. Hope this explains the problem.
I'm in the process of trying to optimize a stored procedure with many queries. The execution plan suggests a missing non-clustered index for nearly every query, and the suggestions are all fairly similar. The only real difference between them is what's in the INCLUDE clause; the two key columns are listed in every missing index. Let's say each query is approximately 5% of the total batch, and 90% of the queries fall into the category above. How should I go about creating the missing indexes? Create all of them, create one generic index that has all the INCLUDE columns, or create a minimal index with just a few of the common INCLUDE columns?
Here's an example of what I'm talking about with the missing indexes:
/*
USE [DB]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[TABLE_1] ([COLUMN_A],[COLUMN_B])
INCLUDE ([C4ABCD],[C4ARTX],[C4ASTX],[C4ADNB],[C4AFNB],[C4BKVA])
GO
*/
/*
The Query Processor estimates that implementing the following index could improve the query cost by 99.9044%.
*/
/*
USE [DB]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[TABLE_1] ([COLUMN_A],[COLUMN_B])
INCLUDE ([C4ARTX],[C4ASTX],[C4ADNB],[C4CZST])
GO
*/
/*
The Query Processor estimates that implementing the following index could improve the query cost by 99.5418%.
*/
/*
USE [DB]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[TABLE_1] ([COLUMN_A],[COLUMN_B])
INCLUDE ([C4ABCD],[C4ARTX],[C4ASTX],[C4ADNB],[C4AFNB],[C4BKVA])
GO
*/
I am currently trying to import the following Excel datasheet into my SQL 2005 database with SSIS:
Column: Created. Example cell: 18.03.2008 17:47:43
Now I want to get the time, the year, the month and the day separately out of this column for a later OLAP analysis.
I created a Data Flow Task, where I set the Excel file as source and the SQL database as target. But I don't really know how to work with the "Data Conversion" transformation to extract the date parts. In Excel this is just a "customized" cell.
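One hedged option, assuming the column reaches the pipeline as DT_DBTIMESTAMP (so the script sees a System.DateTime): a Script Component that splits the value into new output columns. The column names below are assumptions and must be added on the component's Inputs and Outputs page; a Derived Column transformation with YEAR(), MONTH(), DAY() and DATEPART() expressions achieves the same without code:

// Script Component (transformation) deriving date parts from "Created".
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.Created_IsNull)
        {
            System.DateTime d = Row.Created;
            Row.CreatedYear  = d.Year;                  // 2008
            Row.CreatedMonth = d.Month;                 // 3
            Row.CreatedDay   = d.Day;                   // 18
            Row.CreatedTime  = d.TimeOfDay.ToString();  // "17:47:43"
        }
    }
}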
I would like to know what happens during the different phases when executing a data flow task. I noticed that there are the validation, prepare-for-execute, pre-execute, execute, post-execute, and cleanup phases.
We all know that the execute phase is the actual execution of the desired data flow (source to destination), but I was wondering what the other phases do. Knowing what happens during those phases will help me optimize the whole execution. One question raised by someone here is: what is the difference between the Prepare for Execute phase and the Pre-Execute phase, and what does the validation phase do?
I have one data flow which transfers data into two tables (parent and child).
My question is: is there a way I can load data into the parent table first and then the child table? Currently the child table gets loaded first, and the parent table loads after it. The execution should be (Source Parent --> Destination Parent) first, (Source Child --> Destination Child) second; in my case it executes in reverse. Since I have a foreign key constraint on the child table, it gives a foreign key constraint error while running the SSIS package.
Can anyone tell me how to define my own execution sequence for the data flow (source - destination)?
I have an SSIS (CTP June 2005) package with several data flow tasks. One data flow has 2 OLEDB data sources which are then unioned together, followed by a conditional split, then a derived column transformation, which feeds into an OLEDB destination. When I run the package, this particular data flow seems to run fine and then just stops. There are no errors and the record counts don't move. I've used data viewers to look at the data and it seems fine. I've swapped the OLEDB destination for a flat file and execution runs fine, then switched back to OLEDB and it hangs again (over and over). There's nothing unusual about the OLEDB destination, and all the other OLEDB destinations work fine. It also seems to hang at the same destination row count, but again that row has been verified as valid. In fact, I dropped the DB table and recreated it with no constraints and all fields nullable, but the problem persists. Help?
I have a Foreach Loop that contains about 20 data flow tasks (same database connections but different extractions), but I notice that when I execute the project it only runs 4 data flow tasks at a time.
I know that each data flow has an "Engine Threads" option, but is there a way to set the threads for a Foreach Loop or for the whole project so it will execute all the data flow tasks in one go on each loop iteration?
I'm trying to conditionally execute a data flow based on the presence of a data file. If the data file isn't present, I'd like the package to complete gracefully without error.
Logic is as follows:
If FileExists Then
    execute dataflow
Else
    exit w/o error
End If
I've got the code ready to go, but I'm not sure how to do this conditional branch logic. Right now, the code returns Dts.Results.Success / Failure. The problem, however, is that Failure is exactly that, which doesn't produce the graceful exit I'm looking for.
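A hedged way around this: let the Script Task always succeed and publish a boolean instead of returning Failure, then put an expression such as @[User::FileIsPresent] == TRUE on the precedence constraint leading to the data flow. When the file is absent the constraint evaluates to false, the data flow is simply skipped, and the package ends without error. Variable and path names are assumptions:

// Inside the Script Task's ScriptMain class (using System.IO at the top).
public void Main()
{
    string path = Dts.Variables["User::DataFilePath"].Value.ToString();

    // Record the result of the check instead of failing the task.
    Dts.Variables["User::FileIsPresent"].Value = File.Exists(path);
    Dts.TaskResult = (int)ScriptResults.Success;
}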
I have a serious problem with an SSIS import. My packages import from a foreign source into a kind of temp table (actually it's not a temporary table; it's just filled with data and truncated after completion of the package), do some transformations, and then a data flow task simply copies all the rows from the "temp" to the final table. I get the following errors (here, there are two simultaneous copy operations from two different "temps" into the same final table):
Error: 0xC0202009 at _temp to finaltable 5 2 1, OLE DB Destination [16]: An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Transaction (Process ID 68) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.".
Error: 0xC0209029 at _temp to finaltable 5 2 1, OLE DB Destination [16]: The "input "OLE DB Destination Input" (29)" failed because error code 0xC020907B occurred, and the error row disposition on "input "OLE DB Destination Input" (29)" specifies failure on error. An error occurred on the specified object of the specified component.
Error: 0xC0047022 at _temp to finaltable 5 2 1, DTS.Pipeline: The ProcessInput method on component "OLE DB Destination" (16) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
Error: 0xC0047021 at _temp to finaltable 5 2 1, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0xC0209029.
Error: 0xC02020C4 at _temp to finaltable 5 2 1, OLE DB Source [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at _temp to finaltable 5 2 1, DTS.Pipeline: The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
Error: 0xC0047021 at _temp to finaltable 5 2 1, DTS.Pipeline: Thread "SourceThread0" has exited with error code 0xC0047038.
Information: 0x40043008 at _temp to finaltable 5 2 1, DTS.Pipeline: Post Execute phase is beginning.
Information: 0x402090DF at _temp to finaltable 5 2 1, OLE DB Destination [16]: The final commit for the data insertion has started.
Information: 0x402090E0 at _temp to finaltable 5 2 1, OLE DB Destination [16]: The final commit for the data insertion has ended.
Information: 0x40043009 at _temp to finaltable 5 2 1, DTS.Pipeline: Cleanup phase is beginning.
Information: 0x4004300B at _temp to finaltable 5 2 1, DTS.Pipeline: "component "OLE DB Destination" (16)" wrote 5041 rows.
Task failed: _temp to finaltable 5 2 1
Warning: 0x80019002 at KDStat_alles_412: The Execution method succeeded, but the number of errors raised (7) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
Task failed: 412
Warning: 0x80019002 at kdstat_alles_master: The Execution method succeeded, but the number of errors raised (7) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
Error: 0xC0202009 at _temp to finaltable 1 2 1, OLE DB Destination [4468]: An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Transaction (Process ID 82) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.".
Error: 0xC0209029 at _temp to finaltable 1 2 1, OLE DB Destination [4468]: The "input "OLE DB Destination Input" (4481)" failed because error code 0xC020907B occurred, and the error row disposition on "input "OLE DB Destination Input" (4481)" specifies failure on error. An error occurred on the specified object of the specified component.
Error: 0xC0047022 at _temp to finaltable 1 2 1, DTS.Pipeline: The ProcessInput method on component "OLE DB Destination" (4468) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
Error: 0xC0047021 at _temp to finaltable 1 2 1, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0xC0209029.
Error: 0xC02020C4 at _temp to finaltable 1 2 1, OLE DB Source [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047038 at _temp to finaltable 1 2 1, DTS.Pipeline: The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
Error: 0xC0047021 at _temp to finaltable 1 2 1, DTS.Pipeline: Thread "SourceThread0" has exited with error code 0xC0047038.
Well, it looks like those two processes simply deadlock each other. But there are 11 simultaneous data imports which all go fine; the issue only occurs with those two. The structure of the packages is exactly the same everywhere.
Is it possible that a previous update SQL query hasn't committed properly and is locking the data?
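One hedged way to check for that while the import is running: query sys.dm_tran_locks (SQL Server 2005 and later) and look for sessions holding locks on the final table. A small console sketch; the connection string is an assumption:

using System;
using System.Data.SqlClient;

class CheckLocks
{
    static void Main()
    {
        // Connection string is an assumption; point it at the target database.
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=DB;Integrated Security=true"))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT request_session_id, resource_type, " +
                "request_mode, request_status " +
                "FROM sys.dm_tran_locks " +
                "WHERE resource_database_id = DB_ID();", conn);

            using (SqlDataReader r = cmd.ExecuteReader())
                while (r.Read())
                    Console.WriteLine("{0} {1} {2} {3}", r[0], r[1], r[2], r[3]);
        }
    }
}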
I am currently trying to configure the logging for my package. I was able to redirect the log messages I am interested in to a text file and configure its connection string in the package configuration.
Now here is the point I am currently stuck at: I would like my package's log file to contain the logs for one day only. This could be done by appending the date to the old log file's name and creating a new, empty file with the original log file name on a daily basis.
If you are familiar with the log4j tool, then what I need is a configuration similar to the so-called "DailyRollingFileAppender" in log4j.
Is this "DailyRollingFileAppender" configuration possible with SSIS?
I have a table that I am basically duplicating a couple of times for each part of this database that I want to create. Each table has essentially the same columns. The tables will be called motherTable, fatherTable, sonTable, daughterTable and so on. I am pretty much using the following in each: UserID, MotherID (or FatherID, SonID, etc. - a unique ID per table), FirstName, LastName, MiddleName, BirthPlace, Photo, Age. I don't see an option to copy a table, modify the ID column, and rename the table accordingly. How can I create these similar tables without retyping all these columns over and over again? Thanks in advance.
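Since the tables differ only in the name of the second ID column, one hedged shortcut is to generate the CREATE TABLE scripts instead of retyping them; the column types below are assumptions:

using System;

class GenerateFamilyTables
{
    static void Main()
    {
        string[] roles = { "Mother", "Father", "Son", "Daughter" };
        foreach (string role in roles)
        {
            // motherTable with MotherID, fatherTable with FatherID, and so on.
            string table = char.ToLower(role[0]) + role.Substring(1) + "Table";
            Console.WriteLine("CREATE TABLE " + table + " (");
            Console.WriteLine("    UserID int NOT NULL,");
            Console.WriteLine("    " + role + "ID int IDENTITY(1,1) PRIMARY KEY,");
            Console.WriteLine("    FirstName nvarchar(50),");
            Console.WriteLine("    LastName nvarchar(50),");
            Console.WriteLine("    MiddleName nvarchar(50),");
            Console.WriteLine("    BirthPlace nvarchar(100),");
            Console.WriteLine("    Photo varbinary(max),");
            Console.WriteLine("    Age int");
            Console.WriteLine(");");
            Console.WriteLine("GO");
        }
    }
}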
Scenario:
1.- An SSIS package executes tasks on a SQL Server 2000 database.
2.- Execution takes place using Business Intelligence Studio.
Question:
1.- How can I track that the SQL 2000 tasks took place using an SSIS package?
I am having a hard time with what appears to be something simple. I want to import an Excel spreadsheet into a table on a daily basis from a command line. I created a package with the Import Wizard in SQL Server Management Studio and saved it. Since I want a clean table each day, my process needs to be: create a temp table, then import from the Excel file into the temp table. If that is successful, delete the original table and rename the temp table to the original name. The point of this process is to provide a fail-safe in case there is some unforeseen problem downloading the data on a particular day.
When I run the package, the first thing it does is delete the original table. I know this because the process shows that its finish time is before anything else has started or finished; the completion time shown for the data flow task is about 2 minutes after that.
This is maddening!!! The one thing I do not want to happen I cannot seem to prevent. I have my control flow set on success. Why does it do this?
There is a table with a column that contains XML documents. For each record from my data flow source, I want to pass in the XML document and the node to interrogate, and get back the value contained in that node. Like the Crm component, this is probably one I will have to write from scratch in C#, but I would like to avoid creating a custom component if one already exists in the public arena.
Does anyone know of any XML SSIS data flow components that are downloadable for free?
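In case no free component turns up, the from-scratch version is small. A hedged Script Component sketch, assuming the document arrives as a string column and that XmlDoc, NodePath and NodeValue are the input/output column names (all assumptions):

// Script Component (transformation): per row, load the XML document and
// pull out the text of the node addressed by an XPath column.
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.XmlDoc_IsNull && !Row.NodePath_IsNull)
        {
            System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
            doc.LoadXml(Row.XmlDoc);
            System.Xml.XmlNode node = doc.SelectSingleNode(Row.NodePath);
            Row.NodeValue = (node == null) ? "" : node.InnerText;
        }
    }
}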
I was working all day making changes to my 3MB package. I was adding a large number of transforms that were copied and pasted from elsewhere in the same data flow task.
All was going well. I even took the time to have SSIS lay out the task again (1/2 hour). Suddenly I started receiving some strange errors:
After the layout, I noticed two stray components way off in the upper right corner. I found that one of them had a name duplicating that of a component which had been added hours ago. Even after deleting it, I got "duplicate name" errors.
I copied three components in one selection, and when I tried to paste them, got the error "can't initialize component on paste". I tried them one at a time, but got the same error.
I got errors about COM failures due to marshalling to another thread. I then exited Visual Studio and started it again. To my great surprise, the data flow task I was working on was still there, but was completely empty.
Comparing what I'm left with to my last version in source control, I find that the entire pipeline element is missing from the DTS:ObjectData element!
I'm developing a real love/hate relationship with SSIS. It varies from one day to the next. Guess what kind of day this is!
When I export the report in Excel format, the chart is displayed as a picture. I want it to be displayed as an editable chart. Does Office Writer work in this situation, and has anyone used Office Writer to solve this type of problem? Is there any other method or product we can use instead of Office Writer?
I am using SSIS in SQL Server 2005 and want to have a query like this in my data flow task
Select a.*
from abc as a
inner join (Select max(b.id) as ID
            from xyz as b
            inner join pqr as c
                on b.id = c.id and b.id > ?) as t1
    on t1.id = a.id
SSIS fails to detect the parameter (?) in the inner query and gives this message:
"Parameters cannot be extracted from the SQL command. The provider might not help to parse parameter information from the command. In that case, use the "SQL command from variable" access mode, in which the entire SQL command is stored in a variable."
So, assuming this is the problem, you can work around it.
The idea is to parameterize the inner query (so if the above query doesn't make sense, ignore it).
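A sketch of that workaround, assuming a string variable User::SqlCommand that the OLE DB Source reads through its "SQL command from variable" access mode. Here a Script Task splices the value in (a property expression on the variable works just as well); User::LastId is an assumed numeric variable:

// Inside the Script Task's ScriptMain class: build the full SELECT so the
// OLE DB Source never has to parse a ? inside a subquery.
public void Main()
{
    string lastId = Dts.Variables["User::LastId"].Value.ToString();

    // Safe only because LastId holds a trusted numeric value, not free text.
    Dts.Variables["User::SqlCommand"].Value =
        "Select a.* from abc as a " +
        "inner join (Select max(b.id) as ID from xyz as b " +
        "inner join pqr as c on b.id = c.id and b.id > " + lastId + ") as t1 " +
        "on t1.id = a.id";

    Dts.TaskResult = (int)ScriptResults.Success;
}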
I am having some problems with loading a tab-delimited text file (source) into a SQL Server table (destination) using the SSIS data flow task. The package executes successfully with no error message, and the number of rows in the text file matches the number of rows in the SQL table. But when I check the content of the table, I notice some of the columns contain NULL where there is supposed to be a value. This happens not in all rows but only in some. I did some testing by removing rows from the beginning, middle and end of the text file and re-running the package, but the results are quite inconsistent: sometimes the field gets filled, and sometimes it just contains NULL where it is supposed to have a value.
I am experiencing an error where the SSIS data flow task freezes and stops exporting data from an OLE DB source to a text file. It doesn't generate any errors; the SSIS package just hangs. This only happens when I run it in 64-bit mode. When I change the mode to 32-bit, the package never freezes and runs fine. Has anyone experienced this? Is there a fix so I can run my jobs in 64-bit mode?
I have an SSIS package which I would like to modify using the SSIS API. I need to insert a new component between two existing data flow components. During this process I need to disconnect the two data flow components using the SSIS API. How can I do that?
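A hedged sketch of the disconnect step, assuming the Data Flow Task is the first executable in the package and the components are named "Source" and "Destination" (SSIS 2005's "90" pipeline interfaces; later versions use "100"). The key calls are MainPipe.PathCollection.RemoveObjectByID to break a connection, and PathCollection.New plus AttachPathAndPropagateNotifications to wire the new component in afterwards:

// References: Microsoft.SqlServer.ManagedDTS and Microsoft.SqlServer.DTSPipelineWrap.
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class DisconnectComponents
{
    static void Main()
    {
        Application app = new Application();
        Package pkg = app.LoadPackage(@"C:\packages\MyPackage.dtsx", null); // assumption

        TaskHost task = (TaskHost)pkg.Executables[0];  // assumption: first executable
        MainPipe pipe = (MainPipe)task.InnerObject;

        foreach (IDTSPath90 path in pipe.PathCollection)
        {
            if (path.StartPoint.Component.Name == "Source" &&
                path.EndPoint.Component.Name == "Destination")
            {
                pipe.PathCollection.RemoveObjectByID(path.ID);  // break the link
                break;
            }
        }
        // The replacement component is wired in later with
        // pipe.PathCollection.New() and AttachPathAndPropagateNotifications().
        app.SaveToXml(@"C:\packages\MyPackage.dtsx", pkg, null);
    }
}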
I am loading a lot of Excel and CSV files into SQL Server, and some loads may fail for various reasons. I want each file to be loaded either as a whole or not at all. Currently I keep a list of failed file names and remove their rows at the end (I add a column for the source file name).
Is there a better way to make sure a file is loaded as a whole or not at all?
I need to create a chart with the following features:
1) Bar chart that has data for 3 years (3 series)
2) Line chart that has the same data as the points on the bar chart, but as a running total (3 series)
3) These data points are for the 12 months
4) There should be a secondary axis for the cumulative series