Little Urgent.....Advice On Logic Flow In SSIS Transformation......!
May 19, 2008
Dear All,
I have a table A with a KEY column and SSN column.
KEY = 12 digits ( first 3 digits are Department Id , and last 9 digits are SSN)
I have a table B with SSN column only.
both KEY and SSN columns are Primary keys so duplicate entries must be Avoided.
Table A is intended to be popluated weekly from TXT file (SSIS package RUN). I want to achieve somethign like this..!
P-Code sample:
for each Row in TXT file
if TXTfile.KEY = TableA.KEY then
skip and Read/Go to next Row in TXT file
else
INSERT TXTfile.KEY into TABLEA.KEY
SSN_Var = EXTRACT the SSN part (SSNpart.READ)
if SSN_VAR.Exists In TableB.AnyRow then
skip
else
Insert into TableB
End If.
End If
End For Loop.
-----------------------------------------------------------------
Using SSIS controls, what will be best flow and logic to achieve this.....?
any sample scripting code ????
I have a sql server automatic job set up containing 2 steps.First step is an Operating System Command (CmdExec) type job whichruns a .bat file which copies across a backup file to this machine.The second step is a Transact -SQL Script (TSQL) type job which runsthe RESTORE DATABASE REPLACE command which restores the backup filecopied across in step 1.Step 1's advanced option says: On Success Action: Goto Step [2]Restore the database.Step 2's advanced option says: On Success Action: Quit the jobreporting success.On clicking ok I receive the error message:"WARNING: The following job step(s) cannot be reached with currentflow logic: [1] Transfer the Backup File. Are you sure this is whatyou want".The job fails to run.Both steps do run successfully when set up as two separate jobs - notlike this which has the two steps in one job.Anyone have any ideas as to what the flow logic problem is?Thiko!
Im just a little unsure of somethingl If i have, for instance, 2 containers with both containing control flow items, and container 1 flows into container 2 (thus ensuring that container 1's items are executed first), how do I set the control flow to NOT execute containers 2 at all if any of the items of container 1 has failed BUT ensuring that container 1's items are ALL executed even though some in container1 might fail.
Thus I want all items in container1 to at least execute but only if all are successful should the control flow execute container 2's items.
I am trying to use a CTE in an OLE DB Command data flow transformation object. However, when I enter the cte and corresponding query in the SqlCommand field of the OLE DB command editor dialog, I get a syntax error. Can CTE's be used data flow objects? I have been able to use them in an Execute SQL Control Flow Item, but not in any data flow item.
I have one data flow control. Source is SQL server and destination is flat file destination. I have one derived column placed in between these two. This functionality works fine. I would like to sum one column data and count total no. of columns and put it in global variable. How can I achieve it?
I'm creating a custom data flow transformation in c#.
I would like to use expressions within this component in the same way as in the derived column component: specifying the expression as a custom property of an output column, then evaluating this expression for each row of the buffer and using this evaluated expression to populate my output column values.
So I've added an custom expression on my output column, and set its expression type to CPET_NOTIFY
But in the ProcessInput method I don't manage to get the evaluated expressions, when I use exp.Value I get my expression definition and not its evaluation.
Is there a way to get these evaluated expressions ?
I've created my own custom data flow transformation task (using C#) that will parse a fullname and output the various name parts. In the ProvideComponentProperties method, I create 5 output columns (prefix, first, middle, last, and suffix). In the ProcessInput method, I parse the input and add the name parts to the buffer. The bad thing is that I€™m making an assumption on the position of the Full Name input column within the buffer.
I would like the €œuser€? to be able to map their "full name" input column to a known Full Name column so I don€™t have to make any assumptions. This is the first SSIS task I€™ve tried to create and I haven€™t been able to find very many examples online.
In Excel they are showing Sale as -ve quantity and purchase as +ve quantity.
The database has quantity always as +ve figure and a separate column "isPurchase" set to true or false depending on whether purchase or sale.
So I need Derived Column to return a bolean (or int) depending whether quantity is positive or negative. I tried each of the following in the Expression but all of them were invalid expressions.
CASE WHEN [TotalQuantity] > 0 THEN 0 ELSE 1 END
If [TotalQuantity] > 0 THEN 1 ELSE 0 END
IIF ([TotalQuantity] > 0, 1,0)
Can anyone help me with correct syntax, or correct Data Flow Transformation if Derived column is wrong.
I created a custom transform that has a custom interface and is a wizard that uses a web service. It creates custom properties and output columns on the fly. I set the dialog result to Ok and close at the end of the steps. The transform then has the custom fields and output columns I created in the wizard. I've verified this by right clicking on the transform and going to the advanced editor. If I then immediately run the package, the custom fields don't exist in the CustomPropertiesCollection. If I close the package and reopen it, the properties now are gone. If I then go through the wizard again, thus recreating the properties, they stay and don't disappear. The quickest way to get a working transform is to add it to my data flow then save, close and reopen the package and then go through the wizard. Just saving after I add the transform does not help.
Does anyone know what might be causing this very strange problem?
Hi, I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog: http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM. However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
The same package works fine against a similar test table with 150k rows. http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table. Any hints,advice would be appreciated.
I need to know how to use my private function - created as a scalar-valued-function in SQL Server 2005 - in script component (here a transformation is used) in a data flow task to transform a two-digit-month into a tree-sign-month:
I have Variable , data source and conditional transformation which checks the count(*) if the count == 0 then I connect an script component and change variable to false(initial it is True) and write into a log file...
Then I check that variable on predence constarint at workflow if variable==True then success. BUT Whenever I run the package my dataflow gets green even the condition does not meet like count==0 . So my variable's value is "False". Actually if the condition doesnt meet then my script shouldnt work. Am I missing something???
I've built a simple custom data flow transformation component following the Hands On Lab (http://www.microsoft.com/downloads/details.aspx?familyid=1C2A7DD2-3EC3-4641-9407-A5A337BEA7D3&displaylang=en) and the Books Online (ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.SQL.v2005.en/dtsref9/html/adc70cc5-f79c-4bb6-8387-f0f2cdfaad11.htm and ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.SQL.v2005.en/dtsref9/html/b694d21f-9919-402d-9192-666c6449b0b7.htm).
All it is supposed to do is create an output column and set its value to the result of calling a web service method (the transformation is synchronous). Everything seems fine, but when I run the data flow task that contains it, it doesn't generate any output. The Visual Studio debugger displays it as yellow, with 1,385 rows going into it, but the data viewer attached to its output is empty. The output metadata looks just like I expect: all of my input columns plus the new column, correctly typed. No validation or run-time warnings or errors are reported.
I'll include the entire C# file below, which only overrrides the ProvideComponentProperties, Validate, PreExecute, ProcessInput, and PostExecute methods of the parent PipelineComponent class.
Since this is effectively a specialization of the DerivedColumn transformation, could I inherit from the class that implements the DC component instead of PipelineComponent? How do I even find out what that class is?
Thanks! Here's the code: using System; // using System.Collections.Generic; // using System.Text;
using Microsoft.SqlServer.Dts.Pipeline; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper;
namespace CustomComponents { [DtsPipelineComponent(DisplayName = "GID", ComponentType = ComponentType.Transform)] public class GidComponent : PipelineComponent { /// /// Column indexes for faster processing. /// private int[] inputColumnBufferIndex; private int outputColumnBufferIndex;
/// /// The GID web service. /// private GID.WS_PDF.PDFProcessService gidService = null;
/// /// Called to initialize/reset the component. /// public override void ProvideComponentProperties() { base.ProvideComponentProperties(); // Remove any existing metadata: base.RemoveAllInputsOutputsAndCustomProperties(); // Create the input and the output: IDTSInput90 input = this.ComponentMetaData.InputCollection.New(); input.Name = "Input"; IDTSOutput90 output = this.ComponentMetaData.OutputCollection.New(); output.Name = "Output"; // The output is synchronous with the input: output.SynchronousInputID = input.ID; // Create the GID output column (16-character Unicode string): IDTSOutputColumn90 outputColumn = output.OutputColumnCollection.New(); outputColumn.Name = "GID"; outputColumn.SetDataTypeProperties(Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType.DT_WSTR, 16, 0, 0, 0); }
/// /// Only 1 input and 1 output with 1 column is supported. /// /// public override DTSValidationStatus Validate() { bool cancel = false; DTSValidationStatus status = base.Validate(); if (status == DTSValidationStatus.VS_ISVALID) { // The input and output are created above and should be exactly as specified // (unless someone manually edited the persisted XML): if (ComponentMetaData.InputCollection.Count != 1) { this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component accepts 1 Input.", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } else if (ComponentMetaData.OutputCollection.Count != 1) { this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component provides 1 Output.", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } else if (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1) { this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component Output must be 1 column.", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } // And the output column should be a Unicode string: else if ((ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].DataType != DataType.DT_WSTR) || (ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].Length != 16)) { ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component Output column data type must be (DT_WSTR, 16).", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISBROKEN; } } return status; }
/// /// Called before executing, to cache the buffer column indexes. /// public override void PreExecute() { base.PreExecute(); // Get the index of each input column in the buffer: IDTSInput90 input = ComponentMetaData.InputCollection[0]; inputColumnBufferIndex = new int[input.InputColumnCollection.Count]; for (int col = 0; col < input.InputColumnCollection.Count; col++) { inputColumnBufferIndex[col] = BufferManager.FindColumnByLineageID(input.Buffer, input.InputColumnCollection[col].LineageID); } // Get the index of the output column in the buffer: IDTSOutput90 output = ComponentMetaData.OutputCollection[0]; outputColumnBufferIndex = BufferManager.FindColumnByLineageID(input.Buffer, output.OutputColumnCollection[0].LineageID); // Get the GID web service: gidService = new GID.WS_PDF.PDFProcessService(); }
/// /// Called to process the buffer: /// Get a new GID and save it in the output column. /// /// /// public override void ProcessInput(int inputID, PipelineBuffer buffer) { if (! buffer.EndOfRowset) { try { while (buffer.NextRow()) { // Set the output column value to a new GID: buffer.SetString(outputColumnBufferIndex, gidService.getGID()); } } catch (System.Exception ex) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, ex.Message, string.Empty, 0, out cancel); throw new Exception("Could not process input buffer."); } } }
/// /// Called after executing, to clean up. /// public override void PostExecute() { base.PostExecute(); // Resign from the GID service: gidService = null; } } }
The logic I am trying to recreate via SSIS is the following SQL statement:
insert into db3.dbo.targettable1 -- Target database table (SiteC, Objecte, Attrib1)
select distinct ?, ?, from ? -- Source database table join dbo.targettable2 c1 -- Target database table on c1.Alias = ? and c1.CSetID = ? and c1.FacID = (select f.PFacID from dbo.Fac f where f.FacID = ?) where not exists (select * from dbo.targettable2 c -- Target database table where c.Alias = ? and c.FacID = ? and c.CSetID = ?)
I have an OLE DB Source that consists of an expression to approximate the following portion of the Above Select statement:
Select ?, from ? -- Source database table and
The package has 2 global variables User:CSetID and User::FacID whose scope is global to the package and whose values are set within a Foreach Loop Container outside of the Data Flow Task
I was trying to reference the 2 global variables within the Looup Transformation to recreate the following portion of the SQL statement.but encounter errors:
join dbo.targettable2 c1 -- Target database table on c1.Alias = ? and c1.CSetID = ? and c1.FacID = (select f.PFacID from dbo.Fac f where f.FacID = ?)
In the Advanced Editor window of Lookup Transaction
select * from (select * from [dbo].[targettable2 ]) as refTable where [refTable].[Alias] = ? and [refTable].[FacID] = ? and [refTable].[CSetID] = ?
Is there away to reference global variables in a Lookup Transformation that are set outside a Data Task Flow?
I have developed some packages to load data into "Fact" tables in the data warehouse. Some packages are OK, other ones not. What is the problem?: some packages load fact tables with lots of "Lookup - Data Flow Transformation" into the "data flow task" (lookup against dimension tables) but they are very very slow, too much slow to be choosen as a solution.
Do you have any other solutions to avoid using "Lookup - Data Flow Transformation"? Any other solution (SSIS, TSQL and so on....) is welcome to speed up the Fact table loading process.
I'm having trouble with a Script Component in a data flow task. I have code that does a SqlCommand.ExecuteReader() call that throws an 'Object reference not set to an instance of an object' error. Thing is, the SqlCommand.ExecuteReader() call is already inside a Try..Catch block. Essentially I have two questions regarding this error:
a) Why doesn't my Catch block catch the exception? b) I've made sure that my SqlCommand object and the SqlConnection property that it uses are properly instantiated, and the query is correct. Any ideas on why it is throwing that exception?
Hi, Do you have any suggestions of how to transform the below code in SSIS? I guess we should use foreach loop container...... also i would appreciate if you can post some related articles.
I need to parameterize some values in the data flow so that i can chnage the values directly in parameter file and re run the data flow for new value in the passed in the parameter. This can be easy for other who do not know about the flow of data flow task as to where to change the variable/parameter.
How can this be accomplished. I want the data flow task to refer to this file before it starts executing and pick the appropriate value from the file.
Or is their any better way to accompalish what i want to do here in SSIS.???
Hi all, I am new to SSIS...and wanted to know where can i get a lot of info(from web/book) on how to deal with ActiveX Scripttasks & Script tasks in SSIS...the place i am working has a lot of VB Scripting done in DTS Packages...having a hard time in understanding the Scripts, as i am more like back-end guy and wanted to learn a lot in SSIS, once i understand the scripts it will help me a great deal as to how to approach the tasks...Is there any website which teaches how to avoid Scripting in SSIS as i read somewhere that Scripting should be avoided as much as possible by making using of so many tasks in the SSIS tool.
I will look forward for someone to help me out and show me a way.
I have 2 flat files to load into a datamart via SSIS. Need to implement:- 1. How can I prevent loading of same file again? 2. If by chance wrong data has been loaded how can I rollback? Kindlt guide asap as I have to implement these. JigJan
I have begun using SSIS and I am a little taken aback by the complexity of it especially since I just want to do a simple data transformation such as in DTS. Are there any tutorials for data transformation for SSIS on the web/this forum and what if I want to do a simple transformation from Access to SQL Server?
Hey all - got a problem that seems like it would be simple (and probably is : )
I'm importing a csv file into a SQL 2005 table and would like to add 2 columns that exist in the table but not in the csv file. I need these 2 columns to contain the current month and year (columns are named CM and CY respectively). How do I go about adding this data to each row during the transformation? A derived column task? Script task? None of these seem to be able to do this for me.
Here's a portion of the transformation script I was using to accomplish this when we were using SQL 2000 DTS jobs:
' Copy each source column to the destination column Function Main() DTSDestination("CM") = Month(Now) DTSDestination("CY") = Year(Now) DTSDestination("Comments") = DTSSource("Col031") DTSDestination("Manufacturer") = DTSSource("Col030") DTSDestination("Model") = DTSSource("Col029") DTSDestination("Last Check-in Date") = DTSSource("Col028") Main = DTSTransformStat_OK End Function *********************************************************** Hopefully this question isnt answered somewhere else, but I did a quick search and came up with nothing. I've actually tried to utilize the script component and the "Row" object, but the only properties I'm given with that are the ones from the source data.
My issue is the inner join transformation in SSIS. See i ll explain my problem clearly now.....
Actually i m just checkin if the inner join performed in business intelligence studio usin the inner join transformation and the inner join performed in the management studio using queries are same. Logically both the resultset should match isn't but in my case it is not so. It is very important for me to figure out where the problem is because i m goin to use lotsa inner join transformations in my current project.
I ll appreciate if someone can help me to figure out this problem. May be you can also tell me the detailed steps in adding the inner join transformation and also how it works.
I have a SSIS Package which in brief moves data from one SQL data store to another. On my local machine, the package executes fines, and does what it is supposed to do, by moving data from Point A to Point B. I have both Point A(DB) and Point B(DB) on my local machine. But when I deploy this package over to ASSEMBLY which is basically another environment, I get weird intermittent errors. What should I do? Can someone give me some tips on how to do good error logging in SSIS? I am currently writing to Windows Event Log, and I seem to look at some errors that are coming thru. What else can I do on SSIS Package side, to improve error reporting and handling, so troubleshooting is faster and more effective. I just need some stable advice on how to setup my package to do error logging where troubleshooting issues in different environments is more effective. Also I need to know, what could be causing these issues? Thanks.
I have a flatfile source to which different flatfiles will be passed as input,this is connected to an OLEDB destination which changes along with the sourcefile. But when the new file is given as input, the OLEDB mappings are not getting refreshed.It is showing an error.
Actually this was implemented in DTS, and they have used an activex script for the transformation. what shd I use in SSIS?
Hi I have migrated a DTS that had some activeX transformation tasks within data pump flow tasks.
Those parts were migrated as "DTS 2000 tasks" .. so activeX transformation tasks aren't possible in SSIS ? I know ActiveX script tasks are but for transformations ?
1. IF i leave these Encapsulated DTS 2000 tasks in the migrated SSIS package, will it run independently of the original DTS or does it need the old DTS running to "call" that part from ? (I hope im making sense here) is it possible to load this functionality internally into the new SSIS ?
2. How could I (if i can't do ActiveX transformation tasks) achieve this is SSIS ? can I achive this using the script tasks in SSIS ?
Can someone help me out in providing the STEPS to solve this problem. My scneario is, I've a table which has got 2 fields and 5 default row values have been filled in. Now, using the above, duirng package runtime, it need to dynamically create additional field and has to store values like for.e.g (0001 America). I'm getting the following error while executing the ssis package.
1. [DTS.Pipeline] Warning: Component "Derived Column" (1170) has been removed from the Data Flow task because its output is not used and its inputs have no side effects. If the component is required, then the HasSideEffects property on at least one of its inputs should be set to true, or its output should be connected to something. 2. [DTS.Pipeline] Warning: Source "OLE DB Source Output" (87) will not be read because none of its data ever becomes visible outside the Data Flow Task.
Please suggest with your valuable solution at the earliest.
Is there way to rename parameters Param_0, Param_1 in OLEDBCommand transformation? I am trying to create table driven packages using BIML. I am using OLEDBCommand Transformation to update rows. But since, I will not be sure of how many parameters and order of the parameters, I was planning to rename the parameter programmatically, so that accordingly I can build the update statement and add filter condition.