To Do Update Inside The Data Flow Task Of Package
Mar 10, 2008Hi
Is there any task which helps up to update data inside the data flow task or any method to do the same.....
Hi
Is there any task which helps up to update data inside the data flow task or any method to do the same.....
We are designing an ETL solution for a BI project using SSIS.
We need to load a dimension table from a source DB into the DW; inside the DW there are two tables for each source dimension table: a current dimension table and a storical dimension table.
The source table includes technical columns that indicate if each record was processed correctly by ETL procedures.
Each row in the source table can be a new record, or an exisisting one. If it's a new record, a corresponding new record should be inserted into the current dimension table of the DW; if it's an exisisting one, a new record should be inserted in the storical dimension table and the existing record in the current dimension table should be updated.
Furthermore for each record we need to update the source table with an error code. If the record was processed with succes the code is zero, otherwise it contains an error code.
I try to use a single data flow task, using the 'ole db command trasformation' task for managing update and 'ole db destination' task for managing insert.
The problem is that some insert and update should be executed inside a single transaction, for each record; for example if the source contains a new record I should insert the record in the current dimension table and update the source record with error code zero if every think goes fine.
There is some way to group task in the data flow pane in a single transaction ?
There is a better way to do that ?
Cosimo
Hi,
I have tables like the one below for my Stage and dimension tables:
Stage Table
accountid
name
address
Dimension Table
accountkey ---- surrogate key (DW key)
accountid ---- business key (transaction's primary key)
name
address
I used slowly changing dimension to detect the changes for the records inside my Dimension table. But I had a problem when a new record exists in the stage table. The accountkey is set as the primary key and it gets its value from a different table which stores the last account key that was created. I cannot load all the changes unless i have a business key. Is there a way that i can get the "last key" from a different table in the data flow area and then supply it together with the other fields in the new output branch of the slowly changing dimension?
cherriesh
thanks!
In my control flow, I have a container which contains an Execute SQL Task, and then upon success, a Data Flow Task. The SQL Task truncates my datamart table. In the data flow task, I execute a stored procedure (through a variable) that populates that same datamart table. I can execute the stored procedure's select statement in Management Studio with no problems in about ten seconds. However, in the SSIS package, the SQL task completes successfully, and then it hangs indefinitely on the data flow step. In the Data Flow tab, none of the boxes are even turning yellow. Why won't it complete? When I move the Exec SQL Task to another container, the package executes fine, but it should be in the Load Phase container.
View 1 Replies View RelatedHi Everyone,
In the data flow task, i have done a group by and now i have a single row.... I want to assign the value in this row to a package variable.... Without using the script component .......Any suggestions ??
Regards,
Manu
Hi, this is my first post and I'm relatively new to SSIS so please go easy on me.
Without going into too much detail about it, I've set up a simple SSIS package which does this in a nutshell:
Foreach loop picks up all *.xls files in a given folder
1 - Puts the name of the current spreadsheet into a variable
2 - File System Task copies the current spreadsheet ("abc.xls") to a file called "work.xls"
3 - Data Flow task performs data extraction on "work.xls" and puts it into a SQL server database
4 - File System Task moves "abc.xls" into a "success" folder
Continues with loop - move onto next spreadsheet
This works fine, so long as the spreadsheets all have the same number of columns.
As soon as one of them has a column missing (believe me, this will happen - we're dealing with users here) the package falls over at step 3.
When the package comes across an erroneous spreadsheet, what I'd like to do is move the offending file to a failure folder (making step 4 either a success or failure file move) and carry on with the next one.
I know that you can have an error path (the red line) from any step within the dataflow task, but this doesn't help me because the error lies in the structure of the spreadsheet and not the contents.
I've already come up with a work around whereby each file is moved into the failures folder just after step 2, then moved from the failures folder into the success folder at step 4.
This almost gives me what I want, although of course the package still falls over whenever it encounters a dodgy looking spreadsheet.
Is there any way that I can get the package to do what I'm after?
Many thanks,
Simon
Hi,
I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog:
http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM.
However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
http://i248.photobucket.com/albums/gg168/boston_sql92/1.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/2.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/3.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/4.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/5.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/6.jpg
The same package works fine against a similar test table with 150k rows.
http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table.
Any hints,advice would be appreciated.
All,
Is it possible to passing variable at row level within a data flow? If so, what transformation should use?
Thanks
hi,
I have an aggregate transformation in a dataflow task.
It has only 1 output value.
I'm trying to assign this value to a user variable, but I can't figure out how to do that.
i can hack something silly together - like write the value to the db, and then get it out, but I there has to be an easier way..
Thanks a lot.!
Hopefully this is an easy question:
Inside of a for each loop (looping through an ADO record set of objects to import) I have a data flow task (along with many other processes).... if the dataflow task suceeds I log success in a table. If it errors I want it to fail the dataflow task (which will fire off my Event Handler for that data flow and log the failure, email etc) BUT I want it to continue the loop - I can't seem to figure out how to get the data flow object not to fail the whole loop. If any other objects inside the foreach, other than the data flow, fail I would like the whole loop to fail. Also if possible (but not a requirement) I would like it to have a threshold where if the data flow fails X variable times it will fail the package.
I am having difficulty how to not fail the loop when the import data fails..... just looking for a simple "on error next" type logic for that specific object in the foreach but not the rest. Thanks in advance for the help/advice.
Hi all,
I use a "Exec DTS 2000" task.
I have a variable user::MYVAR in my SSIS package, wich value is "PARENT".
I pass this variable as an outer variable to my dts 2000 package, wich receive it correctly, and then update it to "UPDATED".
After this task finishes, when I come back to SSIS, MYVAR is still at "PARENT".
Is it possible to see the update from the parent package ?
Thanks
Hi,
I am trying to create a simple BI Application for SSIS. In Visual Studio 2005 I just get a Data Flow Task from the toolbar and add it to the project. When I double click it I get the following error:
The task with the name "Data Flow Task" and the creation name "DTS.Pipeline.1" is not registered for use on this computer.
Then when I try to delete it it gives this other error:
Cannot remove the specified item because it was not found in the specified Collection.
I am creating this application in an administrator account in this computer, so I doubt the problem is related to permissions. I am running SQL Server 2005 and Visual Studio 2005 in WinXP Tablet PC Edition.
Any suggestions why this is happening and how to fix it?
I am using SQL 2005 SSIS. I am joining several large tables and then the move result into another table in the same database.
I would like know which method is faster:
Use Execute SQL Task to insert the result set to the target table
Use the Data Flow Task to insert the result set to the target table. (Use OLE DB source to execute SQL command and then use the SQL destination)
Could you tell me why then other is slower?
Thanks.
I have a stored procedure that is executed via a sql script task that returns a full result set. I map this result set to a variable or object type. Is there a way to use this variable as a data source in a subsequent data flow task?
A.
Hi,
I'm trying to get a record count out of a databse using OLE DB Source and row count tasks but keep getting an error. I set up a variable as int32 and select the variable name in the row count task and when I go to the Input Columns tab to select a field to count, it gives me this error:
Error at Data Flow Task[Row Count[505]]: The component "Row Count" (505) has forbidden the requested use of the input column with lineage ID 32.
I don't even know what this means?
thanks,
Hi all,
I have a Send Mail Task in my control flow to notify users that the processing is done. I want to avoid the package to fall in error if the Send Mail task failed.
What is the best practice to do that ?
Should I raise the MaximumErrorCount of theSend Mail Task ? Should I play with ErrorHandler ?
Hi,
How do I retrieve the connections (connection managers) collections from Custom Data Flow destination? ComponentMetadata.RuntimeConnectionCollection is empty. I would like to be able to access all the connections defined in the package from the custom data flow task.
I came across code in which it was possible to access the Connections collection using the IDtsConnectionService for custom task (destination). The custom task has access to serviceProvider, whcih can be used to get access to the IDtsConnectionService interface but not the custom data flow task.
Any help appreciated.
Thanks
Naveen
Hi,
I created a package with SQL 2005. The package gets the Access DB and then inserts it into SQL Server.
If I open the package in .NET, I can see the SQL Task and Data Flow Task. The SQL Task has a property sqlstatementsource, which has the necxessary SQL code to create the tables.
How can I tell the SQL Task to recompile the SQL code if I give it another DB name, because the tables differ and don't map in the Data Flow Task
Thanks
I would like to fetch the data flow component name while package is executing. Since system variable named [System::SourceName] only fetches name of the control flow tasks? Is there a way to capture them?
View 5 Replies View Related
Hi
I have a data flow task. If it completes I should update a flag in the database. So How I can I know if the
data flow task has completed or not.
Thanks
Sai
I have a table which has been loaded from various source feeds. The SourceId relates to the source name and the SourceCompanyId is the sources primary key for the company. I am basically trying to create one row with all the SourceCompanyIds in my column headers. What data flow tasks would be necessary in SSIS?
The structure of the final table is:
CREATE TABLE [dbo].[Company](
[CompanyId] [int] IDENTITY(1,1) NOT NULL,
[CompanyName] [varchar](75),
[CIK] [varchar](10),
[Ticker] [varchar](10),
[Source1CompanyId] [int] NULL,
[Source2CompanyId] [int] NULL,
[Source3CompanyId] [int] NULL,
[Source4CompanyId] [int] NULL,
[Source5CompanyId] [int] NULL,
[Source6CompanyId] [int] NULL,
CONSTRAINT [PK_Company] PRIMARY KEY CLUSTERED
(
[CompanyId] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
=================================
The table in which contains all the company data
CREATE TABLE [dbo].[SourceCompany](
[SourceId] [int] NOT NULL,
[SourceCompanyId] [varchar](10) ,
[SourceCompanyName] [varchar](75),
[CIK] [varchar](10),
[Ticker] [varchar](10),
CONSTRAINT [PK_SourceCompany] PRIMARY KEY CLUSTERED
(
[SourceId] ASC,
[SourceCompanyId] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Hi All,
I want to export data from SQL Server2005 to an Excel spreadsheet thru "Data Flow Task". I am using OLE DB for SQL Server for the source connection and a Connection To Excel as my destination source. The Excel spreadsheet (2003) exists and has the first row with column names. I don't have any warnings before trying to execute.
The SQL datable fileds are
i) ID - Int
ii) RefID
iii) txtRemarks - nvarchar(MAX)
iv) ddlWaterLevel - nvarchar(50)
While executing the tasks, I got the error
Error: 0xC0202025 at Data Flow Task, Excel Destination [427]: Cannot create an OLE DB accessor. Verify that the column metadata is valid.
Error: 0xC004701A at Data Flow Task, DTS.Pipeline: component "Excel Destination" (427) failed the pre-execute phase and returned error code 0xC0202025.
After analysing I found in the DataFlow --> Excel destination --> Advanced Editor for Excel Destination, the default data type for txtRemarks shows as "Unicode string [DT_WSTR]". But this is supposed to be "Unicode text stream [DT_NTEXT]". Even if I change the data type in the design time, It doesn't accept.
Please do help me out.
thanks
Sanra
I need to call a stored procedure to insert data into a table in SQL Server from SSIS data flow task.
I am currently trying to use OLe Db Destination, but I am not sure how to map inputs to OLE DB Destination to my stored procedure insert.
Thanks
Hi all, I am getting the following when trying to import text coloums from execl to SQL server 2005. Any ideas?
Error at Data Flow Task [Destination] Coloums "blar" and "Blar_name" cannot convert between unicode and non-unicode sting types.
Any help would be great.
Thanks
Dave
Dave Dunckley says there is a law for the rich and a law for the poor and a law for
Dirty Davey.
Hi frns,
I am new to SSIS. I need some help in designing the below dataflow task.
-- Teacher creates several tasks and each task is assigned to multiple students
-- The teacher table contains contains all the tasks created a every teacher
use ods
go
create table teacher
(
yr int,
tid int,
tname varchar(20),
taskid int
)
insert into teacher values(2007,101,'suraj','task1')
insert into teacher values(2007,101,'suraj','task2')
insert into teacher values(2007,102,'bharat','task3')
insert into teacher values(2007,103,'paul','task4')
insert into teacher values(2007,103,'paul','task5')
insert into teacher values(2007,103,'paul','task6')
-- Teacher "suraj" has created 2 tasks
-- Teacher "bharat" has created 1 task
select * from ods..teacher
yr tid tname taskid
============================
2007 101 suraj 1111
2007 101 suraj 1122
2007 102 bharat 2222
-- Students table contains studentid(sid),teacherid(i,e tid ) & taskid
drop table students
create table students
(
yr int,
sid varchar(10),
tid int,
taskid varchar(10)
)
truncate table students
insert into students values(2007,'stud1',101,'task1')
insert into students values(2007,'stud1',101,'task2')
insert into students values(2007,'stud2',101,'task1')
insert into students values(2007,'stud2',101,'task2')
--Note : stud1,stud2 comes under teacher with tid "101"
insert into students values(2007,'stud3',102,'task3')
-- Note : stud3 and stud4 comes under teacher with tid "102"
insert into students values(2007,'stud4',103,'task4')
insert into students values(2007,'stud4',103,'task5')
insert into students values(2007,'stud4',103,'task6')
insert into students values(2007,'stud5',103,'task4')
select * from students
yr sid tid taskid
----------------------------
2007 stud1 101 task1
2007 stud1 101 task2
2007 stud2 101 task1
2007 stud2 101 task2
2007 stud3 102 task3
2007 stud4 103 task4
2007 stud4 103 task5
2007 stud4 103 task6
2007 stud5 103 task4
Now in my target table i need to load the data in a such a way that
use targetdb
go
drop table trg
go
create table trg
(
yr int, -- data should load from teacher.yr
tid int,
taskid int(20),
cnt int
)
Mapping in target column and value to be loaded
==================================================
yr -- teacher.yr
tid -- teacher.id
taskid -- this need to start a new sequence of numbers starting from 1 for each teacher and dont want the task id to be copied as it is.
cntofstudents -- need to count no of students from "students" table for a given teacher and for his assignment
For example for teacherid "101" and taskid "task1" there are 2 students
again for the same teacher "101" and taskid "task2" there are 2 students
For teacher "102" and taskid "task3" there is only 1 student
Similary for teacher "103"
Relation
========
Teacher table | Students Table
yr | yr
tid | tid
After i run the ETL the data should look as follows :
insert into trg values(2007,101,1,2)
insert into trg values(2007,101,2,2)
insert into trg values(2007,102,1,1)
insert into trg values(2007,103,1,2) -- task4 is created by teacher "103" and assigned to 2 students stud4 and stud5
insert into trg values(2007,103,2,1) -- task5 is created by teacher "103" and assigned to 1 student i.e stud4
insert into trg values(2007,103,3,1) -- task6 is created by teacher "103" and assigned to 1 student i.e stud5
Note : If u observer the values in 3rd column of the trg table, instead of directly mapping the taskid we need to generate a separate sequence for every teacher.
BottomLine : for each and every task created by each teacher there should be a unique record along with the count of students in "STUDENTS" table
Can anyone help me out in designing the Data Flow task for this Functionality.
Thanks,
Manu
Hi there. I'm trying to learn SSIS, please, help me. I have 2 questions:
1)
There are 2 databases on 2 different servers. I need to get data from Table1(database1) and put it to Table2(database2). But I have to insert rows, which ID is not exists in Table2. How Can I do necessary filter?
2)
In the OLE DB DataSource Component I have used SQL Command(it's simplified):
declare @TmpTable TABLE (WorkCode int not null);
INSERT INTO @TmpTable (WorkCode)
select WorkCode
from Table1
SELECT WorkCode
FROM @TmpTable
SSIS Package works without any exception. But there is no any inserted record in destination table. If I try similar query without temporary table - it works good. Why?
Hi,
I have SQL Server 2005 Express edition on my machine. On an SSIS project in BIDS, when i drag a "Data Flow Task" to the package it returns the following error:
The designer could not be initialized. (Microsoft.DataTransformationServices.Design)
Does this has anything to do with the fact that i don't have SSIS installed on my machine?
I thought that SSIS was only needed (on my machine) for the runtime, just to run the packages. To create and edit the pachages i need to install SSIS on my machine too? this doesn't makes sense, maybe it's another problem.
Can anyone help me on this?
Thank you,
Rafael Augusto
I am having problems with the Data Flow task. It does not even show up in the list of items to drop into the SSIS project.
If I go to the Data Flow tab and hit create, I get the follow error. I have tried repairing and reinstalling, but nothing seems to clear up the error. Without rebuilding my machine, is there anyone who knows how to get the Data Flow Task reinstalled properly?
Thanks
Wayne
TITLE: Microsoft Visual Studio------------------------------Registration information about the Data Flow task could not be retrieved. Confirm that this task is installed properly on the computer. ------------------------------ADDITIONAL INFORMATION:TaskHost "{C3BF9DC1-4715-4694-936F-D3CFDA9E42C5}"' is not installed correctly on this computer. (Microsoft.DataTransformationServices.Design)For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.762&EvtSrc=Microsoft.DataTransformationServices.Design.SR&EvtID=TaskHostNotInstalled&LinkId=20476------------------------------BUTTONS:OK------------------------------
I have a simple data flow task setup...
2 ADO.NET connection managers, each referencing a DSN pointed to a Unidata database.
2 DataReader sources, each using a single ADO.NET connection managers, running a simple SELECT statement from a table.
I have a Union All transform setup to merge the data and write to a OLE DB Destination (SQL05 database)
When I run the package, each source will validate, but only one will execute. The other source will do nothing. The data source will be colored yellow, and will just sit there. The package will just sit, almost like it is waiting for input.
This behavior is not consistent, however. It varies which data source will hang, pretty much 50-50. About 25% of the time, both sources will execute, and all rows will be written to the destination.
Any help is appreciated.
thanks
I have a Windows XP X64 machine with SQL 2005 Developer and VS 2005 Team Edition for Architects on it. For the most part it appears that all normal VS and SQL functions are working properly with the exception of SSIS. If I open the developer studio and drag a Data Flow Task onto the design surface I get the message below. I have tried doing an unistall and reinstall of Integration Services as well as a repair on VS 2005 with no success. I've searched the web and newsgroups and can't find any mention of the problem I'm having. Any help greatly appreciated.
View 18 Replies View RelatedHi All,
I wanna know if we can have more than one "OLEDB Destination" within a Data Flow Task, I want to use the same data flow and write to two different tables in a database with some changes. If we cannot do this within the same data flow what is the best way to do this.
Thanks
How can I debug data flow task. From this article (http://www.databasejournal.com/features/mssql/article.php/3567961) and this video (http://www.jumpstarttv.com/Media.aspx?vid=3) I found that when we are executing the task, the data count will be shown along with color coded task but when I debug the package, no count is shown. I have SQL and source and flat file as destination. There is nothing stored in the destination file but the sql script returns the data.
Thanks.
I was working all day making changes to my 3MB package. I was adding a large number of transforms that were copied-and-pasted from elsewhere in the same data flow task.
All was going well. I even took the time to have SSIS lay out the task again (1/2 hour). Suddenly I started receiving some strange errors:
After the layout, I noticed two stray components 'way off in the upper right corner. I found that one of them had a duplicate name to a component which had been added hours ago. Even after deleting it, I got "duplicate name" errors.
I copied three components in one selection, and when I tried to paste them, got the error "can't initialize component on paste". I tried them one at a time, but got the same error.
I got errors about COM failures due to marshalling to another thread
I then exited Visual Studio and started it again. To my great surprise, the data flow task I was working on was still there, but was completely empty.
Comparing what I'm left with to my last version in source control, I find that the entire pipeline element is missing from the DTS: ObjectData element!
I'm developing a real love/hate relationship with SSIS. It varies from one day to the next. Guess what kind of day this is!