How To Parameterize File Data Flow Sources (and Destinations)
Jul 27, 2006
When I set up a Flat File, Excel, or XML source, I have to specify the complete file name, in particular the folder where the file exists. I would like to specify the location dynamically, via a variable or property -- but how?
I am using SSIS in SQL Server 2005 and want to have a query like this in my data flow task:
SELECT a.* FROM abc AS a INNER JOIN (SELECT MAX(b.id) AS ID FROM xyz AS b INNER JOIN pqr AS c ON b.id = c.id AND b.id > ?) AS t1 ON t1.ID = a.id
SSIS fails to detect the parameter (?) in the inner query and gives this message:
"Parameters cannot be extracted from the SQL command. The provider might not help to parse parameter information from the command. In that case, use the "SQL command from variable" access mode, in which the entire SQL command is stored in a variable."
So, assuming this is your problem, you can work around it. The idea is to parameterize the inner query (so if the above query doesn't make sense, ignore it).
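For reference, a minimal sketch of what the "SQL command from variable" workaround might produce: the whole statement lives in a string variable whose expression splices the former ? value into the inner query (the literal 100 below is only a placeholder, not from the original post).

SELECT a.*
FROM abc AS a
INNER JOIN
(
    SELECT MAX(b.id) AS ID
    FROM xyz AS b
    INNER JOIN pqr AS c
        ON b.id = c.id
       AND b.id > 100   -- placeholder: supplied by the package variable instead of ?
) AS t1
    ON t1.ID = a.id;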
A long long time ago I asked for the ability to define an order in which you insert data into multiple destinations in a single data-flow. This was to get around the problem of loading tables in the same data-flow that have FK relationships between them.
My chosen alternative at the moment is to load the table with the unique key first and drop the data for the table with the FK into a raw file, then load it in a separate data-flow. Another alternative is to disable the FK constraints and re-enable them afterwards, but I've chosen the raw file method.
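For anyone weighing the constraint-disabling alternative, a minimal sketch of what it involves (table and constraint names below are hypothetical): run the first statement before the data-flow, load both tables, then re-enable and re-validate afterwards.

ALTER TABLE dbo.ChildTable NOCHECK CONSTRAINT FK_ChildTable_ParentTable;            -- before the load
-- ... load parent and child rows in the same data-flow ...
ALTER TABLE dbo.ChildTable WITH CHECK CHECK CONSTRAINT FK_ChildTable_ParentTable;   -- after the load, re-validates existing rows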
Now that I'm using SSIS on a real project this is becoming a much bigger problem than I ever envisaged it might be. It's rare that I'm building stuff where I don't have to use raw files for this very reason, and this means that I have god knows how many raw files hanging around all over the place.
I suppose the reason for this post is to flag the importance of this requested feature. I really hope that this makes it into Katmai. Or, even better, into a service pack!
It's a bit of a failing at the moment.
Is there any chance that this will make it into Katmai? It's one of my biggest bugbears about SSIS v1.
I want to know if we can have more than one "OLE DB Destination" within a Data Flow Task. I want to use the same data flow and write to two different tables in a database, with some changes. If we cannot do this within the same data flow, what is the best way to do this?
Is there a way to programmatically set (using expressions or variables) the FastLoadMaxInsertCommitSize property of an OLE DB destination in a data flow for fast load operations? Basically, based on the number of records that are going to be inserted, I want to set the FastLoadMaxInsertCommitSize.
The further I get with my current SSIS package, the more I am starting to wonder about best practices and performance.
My current package loops through CSV files in a specified location and extracts events from these files. Each file contains multiple events which are a mixture of different types, and depending on the event there are a different number of comma-separated values. In the package I first set each event to one column separated by a comma delimiter. I then create an array for the event which is split by the delimiter. In a script I weed out all elements of the array that are common to all events and set the remaining elements to another array. After some processing I come to my conditional split transformation, which splits the processing of each event based on the EventID.
This is where I'm having doubts on whether I have approached the package correctly. There are approximately 60 different events, so each one of these has a separate pipeline to process the remaining parameters in the array and output them to the destination table. The destination table is different for each ID. Is it viable to have this number of conditions and paths when creating the package, and is this likely to have any detrimental effect on performance? Is there possibly another way that I could approach this problem?
What I am trying to do is move data from a staging table into a live environment and then update the staging table AFTER the row has moved (and not errored). There does not seem to be a reliable method for doing this.
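One way to get this behaviour outside the data flow is a single Execute SQL Task that copies and flags rows in one transaction. A minimal sketch, with entirely hypothetical table and column names:

SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- Copy not-yet-moved rows to the live table.
INSERT INTO dbo.LiveTable (BusinessKey, Payload)
SELECT s.BusinessKey, s.Payload
FROM dbo.StagingTable AS s
WHERE s.Moved = 0;

-- Flag only the staged rows that now exist in the live table.
UPDATE s
SET s.Moved = 1
FROM dbo.StagingTable AS s
INNER JOIN dbo.LiveTable AS l
    ON l.BusinessKey = s.BusinessKey;

COMMIT TRANSACTION;
-- If either statement fails, the whole transaction rolls back, so no staged
-- row is flagged for data that never arrived.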
It looks like you can only extend connection managers for data flow sources.
Is there any way to develop a custom connection manager for the SQL destination in the data flow destinations?
I can't use hardcoded connection strings. I can't use configurations because they are not encrypted. I already have managed code that will give the correct connection string. I already have a custom connection manager that I use from the data flow sources. I just need one for data flow destinations, but I can't see a way to extend into the OLE DB destination.
I have around 120 tables. I am using a script component to pull the values from Oracle stored procedures. I do not want to create 120 sources and destinations in my data flow. Please advise how this can be done with one script component and one OLE DB destination.
I noticed I can add multiple outputs to one script component, which makes the script component work like a dataset (a container of different recordsets (tables)), if I am correct. Can this be redirected to a dynamic OLE DB connection?
I am working on a typical data conversion project where we are migrating data from an old data model to a new data model, using SSIS. Both databases are in SQL Server.
Now we have a situation where, say, there are 25 source tables and 20-odd target tables.
For transporting data, we are using OLE DB Source and OLE DB Destination transforms. However, each transform maps to one view or one table. As a result, the Data Flow is really messed up with 45+ transforms in it. Is there an elegant way of doing this? With, say, just one data source or maybe fewer transforms?
I have a dynamic flat file I need to import to a table (in the same format as the file). The problem, I'm realizing, is that dynamic column mappings are a pain with SSIS. I have to know the format of the flat file ahead of time, which I won't.
What are my options here? Can package configurations help with this?
Until there's an Integration Services 2.0, what custom components would you most like to see examples of? The documentation team is starting work on the 2nd Web refresh of Books Online and SQL Server samples, anticipated for release around April, and may be able to incorporate some requests as samples or BOL topics.
I have 5 or more tables to join to get a particular output which has to be sent to a destination table. Among the 5 tables, some are inner joins and some are left outer joins. I am opting for a stored procedure at this point, but I would like to know how this can be done in data flow transformations with multiple sources and merge joins, or any other alternatives. I tried using Merge Join, but it does not accept more than two tables.
I saw this simple post which kick-started my move from stored procedures to SSIS transformations: http://www.mssqltips.com/tip.asp?tip=1322. But I ran into the error "The destination component does not have any available inputs for use in creating a path".
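If the goal is simply to feed a destination with the joined result, one low-friction alternative to chaining Merge Join transforms is to push the whole join into the OLE DB source as a single query (or a view). A rough sketch, with placeholder table and column names rather than the poster's real schema:

SELECT  t1.KeyCol,
        t1.ColA,
        t2.ColB,
        t3.ColC,
        t4.ColD,
        t5.ColE
FROM dbo.Table1 AS t1
INNER JOIN dbo.Table2 AS t2 ON t2.KeyCol = t1.KeyCol        -- inner joins
INNER JOIN dbo.Table3 AS t3 ON t3.KeyCol = t1.KeyCol
LEFT OUTER JOIN dbo.Table4 AS t4 ON t4.KeyCol = t1.KeyCol   -- left outer joins
LEFT OUTER JOIN dbo.Table5 AS t5 ON t5.KeyCol = t1.KeyCol;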
I have a data flow task in which there is an OLE DB source, a derived column item, and an OLE DB destination. My source is a SQL command that returns some values. I have some values that I define in the derived columns, and I set default values under the expression column. My question is, I also have some destination columns in my OLE DB destination that need another SQL command. How would I do that? Can I attach two or more OLE DB sources to one destination? How would I accomplish that? Thanks.
I created an SSIS package to extract data from a flat file source and load it into a table in a database. After I created the package I checked it in to source control (Perforce).
But the problem is that once a month a new flat source file comes and the data should be updated. Once the new flat file arrives, is there any way that the SSIS package can identify the path of the flat file and execute automatically? In the flat file source only the data will change, not the location or data types or anything. Can I use parameters to do that?
In a ForEach loop, I am inserting records into a flat file, which is working fine. But the thing is that as the file grows, it takes longer to locate the EOF (end of file) of the flat file before inserting the records.
I have around 70-100 lines written to the file on each iteration and there are more than 20k records to be looped, which means that at the end I should have 1,400k-2,000k lines in the text file.
One solution would be to insert the records at the start of the file itself so that it does not have to look up the EOF each time before writing.
Another would be to generate separate files and then merge them.
Any idea how this can be done?
Besides this, I have to zip the file and then SFTP it to a given address.
I need to create a report that lists other reports in the SSRS database, so I created an XML data source. But no matter what, I can't seem to parameterize this connection string so that it will always point to whichever report server the report is deployed to. I know that I can change this data source once I've done the deployment; however, I don't want to have to change it whenever I move from SIT to UAT to Production. Is there a way to achieve this?
During my development of an SSIS package I've noticed that when creating two control flows that pull data from separate tables, each going to its own flat file, the second keeps the column-name attributes from the first. So when I create my second flat file, not only does it have the names of its correct columns, it also has the column names of the first flat file.
I hope that I've explained this correctly. I'll provide more info, or I can provide the package code, if anyone would like.
Any suggestions on dealing with a flat file in the format below? I only want to process the data columns in the middle of the file and want to ignore all other rows. This was a very simple task in DTS with a small amount of VBScript in the transformation, but it doesn't seem as straightforward in SSIS. Thanks.
After performing a join operation on two tables I get the result set below:
pid, fname, typename, pname, pcost
1, cad, bars, product-1, 100
2, har, witte, product-2, 120
3, nes, bars, product-3, 119
Now I need to create files from the obtained result set like this:
Column 'fname' is the folder name and 'typename' should be the file name in that particular folder.
For example, the first record should be inserted into the file 'bars.txt' in the folder 'cad', and the third record should be created in the file 'bars.txt' in the folder 'nes'.
I have a table that holds in each record an image (varbinary(max) actually), a text reference for the image and a MIME type for the image. I need to read this table and for each record that has been created since the last run, I need to create a file with the image as the content, the mime type as the file extension and the text reference as the file name. There will be one file created per record found by the data flow source.
I was assuming that I could use the flat file destination and manipulate the file naming using the contents of each record in the flow but am completely stumped on how to achieve this.
In DTS 2000 I had a situation where I had a text file as the input source and a text file as the output. On migrating the package to 2005, it puts a wrapper around it which executes it as a 2000 package, while the rest of the tasks are neatly converted to 2005-style tasks. I take this to mean that this will not be supported through to the next version, and that there is no direct equivalent in 2005.
My question is: how do I import a non-flat file source which has a different number of columns per line? I did, somehow, manage to do this with 2000 but cannot seem to get anywhere with 2005.
The flat file source seems to expect a common number of columns and just can't seem to cope with missing column delimiters on some lines. If anybody knows differently I would be glad to hear about it.
The raw file option is not helpful to me, as it only works with a specific raw type (apparently).
I went on to the Bulk Insert Task but got this message:
[Bulk Insert Task] Error: An error occurred with the following error message: "Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)". The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error. Bulk load: An unexpected end of file was encountered in the data file.".
I have already searched the web for this but only found comments about changing a timeout setting.
I can find timeout settings in Data Flow sources and Data Flow destinations, but not in the Bulk Insert Task.
As you can see, this is a long and protracted question.
If the answer is simple I apologise; if not, blame Microsoft. Other than that, I have found SSIS has some nice improvements, apart from the odd vague error message I keep coming across.
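For context, the Bulk Insert Task essentially issues a BULK INSERT statement along the lines of the sketch below (table name, path and terminators are placeholders, not taken from the post). Because BULK INSERT assumes every row carries the same set of field terminators, a ragged line with missing delimiters can make it read past the end of the data and report the "unexpected end of file" error.

BULK INSERT dbo.TargetTable
FROM 'C:\Data\input.txt'
WITH
(
    FIELDTERMINATOR = ',',    -- every row is expected to contain all fields
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 1
);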
I am facing a problem with SQL 2005 SSIS while uploading a "||"-separated text file with a Data Flow in SSIS. Below is sample data from the text file:
data||data||data
data||data||data
When I uploaded it using SQL 2000 DTS it did not give any error and the data was loaded into the table, but when I try it with SQL 2005 I get the problem below:
"[Flat File Source [1]] Error: Data conversion failed. The data conversion for column "Column 2" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.". "
At our business we are getting a lot of PDF documents that are being hand-keyed into a database. Has anyone heard of or know of an SSIS Data Flow Source component that I could use to read those documents into a data stream and process them?
I need to parameterize some values in the data flow so that I can change the values directly in a parameter file and re-run the data flow with the new value passed in as the parameter. This would make it easy for others who do not know the internals of the data flow task to see where to change the variable/parameter.
How can this be accomplished? I want the data flow task to refer to this file before it starts executing and pick up the appropriate value from the file.
Or is there any better way to accomplish what I want to do here in SSIS?
All, I'm having an issue with the Flat File Data Flow Source returning only a limited set of the rows that are in the flat file. Basically, I connect to the flat file fine, it goes to retrieve the data (a tab-delimited file) and only returns 190 of 392 rows. Is there a limitation on the number of rows this data flow source can retrieve or something? I've looked all through the settings and properties of the task as well as the connection manager and nothing is obvious as to what is causing this. Hopefully someone out there has run into this before and can help me retrieve all rows. Thanks in advance!
I have a Data Flow Task within a ForEach loop container. The source of the flow is an ADO.NET connection and the destination is a Flat File Connection. I loop through a collection of strings in the ForEach loop. Based on the string content, I write some data to the same destination file in each iteration, overwriting the previous version. I am running into the following errors:
[Flat File Destination [38]] Warning: The process cannot access the file because it is being used by another process. [Flat File Destination [38]] Error: Cannot open the datafile "Example.csv". [SSIS.Pipeline] Error: Flat File Destination failed the pre-execute phase and returned error code 0xC020200E.
I know what's happening but I don't know how to fix it. The first time through the ForEach loop, the destination file is updated. The second time is when this error pops up. I think it's because the first iteration is not closing the destination file. How do I force a close of the file within the Data Flow task or through a subsequent Script Task? This works within a SQL 2008 package on one server but not within a SQL 2012 package on a different server.
I have a package that does a simple export from an Excel sheet to a table. I used a Data Flow task with Excel Source and OLE DB Destination components, and I created package configurations for the source and destination components. After that, when I execute the package I get the following error:
Information: 0x40016041 at ProductDetails_Import: The package is attempting to configure from the XML file "D:TEST_ETLLPL_Config2.dtsConfig".
Information: 0x40016041 at ProductDetails_Import: The package is attempting to configure from the XML file "D:TEST_ETLDBCon2.dtsConfig".
Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.
Error: 0xC0202009 at ProductDetails_Import, Connection manager "Excel Connection Manager": SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21.
An OLE DB record is available. Source: "Microsoft OLE DB Service Components" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".
Error: 0xC020801C at Data Flow Task, Excel Source [1]: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.
Error: 0xC0047017 at Data Flow Task, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.
Error: 0xC004700C at Data Flow Task, DTS.Pipeline: One or more component failed validation.
Error: 0xC0024107 at Data Flow Task: There were errors during task validation.
How do I use the ForEach loop container to pass each file found according to a specified pattern to a Flat File Source in a Data Flow Task object, so I can operate on each file found in the ForEach loop instead of having to specify a static file name?
I have a flat file which is loaded into the database on a daily basis. The file contains rows of strings which I load into a table, specifically to a column of length 8000.
The string has a length of 690, but the format is like 'xxxxxx xx xx..' and so on, where 'xxxx' represents data. So there are spaces, etc., present in the middle.
Previously I used SQL 2000 DTS to load the files in, and it was just a Column Transformation with Col001 from the text file loading straight to my table column. After the load, if I select len(col) it gives me 750 for all rows.
Once I started to migrate this to SSIS, I set up the control flow task and specified the flat file source and the OLE DB destination, and gave the output column a type of String and an output column width of 8000. But when I run the data flow task it copies only 181 or 231 characters out of the 750 required. I feel it stops where it finds the SPACES and skips the rest.
I specified row delimiters of CR and LF. I checked the file in UltraEdit and there were no special characters in the file that would cause the problem.
Any suggestions how I can get it to load the full data?
I'm importing a large CSV file two different ways - one with the Bulk Insert Task and the other with a Data Flow Task (flat file source -> OLE DB destination).
With the Bulk Insert Task I'm putting all the CSV rows into one column. With the Data Flow Task I'm mapping each CSV value to its own column in the SQL table.
I used two different flat file sources and got the following: