Bulk Insert Vs. Data Flow Task (different Row Results Using Flat File Source)
Nov 2, 2006
I'm importing a large csv file two different ways: one with the Bulk Insert Task and the other with a Data Flow Task (flat file source -> OLE DB destination).
With the Bulk Insert Task I'm putting all the csv rows into one column. With the Data Flow Task I'm mapping each csv value to its own column in the SQL table.
I used two different flat file sources and got different row results from each.
How do I use the Foreach Loop container to pass each file found matching a specified pattern to a Flat File Source in a Data Flow Task, so I can operate on each file found in the loop instead of having to specify a static file name?
The way I understand it, a Data Flow Task inserts the rows from the source into the destination one by one. Is there a way to make it act like a Bulk Insert task? We have been experiencing performance issues when inserting a lot of rows from one table to another. If there's no way to actually do it, can the Bulk Insert task's functionality be scripted? Because what I need is a table-to-table insert, and the Bulk Insert task only accepts data files as sources.
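For what it's worth, here is a minimal T-SQL sketch, with hypothetical table and column names, of the kind of set-based statement an Execute SQL Task could run to get bulk-style behavior for a table-to-table copy:

-- hypothetical source/target names; TABLOCK lets SQL Server use minimal logging
-- for the insert when the recovery model and target table allow it
INSERT INTO dbo.TargetTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM dbo.SourceTable;

Inside a Data Flow, the closest equivalent is to keep the OLE DB Destination in "fast load" mode with a table lock, which sends rows in batches rather than one at a time.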
I have an SSIS Package which is designed to import log files. Basically, it loops through a directory, parses text from the log files, and dumps it to the database. The issue I'm having is not with the package reading the files, but when it attempts to write the information to the db. What I'm seeing is that it will hit a file, read 3000 some lines, convert them (using the Data Conversion component), and then "hang" when it tries to write it to the db.
I've run the SQL Server Profiler, and had originally thought that the issue had to do with the collation. I was seeing every char column with the word "collate" next to it. On the other hand, while looking at the Windows performance monitor, I see that the disk queue is maxed at 100% for about a minute after importing just one log file.
I'm not sure if this is due to the size of the db, and having to update a clustered index, or not.
The machine where this is all taking place has two arrays, both RAID 10. Each array is 600 GB and consists of 8 disks. The SSIS package is being executed locally using BIDS.
All, I'm having an issue with the Flat File Data Flow Source returning only a limited set of the rows that are in the flat file. Basically, I connect to the flat file fine, it goes to retrieve the data (tab-delimited file) and only returns 190 of 392 rows. Is there a limitation on the number of rows this data flow source can retrieve, or something? I've looked all through the settings and properties of the task as well as the connection manager and nothing is obvious as to what is causing this. Hopefully someone out there has run into this before and can help me retrieve all rows. Thanks in advance!
I have a Data Flow Task within a ForEach loop container. The source of the flow is an ADO.NET connection and the destination is a Flat File Connection. I loop through a collection of strings in the ForEach loop. Based on the string content, I write some data to the same destination file in each iteration, overwriting the previous version. I am running into the following errors:
[Flat File Destination [38]] Warning: The process cannot access the file because it is being used by another process.
[Flat File Destination [38]] Error: Cannot open the datafile "Example.csv".
[SSIS.Pipeline] Error: Flat File Destination failed the pre-execute phase and returned error code 0xC020200E.
I know what's happening but I don't know how to fix it. The first time through the ForEach loop, the destination file is updated. The second time is when this error pops up. I think it's because the first iteration is not closing the destination file. How do I force a close of the file within the Data Flow task or through a subsequent Script Task? This works within a SQL 2008 package on one server but not within a SQL 2012 package on a different server.
In a ForEach loop, I am inserting records into a flat file, which is working fine. But the thing is that as the file grows, it takes longer to locate the EOF (end of file) of the flat file in order to insert the records.
I have around 70-100 lines written to the file on each loop and there are more than 20k records to be looped, which means that at the end I should have roughly 1,400k to 2,000k lines in the text file.
One solution would be to insert the records at the start of the file itself so that it does not have to look up the EOF each time before writing.
Another would be to generate separate files and then merge them.
Any idea how this can be done?
Besides this, I have to zip the file and then SFTP it to a given address.
I have a flat file which is loaded into the database on a daily basis. The file contains rows of strings which I load into a table, specifically to a column of length 8000.
The string has a length of 690, but the format is like 'xxxxxx xx xx..' and so on, where 'xxxx' represents data. So there are spaces, etc. present in the middle.
Previously I used SQL 2000 DTS to load the files in, and it was just a Column Transformation with the Col001 from the text file loading straight to my table column. After the load, if I select len(col) it gives me 750 for all rows.
Once I started to migrate this to SSIS, I set up the Data Flow Task and specified the flat file source and the OLE DB destination, and gave the output column a type of String and an output column width of 8000. But when I run the Data Flow Task it copies only 181 or 231 characters out of the 750 required. I suspect it stops where it finds the spaces and skips the rest.
I specified row delimiters of CR and LF. I checked the file under UltraEdit and there were no special characters in the file that would cause the problem.
Any suggestions on how I can get it to load the full data?
I am running a set of SQL statements on a SQL Server, to insert flat file data into a SQL table. The flat file has already been FTP'ed to the SQL Server. I seem to be getting an error, which is possibly pointing to a permissions issue.
The statements:
BULK INSERT [Jedox_prod].[dbo].[B_BP_Customer]
FROM 'c:jedox_dailyjdcom4401.txt'
WITH (
    FIRSTROW = 2,
    MAXERRORS = 0,
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
)
GO
The error is: Msg 4861, Level 16, State 1, Line 1 Cannot bulk load because the file "c:jedox_dailyjdcom4401.txt" could not be opened. Operating system error code 3 (failed to retrieve text for this error. Reason: 1815)
If it is permissions issue, how do I overcome this?
I have a package that does simple exporting from an Excel sheet to a table. I used a Data Flow task with Excel Source and OLE DB Destination components, and I created package configurations for the source and destination components. After that, when I execute the package I get the following error.
Information: 0x40016041 at ProductDetails_Import: The package is attempting to configure from the XML file "D:TEST_ETLLPL_Config2.dtsConfig".
Information: 0x40016041 at ProductDetails_Import: The package is attempting to configure from the XML file "D:TEST_ETLDBCon2.dtsConfig".
Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.
Error: 0xC0202009 at ProductDetails_Import, Connection manager "Excel Connection Manager": SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21.
An OLE DB record is available. Source: "Microsoft OLE DB Service Components" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".
Error: 0xC020801C at Data Flow Task, Excel Source [1]: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.
Error: 0xC0047017 at Data Flow Task, DTS.Pipeline: component "Excel Source" (1) failed validation and returned error code 0xC020801C.
Error: 0xC004700C at Data Flow Task, DTS.Pipeline: One or more component failed validation.
Error: 0xC0024107 at Data Flow Task: There were errors during task validation.
I have a stored procedure that is executed via a SQL script task and returns a full result set. I map this result set to a variable of Object type. Is there a way to use this variable as a data source in a subsequent data flow task?
I have a text file with a single column that I need to bulk insert into a table with 2 columns: an ID (with identity turned on) and col2.
My text file looks like:
row1
row2
row3
...
row10
So my bulk insert looks like this:
BULK INSERT test
FROM 'd:\testBig.txt'
WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
But I get the error:
Server: Msg 4866, Level 17, State 66, Line 1 Bulk Insert fails. Column is too long in the data file for row 1, column 1. Make sure the field terminator and row terminator are specified correctly.
However, as you can see from the text file, there is only one column, so I don't have any field terminators.
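One workaround, sketched here with the names from the post plus a hypothetical view name, is to bulk insert through a view that exposes only the non-identity column, so the single column in the file has exactly one column to map to:

-- hypothetical view over the two-column table test (ID identity, col2)
CREATE VIEW dbo.v_test_load
AS
SELECT col2 FROM dbo.test;
GO

BULK INSERT dbo.v_test_load
FROM 'd:\testBig.txt'
WITH (
    DATAFILETYPE = 'char',
    ROWTERMINATOR = '\n'    -- no FIELDTERMINATOR needed for a single-column file
);

The ID column is then filled in by the identity as each row arrives. A format file that maps the one file field to col2 would achieve the same thing.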
I am having trouble importing a flat file that was extracted from an AS400 server into a SQL 2005 DB table using Bulk Insert. This file contains a column (field) that is of a Packed Decimal data type. Data in all other fields is displayed normally when viewing this file in a text editor such as Notepad or TextPad, but this one field comes up with unknown encoding: squares, thick vertical lines, basically strange characters and no numeric data.
Does anyone have any experience dealing with files of this sort?
Hi, quick question on how SSIS handles queries from a data source in a Data Flow. I noticed that when I run a particular query from Query Analyzer it takes forever. But when I run the same query from an SSIS data source in a data flow, the query results are immediate.
The query plan is already cached in SQL.
Is this just something which I am seeing incorrectly, or is there some bit of optimization in SSIS? As per my understanding, SSIS does not optimize the source query.
I am using SQL 2005 SSIS. I am joining several large tables and then moving the result into another table in the same database.
I would like to know which method is faster:
Use an Execute SQL Task to insert the result set into the target table.
Use a Data Flow Task to insert the result set into the target table (use an OLE DB source to execute the SQL command and then use the SQL Server destination). Could you tell me why one is faster and the other slower?
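To make the comparison concrete, here is a sketch with hypothetical names of what the Execute SQL Task option boils down to. The join and the insert both run inside the database engine, so the rows never pass through the SSIS pipeline buffers, which is usually why this route wins when source and target live in the same database:

INSERT INTO dbo.TargetTable WITH (TABLOCK) (CustomerID, OrderTotal, LoadDate)
SELECT c.CustomerID, SUM(o.Amount), GETDATE()
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID;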
Inside a data flow task, I have an OLE DB source and destination. In my situation, I need to pull data from a table in the source, but also hard-code some columns myself, which means my source is a blend of data from a table and hard-coded data, which will then have to be mapped to columns in the OLE DB destination. Does anyone know which option to choose in the OLE DB source dropdown for the data access mode? Keep in mind, I do need to run a select query, as well as get data from a table. Is it possible to use multiple OLE DB sources and connect them to one destination, because that is really what I intend to do here? I am not sure how it will work, or even if it's possible. Basically my source access mode needs to be a blend of a SQL command and table columns; how would that be implemented? Any help or advice is appreciated.
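One way to get that blend from a single OLE DB Source is to pick the "SQL command" data access mode and supply the hard-coded values as literal columns in the SELECT itself; a sketch with hypothetical names:

SELECT
    t.CustomerID,
    t.OrderDate,
    'WEB'      AS SourceSystem,   -- hard-coded value, same for every row
    GETDATE()  AS LoadedAt        -- derived value computed at extract time
FROM dbo.SourceOrders AS t;

Every column in that result set, table-based or literal, then shows up as a mappable column on the OLE DB Destination. Multiple OLE DB sources feeding one destination is also possible, but their outputs have to be combined first (for example with a Union All transformation) before they reach the destination.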
I am trying to execute a package so that it loads data from the Excel file to the SQL Server database. When I execute it, it prompts the following error message and one warning. The Excel file has three columns: Week, Item and Value.
Error 4 Validation error. Data Flow Task: OLE DB Source [94]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E14. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E37 Description: "ORA-00942: table or view does not exist ". Test - GET NW PERF 1.dtsx 0 0
Warning
Warning 1 Validation warning. Data Flow Task: OLE DB Destination [36]: The external metadata column collection is out of synchronization with the data source columns. The column "DAY" needs to be added to the external metadata column collection. The column "TCH_AVAIL" needs to be added to the external metadata column collection. The column "PDROP" needs to be added to the external metadata column collection. The column "P_HR" needs to be added to the external metadata column collection. The column "SFAIL" needs to be added to the external metadata column collection. The "external metadata column "VALUE" (90)" needs to be removed from the external metadata column collection. The "external metadata column "ITEM" (89)" needs to be removed from the external metadata column collection. Not in use - GET NW STATS.dtsx 0 0
Any suggestions on dealing with a flat file in the format below? I only want to process the data columns in the middle of the file and want to ignore all other rows. This was a very simple task in DTS with a small amount of VBScript in the transformation, but it doesn't seem as straightforward in SSIS. Thanks.
I can't use DTS nor DTSwizard as I need to put it in a .sql and run it through a command line via .bat file (it's more for the users).
Each row ends with an EOL character and the fields are all fixed width, but I have a little problem here: some rows are empty, with just an EOL character.
Can we bulk insert only the desired column from a flat file to a table?
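Since it all has to live in a .sql script driven from a .bat file, one approach is to bulk insert each fixed-width line into a single wide staging column, skip the rows that are just an EOL, and carve out only the column you want with SUBSTRING. A sketch with hypothetical paths, offsets and names:

-- staging table holds each raw line as-is
CREATE TABLE dbo.RawStage (RawLine varchar(8000));

BULK INSERT dbo.RawStage
FROM 'C:\feeds\fixedwidth.txt'               -- hypothetical path
WITH (ROWTERMINATOR = '\n');

-- keep only the desired fixed-width column, ignoring the empty rows
INSERT INTO dbo.TargetTable (DesiredColumn)
SELECT RTRIM(SUBSTRING(RawLine, 21, 10))     -- hypothetical start position and length
FROM dbo.RawStage
WHERE LEN(RTRIM(RawLine)) > 0;

The .bat file can then simply run the script with sqlcmd -i.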
I am using SSIS to bulk insert from a file with more than 200 columns. I am trying to find a way I can bulk insert them into multiple tables through SSIS.
The one way I can think of is to pre-map the columns from the file to the destination tables and build numerous Bulk Insert tasks to achieve that, but I'm not sure if SSIS will let me do that.
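A sketch of the staging route, with hypothetical names: load the whole 200+ column file once into a wide staging table, then fan the columns out to the destination tables with ordinary INSERT ... SELECT statements, so the column-to-table mapping lives in SQL rather than in many separate Bulk Insert tasks:

-- dbo.WideStage is a pre-created staging table with one varchar column per file field
BULK INSERT dbo.WideStage
FROM 'C:\feeds\wide_extract.txt'             -- hypothetical path
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- fan out column subsets to their destination tables
INSERT INTO dbo.Customer (CustomerID, CustomerName)
SELECT Col001, Col002 FROM dbo.WideStage;

INSERT INTO dbo.CustomerAddress (CustomerID, Street, City)
SELECT Col001, Col015, Col016 FROM dbo.WideStage;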
In many DTS packages I have used parameterised queries for incremental loads from Oracle database sources using the Microsoft ODBC Driver for Oracle.
Now I want to migrate these packages to SSIS, but the OLE DB connection for Oracle does not support parameters.
I cannot use the "SQL command from variable" data access mode because of the 4000 character limitation on the length of string variables and expressions.
With my SSIS package I have to read several flat files. These files are stored in different folders on a Unix machine. In my loop task I first have a script that checks whether the file exists. It does, so I set the path to a variable. In the connection manager for the flat file I have set the source for the file to that variable. The Bulk Insert task starts, reads the file and everything is cool. But sometimes the package fails with this message:
[Bulk Insert Task] Error: An error occurred with the following error message: "Cannot bulk load because the file "\Owpu0kas 0 0 5 eceivedw000005.090" could not be opened. Operating system error code 53 (error not found)."
I stop debugging, restart debugging, and what happens? It runs ... maybe for the next one, two or three files, and then the error comes again, but never on the same file. I'm going crazy; what can I do?
I have a table that holds in each record an image (varbinary(max) actually), a text reference for the image and a MIME type for the image. I need to read this table and for each record that has been created since the last run, I need to create a file with the image as the content, the mime type as the file extension and the text reference as the file name. There will be one file created per record found by the data flow source.
I was assuming that I could use the flat file destination and manipulate the file naming using the contents of each record in the flow but am completely stumped on how to achieve this.
I am working at a company and currently I am assigned to a new project for data migration from company X to our company Y using SSIS. I am totally new and I have just completed the 5 tutorials given on the MSDN website.
Basically the client is going to send us a first flat file with 1 million records with Header, Detail and Trailer records. I want to create a package in such a way that it dumps all of this first load into 7 to 8 different tables at a time. We also have to include functionality for validation and error checking. On a successful load, the error file should only return the Header and Trailer but no Detail records. If there are any errors, then the error file should contain the Header, the Detail records which failed to load, plus the Trailer, which we have to send back to the client.
When the 2nd file comes, we have to check whether each record is new or a change (update), depending on a flag which tells us.
This is basically the high-level idea of the package I need to create. If you guys have any questions, let me know.
I know you guys are very experienced. If any of you could give me a detailed idea of how to approach it, I would really appreciate it. I have a very limited timeline for it.
We have a single generic SSIS package that is used to import several hundred iSeries tables into SQL. I am not looking to rewrite the process. But I am looking for ways to improve performance.
I have tried RetainSameConnection, Maximum Insert Commit Size, a table lock (TABLOCK), removing some large columns, and playing with the log file location and size, and now I am working to tweak DefaultBufferMaxRows.
To describe the data flow tasks: there are six data flow tasks (dft) working at the same time. Each dft has its own list of iSeries tables and columns and the corresponding generic SQL table names. Each dft determines its list of tables based on the number of columns to import. So there is dft30 (the iSeries table has 1-30 columns to import), dft60 (the iSeries table has 31-60 columns to import), etc. The destination SQL tables are generically called Staging30, Staging60, etc. Each column in the generic Staging tables is varchar(100). The dfts are comprised of an OLE DB Source and an OLE DB Destination.
The OLE DB Source uses a SQL Command from Variable to build a SELECT statement. The OLE DB Source uses a connection manager that uses an IBM iAccess IBMDA400 provider. The SQL command ends up looking like this for dft30. This specific example is importing from the iSeries table TDACLR, and it only has two columns, so it will be copied to the Staging30 table.
select TCREAS AS C1,TCDESC AS C2,0 AS C3,0 AS C4,0 AS C5,0 AS C6,0 AS C7,0 AS C8,0 AS C9,0 AS C10,0 AS C11,0 AS C12,0 AS C13,0 AS C14,0 AS C15,0 AS C16,0 AS C17,0 AS C18,0 AS C19,0 AS C20,0 AS C21,0 AS C22,0 AS C23,0 AS C24,0 AS C25,0 AS C26,0 AS C27,0 AS C28,0 AS C29,0 AS C30,''TDACLR'' AS T0 from Store01.TDACLR
The OLE DB Source variable value looks like the following, but I am not showing the full 30 columns:
select cast(0 AS varchar(100)) AS C1,cast(0 AS varchar(100)) AS C2,cast(0 AS varchar(100)) AS C3,cast(0 AS varchar(100)) AS C4,cast(0 AS varchar(100)) AS C5, ... cast(0 AS varchar(100)) AS C30.
The OLE DB Destination uses OpenRowSet Using FastLoad From Variable. The insert into Staging30 ends up looking like this.
Of course we then copy and transform the Staging30 data to the SQL table that equals T0.
But back to DefaultBufferMaxRows. Previously the dfts had default values of 10000 for DefaultBufferMaxRows and 10485760 for DefaultBufferSize. I added a SQL task to SUM the iSeries column sizes, TCREAS and TCDESC in this example, and set DefaultBufferMaxRows by dividing the SUM of the columns' max_length into 10485760. But I did not see a performance improvement. Do you think that redefining the columns as varchar(100) for the insert is significant? Should I possibly SUM the actual number of columns (2) as 2x100, or SUM the 30x100?
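For reference, here is that buffer arithmetic worked through under the assumption that what matters is the row as it flows through the pipeline, i.e. the 30 varchar(100) columns produced by the source query rather than the 2 native iSeries columns:

-- assuming each pipeline row is roughly 30 columns x 100 bytes
DECLARE @DefaultBufferSize int = 10485760;   -- the 10 MB default
DECLARE @EstimatedRowBytes int = 30 * 100;   -- 3000 bytes per staging row
SELECT @DefaultBufferSize / @EstimatedRowBytes AS SuggestedDefaultBufferMaxRows;  -- = 3495

Summing only TCREAS and TCDESC would suggest a much larger rows-per-buffer figure, but since the SELECT casts everything to varchar(100), the 30 x 100 estimate is probably the safer one for sizing.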
I have created an SSIS package that takes data in an Excel file and writes it to an Oracle database. The Excel file has 5 columns, but the database table has many additional columns. I would like to default some of these other columns for this job. For instance, I want to set the created and updated times to the time when the job ran, and set some other fields to values that will be consistent for every row in the job.
I have a table that I'm loading as part of a control flow that in turn is copied to a target table by using a data flow task. I am doing this because a different set of fields may be used from the source entry to create the target entry based on a field in the source table. That same field may indicate that multiple entries need to be created in the target table from one source table entry for which I use a multi-cast transformation.
The problem I'm having is that no matter how many entries there are in the source table, the OLE DB Source during execution only shows 7,532 entries being taken from the source table. If there are less than 7,532 entries in the source table, everything processes fine. More than 7,532 and the data flow task only takes 7,532 and then seems to hang. It also seems as though only one path of the multi-cast transformation is taken when the conditional split directs a source entry down that path.
Seems like a strange problem I know, but any insight anyone could provide is appreciated. Thanks.
I am getting inconsistent results when BULK INSERTing data from a tab-delimited text file. As part of my testing, I run the same code on the same file again and again, and I get different results every time! I get this on SQL 2005 and SQL 2012 R2. We have an application that imports data from a spreadsheet. The sheet contains section headers with account numbers and detail rows with transactions by date:
AAAA.1234   (account number)
1/1/2015   $150      First Transaction
1/3/2015   $24.233   Second Transaction
BBBB.5678
1/1/2015   $350      Third Transaction
1/3/2015   $24.233   Fourth Transaction
My import program saves this spreadsheet as tab-delimited text, then I use BULK INSERT to bring the data into a generic table full of varchar(255) fields. There are about 90,000 rows in each day's data; after the BULK INSERT about half of them are removed for various reasons. Next I add a RowID column to the table with the IDENTITY (1,1) property. This gives my raw data unique row numbers.
I then run a routine that converts and copies those records into another holding table that's a copy of the final destination table. That routine parses though the data, assigning the account number in the section header to each detail row. It ends up looking like this:
AAAA.1234   1/1/2015   $150      First Purchase
AAAA.1234   1/3/2015   $24.233   Second Purchase
BBBB.5678   1/1/2015   $350      Third Purchase
BBBB.5678   1/3/2015   $24.233   Fourth Purchase
My technique: I use a cursor to get the starting RowID for each account number, and I then use the upper and lower RowIDs to do an INSERT into the final table. The query looks like this:
SELECT RowID, SUBSTRING(RowHeader, 6,4) + '.UBC1' AS AccountNumber FROM GenericTable WHERE RowHeader LIKE '____.____%'
Results look like this:
But every time I run the routine, I get different numbers! My results are not accurate; I get inconsistent results every time. Here is my code, with table, field and account names changed for business confidentiality. This is a high-profile project at my company.
TRUNCATE TABLE GenericImportTable;
ALTER TABLE GenericImportTable DROP COLUMN RowID;
BULK INSERT GenericImportTable
FROM 'SERVERGeneralAppnameDataFile.2015.05.04.tab.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 6)
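As a side note on the header carry-down step itself, here is a minimal set-based sketch, reusing the changed names and the same SUBSTRING/LIKE logic from the post, that gives each detail row the account number of the nearest preceding header row by RowID without a cursor; it only gives stable results if RowID genuinely reflects the original file order:

SELECT g.RowID,
       (SELECT TOP (1) SUBSTRING(h.RowHeader, 6, 4) + '.UBC1'
        FROM GenericImportTable AS h
        WHERE h.RowID <= g.RowID
          AND h.RowHeader LIKE '____.____%'
        ORDER BY h.RowID DESC) AS AccountNumber
FROM GenericImportTable AS g
WHERE g.RowHeader NOT LIKE '____.____%';   -- detail rows only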
I have an SSIS package doing a bulk insert from a file. Then later on I'm trying to delete that file (in a file delete task), but I'm getting an error: [File System Task] Error: An error occurred with the following error message: "The process cannot access the file 'xyz' because it is being used by another process.". I'm wondering if there isn't some way to 'tweak' the bulk insert syntax so that it doesn't lock the file?
I need to call a stored procedure to insert data into a table in SQL Server from an SSIS data flow task. I am currently trying to use an OLE DB Destination, but I am not sure how to map the inputs of the OLE DB Destination to my stored procedure's insert. Thanks.
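One common pattern is to call the procedure from an OLE DB Command transformation rather than an OLE DB Destination: the transformation runs a parameterized statement once per row, and each ? is mapped to an input column on its column mappings tab. A sketch with a hypothetical procedure:

-- hypothetical stored procedure the data flow would call per row
CREATE PROCEDURE dbo.usp_InsertCustomer
    @CustomerID   int,
    @CustomerName nvarchar(100)
AS
BEGIN
    INSERT INTO dbo.Customer (CustomerID, CustomerName)
    VALUES (@CustomerID, @CustomerName);
END
GO

-- SqlCommand property of the OLE DB Command transformation
EXEC dbo.usp_InsertCustomer ?, ?;

Keep in mind that OLE DB Command executes row by row; for large volumes it is usually faster to land the rows in a staging table with the OLE DB Destination and then call the procedure once afterwards from an Execute SQL Task.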