Creating Hierarchical Flat File From Multiple Record Types
May 1, 2008
I'm using SSIS to import seven flat files (each containing a different record type) into a staging database. This part was easy.
Now I need to export the records from all seven tables into a single flat file structured in a nested hierarchy using common keys. (This format is required by the vendor for loading data into a new system).
I could use some ideas on the data transformations needed to combine all seven record types into a hierarchical record set which can then be written to my Flat File Destination. I'm currently looking at an article on SQLIS.com ("Handling Different Row Types In The Same File") which seems close to what I need, but it covers importing (ref: www.sqlis.com/54.aspx ). I'm not sure if I should just reverse this for export or use something different. Any comments are appreciated.
The record types B1 through E2 form a complete set. Each set has its own unique child-set key. There may be one or more sets for each typeA record (although it's possible that typeE records don't exist in the most recent set).
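One pattern that often comes up for this kind of export is to render each record type as a single pre-formatted line, UNION ALL the seven types together with a composite sort key, and write the ordered result to a one-column flat file. A minimal sketch in C# follows; the staging table names, the SetKey/TypeSeq columns, and the RecordLine rendering are all assumptions, not the vendor's spec.

```csharp
// Sketch only: assumes each staging table exposes a SetKey column and a
// RecordLine column holding the row already formatted to the vendor layout.
using System.Data.SqlClient;
using System.IO;

class HierarchyExporter
{
    static void Main()
    {
        // TypeSeq forces the B1..E2 order inside each set;
        // SetKey keeps each set's records together.
        const string query = @"
            SELECT SetKey, 0 AS TypeSeq, RecordLine FROM stg.TypeA
            UNION ALL SELECT SetKey, 1, RecordLine FROM stg.TypeB1
            UNION ALL SELECT SetKey, 2, RecordLine FROM stg.TypeB2
            -- ...one SELECT per remaining record type...
            UNION ALL SELECT SetKey, 6, RecordLine FROM stg.TypeE2
            ORDER BY SetKey, TypeSeq;";

        using (var conn = new SqlConnection("Server=.;Database=Staging;Integrated Security=true"))
        using (var writer = new StreamWriter(@"C:\out\vendor_file.txt"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(query, conn))
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    writer.WriteLine(reader.GetString(2)); // already-formatted line
        }
    }
}
```

The same union-with-sort-key idea works entirely inside a data flow (Derived Column to build the line, Union All, Sort, one-column Flat File Destination).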
I've just started looking at SSIS and have encountered what should hopefully be a simple problem to solve. I have a pipe-separated source file that looks like this (I've added line numbers for simplicity):
In addition to header and footer records, this file contains three record types for each person.
Record types are identified by the second column.
Each record type has a different number of columns:
Type 100 has 5 columns, Type 200 has 4 columns, and Type 305 has 12 columns.
The row delimiter for all records is the {CR}{LF} character.
I've set up a flat file input source and specified {CR}{LF} as the row delimiter for both header and data rows and the "|" character as the field delimiter.
It appears that SSIS is assuming that because the first data row has 5 columns, every row must fit that format too. So the {CR}{LF} character that separates lines 02 and 03 is interpreted as text rather than a separation character, and all remaining | field separators after 305 are interpreted as text contained in the fifth column. SSIS is also complaining that the last row is incomplete.
A bit like this (I've used tildes to indicate column separation):
I've seen one other reference to this behaviour, but the response seemed to be that SSIS doesn't know which columns are missing. In this scenario we don't have missing columns; rather, we have different types of record in a single file. In DTS I would effectively parse the file once for each record type.
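The usual workaround is the reverse of that parse-per-type approach: read each row as a single column and split it yourself on the record-type field. A rough sketch of the parse loop in plain C#; the file path and field positions are assumptions:

```csharp
// Sketch: treat every line as one string, then branch on the record type
// found in the second pipe-delimited field.
using System;
using System.IO;

class RaggedFileParser
{
    static void Main()
    {
        foreach (string line in File.ReadLines(@"C:\in\people.txt"))
        {
            string[] fields = line.Split('|');
            if (fields.Length < 2) continue;          // header/footer guard

            switch (fields[1])                        // record type is column 2
            {
                case "100": Handle100(fields); break; // expects 5 fields
                case "200": Handle200(fields); break; // expects 4 fields
                case "305": Handle305(fields); break; // expects 12 fields
                default: break;                       // header, footer, unknown
            }
        }
    }

    static void Handle100(string[] f) => Console.WriteLine("100: " + string.Join(",", f));
    static void Handle200(string[] f) => Console.WriteLine("200: " + string.Join(",", f));
    static void Handle305(string[] f) => Console.WriteLine("305: " + string.Join(",", f));
}
```

Inside SSIS, the same logic becomes a Script Component with the whole line as its single input column and one output per record type.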
I have a text file to import where there are three record types: a header which has info about who sent the file and begins with 'H', detail records that begin with 'D', and a trailer record that begins with 'T' and just has the record count following that. The fields are delimited by '*'. H, D and T records each contain a different number of fields. I suspect that what I should do is split this file into three separate files. I tried to do this with SSIS but ran into problems. If I make the output a file destination, it won't let me use that output as input for the next process. There are no arrows I can grab onto to link to the next transform.
This is my first SSIS package although I made hundreds of DTS packages a few years ago. I can't figure this out in DTS either.
This sounds like it should be an EASY thing to do.
Each 01 record type has the records after it associated with it until the next 01 appears, so TestStuff would have TestStuff 2,3 related to it, while TestStuff 4,5 belong together. In the example, the 888 in the 01 record is the key to the group, but it does not appear in the following lines.
The problem is that each record type has different line formats, columns, etc., so they must be parsed differently. I have created a conditional branch on the first two characters and written each record type out to a separate flat file for that type, so that they can be imported again and parsed with the Flat File Source, but I am unsure how to relate them again. I tried appending the 888 to the other lines before they were written out, but I can't find a way to share the variable across the conditional split branches using a script component.
Does anyone have an idea how I could parse these files and keep the relationship intact?
Is there a way to tell the flat file wizard to use a different map based on certain characters?
Is there a way to share a variable across the different branches of a conditional split?
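On the last question: rather than sharing a variable across split branches, one common fix is a single sequential pass that remembers the last-seen 01 key and stamps it onto every following line, so each record carries its group key before any split happens. A hedged sketch, with the paths and the position of the 888 value assumed:

```csharp
// Sketch: one pass over the file, carrying the current group key forward.
using System.IO;

class GroupKeyStamper
{
    static void Main()
    {
        string currentKey = "";
        using (var writer = new StreamWriter(@"C:\out\stamped.txt"))
        {
            foreach (string line in File.ReadLines(@"C:\in\source.txt"))
            {
                if (line.StartsWith("01"))
                {
                    var parts = line.Split(',');
                    if (parts.Length > 1)
                        currentKey = parts[1];   // assumed position of the 888 key
                }

                // every record now carries its group key explicitly
                writer.WriteLine(currentKey + "," + line);
            }
        }
    }
}
```

After this pre-pass, the conditional split and per-type parses work as before, but every branch already has the key it needs to relate the records again.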
I have a table that holds in each record an image (varbinary(max) actually), a text reference for the image and a MIME type for the image. I need to read this table and for each record that has been created since the last run, I need to create a file with the image as the content, the mime type as the file extension and the text reference as the file name. There will be one file created per record found by the data flow source.
I was assuming that I could use the flat file destination and manipulate the file naming using the contents of each record in the flow but am completely stumped on how to achieve this.
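The Flat File Destination can't vary its file name per record, so this usually ends up as a Script Component (or Script Task) doing the writes itself. A minimal sketch, with hypothetical table and column names and a placeholder for the "last run" filter:

```csharp
// Sketch: one file per row, named from the text reference, extension from
// the MIME subtype ("image/png" -> ".png").
using System;
using System.Data.SqlClient;
using System.IO;

class ImageExporter
{
    static void Main()
    {
        const string sql =
            "SELECT TextRef, MimeType, ImageData FROM dbo.Images WHERE CreatedAt > @lastRun";

        using (var conn = new SqlConnection("Server=.;Database=Media;Integrated Security=true"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@lastRun", DateTime.Today.AddDays(-1)); // placeholder
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    string name = reader.GetString(0);
                    string ext  = reader.GetString(1).Split('/')[1]; // assumes "type/subtype"
                    byte[] body = (byte[])reader["ImageData"];
                    File.WriteAllBytes(Path.Combine(@"C:\out", name + "." + ext), body);
                }
            }
        }
    }
}
```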
I've seen several posts about reading and writing files that have different record types with varying column metadata. My particular file has 11 record types plus several header types and looks something like:
<Header1>
<Header2>
<Detail01-#1>
<Subdetail02>
<Subdetail03>
...
<Detail01-#2>
<Subdetail02>
<Subdetail03>
...
...
Since I need to get different detail and subdetail records, I can't really use the technique of 3 destination file connection managers found in http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=87269&SiteID=1
I've tried using an Execute SQL task to get the main detail records and then a Foreach ADO enumerator that would get the subdetails, but it all seems so kludgy. I'm starting to think that I should just write the bulk of the file creation code in a C# app instead of trying to smush this into SSIS. Opinions? Am I missing some trick in SSIS?
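For comparison, here is roughly what the C# route could look like: pull details and subdetails once each, relate them in a DataSet, and stream the hierarchy out in order. Query text, key names, and the pre-formatted DetailLine/SubLine columns are placeholders:

```csharp
// Sketch: DataRelation lets each detail row find its subdetails without
// one query per parent.
using System.Data;
using System.Data.SqlClient;
using System.IO;

class DetailFileWriter
{
    static void Main()
    {
        var ds = new DataSet();
        using (var conn = new SqlConnection("Server=.;Database=Stage;Integrated Security=true"))
        {
            new SqlDataAdapter("SELECT * FROM dbo.Detail01 ORDER BY DetailKey", conn)
                .Fill(ds, "Detail");
            new SqlDataAdapter("SELECT * FROM dbo.Subdetail ORDER BY DetailKey, SubSeq", conn)
                .Fill(ds, "Sub");
        }
        ds.Relations.Add("kids",
            ds.Tables["Detail"].Columns["DetailKey"],
            ds.Tables["Sub"].Columns["DetailKey"]);

        using (var writer = new StreamWriter(@"C:\out\extract.txt"))
        {
            writer.WriteLine("<Header1>");                  // render real headers here
            foreach (DataRow d in ds.Tables["Detail"].Rows)
            {
                writer.WriteLine(d["DetailLine"]);          // pre-formatted detail text
                foreach (DataRow s in d.GetChildRows("kids"))
                    writer.WriteLine(s["SubLine"]);         // pre-formatted subdetail text
            }
        }
    }
}
```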
We have a flat file format generated from a vendor. It contains a "mainframe" view of the data with a header record, batch header record, detail records, batch trailer record and trailer record. It arrives as a .dat file. What is the best approach to extract the necessary columns out of this file to populate the corresponding SQL Server tables and rows?
There is a small problem I encountered while creating a package in SQL Server 2005. I am using a flat file which has 820 rows and 2 columns, separated by a line feed (for rows) and a tab (for columns). After importing, I found that only 800 rows were imported into the table. After verifying the input file, I found that there are some null values in the second column, so there is no line feed for those values. Can anyone please help me with how to give multiple delimiters for the same input flat file?
Hi, we have XML files (and their XSD) that are a dump of our tree-like data structure in memory. These XML files may potentially have unlimited levels of nesting, because some elements can contain themselves (they are recursively defined). We want to build some reports using a third-party reporting tool (Crystal Reports, for example).
The first question is: is there any way for Crystal Reports to process such complicated (some elements have recursive definitions) XML files directly? If not, does anyone know of another reporting tool that can do the job?
The second question is: if we have to develop a small application that converts the XML to a dataset, can anyone who has done a similar thing before give us some recommendations as to which approach is the easiest to take (Java, .NET, etc.)? I'd appreciate it if someone can point me to some resources to get me started.
Another approach is to convert the hierarchical XML into a relational table set and store the data in a relational database. In that case, the reporting tool can simply read from the database. I'd appreciate it if someone can let me know if there is any software/program that can do the conversion. (I heard that some database servers can do the conversion from XML to relational tables, but I didn't find anything concrete.)
Any comment/recommendation is appreciated! Thank you for your help in advance!
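On the last point, a small recursive walk is usually enough to shred such XML into a parent/child (adjacency list) table that any reporting tool can read. A sketch in C# with generic column handling; the real XSD would drive the actual mapping:

```csharp
// Sketch: flatten arbitrarily nested XML into (id, parent_id, name, value)
// rows; recursion handles the unlimited nesting.
using System;
using System.Xml.Linq;

class XmlFlattener
{
    static int _nextId;

    static void Main()
    {
        XDocument doc = XDocument.Load(@"C:\in\tree.xml");
        Flatten(doc.Root, parentId: null);
    }

    static void Flatten(XElement element, int? parentId)
    {
        int id = ++_nextId;
        string value  = element.HasElements ? "" : element.Value;
        string parent = parentId.HasValue ? parentId.Value.ToString() : "NULL";

        // In real code this row would be inserted into a Node table.
        Console.WriteLine(id + "\t" + parent + "\t" + element.Name.LocalName + "\t" + value);

        foreach (XElement child in element.Elements())
            Flatten(child, id);
    }
}
```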
I am attempting to create a multi-record file (as described in my last thread) and have found the following set of instructions very helpful: http://vsteamsystemcentral.com/cs21/blogs/steve_fibich/archive/2007/09/25/multi-record-formated-flat-file-with-ssis.aspx
I have been able to create a sample file with two of my record types.
I now need to build on this further, because I have 9 record types in total that need to be extracted to a single flat file.
Does anyone have any ideas how I might extend the example above to include more record types, or know of another means of achieving this?
Thanks in advance for any help you might be able to provide.
Hi, I have a flat file that contains 2 types of records - dev and production. The dev records will be noted with a 'D' and the production ones with a 'P'. These records are different - the dev records are in a different order and contain different info than the production ones. I need to use SSIS to import the data into 2 different SQL tables. How do I do this? Can anyone help me? Thanks in advance.
I have a problem with a "Flat File" connection which I cannot understand at present. Here is the issue: I've got an ASCII file containing 233898 lines. I try to read this file in two different packages using two different connections. In the first connection I used the default data type - DT_STR of length 50; in the second one I used the "Suggest Types..." feature (with 2000 samples) to detect types which better match the reality. And when I try to load my data, the first connection reads exactly 233898 lines from the file, the second one 203898. Somehow it leaves 30000 lines unloaded.
I tried to observe the error output for the second connection - everything goes smoothly and without problems. But somehow those 30000 lines are missed.
Has anybody experienced such a situation? Is the issue known?
This question is around how we can get the data types and lengths populated into the flat file source columns.
In Connection Manager, you have your flat file defined. You can choose "Suggest Types...", and the minimum lengths and correct data types will be returned from within the data in the flat file.
Is there some way to automate this data type definition, but coming from the other direction (coming from the destination table that we are loading)?
For example, you have mapped the columns that will be loaded. Can you then reverse engineer the data types and lengths for the columns in the flat file from the destination table?
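The type information itself is easy to pull back out of the destination table; applying it to the flat file connection still has to be done by hand or by editing the package programmatically. A sketch of the lookup, with a placeholder table name:

```csharp
// Sketch: list each destination column's name, type, and length so the
// flat file columns can be defined to match.
using System;
using System.Data.SqlClient;

class ColumnInspector
{
    static void Main()
    {
        const string sql = @"
            SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
            FROM INFORMATION_SCHEMA.COLUMNS
            WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'MyDestination'
            ORDER BY ORDINAL_POSITION;";

        using (var conn = new SqlConnection("Server=.;Database=Target;Integrated Security=true"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0} {1}({2})", reader[0], reader[1], reader[2]);
        }
    }
}
```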
I have a simple requirement... but don't know how to proceed. I need to move the first record from a flat file (file1) to another flat file (file2) using SSIS; file1 will have many records, but I just need the first one alone to be moved. Any pointers on this would be of much help.
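This may be simplest as a tiny Script Task rather than a data flow; a sketch with placeholder paths:

```csharp
// Sketch: copy only the first line of file1 into file2.
using System.IO;

class FirstLineCopier
{
    static void Main()
    {
        using (var reader = new StreamReader(@"C:\in\file1.txt"))
            File.WriteAllText(@"C:\out\file2.txt", reader.ReadLine() + "\r\n");
    }
}
```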
I'm rather new to SSIS and I've been reading and testing, but didn't find a solution for this problem. Suppose I've got a table Customer with some fields. One of the fields is CustID. I want to create as many flat files as there are customers in the table, with each flat file name set to the CustID. If you could point me in a good direction, it would be nice. Greetings from Belgium.
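One SSIS-native route is a Foreach ADO enumerator over the customer list, with an expression on the flat file connection string built from a CustID variable. The same idea as a standalone sketch, with the column list and connection string assumed:

```csharp
// Sketch: one file per customer, named from CustID.
using System.Data.SqlClient;
using System.IO;

class CustomerFileSplitter
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=CRM;Integrated Security=true"))
        using (var cmd = new SqlCommand("SELECT CustID, Name, City FROM dbo.Customer", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    string path = Path.Combine(@"C:\out", reader["CustID"] + ".txt");
                    File.WriteAllText(path,
                        reader["CustID"] + "|" + reader["Name"] + "|" + reader["City"]);
                }
            }
        }
    }
}
```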
Okay, can I assign line x of a flat file to a variable, parse that line, do my data transformations, and then move on to line x+1?
Any thoughts?
In other words, I'm looking at using a for loop to cycle through a flat file. I have the for loop set up, and the counter's iterating correctly. How do I point at a particular line in a flat file?
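If the file really must be walked line by line, a sketch of grabbing line x with a StreamReader (the For Loop counter would supply x); note that inside a data flow the pipeline already iterates the rows for you, which is usually the better fit:

```csharp
// Sketch: return line x (0-based) of a text file, or null if the file is shorter.
using System.IO;

class LineReader
{
    static string ReadLineAt(string path, int x)
    {
        using (var reader = new StreamReader(path))
        {
            string line = null;
            for (int i = 0; i <= x; i++)
                if ((line = reader.ReadLine()) == null)
                    return null;
            return line;
        }
    }
}
```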
I am currently working on a project where I need to insert, delete and update data from a text file I receive into multiple tables. The file has multiple record types (50 to be exact), and each record has a code to perform an update, insert, or delete against the destination table. Also, when I receive the records they are not sorted. I need to sort the sets for each destination table and then read the sorted set sequentially and perform the correct action.
Currently I am importing the records via a Flat File Source into one column. I am using a script component here that would consist of, I guess, 50 outputs including the fields needed for each table. The outputs are sorted by a sortkey field, and when I add a record to an output I perform the data type transforms needed for each field (most are strings, but I need to convert some dates, numbers, etc.).
***Is there a better way of accomplishing this?***
(P.S. I could use a Conditional Split and Derived Columns into the 50 different tables, but it was giving me errors that were almost forcing me to use an nvarchar type instead of a varchar type during some of the field transformations.)
At this point I would need to read through each of the outputs sequentially and perform the update, insert or delete into the needed table. Would I have to create 50 script components with an ADO.NET recordset adapter to update the tables for each of the outputs? I am hoping you can help come up with a better way to accomplish all of this.
Also, if I do need to update the tables with the script component, could someone point me to an example of how to accomplish that programmatically? Thanks for any help or suggestions you may provide, as I am feeling kind of stuck.
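Not an official sample, but the per-row logic usually reduces to one parameterized SqlCommand per action code. In a real script component this would sit in Input0_ProcessInputRow, with the connection opened in PreExecute and closed in PostExecute; all names below are placeholders:

```csharp
// Sketch: apply one row's action code (I/U/D) to its destination table.
using System.Data.SqlClient;

class RowApplier
{
    readonly SqlConnection _conn;
    public RowApplier(SqlConnection conn) { _conn = conn; }

    public void Apply(string actionCode, int key, string payload)
    {
        string sql;
        switch (actionCode)
        {
            case "I": sql = "INSERT INTO dbo.Target (Id, Payload) VALUES (@id, @p)"; break;
            case "U": sql = "UPDATE dbo.Target SET Payload = @p WHERE Id = @id";     break;
            case "D": sql = "DELETE FROM dbo.Target WHERE Id = @id";                 break;
            default:  return;   // unknown code: skip or redirect to an error output
        }

        using (var cmd = new SqlCommand(sql, _conn))
        {
            cmd.Parameters.AddWithValue("@id", key);
            if (actionCode != "D")
                cmd.Parameters.AddWithValue("@p", payload);
            cmd.ExecuteNonQuery();
        }
    }
}
```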
Trying to figure out the best method of reading in a number of flat files, all with different numbers of columns and data types, and outputting them to a database.
Here's the problem: they are EBCDIC encoded and some of the columns are packed decimal. I've set up one package that takes the flat file, unpacks the decimal (using the UnpackDecimal component) and then sends the rest through a second component to go from EBCDIC -> ASCII.
What I need is a way to do this for every flat file based on the schema for that flat file. One current solution is to write a script/app to create the .dtsx XML file and then execute that for each flat file. It appears this may be possible, but I haven't gotten far enough to know for sure. So my questions are:
1) Is there an easier way to do this (i.e. somehow feed the schema to the package and use it to dynamically set up the column markers and determine which columns get fed to the unpack decimal component)?
2) If there isn't a better way, will dynamically creating the .dtsx XML file based on the necessary input/output columns for each flat file work? If so, what is a good source of information on this (how the .dtsx XML file is set up, what needs to be changed and what doesn't, etc.)?
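For reference, both conversions are small enough to sketch directly. The assumptions here: the COMP-3 sign nibble is 0xD for negative, and the text fields are code page 37 (IBM EBCDIC, US/Canada); both should be verified against the actual copybooks:

```csharp
// Sketch of the two per-field conversions.
using System;
using System.Text;

class EbcdicHelpers
{
    // EBCDIC bytes -> .NET string (code page 37 must match the source system)
    static string FromEbcdic(byte[] raw) =>
        Encoding.GetEncoding(37).GetString(raw);

    // Packed decimal (COMP-3): two BCD digits per byte, sign in the last nibble
    static decimal UnpackComp3(byte[] raw, int scale)
    {
        long digits = 0;
        for (int i = 0; i < raw.Length - 1; i++)
            digits = digits * 100 + (raw[i] >> 4) * 10 + (raw[i] & 0x0F);

        byte last = raw[raw.Length - 1];
        digits = digits * 10 + (last >> 4);          // final digit
        bool negative = (last & 0x0F) == 0x0D;       // 0xC/0xF = positive

        decimal value = digits;
        for (int i = 0; i < scale; i++) value /= 10; // apply implied decimal places
        return negative ? -value : value;
    }

    static void Main()
    {
        // 0x12 0x34 0x5D = -123.45 with scale 2
        Console.WriteLine(UnpackComp3(new byte[] { 0x12, 0x34, 0x5D }, 2));
    }
}
```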
I am working on an SSIS project where I create two flat files for submission to a data contractor. This contractor requires a control record be the first line in the file. I create the control record based on the table information being exported.
What I would like to know is: is it possible to utilize the Header section of the Flat File Destination Editor to insert the control record? And, as it is dynamic, what kind of coding must I do in order to utilize this functionality?
Hi - I have a SQL database (2005) that I need to extract a report from that looks something like SELECT * FROM Empl_Hours WHERE some_flag <> 'true'. The thing works fine, but the problem is this: I need to insert a record in the 1st row that looks like "Static_text" + row_count + "more_static_text", where row_count is the actual number of rows that were retrieved. Thanks in advance for any help. Dan
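A sketch covering both of the last two posts: count first, write the control/header line, then stream the data rows after it. The query and header format are placeholders for whatever the contractor requires:

```csharp
// Sketch: dynamic control record followed by the data rows.
using System.Data.SqlClient;
using System.IO;

class ControlRecordWriter
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=HR;Integrated Security=true"))
        using (var writer = new StreamWriter(@"C:\out\empl_hours.txt"))
        {
            conn.Open();

            // 1) the control record needs the row count up front
            var countCmd = new SqlCommand(
                "SELECT COUNT(*) FROM Empl_Hours WHERE some_flag <> 'true'", conn);
            int rows = (int)countCmd.ExecuteScalar();
            writer.WriteLine("Static_text" + rows + "more_static_text");

            // 2) then the data itself
            var dataCmd = new SqlCommand(
                "SELECT * FROM Empl_Hours WHERE some_flag <> 'true'", conn);
            using (var reader = dataCmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var fields = new string[reader.FieldCount];
                    for (int i = 0; i < reader.FieldCount; i++)
                        fields[i] = reader[i].ToString();
                    writer.WriteLine(string.Join("|", fields));
                }
            }
        }
    }
}
```

Within SSIS itself, the Header property of the Flat File Destination takes static text at design time, so the usual workarounds are a Script Task like the one above or, reportedly, setting the property through a property expression on the Data Flow task.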
16 flat files all fixed width. Some over 350 columns.
Open flat file 1.
Extract the id and see if it's in table 1; if it is, update table 1 with the first 30 columns,
otherwise insert the first 30 columns into table 1.
Go to table 2, look up the id, insert/update the next 30 columns... etc., etc., for 10 different tables.
So I've got my flat file source, I do a derived column to convert the dates, I've got a lookup for table 1, then 2 OLE DB Commands: 1 for the update if the lookup succeeds, 1 for the insert if the lookup fails.
How can I pass the id as a param into the update command so it updates where x = 'x'?
I also need a pointer on doing the next lookup, e.g. table 2 - would I do this as some sort of loop?
If you can help, great, but please don't just reply with "I'd use this object" and no explanation of how.
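Taking the questions in order: the OLE DB Command takes a parameterized statement such as UPDATE table1 SET col1 = ?, col2 = ? WHERE id = ?, and the id is passed by mapping the pipeline's id column to that last parameter in the command's column mappings. The equivalent in plain C#, as a sketch with placeholder names:

```csharp
// Sketch: parameterized update keyed on the row's id.
using System.Data.SqlClient;

class Table1Updater
{
    static void UpdateRow(SqlConnection conn, string id, string col1, string col2)
    {
        const string sql =
            "UPDATE dbo.table1 SET col1 = @col1, col2 = @col2 WHERE id = @id";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@col1", col1);
            cmd.Parameters.AddWithValue("@col2", col2);
            cmd.Parameters.AddWithValue("@id", id);   // the "where x = 'x'" part
            cmd.ExecuteNonQuery();
        }
    }
}
```

Table 2 is not a loop: the usual shape is to chain another Lookup (with its own update/insert pair) after the first branch, repeating the pattern once per table in the same data flow.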
Probably a stupid and regularly asked question but I can't seem to find an answer, so here goes,
we have 16 .txt files, some with over 350 columns.
The info from each individual file needs importing into multiple SQL tables.
I need to look at SQL table1: does the record exist? If not, create it, then add in the data once it's been transformed, e.g. datetime from yyyymmdd into datetime values [managed to get this using a derived column] for the first 20 columns; otherwise do an update for the 20 columns...
then look at SQL table2 and repeat for the next n columns...
So I was wondering, is it going to be better to write this as a .dtsx package? If so, can you point me to an example?
Or should I just write the code as part of a code-behind page that scrapes the info and does a standard update/insert procedure?
We are trying to use the SSIS Import/Export Wizard to import the flat files (CSV format) that we have into MS SQL Server 2005 database tables. We have a huge number of CSV files. Is there a way by which we can import these flat (CSV) files into the corresponding SQL Server tables in a single shot? I would really appreciate the help, as it is painful to convert each and every file using the Import/Export Wizard.
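Inside SSIS the standard answer is a Foreach Loop with a File enumerator feeding one parameterized data flow, which works when all the files share one layout. When each CSV maps to its own table, a throwaway loader can be less painful; this sketch assumes the file name matches the table name, a header row, and no embedded commas or quotes (real CSVs often need a proper parser):

```csharp
// Sketch: bulk-load every CSV in a folder into its same-named table.
using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvFolderLoader
{
    static void Main()
    {
        const string connStr = "Server=.;Database=Target;Integrated Security=true";

        foreach (string path in Directory.GetFiles(@"C:\in\csv", "*.csv"))
        {
            string[] lines = File.ReadAllLines(path);
            if (lines.Length == 0) continue;

            var table = new DataTable();
            foreach (string col in lines[0].Split(','))   // header row names the columns
                table.Columns.Add(col);
            for (int i = 1; i < lines.Length; i++)
                table.Rows.Add(lines[i].Split(','));

            using (var bulk = new SqlBulkCopy(connStr))
            {
                bulk.DestinationTableName = Path.GetFileNameWithoutExtension(path);
                foreach (DataColumn c in table.Columns)   // map by name, not ordinal
                    bulk.ColumnMappings.Add(c.ColumnName, c.ColumnName);
                bulk.WriteToServer(table);
            }
        }
    }
}
```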
I want to combine a series of outputs from T-SQL queries into a single flat file destination using SSIS.
Does anyone have any inkling of how I would do this?
I know that I can configure a flat file connection manager to accept the output from the first OLE DB source, but am having difficulty with subsequent queries.
I have a package that contains three database tables (header, detail and trailer records), each connected via an OLE DB source in SSIS. Each table varies in the number of columns it holds, and none of the tables have the same field names. I need to transfer all data, from each table, in order, to a flat file destination.
I am testing SSIS and have created a Flat File Destination. I defined the Flat File Connection as new for the first time and it worked fine. Now I would like to go back and modify the Flat File Connection in the Flat File Destination Editor, but it only allows me to create a new connection rather than edit the existing one. For testing, I can go back and create a new connection, but if my connection had 50-100 columns then it would be an issue to re-create it from scratch.
I am working in one company and currently I am assigned to a new project for data migration from company X to our company Y using SSIS. I am totally new, and I just completed the five tutorials given on the MSDN website.
Basically, the client is going to send us a first flat file with 1 million records, with header, detail and trailer records. I want to create a package in such a way that it dumps all of this first load into 7 to 8 different tables at a time. We also have to include functionality for validation and error checking. On a successful load, the error file should return only the header and trailer but no detail records. If there are any errors, then the error file should contain the header, the detail records which were unable to load, plus the trailer, which we have to send back to the client.
When the 2nd file comes, we have to check whether each record is new or a change (update), depending on a flag which tells us.
This is basically the high-level idea of the package I need to create. If you guys have any questions, let me know.
I know you guys are very experienced. If anyone can give me some detailed ideas on it, I would really appreciate it. I have a very limited timeline for it.