Parse Text File (csv, Fix Width), Perl Or Something Newer?
May 26, 2007
I need to parse some text files and load them to database - these files are mostly CSV files or fix width columns format and the column names (first line) may vary in text (e.g. different abbreviation), order and extra columns, etc.
Is it a good idea have this done using script task of SSIS? How it compare to Perl on performance? Or any other tools good for this?
Export to Fixed width text file I am trying to export a table to a fixed lenght text file, there is only flat file option and that does not put LF/CR at the end of row, is there any solution?
Export to Fixed width text file I am trying to export a table to a fixed lenght text file, there is only flat file option and that does not put LF/CR at the end of row, is there any solution?
We're having issues exporting a set of data from SQL to a fixed width flat text file by just doing a right click on the DB, then choosing Tasks > Export Data. You can not specify a row delimiter when you choose a Fixed Width format. The only way around this that we've found is to specificy char(13) and char(10) at the end of the SQL select statement. Without row delimiters you end up with 1 giant record rather than 20,000 regular sized records. Is there any other way around this that we're missing?
Using Ragged Right is not an option either since the record lengths will be inconsistent if the last field doesn't contain a consistent length to the data.
I am trying to export data from a query in SQL Server 2005 SSIS to a flat file destination. Everything works fine except the rows returned from my query are written to the flat file in one long string (i.e., without line breaks). I have tried appending a new line character to the rows returned from the query but that only throws an error when the package is executed. My rows returned from the query are 133 characters wide (essentially only one column per row) so I have set the properties accordingly for a fixed width file format with 133 character wide rows.
Any suggestions or ideas on how to correct this would be greatly appreciated.
I am trying to create a program that transfers tables to flat files. At this point in time, I have suceeded in created one that creates delimited files.
However, I am now trying to create fixed-width files as you can do with the SSIS designer, but programatically.
Is there a way to programatically determine the width of a column from the source table? I can not seem to find any kind of function or member that stores this information or allows me to retrieve it.
I know what I need to change in order to set a width for a column, but I just don't know how to find the width without just asking the user to provide one.
Hi,I been reading various web pages trying to figure out how I can extract some simple information from the XML below, but at present I cannot understand it. I have a MS SQL 2005 database with which contains a field of type text (external database so field type cannot be changed to XML)The text field in the database is similar to the one below but I have simplified it by remove many of the unneeded tags in the <before> and <after> blocks. I also reformatted it to show the structure (original had no spaces or returns) For each text field in the SQL table contain the XML I need to know the OldVal and the NewVal. <ProductMergeAudit> <before> <table name="table1" description="Test Desc"> <product id="OldVal"> </table> </before> <after> <table name="table1" description="Test Desc"> <product id="NewVal"> </table> </after></ProductMergeAudit>
In MS Access 2000 if I have a String such as:Column1Delta CC: 123Charley CC: 234Foxtrot CC: 890and I wanted to extact just the numbers in to a field called CCI could use this formula in a calculated field:CC: Mid([Column1],Instr(1,[Column1],"CC")+3,50)resulting in:CC123234890Any idea on what the code should be within a view in SQL Server?also -- what is a good reference that can help with these types ofproblems.Any help appreciated!RBollinger
I am trying to build a query on a SQL2000 text field which contains XMLstring. The query is like "select requestnumber from history whererequestnumber is like '%re1%'". As you can see in the following samplerecords, the xml string has database structure and the requestnumber isa node of the XML. I wonder if it is possible to have SQL server parsethis field and allow me to do the query. If not, any suggestion wouldbe appreciated as to how to store XML data in SQL2000. I am not sureif I misused the SQL2000 XML feature correctly. So far I pass the rawquery result to ADO and manipulate it in XMLDOM.The table is to capture history of changes in any record in mydatabase. So I need to keep it simple so any record from any table canbe stored in here. The structure of the table is like this:sysObjectNumber(int, not null)recordKeyValues(char(30), not null)archiveTime(datetime, not null)history(text, null)A sample record would be like the following:sysObjectNumber recordKeyValues arvhiveTime History=============== =============== =========== =======1728725211 ABC 2005-03-25 8:09:56.700<history><partnumber>ABC</partnumber><type>2140461537</type><color>1</color><remain></remain><lastdatein></lastdatein><lastdateout></lastdateout><lastquantityin></lastquantityin><lastquantityout></lastquantityout><threshhold>null</threshhold><usedby>user1</usedby></history>1728725211 ABC 2005-03-28 11:01:14.407<history><partnumber>ABC</partnumber><type>2140461537</type><color>1</color><remain></remain><lastdatein></lastdatein><lastdateout></lastdateout><lastquantityin></lastquantityin><lastquantityout></lastquantityout><threshhold>4</threshhold><usedby>user2</usedby></history>1728725211 ABC 2005-03-28 11:46:12.723<history><partnumber>ABC</partnumber><type>2140461537</type><color>1</color><remain>4</remain><lastdatein></lastdatein><lastdateout></lastdateout><lastquantityin></lastquantityin><lastquantityout></lastquantityout><threshhold>4</threshhold><usedby>user1</usedby></history>1728725211 ABC 2005-03-28 11:46:35.273<history><partnumber>ABC</partnumber><type>2140461537</type><color>1</color><remain>4</remain><lastdatein></lastdatein><lastdateout></lastdateout><lastquantityin></lastquantityin><lastquantityout></lastquantityout><threshhold>4</threshhold><usedby>user4</usedby></history>
In the For Loop, How to Iterate from Older flat files to Newer flat files based on File's Timestamp. If there are some older files in that folder, it should be processed first and then continue with the newer one.
I am trying to parse data separated through text (ie abc1, abc2, abc3, abc4, etc).
ID ParseData 1 [abc1.Pants/abc2.Orange hat /abc3.Purple shirt /abc4./abc5./abc6./abc7./abc8.] 2 [abc1.Gray Shoes/abc2.Striped jacket /abc3./abc4./abc5./abc6./abc7./abc8.] 3 [abc1.Blue jeans/abc2./abc3./abc4./abc5./abc6./abc7./abc8.]
New Data (abc1, abc2, abc3, etc each have a field in the new data set) ID ParseData abc1 abc2 abc3 abc4 abc5 abc6 abc7 abc8 1 [abc1.Pants...abc8.] Pants Orange hat Purple shirt 2 [abc1.Gray...abc8.] Gray Shoes Striped jacket 3 [abc1.Blue...abc8.] Blue Jeans
If I only want the data in between abc1 and abc2, between abc2 and abc3, etc, what would be the best way to do that?
My code so far looks like: DECLARE @string varchar(100) = '[abc1.Pants/abc2.Orange hat /abc3.Purple shirt /abc4./abc5./abc6./abc7./abc8.]', @searchString1 varchar(20) = 'abc1', @searchString2 varchar(20) = 'abc2';
SELECT newstring FROM dbo.SubstringBetween(@string,@searchString1,@searchString2);
This returns 'Pants.'How do I continue to parse between abc2 and abc3? between abc3 and abc4?And then continue to ID2?Should I be referencing the ParseData field instead of string of data that I want to parse?
I've got some machines that output text files after each shot of parts. I'd like to take the data in those files and parse it and insert it into a SQL Server database for future massaging. The text files look like the example I've posted below. Can SSIS parse out the set points and actual values even though the file isn't CSV or tab delimited and the data is kind of 'strewn' all over the report? Each report does have the exact same format so the report format doesn't change from report to report, just the data. Thanks in advance.
Ernie
WP4.57 C Y C L E P R O T O C O L
Order data 18.05.06 11:27:57
Order number : 2006 Recipe no. : 15761
Machine number : 7 Recipe name : Stabilizer Bar Innsulator
Machine Operator: 1257 Art.descrip.: Stabilizer Bar Grommet
Any way to parse out a text value (not varChar, using text data type) that is > than 8000 characters long? I'm looping through 1 big string passed to the DB that is pipe delimited, but I find myself needing the substring function to keep track of which segment I'm acting on (after an update, I then need to take that segment and remove it from the string)...but the subString function won't take anything larger than 8000 chars.
Say I have this string that is text data type...
'aaa|bbb|ccc|ddd|....'
..and so on, surpassing 8000 char length, how could you parse it out using the pipes as the delimter, then do an Update using that segment? Afterward, return to that string and find the next segment, then use it, and so on (in a loop). I tried using an update to set the string = replace(string, segmentJustUsed, '') to "erase" it, but replace can't take text as the datatype. Any help? Hope this isn't to confusing.
I have a data field in a SSRS Report that contains the requestor's User Id. Their ID is prefixed with "PRIV"...And I'm assuming that is the direct result of the network domain. I need to create a SSRS expression to determine if the User ID is prefixed with "PRIV" and then parse that out and use what's behind the "" as their true User Id.
example...."PRIVID123456" should appear as "ID123456" in the report data line.
What is the easiest way to get a large fixed width text file (200 columns) defintion into SSIS? To have to define each column with the ruler would be very cumbersome.
I trying to import from fixed width text files that may contain one or more empty rows at the bottom of the file (where an empty row is {CR}{LF}). By experimenting, I found it runs successfully with up to 32 blank rows, but with any more I get this:
Warning: 0x8020200F at Copy TBMO files to DW, TBMO text file source [1]: There is a partial row at the end of the file.
Error: 0xC0047038 at Copy TBMO files to DW, DTS.Pipeline: The PrimeOutput method on component "TBMO text file source" (1) returned error code 0x80020005. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
Error: 0xC0047021 at Copy TBMO files to DW, DTS.Pipeline: Thread "SourceThread0" has exited with error code 0xC0047038.
Error: 0xC0047039 at Copy TBMO files to DW, DTS.Pipeline: Thread "WorkThread0" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.
Error: 0xC0047021 at Copy TBMO files to DW, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0xC0047039.
I have a subreport on my main report, and the subreport contains a matrix. When I run the report, the subreport seems to expand beyond the width of the main report. The matrix itself does not expand beyond the width of the body of the main report, but for some reason it seems that the subreport does. The subreport does not contain any headers that might be messing up the width. If I cut and paste the matrix from the subreport directly onto the main report, and remove the subreport, it prints fine. But as soon as I include the subreport on the main report it prints with blank pages because the subreport expands beyond the width of the body of the main report. I have checked the width and margins on the subreport and compared to the width of the main report and all looks good. Can anyone help?
I am attempting to import a fixed width file into a SS2005 table and am having problems when importing a date that has no value in it. The table will allow nulls.
The date is in dd/mm/yyyy foramt and when there is no date then there are 10 spaces. When transforming the data I TRIM the data down using a derive transform script so all there is, is an empty string. When the file attempts to load I get the following message:
[OLE DB Destination [2238]] Error: There was an error with input column "paid_date" (2306) on input "OLE DB Destination Input" (2251). The column status returned was: "The value could not be converted because of a potential loss of data.".
How can it potentially lose data when there is nothing to lose?
I need some way of converting the empty string into a null. Has anyone got any ideas for me?
I am transferring data from an OLEDB source to a Flat File Destination and I want the column width for all of the output columns to 30 (max width amongst the columns selected), but that is not refected in the Fixed Width Flat File that got created. The outputcolumnwidth seems to be the same as the inputcolumnwidth. Is there any other setting that I am possibly missing or is this a possible defect?
I was trying to import a fixed-width file to a sql 2005 table. The total record lenght is 1500. I was trying to import it to a single column.
The strange thing that's happening is: SSIS is inserting only the first 32 chars of the record and the remaining are gone. I tried using nvarchar(max) and varchar(max) but of no use. I think something somewhere is going wrong but I was unable to figure it out. Earlier I was able to load a similar file into a single column table.
My Header row delimiter is {CR}{LF} The preview pane shows the complete record but when it transfers to the table, I'm getting 32 chars only.
I have a series of fixed width files, all with the same schema. I need to import the data into a SQL Server table. Each record in the flat file begins with 'D1'. The length of each record (string) is 380. There are cases where the record ends after position 193, and a new record appears in the current string beginning at position 194. So at position 194 'D' appears, and '1' appears at position 195.
In the flat file, I need to insert a line break after position 193 if position 194 = 'D' and if position 195 = '1'. I'm guessing I would do this with a Script Component Transformation. Once the file is edited, then I can bring the data into the table.
What might the script look like? If you have any suggestions, samples, or know of examples on the web you can point me to, please share.
Currently we're working on an SSIS package to extract data from a SQL Server database to several fixed width flat files.
Some of the data needs to be formatted/converted in a certain way DateTimes need to be formatted in ISO8601Booleans need to be 0/1 instead of False/True...Has anybody any idea what the preferred approach (best practice) would be to do these conversions?Convert everything in the select query? What about readability of your query? Do it somewhere in the package? If so, how?....
I'm trying to read in a flat file (which, admittedly, has one very wide column), and it keeps breaking because of truncation when it tries to read in the file.
I've a report projects where each report has a number of columns and the spec is to have all the columns print-out on the same page. Is there a setting that will auto-scale these columns to fit to the page or will I have to edit the font size and widths manually on each report to fit to the page?
I'm using SSIS to do bulk inserts from fixed width files to about 20 tables in my SQL database.
The problem I'm running into is in creating Format Files for the bulk insert task to use. I've gotten the bcp command to create format files that will read csv files, but I can't seem to figure out how to get it to create one for fixed-width.
I know it can be done: http://msdn2.microsoft.com/en-us/library/ms191234.aspx At the bottom (Section F) it shows an XML format file for reading a fixed-width file. When I manually create one of these to match one of my tables, the bulk insert worked fine.
Closest I've come is with this ( [] bracketed items are correct values, just censored here): C:Program FilesMicrosoft SQL Server90ToolsBinn>bcp [database].[owner].[table] format nul -c -f C:TableFMT.xml -x -S[Server] -U[Username] -P[Password]
My question is, what is the bcp command to create this sort of XML format file?
Hi, There's a lot of information on importing data from text files, but not a lot on exporting data to text files... I've checked but found no info on this.
I'm trying to export data from SQL Server to a fixed-width flat file and wondering if I'm doing it the right way.
I use a view as source (using a OLEDB connection manager) and I can see the data without problem.
I defined a Flat File Destination (using a flat file connection manager). When setting up the flat file connection manager, I am asked for a file... Does this mean one should create manually a template file with the desired output format? So I used a production file as template since we're replacing an existing process.
After having set up everything, I run the SSIS only to see all the data on the same row. There are no CRLF...
When I create the file connection manager, there's no way to mention the row delimiter. In the properties I see a "Row Delimiter" field and when I try with "{CR}{LF}" it makes no difference. Interesting to note that, contrary to the HeaderRowDelimiter field, the RowDelimiter field has no drop-down control to give choices.
So I had to return the CRLF as the last field of the source view (SELECT .... ,'CRLF' = CHAR(13) + CHAR(10) FROM ...) to make it work.
Ok I am faced with working with XML on a regular basis, which is fine.
DECLARE @ViewSN INT IF NOT EXISTS (select null from tblviews where viewcode = 'loadAtTerm') --<workflowEventType>loadAtTerminal</workflowEventType> insert into tblviews (ViewName,Description,OutBoundForm,StoredProcSN,TriggersReply,ViewCode,DispXactLayer,DispXactViewType,DispXfcTag,Comments) select 'QC:WF-LoadAtTerminal','This View Corresponds to the XML for loadAtTerminal in Omnitracs Workflow','0',NULL,'0', 'loadAtTerm','MCOM','MCOM',NULL,NULL
[code]...
What would be really useful is to be able to present any xml file and automatically parse the NODE names into a memory variable table and then the fields of each node in another.
I wasn't sure where to put this topic so I put it here since I figured it is a question that would apply to virtually any version even though I am using SQL Server 2005.
We have a vendor that sends us a fixed width text file every day that needs to be imported to our database in 3 different tables. I am trying to import all of the data to a staging table and then plan on merging/inserting select data from the staging table to the 3 tables. The file has 77 columns of data and 20,000+ records. I created an XML format file which I sampled below:
The data file is a fixed width file with no column delimiters or row delimiters that I can tell. When I run the following insert statement I get the error below it.
BULK INSERT myStagingTable FROM '.........myDataSource.txt' WITH ( FORMATFILE = '.........myFormatFile.xml', ERRORFILE = '.........errorlog.log' );
Here is the error:
Msg 4832, Level 16, State 1, Line 1 Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1 The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1 Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".