XML Source: String Types Lose Length Attributes In XML Schema?
Jan 25, 2007
I'm having a problem with the XML Source data flow component not transferring the length attributes from an XML Schema to the column attributes of the output table.
An example schema that I have is:
<?xml version='1.0' encoding='UTF-8'?>
<data xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <xsd:schema>
    <xsd:simpleType name='NameType'>
      <xsd:restriction base='xsd:string'>
        <xsd:minLength value='0'/>
        <xsd:maxLength value='50'/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name='Name' type='NameType' nillable='true'/>
    <xsd:simpleType name='FamilyType'>
      <xsd:restriction base='xsd:string'>
        <xsd:minLength value='0'/>
        <xsd:maxLength value='50'/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name='Family' type='FamilyType' nillable='true'/>
    <xsd:element name='row'>
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref='Name'/>
          <xsd:element ref='Family'/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name='data'>
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref='row' maxOccurs='unbounded'/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
  <!-- data follows -->
  <row><Name>Fred</Name><Family>Jones</Family></row>
</data>
When I reference the file in the XML Source component, it correctly infers that there are two columns, but the length of the string columns is set to 255.
This behaviour appears to be at odds with the SSIS documentation (SQL Server Integration Services/Integration Services Object and Concepts/Data Flow Elements/Integration Services Sources/XML Source), which states (highlighting mine):
When the data is extracted from the XML data file, it is converted to an Integration Services data type. The XSD or inline schema may specify the data type for elements, but if it does not, the XML Source Editor dialog box assigns the Unicode string data type (DT_WSTR) to the column in the output that contains the element, and sets the column length to 255 characters. If the schema specifies the maximum length of an element, the length of output column is set to this value. If the maximum length is greater than the length supported by the Integration Services data type to which the element is converted, then the data is truncated to the maximum length of the data type. For example, if a string has a length of 5000, it is truncated to 4000 characters because the maximum length of the DT_WSTR data type is 4000 characters; likewise, byte data is truncated to 8000 characters, the maximum length of the DT_BYTES data type. If the schema specifies no maximum length, the default length of columns with either data type is set to 255. Data truncation in the XML source is handled the same way as truncation in other data flow components. For more information, see Handling Errors in Data.
Has anyone had any luck getting string lengths automatically extracted from an XML document? If so, where am I going wrong?
I am using CRM 3.0 and have a requirement to list all the tables, attribute names, display names, and data types. Is there an easy way to export this information from the CRM tool, or to prepare a SQL query that will list the information?
[crossposted] Hi, I wonder if anyone might lend me a brain. I have a stock database to build that covers over 1000 products, which might be said to exist in around 50 product families. Obviously, just to be awkward, all the types of stock will have different attributes. So one product might be a tube with inside/outside diameter and length, and another a T-shaped cable joint. All I can come up with is a separate table for each stock type family, storing the table name and product code in the main stock table, so:

Tables: ProdA, ProdB, ProdC, Stock

Stock attributes: ProdId, ProdTable, Amount, Date, etc.

ProdA attributes: ProdId, AttributeX, AttributeY, AttributeZ, etc.

Then use code to parse the table name and product ID to select the correct query to get the product details. BUT this seems awfully inelegant and potentially wrong, so I'm loath to continue down this route. Can anyone tell me the "right" way to do this? I feel sure it must be a classic db design exercise, but unfortunately one they didn't teach us at University -- or maybe I was asleep... Thanks!
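For what it's worth, a minimal T-SQL sketch of the usual supertype/subtype answer to this kind of problem; all table and column names here (Stock, StockTube, StockCableJoint, etc.) are illustrative, not taken from the post:

-- Hypothetical supertype/subtype sketch: one Stock table for the attributes
-- every product shares, plus one subtype table per product family for the
-- family-specific attributes.
CREATE TABLE Stock (
    ProdId     int         NOT NULL PRIMARY KEY,
    ProdFamily varchar(30) NOT NULL,      -- e.g. 'Tube', 'CableJoint'
    Amount     int         NOT NULL,
    StockDate  datetime    NOT NULL
);

CREATE TABLE StockTube (
    ProdId          int          NOT NULL PRIMARY KEY REFERENCES Stock (ProdId),
    InsideDiameter  decimal(9,2) NOT NULL,
    OutsideDiameter decimal(9,2) NOT NULL,
    TubeLength      decimal(9,2) NOT NULL
);

CREATE TABLE StockCableJoint (
    ProdId     int     NOT NULL PRIMARY KEY REFERENCES Stock (ProdId),
    JointShape char(1) NOT NULL            -- e.g. 'T'
);

Queries then join Stock to the one subtype table implied by ProdFamily, which avoids storing table names as data.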
Apart from the IDENTITY property, what other properties or attributes are not transferred to the target schema?
I know that one can use NOT_FOR_REPLICATION for identities, but I am interested in a (complete?!) list of metadata objects that transactional replication *prefers* not to transfer across to the target by default.
Is there a way, in code, to determine the maximum length of an Integration Services data type?
I need to determine, based on the data type, what the maximum length of a column is, in code.
However, the column.Length property only gives me a length for DT_WSTR and DT_STR values. This is the only property that would seem to remotely give me the right answer.
I need to know the maximum lengths of columns for DT_BOOL, DT_CY, DT_I2, DT_I4, DT_I8, DT_NUMERIC, and DT_UI1. I can always hard-code these values into my program, but that makes no sense. There has to be some way to determine what the maximum possible length of these values is.
For numeric values I could use the column.Precision value, but that still leaves me with a lot of data types without a maximum length.
I have extensively reviewed both of the design methodologies and I cannot come up with a single clear reason to use one over the other!
Type-Attributes is where you have a table holding the type categories and types, a table holding the expected type attributes, and then a table holding the type attribute values:
Now, the above sure is flexible in the sense that a type of automobile can be added without affecting the database schema, but what if some attributes do not take a numeric value? How do you handle computations on type-specific attributes? Why would I use this structure as opposed to the super type - sub type shown below?
Now, adding new sub types probably isn't very flexible, but you can specify data types for each attribute instead of using sql_variant, which according to the documentation cannot be used in aggregate functions and may give poor results when used with ADO.
Regardless of the method used, a lot of back-end coding is required for computations, deciding which table to send the attributes to, etc.
Can anyone please help me clarify this? Which method is best, and why? So far I am leaning toward option 2. More work, but it seems more flexible in the sense of customizing each data type.
E.g., what if you wanted to specify attributes about the cap that can be supplied with trucks?
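For reference, a minimal T-SQL sketch of the Type-Attributes structure described above; every table and column name here is illustrative only, and the sql_variant column is exactly the weak spot called out in the question:

-- Hypothetical Type-Attributes (EAV) sketch. Adding a 'CapStyle' row to
-- TypeAttribute covers the truck-cap example without any schema change,
-- but every value lands in one sql_variant column.
CREATE TABLE ProductType (
    TypeId   int         NOT NULL PRIMARY KEY,
    Category varchar(50) NOT NULL,   -- e.g. 'Automobile'
    TypeName varchar(50) NOT NULL    -- e.g. 'Truck'
);

CREATE TABLE TypeAttribute (
    AttributeId   int         NOT NULL PRIMARY KEY,
    TypeId        int         NOT NULL REFERENCES ProductType (TypeId),
    AttributeName varchar(50) NOT NULL -- e.g. 'HP', 'Color', 'CapStyle'
);

CREATE TABLE ProductAttributeValue (
    ProductId      int         NOT NULL,
    AttributeId    int         NOT NULL REFERENCES TypeAttribute (AttributeId),
    AttributeValue sql_variant NULL,   -- one column must hold every data type
    PRIMARY KEY (ProductId, AttributeId)
);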
I am working in SQL Server Master Data Services Version 11.0.5058.0 (SP 2).
I have been asked to group all the financial attributes together. When I move one of the attributes up using the arrows, it works well, jumping over one attribute at a time. Then I reach a section of attributes where it leapfrogs over 24 attributes.
It appears these 24 attributes are in a subgroup but there are no attribute groups and I removed the subscription view from the entity. If I move one of the 24 attributes in the group, it moves it outside of the 24 attributes.
This is under leaf member attributes. There are no collection or consolidated groups.
I'm using a DW built from the Northwind database to build a cube for some analytical tasks. I have already created the cube and now I am "cleaning" the dimensions. I'm having some difficulty understanding the logic of this part. The reason is that when I created the Data Source View, I only imported the foreign keys that connect the dimensions to the fact table. Do I have to drag the attributes of the dimension from the Data Source View to the attributes tab?
Imagine this:
I have the following dimension:
Dim_Customer: Customer_ID, Name_Customer, Job_Function, Date_of_Birth, Contact, Address, City, Country
When I create the cube, only Customer_ID appears in the attributes tab; is that normal?
One more question:
I don't want to create a hierarchy like:
Customer ID -> Name_Customer
Customer ID -> Date_of_Birth
Customer ID -> Address
Customer ID -> City
Customer ID -> Country
My idea is to create the following hierarchy:
Name_Customer -> Date_of_Birth -> Address -> City -> Country
But the first hierarchy that I showed is what always appears. Do you know what is happening?
I have an SSIS package loading an Excel file. The Excel Source automatically gives a length of 255 for all text columns. However, some of the columns may exceed a length of 255.
I cannot change the length of Error output columns. "Error at Data Flow Task [Excel Source [508]]: The data type for "output "Excel Source Error Output" (517)" cannot be modified in the error "output column "F45" (2345)". Error at Data Flow Task [Excel Source [508]]: Failed to set property "DataType" on "output column "F45" (2345)". "
How can I change it, or truncate it to 255? I am using 64-bit VS.
Hi, I am trying to import data from Oracle RDB into SQL Server 2005 using SSIS. I created an ODBC data source to connect to Oracle and used the DataReader Source component and ADO.NET to connect to the ODBC data source.
Under the Component properties tab, the SQL Command looks something like this.
Select ID, ADDRESS, REVISED from ADDRESS
The data type for the source columns are Integer, Varchar(30) and DATE VMS.
Now when I look at the Input and Output properties window,
The External Columns have the following data types:

ID - four-byte signed integer [DT_I4]
ADDRESS - Unicode string [DT_WSTR], length = 0
REVISED - database timestamp [DT_DBTIMESTAMP]

The Output Columns have the following data types:

ID - four-byte signed integer [DT_I4]
ADDRESS - Unicode string [DT_WSTR], length = 0
REVISED - database timestamp [DT_DBTIMESTAMP]
When I try to change the length of ADDRESS on the output column, I get the following error:
Error at Data Flow Task [DataReader Source [1]]: The data type of output columns on the component "DataReader Source" (1) cannot be changed.
Is this the default length for the Unicode string type? I was not able to load the ADDRESS column, as it gets truncated before I load it into the destination. Even if I use a Derived Column or Data Conversion transformation, the ADDRESS is getting truncated before it reaches that transformation.
Is there a way to control the types for output columns of a DataReader Source? It appears that any System.String will always come out as DT_WSTR. As I have my own managed provider, and I know what went in, I can say that really it should be DT_STR. The GetSchemaTable call from my provider will always say System.String as it does not have much choice, but GetSchemaTable does contain a ProviderType which is different for my DT_STR vs DT_WSTR, or rather when I want each. I think something like MappingFiles as used by the Wizard would work, but can I do anything today?
At run time these values are, let's suppose, @SSN = '999-000-000' and @State = 'ABC'.
Now the result is displayed with the state data as 'AB' only.
Output: 1 999-000-000 AB
Instead, it should give a system-generated error.
Here I have 2 questions: 1. Why is it taking the first 2 characters? 2. Why is there no system-generated error for the length?
I can do validation with the LEN function for these 2 variables; however, if I have 100 variables, that is not a feasible approach. So, what is the reason behind this?
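For what it's worth, a minimal repro of the silent truncation on variable assignment; the declaration of @State as a two-character type is an assumption, since the declarations aren't shown above. Variable assignments in T-SQL truncate silently, whereas inserting an over-length value into a table column raises a truncation error (with ANSI_WARNINGS on).

-- Assumed declaration: @State is only two characters wide.
DECLARE @State varchar(2)

-- Assignment silently truncates; no error or warning is raised.
SET @State = 'ABC'

SELECT @State AS StateValue   -- returns 'AB'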
I'm getting a bit lost in SSIS. I've got an Excel source file that I'm trying to load into a table. I keep getting validation errors that warn about not being able to convert between unicode and non-unicode string data types.
I'm trying to figure out where I have to change this, and I am frankly confused. It seems SSIS is selecting various columns as Unicode/WSTR data types, but I want them to import as regular string types.
On the Data Flow tab in SSIS, I right-click on the source Data Flow component (the Excel file) and select Show Advanced Editor. Then on the last tab, Input and Output Properties, there's a tree view for the Excel output. There are "External Columns" and "Output Columns" containers in the tree view.
I tried setting some of these but they don't seem to "take". Do I need to change the data type for each column under both the External and Output columns?
That seems like a lot of work! And, as I say, I tried setting some, but I still got the same validation errors. So, then I go back to this spot (Advanced Editor -> Input and Output Properties tab) and my changes seem to have been lost.
I'm trying to import some data from an Excel 2007 file into a SQL table. I created the Source Connection Manager and an OLE DB Source Data Flow Component which uses it. (Correct me if I'm wrong, but I can't use the Excel Source because of the version of Excel the file is saved in.) The outgoing Data Flow Path thinks some of the fields being imported should be of type float, when in fact they have alpha characters in them.
The fields in the database are defined as varchars.
A Data Conversion Transform doesn't seem right because I need the data to come out of the source as string data (which it actually is in the Excel file). Even if I convert it to string on the way to the destination, I would still be missing the original alpha characters.
How/where do I change it (Source Connection Manager, OLE DB Source Data Flow Component, something else) to correctly identify the field's type?
I need to write a query that tells me which string values are empty or blank in a table. Is it possible to return the length of the string contained in a character field?
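A minimal sketch, using MyTable and MyCharCol as placeholder names since the real ones aren't given. Note that LEN() ignores trailing spaces, so a value containing only blanks reports a length of 0:

-- Placeholder names: MyTable / MyCharCol.
SELECT MyCharCol,
       LEN(MyCharCol)        AS TrimmedLength,   -- trailing spaces ignored
       DATALENGTH(MyCharCol) AS StorageBytes
FROM   MyTable
WHERE  MyCharCol IS NULL
   OR  LTRIM(RTRIM(MyCharCol)) = ''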
I created a DataSet from a SQL CE 3.5 database (I drag-dropped the tables of the database onto the DataSet).
And when I try to put text with more than 4000 characters into the dataset, it throws an exception...
I tried to insert this text directly into SQL CE and it works fine! So I think the DataSet has a limitation? But the MaxLength of the string column of the dataset is set to 536870911.
I'm using the XML Source to process a document with 17 elements in it. That leaves me with 17 outputs from the source.
Any time I make even the slightest change to the schema, it causes the metadata collection of the source to be updated. This causes everything in the data flow to have inconsistent metadata. This can take a very long time to fix, even though all I have to do is open the top transform and allow name mapping to work.
Is there any better way to make changes to an XML schema being used this way? Any tricks?
I have serialized XML that I got from a dataset. In my Data Flow Task, I bind the XML Source to this XML file. Since the XML file has its schema inline, I check the 'Use inline schema' option. However, when I put a data viewer on the path to see the rows getting sent to the destination, I see that no rows are getting transferred. As you will see from the XML file I am trying to use, I do have one row to transfer.
I tried keeping the schema file and the content file separate, and that worked. I am not sure if there are any inherent issues I need to take care of when using inline schemas to transfer data. I have SP2 for SQL 2005 installed.
I've just started looking at SSIS and have encountered what should hopefully be a simple problem to solve. I have a pipe-separated source file that looks like this (I've added Line numbers for simplicity):
In addition to header and footer records, this file contains three record types for each person.
Record types are identified by the second column.
Each record type has a different number of columns:
Type 100 has 5 columns
Type 200 has 4 columns
Type 305 has 12 columns
The Row delimiter for all records is the {CR}{LF} character
I've set up a flat file input source and specified {CR}{LF} as the row delimiter for both header and data rows and the "|" character as the field delimiter.
It appears that SSIS is assuming that because the first data row has 5 columns, everything else must fit that format too. So the {CR}{LF} character that separates lines 02 and 03 is interpreted as text rather than as a separator, and all remaining | field separators after the 305 are interpreted as text contained in the fifth column. SSIS is also complaining that the last row is incomplete.
A bit like this (I've used tildes to indicate column separation):
I've seen one other reference to this behaviour, but the response seemed to be that SSIS doesn't know which columns are missing. In this scenario we don't have missing columns; rather, we have different types of record in a single file. In DTS I would effectively parse the file once for each record type, thus:
I have a specification table that has some attributes defined:

SpecId - ID of the specification
Attribute - Attribute of the spec (like Color, HP, etc.)
Value - The value of the attribute

Then I have a car table that actually has information about the cars. The intention is to take each specification and match the cars that satisfy it. If the car has more attributes than the spec, we ignore the extra attributes for the match. But if the car has fewer attributes, we don't even consider the car a match (even if the attributes that are present match). To summarize, the car's attributes should be >= the spec's attributes.

The code I have below is bad because I am joining the same tables twice. In addition, it fails the condition "the car's attributes should be >= the spec's attributes".
INSERT INTO @Specification VALUES ('S1', 'Type', 'Sedan')
INSERT INTO @Specification VALUES ('S1', 'Transmission', 'Auto')
INSERT INTO @Specification VALUES ('S1', 'HP', '220')

INSERT INTO @Specification VALUES ('S2', 'Type', 'SUV')
INSERT INTO @Specification VALUES ('S2', 'Transmission', 'Manual')
INSERT INTO @Specification VALUES ('S2', 'HP', '300')

INSERT INTO @Car VALUES ('Accord', 'Type', 'Sedan')
INSERT INTO @Car VALUES ('Accord', 'Transmission', 'Auto')
INSERT INTO @Car VALUES ('Accord', 'HP', '220')
INSERT INTO @Car VALUES ('Accord', 'Color', 'Black')

INSERT INTO @Car VALUES ('Escape', 'Type', 'SUV')
INSERT INTO @Car VALUES ('Escape', 'Transmission', 'Manual')
INSERT INTO @Car VALUES ('Escape', 'HP', '300')

INSERT INTO @Car VALUES ('Explorer', 'Type', 'SUV')
INSERT INTO @Car VALUES ('Explorer', 'Transmission', 'Manual')
SELECT DISTINCT Spec.SpecId, Car.CarName
FROM @Specification Spec
INNER JOIN @Car Car
    ON Spec.Attribute = Car.Attribute
    AND Spec.Value = Car.Value
WHERE Spec.SpecId NOT IN
    (SELECT Spec.SpecId
     FROM @Specification Spec
     LEFT OUTER JOIN @Car Car
         ON Spec.Attribute = Car.Attribute
         AND Spec.Value = Car.Value
     WHERE Car.CarName IS NULL)
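For comparison, a sketch of the usual relational-division approach, which enforces the "car's attributes >= spec's attributes" rule by counting matches per spec/car pair. The table variable declarations are assumptions, since they aren't shown in the post:

-- Assumed declarations (not shown above).
DECLARE @Specification TABLE (SpecId varchar(10), Attribute varchar(30), Value varchar(50))
DECLARE @Car           TABLE (CarName varchar(30), Attribute varchar(30), Value varchar(50))

-- ...the INSERT statements above go here...

-- A car matches a spec only if every attribute/value pair of the spec is
-- present on the car; extra car attributes are simply ignored.
SELECT s.SpecId, c.CarName
FROM @Specification AS s
INNER JOIN @Car AS c
    ON c.Attribute = s.Attribute
    AND c.Value = s.Value
GROUP BY s.SpecId, c.CarName
HAVING COUNT(*) = (SELECT COUNT(*)
                   FROM @Specification AS s2
                   WHERE s2.SpecId = s.SpecId)

With the sample data above this returns (S1, Accord) and (S2, Escape), and excludes Explorer because it has no HP row to satisfy S2.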
I have created a table and entered data into it as follows:
CREATE TABLE t ( id INT , txtcol varchar(1000) )
INSERT INTO t ( id , txtcol ) VALUES ( 1 , 'ATXR_SOURCE_ID,CDDL_AG_PRICE,CDDL_ALLOW,CDDL_ALTDP_EXCD_ID,CDDL_CAP_IND,CDDL_CHG_AMT,CDDL_COINS_AMT,CDDL_CONSIDER_CHG,CDDL_COPAY_AMT,CDDL_DED_AC_NO,CDDL_DED_AMT,CDDL_DIS_PA_LIAB,CDDL_DISALL_AMT,CDDL_DISALL_EXCD,CDDL_DISC_AMT,CDDL_DP_PRICE,CDDL_FROM_DT,CDDL_PAID_AMT,CDDL_PF_PRICE,CDDL_PR_PYMT_AMT,CDDL_PRICE_IND,CDDL_REF_IND,CDDL_RISK_WH_AMT,CDDL_SB_PYMT_AMT,CDDL_SURF,CDDL_TOOTH_BEG,CDDL_TOOTH_END,CDDL_TOOTH_NO,CDDL_TOT_PA_LIAB,CDDL_UNITS,CDDL_UNITS_ALLOW,CGCG_ID,CGCG_RULE,DPCG_DP_ID_ALT,DPDP_ID,DPTC_CD,PDVC_LOBD_PTR,PSDC_ID,UTUT_CD' )
Now if I select data using the query below, the txtcol field displays only 255 characters:
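One quick check, on the assumption that this is a display limit in the query tool rather than actual data loss, is to compare the stored length against what the results pane shows:

-- If the value were really truncated at insert time, LEN would report 255;
-- a larger number means only the display is clipped (for example by the
-- "maximum characters per column" option of the query tool).
SELECT id,
       LEN(txtcol)        AS StoredLength,
       DATALENGTH(txtcol) AS StoredBytes
FROM   t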
I have an MS Access db containing a table called Employees which I am transferring to SQL Server 2005. Everything is working fine. I am using a Foreach File enumerator and uploading the files one by one.
However, I now plan to validate the schema of the MS Access table before uploading it. For example, my Employees table in MS Access is as follows:

Employees: empId int, empName varchar(60), empAge int

Since the files come from different vendors, while looping I want to check whether empId or empAge is not of type long or is null. If they are of type smallint, I have no problem.
However, if their data types are larger than the ones kept in SQL Server, then the file needs to be logged in the db with the reason and moved to the error folder. In short, if the data types in the Access tables are smaller than those in SQL Server, allow it; otherwise reject it.
The schema of the SQL Server table is the same as that of Employees in MS Access.
I want to be able to programmatically set the schema location for an XML source. I first thought it would be a simple task using expressions and variables but it doesn't appear to allow anything in the way of setting it at runtime. Is this possible?
I have an MS Access db containing a table called Employees which I am transferring to a staging table in SQL Server 2005. Everything is working fine. I am using a Foreach File enumerator and uploading the files one by one. However, I now plan to validate the schema of the MS Access table before uploading it. For example, my Employees table in MS Access is as follows:

Employees: empId int, empName varchar(60), empAge int

Since the files come from different vendors, while looping I want to check whether empId or empAge is not of type long or string, etc. If they are of type smallint, I have no problem.
However, if their data types are larger than the ones kept in SQL Server, then the file needs to be logged in the db with the reason and moved to the error folder. In short, if the data types in the Access tables are smaller than those in SQL Server, allow it; otherwise reject it. The schema of the SQL Server table is the same as that of Employees in MS Access.
I want to compare the schema of the incoming Access table fields with my desired schema, and all mdb's having data types that are larger than or incompatible with the desired schema should be moved to the error folder.
Is there a way to configure the XML source component to use a schema file whose filename is in a variable? Now I know it doesn't make any sense to have an XML source whose schema can vary, however, I have a parent package with a ton of child packages, all of which process the same XML input using the same schema. What I would like to do is for the parent to load the filename of the XML input file into a variable, and the filename of the XML schema file into another. Each child would have its XML source configured to pickup these variables from the parent, so that if the location or name of the schema changes, I don't have to edit each package one-by-one. However, it doesn't seem possible to specify anything but a filename for the schema. Any suggestions?
I have a few columns in a table with the default value defined as a zero-length string (''). I want to insert a record from a DetailsView which uses a SqlDataSource as its DataSource. In the ItemInserting event, if the data is not valid, I want to use a zero-length string for the column. But I always get NULL instead of a zero-length string. The code in the ItemInserting event looks like this:

If objddl.SelectedIndex > 0 Then
    e.Values("myFld") = objddl.SelectedItem.Value
Else
    e.Values("myFld") = ""
End If

The line e.Values("myFld") = "" puts NULL in the column. How can I set a column to a zero-length string using the SqlDataSource? Any help is appreciated. Thanks.
I am trying to use DTC to import about 500 records from an Excel sheet into a SQL database.
Some fields contain strings with a length of about 1700 characters.
When I try to execute the DTC-generated code, it gives an error saying that the maximum string length of 255 has been reached, although the column in the database is defined as nchar(1750).
SQL 2000. I need a query that will search a field in the table that is 14 characters long and begins with a number. I have a client that was allowing people to store credit card numbers, plain text, in an application. I need to rip through the table and replace every instance of a credit card number with "x"s and the last 4 digits. I got the replace bit going, but I am stuck on how to search for a string in a field of a specific length. Any ideas? Thanks, -- Mike
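A minimal sketch of the search (and the masking, for completeness), with MyTable and CardField as placeholder names since the real ones aren't given:

-- Placeholder names: MyTable / CardField.
-- Find values that are exactly 14 characters long and start with a digit.
SELECT CardField
FROM   MyTable
WHERE  LEN(CardField) = 14
  AND  CardField LIKE '[0-9]%'

-- Mask them: keep the last 4 digits, replace the first 10 with 'x'.
UPDATE MyTable
SET    CardField = REPLICATE('x', 10) + RIGHT(CardField, 4)
WHERE  LEN(CardField) = 14
  AND  CardField LIKE '[0-9]%'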
I reinstalled SQL Server, set up new connections in my existing project, and then pointed the existing controls in my SSIS package to my new OLE DB connection manager.
Error at Package: The connection "{35FE7FF5-A1F5-4016-8C11-0B88A90AE3F7}" is not found. This error is thrown by Connections collection when the specific connection element is not found.
Error at Package: The connection "{35FE7FF5-A1F5-4016-8C11-0B88A90AE3F7}" is not found. This error is thrown by Connections collection when the specific connection element is not found.
Error at Execute SQL Task [Execute SQL Task]: Connection manager "{35FE7FF5-A1F5-4016-8C11-0B88A90AE3F7}" does not exist.
Error at Execute SQL Task: There were errors during task validation.
We have a single generic SSIS package that is used to import several hundred iSeries tables into SQL. I am not looking to rewrite the process. But I am looking for ways to improve performance.
I have tried RetainSameConnection, Maximum Insert Commit Size, table locking (TABLOCK), removing some large columns, playing with the log file location and size, and now I am working on tweaking DefaultBufferMaxRows.
To describe the data flow: there are six data flow tasks (DFTs) working at the same time. Each DFT has its own list of iSeries tables and columns and the corresponding generic SQL table names. Each DFT determines its list of tables based on the number of columns to import. So there is dft30 (the iSeries table has 1-30 columns to import), dft60 (31-60 columns to import), etc. The destination SQL tables are generically named Staging30, Staging60, etc. Each column in the generic Staging tables is varchar(100). The DFTs are comprised of an OLE DB Source and an OLE DB Destination.
The OLE DB Source uses a SQL Command from Variable to build a SELECT statement. The OLE DB Source uses a connection manager that uses an IBM iAccess IBMDA400 provider. The SQL command ends up looking like this for dft30. This specific example is importing from the iSeries table TDACLR, and it only has two columns to import, so it will be copied to the Staging30 table.
select TCREAS AS C1,TCDESC AS C2,0 AS C3,0 AS C4,0 AS C5,0 AS C6,0 AS C7,0 AS C8,0 AS C9,0 AS C10,0 AS C11,0 AS C12,0 AS C13,0 AS C14,0 AS C15,0 AS C16,0 AS C17,0 AS C18,0 AS C19,0 AS C20,0 AS C21,0 AS C22,0 AS C23,0 AS C24,0 AS C25,0 AS C26,0 AS C27,0 AS C28,0 AS C29,0 AS C30,''TDACLR'' AS T0 from Store01.TDACLR
The OLE DB Source variable value looks like the following, but I am not showing the full 30 columns:
select cast(0 AS varchar(100)) AS C1,cast(0 AS varchar(100)) AS C2,cast(0 AS varchar(100)) AS C3,cast(0 AS varchar(100)) AS C4,cast(0 AS varchar(100)) AS C5, ... cast(0 AS varchar(100)) AS C30.
The OLE DB Destination uses OpenRowSet Using FastLoad From Variable. The insert into Staging30 ends up looking like this.
Of course we then copy and transform the Staging30 data to the SQL table that equals T0.
But back to DefaultBufferMaxRows. Previously the DFTs had default values of 10000 for DefaultBufferMaxRows and 10485760 for DefaultBufferSize. I added a SQL task to SUM the iSeries column sizes, TCREAS and TCDESC in this example, and set DefaultBufferMaxRows by dividing the SUM of the columns' max_length into 10485760. But I did not see a performance improvement. Do you think that redefining the columns as varchar(100) for the insert is significant? Should I possibly SUM the actual number of columns (2) as 2x100, or SUM the full 30x100?
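As a rough sanity check on that arithmetic, and assuming the sizing should be driven by what actually flows through the pipeline (all 30 varchar(100) columns plus T0, since the SELECT casts every column, not just the two the source table fills), a sketch against the destination table's metadata might look like this:

-- Rough sizing sketch: estimate the pipeline row width from the Staging30
-- definition (assumed to live in dbo in the current database) and derive a
-- candidate DefaultBufferMaxRows from the 10485760-byte default buffer size.
DECLARE @DefaultBufferSize int
SET @DefaultBufferSize = 10485760

SELECT SUM(c.max_length)                      AS EstimatedRowBytes,      -- roughly 30 x 100 plus T0
       @DefaultBufferSize / SUM(c.max_length) AS CandidateBufferMaxRows
FROM   sys.columns AS c
WHERE  c.object_id = OBJECT_ID('dbo.Staging30')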