So I have three lookups in a row in my data flow. Basically they are doing data quality checks for me using a reference table.
I want to be able to take the error flows of the three lookups and merge them together (union all) so that I can insert the "errors" (or non matches) into a table.
Can't do it. Because SSIS deems non-matches as "errors" you automatically get the errorCode and errorColumn fields. When you try to union a lookup error output with another lookup's error output, you can't do it.
What I would like to see is a lookup act more like a conditional statment where you have three outputs of a lookup table: match found, no match found, and error. Either that, or I'd like to be able to edit the names of the errorCode and errorColumn fields.
Am I missing something here, or do I need to just add an OLE destination for each lookup error flow when I only want one? 'Course the problem then is that I want to count the number of rows that are in "error" across all of the lookups.
I have a package that has several data flows that run concurrently after some initial tasks and an initial data flow. I want transactions on each of the data flows and have set the transaction option to Required on the data flows (not on the package itself). I am also using checkpoint restart on the package. A couple things are happening.
1) the first data flow is successful and that releases the several that are waiting. Some of these complete OK but inevitably one or two will fail. The failing data flows will be different from run to run, sometimes one and sometimes two will fail. The error says:
Error: 0xC0202009 at Provider_NF_Code, Delete Provider_NF_Code [130]: An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction.".
My hunch is that DTC is getting the transactions mixed up. I think it is committing one just after another data flow has already started work expecting the transaction to still be active. That would explain why the failing data flows are random. Plus, if I set the MaxConcurrentExecutables to 1 the entire package is successful. BUT, why have concurrent tasks if you can't run them concurrently.
2) when the package fails with the DTC problem I restart it with the checkpoint file. I was expecting the package to restart with the failed data flows. Instead, it restarts with the initial data flow (that all the other flows wait for in the package). This data flow has always been successful. It's as if the transactions I have put on the individual data flows are actually placed on a single virtual container that all of them are in, and when down stream data flows fail the entire data flow chain is rolled back and set to restart.
How can I get multiple concurrent data flows to run with transactions?
Why are successful data flows being restarted? Can I get just the failed tasks to restart?
Easy: read a SQL table with 500 fields, transform, write to flat file using SSIS.
But, I have hundreds of transformations to define using Lookups and Aggregates, Derived Column transformations. I wan to group the data flow transformations in usable (reasonable size) groups (packages, containers, subroutines, whatever you want to call it).
I cannot figure out a simple easy way of doing this most "simple" obvious thing.
Am I the only one on the planet who needs to do this?
Good Afternoon, I have a package with reads data from a text file and persist that data in a table "X" on sql server. I have one dataflow wich does this operation. In the sequence, i have this dataflow connected to the next dataflow, wich read the data from the table "X" and do some joins with another tables and persist that new data on a table "Y". The first problem, the table "Y" have a foreign key to table "X" and i have to do this operation with transaction, the package give problem, it's something like "Cannot enlist this connection on distributed transaction...", but i have many packages, wich I exec with transaction and runs sucessfully. If i turn off the transaction, runs perfectly, when i put the transaction,it fails.
I'm using a script task to split my control flow before performing different data flow tasks depending upon the result of the script.
The dataflow tasks populate a single dataset with data before entering a foreach loop where the data in the dataset is consumed.
The foreach loop is valid for both dataflow tasks but I can€™t get it to accept multiple inputs.
I could have 2 copies of the foreach loop, one for each branch, but this seems wasteful and also means I have to maintain the same chunk of code in 2 places.
Can I join control flows or is there a better way to achieve this?
I've 6 data flow tasks in my package. I need to put all of these dataflows into a transaction and rollback if any one of the task fails. I dont want to use MSDTC.
Perhaps one too many 2000 DTS packages have permanently damaged my ability to think clearly - however, I've find myself very frustrated attempting to create a SSIS Data Flow which replaces a very simple 2000 DTS package.
Take data from table1 in database1, put it in table2 in database2. Table2 in Database2 has an additional column as part of the primary key - so I need to add an arbitrary unique value in each row as it's inserted. Previously, I did this in the transformation script through a variable I incremented.
What's the recommend method to do this now - since row level processing of variables seem to be a no-no?
We receive data files from different external customers, and these files have identical layouts.
I'm planning to set up a package for each customer. Each package will contain a flat file source -> OLEDB transformation dataflow, (followed by other customer-specific data flows).
What I'd like to do is just create this dataflow once, parameterising the flat file and table names. Is it possible to include this dataflow in each customer package so that if the flat file layout changes, I can just modify the connection managers in the one place, and then recompile each package to pick up the changes?
I'm stuck here when I try to open an existing package which contains several data flows, SSIS tries to validate each data flow and after a while a Visual Studio error message pops up and I can't do anything.
The error message says : "Unable to cast COM object of type 'System._ComObject' to interface type 'Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSObject90'. This operation failed because the QueryInterface call on Com component for the interface with IID '...GUID...' failed due to the following error : The application called an interface that was marshalled for a different thread. (Exception from HResult: RPC_E_WRONG_THREAD)"
So I have to make a fairly dynamic Data flow. I will get the most of the configuration from a database table. I will look up the name of the procedure to run as a source (I can use expressions or a script component source for this), I will lookup columns names from a database table.I can use expressions (maybe) or a destination script component for the destination including the destination table name and column names, these will be looked up in a database table.What I am not sure is how I will do the mapping. How can I make this dynamic? The logic for mapping will be in the database as well. Could I create a custom dataflow all in one script? A source, destination and mappings all in one script? Is there an example of this out there.my task ios to make the data flow completely dynamic.all config info would be kept in a SQL Server database.A complete custom script component dataflow task.
I have a package set up basically with two consecutive data flows. The first flow takes data from an OLE DB Source and stores it into a Flat File Destination. The second flow uses this same flat file as a source, alters the data, and stores the data in the same flat file, overwriting the old file. I set DelayValidation to True on the flat file. Still, here are the error messages I am receiving:
Error: 0xC020200E at DO, Flat File Destination [7676]: Cannot open the datafile "C:Temp.txt".
Error: 0xC004701A at DO, DTS.Pipeline: component "Flat File Destination" (7676) failed the pre-execute phase and returned error code 0xC020200E.
I am new to SSIS, so I'm sure I have a setting wrong or something. Is the problem that SSIS is trying to write to a file from which it is simultaneously reading data?
I have a package with 10 synchronous dataflows, which, combined, load about 300MB of flat file data to a database. This package would run successfully on 2 of our database servers, but would regularly fail on a third. The server on which it was failing is a 4 processor box with 16GB Ram with Windows Server 2003, SQL 2005, SSIS and SSRS installed - much more robust than one of the others that the package worked on. The SSIS error messages returned alternated between the following (with no apparent reason why one would show up rather than another, though the first was the most common):
"The file name "\Server1Folder1File1.txt" specified in the connection was not valid."
"The file name property is not valid. The file name is a device or contains invalid characters."
"An error occurred while initializing the flat file parser."
For the first error message, the error would report different connection managers and their associated file as invalid from run to run. All of the files across the 10 dataflows resided in the same network folder, and the package would read in and process a few of them before failing, so the problem was definitely not the connection string.
Searching the forums, etc. for these errors provided no useful information - given the real cause of the problem, these error messages are worse than unhelpful, they send you looking in the wrong direction. It was only when trying to track down another problem on the same server that I discovered the issue. When trying to copy database backups greater than 12GB over the network to this server, the operation would fail with an "Insufficient System Resources" message.
Some research led to the discovery that problem was caused by the /3GB switch in the boot.ini file of the server (don't let your Server team use that switch if you have 16GB of memory or more). Removing the switch and setting SQL to utilize AWE, fixed both the file copy problem AND the SSIS package failure problem. The SSIS package failed, not due to a bad connection string, but rather to insufficient server resources (read memory) to handle the simultaneous connections.
I hope this may help any others trying to track down this kind of SSIS package failure.
I will also provide here what I have gleaned about setting up Memory usage for SQL Server 2005 running on 32 bit Windows Server 2003 (with the caveat that I am no expert €“ corrections and additional information are welcome).
The following links got me started in my research (thanks to the folks who provided such useful information): http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=55191 http://articles.techrepublic.com.com/5100-10878_11-6091280.html http://www.simple-talk.com/community/blogs/brian_donahue/archive/2007/09/30/37747.aspx http://blogs.technet.com/askperf/archive/2007/03/23/memory-management-demystifying-3gb.aspx http://www.modhul.com/2007/11/10/optimising-system-memory-for-sql-server-part-i/
Also, search BOL for: Server Memory Options Enabling Memory Support for Over 4 GB of Physical Memory Enabling AWE Memory for SQL Server
Windows Server 2003 provides access to 4GB of virtual address space. By default, 2GB is assigned to the OS and 2GB to applications. This default can be change to 1GB for the OS and 3GB for applications by the use of the /3GB switch in the boot.ini file.
Physical memory over 4GB can be addressed by enabling Physical Addressing Extensions (PAE), which is done by setting the /PAE switch in the boot.ini file. This does not increase the systems virtual address space, rather it increases the size of the page table (which is maintained within the virtual address space), adding entries to reference the physical memory above 4GB.
It is important to note that these two switches are not interdependent (they do different things and you can turn each on or off regardless of the others status), though the combination of them has an impact on server performance and the maximum amount of physical memory which can be addressed.
The /3GB switch only impacts the allocation of the first 4GB of memory (virtual address space) between the OS and applications (default 50/50 % split, with switch on - 25% OS and 75% applications). The /PAE switch enables the system to reference/manage physical memory above 4GB, but does not alter the allocation percentages of the first 4GB of memory between the OS and applications. However, when PAE is enabled, the OS requires more memory within the first 4GB to manage the physical memory above 4GB (due to increased page table entries). With the /3GB switch, the OS has only 1GB of virtual address space, and only enough space to manage a total of 16GB of physical memory. If 32GB of physical memory is installed, 16GB of it will go to waste.
Address Windowing Extensions (AWE) is an API that allows an application to address more than the 2-3GB of memory that is available to applications within the virtual address space (first 4GB of memory). SQL Server can utilize AWE to take advantage of memory above the first 4GB that is made available via PAE, and can even reserve portions for its own use. I believe (though I can€™t remember where I got this bit) that SQL utilizes AWE memory only for the page cache (buffer pool €“ which seems to be a misnomer), and not for other operations.
To enable AWE, see the BOL references above.
The big question: what are the recommended settings for all of these? That all depends on what you have running on the server. You need to leave space for the OS, SQL Server and any other applications you have.
The hard and fast rules: If you have more than 4GB of RAM, you must use the /PAE switch in order to take advantage of it. If you have more than 16GB of RAM, you must NOT use the /3GB switch in order to take advantage of it.
Based on anecdotal evidence, I€™ve noticed the following generally recommended guidelines €“ assuming the server is dedicated to SQL.
Use of the /3GB switch seems to be a generally accepted practice if you have 8GB of RAM or less. For between 8 and 16GB, some say never use the /3GB switch, others say you can use it up to 12GB and still others up to 16GB. I interpret this to mean that it all depends on what types of loads are being placed on the server and that testing on individual servers will be required to determine whether or not to use the switch. Certainly that was my experience - the /3GB switch worked fine with 16GB RAM, until the server encountered a certain workload. For me, no more /3GB switch.
For setting SQL to use AWE, most seem to agree that it should be enabled if you have more than 4GB RAM. The setting of max server memory is more complicated. BOL seems to suggest (the €˜Server Memory Options€™ entry) a formula of Total Physical Memory minus 1-2GB for the operating system. Based on a desire to be a bit more conservative, I am now using the following formula:
max server memory = total physical memory
minus
4GB for the OS and application processes (since the AWE memory is utilized for page cache, not SQL processes)
minus
AWE memory required by other applications, including other instance of SQL Server
If anyone has additional insight, or a more refined equation, I could certainly benefit from it.
Hi, My process passing 1,000,000 rows to a data flow with about 20 lookups to get the keys that I wantted . Most lookups have small number of rows except one with over 5,000,000 rows. I cannot get the process to run (the process hanged) probably because of memory issue. Any clue where/how I can tune it. Thanks
I want to do a lookup on date column. My lookup date is of type smalldatetime, and my date is of type datetime (date with time component). My lookup is failing because of incompatible data types.
How do I perform the lookup with date columns having date and time components?
Im having a little diff speeding up the below process
I have two tables table a contains 300,000 rows which are unique however the identifier can appear more than once. The fields im interested in are the identifier and a date/time field.
Table two also contains an identifier and a date field and again can contain multiple instances of the identifier with a variyity of dates.
For each row in table a i want to review the date and look up table b for the first date greater then or equal to the date linking by the identifier.
i have managed to do this via the code below however it takes 45 mins and i want to speed this up.
Select a.*, (select Min(DateB) as DateB From #tableB b where a.identifier = b.identifier and b.DateB >= a.DateA) asDATE From #TableA a
I have a source table with few million rows in it. As a part of transformation, I need around 10 lookups in 10+ different tables, all of them having few million rows each. I am looking for an approach that would be reasonably speedy and easy to manage future changes.
Here are some of the things that I have tried... (1) If I implement as lookup components, they cause developer machine to go really slow and takes forever to run. (2) I tried having OLE DB Source query to fetch required data up front. But the source query becomes very complicated which will be even harder for future changes. And this big query cause SQL Server to become unresponsive. (3) Update queries on target table are also causing server to be unresponsive.
What would you guys suggest for this type of implementation?
I have some SQL experience, but nothing past basic commands. I'm trying to take some data held by an application to use as CSV import into another application.I have two tables from an application, one holds references made in another.The first tables holds details about a person:
field1=name field2=age field3=country
Joe,50,1
Country is held as a number, then there is another table that holds all the countries:
Hi, We use lookups to join a few huge tables in SSIS (each has more than 40 million rows). The process took almost two days to complete when we select partial load on lookups. It stops/locks if we select full load on lookups.
We have a 32bit server so SSIS uses only 2-3GB of available memory no matter how big RAM we have. It seems the best solution for my problem is to move to 64bit server so SSIS uses up to 16GB of Ram.
For now I am researching for a remedy solution to get better performance from our current environment while we are waiting for the big server.
I’d like to hear your thoughts and options that may improve the performance of our package. Dose partitioning help? What else could be helpful?
I have a dimension table for Retail Order Size. Each row in the dimension has a Starting Value and Ending Value column pair. In TSQL, the correct RetailOrderSize key is found by using the BETWEEN statement, like so:
SELECT RetailLevelKEY
FROM dbo.DimRetailOrderSize
WHERE @Sec1Retail BETWEEN StartingValue AND EndingValue
Is there a Data Flow Task Transformation in SSIS that replicates this functionality, or some other way of getting to the same answer in SSIS?
When do a fact table load...I have to perform lookups against the dimension tables. The dimensions tables I have support slow changes, however, and thus have multiple rows for a single legacy key under different effective start and end dates. In order to do this lookup, I have to not just join on the legacy key, but also validate that the date of the transaction I'm loading is between the effective date range of the dimension item.
It seems the Lookup task only supports equijoins. Am I missing something here? How is this accomplished if you can use greater than or equal to and less than type join conditions?
Ive got an ETL process I have written which takes about 10 million rows from a staging database and loads it into production database with an INSERT statement. The INSERT statement makes a function call to retrieve the surrogate key for each row. The function looks in a replicated copy of our production database so no load is on our production environment during this time.
So: INSERT INTO foo(...) SELECT name, address, zip, dbo.fnGetSurrKey( name, address)
It took about 12hrs to insert 6 million rows last night and Im wondering if there is a better way of doing this. Maybe a multithreaded way like SSIS might have.
Assuming my function is optimized as much as possible, does anyone have any tips for speeding this up?
Also, the machine this ran on has 16gb of RAM but was setup to use only 2GB during this process. I have already changed it to 12gb and restarted the process a week ago, but the change doesnt take affect until you reboot. Would I see a significant performance increase from that?
In many of my packages I have to translate an organizational code into a surrogate key. The method for translating is rather convoluted and involves a few lookup tables. (For example, lookup in the OrgKey table. If there is a match, use that key; if not, do a lookup of the first 5 characters in the BUKey table. If there is a match, use that key; if not, do a lookup of the first 2 characters... You get the idea.)
Since many of my packages use this same logic, I would like to consolidate it all into one custom transformation. I assume I can do this with a script transform, but then I'd lose all the caching built into the lookup transforms.
Should I just bite the bullet, and copy and paste the whole Rube Goldberg contraption of cascading lookup transforms into each package? Or is there a better solution I'm overlooking?
In the dataflow of my package, I must check from one table whether a row exists, and if that row exists, I should get some other row from another table, and update that..
I think to check whether a row exists, i should use "Look Up"
But cant we pass parameters to LookUP?
I am trying to use this SQL:
SELECT count(*) FROM ServicePackets where ID = ? and CHANGEDATE > ? and status = 1
I should get if that row exists or not only... (true or false)
OK I have this table I am grbbing from Oracle and I need to take selected columns and do a value lookup against another table: IE Here is a list of fields I get from Oracle:
In this example it is only 2 but in other conversions it could be 20 or more... Now here is my select statement for each Field (The ? being the FIELD from before IE Status or Responsibility):
The is for Status:
Code Snippet SELECT VALUE AS STATUS_VALUE FROM Field_Values WHERE (NAME = 'Project Name') AND (ENUMID = ?) AND (FIELDNAME = 'Status')
This is for Responsibility:
Code SnippetSELECT VALUE AS RESPONSIBILITY_VALUE FROM Field_Values WHERE (NAME = 'Project Name') AND (ENUMID = ?) AND (FIELDNAME = 'Responsibility')
So for this the way I am doing it now is I have 2 "Lookup" Components setup... It works fine... However as I said when I get say 20 or so it gets really tiresome. I was wondering if I could feed it a EXCEL or XML file saying these are the fields that need a value lookup where:
FIELD - Project - FIELDNAME VALUE - OUTPUT VALUE
So with this example I would have a file saying something like this: STATUS - Project Name - Status - STATUS_VALUE RESPONSIBILITY - Project Name - Responsibility - RESPONSIBILITY_VALUE
Then it runs whatever and returns the *_VALUE for each row it goes through... Any suggestions?
Why can you not turn off the caching in a Lookup against Oracle?
I have an exceedingly complicated SQL statement like this -
SELECT OBJECT_ID, OBJECT_CODE FROM OBJECT_TABLE
If I turn off the cache for a lookup I get bombarded with this rubbish-
Error 8 Validation error. DFT Load STATUS: LKP Get RESULT_NO [128]: An OLE DB error has occurred. Error code: 0x80040E14. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E14 Description: "ORA-00933: SQL command not properly ended ". Update.dtsx 0 0
Error 9 Validation error. DFT Load DAY_STATUS: LKP Get RESULT_NO [128]: OLE DB error occurred while loading column metadata. Check SQLCommand and SqlCommandParam properties. Update.dtsx 0 0
I have tried modifying the Cache SQL Statement as well, but to no avail. I am using the MSDAORA.1 provider against "Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production".
I had used lookup on DIM table to get my SUK and if I use union transformation to get the out put from each lookup and then loading the data with some condition the data in my fact is not loading in a proper format.
The union transformation is splitting the out put in to different records
Please do inform me about which transformation should be used to get the data from lookup tables.
Or please do inform me the approach to load the fact table in SSIS.
I€™m basically INFORMATICA resource and I€™m implementing in terms of INFORMATICA
Hello, I'm trying to clean my data using fuzzy lookup algorithm though SSIS, but i get null values everywhere. This is what i did:
I applied the fuzzy lookup in a table (tblValues). As source table i have the tblValues, and as reference table in Fuzzy Lookup i have the tblValues as well, resulting null values in all fields/columns.
Do i have to create my own reference table? If yes, how do i do that and what values will i have in this table?I didn't understand how the reference table must be in order the algorithm to work. Any suggestions?
Our existing DW's ETL was written in a very complex fashion by the previous team. They use DTS package lookups to read a row in the Source SQL Server database see if that row exists in the taget SQL Server database. If the row does not exist, they use ActiveX scripts to INSERT the row in the target SQL Server database. If it exists, they update the row on the target side. How would you do this in SSIS? Apologize if this sounds like a basic question, however, I would have done this via Stored Procedures or SQL Scripts especially since it involves SQL Servers alone. Appreciate any help.
I notice that SQL Server 2005 creates worktables where SQL 2000 does not. Often these work tables appear in STATISTICS IO, but they show a 0 scan count and 0 logical reads. These worktables often appear to be substituted for bookmark lookups.
Has the optimizer decided to use worktables instead of bookmark lookups (often resulting in a higher cost plan)?