OK, I have this table I am grabbing from Oracle, and I need to take selected columns and do a value lookup against another table. For example, here is a list of the fields I get from Oracle:
In this example there are only 2, but in other conversions it could be 20 or more... Now here is my select statement for each field (the ? being fed from the field in question, i.e. Status or Responsibility):
This is for Status:
SELECT VALUE AS STATUS_VALUE
FROM Field_Values
WHERE (NAME = 'Project Name') AND (ENUMID = ?) AND (FIELDNAME = 'Status')
This is for Responsibility:
SELECT VALUE AS RESPONSIBILITY_VALUE
FROM Field_Values
WHERE (NAME = 'Project Name') AND (ENUMID = ?) AND (FIELDNAME = 'Responsibility')
So for this, the way I am doing it now is I have 2 "Lookup" components set up... It works fine... However, as I said, when I get say 20 or so it gets really tiresome. I was wondering if I could feed it an Excel or XML file saying these are the fields that need a value lookup, where:
FIELD - PROJECT - FIELDNAME - OUTPUT VALUE
So with this example I would have a file saying something like this:
STATUS - Project Name - Status - STATUS_VALUE
RESPONSIBILITY - Project Name - Responsibility - RESPONSIBILITY_VALUE
Then it runs whatever and returns the *_VALUE for each row it goes through... Any suggestions?
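(A hedged sketch of one possibility, not tested against this schema: if the mapping file were loaded into a table - say a hypothetical Lookup_Map(FIELD, PROJECT, FIELDNAME, OUTPUT_NAME) - a single reference query could serve every field instead of one Lookup component per field:)

-- Lookup_Map is a hypothetical table holding the rows from the Excel/XML file
SELECT fv.FIELDNAME, fv.ENUMID, fv.VALUE
FROM Field_Values fv
JOIN Lookup_Map m
  ON m.PROJECT = fv.NAME
 AND m.FIELDNAME = fv.FIELDNAME

(The data flow would then match on the (FIELDNAME, ENUMID) pair rather than needing a separate component per field.)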
Hi guys, I have installed SQL Server 2005 and the client on my machine, and just after the installation/configuration I am getting tons of entries in the Event Log from the .NET Runtime Optimization Service, such as:

.NET Runtime Optimization Service (clr_optimization_v2.0.50727_32) - Began compiling: Microsoft.SqlServer.SqlTDiagM, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91
.NET Runtime Optimization Service (clr_optimization_v2.0.50727_32) - Succesfully compiled: Microsoft.SqlServer.msxml6_interop, Version=6.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91

The same Began compiling / Succesfully compiled pairs repeat for Microsoft.SqlServer.DTSRuntimeWrap, ManagedDTS, ForEachSMOEnumerator, DtsMsg, DTSPipelineWrap, PipelineHost, and SqlTDiagM (all Version=9.0.242.0, same PublicKeyToken) - almost 10 messages each second... Can anybody please help me get rid of this? Thanks & Regards, Vishal Sharma
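(A hedged aside on the cause: these entries come from the .NET native-image generator, ngen.exe, precompiling the newly installed assemblies in the background, and they normally stop on their own once its queue is drained. If that diagnosis fits, the queue can be forced to finish immediately from a command prompt:

%WINDIR%\Microsoft.NET\Framework\v2.0.50727\ngen.exe executeQueuedItems)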
Hi, my process passes 1,000,000 rows to a data flow with about 20 lookups to get the keys that I wanted. Most lookups have a small number of rows, except one with over 5,000,000 rows. I cannot get the process to run (it hangs), probably because of a memory issue. Any clue where/how I can tune it? Thanks
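(A hedged tuning note: the Lookup's cache holds every column its reference query returns, so for the 5,000,000-row lookup it may help to replace the table reference with a query that returns only the join column and the key you need - names below are assumptions:)

-- shrink the cached reference set to two narrow columns
SELECT BusinessKey, SurrogateKey
FROM dbo.BigDim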
I want to do a lookup on a date column. My lookup date is of type smalldatetime, and my date is of type datetime (a date with a time component). My lookup is failing because of the incompatible data types.
How do I perform the lookup with date columns having date and time components?
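(A hedged sketch: one common fix is to cast inside the Lookup's reference query so both sides are datetime, and to truncate the time portion of the incoming column before matching. Table and column names are assumptions:)

-- reference query for the Lookup: cast the smalldatetime up to datetime
SELECT CAST(LookupDate AS datetime) AS LookupDate, SomeKey
FROM dbo.LookupTable

-- source-side query: truncate the incoming datetime to midnight so it can match
SELECT DATEADD(dd, DATEDIFF(dd, 0, TranDate), 0) AS TranDate, OtherColumn
FROM dbo.SourceTable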
I'm having a little difficulty speeding up the process below.
I have two tables. Table A contains 300,000 rows, which are unique; however, the identifier can appear more than once. The fields I'm interested in are the identifier and a date/time field.
Table B also contains an identifier and a date field, and again can contain multiple instances of the identifier with a variety of dates.
For each row in Table A, I want to review the date and look up in Table B the first date greater than or equal to it, linking by the identifier.
I have managed to do this via the code below; however, it takes 45 minutes and I want to speed this up.
SELECT a.*,
       (SELECT MIN(b.DateB)
        FROM #TableB b
        WHERE b.Identifier = a.Identifier
          AND b.DateB >= a.DateA) AS DateB
FROM #TableA a
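(A hedged suggestion: with a correlated MIN like this, most of the 45 minutes is typically spent scanning #TableB once per outer row. An index that covers the correlation turns each probe into a seek; the names match the query above:)

-- build once before running the query; lets MIN(DateB) resolve with an index seek
CREATE CLUSTERED INDEX IX_TableB_Identifier_DateB ON #TableB (Identifier, DateB);

(Rewriting the subquery as OUTER APPLY usually produces the same plan, so the index is the important part.)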
I have a source table with a few million rows in it. As part of the transformation, I need around 10 lookups into 10+ different tables, all of them having a few million rows each. I am looking for an approach that would be reasonably speedy and easy to manage through future changes.
Here are some of the things that I have tried: (1) If I implement them as lookup components, they cause the developer machine to slow to a crawl and take forever to run. (2) I tried having the OLE DB Source query fetch the required data up front, but the source query becomes very complicated, which will make it even harder to change later, and this big query causes SQL Server to become unresponsive. (3) Update queries on the target table also cause the server to become unresponsive.
What would you guys suggest for this type of implementation?
I have some SQL experience, but nothing past basic commands. I'm trying to take some data held by one application and use it as a CSV import into another application. I have two tables from an application; one holds references to values defined in the other. The first table holds details about a person:
field1=name field2=age field3=country
Joe,50,1
Country is held as a number, then there is another table that holds all the countries:
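(A hedged sketch of the lookup join, with all table and column names assumed since the countries table isn't shown - say Person(name, age, country) and Country(id, name):)

-- replaces the numeric country code with its name for the CSV export
SELECT p.name, p.age, c.name AS country
FROM Person p
JOIN Country c ON c.id = p.country;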
Hi, we use lookups to join a few huge tables in SSIS (each has more than 40 million rows). The process took almost two days to complete when we selected partial load on the lookups. It stops/locks if we select full load on the lookups.
We have a 32-bit server, so SSIS uses only 2-3 GB of the available memory no matter how much RAM we have. It seems the best solution for my problem is to move to a 64-bit server so SSIS can use up to 16 GB of RAM.
For now I am researching a remedial solution to get better performance from our current environment while we are waiting for the big server.
I'd like to hear your thoughts on options that may improve the performance of our package. Does partitioning help? What else could be helpful?
I have a dimension table for Retail Order Size. Each row in the dimension has a Starting Value and Ending Value column pair. In T-SQL, the correct RetailOrderSize key is found by using a BETWEEN predicate, like so:
SELECT RetailLevelKEY
FROM dbo.DimRetailOrderSize
WHERE @Sec1Retail BETWEEN StartingValue AND EndingValue
Is there a Data Flow Task Transformation in SSIS that replicates this functionality, or some other way of getting to the same answer in SSIS?
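(A hedged note: the Lookup transformation's cache only supports equality matches, but a Lookup in partial-cache mode - "Enable memory restriction" on the Advanced tab in SSIS 2005 - lets you supply a parameterized statement, which can be edited into a range match. The ? below would be mapped to the incoming Sec1Retail column:)

SELECT RetailLevelKEY
FROM dbo.DimRetailOrderSize
WHERE ? BETWEEN StartingValue AND EndingValue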
When doing a fact table load, I have to perform lookups against the dimension tables. The dimension tables I have support slowly changing dimensions, however, and thus have multiple rows for a single legacy key under different effective start and end dates. To do this lookup, I have to not just join on the legacy key, but also validate that the date of the transaction I'm loading falls between the effective date range of the dimension item.
It seems the Lookup task only supports equijoins. Am I missing something here? How is this accomplished if you can't use greater-than-or-equal-to and less-than type join conditions?
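(A hedged sketch of the same range join done set-based in the source query, since the Lookup's cached match is equality-only; table and column names are assumptions:)

-- resolves the SCD surrogate key where an equijoin can't
SELECT f.*, d.DimKey
FROM dbo.StagingFact f
JOIN dbo.DimTable d
  ON d.LegacyKey = f.LegacyKey
 AND f.TranDate >= d.EffectiveStartDate
 AND f.TranDate <  d.EffectiveEndDate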
I've got an ETL process I have written which takes about 10 million rows from a staging database and loads them into a production database with an INSERT statement. The INSERT statement makes a function call to retrieve the surrogate key for each row. The function looks in a replicated copy of our production database, so no load is put on our production environment during this time.
So: INSERT INTO foo(...) SELECT name, address, zip, dbo.fnGetSurrKey(name, address)
It took about 12 hours to insert 6 million rows last night, and I'm wondering if there is a better way of doing this. Maybe a multithreaded way like SSIS might have.
Assuming my function is optimized as much as possible, does anyone have any tips for speeding this up?
Also, the machine this ran on has 16 GB of RAM but was set up to use only 2 GB during this process. I changed it to 12 GB and restarted the process a week ago, but the change doesn't take effect until you reboot. Would I see a significant performance increase from that?
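(A hedged sketch: a scalar UDF runs once per row, which is usually what makes a 6-million-row INSERT crawl. If dbo.fnGetSurrKey is essentially a keyed read of one table in the replicated database, joining that table directly keeps the whole statement set-based. Database, table, and column names below are assumptions:)

-- set-based surrogate-key resolution instead of 6 million function calls
INSERT INTO dbo.foo (name, address, zip, SurrKey)
SELECT s.name, s.address, s.zip, k.SurrKey
FROM StagingDB.dbo.SourceRows s
JOIN ReplicaDB.dbo.SurrKeys k
  ON k.name = s.name
 AND k.address = s.address;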
So I have three lookups in a row in my data flow. Basically they are doing data quality checks for me using a reference table.
I want to be able to take the error flows of the three lookups and merge them together (union all) so that I can insert the "errors" (or non matches) into a table.
Can't do it. Because SSIS deems non-matches to be "errors", you automatically get the ErrorCode and ErrorColumn fields, and when you try to union one lookup's error output with another lookup's error output, it isn't allowed.
What I would like to see is a lookup that acts more like a conditional split, where you have three outputs of a lookup: match found, no match found, and error. Either that, or I'd like to be able to edit the names of the ErrorCode and ErrorColumn fields.
Am I missing something here, or do I need to just add an OLE DB destination for each lookup error flow when I only want one? 'Course the problem then is that I want to count the number of rows that are in "error" across all of the lookups.
In many of my packages I have to translate an organizational code into a surrogate key. The method for translating is rather convoluted and involves a few lookup tables. (For example, lookup in the OrgKey table. If there is a match, use that key; if not, do a lookup of the first 5 characters in the BUKey table. If there is a match, use that key; if not, do a lookup of the first 2 characters... You get the idea.)
Since many of my packages use this same logic, I would like to consolidate it all into one custom transformation. I assume I can do this with a script transform, but then I'd lose all the caching built into the lookup transforms.
Should I just bite the bullet, and copy and paste the whole Rube Goldberg contraption of cascading lookup transforms into each package? Or is there a better solution I'm overlooking?
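(A hedged alternative sketch: the cascade can also be expressed as one set-based pass with LEFT JOINs and COALESCE, which is easy to share across packages as a view or procedure; table and column names are assumptions based on the description:)

-- first match wins: full code, then first 5 characters, then first 2
SELECT s.OrgCode,
       COALESCE(o.SurrogateKey, b5.SurrogateKey, b2.SurrogateKey) AS OrgSurrogateKey
FROM dbo.Staging s
LEFT JOIN dbo.OrgKey o ON o.OrgCode = s.OrgCode
LEFT JOIN dbo.BUKey b5 ON b5.Code = LEFT(s.OrgCode, 5)
LEFT JOIN dbo.BUKey b2 ON b2.Code = LEFT(s.OrgCode, 2);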
In the data flow of my package, I must check whether a row exists in one table, and if that row exists, I should get some other row from another table and update that.
I think that to check whether a row exists, I should use a Lookup.
But can we pass parameters to a Lookup?
I am trying to use this SQL:
SELECT count(*) FROM ServicePackets where ID = ? and CHANGEDATE > ? and status = 1
I only need to get whether that row exists or not (true or false).
Why can you not turn off the caching in a Lookup against Oracle?
I have an exceedingly complicated SQL statement like this -
SELECT OBJECT_ID, OBJECT_CODE FROM OBJECT_TABLE
If I turn off the cache for a lookup, I get bombarded with this rubbish:
Error 8 Validation error. DFT Load STATUS: LKP Get RESULT_NO [128]: An OLE DB error has occurred. Error code: 0x80040E14. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E14 Description: "ORA-00933: SQL command not properly ended ". Update.dtsx 0 0
Error 9 Validation error. DFT Load DAY_STATUS: LKP Get RESULT_NO [128]: OLE DB error occurred while loading column metadata. Check SQLCommand and SqlCommandParam properties. Update.dtsx 0 0
I have tried modifying the Cache SQL Statement as well, but to no avail. I am using the MSDAORA.1 provider against "Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production".
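(A hedged diagnosis: with caching disabled, the Lookup runs the parameterized statement stored in its SqlCommandParam property, which SSIS generates with T-SQL-style bracketed aliases, something like:

select * from (SELECT OBJECT_ID, OBJECT_CODE FROM OBJECT_TABLE) [refTable]
where [refTable].[OBJECT_ID] = ?

Oracle rejects the square brackets, hence ORA-00933. If that is what the property contains, editing SqlCommandParam into Oracle-legal syntax is reported to help, e.g.:

select * from (SELECT OBJECT_ID, OBJECT_CODE FROM OBJECT_TABLE) refTable
where refTable.OBJECT_ID = ?)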
I used a lookup on a DIM table to get my SUK (surrogate key), and when I use a Union transformation to get the output from each lookup and then load the data with some condition, the data in my fact table does not load in the proper format.
The Union transformation is splitting the output into different records.
Please inform me which transformation should be used to get the data from the lookup tables,
or please inform me of the approach to load the fact table in SSIS.
I'm basically an INFORMATICA resource and I'm thinking of the implementation in INFORMATICA terms.
Hello, I'm trying to clean my data using the fuzzy lookup algorithm through SSIS, but I get null values everywhere. This is what I did:
I applied the fuzzy lookup to a table (tblValues). As the source table I have tblValues, and as the reference table in Fuzzy Lookup I have tblValues as well, resulting in null values in all fields/columns.
Do I have to create my own reference table? If yes, how do I do that, and what values will I have in this table? I didn't understand what the reference table must contain in order for the algorithm to work. Any suggestions?
Our existing DW's ETL was written in a very complex fashion by the previous team. They use DTS package lookups to read a row in the source SQL Server database and see if that row exists in the target SQL Server database. If the row does not exist, they use ActiveX scripts to INSERT the row into the target SQL Server database. If it exists, they update the row on the target side. How would you do this in SSIS? Apologies if this sounds like a basic question; however, I would have done this via stored procedures or SQL scripts, especially since it involves SQL Server alone. Appreciate any help.
I notice that SQL Server 2005 creates worktables where SQL Server 2000 does not. Often these worktables appear in STATISTICS IO, but they show a 0 scan count and 0 logical reads. These worktables often appear to be substituted for bookmark lookups.
Has the optimizer decided to use worktables instead of bookmark lookups (often resulting in a higher-cost plan)?
I need to do a 4-column lookup against a large table (1 million rows) that contains 4 different record types. The first lookup will match on columns A, B, C, and D. If no match is found, I try again with columns A, B, C, and '99' in column D. If no match, I try again with columns A, B, D, and '99' in column C. Finally, if no match on any of the above, I use column A with '99' in B, '99' in C, and '99' in D. I will retrieve 2 columns from the lookup table.
My thought is that breaking this sequence out into 4 different tables/lookups would be most efficient. The other option would be to write a script that handles this logic in a single transform with an in-memory table. My concern is that the size of the table would be too large to load into memory.
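(A hedged set-based sketch of the same fallback logic in one pass - the scoring in the ORDER BY prefers an exact match, then '99' in D, then '99' in C, then the all-wildcard row; table and column names are assumptions:)

SELECT s.*, k.RetCol1, k.RetCol2
FROM dbo.Source s
OUTER APPLY (
    SELECT TOP (1) t.RetCol1, t.RetCol2
    FROM dbo.LookupTable t
    WHERE t.A = s.A
      AND t.B IN (s.B, '99')
      AND t.C IN (s.C, '99')
      AND t.D IN (s.D, '99')
    -- lower score = more specific match; exact = 0, all-wildcard = 7
    ORDER BY CASE WHEN t.B = s.B THEN 0 ELSE 4 END
           + CASE WHEN t.C = s.C THEN 0 ELSE 2 END
           + CASE WHEN t.D = s.D THEN 0 ELSE 1 END
) k;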
Are there any particular throughput/network/memory advantages to using SQL Compact files as lookup reference table sources, particularly for static or largely static data, and when the SSIS package execution servers are running jobs remotely, i.e., not "on the database server"?
I've been experimenting with SQL Compact as an OLE DB lookup source for reference data (business key => surrogate key), for example, using an OLE DB connection manager with the following connection string:
Data Source="C:\ISRoot\Cache\Cache.sdf";Provider=microsoft.sqlserver.mobile.oledb.3.0;
All I am trying to do is clean and standardize the data during the ETL process using the "Lookup Data Flow Transformation" in SSIS... I am able to clean data by replacing the values in columns with values from a reference table, using an exact lookup to locate values in the reference table. What I would like to do is, if there is NO exact match, replace the value with e.g. zero or some other value which means "no reference data available". How do I do this? Any help is much appreciated. Thanks, Manojkumar
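(A hedged sketch: in the data flow this is usually done by redirecting the Lookup's error output, setting the default with a Derived Column on that path, and recombining with a Union All. The set-based equivalent, with assumed names, is just a LEFT JOIN plus ISNULL:)

-- rows with no reference match get 0 instead of failing the lookup
SELECT s.DirtyValue, ISNULL(r.CleanValue, 0) AS CleanValue
FROM dbo.Source s
LEFT JOIN dbo.Reference r ON r.DirtyValue = s.DirtyValue;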
I'd like to use SSIS on a certain project, but I am concerned that one of my transformations needs lookup results to be based on actions taken on previous lookups, and that the toolkit doesn't really offer something like that.
So, I have a data flow whose first component extracts certain kinds of data from an XML document.
Each row returned by the latter needs a lookup, but the results of that lookup may dictate a certain kind of update. The next row's lookup may need to be influenced by the previous row's update.
So I think I have two challenges: 1) combining a lookup and an update, and 2) making sure the buffer architecture completes one lookup-and-update before the next lookup begins.
I have a situation where I need to check 100+ columns in the data flow against lookup values to make sure all values are valid, and wanted to take a poll. Would it be better to:
1) Load the data into a working table and use a traditional stored procedure (either NOT IN or LEFT OUTER JOIN where x is null) to weed out my bad values against my lookup table. Example:
SELECT a.Col1, b.Col2 FROM Table1 a LEFT OUTER JOIN Table2 b ON (a.JoinCol = b.JoinCol) WHERE b.JoinCol IS NULL
This results in poor performance because my temp work table is not really optimized for joins over to the lookup tables, and I have so many columns that I don't really want to add all those indexes - my thought was that the index builds would take longer than the table scans.
OR
2) Use a huge number of lookup transforms in my data flow and keep it all in SSIS.
#1 is easier to maintain (in my opinion) for future purposes but slower, because I don't want to deal with indexes on the work table since it will be highly volatile. So - less cumbersome but slower.
#2 will be more difficult to maintain because of the sheer number of lookups (since I can't change the SQL statement at run time, I have to put them all in separate lookups). It will probably run faster, though, because I won't have to transfer the data to the database and will also avoid the table scans from #1. So - more cumbersome but faster.
Let's say I have 4 columns coming from my OLE DB source.
Column1 Column2 Column3 Column4
I also have a table that I'll be using in a lookup, LUPTable. In LUPTable, I have two fields: LUPField and ReplaceField.
In my data flow, I need to take columns 2-4 and look them up against LUPField in LUPTable. I then need to add the value of ReplaceField (when a match is found) into the data flow.
The problem that I'm running into is that I don't want to do the lookups sequentially in the data flow, because that's just a waste of time/memory. I only need to build the in-memory lookup table once, because that exact same data (it is static, for the most part) will be used for the remaining lookups.
What is the best way to achieve this?
The goal is to have the following columns remaining in the data flow: Column1, NewColumn2 (containing the value from ReplaceField), NewColumn3 (containing the value from ReplaceField), NewColumn4 (containing the value from ReplaceField). Columns 2-4 can be dropped from the data flow after the lookups.
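(A hedged set-based sketch of the same result - three joins against the one table, so SQL Server reads LUPTable into the buffer pool once and the "cache" is effectively built a single time. The source is assumed to be queryable as dbo.SourceTable:)

SELECT src.Column1,
       r2.ReplaceField AS NewColumn2,
       r3.ReplaceField AS NewColumn3,
       r4.ReplaceField AS NewColumn4
FROM dbo.SourceTable src
LEFT JOIN dbo.LUPTable r2 ON r2.LUPField = src.Column2
LEFT JOIN dbo.LUPTable r3 ON r3.LUPField = src.Column3
LEFT JOIN dbo.LUPTable r4 ON r4.LUPField = src.Column4;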
I am just getting started studying SSIS with Kirk Haselden's "Integration Services" book. The problem I am trying to solve would seem easy enough to solve in code, but I am still early in the book and would like to focus on the aspects of SSIS that would help me expedite this, or to find out early whether what I need to do cannot easily be done.
The problem itself is simple enough: I have a database of roughly 100 tables. Ignoring the poor normalization in the database for the moment, my more pressing problem is that I need to rekey all of the main OLTP tables from a mashup of different key schemes to UNIQUEIDENTIFIERs. For example, the Client table is presently keyed on an INT, Client Number. The ClientFile table is keyed as FileType = NVARCHAR(2), ClientNumber INT, FileNumber INT (an incrementing, meaningless number). Child tables of ClientFile have the same key structure as ClientFile, plus yet another (incrementing, meaningless) INT, and so on. I would like to know if, how, and where I should be looking to convert the Client table to a UNIQUEIDENTIFIER key, then do the same for ClientFile - make its key also a GUID and add a reference to the new Client table's GUID key as a foreign key in the ClientFile table - and on and on like that. The Client is at the top of the food chain.
In essence, I would like every table's key to be called ID and be a UNIQUEIDENTIFIER (ROWGUIDCOL), and I would like, for example, the ClientFile table to reference the Client table with a column named ClientID. I would like ClientFile's children to have a foreign key called ClientFileID, and their own keys to be ID (ROWGUIDCOL).
There are also several lookups in each table where, of course, the actual string values were stored instead of a key to the value (i.e., the full state or country name instead of a code from a state or country lookup table) that I need to convert to something more sensible, like replacing the state name with a state code and the country name with a country code, linking to the appropriate respective tables. :) In fact, some of the values I need to break out from columns could also be keyed with ROWGUIDs, and I would much prefer to use those rather than string or INT keys.
Other than those problems, most of the rest of the data in those tables could essentially be a straight copy operation, since the source database is SQL Server 2000 and it is moving to SQL Server 2005 (one notable exception is that I am converting ntext columns to nvarchar(MAX) columns).
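(A hedged sketch of the rekeying pattern for one parent/child pair, done in T-SQL before or alongside the SSIS copy - the added columns and all names are assumptions based on the description:)

-- give Client a GUID key, then propagate it to ClientFile as a foreign key
ALTER TABLE dbo.Client ADD ID uniqueidentifier NOT NULL DEFAULT NEWID();
ALTER TABLE dbo.ClientFile ADD ClientID uniqueidentifier NULL;

UPDATE cf
SET cf.ClientID = c.ID
FROM dbo.ClientFile cf
JOIN dbo.Client c ON c.ClientNumber = cf.ClientNumber;

(The same join-on-legacy-key-then-update pattern would repeat down the chain for ClientFile's children.)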
I am assuming this is probably ridiculously simple and I just haven't found my way there yet, but I still have much of this book and the help files to go through, and the index didn't give me any comfort that this is something I will or will not be able to do easily.
The real help I am looking for is twofold: a) somebody tell me to stop reading this 700-page (very well written) tome if I would be better off writing this all in code myself, and b) if this is something that most of you could do with SSIS with both hands tied behind your back, please at least help me focus on the important transforms and tools so that I don't spend a month becoming a data warehouse wizard and ultimately not solve the problem I am most concerned with.
Please be merciful with the heat; I have already confessed that I am new to this and am scrambling to come up to speed as fast as I can, but I am beginning to think this problem is either too trivial for coverage in this book, or perhaps just not what SSIS was designed to do.
An SSIS task imports data from a flat file and inserts the data into a staging table. The staging table holds the data in its raw form. A second process then selects the data from the staging table, looking up the foreign key id's for raw data values, and then inserts the data into the live table.
SQL (only key columns shown for clarity):

-- Staging Table
CREATE TABLE Staging (Information VARCHAR(10), MachineName VARCHAR(10), Status VARCHAR(10))
[code]...
The insert into the live table should look up the ID for machine 1 and the ID of the status 'success', and insert those foreign key values into the live table for the row. There could be thousands of rows for the output of machine 1, all with different statuses (all pre-set in the Status table, i.e. success, failure, rerun), and the same for lots of other machines held in the Machine table.
What is the best way to insert this data all in one go, rather than reading the staging table row by row, looking up the foreign key values for the machine and status values, and then inserting the data?
I was thinking along the lines of:
INSERT INTO dbo.LiveTable (Information, MachineID, StatusId)
SELECT s.Information, m.MachineId, st.StatusId
FROM dbo.Staging s
JOIN dbo.Machine m ON m.MachineName = s.MachineName
JOIN dbo.Status st ON st.Status = s.Status

But I notice the problem with this is that it doubles up the inserts!
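(A hedged note: a join multiplies rows when the join key is not unique on the lookup side, so doubled inserts usually mean duplicate MachineName or Status values in the reference tables. A quick check, using the names above:)

-- any row returned here will double up the joined insert
SELECT MachineName, COUNT(*) AS cnt
FROM dbo.Machine
GROUP BY MachineName
HAVING COUNT(*) > 1;

(The same check applies to dbo.Status; unique constraints on those columns keep the problem from coming back.)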
I have implemented a single lookup and would like to know the optimal approach to implementing multiple lookups within a single "data flow task", i.e., my question is what to do if I had to look up multiple reference tables to obtain surrogate keys. I am oversimplifying for illustration purposes...
I have an SSIS package with around 25 lookups. Developing the package itself was slow. Now, every time I try to load the package it takes forever, and whenever I execute it I get an error.
Here are my questions:
1. Is there a way I can optimize the package?
2. Is it abnormal to have so many lookups? I am loading a dimension table with many fields, and I need to look up 25 tables to get the keys. I know one alternative is to use left joins in the source query and get the keys in the source itself, but with Lookups we have more visibility into what's happening. I would like to know about other possibilities with lookups.