Accessing A Lookup Table From Inside A Transform Script Component
Feb 6, 2007
I have a requirement to access a lookup table from within an SSIS transform Script Component.
The aim is to eliminate error characters from the firstname, lastname, address, etc. fields by looking them up against an ASCII code reference table and making an InStr()-style comparison.
I cannot find a way of opening the reference data set from within the transform.
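One workable pattern (a sketch, not the original poster's solution; the connection string, table, and column names are assumptions): load the ASCII reference table into a collection once in PreExecute using a plain ADO.NET connection, then test each field in ProcessInputRow:

Imports System.Collections.Generic
Imports System.Data.SqlClient

Public Class ScriptMain
    Inherits UserComponent

    Private m_errorChars As New List(Of String)

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Load the reference table once, before any rows arrive.
        Using conn As New SqlConnection("Data Source=myServer;Initial Catalog=myDb;Integrated Security=SSPI")
            conn.Open()
            Using cmd As New SqlCommand("SELECT CharValue FROM dbo.AsciiReference", conn)
                Using rdr As SqlDataReader = cmd.ExecuteReader()
                    While rdr.Read()
                        m_errorChars.Add(rdr.GetString(0))
                    End While
                End Using
            End Using
        End Using
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' InStr()-style test: strip any error character found in the field.
        For Each c As String In m_errorChars
            If InStr(Row.FirstName, c) > 0 Then
                Row.FirstName = Row.FirstName.Replace(c, String.Empty)
            End If
        Next
    End Sub
End Class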
I'm creating a new Integration Services project that copies data out of a SQL Server 7 server, transforms it, and places the data on a SQL Server 2005 (SP2) server. When defining a Lookup transformation, if I specify an OLE DB connection to my SQL Server 7 server as the reference table, then as soon as I click on the Columns tab, Visual Studio closes/crashes and dumps me to Windows. I don't get an error message. If, however, I specify a connection to a server running SQL Server 2000 or SQL Server 2005, there are no problems.
Is this supposed to happen?
My workstation is running Windows XP Pro SP2, Visual Studio 2005 Pro.
Microsoft SQL Server Integration Services Designer Version 9.00.1399.00
The server that doesn't work as a reference table source is running Windows 2000 Server SP4, SQL Server 7.00.623.
In a Data Flow, I need to use an SSIS variable of type "Object" inside a Script Component and assign to it the content of 'n' variables of string type. On exiting from the script, the variable of type Object should contain something like the following lines:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDDDDDDDDD
...

On exiting from the Data Flow, I will use the variable of type Object in a Script Task, reading each element in a cyclic fashion. Has anyone experienced something like this? Could anyone provide an example? Thanks in advance!
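A minimal sketch of one way to do this (not from the original thread; the variable and column names are assumptions): collect the strings into an ArrayList inside the Script Component and hand it to the Object variable in PostExecute, then cast it back in the Script Task:

' Script Component: "StringList" (type Object) is listed under ReadWriteVariables.
Imports System.Collections

Public Class ScriptMain
    Inherits UserComponent

    Private m_list As New ArrayList()

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        m_list.Add(Row.SomeStringColumn)   ' one string per row
    End Sub

    Public Overrides Sub PostExecute()
        ' ReadWriteVariables may only be assigned in PostExecute.
        Me.Variables.StringList = m_list
        MyBase.PostExecute()
    End Sub
End Class

' Script Task, later in the control flow: read the elements back cyclically.
Dim list As ArrayList = CType(Dts.Variables("StringList").Value, ArrayList)
For Each s As String In list
    ' ... process each element ...
Next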
No idea where this bug crept in from. Have been using SSIS for 1.5 years now without hitting this problem.
I had a script component opening an XML document and parsing it using XPath. I added some code that uses StreamReader/StreamWriter (closing one stream before starting the other). The code works without issue in my C# app.
And it ran without issue 2-3 times in SSIS. Then suddenly after running my package again, the script component says it completes successfully, yet nothing happens. I set a breakpoint on the first line of code - it never hits it. I add a msgbox as the first line of code - and it never displays.
I then close my package and exit out of SSIS... and then re-open it. When I open my script component, all of my code is GONE. All references that I added are gone.
I tried adding the StreamReader/StreamWriter process to a DLL I created from my C# app... and added the DLL to the package -- same result.
I can reproduce this on 2 different computers.
Has anyone experienced this problem? Any idea how to stop it? Or debug it?
Here is a slimmed-down code sample of what causes the error:

Public Class ScriptMain
    Public Sub Main()
        Try
            Dim xmlDoc As New XmlDocument
            xmlDoc.Load("c:ulkasync_86281519_20070628045850225_4.xml")
            MsgBox("xmlLoaded") ' this doesn't display once the package starts "acting up"
        Catch ex As Exception
            MsgBox(ex.Message)
            UpdateXML("c:ulkasync_86281519_20070628045850225_4.xml", ex.Message)
        End Try
        Dts.TaskResult = Dts.Results.Success
    End Sub

    Private Sub UpdateXML(ByVal fileName As String, ByVal message As String)
        Try
            Dim invalidChar As String = message.Trim().Substring(message.Trim().IndexOf("0x"), 4)
            Dim rd As StreamReader = New StreamReader(fileName)
            Dim xml As String = rd.ReadToEnd()
            xml = xml.Replace(invalidChar, String.Empty)
            xml = xml.Replace("", String.Empty)
            xml = xml.Replace("<![CDATA[<![CDATA[", "<![CDATA[")
            xml = xml.Replace("]]>]]>", "]]>")
            MsgBox("replaced")
            rd.Close()
            Dim wr As StreamWriter = New StreamWriter(fileName)
            wr.Write(xml)
            wr.Close()
            Dim xdoc As XmlDocument = New XmlDocument()
            xdoc.Load(fileName)
        Catch ex As Exception
            UpdateXML(fileName, ex.Message)
        End Try
    End Sub
End Class
I have a very simple problem I am trying to solve.
I have a table with a "DateEntered" field, and I have an SSIS package set up to load data from a file into the database table. I just want to make sure that no one loads the same file twice in one day.
For example, if today is 8/22/07, and "DateEntered" is "2007-08-22", then I want to add a Lookup transform to run a query that will check and see if there's any rows in the table with a "DateEntered" is "2007-08-22". If so, don't load the file again!
Here's my query:
SELECT Code FROM myTable WHERE DATEADD(dd, DATEDIFF(dd, 0, DateEntered), 0) = DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0)
(all the dateadd stuff is doing is removing the time portion from the DateEntered field, so we are comparing apples to apples).
Now, if the query returns a bunch of "Codes" then we know that the data has already been entered for the day! So far, so good.
Now, how do I set up the Lookup to get it to work? I'm getting this error message: Error 1 Validation error. Data Flow Task: Lookup [1299]: The lookup transform must contain at least one input column joined to a reference column, and none were specified. You must specify at least one join column. FXRateLoader.dtsx 0 0
But I thought I did this! On the Columns tab, I have:
Lookup column: code
Lookup operation: Replace 'code'
Output alias: code
I have my error output set to: Lookup output - redirect row
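For what it's worth, one way to satisfy that join requirement (a sketch, not from the thread): add a Derived Column that computes "today at midnight" on the input side and join it to a reference query that exposes the same truncated date. Something like this for the reference query:

SELECT DISTINCT DATEADD(dd, DATEDIFF(dd, 0, DateEntered), 0) AS LoadDate
FROM myTable

and, for the input side, a Derived Column using the SSIS expression (DT_DBTIMESTAMP)(DT_DBDATE)GETDATE() to strip the time from today's date. Rows that find a match were already loaded today; rows sent to the error output (redirect row) are safe to load.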
Hi! I am a newbie, grateful for some help. I have an OLE DB source with a SQL command selecting customer.salary and customer.occupation, which I want to match to demo_id in the OLE DB destination; salary and occupation are also in dim_demographic. But in the Lookup editor I find no demo_id column... how do I do this?
I have an ETL that has a Lookup transform to get a rate from a SpotRates table.
The problem is when a match for some date doesn't exist in the SpotRates table...
For those records I need to look up the next date...
For example...
SpotRate Table
Date Currency Rate
05-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2262
06-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2312
07-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2179
10-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2099
11-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2105
12-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2125
13-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2094
18-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2252
19-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2346
20-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2346
21-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2315
24-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2365
25-04-2006 0:00 DOLAR ESTADOS UNIDOS 1,2425
When I first try to look up the date 17-04-2006, it doesn't give me any records... and I need to create a new lookup for the next date after 17-04-2006. In this example the next date is 18-04-2006. How can I do it?
I made a SQL query that gives me the next date, with 2 parameters... but I'm having some errors...
SELECT TOP 1 Data FROM Spot_Rates WHERE (Currencies_Name = ?) AND (Data > CONVERT(DATETIME, ?, 102)) ORDER BY Data ASC
(Note: ORDER BY Data ASC is needed here; with DESC, TOP 1 returns the latest date rather than the next one.)
In this example, the parameters returned from lookup1 are:
Currencies_Name = 'DOLAR ESTADOS UNIDOS'
Date = '17-04-2006'
I need to create a second lookup transform to return the next date/currency for each row that didn't match in the first lookup...
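A sketch of the usual cascade (column names follow the query above; the Rate column name is an assumption): set the first Lookup's error output to "Redirect row", and send the redirected rows into a second Lookup with memory restriction enabled (partial/no cache), whose caching statement on the Advanced tab is modified to something like:

SELECT TOP 1 Data, Rate
FROM Spot_Rates
WHERE (Currencies_Name = ?) AND (Data > ?)
ORDER BY Data ASC

with the two ?-parameters mapped, via the Parameters button, to the currency and date input columns.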
I want to do something relatively simple with SSIS but can't find an easy way to do it (isn't it always the case with SSIS?).
I have a column, let's say called iorg_id, and I want to look up the matching rows for this column in a table. In that table, iorg_id may have several potential matching rows, and there is another column called 'Amount'. For each iorg_id I want to retrieve the matching row in the other table, but only the row with the largest value in the 'Amount' column.
I couldn't find a way to do this all in the Lookup Transform. I can match the iorg_ids and retrieve the Amount column, but can't find a way just to retrieve the matching row with the largest value in the Amount col. The only way I can think to do this is then run the output from the Transform through an Aggregate function and determine the Max (although haven't tested this yet).
Seems strange to me in that the SQL in the Advanced tab gives me something like: select * from (select * from [dbo].[Table1]) as refTable where [refTable].[iorg_id] = ?
where I believe the first 'select *' is retrieving all the cols that are listed in the LookupColumns list in the Columns tab. I thought I would be able to amend this to something like: select max(amount) from (select * from [dbo].[Table1]) as refTable where [refTable].[iorg_id] = ?
but I get a metadata type error.
So, the questions are: Is it possible to do this all in the Lookup transform, or do I have to use the Aggregate function as I suspect? And why is it not possible to amend the SQL in the Advanced tab to manipulate the returned data?
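One approach that sidesteps the Advanced-tab limitation (a sketch; table and column names taken from the post): pre-aggregate in the reference query itself, on the Reference Table tab ("Use results of an SQL query"), so each iorg_id appears only once with its largest Amount:

SELECT iorg_id, MAX(Amount) AS Amount
FROM [dbo].[Table1]
GROUP BY iorg_id

This keeps the Lookup a straight key match and avoids the downstream Aggregate. If other columns from the winning row are needed, a ROW_NUMBER() OVER (PARTITION BY iorg_id ORDER BY Amount DESC) = 1 filter in the reference query (SQL 2005 and later) does the same job.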
I am trying to digest this logic, and have been unsuccessful so far. I am designing a package for incremental loads, but the destination table has a composite primary key on 2 columns, one of which is nullable. The source data comes from a sproc. Until now, I have been banging my head trying to get this logic to work via the Lookup transform with a Conditional Split, but it doesn't work. Am I on the right track, or should I be using the SCD Wizard?
As a side note, I have been trying to work a solution using Andrew's blogpost on doing incremental loads: http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
Please indulge my ignorance, as I have only been using SSIS for a couple of weeks. I'm trying to create a data warehouse using two input tables. A column needs to be added to one table by using a lookup into the second table. SSIS seems to handle the "no matches" and "single match" cases perfectly. I can't for the life of me figure out how to properly handle multiple matches. SSIS defaults to the first match, but I need to compute the "best" match.
Is it possible to use a VARIABLE in the Lookup Transform? I am setting the cache mode to partial and have modified the caching SQL statement on the advanced tab to include the parameterized query, but the parameter button only allows me to select columns to map to the parameter. I need to use a variable instead. I see the ParameterMap property of the transform in the advanced editor, but don't see how I can use this to map to a variable.
Can this be done, or do I need to use a new source, sort and left join component to accomplish the same thing?
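If I remember right (worth verifying on your build), the Lookup exposes its SqlCommand/SqlCommandParam properties as expressionable properties on the containing Data Flow Task, so a property expression can splice a variable into the caching statement. Table and column names here are hypothetical:

"SELECT key_col, value_col FROM dbo.RefTable WHERE filter_col = '" + @[User::MyVar] + "'"

Failing that, the source + Sort + Merge Join route you mention is the reliable fallback.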
I'm creating a Lookup programmatically, but I can't find out how to assign the ConnectionManager that references the lookup data. Do you have an example for me?
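A sketch using the SSIS 2005 API (class IDs and names from memory, so treat them as assumptions): after instantiating the Lookup's metadata, the connection is assigned through its RuntimeConnectionCollection:

' mainPipe is the MainPipe of the Data Flow task; oleDbConn is an existing OLE DB connection manager.
Dim lookup As IDTSComponentMetaData90 = mainPipe.ComponentMetaDataCollection.New()
lookup.ComponentClassID = "DTSTransform.Lookup.1"
Dim inst As CManagedComponentWrapper = lookup.Instantiate()
inst.ProvideComponentProperties()

' This is the part the question asks about: bind the connection manager.
lookup.RuntimeConnectionCollection(0).ConnectionManagerID = oleDbConn.ID
lookup.RuntimeConnectionCollection(0).ConnectionManager = DtsConvert.ToConnectionManager90(oleDbConn)

inst.SetComponentProperty("SqlCommand", "SELECT BusinessKey, SurrogateKey FROM dbo.DimSomething")
inst.AcquireConnections(Nothing)
inst.ReinitializeMetaData()
inst.ReleaseConnections()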
When I configure the Lookup transform in the Data Flow task, I get the error "Input column has a data type that cannot be matched".
This is the query that I use to set the reference table dataset:
select firstname, lastname, address, email from customers_dimension cd, cust_test ct where cd.address <> ct.address
I basically want to try and find all those records that have the same firstname, lastname, email in the customer dimension table where the records do not match. Both the input fields and the lookup fields have the same data type [varchar(max)].
It is pretty confusing, so much so that I did the lookup against the exact same table and got the same error.
Does anyone have a better idea as to what the problem is?
Thank you
P.S. This is the caching statement in the Advanced tab:
select * from (SELECT firstName FROM Customers_Dimension) as refTable where [refTable].[firstName] = ?
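For what it's worth, varchar(max) columns come into the pipeline as DT_TEXT (BLOB) columns, and the Lookup cannot join on BLOB types; that would explain getting the same error even against the same table. A common workaround (a sketch) is to cast both sides to a bounded varchar in the source and reference queries:

select cast(firstname as varchar(255)) as firstname,
       cast(lastname as varchar(255)) as lastname,
       cast(email as varchar(255)) as email
from customers_dimension

(or use a Data Conversion transform to DT_STR on the input side) so that both ends of the join are non-BLOB types.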
I'm trying to lookup a value in another table linking on a column of datatype DT_R8. The lookup transform is complaining that I can't link on that datatype. However, the documentation says that it should work. I'm using the April CTP. Is this fixed in a later version? Any suggestions?
I am having problems with a lookup transformation. I have a row in my lookup table for blank ('') source data. If I test the join using SQL the match is made, but the Lookup transform doesn't consider it a match and sends it to error output. Is there a property that I don't have set correctly or something else I am forgetting?
Has anyone found a solution for this problem? I also checked http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=298056&SiteID=1 and http://blogs.conchango.com/kristianwedberg/archive/2006/02/22/2955.aspx
Suppose we have a dimension table:

DimColor
----------------------------
ColorKeyPK (smallint)   ColorAlternateKey (nvarchar(30))
-1                      UnknownMember
1                       ''
2                       Blue
3                       Red
4                       Black

The color with ID 1 is the empty string.

FactOrders
---------------------------
OrderID  Date  Color  Quantity

OrderID = 1  Color = 'Black'  Quantity = 10
OrderID = 2  Color = 'Red'    Quantity = 20
OrderID = 3  Color = ''       Quantity = 10
OrderID = 4  Color = 'Blue'   Quantity = 5
OrderID = 5  Color = 'Black'  Quantity = 10
When I use the Lookup transform, it cannot find the ColorKeyPK.
The result of the Lookup transform is:
------------------------------
OrderID = 1  Color = 'Black'  ColorKey = 4
OrderID = 2  Color = 'Red'    ColorKey = 3
OrderID = 3  Color = ''       ColorKey = NULL ----> This is the problem: the Lookup cannot find the empty string. It should be 1.
OrderID = 4  Color = 'Blue'   ColorKey = 2
OrderID = 5  Color = 'Black'  ColorKey = 4
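A hedged suggestion (not from the original thread): matching in the Lookup cache is exact, so a NULL, a padded CHAR value, or stray whitespace on either side will miss the '' row. Normalizing both sides usually fixes it, e.g. trim in the reference query:

SELECT ColorKeyPK, LTRIM(RTRIM(ColorAlternateKey)) AS ColorAlternateKey
FROM DimColor

and feed the Lookup from a Derived Column that normalizes the fact side with an SSIS expression such as ISNULL(Color) ? "" : TRIM(Color).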
I've a simple Lookup transform in SSIS 2008 (R2). I've created it with a full cache and it worked fine. When I switch to partial cache, it gives me this error:
TITLE: Package Validation Error
------------------------------
Package Validation Error
------------------------------
ADDITIONAL INFORMATION:
Error at DFT_AdventureWorks [Lookup [411]]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
[Code] ....
I've created an OLE DB source with the following query:
SELECT SalesOrderID, OrderDate, CustomerID FROM Sales.SalesOrderHeader
This flows into the Lookup transform, which has the following lookup reference query:
SELECT CustomerID, AccountNumber FROM Sales.Customer WHERE CustomerID % 7 <>0
I have a package that works fine in development. I move the package over to test and it fails validation in the lookup transform.
Error 46 Validation error. Data Flow Task - PO Lines Interface: Lookup - LIST PRICE [29621]: output column "LIST_PRICE_PER_UNIT" (29667) and reference column named "LIST_PRICE_PER_UNIT" have incompatible data types. SPO_TO_ORACLE_PO.dtsx 0 0
What strikes me as odd is the fact that I don't have a way of specifying the data types. I just specify the column I wish to return as a new column with the same name. Anyway, why would this work in one instance but not another?
I am using the lookup transform with the following settings:
Reference table: Use results of an SQL query. The query retrieves the surrogate key and four business key columns from a dimension table which contains a few thousand rows.
Columns: business keys in the incoming data are mapped to the business keys in the reference table, and the surrogate key is looked up from the reference table.
Advanced: Enable memory restriction is OFF (and the other items on the Advanced tab are greyed out).
I assumed that this means that the lookup transform would cache all of the rows in the SQL query, and then perform the lookups against this cache. This is the behaviour that I saw when I was running the package in my local environment in the BIDS debugger.
However, a colleague was doing some profiling on our production database server, and noticed that the lookup transform is instead issuing a single SQL query for each row of input. Our production ETL server has many GBs of free RAM available (way more than enough to cache the contents of the lookup table in memory), and given that memory restriction is disabled, I don't understand why the lookup transform is behaving in this fashion. Does anyone have an explanation for this? I'm probably misunderstanding a key concept here.
Hi, I am currently trying to write a custom transform component in C# that will take a row of data and perform a lookup via an external system. If there is a match, it should send the data from the external system down the match output (which will have different columns to the input) and drop the row that was read; otherwise it should send the data down the unmatched output, which will have the same columns as the input.
So I would like to write a synchronous transform, because I don't need to read all the rows from the input buffer before I start processing, and I don't wish to have millions of rows loaded in memory.
Can this be done? Also, does anyone have example code of how to do this? I can't see how to send data down the match output buffer, as this will hold the lookup-result data, whose columns differ from the input data, nor how to discard the input data.
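A sketch of the relevant plumbing (SSIS 2005 pipeline API; VB.NET here for consistency with the rest of the thread, and only the skeleton, not a complete component): an output whose columns differ from the input must be asynchronous, i.e. SynchronousInputID = 0, which gives it its own buffer:

Public Overrides Sub ProvideComponentProperties()
    ComponentMetaData.InputCollection.RemoveAll()
    ComponentMetaData.OutputCollection.RemoveAll()

    Dim input As IDTSInput90 = ComponentMetaData.InputCollection.New()
    input.Name = "Input"

    ' Match output: asynchronous, so its columns (added via
    ' OutputColumnCollection.New()) need not mirror the input.
    Dim match As IDTSOutput90 = ComponentMetaData.OutputCollection.New()
    match.Name = "Match"
    match.SynchronousInputID = 0

    ' Unmatched output: also asynchronous here, built to mirror the input columns.
    Dim unmatched As IDTSOutput90 = ComponentMetaData.OutputCollection.New()
    unmatched.Name = "Unmatched"
    unmatched.SynchronousInputID = 0
End Sub

At run time SSIS hands you the output buffers in PrimeOutput; cache them, and in ProcessInput call AddRow on whichever buffer applies, copying only the columns you want. Rows you never copy are simply discarded, which answers the "how do I drop the input data" part.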
In my data flow, I am reading addresses from a CSV file. Then for each row, I would like to execute a process from the command line which outputs the latitude and longitude for the address, parse the output, and add the latitude and longitude into the pipeline. To call the process, I am using a script component transform. Here's my code:
Dim m_Latitude As Double
Dim m_Longitude As Double
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim street As String
Dim city As String
Dim state As String
Dim zip As String
street = Row.address
city = Row.city
state = Row.state
zip = Row.zip
Dim p As Process = New Process()
p.StartInfo.FileName = "C:\GeoCodeDotNet.exe"
p.StartInfo.Arguments = String.Format("""{0}"" ""{1}"" ""{2}"" ""{3}""", street, city, state, zip)
p.StartInfo.WorkingDirectory = "C:\"
p.StartInfo.UseShellExecute = False
p.StartInfo.CreateNoWindow = True
p.StartInfo.RedirectStandardOutput = True
AddHandler p.OutputDataReceived, New DataReceivedEventHandler(AddressOf ConsoleDataReceived)
p.Start()
p.BeginOutputReadLine()
If p.WaitForExit(10 * 1000) Then
Row.Latitude = m_Latitude
Row.Longitude = m_Longitude
Else
p.Kill()
Row.Latitude = 0.0
Row.Longitude = 0.0
End If
End Sub
Private Sub ConsoleDataReceived(ByVal sender As Object, ByVal e As DataReceivedEventArgs)
Dim output As String() = e.Data.Split(New [Char]() {" "c})
m_Latitude = CDbl(output(0))
m_Longitude = CDbl(output(1))
End Sub
I'm just getting very weird behavior. First of all, at the point where I assign values to Row.Latitude and Row.Longitude, m_Latitude and m_Longitude don't always have valid values (i.e., they are unassigned). Secondly, after attempting to process the first couple of rows, it just stops: in my data flow the script component is yellow, but execution has ended, and the final step of writing to the output CSV file has not even started. Finally, in the directory where my source CSV file is located, I get a SQL dump file with the following content:
I'm guessing this all has to do with some kind of threading/concurrency thing and how the data flow pipeline works. Could someone please shed some light on this?
By the way, the script component transform is synchronous. Much thanks.
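For what it's worth, the symptoms are consistent with the asynchronous OutputDataReceived handler racing the per-row assignments: the handler fires on a worker thread, so m_Latitude/m_Longitude may not be set yet when WaitForExit returns, and an unhandled exception on that thread (e.g. e.Data being Nothing on the final callback) can bring the whole process down, which would explain the dump file. A simpler single-threaded sketch that avoids the shared fields entirely (note the 10-second cap now only applies after the output stream is drained):

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim p As New Process()
    p.StartInfo.FileName = "C:\GeoCodeDotNet.exe"
    p.StartInfo.Arguments = String.Format("""{0}"" ""{1}"" ""{2}"" ""{3}""", _
        Row.address, Row.city, Row.state, Row.zip)
    p.StartInfo.UseShellExecute = False
    p.StartInfo.CreateNoWindow = True
    p.StartInfo.RedirectStandardOutput = True
    p.Start()

    ' Read synchronously on this thread: no event handler, no shared state.
    Dim output As String = p.StandardOutput.ReadToEnd().Trim()

    If p.WaitForExit(10 * 1000) AndAlso output.Length > 0 Then
        Dim parts As String() = output.Split(" "c)
        Row.Latitude = CDbl(parts(0))
        Row.Longitude = CDbl(parts(1))
    Else
        If Not p.HasExited Then p.Kill()
        Row.Latitude = 0.0
        Row.Longitude = 0.0
    End If
End Sub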
I work in the healthcare area and am handling the survey data ETLs. There are around 8 different survey areas, and based on information received from them for the visit they reference, I want to pull in more info from our invoicing database. My idea is this:
1.) Pull in the flat file to an ODBC staging table.
2.) Cache all invoice records that fall between the MIN(Date of Service) and MAX(Date of Service) from the staging table.
3.) First look up the information needed on patientID, providerID, date of service, and billing location.
4.) For the surveys that didn't match on those 4 columns, try looking up based on patientID, date of service, and billing location (since I could be 99% sure this would still return the record I need).
5.) For the remaining surveys, look up based just on patientID and date of service. These records will be flagged for manual review because clearly, if a patient has multiple appointments in the same day, this will be prone to error.
However, in trying to use only 3 of the columns in the lookup, I get the error saying basically that I need to utilize all 4. Is there a way around this, or is there an entirely different way I should be approaching this? The reason I thought the Cache Transform was the answer is because I will need to run a different package for each lookup, as the data and logic between each survey will vary, but the invoice data "pool" will stay the same regardless.
I would like to know what happens when a very large reference data set for a lookup transform with full caching enabled is getting loaded during package execution and the computer memory runs out or is very low. Does SSIS a) give an out of memory error of some sort b) resort to a no caching or partial caching mode c) maintain the full caching mode but will switch to using the paging file(virtual memory).
I think it will resort to using the page file, in which case the benefits of in-memory lookups are lost and performance would suffer. If I cannot upgrade the memory or shrink the reference set somehow, I should switch that lookup to partial caching or no caching with an indexed lookup table. Would this make sense?
Would anyone happen to have any pointers or know of any good code examples to either programmatically change the type of an input column when it is passed through the component, or add a new column to the output? I am extracting data from an Oracle database which is in Julian date format (represented within SSIS as a DT_NUMERIC column) and I need to to either transform the input column holding it into a date column, or to dynamically add a new output column holding the transformed data.
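As a sketch of the Script Component route (assuming the Oracle value is a YYYYDDD-style day-of-year number — verify which Julian variant your source actually uses, since e.g. JD Edwards uses CYYDDD): add an output column of type DT_DBTIMESTAMP on the Inputs and Outputs page and populate it per row. Column names here are hypothetical:

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim julian As Integer = CInt(Row.JulianDate)
    Dim year As Integer = julian \ 1000         ' YYYYDDD: leading digits are the year
    Dim dayOfYear As Integer = julian Mod 1000  ' trailing three digits are the day of year
    Row.ConvertedDate = New DateTime(year, 1, 1).AddDays(dayOfYear - 1)
End Sub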
We are using the Lookup transformation in SSIS 2012. The lookup transformation queries a table with two date columns. When we hover the mouse over the two columns in the 'Columns' tab of the Lookup transformation editor, the two columns show as DT_WSTR instead of DT_DBDATE. This causes the SSIS package to fail due to a data type mismatch. A similar abandoned thread is available at: URL....
In one of my Script Tasks, I instantiate the FileSystemWatcher class and set up events for it. This is done inside the Main() method. I can access all the variables declared in the SSIS package inside the Main() method (by using Dts.Variables("").Value), but none can be accessed inside the event methods. One possible reason for this is that the events might be running on different threads.
Now my question is, how can the variables be accessed inside the event methods?
Hope someone can give me a solution. Appreciate all your suggestions. Thanks.
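Two hedged options: copy the needed values into class-level fields inside Main() before wiring up the events, or lock the variable on demand inside the handler via the VariableDispenser. A sketch of the latter (the variable name is hypothetical):

Private Sub Watcher_Created(ByVal sender As Object, ByVal e As FileSystemEventArgs)
    ' The pre-locked Dts.Variables collection isn't reliably available on the
    ' watcher's callback thread, so lock explicitly here.
    Dim vars As Variables = Nothing
    Dts.VariableDispenser.LockOneForWrite("User::LastFileName", vars)
    Try
        vars("User::LastFileName").Value = e.FullPath
    Finally
        vars.Unlock()
    End Try
End Sub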
I am writing a SQL function in CLR. The function receives some data for processing. For processing the data, the function requires some additional data to be fetched from the same database.
So, how does a CLR function execute SELECT or other SQL statements? Does it need to open up a SqlConnection for the purpose?
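Yes — inside a SQLCLR routine you open a connection back to the hosting database using the special "context connection", which rides on the caller's session. A minimal sketch in VB.NET (to match the rest of the thread; table and column names are hypothetical), noting that the function must be declared with DataAccess:=DataAccessKind.Read to be allowed to query:

Imports System.Data.SqlClient
Imports Microsoft.SqlServer.Server

Partial Public Class UserDefinedFunctions
    <SqlFunction(DataAccess:=DataAccessKind.Read)> _
    Public Shared Function LookupExtra(ByVal id As Integer) As Integer
        ' "context connection=true" reuses the calling session; no credentials needed.
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            Using cmd As New SqlCommand("SELECT SomeValue FROM dbo.SomeTable WHERE Id = @id", conn)
                cmd.Parameters.AddWithValue("@id", id)
                Return CInt(cmd.ExecuteScalar())
            End Using
        End Using
    End Function
End Class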
I have a variable in SSIS that I want to access inside the Script Task. I assigned the variable in the ReadOnlyVariables in the Script Task property. How do I access it?
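A minimal sketch (the variable name is hypothetical): once the variable is listed in ReadOnlyVariables, it is available through the Dts.Variables collection inside Main():

Public Sub Main()
    ' "User::MyVariable" must appear in the task's ReadOnlyVariables property.
    Dim myValue As String = CStr(Dts.Variables("User::MyVariable").Value)
    MsgBox(myValue)
    Dts.TaskResult = Dts.Results.Success
End Sub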
I am facing an error while displaying a SQL Reporting Services report on my ASP.NET page. The report itself is built using a Report Server project. It consists of a main report which contains a bar graph. Clicking on the bar graph takes you to another report (specified in the Jump to report section), call it Report 2. Clicking on Report 2 shows another report, say Report 3. It displays properly as long as I run the preview of the report in my report project. When I deploy the report to the report server and access it from my ASP.NET code-behind, the main report runs fine. On clicking, it does nothing, and when I click again it displays an error that the report does not contain anything. Sometimes it does not even show the error; it just will not drill in.
I presume that the click event on the report, which is a chart item, fires the page load event, and it loses its parent report information or something like that. Please can anybody guide me on how to do this.
The following functionality i need to achieve with my web page.
My main report: upon clicking on any chart item, it should take you to the second report. My second report: clicking on a chart item in the second report should take me to the third report.
I have a package which loads the fact data from the Stage into the Warehouse database. This package normally handles early-arriving facts: I use a Lookup to check which dims exist, and where they don't, I populate the dimension and use the surrogate key to load the facts. This works fine.
I had a request to load 7 years' worth of historical data. Instead of re-writing the package, I took the package which handles early-arriving facts and deleted the section which handles them. I knew all the dimensions already existed, and I didn't want to hinder performance when loading millions of rows. During testing I found something very interesting.
If you have configured the error path in the Lookup component and removed the error path later, the package will NOT fail (won't produce an error) even if the lookup can't find matching values.
Correct Behaviour Example 1:
[1] Stage fact table has 2 records, with product codes 1 and 2.
[2] Warehouse Product table has only product code 1.
[3] Source - Lookup - Destination in the data flow task. Error port on lookup is not configured.
[4] From source we read 2 records, and the package will fail at the lookup as it can't find product code 2.

Correct Behaviour Example 2:
[1] Stage fact table has 2 records, with product codes 1 and 2.
[2] Warehouse Product table has only product code 1.
[3] Source - Lookup - Destination in the data flow task. Error port on lookup is configured to go to RowCount.
[4] From source we read 2 records, and the package will run successfully. It will put one record into the warehouse table and send the invalid record into RowCount.

Incorrect Behaviour Example 3:
[1] Stage fact table has 2 records, with product codes 1 and 2.
[2] Warehouse Product table has only product code 1.
[3] Source - Lookup - Destination in the data flow task. Delete the configured error port from the lookup.
[4] From source we read 2 records, and the package will run successfully. It will put one record into the warehouse table and discard the other.
My understanding is that when the error port is NOT configured (the end state of Example 3), the package should fail just as in Example 1.
Am I missing a point, or is this supposed to be correct behaviour, or is it a bug?