How To Choose Cache Size For A Lookup Transformation?
Feb 6, 2006
Hello,
I searched around and not been able to find any guidance on this question: if I am designing a lookup transformation, how do I decide what I should set the cache size to?
For the transformation on which I am currently working, the size of the lookup table will be small (like a dozen rows), so should I just reduce it to 1MB? Or should I not even bother with caching for a lookup table that small?
Hypothetically speaking, if I am working with much larger lookups in the future (let's say 30,000 rows in the lookup table), is there some methor or formula that I should use to try to figure out the best cache size?
If the cache size is set larger than the actual data being cached, is the entire cache size allocated in memory, or is the cache size managed to only be as large as it needs to be? In other words, is the cache size a maximum to which the cache will grow, or is it a preset that sets the cache for that lookup to exactly the specified size?
I have to perform a lookup in a table based on a query like:
"... where ? = [RefTable].fieldID and ? between [RefTable].AnotherFieldValue and [RefTable].AThirdFieldValue"
So, SSIS has put the CacheType to none. As I really need to speed up the job I want to set the CacheType to partial (full isn't an option due to the custom query I use here).
But here it comes: when using partial CacheType, one has to set the cache size manually - and I really don't know what value I should assign to it - is there a guideline on this topic?
I work on a Win2003 server platform with sql server 2005 - 2 processors - 2Gb Ram - enough disc space
My source has 2.2 million of records. I'm performing the incremental load.In the lookup transformation i used the destination table for the reference using Full cache mode.For the first time package executed successfully but when i executed the package second time, Suddenly Package hangs while running.Than i truncate the data from the destination table and restart the SQL Server Services.After doing all this i executed package again and it worked but when i executed package second time, again package hangs up .I have 8GB RAM and i5 2.5 GHz Processor laptop.
After converting from SSIS 2008 to SSIS 2012, I am facing major performance slowdown while loading fact data.When we used 2008 - one file used to take around 2 hours average and now after converting to 2012 - it took 17 hours to load one file. This is the current scenario: We load data into Staging and Select everything from Staging (28 million rows) and use a lookups for each dimension. I believe it is taking very long time due to one Dimension table which has (89 million rows).
With the lookup, we currently are using partial cache because full cache caused system out of memory.Lookup Transformation Editor - on this lookup - how to increase the size on partial Cache size 64-bit? I am being stuck at 4096 MB and can not increase it. In 2008, I had 200,000 MB partial cache size.
I designed an SSIS package about 200 packages in one project. the package extract from live to reporting server. some of my packages are very slow about 10 of them. strage enough the ones with more data number of rows run very fast. I'm using Source->Lookup->Conditional split->OLEDB Destination or OLDB Comand.
Can someone help me to found out what could be the problem. I'm very new in SSIS.
Does anyone know how to get SQL Reporting Services, when it runs a report, to tell Adobe Acrobat to "Choose Paper Source by PDF page size", which is an option in the print dialog box?
When I, or a user, runs a report in Visual Studio or live:
The report is set to render on 8.5" x 14" paper (legal)
The report is exported to a PDF
The report opens in Adobe Acrobat
The report is formatted on the screen to fit 8.5" x 14" paper (legal)
When "Print" is selected, the Preview has it on 8.5" x 11" paper (letter)
If I select "Choose Paper Source by PDF page size", then it fits it to the 8.5" x 14" paper (legal) How do I tell it to "Choose Paper Source by PDF page size"?
We are using the cache transformation in our project , while doing the cache transformation our disk space goes to 0 MB free and SSIS package execution not completes even after 3 hr..Initially we have around 34 GB free space on C: drive .Our server configuration is 64 RAM. We are caching the data from table which contains around 21 million records.We changed the path in properties (“BLOPTempStoragePath”,”BufferTempStoragePath”) of Data Flow task of SSIS in which we are using Cache Transformation.
I've a simple lookup transform in SSIS 2008 (R2). I've created it with a full cache and it worked fine. When i switch to partial cache, it will give me this error:
-------------------------------------------------------------------------------------------------- TITLE: Package Validation Error ------------------------------ Package Validation Error ------------------------------ ADDITIONAL INFORMATION: Error at DFT_AdventureWorks [Lookup [411]]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
[Code] ....
I've created a OLE source with the following query :
SELECT SalesOrderID, OrderDate, CustomerID FROM Sales.SalesOrderHeader
And this will flow into the lookup transform and this has the following lookup reference query:
SELECT CustomerID, AccountNumber FROM Sales.Customer WHERE CustomerID % 7 <>0
I am using a lookup and full cache, occasionally i get this warning:[Lookup [150]] Warning: The component "Lookup" (150) encountered duplicate reference key values when caching reference data. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE. Now I know it is only a warning but it is highlighting a real issue.Is there a way of capturing that this has happened?
I am doing something really simple and it doesnt work, may be I am missing something, What I am trying to accomplish is to load a fact table using lookup transaformation, however my source data was different from the data in my dimension (or the datatype ) I had to use a data conversion task before my lookup , so the data flow is something like this source -> Data Conversion -> Lookup -> destination , I am getting an error at my lookup task where it says the "[Lookup [82]] Error: Row yielded no match during lookup". and then it just fails. I know for sure that there has to be matching data. donno what is it that I am missing.
I am totally new to SSIS. I need an example of how to use a Lookup transformation. Basd on that i need to lookup for some recs and delete records from transaction table. I have used the Execute SQL task for this and i am able to achive my requirement. But now i am using XML configurations for Connection Managers and for that very reason i dont want to hard code the catalog names(database names) inside my Execute SQL task.
Can any one suggest me how to do the same using a Lookup transformation or any other. Any suggestions will be greatly appreciated.
I have a Dataflow task which loads data from a flat file to a Fact table named Inventory , doing a dimensional Key lookup with DIMStores - which is the dimension table for stores information.
If I have some rows in the flatfile whose 'Store' column doesnt have a corresponsing key in the DIMStores table, I want to insert all these stores in to DIMStores table and then update the Inventory table accordingly ..
I have a question based on Lookup Transformation component. If using Lookup component, the data cannot be NULL for available columns mapping. How about I want to keep the NULL value like outer join instead of inner join? Is there any way to do since I have several Lookup components inside of my dataflow?
I'm trying to perform a lookup transformation. But the deal is, I have this one value that I am passing into the transformation, but I would like to gather all values that match the value I put in....does the lookup transformation do this? I tried it, and it appears as if it only returns one value for the one input. After the lookup, I have an access OLE DB destination setup...so I can capture all those values that corresponds to that one value I passed into the lookup. Does anyone have any ideas on how I can go about this?
I am using a lookup component in a SSIS data flow. The lookup is a select to a foxpro table. THe lookup works fine with full cache selected. I cannot get the lookup to work with a partial or no cache. I have the latest Foxpro OLE DB driver installed which I understand to support paramaterized queries. Has anyone had success with using cached lookup to Foxpro? Does anyone know how to set the lookup properties of sqlcommand and sqlcommandparam? I am unable to find any examples in BOL or on the web.
Here are some details. IF I go with "use a table or a view" option with the default cache query I get initialization errors
[lkp_lab_worst_value [6170]] Error: An OLE DB error has occurred. Error code: 0x80040E14. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Visual FoxPro" Hresult: 0x80040E14 Description: "Command contains unrecognized phrase/keyword.".
In the advanced editor I see
SQLCommand set to
"select * from `kcf`"
and SQLCommandParam set to
"select * from (select * from `kcf`) as refTable where [refTable].[patkey] = ? and [refTable].[dayof_stay] = ? and [refTable].[modifier] = ? and [refTable].[kcf_code] = ? and [refTable].[source] = ? and [refTable].[kcf_time] = ?"
I believe the above error is because Foxpro V7 does not support the inner subselect . In addition the query contains CRLF without a continuation character (";").
If I remove the CRLF in the sqlcommandparam query, using the advanced editor, I get this design time error "OLE D error occurred while loading column metadata. Check the sqlcommand and sqlcommandparam properties". The designer requires both properties to be set, its unclear to me how the interact.
I cannot find any examples in BOL or on the web on how to set these 2 properties. Can someone give me a few guidelines?
I can get past the design errors by changing sqlcommandparam to a plain select that is VFP 7 compatible ( I removed the subselect and the square brackets):
select * from kcf as refTable where refTable.patkey = ? and refTable.dayof_stay = ? and refTable.modifier = ? and refTable.kcf_code = ? and refTable.source = ? and refTable.kcf_time = ?
But then I get a runtime error
[lkp_lab_worst_value [6170]] Error: An OLE DB error has occurred. Error code: 0x80040E46. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Visual FoxPro" Hresult: 0x80040E46 Description: "One or more accessor flags were invalid.".
[lkp_lab_worst_value [6170]] Error: OLE DB error occurred while binding parameters. Check SQLCommand and SqlCommandParam properties.
I'm either missing something or this is a bug. I have a Lookup that finds no matches if I use the default option of full caching (everything on the Advanced tab unchecked). The lookup table is relatively small (15348 bytes) in only 544 rows. If I check only the Enable Memory Restriction box and eliminate caching, it works fine. I can also check the Enable Caching box and accept the default cache size of 5MB and it works fine. Anyone have any ideas? I'm running on Standard Edition, SP2.
I have a question. I'm using a lookup table which contains descriptions for a field from one of my tables. I have added a lookup query that looks like this: SELECT STORE_DESRIP, BANNER, STORE_ID FROM `stores.txt` WHERE (STORE_ID = ?)
My main table (stagging) contains the store_id field. What I want to do is populate my destination fields (Store_descriptin and Banner) from the lookup table, based on the Store_id. I have written an ActiveX script that looks like this: Function Main() DTSDestination("Store_id") = DTSSource("store_id") DTSDestination("Store_description") = DTSLookup("storelookup").Execute(DTSSource("store_ id")).value Main = DTSTransformStat_OK End Function
I receive an error when I try to test this script. It complains about line 8, which is the line that contains the DTSLookup function. Does anyone have any ideas what I'm doing wrong?
The process I see here is that when it comes to populating the "Store_description" field is will look the lookup table and based on the store_id, pass the description back. I would also like to add another destination field into this script, but won't until I resolve this.
20 I need to do a translation of "group" to local group?:
Customer
Group
Local Group
A
10
11
B
20
21 When a match is found, the group code should be replaced by the local group code ... but, when no match is found the group code should stay.
Is there any way to do this using Lookup ? I tried but when you set the error output to ignore it replaces the value that has no match with a NULL value. Maybe there is another way to get this done with or without the lookup component ?
I have a lookup transformation that retrieves a key for a certain column of values, in this case, a name. So, I go in to the lookup table with a name and come out with its key. I had it working and then I added new entries to the lookup table for a bunch of new names. Now, for some reason, I am not getting the matches for the new names. But I am still getting the matches for the names that existed before I added the new ones.
I'm wondering if the lookup transformation is using the old set of data and some how not picking up the new names. Do I have to trigger something in the lookup transformation to let it know that the lookup table data has changed?
I needed to lookup some table values based on a join of two fields...
I've configured the lookup transform to get the values via a SQL statement to minimize loading time.
However, when creating the relationships between the input columns and the lookup columns I receive following error: input column [BATCH_ID] has a datatype which cannot be joined on
I've checked both input and lookup columns, both are of type DT_R8... Both columns in the different tables do have the same datatypes
I've been going round in circles trying to understand what design I should use for a particular transformation/lookup problem I have. Would appreciate a few pointers.
On the data flow, I can create the data source easily with a SQL Query that returns 4 columns...
eg. myDb.dbo.Table1 Col1, Col2, Col3, Col4
The end result at the Destination is:-
myDb.dbo.Table2 Col1, Col5
Table1.Col1 maps directly to Table2.Col1 -- easy
Table2.Col5 is a result of a lookup query to different tables in the same db based on the value of Table1.Col1, Table1.Col2 & Table1.Col3.
I've already had problems with the Lookup component that forced me to give up using it because it wouldn't adequately support parameters. I was forced to use the Script component instead. However, this problem is a bit different because I don't need to try and reference variables as parameters and, instead, I need to use the values from the source query (Col1, Col2 & Col3).
There is one data flow task in the package. I have a column in the input set called "ID". The total number of rows in the input set is > 50000.
There is one table in the database which has the description for all the IDs. I need to get the "Description" value from the database for each row. The table contains nearly 12 lack records.
For that I am using lookup transformation. In lookup, I specified the query and column mapping and I got the new column "Description".
Here is the problem.
while running the package, the lookup is getting all the 12 lack rows from table and then it is matching the rows.
It is taking huge time.
Can any one suggest me how to improve the performance of this situation?
I just wanted to know if there is any way to Allow Null values while doing a lookup on a table in SSIS.
Let me elaborate the situation...
I have a flat file source that has a field called 'code'. I want to lookup in a code table to see if the code in the file is a valid code but the flat file may contain a NULL value as a 'code' (i.e. zero length string which treated as NULL by my package).
My problem is, the SSIS package tries to search for the NULL in the table and the lookup fails and an error is logged as per the business logic but actually NULL is also an acceptable value and the error should not be logged.
I tried inserting a NULL value in the lookup column but that doesn't work. I am not sure but I think I have read somewhere that two null values cannot be compared for equality. I cannot use conditional split to check the null value because I have to use a large number of lookups and a conditional split everywhere will mess up the things.
I may have misunderstood how Lookupu works because it's not doing what I want. From the OLTP datasource I have a long list of revenue items (from a SQL server database). I want to assign these to specific accounts as they are transferred into our accounting system. I have another table with a list of words to search for and which account they belong to. For example if the OLTP source might be Description - Amount "Sales of cars"- "$20,000" "Motorcycle sales" - "$15,000" "Bike rentals" - "$2,000"
The account lookup table is like Wordsearch - Account "sale" - "ACCT_SAL" "rental" - "ACCT_RENT"
So by looking up whether "wordsearch" is found in "desription" I should get an output of ACCT_SAL - $20,000 ACCT_SAL - $15,000 ACCT_RENT - $2,000
Back in DTS I did this with an array and "If Instr" using VBScript in the Data Transformation Task. I'm sure there must be something in SSIS to do this - it should be something like a Fuzzy Lookup ?, but I'm drifting toward Script Component. Anyone got any ideas for SSIS
I have an input flow with dates and fields like this :
ID BEGINNING_DATE ENDING_DATE
1 12/01/2006 12/16/2006
and a reference table like this:
ID PRICE BEGINNING_PRICE_DATE ENDING_PRICE_DATE
1 400 11/28/2006 12/03/2006
1 500 12/03/2006 12/06/2006
1 600 12/06/2006 12/09/2006
I have to get the intersection periods between the two tables joining on ID. I would like to have this result flow :
ID BEGINNING_DATE ENDING_DATE PRICE
1 12/01/2006 12/03/2006 400
1 12/03/2006 12/06/2006 500
1 12/06/2006 12/09/2006 600
I'm using a lookup transformation and modifying the SQL instruction in advanced tab like this:
select * from (select * from [dbo].[Price]) as refTable where [refTable].[ID] = ? and (? between [refTable].[BEGINNING_PRICE_DATE] and [refTable].[ENDING_PRICE_DATE] or ? between [refTable].[BEGINNING_PRICE_DATE] and [refTable].[ENDING_PRICE_DATE] or ([refTable].[BEGINNING_PRICE_DATE] between ? and ? and [refTable].[ENDING_PRICE_DATE] between ? and ?))
My problem is that the transformation looks for only one matching element and outputs 0 or 1 row per input row... In that case the 3 rows of my lookup table are matching with the row in my input table, but I have only the first one.
I used to use Lookup Transformation for my SSIS, now I am having problem and cannot find the problem. I have my source table, one lookup to join source column to my lookup column as L1. I then have another lookup to join L1 to L2, and will show L2. It seems not working. I used to have source to join several lookup and get different Li, not this one. Any help?
Has anyone else noticed this? I want to be able to use a paremter in my reference table of my Lookup Transformation. I couldn't find any way for the dialog to accept SQL with a parameter so I checked on MSDN How to: Implement a Lookup Using the Lookup Transformation and sure enough in the article is says to click on the Parameter button. I don't have a Parameters button on this dialog. Error? Is this possible?
6. In the Lookup Transformation Editor, on the Reference Table tab, select a connection manager in the Connection manager list, and then do one of the following:
Click Use a table or a view, and then select either a data source view, a data source reference, or an OLE DB connection manager.
Click Use results of an SQL query, and then build a query in the SQL Command window, or click Build Query to build a query using the graphical tools that the Query Builder provides. Alternatively, click Browse to import an SQL statement from a file. If the query includes parameters, click Parameters to map parameters to variables. For more information, see How to: Map Query Parameters to Variables in Data Flow Components. To validate the SQL query, click Parse Query. To view a sample of the data that the query returns, click Preview.
I work in the healthcare area, and am handling the survey data ETL's. There are around 8 different survey areas and based on information received from them for the visit they reference, I want to pull in more info from our invoicing database. My idea is this:
1.) Pull in the flat file to an ODBC staging table 2.) Cache all invoice records that fall between the MIN(Date of Service) and MAX(Date of Service) from the staging table. 3.) First lookup the information needed on patientID, providerID, date of service, and billing location. 4.) For the surveys that didn't match on those 4 columns, try looking up based on patientID, date of service, and billing location (since I could be 99% sure this would still return the record I need). 5.) For the remaining surveys, lookup based just on patientID and date of service. These records will be flagged for manual review because clearly, if a patient has multiple appointments in the same day, this will be prone to error.
However, in trying to use only 3 of the columns in the lookup, I get the error saying basically that I need to utilize all 4. Is there a way around this, or is there an entirely different way I should be approaching this? The reason I thought cache transform was the answer is because I will need to run a different package for each lookup, as the data and logic between each survey will vary, but the invoice data "pool" will stay the same regardless.
I found that sometimes when the cache is filled up, the performance drops significantly. Anyone knows the caching strategy behind the Lookup Transformation? If it's LRU or FIFO for example, I may have to sort the input based on FK to get better performance. If it's frequency based then sort might not help. Cheers,