Parameterized queries are allowed only on partial-cache or no-cache Lookup transforms, not on full-cache ones. Is there some "trick" to parameterizing a full-cache lookup, or should the join simply be done at the source, obviating the need for a full-cache lookup at all? (Other suggestions are certainly welcome.)
More particularly, I'd like to use the Lookup transform in a surrogate key pipeline. However, the dimension is large (900 million rows), so it would be useful to restrict the Lookup transform's cache by a join to the source.
For example:
Source query is: select a,b,c from t where z=@filter (20,000 rows)
Lookup transform query: select surrogate_key,business_key from dimension (900 M rows, not tenable)
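One way to shrink the reference set, sketched under stated assumptions: have the lookup query return only the dimension rows whose business key appears in the filtered source. The sketch uses Python's built-in sqlite3 purely as a stand-in database. The join column (source column a matching business_key) and the sample rows are hypothetical. Because a full-cache Lookup takes no parameters, in SSIS the filter value would have to end up in the query text itself, for example through a property expression on the Data Flow Task.

```python
import sqlite3


def restricted_reference_rows(conn, filter_value):
    """Return only the dimension rows that the filtered source rows can match."""
    sql = """
        SELECT d.surrogate_key, d.business_key
        FROM dimension AS d
        WHERE EXISTS (
            SELECT 1
            FROM t
            WHERE t.a = d.business_key   -- assumed join column (hypothetical)
              AND t.z = :filter
        )
        ORDER BY d.surrogate_key
    """
    return conn.execute(sql, {"filter": filter_value}).fetchall()


# Tiny in-memory stand-ins for the source table and the dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a TEXT, b TEXT, c TEXT, z INTEGER);
    CREATE TABLE dimension (surrogate_key INTEGER PRIMARY KEY, business_key TEXT);
    INSERT INTO t VALUES ('K1', 'x', 'y', 1), ('K2', 'x', 'y', 1), ('K3', 'x', 'y', 2);
    INSERT INTO dimension VALUES (10, 'K1'), (20, 'K2'), (30, 'K3'), (40, 'K4');
""")

rows = restricted_reference_rows(conn, 1)
print(rows)
```

With this shape, the cache holds at most as many rows as there are distinct source keys, not the whole dimension.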
Phil, great links, really helpful and appreciated.
I just need to verify one thing about the lookup method. One of the lookup methods people were discussing is the non-cached lookup, which seems to have been evaluated as the fastest. Is non-cached the default for the Lookup transformation? And when I want the lookup to be cached, I need to go to the Advanced tab and set the cache size to whatever % I need, right? Thanks.
Is it possible to keep a cached Lookup in memory when executing multiple Data Flows? Executing DFTs in parallel will cache and use the same LOOKUP statement. But what if I'm executing the DFTs sequentially: can I keep the LOOKUP from the first DFT in memory for the second DFT? For example, in my case, I'm caching a lookup against the Customer dimension for invoices. The second DFT then processes credits and again does a lookup against the Customer dimension. I want to use the cached Customer records from the first DFT.
We have a package using a lookup query on DB2 to validate data from a file. Everything works fine, except for the lookup query, which has to cache about 1.5 million rows.
Now I would like to specify parameters to that query to minimize the data being cached. I tried using parameters in the query, but I get an error:
"Provider cannot derive parameter information and SetParameterInfo has not been called."
Guys, I'm trying to use a Lookup in a data flow that looks up values in the results of a query.
The problem I have is that the query needs to take two parameters (Source and BaseCurrency in the code below), and I can't figure out how to supply them.
Parameters can be supplied in other task types or transforms, but I can't see how to do it in the Lookup.
PJ
SELECT ForeignCurrency, RateFromFile AS YesterdaysRate
FROM inputrates IR
WHERE fileheaderid in (
SELECT top 1 MAX(ID) FROM FileInputAttempts FIA WHERE Source = '?' AND FIA.BaseCurrency = '?' AND status = 'SUCCESS' Group by CAST(FLOOR(CAST(LoadDate AS float))AS datetime) order by MAX(loaddate) DESC )
1) I am using Execute SQL tasks in my control flow. Three variables have been defined at the package level. They are mapped to three parameters, respectively, in the Execute SQL task.
When I try using these parameters in the SQL, an error is thrown: the query is not getting parsed. My connection is OLE DB. Target and source are in SQL Server.
Can anyone suggest a workaround?
2) Before loading my target I need to define a Lookup. My requirement is: if, say, the consumer key matches in the fact table, then update it; else insert.
Two kinds of lookup are available in the SSIS data flow tools: the simple Lookup for exact matching and the Fuzzy Lookup for matching based on probability.
Neither of them seems to support my requirement. Can I put a select and insert query directly in the Lookup, or will I need to call it from a file as a stored procedure?
I am using a Lookup in full cache mode, and occasionally I get this warning: [Lookup [150]] Warning: The component "Lookup" (150) encountered duplicate reference key values when caching reference data. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE. Now I know it is only a warning, but it is highlighting a real issue. Is there a way of capturing that this has happened?
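One way to capture the condition is to test the reference data for duplicate keys before the data flow runs, for instance from a preceding task that stores the result in a variable. A minimal sketch follows, using Python's sqlite3 as a stand-in database; the table name reference_table and the column business_key are hypothetical.

```python
import sqlite3


def duplicate_reference_keys(conn):
    """List every lookup key that occurs more than once, with its count."""
    sql = """
        SELECT business_key, COUNT(*) AS copies
        FROM reference_table          -- hypothetical reference table
        GROUP BY business_key
        HAVING COUNT(*) > 1
        ORDER BY business_key
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reference_table (business_key TEXT, surrogate_key INTEGER);
    INSERT INTO reference_table VALUES ('A', 1), ('B', 2), ('B', 3), ('C', 4);
""")

dups = duplicate_reference_keys(conn)
print(dups)
```

An empty result means the full-cache load should not raise the duplicate-key warning for that key column.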
Hi Gurus, I have a Data Flow Task which has an OLE DB Source calling an SP with parameters (?, ?). This OLE DB Source is then connected to a Lookup Transform, which also calls an SP, but on a different database. I am unable to figure out how to pass parameters in a Lookup Transform. In the 'Use Results of an SQL Query' pane of the Lookup Transform:
EXEC GetMonthlyDataExtract 4, 2007
(I am passing month and year values) this works OK.
But when I change it to
EXEC GetMonthlyDataExtract ?, ?
it says EXEC is not supported. Also, I cannot figure out how to configure parameters, since the 'Reference Table' tab of the Lookup Transform does not have any option where we can attach variables to parameters. I am interested in mapping parameters to variables, not to input columns. Please mention if that is not possible, or suggest any alternative.
When I retrieve data using an OLE DB Source I can create a SQL query and pass parameters to filter the data I get back. I'd like to do the same thing with the Lookup Transform but the parameters button isn't there. Am I missing something or do I have to use some special text format to insert my parameters into the query?
I've been searching around for a while now and slowly been making progress but I've finally hit a road block and I'm wondering if anyone else has ever gotten this to work. I'm using SS SP2 and the Microsoft OLE DB Provider for Oracle.
I have a lookup task in the data flow. The lookup table is in Oracle and it works fine as long as I don't check the "Enable Memory Restriction" box on the Advanced tab. As soon as that box is checked, the task will throw an error when I try to run it. I need to check it though to get to the Modify SQL Statement. Here is what I do:
1. Create a new Lookup task.
2. Set the Oracle OLE DB connection.
3. Use the following SQL for the reference table source:
SELECT COST_CENTER_ID, COST_CENTER_NB, start_dt, end_dt, decode(SIGN(TO_NUMBER(TO_CHAR(START_DT,'MM'))-9),-1, TO_CHAR(START_DT,'YYYY'), 0, TO_CHAR(START_DT,'YYYY'), 1, TO_CHAR(START_DT + 365,'YYYY') ) START_FY, decode(SIGN(TO_NUMBER(TO_CHAR(END_DT,'MM'))-9), NULL, decode(SIGN(TO_NUMBER(TO_CHAR(SYSDATE,'MM'))-9), -1, TO_CHAR(SYSDATE,'YYYY'), 0, TO_CHAR(SYSDATE,'YYYY'), 1, TO_CHAR(SYSDATE + 365,'YYYY') ), -1, TO_CHAR(END_DT,'YYYY'), 0, TO_CHAR(END_DT,'YYYY'), 1, TO_CHAR(END_DT + 365,'YYYY') ) END_FY FROM DIM_COST_CENTER
Then I go to the columns page and connect 1 field from the input column to the lookup column. Then click ok and it runs fine. However, now I go to the advanced page and click the Enable Memory Restriction (at this point is where the problem occurs). As soon as the memory restriction is checked, the thing throws errors: [Lookup [4732]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E14. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E14 Description: "ORA-00933: SQL command not properly ended ". Then if I go in and Modify the SQL Statement into the Oracle syntax by removing the word AS and the [ ]'s it will get a new error: select * from (SELECT COST_CENTER_ID, COST_CENTER_NB, start_dt, end_dt, decode(SIGN(TO_NUMBER(TO_CHAR(START_DT,'MM'))-9),-1, TO_CHAR(START_DT,'YYYY'), 0, TO_CHAR(START_DT,'YYYY'), 1, TO_CHAR(START_DT + 365,'YYYY') ) START_FY, decode(SIGN(TO_NUMBER(TO_CHAR(END_DT,'MM'))-9), NULL, decode(SIGN(TO_NUMBER(TO_CHAR(SYSDATE,'MM'))-9), -1, TO_CHAR(SYSDATE,'YYYY'), 0, TO_CHAR(SYSDATE,'YYYY'), 1, TO_CHAR(SYSDATE + 365,'YYYY') ), -1, TO_CHAR(END_DT,'YYYY'), 0, TO_CHAR(END_DT,'YYYY'), 1, TO_CHAR(END_DT + 365,'YYYY') ) END_FY FROM DIM_COST_CENTER) refTable where refTable.COST_CENTER_NB = ?
Now when running I get the error:
[Lookup [4732]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E5D. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E5D Description: "Parameter name is unrecognized.". Followed by: [Lookup [4732]] Error: OLE DB error occurred while binding parameters. Check SQLCommand and SqlCommandParam properties.
This is where I get stuck. I've gone into the XML and looked through everything and it all seems to match up in terms of variables lineage ID's and such, but I can't see any place to set the parameter name, which should be 0 since it is OLE DB. When I click the Enable Memory Restriction, the only difference I can notice in the XML is that the cachetype line changes from 0 to 2.
<property id="4738" name="CacheType" dataType="System.Int32" state="default" isArray="false" description="Specifies the cache type of the lookup table." typeConverter="CacheType" UITypeEditor="" containsID="false" expressionType="None">2</property>
Has anyone ever got parameters to work with Oracle and a lookup? Any work arounds? I have used a Merge Join with Conditional Split successfully, but I have about 5 other lookups that have to be done and it will be a killer and lots of work to try and re-sort for each merge join and conditional splits for each of them. Looking for any help with making the lookup work or some nicer work arounds.
I'm either missing something or this is a bug. I have a Lookup that finds no matches if I use the default option of full caching (everything on the Advanced tab unchecked). The lookup table is relatively small (15348 bytes) in only 544 rows. If I check only the Enable Memory Restriction box and eliminate caching, it works fine. I can also check the Enable Caching box and accept the default cache size of 5MB and it works fine. Anyone have any ideas? I'm running on Standard Edition, SP2.
I would like to know what happens when a very large reference data set for a Lookup transform with full caching enabled is being loaded during package execution and the computer's memory runs out or is very low. Does SSIS a) give an out-of-memory error of some sort, b) fall back to a no-caching or partial-caching mode, or c) stay in full caching mode but switch to using the paging file (virtual memory)?
I think it will resort to using the page file, in which case the benefits of in-memory lookups are lost and performance would suffer. If I cannot upgrade the memory or shrink the reference set somehow, I should switch that lookup task to use partial caching or no caching with an indexed lookup table. Would this make sense?
I'm currently loading a package that does a lookup on a column of data type nvarchar(4). The values themselves are (A+, A, B+, B, C, D, /). The strange lookup behaviour happens for each of the values, so it's not related to a specific value. After setting the cache mode to NO CACHE, the lookup works perfectly. With the default FULL CACHE, the strange behaviour happens. Could it be related to the data type? I have not yet tried to use a CHAR instead of an NVARCHAR, but it looks like people have similar issues using CHAR.
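One possible explanation, offered as a hedged sketch and not a diagnosis: in full cache mode the comparison is done in memory by the SSIS engine and is sensitive to case and trailing spaces, while in no-cache mode the database compares the values under its own collation, which is often more forgiving. The Python below only simulates that difference with a dictionary. The sample values and the normalize helper are hypothetical.

```python
def full_cache_lookup(cache, key):
    """Exact in-memory match: 'A+ ' and 'A+' are different keys."""
    return cache.get(key)


def normalize(key):
    # Assumed clean-up step (hypothetical): drop trailing spaces, fold case.
    return key.rstrip().upper()


# Reference rows as they might come back from a padded column.
reference_rows = [("A+ ", 1), ("A", 2), ("B+", 3)]

raw_cache = {key: value for key, value in reference_rows}
clean_cache = {normalize(key): value for key, value in reference_rows}

miss = full_cache_lookup(raw_cache, "A+")              # padded key never matches
hit = full_cache_lookup(clean_cache, normalize("a+"))  # both sides normalized
print(miss, hit)
```

If this is the cause, trimming (and, where needed, case-folding) both the input column and the reference query column usually makes the full-cache result agree with the no-cache result.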
My source has 2.2 million records, and I'm performing an incremental load. In the Lookup transformation I used the destination table as the reference, in full cache mode. The first time, the package executed successfully, but when I executed it a second time, it suddenly hung while running. Then I truncated the data in the destination table and restarted the SQL Server services. After doing all this, I executed the package again and it worked, but when I executed it a second time, the package hung again. I have a laptop with 8 GB of RAM and a 2.5 GHz i5 processor.
Hi, I have a SQL statement that works great when I don't use a SQL parameter, but when I do, it just takes @SearchFor as the literal text "@SearchFor" instead of the string that @SearchFor represents. Any ideas? Below are the two versions of the SQL statement.
Version 1 (search text concatenated into the statement):
sqlComm.Parameters.Add(new SqlParameter("@SearchFor", strSearchFor));
sqlComm.CommandText = "SELECT RANK, intID, chTitle, chDescription " +
    "FROM FREETEXTTABLE( tblItems, *, 'ISABOUT(" + strSearchFor + " WEIGHT(1.0))') a " +
    "JOIN tblItems b on a.[KEY] = b.intID ORDER BY RANK DESC; ";
Version 2 (search text passed as @SearchFor):
sqlComm.Parameters.Add(new SqlParameter("@SearchFor", strSearchFor));
sqlComm.CommandText = "SELECT RANK, intID, chTitle, chDescription " +
    "FROM FREETEXTTABLE( tblItems, *, 'ISABOUT(@SearchFor WEIGHT(1.0))') a " +
    "JOIN tblItems b on a.[KEY] = b.intID ORDER BY RANK DESC; ";
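In the parameterized version, @SearchFor sits inside a quoted string, so the server reads it as ordinary text and never substitutes the value. A common way around this is to build the whole search condition first and pass that complete string as the parameter. The sketch below shows the same behaviour with Python's sqlite3 as a stand-in. It is an illustration, not the poster's code, and sqlite writes the placeholder as :term where SQL Server would use @SearchFor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
term = "performance"

# A placeholder inside quotes is plain text; only the bare one gets bound.
literal, bound = conn.execute(
    "SELECT ':term' AS literal, :term AS bound", {"term": term}
).fetchone()

# Build the complete search condition first, then bind it as a single value.
search_condition = f"ISABOUT({term} WEIGHT(1.0))"
(condition,) = conn.execute(
    "SELECT :cond", {"cond": search_condition}
).fetchone()

print(literal, bound, condition)
```

Applied to the statement above, the quoted 'ISABOUT(...)' argument would be replaced by a bare parameter whose value is the full search condition.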
How can you use SQL Full-Text Search CONTAINS() with an ASP.NET 2.0 ObjectDataSource using @Parameters? MSDN shows something like the following, but it only works when run directly as a query from SQL Manager:
USE TestingDB;
GO
DECLARE @SearchWord NVARCHAR(30)
SET @SearchWord = N'performance'
SELECT TestText
FROM TestingTable
WHERE CONTAINS(TestText, @SearchWord);
I tried to make something like that work with the DataSet DataAdapter Query Builder for the ObjectDataSource, but you can't use DECLARE or SET:
SELECT TestText
FROM TestingTable
WHERE CONTAINS(TestText, @SearchWord);
But again it says @SearchWord is not a valid SQL construct. Is there any way to make a DataSet.DataAdapter.ObjectDataSource work with SQL FTS CONTAINS() and @Parameters?
I'd like to incorporate search functionality (SQL Server 2005 Full-Text Search) into a web application, so I want to be able to return a paged list of results based on the user's search terms. I already have a parameterized stored procedure that returns a list of products when a category ID is supplied. I modified this procedure to use a different input parameter (@SearchTerms), but I'd still like to return the number of records, as in the original stored procedure.
However, I'm getting this error: Invalid object name 'ProductEntries'.
Here's the original stored procedure:
ALTER PROCEDURE dbo.GetProductsByCategoryID
(
    @CategoryID INT,
    @PageIndex INT,
    @NumRows INT,
    @CategoryName VARCHAR(50) OUTPUT,
    @CategoryProductCount INT OUTPUT
)
AS
BEGIN
    SELECT @CategoryProductCount = (SELECT COUNT(ProductID) FROM Products WHERE Products.CategoryID = @CategoryID)
    SELECT @CategoryName = (SELECT CategoryName FROM Categories WHERE Categories.CategoryID = @CategoryID)
    WITH ProductEntries AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY ProductID) AS Row, ProductID, CategoryID, Description, ProductImage, UnitCost
        FROM Products
        WHERE CategoryID = @CategoryID
    )
    SELECT ProductID, CategoryID, Description, ProductImage, UnitCost
    FROM ProductEntries
    WHERE Row BETWEEN @startRowIndex AND @startRowIndex + @NumRows - 1
END
And here's the modified one:
ALTER PROCEDURE dbo.GetSearchResults
(
    @SearchTerms VARCHAR(200),
    @PageIndex INT,
    @NumRows INT,
    @ProductCount INT OUTPUT
)
AS
BEGIN
    SELECT @ProductCount = (SELECT COUNT(ProductID) FROM ProductEntries)
    WITH ProductEntries AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY ProductID) AS Row, ProductID, CategoryID, Description, ProductImage, UnitCost
        FROM CONTAINSTABLE (Products, *, @SearchTerms, 25) AS c, Products p
        WHERE c.[KEY] = p.ProductID
    )
    SELECT ProductID, CategoryID, Description, ProductImage, UnitCost
    FROM ProductEntries
    WHERE Row BETWEEN @startRowIndex AND @startRowIndex + @NumRows - 1
END
I thought I might be getting this error because SELECT @ProductCount occurs before the ProductEntries table is created, but when I move that SELECT statement further down, I still get the same error.
How can I get the value of @ProductCount in this scenario so that I can display it in the UI of the web app?
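A likely reason moving the SELECT does not help: a common table expression exists only for the single statement that immediately follows its definition, so any separate statement that names ProductEntries fails, wherever it is placed. The count therefore has to be taken inside a statement that defines the CTE itself, or from a query that repeats the same search predicate. The sketch below reproduces the scoping rule with Python's sqlite3 as a stand-in. The sample rows are hypothetical and a LIKE filter stands in for CONTAINSTABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Description TEXT);
    INSERT INTO Products VALUES (1, 'red bike'), (2, 'blue bike'), (3, 'red car');
""")

# The CTE definition; LIKE is a stand-in for the full-text predicate.
CTE = (
    "WITH ProductEntries AS ("
    "SELECT ProductID FROM Products WHERE Description LIKE :pattern) "
)

# Works: the CTE and the SELECT that reads it form one statement.
(product_count,) = conn.execute(
    CTE + "SELECT COUNT(ProductID) FROM ProductEntries", {"pattern": "%bike%"}
).fetchone()

# Fails: a separate statement can no longer see the CTE.
try:
    conn.execute("SELECT COUNT(ProductID) FROM ProductEntries")
    cte_visible = True
except sqlite3.OperationalError:
    cte_visible = False

print(product_count, cte_visible)
```

In the stored procedure, that would mean assigning @ProductCount from its own query over the same CONTAINSTABLE join, then running the paged SELECT with its own WITH clause.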
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it finished building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem in line with expectations? Any thoughts on how to improve performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute Taking a snapshot of the reference table Taking a snapshot of the reference table Building Fuzzy Match Index component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).
Say I want to look up a value in another dataset, but there is a grouping that requires you to know the values at each level in order to get to the correct detail record. Can you still use the Lookup function with more than one field to compare against? So, for example:
Department
 \___SalesPerson
      \___Measure
I want to be able to add a new row at the Measure level, but look up each field from another dataset. In order to do that I will need the Department AND SalesPerson values to do the lookup, but I don't think the Lookup function will let us do that.
Hello all, I have a report with a table and a chart. It uses dataset1 as the data source. All works fine. I create a new dataset called dataset2. The queries are exactly the same. The only differences between the two datasets are the database server and the fact that one of the columns is a smallint (in dataset2) and an int (in dataset1). I change the DataSetName property of both the table and the chart to use dataset2. When I run the report I get a conversion error stating that there was an overflow of int2 while using dataset1. I have verified the report is not using dataset1 anywhere. If I delete dataset1 and run the report, the error goes away. If I add it back, I get the error again. Why is the report looking at dataset1 if it is not referenced at all in the report? Does SQL RS cache the datasets and verify each when it compiles?
I am using SQL Server 2005. I have a VIEW that joins several tables. Columns can be added to one of the tables dynamically by the user from a GUI. However, after a column is added, it does not show up in the VIEW immediately. It takes a while (I haven't figured out exactly how long) before the extra column shows up in the execution result of the VIEW. So it seems like SQL Server is caching that VIEW's schema. Is there any way I can make this view always come back with the latest schema? Thanks a lot! Penn
I want to check the performance of my query, and I want to remove the cached query results first. Is there any suggestion for how I can do this? I just want to check, after each modification, how much the performance has improved.
I'm trying to understand the cases where it's more interesting to use snapshot and when it's more interesting to use cached instances.
If I have 100 users trying to reach a report, is it better to use a snapshot or a cached instance? In both cases, the 100 users will get the same report result. And what about performance: are they similar?
I would like to know what is the difference between a snapshot and a cached instance in SSRS?
Which one performs best, and which one is best for multiple users and reports containing parameters (the parameters are then passed in the WHERE clause of the SQL code; e.g. WHERE IN(@param1))?
This is in regard to an SCD Type 2 dimension. The scenario is that I am moving a fact table from an old source, and the fact rows carry a dimensionA description value that I want to replace with the appropriate ID from the dimension table. That dimension table is SCD Type 2, based on StartDate and EndDate, and the fact table doesn't contain a direct date value; instead, there is a timeId in the fact. So to update the value in the fact table, I have to join the time dimension table and the other dimension table to replace the fact description with the proper ID.
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup" . The SQL that I captured in SQL profiler does find the record when I run it in Management Studio. I have already tried trimming everything to no avail.
Why is this happening?
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I was wondering if anyone had any concrete information about whether there is a problem with having too many stored procedures or plans in the cache. Obviously there is an impact on memory, but if we can ignore that for the time being, does SQL Server perform just as well with tens of millions of plans as it does with 100 query plans?