I have an SSIS package that unpivots data - each row from the source becomes 20-24 rows in the destination db. In the dataflow it looks up some type-data IDs before inserting into the destination table. The whole process flies through the data at an incredible rate, however I find that the final commit on data insertion takes a very long time, and two of the lookup tasks remain yellow throughout. It appears that there may be a contention issue because the final table has FKs to those lookup tables.
In T-SQL it is possible to do SELECT..... WITH (NOLOCK), but when I modified the SQL statement of the lookup task by adding this clause, it caused an error "Incorrect syntax near the keywork 'with'."
Actually this is in regard to SCD Type 2 Dimension, Scenario is like that I am moving Fact table from some old source and I have dimensionA description value in fact which I want to replace with appropriate id from Dimension Table and that Dimension table is SCD Type 2 based on StartDate and EndDate and Fact Table doesn't contains direct date value rather there is timeId in Fact so to update the value in Fact table I have to Join Time Dimension table and other Dimension Table to replace fact Description with proper Id.
I've looked at so many examples on the web but I can't get my dataflow with lookup to work correctly. It just keeps inserting the records instead of updating them because they already exist. I know it's something simple but I've been trying to figure it out for the last 4 hours and am over my head. In the lookup I have 5 columns as available input and of course the matching lookup columns and selected 4 with lookup operation replace and tried add as new. It just keeps adding them to my destination table when they already exists. I can't figure out what i am doing wrong?
Is there a way to set the Lookup task to be not case sensitive? For example, the lookup table has id 1, value 'ABCD'. The value that I'm using to lookup is 'abcd'. I cannot make it to return the id 1 unless I convert my lookup value to be all upper case.
Hi, I'm using the lookup task to perform updates on some records and it seems to be working correctly but isn't making all updates. I have a case where there are 543 rows in my source connection, there are 436 of them that do not match the date column of the 543 in my destination table. It only updated 224 of them when it should have updated all 436? Does anyone know why it would only update this many instead of all? I also had another that only updated 18 out of 19 instead of all 19.
I am trying to use this painful new SSIS process. I basically need to use a lookup task to check to see whether a record exists or not. If not, then I need to insert the record. However, because this is treated as an error situation (which is stupid in itself), I get a problem when the number of records not found reach the MaximumErrorCount, and the rest of the package fails. Is there any other method of doing this type of thing, without simply increasing the MaximumErrorCounty to some ludicrous value. I could do this type of thing very very very easily when using DTS packages using the Data Driven Task, it seems so stupid that I can't perform the same kind of task using SSIS.
Hi, My SQL Server database is case in-sensitive. I have a lookup task to join on varhar column. If the data is only case different such as 'DataModel' and 'Datamodle', lookup task consdiers them as different. Is there a way to set up lookup task as case insentive? I know I can work around it by converted them to upper case and compare them.
Can anyone explain to me (or show me where to get info) in what scenario should I use Full Cache, Partial, or No_Cache mode in the Lookup task? I got below warning message during execute my package. I understand that my lookup data have duplicate reference keys. I want to change the mode to partial or no cache to see how it works, but don't know what to change for the parameter map.
Warning: 0x802090E4 at Data Flow Task, Lookup [37]: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.
We have a LKP table that we will use to decode incoming codes from multiple sales feeds to link to the relevant surrogate key for the ultimate dimension table.
The LKP table has a start date, end date and active row flag to track history of the codes.
This table in its entirety is required for the FULL history load but we will need a CURRENT version of the table containing only the ACTIVE rows for the DAILY incremental loading.
I am thinking of putting a standard view over the top of the LKP table to present only the ACTIVE rows rather than creating a 2nd table and reloading it each night. We don't have enterprise edition to use materialised views.
My question is, will the LKP cache populate as the SSIS packages begins, loading it with all the rows from the view, as it would if I used the 2nd physical table with only ACTIVE rows?
SSIS data flow transformation - Lookup task - best practice concerning Cachetype
I would like to know if there's any best practice concerning the CacheType property for the the Lookup task. Default value is "Full", but if the SSIS package is working with at lot of data, i.e. +10 mill. records from the OLE DB source to be handled through a variated numbers of data flow transformation tasks, it must have an impact memory usage if the lookup table also is a large table, i.e. +8 mill. records? When should I consider turning the property value to "none"
I Can't reproduce the error if I run the package stand-alone.
I'm using the same lookup call (same table, etc.) in 2 packages that are running in parallel (called by a parent package).
[LKP_UnderwriterId [72283]] Error: An OLE DB error has occurred. Error code: 0x80040E05. An OLE DB record is available. Source: "Microsoft OLE DB Provider for SQL Server" Hresult: 0x80040E05 Description: "Object was open.".
Why is there no Expressions collection for the Fuzzy Lookup task? Is there another good way to dynamically configure the ReferenceTableName and MatchIndexName properties?
I try to convert a Procedure that join 8 tables with INNER AND OUTER JOIN, my understanding is that the Lookup task is the one to use and I should break these joins into smaller block, it takes a long time to load when I do this, since each of these tables had 10-40mill. rows and I have 8 tables to go thru, currently this Stored Procedure took 3-4min to run, by converting this to 8 Lookup tasks, it ran for 20min. has anyone run into this issue before and know a work around for this.
Is it possible to parameter the connection of a Lookup Transformation task - specifically the table/view name? I would like to be able to dynamically set the table that the Lookup Transformation is connecting to at runtime.I've looked into the "Use results of an SQL query" on the connection screen (which correlates to the "SqlCommand" property), but I'm unable to pass in a parameter this way.I've also looked into the SqlCommandParam, but that doesn't allow me to use a parameter in the "FROM" clause of the sql syntax.
The logic I am trying to recreate via SSIS is the following SQL statement:
insert into db3.dbo.targettable1 -- Target database table (SiteC, Objecte, Attrib1)
select distinct ?, ?, from ? -- Source database table join dbo.targettable2 c1 -- Target database table on c1.Alias = ? and c1.CSetID = ? and c1.FacID = (select f.PFacID from dbo.Fac f where f.FacID = ?) where not exists (select * from dbo.targettable2 c -- Target database table where c.Alias = ? and c.FacID = ? and c.CSetID = ?)
I have an OLE DB Source that consists of an expression to approximate the following portion of the Above Select statement:
Select ?, from ? -- Source database table and
The package has 2 global variables User:CSetID and User::FacID whose scope is global to the package and whose values are set within a Foreach Loop Container outside of the Data Flow Task
I was trying to reference the 2 global variables within the Looup Transformation to recreate the following portion of the SQL statement.but encounter errors:
join dbo.targettable2 c1 -- Target database table on c1.Alias = ? and c1.CSetID = ? and c1.FacID = (select f.PFacID from dbo.Fac f where f.FacID = ?)
In the Advanced Editor window of Lookup Transaction
select * from (select * from [dbo].[targettable2 ]) as refTable where [refTable].[Alias] = ? and [refTable].[FacID] = ? and [refTable].[CSetID] = ?
Is there away to reference global variables in a Lookup Transformation that are set outside a Data Task Flow?
Hi, I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog: http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM. However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
The same package works fine against a similar test table with 150k rows. http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table. Any hints,advice would be appreciated.
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running profiler showed us it was firing off lots of singelton SQL queries doing lookups as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem inline with expectations? Any thoughts to improve performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute Taking a snapshot of the reference table Taking a snapshot of the reference table Building Fuzzy Match Index component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space.
Say I want to lookup a value in another dataset, but there is a grouping that requires you to know what the values for each level is in order to get to the correct detail record.  Can you still use the lookup function with more than one field to compare against? So for example
Department \___SalesPerson     \___Measure
I want to be able to add a new row at the Measure level, but lookup each field from another dataset. In order to do that I will need the Department AND SalesPerson values to do the lookup, but I dont think the Lookup function will let us do that will.
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup" . The SQL that I captured in SQL profiler does find the record when I run it in Management Studio. I have already tried trimming everything to no avail.
Why is this happening?
I tried enabling memory restrictions but then I my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I have a Conditional Split with 3 outputs. On the first output I have a lookup, when I execute the package I have 56 rows going through the Conditional Split, all rows are then going to the 2nd and 3rd output but the lookup on the first output generates an error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error while there is no row going through it.
I need some help to under stand when the right time is for NOLOCK. I work in a small dev group and NOLOCK seams to be a buzz word and others are throwing it in all over for no apparent reason.
I read the thing from http://www.sql-server-performance.com/ and I am sure that our web and SQL servers are about 100x over sized for the application. While are ASP.Net (VB) app may demonstrate some hesitation from time to time I am more inclined to blame poor VB.Net coding techniques before slow SQL. The point being the NOLOCK is being added to SELECTS that are not part of a transaction and were using the SQL data adapter to return datasets or single column values.
Also I am not even sure it’s being used correctly. The OLM has the example: SELECT au_lname FROM authors WITH (NOLOCK)
However I am seeing it formatted like this: SELECT au_lname FROM authors (NOLOCK)
I am by no mean an expert, I follow what I read in books or from examples from others. And I have never read in a book go crazy with NOLOCK because it’s the bomb!
Any thoughts? I am trying to learn as much as I can before I raise my hand and say this might be a bad idea.
Hi, I have a job that runs 3 seperate DTS packages.
The first step imports a file and runs successfully.
The second step which is the 2nd DTS package is hanging in the execute mode until I manually stop the job. Apparently,We discovered a bulk insert that is blocking a select statement--both proccesses are within this second DTS package. I tried using the WITH (UNLOCK) on the tables but this DTS package is still failing.
Does anyone have any suggestion? It would be greatly appreciated.
There has been a discussion/debate going on this thread about the benefits and drawbacks of using the NOLOCK hint: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=67294
It occurred to me that you might know more about this than any of us, or at least be able to point us to a white paper or knowledge base article that explains the subject in more detail. Any light you can shed on the subject would be a big help.
I am designing a ssis package,This is intends to mine text data(Data extracted from websites). Term lookup/Term extraction has been used as tools for mining. I have lookup terms defined with me for reference table,but the main problem lie in extracting the nearby text/number/charcters to these lookup terms during mining. For example : I found noun "Email" 200 (frequency score) times in my text,Now I want to extract nearby email address(this is also true for PhoneNumber,Address attributes also).so how can I achieve this with SSIS. If u have some idea/suggestion to carry out this challenge with or without Term Extraction/Term Lookup,plz do write here.
Hello, Does anyone know if you place NOLOCK after a view in a select statement, if the effects trickle down to the tables in the view? Or does one have to add NOLOCK to each table within the view?