Problems With Lookup Task And Dates
Aug 2, 2005I€™m trying to make a Lookup transformation with a relation on two string columns and a between dates condition set by parameter:
View 7 RepliesI€™m trying to make a Lookup transformation with a relation on two string columns and a between dates condition set by parameter:
View 7 RepliesHi All,
Actually this is in regard to SCD Type 2 Dimension, Scenario is like that I am moving Fact table from some old source and I have dimensionA description value in fact which I want to replace with appropriate id from Dimension Table and that Dimension table is SCD Type 2 based on StartDate and EndDate and Fact Table doesn't contains direct date value rather there is timeId in Fact so to update the value in Fact table I have to Join Time Dimension table and other Dimension Table to replace fact Description with proper Id.
Lets assume DimensionA Structure
id
Description
StartDate
EndDate
Fact Table
id
measure1
measure2
TimeId
Description
Time Dimension
TimeId
Date
Day
Hour ...
Hi,
I need to perform a lookup based on a business key and a date in the source that must be between an effective from and effective to date in the reference table. I've been able to achieve this by updating the SQL caching command on the advanced tab but the performance is very slow. I have 6 lookups of this type in the data flow with a source SQL statement that returns approx 1 million rows and this package takes over 90 minutes to run.
The caching SQL command now looks like this
select * from
(select * from [ReferenceTable]) as refTable
where [refTable].[Key] = ? and ? BETWEEN [refTable].[StartDate] AND [refTable].[EndDate]
and I've set up the parameters so that the business key is the first parameter and the source date is the second.
I have another lookup in the flow that does a straight equality comparison using 2 columns and the Progress tab shows that this lookup is cached (even though I haven't enabled it on the Advanced tab of the transformation editor) but none of the other lookups (using the date) appear to be cached, even though I have enabled them to be.
Can anyone suggest how I can improve the performance?
Thanks.
Hi all I this stored procedure is suppose to turn give the users the ability to put the month and date in so they can see how many each inspector did for that month and year.
Create Procedure LookupYearMonth
AS
SELECT I.[Last Name], I.[First Name],
[Date by Month] = DATENAME(month, MT.[Date] + ' ' + DATENAME(year, MT.[DATE]),
[Count Of Main Table] = count(*)
FROM Inspectors I
INNER JOIN [Main Table] MT
ON (I.ID = MT.Inspector)
GROUP BY I.[Last Name], I.[First Name],
DATENAME(month, MT.[Date] + ' ' + DATENAME(year, MT.[DATE])
WHERE (MT.[Date] >= CONVERT(datetime, @EnterMonthYear)) AND
(MT.[Date] < DATEADD(month, 1, CONVERT(datetime, @EnterMonthYear)))
GO
I had this (what seems to be a) simple question asked today and I'm afraid I didn't like my answer. Does anyone know the proper answer to this one:
Any ideas on how I can constrain a lookup or merge join based on the dimension row's effective and expired dates so three criteria are needed as follows:
1. DataStagingSource.ModifyDate < DataWarehouseDimension.RowExpiredDate AND
2. DataStagingSource.ModifyDate >= DataWarehouseDimension.RowEffectiveDate AND
3. DataStagingSource.NaturalKey = DataWarehouseDimension.NaturalKey
-- Brian
Hi,
I've looked at so many examples on the web but I can't get my dataflow with lookup to work correctly. It just keeps inserting the records instead of updating them because they already exist. I know it's something simple but I've been trying to figure it out for the last 4 hours and am over my head. In the lookup I have 5 columns as available input and of course the matching lookup columns and selected 4 with lookup operation replace and tried add as new. It just keeps adding them to my destination table when they already exists. I can't figure out what i am doing wrong?
thanks,
Is there a way to set the Lookup task to be not case sensitive? For example, the lookup table has id 1, value 'ABCD'. The value that I'm using to lookup is 'abcd'. I cannot make it to return the id 1 unless I convert my lookup value to be all upper case.
And I don't want to use Fuzzy Lookup.
Thanks,
Ash
I have an SSIS package that unpivots data - each row from the source becomes 20-24 rows in the destination db. In the dataflow it looks up some type-data IDs before inserting into the destination table. The whole process flies through the data at an incredible rate, however I find that the final commit on data insertion takes a very long time, and two of the lookup tasks remain yellow throughout. It appears that there may be a contention issue because the final table has FKs to those lookup tables.
In T-SQL it is possible to do SELECT..... WITH (NOLOCK), but when I modified the SQL statement of the lookup task by adding this clause, it caused an error "Incorrect syntax near the keywork 'with'."
Is there any way around this problem? THNX!
PhilSky
Hi, I'm using the lookup task to perform updates on some records and it seems to be working correctly but isn't making all updates. I have a case where there are 543 rows in my source connection, there are 436 of them that do not match the date column of the 543 in my destination table. It only updated 224 of them when it should have updated all 436? Does anyone know why it would only update this many instead of all? I also had another that only updated 18 out of 19 instead of all 19.
thanks,
This doesn't seem possible but I'll ask anyway...
Can I build a lookup task where the lookup query is based on a variable, rather than hardcoding the SQL staement?
Hi
I am trying to use this painful new SSIS process. I basically need to use a lookup task to check to see whether a record exists or not. If not, then I need to insert the record. However, because this is treated as an error situation (which is stupid in itself), I get a problem when the number of records not found reach the MaximumErrorCount, and the rest of the package fails. Is there any other method of doing this type of thing, without simply increasing the MaximumErrorCounty to some ludicrous value. I could do this type of thing very very very easily when using DTS packages using the Data Driven Task, it seems so stupid that I can't perform the same kind of task using SSIS.
Any help would be appreciated
Thanks
Darrell
Hello:
I'm attempting to use the lookup task and i'm running into an issue about incompatible datatypes.
The table that i'm referencing in the lookup has a field named DateCreated, and it's DataType is DateTime.
But when I hook it up to the Lookup Task, the lookup task thinks it's a DB_Timestamp, so i get an incompatible data type error.
What can i do?
Hi,
My SQL Server database is case in-sensitive.
I have a lookup task to join on varhar column. If the data is only case different such as 'DataModel' and 'Datamodle', lookup task consdiers them as different. Is there a way to set up lookup task as case insentive? I know I can work around it by converted them to upper case and compare them.
Hello,
Can anyone explain to me (or show me where to get info) in what scenario should I use Full Cache, Partial, or No_Cache mode in the Lookup task? I got below warning message during execute my package. I understand that my lookup data have duplicate reference keys. I want to change the mode to partial or no cache to see how it works, but don't know what to change for the parameter map.
Warning: 0x802090E4 at Data Flow Task, Lookup [37]: The Lookup transformation encountered duplicate reference key values when caching reference data. The Lookup transformation found duplicate key values when caching metadata in PreExecute. This error occurs in Full Cache mode only. Either remove the duplicate key values, or change the cache mode to PARTIAL or NO_CACHE.
Any info is appreciated.
Ash
We have a LKP table that we will use to decode incoming codes from multiple sales feeds to link to the relevant surrogate key for the ultimate dimension table.
The LKP table has a start date, end date and active row flag to track history of the codes.
This table in its entirety is required for the FULL history load but we will need a CURRENT version of the table containing only the ACTIVE rows for the DAILY incremental loading.
I am thinking of putting a standard view over the top of the LKP table to present only the ACTIVE rows rather than creating a 2nd table and reloading it each night. We don't have enterprise edition to use materialised views.
My question is, will the LKP cache populate as the SSIS packages begins, loading it with all the rows from the view, as it would if I used the 2nd physical table with only ACTIVE rows?
SSIS data flow transformation - Lookup task - best practice concerning Cachetype
I would like to know if there's any best practice concerning the CacheType property for the the Lookup task. Default value is "Full", but if the SSIS package is working with at lot of data, i.e. +10 mill. records from the OLE DB source to be handled through a variated numbers of data flow transformation tasks, it must have an impact memory usage if the lookup table also is a large table, i.e. +8 mill. records? When should I consider turning the property value to "none"
I Can't reproduce the error if I run the package stand-alone.
I'm using the same lookup call (same table, etc.) in 2 packages that are running in parallel (called by a parent package).
[LKP_UnderwriterId [72283]] Error: An OLE DB error has occurred. Error code: 0x80040E05. An OLE DB record is available. Source: "Microsoft OLE DB Provider for SQL Server" Hresult: 0x80040E05 Description: "Object was open.".
Anyone seen this one?
Why is there no Expressions collection for the Fuzzy Lookup task? Is there another good way to dynamically configure the ReferenceTableName and MatchIndexName properties?
View 1 Replies View RelatedI try to convert a Procedure that join 8 tables with INNER AND OUTER JOIN, my understanding is that the Lookup task is the one to use and I should break these joins into smaller block, it takes a long time to load when I do this, since each of these tables had 10-40mill. rows and I have 8 tables to go thru, currently this Stored Procedure took 3-4min to run, by converting this to 8 Lookup tasks, it ran for 20min. has anyone run into this issue before and know a work around for this.
Thanks
Is it possible to parameter the connection of a Lookup Transformation task - specifically the table/view name? I would like to be able to dynamically set the table that the Lookup Transformation is connecting to at runtime.I've looked into the "Use results of an SQL query" on the connection screen (which correlates to the "SqlCommand" property), but I'm unable to pass in a parameter this way.I've also looked into the SqlCommandParam, but that doesn't allow me to use a parameter in the "FROM" clause of the sql syntax.
View 4 Replies View Related
The logic I am trying to recreate via SSIS is the following SQL statement:
insert into db3.dbo.targettable1 -- Target database table
(SiteC,
Objecte,
Attrib1)
select distinct ?,
?,
from ? -- Source database table
join dbo.targettable2 c1 -- Target database table
on c1.Alias = ? and
c1.CSetID = ? and
c1.FacID = (select f.PFacID
from dbo.Fac f
where f.FacID = ?)
where not exists (select * from dbo.targettable2 c -- Target database table
where c.Alias = ? and
c.FacID = ? and
c.CSetID = ?)
I have an OLE DB Source that consists of an expression to approximate the following portion of the Above Select statement:
Select ?,
from ? -- Source database table and
The package has 2 global variables User:CSetID and User::FacID whose scope is global to the package and whose values are set within a Foreach Loop Container outside of the Data Flow Task
I was trying to reference the 2 global variables within the Looup Transformation to recreate the following portion of the SQL statement.but encounter errors:
join dbo.targettable2 c1 -- Target database table
on c1.Alias = ? and
c1.CSetID = ? and
c1.FacID = (select f.PFacID
from dbo.Fac f
where f.FacID = ?)
In the Advanced Editor window of Lookup Transaction
select * from
(select * from [dbo].[targettable2 ]) as refTable
where [refTable].[Alias] = ? and [refTable].[FacID] = ? and
[refTable].[CSetID] = ?
Is there away to reference global variables in a Lookup Transformation that are set outside a Data Task Flow?
Hi,
I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog:
http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM.
However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
http://i248.photobucket.com/albums/gg168/boston_sql92/1.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/2.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/3.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/4.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/5.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/6.jpg
The same package works fine against a similar test table with 150k rows.
http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg
http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table.
Any hints,advice would be appreciated.
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running profiler showed us it was firing off lots of singelton SQL queries doing lookups as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem inline with expectations? Any thoughts to improve performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space.
Any ideas on what else could be causing this?
Say I want to lookup a value in another dataset, but there is a grouping that requires you to know what the values for each level is in order to get to the correct detail record.  Can you still use the lookup function with more than one field to compare against? So for example
Department
\___SalesPerson
    \___Measure
I want to be able to add a new row at the Measure level, but lookup each field from another dataset. In order to do that I will need the Department AND SalesPerson values to do the lookup, but I dont think the Lookup function will let us do that will.
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup" . The SQL that I captured in SQL profiler does find the record when I run it in Management Studio. I have already tried trimming everything to no avail.
Why is this happening?
I tried enabling memory restrictions but then I my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
07/24/07 13:35:48, ERROR , SQLDUMPER_UNKNOWN_APP.EXE, AdjustTokenPrivileges () failed (00000514)
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, Input parameters: 4 supplied
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, ProcessID = 5952
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, ThreadId = 0
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, Flags = 0x0
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, MiniDumpFlags = 0x0
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, SqlInfoPtr = 0x0100C5D0
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, DumpDir = <NULL>
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, ExceptionRecordPtr = 0x00000000
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, ContextPtr = 0x00000000
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, ExtraFile = <NULL>
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, InstanceName = <NULL>
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, ServiceName = <NULL>
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, Callback type 11 not used
07/24/07 13:35:48, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, Callback type 15 not used
07/24/07 13:35:49, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, Callback type 7 not used
07/24/07 13:35:49, ACTION, SQLDUMPER_UNKNOWN_APP.EXE, MiniDump completed: C:Program FilesMicrosoft SQL Server90SharedErrorDumpsSQLDmpr0033.mdmp
07/24/07 13:35:49, ACTION, DtsDebugHost.exe, Watson Invoke: No
Why am I getting this error with "Enable Memory Restriction"?
Hi all,
I don't understand what's happening here.
I have a Conditional Split with 3 outputs. On the first output I have a lookup, when I execute the package I have 56 rows going through the Conditional Split, all rows are then going to the 2nd and 3rd output but the lookup on the first output generates an error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error while there is no row going through it.
Any idea ?
Sébastien.
SQL 2005 Dev
How can I do this with Parameters? I can get a single parameter to filter for a single date (or even a combo list of the dates in DB). But I want my parameters to interact so that they specify a range. Is this possible?
I am designing a ssis package,This is intends to mine text data(Data extracted from websites).
Term lookup/Term extraction has been used as tools for mining.
I have lookup terms defined with me for reference table,but the main problem lie in extracting the nearby text/number/charcters to these lookup terms during mining.
For example :
I found noun "Email" 200 (frequency score) times in my text,Now I want to extract nearby email address(this is also true for PhoneNumber,Address attributes also).so how can I achieve this with SSIS.
If u have some idea/suggestion to carry out this challenge with or without Term Extraction/Term Lookup,plz do write here.
Today I have got one scenario to calculate the (sum of days difference minus(-) the dates if the same date is appearing both in assgn_dtm and complet_dtm)/* Here goes the table schema and sample data */
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[temp_tbl]') AND type in (N'U'))
DROP TABLE [dbo].[temp_tbl]
GO
CREATE TABLE [dbo].[temp_tbl](
[tbl_id] [bigint] NULL,
[cs_id] [int] NOT NULL,
[USERID] [int] NOT NULL,
[code]....
I have a table that has hotel guests and their start stay date and end stay date, i would like to insert into a new table the original information + add all days in between.
CREATE TABLE hotel_guests
(
[guest_name] [varchar](25) NULL,
[start_date] [date] NULL,
[end_date] [date] NULL,
[comment] [varchar](255) NULL
[code]...
OK. I give up and need help. Hopefully it's something minor ...
I have a dataflow which returns email addresses to a recordset.
I pass this recordset into a ForEachLoop configuring the enumerator as (Foreach ADO Enumerator). I also map the email address as a variable with index 0.
I then have a Execute SQL task which receives this email address as a varchar variable (parameter 0) which I then use in my SQL command to limit the rows returned. I have commented out the where clause and returned all rows regardless of email address to try to troubleshoot this problem. In either event, I then use a resultset to store the query result of type object and result name 0.
I then pass this resultset into a script variable to start parsing the sql rows returned as type object. ( I assume this is the correct way to do this from other prior posts ...).
The script appears to throw an exception at the following line. I assume it's because I'm either not passing in the values properly or the query doesn't return anything. However, I am certain the query works as it executes just fine at the command prompt.
Try
ds = CType(Dts.Variables("VP_EMAIL_RESULTS_RS").Value, DataSet)
My intent is to email the query results to each email address with the following type of data by passing the parsed data from the script to a send mail task. Email works fine and sends out messages but the content is empty. I pass the parsed data as string values to the messagesource and define the messagesourcetype as a variable in the mail task.
part number leadtime
x 5
y 9
....
Does anyone have any idea what I might be doing wrong?
thanks
John
I want to know if there is a way to compare dates in the sql statement with dates that I input into a database and todays date. the datatype that I'm using is smalldatetime.The statement I used is:Select Date from Table where Date > 'Today.now'I get an errorCould this be done or is there another approach?
View 1 Replies View Related