Fuzzy Lookup Transform Row Scores 'inconsistent' With Individual Column Scores

Sep 1, 2006

I am trying to interpret some of the results I observe when trying to match similar records using a fuzzy lookup transform, but it's not entirely clear how the overall row similarity score is calculated. In particular, sometimes rows with lower individual column similarity scores will achieve a higher similarity and confidence score than a matching row with higher individual column scores.

The transform is configured with 6 text fields set to fuzzy mapping and a minimum similarity of 0, and 3 additional numeric fields with an exact mapping. It is set to return a maximum of 2 matches per lookup and to do an exhaustive search of the reference table.

For example, of the following pair of matching records, Match 1 is picked over Match 2 even though its individual scores are lower.

                       Match 1          Match 2
                       ---------------  ---------------
_similarity_author     1.0              1.0
_similarity_title      0.85344648       1.0
_similarity_headline   0.0125           0.0125
_similarity_summary    0.0125           0.0125
_similarity_picture    1.0              1.0
_similarity_caption    1.0              1.0

_similarity            7.8429267E-2     7.3196657E-2
_confidence            0.55728668       0.44271332

In another case both matching records have *identical* scores for every mapped column and yet their similarity and confidence scores are different.

Clearly there are other factors involved in calculating the overall row score. Anybody know what these are?


Fernando Tubio

View 2 Replies


Top 10 Scores - Row Query

Jul 12, 2004

I have a database that holds student details in one table and their scores in particular subjects in another. The tables are linked through ID. The subjects table looks like this. (example)

ID  Sub1  Sub2  Sub3  Sub4  etc
1   2     4     2     6
2   5     5     1     3
3   7     4     2     5
etc

What I need to do is return a student's details followed by their top 10 subjects in ascending order.
Student 1's results would look like this...

Student 1
details

Sub1 2
Sub3 2
Sub2 4
Sub4 6

This is driving me insane. Please help.
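
One possible approach is to unpivot the subject columns into rows and then sort. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory copy of the example rows, the table name `subjects` is made up, and "top 10" is read as "the first 10 rows after sorting ascending", as in the sample output above. On SQL Server the same UNION ALL query should carry over with `TOP 10` in place of `LIMIT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; the columns follow the example in the post.
conn.execute(
    "CREATE TABLE subjects (ID INTEGER PRIMARY KEY, Sub1 INT, Sub2 INT, Sub3 INT, Sub4 INT)"
)
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 4, 2, 6), (2, 5, 5, 1, 3), (3, 7, 4, 2, 5)],
)

# Turn each SubN column into its own (subject, score) row, then sort the rows.
UNPIVOT_TOP_N = """
SELECT subject, score
FROM (
    SELECT ID, 'Sub1' AS subject, Sub1 AS score FROM subjects
    UNION ALL SELECT ID, 'Sub2', Sub2 FROM subjects
    UNION ALL SELECT ID, 'Sub3', Sub3 FROM subjects
    UNION ALL SELECT ID, 'Sub4', Sub4 FROM subjects
) AS u
WHERE ID = ?
ORDER BY score ASC, subject ASC
LIMIT ?
"""


def top_subjects(student_id, n=10):
    """Return up to n (subject, score) pairs for one student, lowest score first."""
    return conn.execute(UNPIVOT_TOP_N, (student_id, n)).fetchall()


print(top_subjects(1))  # [('Sub1', 2), ('Sub3', 2), ('Sub2', 4), ('Sub4', 6)]
```

Ties are broken by subject name here so the output is deterministic; that tie-break is a choice made for the example, not something the post specifies.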

View 1 Replies View Related

Calculating Survey Scores

Oct 16, 2007

I have surveys that I need to add weights to and was wondering if there was a way to convert the contents of a column.

select empid, ans_for_q1,
(if ans_for_q1 = A then 0, B then 3, C then 5) as weightscore,
ans_for_q2,
(if ans_for_q2 = A then 8, B then 4, C then 2, D then 1, E then 0) as weightscore

Here is what the table looks like:

empid | Ans_for_Q1 | Ans_for_Q2
1001  | A          | C
1002  | B          | E

And these are the possible answers and what they need to be converted to:

Weights for answers to Question1
Q1_A = 0
Q1_B = 3
Q1_C = 5

Weights for answers to Question2
Q2_A = 8
Q2_B = 4
Q2_C = 2
Q2_D = 1
Q2_E = 0
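
The conversion described above maps naturally onto a CASE expression. The following is a minimal sketch, assuming a table named `survey` (a made-up name) that holds the two sample rows; it runs against an in-memory SQLite database through Python's sqlite3 module, and the CASE syntax used is standard SQL that SQL Server also accepts. The column aliases `q1_weight` and `q2_weight` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns and rows follow the post.
conn.execute("CREATE TABLE survey (empid INT, Ans_for_Q1 TEXT, Ans_for_Q2 TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [(1001, "A", "C"), (1002, "B", "E")],
)

# One CASE expression per question translates each answer letter to its weight.
WEIGHTED = """
SELECT empid,
       Ans_for_Q1,
       CASE Ans_for_Q1 WHEN 'A' THEN 0 WHEN 'B' THEN 3 WHEN 'C' THEN 5 END AS q1_weight,
       Ans_for_Q2,
       CASE Ans_for_Q2 WHEN 'A' THEN 8 WHEN 'B' THEN 4 WHEN 'C' THEN 2
                       WHEN 'D' THEN 1 WHEN 'E' THEN 0 END AS q2_weight
FROM survey
ORDER BY empid
"""

weighted_rows = conn.execute(WEIGHTED).fetchall()
for row in weighted_rows:
    print(row)
# (1001, 'A', 0, 'C', 2)
# (1002, 'B', 3, 'E', 0)
```

An answer outside the listed letters yields NULL because the CASE has no ELSE branch. If the weights change often, a small lookup table joined on the answer letter might be easier to maintain than hard-coded CASE branches.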

View 3 Replies View Related

Cluster Discrimination Scores

Aug 4, 2006



Dear all,

I have a question about the cluster mining results.

The Discrimination tab shows the differences between group 1 and group 2.

I am wondering how the discrimination scores are calculated.

Can any expert answer this question?



Regards!



Jerry

View 1 Replies View Related

SQL 2012 :: Get Percentage Over Sum Of All Scores Over Last 90 Days

Sep 30, 2014

I am trying to build a query that extracts the sum of the scores for each MCC code and the percentage that sum represents of all the scores over the last 90 days.

select MCC, sum(score) as total
from scores
where Datediff(day, creationdate, getdate()) < 90
group by MCC

TABLE
ID creationdate score MCC
1 2014-08-02 30 7422
. . . .
. . . .
. . . .
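
The percentage part can be sketched by dividing each MCC total by the grand total taken over the same 90-day window. The example below is an illustration only: it runs on SQLite through Python's sqlite3 module, every row except the first is invented for the demonstration, and the reference date is passed in as a parameter (instead of calling `getdate()`) so the result is reproducible. On SQL Server the date filter would stay as the `Datediff` condition from the post, and a window aggregate such as `sum(sum(score)) over ()` could replace the scalar subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INT, creationdate TEXT, score INT, MCC INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        (1, "2014-08-02", 30, 7422),   # row from the post
        (2, "2014-09-01", 20, 7422),   # invented sample row
        (3, "2014-09-15", 150, 5411),  # invented sample row
        (4, "2014-01-01", 99, 5411),   # invented row, older than 90 days
    ],
)

# Per-MCC total and its share of the grand total, both limited to the last 90 days.
PCT_QUERY = """
SELECT MCC,
       SUM(score) AS total,
       100.0 * SUM(score) / (
           SELECT SUM(score) FROM scores
           WHERE julianday(:today) - julianday(creationdate) < 90
       ) AS pct_of_total
FROM scores
WHERE julianday(:today) - julianday(creationdate) < 90
GROUP BY MCC
ORDER BY MCC
"""

pct_rows = conn.execute(PCT_QUERY, {"today": "2014-09-30"}).fetchall()
for row in pct_rows:
    print(row)
# (5411, 150, 75.0)
# (7422, 50, 25.0)
```

The row dated 2014-01-01 is excluded from both the per-MCC totals and the denominator, which is why the percentages add up to 100.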

View 4 Replies View Related

Find 90th Percentile Scores For Each Student

Sep 15, 2006

I am stumped on a set-based approach for this one.

A cursor approach is straightforward enough, but I want to avoid that.

Here's my table:

create table StudentScores
(
id int primary key identity(1,1),
student_id int not null,
score int not null
)

with some sample data:

insert into StudentScores (student_id, score)
select 1, 10 union all
select 1, 29 union all
select 1, 50 union all
select 1, 53 union all
select 1, 45 union all
select 1, 10 union all
select 1, 29 union all
select 1, 50 union all
select 1, 53 union all
select 1, 45 union all
select 1, 88 union all
select 2, 23 union all
select 2, 54 union all
select 2, 55 union all
select 2, 34 union all
select 2, 56 union all
select 2, 78 union all
select 2, 23 union all
select 2, 54 union all
select 2, 55 union all
select 2, 34 union all
select 2, 56 union all
select 2, 78 union all
select 2, 23 union all
select 2, 54 union all
select 2, 55 union all
select 2, 34 union all
select 2, 56 union all
select 2, 78 union all
select 2, 98


What I want is, for each student, what is their 90th percentile score?

For a given single student, one possibility would be:

declare @studentid int
set @studentid = 2
select top 1 @studentid as student_id, a.score as [90th percentile score]
from
(
select top 90 percent score from StudentScores
where student_id = @studentid order by score asc
) as a
order by a.score desc

But I want this for all students, and not use a cursor.

Any ideas?

Thanks!
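
One set-based possibility is to pick, for each student, the smallest score that at least 90 percent of that student's rows do not exceed. The sketch below is not the thread's accepted answer; it assumes that `top 90 percent` rounds the row count up (which is my understanding of SQL Server's behaviour), and it runs on SQLite through Python's sqlite3 module using the sample data from the post. The query uses only correlated subqueries and GROUP BY, so it should port to T-SQL largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE StudentScores (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           student_id INT NOT NULL,
           score INT NOT NULL)"""
)
# Same sample data as in the post, written compactly.
student1 = [10, 29, 50, 53, 45] * 2 + [88]
student2 = [23, 54, 55, 34, 56, 78] * 3 + [98]
conn.executemany(
    "INSERT INTO StudentScores (student_id, score) VALUES (?, ?)",
    [(1, s) for s in student1] + [(2, s) for s in student2],
)

# Keep the scores s where count(score <= s) >= 0.9 * n (written with integers),
# then take the smallest such score per student.
PERCENTILE_90 = """
SELECT s.student_id, MIN(s.score) AS p90
FROM StudentScores AS s
WHERE 10 * (SELECT COUNT(*) FROM StudentScores AS t
            WHERE t.student_id = s.student_id AND t.score <= s.score)
      >= 9 * (SELECT COUNT(*) FROM StudentScores AS u
              WHERE u.student_id = s.student_id)
GROUP BY s.student_id
ORDER BY s.student_id
"""

p90_rows = conn.execute(PERCENTILE_90).fetchall()
print(p90_rows)  # [(1, 53), (2, 78)]
```

For student 2 this selects the 18th of 19 sorted scores, which is the row the single-student query in the post would return if 90 percent of 19 rows is rounded up to 18. On SQL Server 2005 and later, `ROW_NUMBER()` partitioned by student would be another way to express the same idea.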

View 6 Replies View Related

Select Last 20 Records By Date Then Sum Lowest 10 Scores???

Jan 21, 2004

I have a table and am using ASP to query the database. The connection is to an MS Access table at the moment, but I am working to convert to SQL Server.

Question:

I need to select the last 20 records by a date field, and then from those 20 records select the 10 lowest scores.

For example, a member logs on, and that member has, say, 80 total records in the table. I need to select the last 20 records entered, by the date field, and then select the lowest 10 scores out of those 20.

I am new to more complex SQL statements; any help would be most appreciated!

table = HC_ID
date field = date
member_id = member
score = ScoreHC
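
The two-step selection can be written as nested subqueries: the inner one keeps the 20 most recent rows, the middle one keeps the 10 lowest scores among them, and the outer one sums them. The sketch below is a hedged illustration run on SQLite through Python's sqlite3 module; the table name `scores_hc`, the column names `member_id` and `played_on`, and the generated rows are all assumptions made for the example. In Access or SQL Server, `TOP 20` and `TOP 10` would play the role of `LIMIT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; only ScoreHC is taken from the post.
conn.execute("CREATE TABLE scores_hc (member_id INT, played_on TEXT, ScoreHC INT)")

# Invented data: 30 rounds for member 1 (one per day) and a single round for member 2.
sample_rows = [(1, f"2004-01-{d:02d}", (d * 7) % 23) for d in range(1, 31)]
sample_rows.append((2, "2004-01-31", 0))
conn.executemany("INSERT INTO scores_hc VALUES (?, ?, ?)", sample_rows)

SUM_LOWEST_10_OF_LAST_20 = """
SELECT SUM(ScoreHC)
FROM (
    SELECT ScoreHC
    FROM (
        SELECT ScoreHC
        FROM scores_hc
        WHERE member_id = ?
        ORDER BY played_on DESC   -- most recent first
        LIMIT 20
    ) AS last20
    ORDER BY ScoreHC ASC          -- lowest scores first
    LIMIT 10
) AS lowest10
"""


def sum_lowest_of_recent(member_id):
    """Sum of the 10 lowest scores among the member's 20 most recent rows."""
    return conn.execute(SUM_LOWEST_10_OF_LAST_20, (member_id,)).fetchone()[0]


print(sum_lowest_of_recent(1))
```

A member with fewer than 20 rows simply contributes all of their rows to the inner step, and a member with no rows gives NULL (None in Python).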

View 14 Replies View Related

Fuzzy Lookup | Multiple Identity Column Error

May 22, 2007

Hi,

I have two tables, t1 and t2, with the following structure.

t1(
ID1 int IDENTITY PRIMARY KEY,
name1 varchar(20),
addr1 varchar(20)
)

t2(
ID2 int IDENTITY PRIMARY KEY,
name2 varchar(20),
addr2 varchar(20)
)

The objective is to match the name1 and name2 columns using Fuzzy Lookup. So I used t1 as the source table and t2 as the reference/lookup table, and mapped the name1 and name2 columns in the Fuzzy Lookup editor. As the output column I selected the "ID2" column from t2.

Now when I run the package, it throws this error:

"Multiple identity columns specified for table '##FLRef_070522_14:16:39_5064_c1c6cbbd-5a54-4e36-9154-1371118f0931'. Only one identity column per table is allowed."

I suppose that during the Fuzzy Lookup, SSIS internally creates a temporary table, and that is where this error occurs, when two columns are added as identity. Can someone help me resolve this issue?



Thanks

Sid

PS: I need ID2 column as output for further calculation.







View 3 Replies View Related

Difference Between The Fuzzy Lookup And Fuzzy Grouping In SSIS

Aug 14, 2007

Dear Friends,



I think Fuzzy Lookup compares the columns we map by spelling (it rejects a match if even one letter differs in a mapped column of a record), e.g. RAVI != REVI.

What is Fuzzy Grouping? Please explain.

regards
koti




View 3 Replies View Related

Performance Expectations For Fuzzy Lookup Against 25mill Row Lookup Table

Oct 31, 2007

We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'd like to hear about your experience so I can set my performance expectations appropriately.

We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).

After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.

We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.

Does that performance seem in line with expectations? Any thoughts on how to improve performance?

View 4 Replies View Related

Fuzzy Lookup Error When Adding Additional Lookup Columns

Sep 26, 2007

I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.

It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:

Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.

These errors occur regardless of what columns I am attempting to add to the lookup list.

I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).

Any ideas on what else could be causing this?

View 4 Replies View Related

Inconsistent SSIS Data Transform Behavior

Jan 15, 2008



Hi all,

I have a very simple SSIS package that is moving data from a DB2 database to a Teradata box. I've run it around 10 times, twice it pushed data over, the balance of the time, it executes with no error, but moves nothing over. In the "incomplete" runs, a command line box pops up for half a second, then the package ends.

Does anyone have ideas as to why this behavior is occurring?

Thanks,

Mark

View 1 Replies View Related

Fuzzy Grouping Transform Corrupts Pass-through Data

Aug 2, 2005

We are working with a client and are using Fuzzy Group transform for de-duping, and hierarchy creation for a national account list.

View 4 Replies View Related

Fuzzy Lookup

Feb 16, 2007

Hi,

I am using a fuzzy lookup to cleanse data from a sales line details table during the import process. The sales order line details contain a field called 'reference', and this is compared to a field called 'category' in another table.
Using data viewers to check through the cleansing process, I notice that the fuzzy lookup doesn't always match as I would expect, e.g.
tbl.salesline.reference = 'I3' -> tbl.sales.category ='I03'
The above is OK, but the lookup also returns the following:
tbl.salesline.reference = 'I9' -> tbl.sales.category ='I01'
The value I9 doesn't exist; it was miskeyed during user entry and should have been 'I99'. I would have expected the fuzzy lookup to pick up the I99 value, as at least two of the characters match, but no, it picks the first 'I*' in the table.
If I expand the fuzzy lookup to return more results, e.g. 5 per record, then it returns the first 5 results: I01, I02, I03 and so on.
Is there a way of improving the fuzzy lookup itself?
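
The expectation in the post (that 'I9' should land on 'I99' and not 'I01') matches what a plain character-level edit distance would give. The sketch below computes Levenshtein distance in pure Python to make that expectation concrete. It is explicitly not a description of how the Fuzzy Lookup transform scores matches; the candidate list is an assumption based on the category values mentioned in the post.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Assumed reference values, taken from the categories named in the post.
candidates = ["I01", "I02", "I03", "I99"]
ranked = sorted(candidates, key=lambda c: (levenshtein("I9", c), c))

print(levenshtein("I9", "I99"))  # 1
print(levenshtein("I9", "I01"))  # 2
print(ranked)                    # ['I99', 'I01', 'I02', 'I03']
```

With values this short, every candidate is only one or two edits away, so any fuzzy measure has little signal to work with. That may be part of why the results look arbitrary, though this is a guess and not a confirmed explanation.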

View 1 Replies View Related

Fuzzy Lookup

Feb 6, 2008

The Enterprise edition of SQL Server includes some advanced BI features, for example the Fuzzy Lookup feature of IS. If the IS package lives on an Enterprise edition of SQL Server and the database the package is targeting lives on a Standard edition of SQL Server, can the advanced features be used? Can you run a fuzzy lookup against a database on a Standard edition of SQL Server when the IS package lives on an Enterprise edition of SQL Server? Thanks!

View 1 Replies View Related

Fuzzy Lookup

Jan 19, 2007

Hi Friends,

Can somebody briefly explain to me what the difference is between Fuzzy Lookup and Fuzzy Grouping?



thanks and regards

View 2 Replies View Related

SSIS Lookup Transformation To Update Individual Columns

Mar 4, 2008

Hi,
I have an example situation that seems like it should have a super easy solution, but my jobs keep failing.
Here we go. . .

I have a SQL Server 2005 table as my source in a data flow task.
This table contains raw data.
We'll call it FACT_Product_Raw - which contains a field called ProductType varchar(1)
Let's say that ProductType contains values of "A" or "B" or "C" - or for that matter, some null and garbage values

I have a lookup table, LOV_Product_Types
This table contains 3 fields that will transform my raw data table
We'll call these fields ProdTypeID smallint, ProdTypeRaw varchar(1) and ProdType smallint
It contains pairs such that A = 1, B = 2, and so on.


Here's what I want to do.
I want to ADD a field to FACT_Product_Raw that contains the "looked up" value from LOV_Product_Types.
Let's say that I want to add the ProdTypeID field to my _Raw table.

I have used the _Raw table as both my source and destination
It blows up every time.
Help.
Thanks,
David
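
One alternative to routing the rows back into the same table through the data flow is a single set-based UPDATE, which in SSIS could be run from an Execute SQL Task. The sketch below illustrates the idea on SQLite through Python's sqlite3 module. It is only a sketch under assumptions: the `ProductKey` column and the sample rows are invented, and on SQL Server the statements would be `ALTER TABLE ... ADD` followed by an `UPDATE ... FROM ... JOIN` or an equivalent correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- ProductKey is a hypothetical key column added for the example.
    CREATE TABLE FACT_Product_Raw (ProductKey INTEGER PRIMARY KEY, ProductType VARCHAR(1));
    CREATE TABLE LOV_Product_Types (ProdTypeID SMALLINT, ProdTypeRaw VARCHAR(1), ProdType SMALLINT);

    INSERT INTO FACT_Product_Raw (ProductType) VALUES ('A'), ('B'), ('C'), (NULL), ('?');
    INSERT INTO LOV_Product_Types VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 3);

    -- Add the new column, then fill it with a set-based lookup.
    ALTER TABLE FACT_Product_Raw ADD COLUMN ProdTypeID SMALLINT;

    UPDATE FACT_Product_Raw
    SET ProdTypeID = (SELECT l.ProdTypeID
                      FROM LOV_Product_Types AS l
                      WHERE l.ProdTypeRaw = FACT_Product_Raw.ProductType);
    """
)

updated_rows = conn.execute(
    "SELECT ProductType, ProdTypeID FROM FACT_Product_Raw ORDER BY ProductKey"
).fetchall()
print(updated_rows)
# [('A', 1), ('B', 2), ('C', 3), (None, None), ('?', None)]
```

Null and garbage values find no match in the lookup table and are left as NULL, so they can be reviewed separately afterwards.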

View 5 Replies View Related

Fuzzy Lookup And Case

May 25, 2007

Hi,

Could someone please help!

I'm doing a fuzzy lookup based on 3 fields (Surname/DOB/Gender). The only difference between the two sets of data is the case of the first letter of the Surname.

The reference table has "Stuart" and the lookup input has "stuart". I have set the Fuzzy Lookup input for Surname to Ignore Case, but it still won't match.

The DOB/Gender are exactly the same.

Why does this not work? Is there a workaround?

Many Thanks, Deano

View 2 Replies View Related

Error When Doing Fuzzy Lookup

May 16, 2006

I am trying to run an SSIS package that contains a fuzzy lookup. I am using a flat file with about 7 million records as the input. The reference table has about 2000 records. The package fails after about 40,000 records with the following information:

------------------------

Warning: 0x8007000E at Data Flow Task, Fuzzy Lookup [228]: Not enough storage is available to complete this operation.
Warning: 0x800470E9 at Data Flow Task, DTS.Pipeline: A call to the ProcessInput method for input 229 on component "Fuzzy Lookup" (228) unexpectedly kept a reference to the buffer it was passed. The refcount on that buffer was 2 before the call, and 1 after the call returned.
Error: 0xC0047022 at Data Flow Task, DTS.Pipeline: The ProcessInput method on component "Fuzzy Lookup" (228) failed with error code 0x8007000E. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0x8007000E.
Error: 0xC02020C4 at Data Flow Task, Flat File Source [1]: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
Error: 0xC0047039 at Data Flow Task, DTS.Pipeline: Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "WorkThread1" has exited with error code 0xC0047039.
Error: 0xC0047038 at Data Flow Task, DTS.Pipeline: The PrimeOutput method on component "Flat File Source" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "SourceThread0" has exited with error code 0xC0047038.

-------------------------------

I have tried many things - changing the BufferTempStoragePath path to a drive that has plenty space, changed the MaxInsertCommitSize to 5,000...

What else can I do?

Thanks!





View 10 Replies View Related

Fuzzy Lookup Problems.

Mar 8, 2006

Fuzzy Lookup seems to be causing me some problems. It works at times and not at others. It runs fine a couple of times and gives me the desired results, but then, without anything changing in the data flow or the data, the next few times it does not run at all and fails the pre-execute phase.

Now I'm currently getting the following error:

[Fuzzy Lookup [248]] Error: An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Login timeout expired". An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections.". An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Named Pipes Provider: Could not open a connection to SQL Server [233]. ".

[DTS.Pipeline] Warning: A call to the ProcessInput method for input 249 on component "Fuzzy Lookup" (248) unexpectedly kept a reference to the buffer it was passed. The refcount on that buffer was 2 before the call, and 1 after the call returned.

[DTS.Pipeline] Error: The ProcessInput method on component "Fuzzy Lookup" (248) failed with error code 0xC0202009. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.

Any help would be appreciated.



View 1 Replies View Related

Fuzzy Lookup Error

Oct 18, 2006

Hi

I get the following error when I use Fuzzy Lookup in a Data Flow task with the TransactionOption property set to "Required":

[Fuzzy Lookup [61]] Error: An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Cannot create new connection because in manual or distributed transaction mode.".

When I change the TransactionOption property to "Supported" it works fine.
I need the property set to Required so that the work is undone in the event of a failure.
Any ideas on how to get the Fuzzy Lookup to work?

View 3 Replies View Related

Fuzzy Lookup Error

Sep 30, 2007

I have a Fuzzy Lookup in a Data Flow Task that is performing a simple text match based on a data view in SQL Server.

I keep obtaining the error below and I have no idea why. Is there a minimum number of rows required in the view in order for the lookup to work properly?

When I take the Store/Manage Index options off the lookup seems to work properly.

Thank you!


[Fuzzy Merchant Lookup [2832]] Error: SSIS Error Code DTS_E_OLEDBERROR.
An OLE DB error has occurred. Error code: 0x80040E14.
An OLE DB record is available.
Source: "Microsoft SQL Native Client"
Hresult: 0x80040E14
Description: "A .NET Framework error occurred during execution of user-defined routine or aggregate "sp_FuzzyLookupTableMaintenanceInstall": System.Data.SqlClient.SqlException: Error number 8197 is invalid. The number must be from 13000 through 2147483647 and it cannot be 50000.
System.Data.SqlClient.SqlException:
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnectionSmi.EventSink.DispatchMessages(Boolean ignoreNonFatalMessages) at Microsoft.SqlServer.Server.SmiEventSink_Default.DispatchMessages(Boolean ignoreNonFatalMessages)
at System.Data.SqlClient.SqlCommand.RunExecuteNonQuerySmi(Boolean sendToPipe)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
at Microsoft.SqlServer.Dts.TxBestMatch.TableMaintenance.RaiseErrorId(SqlCommand cmd, FltmErrorMsgId MsgId, FltmErrorState State, SqlServerSeverity Severity)
at Microsoft.SqlServer.Dts.TxBestMatch.TableMaintenance.ReportErrors(SqlCommand cmd, ExceptionType Type, String ErrorMessage, FltmErrorMsgId MsgId, FltmErrorState State, SqlServerSeverity Severity, SqlErrorCollection errors)
at Microsoft.SqlServer.Dts.TxBestMatch.TableMaintenance.TranWrap(DataCleaningOperation c)
at Microsoft.SqlServer.Dts.TxBestMatch.TableMaintenance.ServerInstall(String etiTableName) .".

View 4 Replies View Related

Multiple Fuzzy Lookup

Aug 31, 2006

Is it possible to have multiple fuzzy lookups within a data flow?

I need to have at least 3 fuzzy lookups in a data flow. Here are the conditions I use to try to find a match: 1=Zip&City, 2=Zip&State, 3=City&State. I have the first fuzzy lookup working fine. After that, I have a conditional split to get any unmatched rows, then use another fuzzy lookup for the second condition. At this point, I get the error saying "The package contains two objects with duplicate name of output column _Similarity..." I do not need the _Similarity and _Confidence columns, so is there a way to exclude them from the output?

Any comments?

Thanks in advance.

View 4 Replies View Related

Fuzzy Lookup Problems

Jun 16, 2006

Hi everyone,

I've just started looking at the Fuzzy Lookup feature and I think I must be getting something fundamentally wrong. I have two tables; each contains different metadata representations for a set of potentially similar documents. The only chance I have of matching a document in table A to a document in table B is a common title field. However, manual input means that the titles may differ between the tables, although they are quite similar in most cases.

In the lookup I get to specify the output columns from table B (Reference), which is fine, but I don't seem to get to choose the columns from table A that I would also like to see. So my output shows me all the documents from table B that it thinks are similar to ones in table A, but without identifying which record each is similar to.

I initially thought that the "pass through" columns that I identified would appear in the output, but this does not seem to be the case.

I must be using it incorrectly, but I have no idea how to progress with this apart from creating a new source table (C) which is a full outer join of tables A and B, and then also using table C as the reference table, but that seems madness.

Any help would be appreciated - ta

Andrew

View 3 Replies View Related

Fuzzy Lookup Questions

Nov 15, 2007

Hi all

I've been doing some research and running some PoCs on using the Fuzzy Lookup Transformation (FLT) and had two questions:

1) When you choose to have a maximum of 1 output returned for each input, does FLT pick this output based on the best (highest) similarity and confidence scores or the first one it finds?

2) Why does FLT not support dynamically setting properties such as ReferenceTableName or MatchIndexName?

Any help or guidance with this is greatly appreciated.

View 3 Replies View Related

Fuzzy Lookup No Columns Available

Apr 3, 2008

I created a fuzzy transformation with an input table and a reference table. When I go to the Columns tab, there are no available input or lookup columns displayed. But if I select a different reference table, sometimes it works.

Are there any specific properties a reference table must have in order for columns to show up?

Thanks,

Tom

View 5 Replies View Related

Difference Between ‘Fuzzy Lookup Transformations ‘

Mar 5, 2007

What is the difference between 'Fuzzy Lookup Transformations' and 'Lookup Transformations' in SSIS? Is there any real-time scenario that would help me understand it better?

View 1 Replies View Related

SSIS Fuzzy Lookup Error

Jul 31, 2006

I am trying to run the Fuzzy Lookup on a SQL2K ref table using 2005 SSIS package and keep getting the following error:

[Fuzzy Lookup [2601]] Error: An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Cannot create a row of size 8061 which is greater than the allowable maximum of 8060.".

Regardless of the changes I make I cannot get this to work and it would make a huge difference if I could get it to run.

Can I create the FuzzyLookupIndex on a SQL2K database?

Any help or advice would be greatly appreciated.



Many thanks



C.

View 4 Replies View Related

Does Fuzzy Lookup Work With Oracle ?

Apr 13, 2007



Sorry, this might be an obvious question, but I can not find anything in the documentation/forum.



I want to use a Fuzzy Lookup between 2 Oracle tables.

I select the Reference Table.

I then switch to the Columns tab, but the "Available Input Columns" and "Available Lookup Columns" lists are always empty.



I have experimented quite a bit, but to no avail. I noticed this on the Reference Table tabpage : "The table maintenance feature requires the installation of a trigger on the reference table". My guess would be that SSIS does not support Oracle for this, but I am not able to find anything in the documentation that it doesn't.



Any answer/pointer greatly appreciated.



Thanks



Jan Vandepitte





View 5 Replies View Related

Fuzzy Lookup Similarity Calculations

Mar 26, 2008

I have come across something in Fuzzy Lookup and don't know whether I am doing something wrong or whether this is the behaviour we should expect from Fuzzy Lookup.

I have a Test table as shown below with a couple of sample rows.


IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Test]') AND type in (N'U'))
DROP TABLE [dbo].[Test]
GO

CREATE TABLE [dbo].[Test](
    [Code] [varchar](4) NOT NULL,
    [Name] [varchar](50) NULL,
    [Server] [varchar](50) NULL
) ON [PRIMARY]
GO

INSERT INTO [Test] ([Code],[Name],[Server]) VALUES ('PQR','CONTROL GEAR (GROUP) LTD','ELPS122')
GO
INSERT INTO [Test] ([Code],[Name],[Server]) VALUES ('PQR','CONTROL GEAR (GROUP)','ELPS122')
GO

IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vwTest]'))
DROP VIEW [dbo].[vwTest]
GO

CREATE VIEW [dbo].[vwTest]
AS
SELECT Code, [Name]
FROM Test
GO



OLE DB Data Source - I read the data from the Test table.

Fuzzy Lookup - vwTest is used as the reference table name. Joined by Code & Name. Maximum number of matches to output per lookup is set to 5.

Row Count - Data Viewer between Fuzzy Lookup and Row Count.

The results are shown below:

Name                       Name (1)                   _Similarity_Name
CONTROL GEAR (GROUP) LTD   CONTROL GEAR (GROUP) LTD   1
CONTROL GEAR (GROUP) LTD   CONTROL GEAR (GROUP)       0.6
CONTROL GEAR (GROUP)       CONTROL GEAR (GROUP)       1
CONTROL GEAR (GROUP)       CONTROL GEAR (GROUP) LTD   0.8

My question is whether we should expect the same similarity value in both directions. It doesn't produce the same value in my testing.

I was expecting the same similarity score for the following two comparisons:
Is "CONTROL GEAR (GROUP) LTD" the same as "CONTROL GEAR (GROUP)"?
Is "CONTROL GEAR (GROUP)" the same as "CONTROL GEAR (GROUP) LTD"?

I think I know the answer, but I would like to know why.


Thanks
Sutha
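
A similarity measure does not have to be symmetric: any measure that normalises by the input side alone will give different answers depending on which string is treated as the input. The toy function below (token containment, in pure Python) shows this effect on the two names from the post. It is an illustration of the general idea only; it is not the Fuzzy Lookup algorithm and it does not reproduce the 0.6 and 0.8 values reported above.

```python
def containment(input_text: str, reference_text: str) -> float:
    """Fraction of the input's whitespace-separated tokens that also appear in the reference."""
    input_tokens = set(input_text.upper().split())
    reference_tokens = set(reference_text.upper().split())
    if not input_tokens:
        return 0.0
    return len(input_tokens & reference_tokens) / len(input_tokens)


long_name = "CONTROL GEAR (GROUP) LTD"
short_name = "CONTROL GEAR (GROUP)"

print(containment(long_name, short_name))  # 0.75 (LTD is missing from the reference)
print(containment(short_name, long_name))  # 1.0  (every input token is found)
```

In this toy measure the longer input scores lower because one of its tokens has no counterpart in the reference, while the shorter input is fully covered. Whether the transform's scoring is asymmetric for a similar reason is not established here.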

View 7 Replies View Related

Usage Of Variables In Fuzzy Lookup

Nov 22, 2007

Hi,

I am using Fuzzy Lookup in my transformation. I wanted to know if there is a way to use variables for the MinSimilarity property in the Advanced Editor tab. Instead of giving a hardcoded value between 0 and 1, I want to take the value from a variable and use it. Is this possible in SSIS?

Thanks,
Akalya

View 3 Replies View Related

Fuzzy Lookup Match Issue

May 29, 2007

Hello,



I have a peculiar problem in my project. My project design is like this.

The numbers in (...) are record counts.


File feed (1000)
      |
Fuzzy Lookup
against Table2
      |
Split Fz Lookup results
(_Similarity >= 0.60 && _Confidence >= 0.85)
      |                          |
Remaining rows (750)      Write matches to Table1 (250)
      |
Fuzzy Group
      |
Split Fz Group results
      |                          |
Write Canonicals          Write Dupes
to Table2 (300)           to Table1 (450)



This is basically a customer de-duplication project. Table2 holds the canonicals and Table1 holds the dupes (of the canonicals). I already have some data in these tables; new data is matched against the existing data in these tables and classified as new customers or duplicate customers.

In the above process, the rows identified by the Fuzzy Lookup as dupes of already existing canonicals are written to the dupes table (Table1) and should not be processed further down the line. But in my case I see that the matches identified by Fuzzy Lookup are also being included in the Fuzzy Grouping.

When I run this in debug mode in BIDS, it shows the correct numbers, as depicted in the illustration above. But after execution, when I query the tables, they show that all 1000 rows went through Fuzzy Grouping.

Any thoughts?

Btw, is there any way to upload attachments to the postings here?

View 1 Replies View Related

Fuzzy Lookup Taking Too Much Time

Dec 14, 2006

I have an SSIS package in which a small table of 270 rows is fuzzy-matched against a table on another SQL Server, and the resulting records are inserted into a temporary table. This takes more than 3 hours or so in debug mode and never gets beyond this step. I use an OLE DB destination to insert into the temporary table, and the temporary table never receives any rows.

View 2 Replies View Related







Copyrights 2005-15 www.BigResource.com, All rights reserved