Fuzzy Grouping Similarity Calculations

Apr 30, 2008

Hi all,

My question is how to calculate similarity using a SQL query, for example with LIKE '%...%' and ORDER BY. I am writing a function that works like Fuzzy Grouping, but I do not know how to get the result, that is, how Fuzzy Grouping matches the selected rows of data.


I hope my question is clear. How do I write the correct query? What should I do? I'm a newbie in Integration Services, so I need a step-by-step explanation if anything needs correcting.



I am looking forward to hearing from you shortly and thanks a lot in advance.

Thanks!

rgds,
xuenly


Fuzzy Lookup Similarity Calculations

Mar 26, 2008

I have come across something with Fuzzy Lookup and don't know whether I am doing something wrong or whether this is the behaviour we should expect from Fuzzy Lookup.

I have a Test table as shown below with couple of sample rows.


IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Test]') AND type in (N'U'))
    DROP TABLE [dbo].[Test]
GO
CREATE TABLE [dbo].[Test](
    [Code] [varchar](4) NOT NULL,
    [Name] [varchar](50) NULL,
    [Server] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT INTO [Test] ([Code],[Name],[Server]) VALUES('PQR','CONTROL GEAR (GROUP) LTD','ELPS122')
GO
INSERT INTO [Test] ([Code],[Name],[Server]) VALUES('PQR','CONTROL GEAR (GROUP)','ELPS122')
GO
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vwTest]'))
    DROP VIEW [dbo].[vwTest]
GO
CREATE VIEW [dbo].[vwTest]
AS
SELECT Code, [Name]
FROM Test
GO



OLE DB Data Source - I read the data from Test Table.

Fuzzy Lookup - vwTest is used as Reference Table Name. Joined by Code & Name. Maximum No of matches to output per lookup is set to 5.

Row Count - a Data Viewer sits between Fuzzy Lookup and Row Count.
The Fuzzy Lookup results are shown below:

Name                     | Name (1)                 | _Similarity_Name
CONTROL GEAR (GROUP) LTD | CONTROL GEAR (GROUP) LTD | 1
CONTROL GEAR (GROUP) LTD | CONTROL GEAR (GROUP)     | 0.6
CONTROL GEAR (GROUP)     | CONTROL GEAR (GROUP)     | 1
CONTROL GEAR (GROUP)     | CONTROL GEAR (GROUP) LTD | 0.8

My question is whether we should expect the same similarity value in both directions. It doesn't produce the same value in my testing.

I was expecting the same similarity score for the following two comparisons:
Is "CONTROL GEAR (GROUP) LTD" the same as "CONTROL GEAR (GROUP)"?
Is "CONTROL GEAR (GROUP)" the same as "CONTROL GEAR (GROUP) LTD"?

I think I know the answer, but I would like to know why though?


Thanks
Sutha


Difference Between The Fuzzy Lookup And Fuzzy Grouping In SSIS

Aug 14, 2007

Dear Friends,



I think Fuzzy Lookup compares the mapped columns by spelling (it rejects a record if even one letter differs in a mapped column), e.g. RAVI != REVI.

What is Fuzzy Grouping? Please explain.

regards
koti





Fuzzy Grouping Error

Oct 16, 2005

I am using the Sept CTP, I am doing a fuzzy grouping on 1.5Mil records.


Fuzzy Grouping Using Original Key

Nov 14, 2007

I managed to get fuzzy grouping working. The relevant output (_key_in and _key_out) are stored in a new table that is a copy of the old table + fuzzy grouping columns.

How do I get SSIS to store the _key_in and _key_out in the original table?
The new matching column _key_out refers to the new key, _key_in. How can I get SSIS to translate that into a matching column that refers to my original key?
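
One possible approach, sketched in T-SQL under stated assumptions: let the original key ride through Fuzzy Grouping as a pass-through column, then join the output to itself on _key_out = _key_in to find the original key of each group's canonical row. The table variable @Grouped and the column CustomerId are hypothetical stand-ins for the Fuzzy Grouping output table and the original key.

```sql
-- Hypothetical stand-in for the Fuzzy Grouping output table.
-- CustomerId plays the role of the original key, carried as a pass-through column.
DECLARE @Grouped TABLE (
    _key_in    int PRIMARY KEY,
    _key_out   int NOT NULL,
    CustomerId int NOT NULL
);

INSERT INTO @Grouped (_key_in, _key_out, CustomerId)
SELECT 1, 1, 5001 UNION ALL
SELECT 2, 1, 5002 UNION ALL
SELECT 3, 3, 5003;

-- Each row's _key_out points at the _key_in of its canonical row,
-- so a self-join translates it into the canonical row's original key.
SELECT g.CustomerId     AS OriginalKey,
       canon.CustomerId AS CanonicalOriginalKey
FROM @Grouped AS g
JOIN @Grouped AS canon
  ON canon._key_in = g._key_out
ORDER BY g.CustomerId;
```

The same join could drive an UPDATE of the original table, matched on the original key, so that the canonical key is stored there rather than in the copy.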


Fuzzy Grouping In SSIS

Aug 2, 2007


Hi folks,

What is the use of Fuzzy Grouping in SSIS?

Please give me an example.

regards
koti


Fuzzy Grouping Errors

May 15, 2006

Hi - we have been evaluating using Fuzzy Grouping and Lookup for maintaining our large list of customer records.  Initial testing with Grouping on about 300K records went great but now with a larger sample of 7.3 million records we are running into problems.   It doesn't appear to be system limitation - the index is built reasonably quickly and without errors but when it starts the matching we get these errors:

[Fuzzy Grouping Inner Data Flow : DTS.Pipeline] Error: The ProcessInput method on component "Fuzzy Lookup" (86) failed with error code 0x8000FFFF. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.

[Fuzzy Grouping Inner Data Flow : DTS.Pipeline] Error: Thread "WorkThread0" has exited with error code 0x8000FFFF.

[Fuzzy Grouping Inner Data Flow : DTS.Pipeline] Error: Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.

[Fuzzy Grouping Inner Data Flow : OLE DB Source [1]] Error: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.

[Fuzzy Grouping Inner Data Flow : DTS.Pipeline] Error: Thread "WorkThread1" has exited with error code 0xC0047039.

[Fuzzy Grouping Inner Data Flow : DTS.Pipeline] Error: The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC02020C4.  The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.

[Fuzzy Grouping Inner Data Flow : DTS.Pipeline] Error: Thread "SourceThread0" has exited with error code 0xC0047038.

One thing we did find is that our test server didn't have SP1 installed, and installing it seemed to help a lot (we were getting buffer errors prior to SP1). One other note: the destination table is populated with all the data, but no scoring has been applied to it.

Does anyone have any ideas what could be causing this?

Thanks!

Keith Doyle


Fuzzy Grouping In Parallel

May 11, 2007

Hello,

I have created a project to do de-dupification of addresses.



I understand that Fuzzy Grouping takes less time when it has less data to process.

My source feed file is sometimes huge, so I am splitting the input into multiple branches based on the first letter of the city. There are 7 branches in the process.


                      Source File Feed
                              |
                  Split data into 7 groups
                              |
   +--------+--------+--------+--------+--------+--------+
   |        |        |        |        |        |        |
 FzGrpg   FzGrpg   FzGrpg   FzGrpg   FzGrpg   FzGrpg   FzGrpg
   |        |        |        |        |        |        |
 Split    Split    Split    Split    Split    Split    Split
  / \      / \      / \      / \      / \      / \      / \
<- Write the Canonicals and Dupes from each of these splits into database ->


When I designed this, I was hoping that the Fuzzy Grouping tasks would execute in parallel.

But in reality they are processed one after the other.

Is there any way to make them execute in parallel?



Appreciate your help.



Thanks

KM


Extracting The Duplicates Using Fuzzy Grouping

Jul 29, 2006

Hi,

I have an Oracle table called "Party" which has Party_Id as its primary key and fields such as Party_Name and Party_Addr. The table contains many duplicate party details (party_name and party_addr). We are trying to avoid duplicates using the fuzzy logic in SSIS.

1. Can anybody suggest how to create a package that avoids duplicates using fuzzy logic for this scenario? (Step-by-step instructions would help me understand SSIS.)

2. Could you please provide some samples for fuzzy matching? (Please send a sample to my email.)


Fuzzy Grouping Causing Minidump

Oct 18, 2007

I was running a Fuzzy Grouping task on SQL Server Enterprise Edition SP1 without any issues. I then applied SP2 and now that same Fuzzy Grouping is causing a minidump and terminating the process.

First, does anybody know anything about this kind of issue?

Second, I tried to run the minidump file in Visual Studio but I cannot actually run the dump file in Visual Studio as I keep getting the following exception:


Debugging information for 'DtsDebugHost.exe' cannot be found or does not match. No symbols loaded.

Finally, I did obtain a random error on the server itself that displayed the GUID: 58FC39EB-9DBD-4EA7-B7B4-9404CC6ACFAB.

This GUID appears to be tied to a Dr. Watson error but, again, I cannot figure out what process is breaking.

Can somebody please help?


Need Help In Address Cleansing-- Fuzzy Grouping?

Oct 4, 2007

Hi,

We do not have any address cleansing tools, and the requirement is to cleanse the data by finding the best possible record, the one that has all the info, and updating the other records accordingly.

I am not sure whether we can do this with the Fuzzy Grouping transformation.

Example:

I have a source table with the following info.

Customer_id | Location_Address | Location_City | Location_State | Location_Zip | Location_County
TT101       | 252 HARVARD RD   | ATLANTA       | GA             | 30340        | FULTON
TT101       |                  |               |                | 30340        |
TT101       | 252 HARVARD RD   | ATLANTA       |                |              |
TT102       | 125TEST          | CUMMING       |                |              |
TT102       | 125 TEST DR      | CUMMING       | GA             | 30040        | FORSYTH
TT102       |                  |               | GA             | 30040        |

Please let me know the solution

Thanks in advance.
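
When the records already share a Customer_id, as in the sample above, a plain T-SQL update may be enough without any fuzzy matching. The sketch below is one possible approach, not a definitive solution: it treats the row with the most populated columns as the "best" record per customer and copies its values to the other rows. The table variable @Location and the RowId column are assumptions added for illustration, and blank cells are assumed to be NULL or empty strings.

```sql
-- Hypothetical stand-in for the source table; RowId is added only as a tie-breaker.
DECLARE @Location TABLE (
    RowId            int IDENTITY(1,1) PRIMARY KEY,
    Customer_id      varchar(10) NOT NULL,
    Location_Address varchar(50) NULL,
    Location_City    varchar(50) NULL,
    Location_State   varchar(2)  NULL,
    Location_Zip     varchar(10) NULL,
    Location_County  varchar(50) NULL
);

INSERT INTO @Location (Customer_id, Location_Address, Location_City,
                       Location_State, Location_Zip, Location_County)
SELECT 'TT101', '252 HARVARD RD', 'ATLANTA', 'GA', '30340', 'FULTON'  UNION ALL
SELECT 'TT101', NULL,             NULL,      NULL, '30340', NULL      UNION ALL
SELECT 'TT101', '252 HARVARD RD', 'ATLANTA', NULL, NULL,    NULL      UNION ALL
SELECT 'TT102', '125TEST',        'CUMMING', NULL, NULL,    NULL      UNION ALL
SELECT 'TT102', '125 TEST DR',    'CUMMING', 'GA', '30040', 'FORSYTH' UNION ALL
SELECT 'TT102', NULL,             NULL,      'GA', '30040', NULL;

-- Rank rows per customer by how many columns are filled in; rn = 1 is the "best" record.
WITH Ranked AS (
    SELECT Customer_id, Location_Address, Location_City,
           Location_State, Location_Zip, Location_County,
           ROW_NUMBER() OVER (
               PARTITION BY Customer_id
               ORDER BY (  CASE WHEN NULLIF(Location_Address, '') IS NULL THEN 0 ELSE 1 END
                         + CASE WHEN NULLIF(Location_City, '')    IS NULL THEN 0 ELSE 1 END
                         + CASE WHEN NULLIF(Location_State, '')   IS NULL THEN 0 ELSE 1 END
                         + CASE WHEN NULLIF(Location_Zip, '')     IS NULL THEN 0 ELSE 1 END
                         + CASE WHEN NULLIF(Location_County, '')  IS NULL THEN 0 ELSE 1 END) DESC,
                        RowId
           ) AS rn
    FROM @Location
)
-- Copy the best record's values onto every row of the same customer.
UPDATE l
SET Location_Address = b.Location_Address,
    Location_City    = b.Location_City,
    Location_State   = b.Location_State,
    Location_Zip     = b.Location_Zip,
    Location_County  = b.Location_County
FROM @Location AS l
JOIN Ranked AS b
  ON b.Customer_id = l.Customer_id
 AND b.rn = 1;

SELECT * FROM @Location ORDER BY RowId;
```

Fuzzy Grouping would only come into play if the Customer_id itself were unreliable and rows had to be matched on the address text.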


Dynamically Configuring Fuzzy Grouping

Jun 7, 2007

Hello,



I have been struggling with this for quite awhile so any help would be appreciated.



I need to know if there is a way to populate the Fuzzy Grouping control dynamically. I know you can programmatically design a package and customize it in C#, but for our purposes we would like to control the SSIS package via database settings. When the settings change, the package would then act differently. It's a simple package consisting of an input, Fuzzy Grouping, a conditional split and an output. The connections are set up dynamically using parameters, expressions and a script task. Is there any way I could do a similar thing for Fuzzy Grouping?


Fuzzy Grouping Performance Issue

May 30, 2007

Hello All,

We have an SSIS package which includes Fuzzy Grouping in a Data Flow. It takes two columns from the source table and saves the output, with match score etc., in a different table. This is how we are doing it:
1. Load required data from the table using an OLE DB connection (source)
2. Sort the data
3. Apply Fuzzy Grouping (using a dedicated database instead of tempdb, and MinSimilarity = 0.6)
4. Send to the destination table using an OLE DB connection (destination)

The input table has millions of records. The package takes too long to execute and sometimes even fails after running for 12 hours. Any suggestions for improving performance are welcome.

Appreciate your help.

Thanks and regards,
Ashish Basran


Fuzzy Grouping Resource Usage

Nov 21, 2007

I have a few questions about the amounts of resources used by the fuzzy grouping transformation. I am running a little less than 5mil records through a fuzzy grouping that exact matches one column and fuzzy matches one. The server executing the package is a dual-core xeon with 2gb ram, running a default instance of sql 2005 enterprise.

I have been attempting to execute this package for a while now but it keeps erroring out for various reasons. At first, it was from a lack of available memory. I limited the memory usage of sql server to 256mb and set the buffer temp storage path, which alleviated those errors. However, now, my tempdb transaction log is growing significantly. It failed once for not being able to grow and reallocate quickly enough, but enlarging the auto-growth factor fixed that. Then, it filled up the volume the tempdb log was on, so now I have moved it to the san and am about to try again.

I was wondering, does anyone have a general idea on approximate resource usage by fuzzy grouping? Specifically, is there an approximate relation between the number of records grouped and the amount of ram/pagefile required? Also, on the database backend, how big can I expect the tempdb data/log files to get?


Fuzzy Lookup / Grouping Design

Apr 7, 2008

Hi,

I need some advice on fuzzy lookup / grouping design.
I have a requirement that, I think, is between lookup and grouping transformations.

In one of our applications, users can enter manually a label for some information in the database.
Every month, I will store all the new data in our OLAP DB, and I want to group these labels with a fuzzy logic.
Historical data (already loaded) have to be grouped, as well as new data coming every month.

I have no predefined canonical data, so Fuzzy Lookup does not seem suited to my problem.
Fuzzy Grouping seems OK, but it would require feeding both historical and new data into the Fuzzy Grouping transformation to form the groups. This seems inefficient to me.

Any clue?

M.D


Keep Duplicate With Highest Score Fuzzy Grouping

Jan 22, 2012

I recently decided to dedupe my data, but after running Fuzzy Grouping I am having a problem with the update query that marks which duplicate to keep.

_key_in is unique and _key_out identifies the duplicate group, for example:

_key_in , _key_out , name , score , dedupe
1 , 1 , ron , 10 , purge
2 , 1 , ronn , 15 , keep
3 , 3 , john , 5 , keep
4 , 4 , matt , 15 , keep
5 , 4 , mat , 10 , purge
6 , 4 , matt , 15 , purge

Within each _key_out group, I want to keep the row with the highest score by setting the dedupe field to 'keep' and the remainder to 'purge'. The score can also be the same within a group of duplicates; in that case I just need to keep one, and it doesn't matter which. The query I have below nearly works, but it marks every duplicate with the same score as 'keep'.

Code:
UPDATE b
SET b.dedupe_result = 'keep'
FROM
[BusinessListings].[dbo].[MongoOrganisationACTM1Destination] b
INNER JOIN

[Code] ....
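
One way to break ties, sketched here with ROW_NUMBER() rather than the join in the truncated query above: number the rows inside each _key_out group by score descending, with _key_in as an arbitrary tie-breaker, and keep only row number 1. The table variable @Listings is a hypothetical stand-in loaded with the sample rows from the post; the column name dedupe_result is taken from the posted UPDATE.

```sql
-- Hypothetical stand-in for the deduped table, using the sample rows from the post.
DECLARE @Listings TABLE (
    _key_in       int PRIMARY KEY,
    _key_out      int NOT NULL,
    name          varchar(50) NOT NULL,
    score         int NOT NULL,
    dedupe_result varchar(10) NULL
);

INSERT INTO @Listings (_key_in, _key_out, name, score)
SELECT 1, 1, 'ron',  10 UNION ALL
SELECT 2, 1, 'ronn', 15 UNION ALL
SELECT 3, 3, 'john',  5 UNION ALL
SELECT 4, 4, 'matt', 15 UNION ALL
SELECT 5, 4, 'mat',  10 UNION ALL
SELECT 6, 4, 'matt', 15;

-- Exactly one row per group gets rn = 1, even when scores tie,
-- because _key_in is unique and breaks the tie.
WITH Ranked AS (
    SELECT dedupe_result,
           ROW_NUMBER() OVER (PARTITION BY _key_out
                              ORDER BY score DESC, _key_in) AS rn
    FROM @Listings
)
UPDATE Ranked
SET dedupe_result = CASE WHEN rn = 1 THEN 'keep' ELSE 'purge' END;

SELECT _key_in, _key_out, name, score, dedupe_result
FROM @Listings
ORDER BY _key_in;
```

With the sample data, rows 2, 3 and 4 are marked 'keep' and rows 1, 5 and 6 are marked 'purge', which matches the desired result in the post.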


Fuzzy Grouping Seemingly Corrupting Data

Jan 10, 2007

I've seen one other post on this topic from October 2005 and I thought I'd bring it up again. I've a Fuzzy Grouping component in my data flow. The output data from it appears to be the result of records spliced into other records. This includes pass-through columns, not merely "clean" or similarity columns. For example (I've added the suffixes for illustrative purposes):

AddressLine1_in: 162 OAKMONT
AddressLine1_out: 162 OAKMONTLAMINATION INC

CityStateZip_in: Alexander, AR 72002-8539
CityStateZip_out: Alexander, AR 72002-8539116-7066

These are just pass-through columns, although "used" columns are seeing something similar (below.) Any others with this experience?

City_in: Alexander
City_out: Alexandertle Rock


Fuzzy Grouping - First Name Similarities; Bill = William, Etc...

Aug 14, 2007

Hello,

I was wondering how Fuzzy Grouping handles first name similarities. Is there a way to configure it so that Anthony = Tony, Bill = William, etc.? I created a simple package with several rows containing similar first names and ran Fuzzy Grouping on the first name column. I received only one possible duplicate, Will = William, which was at 56%. I lowered the threshold to 1% and still got only one match.

Now I understand and appreciate the reasons for this but was wondering if this type of situation was considered and a way of dealing with it is available.

Thanks,
Beac
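
Nicknames such as Tony and Anthony share few characters, so a common workaround is to map names to a canonical form before they reach the fuzzy step. The following is a minimal sketch of that idea; the @Nickname and @Person tables and their contents are hypothetical, and a real synonym list would have to be built or sourced separately.

```sql
-- Hypothetical synonym table: nickname -> canonical first name.
DECLARE @Nickname TABLE (
    Nickname      varchar(30) PRIMARY KEY,
    CanonicalName varchar(30) NOT NULL
);

INSERT INTO @Nickname (Nickname, CanonicalName)
SELECT 'TONY', 'ANTHONY' UNION ALL
SELECT 'BILL', 'WILLIAM' UNION ALL
SELECT 'WILL', 'WILLIAM';

-- Hypothetical input rows.
DECLARE @Person TABLE (
    PersonId  int PRIMARY KEY,
    FirstName varchar(30) NOT NULL
);

INSERT INTO @Person (PersonId, FirstName)
SELECT 1, 'Tony'    UNION ALL
SELECT 2, 'Anthony' UNION ALL
SELECT 3, 'Bill'    UNION ALL
SELECT 4, 'William';

-- Replace known nicknames with the canonical name; leave other names unchanged.
SELECT p.PersonId,
       p.FirstName,
       COALESCE(n.CanonicalName, UPPER(p.FirstName)) AS FirstNameForMatching
FROM @Person AS p
LEFT JOIN @Nickname AS n
  ON n.Nickname = UPPER(p.FirstName)
ORDER BY p.PersonId;
```

Feeding FirstNameForMatching into Fuzzy Grouping instead of the raw column would let Tony and Anthony compare as identical strings, while the original FirstName can still be carried as a pass-through column.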


Fuzzy Lookup And Grouping Training Dataset?

Mar 2, 2008


Hi All,

Is there a way the fuzzy lookup or grouping can be trained so that similarities and confidence values rely on previously matched strong links?

For example: I can link 80% of my two datasets using one strong identifier (say phone #) which I trust. My goal then, is to use the probability of matching of the rest of my linking fields (say Name,Address,Gender,DOB) in a "matched by phone number" pair to train a fuzzy lookup task to be done on the unlinked 20% of the datasets.

This "training set" would in theory influence the similarity and confidence values of the fuzzy output since each linking column would carry a different weight or contribution towards a confident match.

Does anyone out there know how to do this in practice in SSIS?


Fuzzy Grouping: Any Success With > 3 Million Records?

May 18, 2006

I have tried to process > 3 million Fuzzy grouping records on two different servers with no success. 3 mill works but anything above 4 mill doesn't. Some background:

We are trying to de-dup our customer table on: name (.5 min), address1 (.5 min), city (.5 min), state (exact). .8 overall record min score.
Output includes additional fields: customerid, sourceid, address2, country, phonenumber
Without SP1 installed I couldn't even get a few hundred thousand records to process
Two different servers - same problems. Note that SSIS and SQL Server are running locally on both
The higher end server has 4GB RAM, the other 2.5 GB RAM. Plenty of free disk space on both
SQL Server is configured to use 2 GB of RAM max
The page file is currently at 15GB

After running a number of tests on both servers, trying different batch sizes etc., the one thing I noticed is that it always seems to error out when SSIS takes over and starts chewing up all the available RAM. This happens after the index is created and SSIS starts "warming caches". On both servers SQL Server uses about 1.6GB of RAM at this point, while SSIS keeps taking RAM until all physical RAM is used up.

Some questions:

Has anyone been able to process more than 3 million records and, if so, what is your hardware configuration?
Should we try running SSIS from a different server so it has access to the full amount of physical RAM? (so it doesn't have to fight for RAM with SQL Server)
Should we install Win 2003 Enterprise Server so we can add more RAM?
Any ideas why switching to the page file might be causing errors?

Thanks!!

Keith Doyle






Fuzzy Grouping Transform Corrupts Pass-through Data

Aug 2, 2005

We are working with a client and are using Fuzzy Group transform for de-duping, and hierarchy creation for a national account list.


Fuzzy Grouping Matching Nulls To Empty Strings/spaces

May 30, 2007

Will the fuzzy grouping task match a null value to an empty string (or spaces)? I've got 5 columns I'm matching on, and one of them may be null for certain rows but an empty string for others. Given the 4 other columns may match, will this difference stop similar columns being grouped together?



(Someone's modified my grouped data since it was deduped, which takes a while, and I'm hoping for a quick answer on this).



Thanks in advance.

Ben
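
Whatever Fuzzy Grouping itself does with NULLs, one way to sidestep the question is to normalise the column in the source query so that NULL, empty strings and all-space strings arrive as the same value. A minimal sketch, with a hypothetical table variable @Input and column Col5 standing in for the real ones:

```sql
-- Hypothetical input with the three "empty" variants plus one real value.
DECLARE @Input TABLE (
    Id   int PRIMARY KEY,
    Col5 varchar(50) NULL
);

INSERT INTO @Input (Id, Col5)
SELECT 1, NULL  UNION ALL
SELECT 2, ''    UNION ALL
SELECT 3, '   ' UNION ALL
SELECT 4, 'ACME';

-- Trim the value, then turn an empty result into NULL,
-- so rows 1 to 3 all come out as NULL and row 4 stays 'ACME'.
SELECT Id,
       NULLIF(LTRIM(RTRIM(Col5)), '') AS Col5Normalised
FROM @Input
ORDER BY Id;
```

Wrapping the expression in ISNULL(..., '') instead would normalise everything to an empty string, if that is the preferred representation.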


Integration Services :: Error Running A Fuzzy Grouping Process

May 26, 2015

I have a table in which I need to identify similarities, so I'm running a Fuzzy Grouping process. I'm getting the following errors and I can't identify the problem, since all the fields are varchar except for the first, which is int but is not used in the fuzzy matching.

select
MSSEndCustomerTPID
, orgname
, address1
, cityname
, statename
, countryname
from [sales].[vw_Fact_VolumeSales] a
inner join [GMOFBI].[dbo].[vw_Dim_MSS_Organization] b
on a.EndCustomerOrganizationKey=b.MSSOrganizationKey

[code]...


Similarity Searching

Nov 8, 2007

This started in one thread, but since it was for beginners, I didn't want anyone's brain to melt...

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=92096

I said this:

quote:
This is getting into a deeper topic beyond just sounding alike.

I am trying to create a similarity search so I can clean up our database... it isn't going to be easy, that's for sure.

Herein lies my data problem, an example:

Let's take something simple, like, gee, a university. A good one is UCLA - University of California, Los Angeles. Do you have any idea how many variations of that can exist on a database? A LOT! UCLA, U.C.L.A., UC,LA, Univ. Cal. LA., the list is literally endless. The same applies for Penn State... PSU, Penn State, P-State... not as many, but you get the idea.

The goal is to be able to bring those similar names back together.

I have been researching the Smith-Waterman algorithm, but they are all rather slow... and prepping a database with a gram based index is just painful... it could take hours to put that together.

For those that don't know, gram is a part of a word. For example, "University," would be stored as UN, NI, IV, and so on... that's a 2 byte gram, a 3 byte gram would be UNI, NIV, IVE, etc.
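
The gram idea above can be sketched in T-SQL with a recursive CTE that walks the start positions of a word. This is only an illustration of how grams are produced, not a full gram index; the variable names @Word and @Q are made up for the example.

```sql
DECLARE @Word varchar(100), @Q int;
SET @Word = 'UNIVERSITY';
SET @Q = 2;   -- gram length: 2 gives UN, NI, IV, ...; 3 gives UNI, NIV, IVE, ...

-- Generate start positions 1 .. LEN(@Word) - @Q + 1, then cut one gram per position.
-- The default recursion limit (100 levels) is enough for a varchar(100) input.
WITH Positions AS (
    SELECT 1 AS Pos
    WHERE LEN(@Word) >= @Q
    UNION ALL
    SELECT Pos + 1
    FROM Positions
    WHERE Pos + 1 <= LEN(@Word) - @Q + 1
)
SELECT Pos,
       SUBSTRING(@Word, Pos, @Q) AS Gram
FROM Positions
ORDER BY Pos;
```

For the ten-letter word UNIVERSITY this yields nine 2-byte grams, which gives a feel for why a gram index over whole addresses grows so quickly.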

Think of all the databits that need to be compiled... and then maintained!

Here is a good example of how a search can work:

You all watch CSI, right? Great show. Good fun, serious leaps of faith. They run fingerprint searches, and on their computers, it's these trick graphics with images that are matched. That is how the human mind matches prints, but even then, we use the same basic data the computer uses - reference points on the print.

Take a quick look at a finger, and you will see swirls and junctions and things of that nature (I am not going to get too technical on that, but I used to work in identity theft investigations for a police department on the east coast). These points become part of a mathematical equation that is then converted to a number. That number is stored.

Whenever a new print is to be tested, it is manually referenced for its points and then the number is created and compared to everything in that index. Bam, 15 seconds later you have a 99.9% match. Very impressive. It will also present a top 10 match list.

I do have a question: what is the nature of the similarity searching in SQL 2005?


To save from reading that other thread I hi-jacked, Kristen was kind enough to reply with this:

quote:
Our clients use organisations that specialise in de-duping data for this kind of fuzzy-matching


Which I replied with:

quote:
yea, I was afraid of that.

We are out to create a model that will allow for data quality in real time. For example, a new client comes into our system, let's say from a trade show.

They fill out an info card and we take the contact data and add it to our CRM.

For example, they fill out the card with this info:

John Smith
Some Huge Technical Company
123 Main Street
Anytown, NY 17999 USA

Well... Our search model would take that info and pass it by our RDBMS and return a top ten hit list of matches.

Some Huge Technical Company can become SHTC or Some Huge Tech Co., or any other combo. But because it looks similar, we can then apply the new contact, Mr. Smith, to the right firm, instead of creating three or four (or more) versions of Some Huge Technical Company.

That kinda mass fuzzy matching is easy, we need something that keeps us from needing to do mass updates on a regular basis.

So far, we have determined that a separate database server will be required. It would contain the metrics needed to narrow the search down to the point where the search won't take 30 seconds. We are resisting the idea of grams, simply because of the overhead needed... a single address could create 400 entries in a database table.

I have found a couple of products, but I am terrified about pricing... I think coding will be my best solution. Amazon listed a couple of very interesting looking books, pricey, yes, but I suspect cheaper than buying the technology.


And finally, Kristen's final reply:

quote:
I did a fair amount of work to just try to match "new accounts".

We looked at Telephone number (unique, so a high indicator if it matches, but then we found lots of addresses had the same "agency phone number" )

Then ZIP code (in the UK our PostCode generally relates to < 20 properties), then lines of address - mixing them around to try to get State / Street matches even if they were switched around in the Address fields, or someone entered an extra address line - like "4th floor" as the first address line.

We had a copy of the "accounts" table that we had cleaned up a bit. Removed all trailing spaces and punctuation. Also all embedded punctuation converted to "space" and adjacent spaces removed, so:

"10, The High Street,"
"10 The High Street"

We made abbreviations consistent:

"10 The High St"
"10 The High Street"

and we probably took out "the" and other noise words.

Then we tried matching based on that "sanitised" version.

But it took DAYS AND DAYS - of iterative processing - "Gee, look, here's yet-another-variation-of-rule-X" ...

That really tee-d me off, I don't have the stomach for any single job that takes "days" ...

Just in case any of that gives you any ideas

Kristen
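
The clean-up steps Kristen describes (strip punctuation, collapse adjacent spaces, make abbreviations consistent) can be sketched as a single T-SQL query. This is an illustration only, assuming the address text never contains the characters < or >, which the space-collapsing trick uses as temporary markers; the table variable @Address and the single "ST" to "STREET" rule are hypothetical.

```sql
-- Hypothetical sample rows based on the variants quoted above.
DECLARE @Address TABLE (
    Id    int PRIMARY KEY,
    Line1 varchar(100) NOT NULL
);

INSERT INTO @Address (Id, Line1)
SELECT 1, '10, The High Street,' UNION ALL
SELECT 2, '10 The High St'       UNION ALL
SELECT 3, '10  The High Street';

SELECT a.Id,
       a.Line1,
       s3.Sanitised
FROM @Address AS a
-- Step 1: upper-case and turn punctuation into spaces.
CROSS APPLY (SELECT UPPER(REPLACE(REPLACE(a.Line1, ',', ' '), '.', ' ')) AS v) AS s1
-- Step 2: collapse runs of spaces to one space, then trim both ends.
CROSS APPLY (SELECT LTRIM(RTRIM(
                 REPLACE(REPLACE(REPLACE(s1.v, ' ', '<>'), '><', ''), '<>', ' ')
             )) AS v) AS s2
-- Step 3: expand one trailing abbreviation so "ST" and "STREET" agree.
CROSS APPLY (SELECT CASE WHEN s2.v LIKE '% ST' THEN s2.v + 'REET'
                         ELSE s2.v END AS Sanitised) AS s3
ORDER BY a.Id;
```

All three sample rows come out as 10 THE HIGH STREET. Each further abbreviation or noise word would need its own rule, which is the iterative "yet another variation of rule X" work the post warns about.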


The gist:

I need to know what SQL Server 2005 offers in regards to SimSearching. Also, are there any OTC products that will interface with VB.NET 2005 - and not cost a mint.

Thanks!!!


Calculation Of A Degree Of Similarity Of Phrases

May 19, 2004

Hi!
I have written an extended stored procedure for MS SQL Server that calculates the degree of similarity between phrases.

Purpose:
One of the most complex and important problems for the developer and the operator of a database is keeping names unique in the system's most important reference tables.
The function can be used in a SQL query as a sort criterion, ordering a directory by similarity to a search phrase.

Features:
· Very high data-processing speed
· A unique algorithm that detects similar phrases even when they diverge significantly from the search phrase
· No additional libraries need to be installed on each client; the DLL must be installed only on the server
· The result is the whole table sorted in decreasing order of phonetic similarity (the TOP operator can be used to return only a limited number of the most similar variants)
· As a user-defined server function, it can be used in stored procedures, views and any SQL expression
· Uses a minimum of server memory

--------------------------------------------------------------------------
If this solution interests you and you have the desire and opportunity to test it, see http://kozin1.narod.ru (http://kozin1.narod.ru/newsite/index.html?english.htm)
The DLL and sample scripts are in a RAR archive (2,5 KB)
Thanks in advance


SQL Server 2008 :: Compare Up To 9 String Variables For Similarity?

Mar 6, 2015

We're converting to a new student info system. Sometimes the registrar entered the same school into the schools table more than once but spelled it differently. I'm trying to find all students who were assigned transfer credits from the same school where the school name differs. My db shows a max of 9 different schools from which a student has rec'd transfer credits. I'm spending too much time trying to figure out the best way to do it w/o a ton of IF stmts. I'm looking at the Soundex and Difference functions, but that still looks like a lot of coding. How can I compare up to 9 string variables in SQL Server 2008?
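
Rather than comparing nine variables pairwise with IF statements, one option is to put the names in rows and let a self-join produce the pairs. The sketch below uses the built-in DIFFERENCE function, which returns 0 to 4 (4 meaning the SOUNDEX codes agree most closely); the table variable @School and its rows are hypothetical.

```sql
-- Hypothetical stand-in for the schools table.
DECLARE @School TABLE (
    SchoolId   int PRIMARY KEY,
    SchoolName varchar(100) NOT NULL
);

INSERT INTO @School (SchoolId, SchoolName)
SELECT 1, 'Penn State' UNION ALL
SELECT 2, 'Pen State'  UNION ALL
SELECT 3, 'UCLA';

-- a.SchoolId < b.SchoolId lists each pair once and skips self-comparisons.
SELECT a.SchoolId   AS SchoolIdA,
       a.SchoolName AS SchoolNameA,
       b.SchoolId   AS SchoolIdB,
       b.SchoolName AS SchoolNameB,
       DIFFERENCE(a.SchoolName, b.SchoolName) AS SoundexMatch
FROM @School AS a
JOIN @School AS b
  ON a.SchoolId < b.SchoolId
WHERE DIFFERENCE(a.SchoolName, b.SchoolName) >= 3
ORDER BY SoundexMatch DESC;
```

SOUNDEX codes are only four characters long, so the output is best treated as a list of candidate pairs to review by hand, not as confirmed duplicates.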


Full Text Indexing :: Document Similarity Search

Jan 28, 2008

Hi

I have a Full Text index on a table with an image field that is successfully indexing .doc, .pdf and .rtf files.

Keyword searching this is no problem.

What I want to be able to do is perform a similarity search. By this I mean pass in a Key_ID (document ID) and have the database return a list of Key_IDs (documents) which are similar.

By similar I mean containing mostly the same keywords in roughly the same quantities.

Thanks


Fuzzy Search - Exposing SSIS Fuzzy Capabilities Outside Of SSIS?

Apr 15, 2008



I've been looking into ways to accomplish a fuzzy search, and SSIS makes that possible if I want to do a bulk import or something like it. But what if I just want to look stuff up at any given time without having to run the package?

Is it possible to expose the fuzzy lookup outside of SSIS to, for example, T-SQL?

Here's an example:
I want to lookup the music artist "Notorious BIG" but in the database it is "Notorious B.I.G." if I use the SSIS fuzzy lookup I basically get what I'm looking for. But how would I call this from a web application? So then I tried Full text search but this doesn't really work out as well.

Will I have to re-write the logic that the fuzzy lookup uses to enable it to work? i.e. using Full Text Indexes and FreeTextTable, ContainsTable, SoundEx and the like to somewhat even come close to what the Fuzzy Lookup has?


Query Or Grouping Problem (some Kind Of Parallel Grouping?)

Nov 26, 2007

I'm really stumped on this one. I'm a self-taught SQL guy, so there is probably something I'm overlooking.

I'm trying to get information like this in to a report:

WO#
-WO Line #
--(Details)
--Work Order Line Detail #1
--Work Order Line Detail #2
--Work Order Line Detail #3
--Work Order Line Detail #etc
--(Parts)
--Work Order Line Parts #1
--Work Order Line Parts #2
--Work Order Line Detail #etc
WO#
-WO Line #
--(Details)
--Work Order Line Detail #1
--Work Order Line Detail #2
--Work Order Line Detail #3
--Work Order Line Detail #etc
--(Parts)
--Work Order Line Parts #1
--Work Order Line Parts #2
--Work Order Line Parts #etc

I'm unable to get the grouping right on this. Since the line details and line parts both are children of the line #, how do you do "parallel groups"?

There are 4 tables:

Work Order Header
Work Order Line
Work Order Line Details
Work Order Line Requisitions

The Header has a unique PK.
The Line uses the Header and a Line # as foreign keys that together are unique.
The Detail and Requisition tables use the header and line numbers as foreign keys in addition to their own line numbers. My query ends up looking like this:

WO WOL WOLR WOLD
226952 10000 10000 10000
226952 10000 10000 20000
226952 10000 10000 30000
226952 10000 10000 40000
226952 10000 20000 10000
226952 10000 20000 20000
226952 10000 20000 30000
226952 10000 20000 40000
399999 10000 NULL 10000
375654 10000 10000 NULL
etc


Hierarchy:
WO > WOL > WOLD
WO > WOL > WOLR

It probably isn't best practice, but I'm kinda new so I need some guidance. I'd really appreciate any help! Here's my query:

SELECT [Work Order Header].No_ AS WO_No, [Work Order Line].[Line No_] AS WOL_No,
[Work Order Requisition].[Line No_] AS WOLR_No, [Work Order Line Detail].[Line No_] AS WOLD_No
FROM [Work Order Header] LEFT OUTER JOIN
[Work Order Line] ON [Work Order Header].No_ = [Work Order Line].[Work Order No_] LEFT OUTER JOIN
[Work Order Line Detail] ON [Work Order Line].[Work Order No_] = [Work Order Line Detail].[Work Order No_] AND
[Work Order Line].[Line No_] = [Work Order Line Detail].[Work Order Line No_] LEFT OUTER JOIN
[Work Order Requisition] ON [Work Order Line].[Work Order No_] = [Work Order Requisition].[Work Order No_] AND
[Work Order Line].[Line No_] = [Work Order Requisition].[Work Order Line No_]
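
Joining both child tables to the line in one query pairs every detail row with every requisition row, which is why the output above shows 4 x 2 = 8 rows for one line. One possible way to get "parallel groups" is to stack the two children with UNION ALL and tag each row with the section it belongs to, so the report can group on WO, line, then section. The sketch below uses simplified, hypothetical table variables (@Detail, @Requisition) in place of the real tables.

```sql
-- Simplified stand-ins for [Work Order Line Detail] and [Work Order Requisition].
DECLARE @Detail TABLE (WO int NOT NULL, WOL int NOT NULL, DetailLine int NOT NULL);
DECLARE @Requisition TABLE (WO int NOT NULL, WOL int NOT NULL, ReqLine int NOT NULL);

INSERT INTO @Detail (WO, WOL, DetailLine)
SELECT 226952, 10000, 10000 UNION ALL
SELECT 226952, 10000, 20000 UNION ALL
SELECT 226952, 10000, 30000 UNION ALL
SELECT 226952, 10000, 40000;

INSERT INTO @Requisition (WO, WOL, ReqLine)
SELECT 226952, 10000, 10000 UNION ALL
SELECT 226952, 10000, 20000;

-- One row per child record: 4 detail rows + 2 parts rows, not 8 combined rows.
SELECT WO, WOL, 'Details' AS SectionName, DetailLine AS ChildLine
FROM @Detail
UNION ALL
SELECT WO, WOL, 'Parts', ReqLine
FROM @Requisition
ORDER BY WO, WOL, SectionName, ChildLine;
```

In the report, grouping on WO, then WOL, then SectionName gives the (Details) and (Parts) blocks side by side under each line. Depending on the reporting tool, two subreports or two lists under the line group may be an alternative.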


SQL Calculations

May 18, 2007

Hello all. I am trying to do a calculation within a SQL script, however it doesn't seem to be working and I'm a little bit lost. If anyone could shed some light on where I'm going wrong it would be much appreciated. The code I have is:




select
EMPLOYEE.EMPLOY_REF AS EDIT_REF,
SV_EMPLOYEE_CURRENT_HOLIDAY.ENTITLEMENT,
SV_EMPLOYEE_CURRENT_HOLIDAY.CARRIED_FWD,
SV_EMPLOYEE_CURRENT_HOLIDAY.TAKEN,
SV_EMPLOYEE_CURRENT_HOLIDAY.REMAINING,
SV_EMPLOYEE_CURRENT_HOLIDAY.SOLD,
SV_EMPLOYEE_CURRENT_HOLIDAY.PURCHASED,
SV_EMPLOYEE_CURRENT_HOLIDAY.ENTITLEMENT + SV_EMPLOYEE_CURRENT_HOLIDAY.SOLD - SV_EMPLOYEE_CURRENT_HOLIDAY.PURCHASED AS TOTAL_ENTITLEMENT
from
EMPLOYEE
left outer join
SV_EMPLOYEE_CURRENT_HOLIDAY
on
EMPLOYEE.EMPLOY_REF = SV_EMPLOYEE_CURRENT_HOLIDAY.EMPLOY_REF
where
EMPLOYEE.EMPLOY_REF = 027



Incidentally, SV_EMPLOYEE_CURRENT_HOLIDAY is a view which currently exists.

Thanks in advance people.


T-SQL Time Calculations

Jul 1, 2007

In order to find out whether an event is late, I need to do some time calculations in SQL in a stored procedure.
I have a DateTime variable called Due.
I also have an Allowance variable, an integer holding an extra allowance for that day, and a third variable, Now, which is set with GETDATE().
If I compare Now to Due I can decide whether the task is late, but I need to take the Allowance into account.
I tried:
IF @Due + (@Allowance /24) < @Now ......
However, I find that @Allowance/24 always equates to zero, so this doesn't work.
I'd appreciate any advice.
Regards
Clive
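
The zero comes from integer division: both @Allowance and 24 are integers, so T-SQL discards the fraction. A minimal sketch of two ways around it, assuming the allowance is expressed in hours (which the division by 24 suggests); the sample values are made up for illustration.

```sql
DECLARE @Due datetime, @Allowance int, @Now datetime;
SET @Due = '2007-07-01T09:00:00';
SET @Allowance = 6;                  -- assumed to be hours
SET @Now = '2007-07-01T14:00:00';    -- stand-in for GETDATE()

-- int / int truncates; making one operand a decimal keeps the fraction.
SELECT @Allowance / 24   AS IntegerDivision,   -- 0
       @Allowance / 24.0 AS DecimalDivision;   -- 0.25 as a decimal

-- DATEADD avoids the fraction-of-a-day arithmetic altogether.
SELECT CASE WHEN DATEADD(hour, @Allowance, @Due) < @Now
            THEN 'Late'
            ELSE 'On time'
       END AS Status;
```

With these values the due time plus allowance is 15:00, which is after 14:00, so the status is 'On time'. In the stored procedure the same DATEADD expression can go straight into the IF condition.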


Calculations In A Datagrid?

Mar 21, 2008

Hello,
I ran into a little problem: I need to subtract two values that come from two different tables in the database.

Title                | Times left today | Times left
My first excercise!  | 1                | 5
My second excercise! | 1                | 9

The "times left" fields are calculated: the number of times that the admin entered minus a count of rows in the scores table.
Does anyone have an idea how I can solve this?
An example would be great!
Thanks in advance
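
One way to compute such a column is a LEFT JOIN from the exercises to the scores with a COUNT per exercise. The sketch below is only an illustration: the table variables @Exercise and @Score and all their column names are hypothetical, since the post does not show the real schema.

```sql
-- Hypothetical tables: TimesAllowed is the number the admin entered.
DECLARE @Exercise TABLE (
    ExerciseId   int PRIMARY KEY,
    Title        varchar(100) NOT NULL,
    TimesAllowed int NOT NULL
);
DECLARE @Score TABLE (
    ScoreId    int PRIMARY KEY,
    ExerciseId int NOT NULL
);

INSERT INTO @Exercise (ExerciseId, Title, TimesAllowed)
SELECT 1, 'My first exercise',  5 UNION ALL
SELECT 2, 'My second exercise', 10;

INSERT INTO @Score (ScoreId, ExerciseId)
SELECT 1, 2;   -- one attempt recorded for the second exercise

-- LEFT JOIN keeps exercises with no scores; COUNT(s.ScoreId) is 0 for them.
SELECT e.Title,
       e.TimesAllowed - COUNT(s.ScoreId) AS TimesLeft
FROM @Exercise AS e
LEFT JOIN @Score AS s
  ON s.ExerciseId = e.ExerciseId
GROUP BY e.ExerciseId, e.Title, e.TimesAllowed
ORDER BY e.ExerciseId;
```

With this sample data the two rows come back with 5 and 9 times left. A "times left today" column could follow the same pattern with a date filter added to the join condition.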


DateTime Calculations

Jun 5, 2008

I am attempting to construct a SELECT statement which incorporates some variables.  The variables begin life as strings (not String objects) looking like :"6/08/2008" and "06/10/2008" for example.  The first is a start date which was retrieved using an AJAX calendar object and the second is an end date retrieved in the same manner.  My records are all timestamped by MS SQL (2003) including the clock time.  I am stumbling on the syntax.  "CallStartTime"  is the record's timestamp.  The "TraversalString" is something else but I am not attacking that yet.  Can anyone make a suggestion or two?
SELECT count(*)FROM RealTime WHERE CallStartTime >= '@starttime' AND CallStartTime <= '@endtime' AND TraversalString LIKE '%1.0%'
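
In the statement above, '@starttime' and '@endtime' are inside quotes, so SQL Server sees the literal text rather than the parameter values. A minimal sketch of the unquoted form follows, assuming the calendar strings are in US mm/dd/yyyy format; the table variable @RealTime and its rows are hypothetical stand-ins. Because the timestamps include a clock time, the end of the range is written as "before the next day" so that calls late on the end date are still counted.

```sql
-- Hypothetical stand-in for the RealTime table.
DECLARE @RealTime TABLE (
    CallStartTime   datetime NOT NULL,
    TraversalString varchar(100) NOT NULL
);

INSERT INTO @RealTime (CallStartTime, TraversalString)
SELECT '2008-06-08T10:15:00', '1.0.3' UNION ALL
SELECT '2008-06-09T12:00:00', 'x'     UNION ALL
SELECT '2008-06-10T23:30:00', '2.1.0' UNION ALL
SELECT '2008-06-11T00:00:00', '1.0';

DECLARE @starttime datetime, @endtime datetime;
-- Style 101 tells CONVERT to read the string as mm/dd/yyyy.
SET @starttime = CONVERT(datetime, '6/08/2008', 101);
SET @endtime   = CONVERT(datetime, '06/10/2008', 101);

-- Parameters are used without quotes; the range is half-open:
-- from midnight on the start date up to, but not including, the day after the end date.
SELECT COUNT(*) AS CallCount
FROM @RealTime
WHERE CallStartTime >= @starttime
  AND CallStartTime <  DATEADD(day, 1, @endtime)
  AND TraversalString LIKE '%1.0%';
```

With the sample rows this counts 2 calls: the 8 June and 10 June rows match both the date range and the LIKE pattern, the 9 June row fails the pattern, and the 11 June row falls outside the range. From application code, passing the two dates as typed datetime parameters avoids the string conversion entirely.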








Copyrights 2005-15 www.BigResource.com, All rights reserved