Fuzzy Matching In Access - How To Use
Jan 7, 2007
Many people have asked how to use this for their own purposes.
Link to the original post:
http://www.access-programmers.co.uk/forums/showthread.php?t=103279
Download at: http://www.kdkeys.net/forums/thread/6450.aspx
Here is an example of how you can use it:
Tables and queries can be created in the MDE database.
Create a table with known good reference strings. I created this one - REF_LIST.
It has one field, REF_STRING (Text) with a length of 50, and indexed (No Duplicates). The field length can be set to a length that suits your requirements.
This is the content:
REF_STRING
Claw Hammer
Cold Chisel
Monkey Wrench
Nail Gun
Create another table with strings to match. I created this one - TEST_LIST.
It has one field, TEST_STRING (Text) with a length of 50, and indexed (Either No Duplicates or Duplicates Ok depending on the data). The field length can be set to a length that suits your requirements.
This is the content:
TEST_STRING
Claw Hamer
Claw Hammr
Clew Hammer
Clw Hammer
Cold Chisil
Cold Chisle
Cold Chissel
Cole Chisel
Monkey Wrnech
Monkie Wrench
Monky Rench
Nail Gn
Nail Gunn
Naill Gun
Nial Gun
Then create another table for the results. I created this one - RESULTS.
It has four fields, REF_STRING (same properties as in table REF_LIST), TEST_STRING (same properties as in table TEST_LIST), MATCH_VALU (Single, Fixed, 2 decimal places), and GOOD_MATCH (True/False).
This is the content of the RESULTS table after running the Match_Lists query:
REF_STRING | TEST_STRING | MATCH_VALU | GOOD_MATCH
Nail Gun | Nial Gun | 0.92 | No
Monkey Wrench | Monky Rench | 0.94 | No
Monkey Wrench | Monkie Wrench | 0.94 | No
Claw Hammer | Clew Hammer | 0.94 | No
Cold Chisel | Cold Chisle | 0.94 | No
Cold Chisel | Cold Chisil | 0.94 | No
Cold Chisel | Cole Chisel | 0.94 | No
Nail Gun | Nail Gn | 0.95 | No
Monkey Wrench | Monkey Wrnech | 0.95 | No
Nail Gun | Nail Gunn | 0.96 | No
Nail Gun | Naill Gun | 0.96 | No
Claw Hammer | Claw Hamer | 0.96 | No
Claw Hammer | Claw Hammr | 0.96 | No
Claw Hammer | Clw Hammer | 0.96 | No
Cold Chisel | Cold Chissel | 0.97 | No
SQL from the Match_Lists query:
INSERT INTO RESULTS ( REF_STRING, TEST_STRING, MATCH_VALU )
SELECT REF_LIST.REF_STRING, TEST_LIST.TEST_STRING, IsSimilar([REF_STRING],[TEST_STRING]) AS Expr1
FROM REF_LIST, TEST_LIST
WHERE (((IsSimilar([REF_STRING],[TEST_STRING]))>0.79));
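IsSimilar() is compiled inside the MDE and its formula is not shown in this post. To illustrate what the Match_Lists query does (cross join both lists, score every pair, keep pairs above 0.79), here is a minimal Python sketch that uses a plain bigram Dice coefficient as a stand-in scorer. The function names are hypothetical, and the scores will not reproduce the MATCH_VALU column above.

```python
def bigrams(s):
    """Adjacent character pairs of the lower-cased string."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

def dice(a, b):
    """Dice coefficient over bigram multisets: 2 * |common| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    remaining = list(y)
    common = 0
    for g in x:
        if g in remaining:
            remaining.remove(g)  # each bigram of b may be matched only once
            common += 1
    return 2 * common / (len(x) + len(y))

def match_lists(ref_list, test_list, threshold=0.79):
    """Cross join (like FROM REF_LIST, TEST_LIST) and keep scores above the threshold."""
    results = []
    for ref in ref_list:
        for test in test_list:
            score = dice(ref, test)
            if score > threshold:
                results.append((ref, test, round(score, 2)))
    return sorted(results, key=lambda row: row[2])

refs = ["Claw Hammer", "Cold Chisel", "Monkey Wrench", "Nail Gun"]
tests = ["Claw Hamer", "Cold Chisle", "Nial Gun", "Monky Rench"]
print(match_lists(refs, tests))
```

With these strings, plain Dice keeps only "Claw Hamer" (0.95) and "Cold Chisle" (0.80); "Nial Gun" scores 0.57 and "Monky Rench" 0.73, which shows why a single algorithm at a fixed threshold can miss transpositions that a blended score catches.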
Following this example, populate the two tables, REF_LIST and TEST_LIST, with the strings you need to compare, then run the Match_Lists query.
The GOOD_MATCH field in the RESULTS table is there for you (or another person) to record whether a questionable match is good enough for your purposes.
If you find that any match with a value of at least .95 is a good match, you could create an update query that sets GOOD_MATCH to True for all rows with a value >= .95.
A select query could then list the rows where GOOD_MATCH is still False, so they can be reviewed to decide whether they are good matches.
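The update-then-review workflow can be sketched with Python's built-in sqlite3 module. This is SQLite, not Access, so GOOD_MATCH is stored as 0/1 rather than True/False; the table and column names come from the example above and the four rows are copied from the RESULTS listing. One thing worth checking in Access: because MATCH_VALU is a Single, a stored 0.95 may compare as slightly below the literal .95, so a cutoff such as >= 0.945 may be safer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE RESULTS (REF_STRING TEXT, TEST_STRING TEXT, "
            "MATCH_VALU REAL, GOOD_MATCH INTEGER DEFAULT 0)")
con.executemany(
    "INSERT INTO RESULTS (REF_STRING, TEST_STRING, MATCH_VALU) VALUES (?, ?, ?)",
    [("Nail Gun", "Nial Gun", 0.92),
     ("Claw Hammer", "Clew Hammer", 0.94),
     ("Nail Gun", "Nail Gn", 0.95),
     ("Cold Chisel", "Cold Chissel", 0.97)],
)

# Update query: accept every row at or above the chosen cutoff.
con.execute("UPDATE RESULTS SET GOOD_MATCH = 1 WHERE MATCH_VALU >= 0.95")

# Select query: rows a person still needs to review.
to_review = con.execute(
    "SELECT REF_STRING, TEST_STRING, MATCH_VALU FROM RESULTS "
    "WHERE GOOD_MATCH = 0 ORDER BY MATCH_VALU"
).fetchall()
print(to_review)
```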
Naturally, the two tables may need a unique ID for each string for better tracking and comparison.
If so, create the ID fields and append them to the RESULTS table as well in the Match_Lists query.
OpnSeason
Mar 5, 2006
Hello All,
For those who are interested in Approximate String Matching, or who could use these algorithms: I have a complete suite of Approximate String Matching algorithms written in Visual Basic in an Access database.
In 2004 I decided to jump into the world of Fuzzy Matching with both feet.
As it is, I work for a fair-sized company that deals very intensively with names, addresses, etc., and uses Access on a grand scale. Since I am an Access programmer, I work in an Access gold mine!
I knew that if I could get a good handle on Fuzzy Matching, the company could greatly benefit from my research once I reached the right person at the right time. The right time and the right person are not here yet.
Nevertheless, since I have reaped much free source code and information from the Web, it is now time to return the favor.
I developed a package that is a sort of demo/tutorial on Approximate String Matching algorithms in Access and is very robust at Fuzzy Matching. It is too large to include in a forum post.
To summarize, it works with the basic name: Last, First, and Middle. Its user interface lets a user type in a good name and a questionable name that resembles it. It then shows how closely the names match, using either the weighted results of all the algorithms or a single chosen algorithm.
In addition, it has a reference table of 17,295 known good names with unique ID numbers, and a table of 1,200 morphed names that are typical of names entered in a database with no input conventions. These morphed names have typos, transpositions, variations on maiden names, etc. 1,200 good names were selected for alteration, and the unique ID of each original good name was stored with its altered name so that the accuracy of the matching process could be measured.
The morphed names were compared to the known good names in a query with an approximate join, using the suite of algorithms to determine the match percentage. The altered names, the ID number of the original good name, the ID number of the name each one matched, and the match percentage were stored in a results table to evaluate the matching run.
These tables were used to test and tweak the algorithms by comparing the morphed names with the known good names. The results of 1322 names were saved to a results table with match scores.
The matching process was executed in a query with an approximate join using the suite of algorithms.
The match results:
Total Approximate Matches: 1188
(Recall) Precision Pct: 99.00%
Total Unmatched Names: 12
Unmatched Pct: 1.00%
Total Other Matches: 134
Other Matches Pct: .77%
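The percentages can be rechecked from the counts. The denominators are inferred, not stated in the post: 99.00% and 1.00% work out against the 1,200 morphed names, 1,188 + 134 gives the 1,322 result rows, and .77% appears to be 134 out of the 17,295 reference names. Strictly, 1,188 of 1,200 is a recall figure; if the 134 other matches are counted as false positives (an assumption), precision would be about 89.86%.

```python
# Counts quoted in the post.
reference_names = 17_295
morphed_names = 1_200
approx_matches = 1_188   # morphed names matched back to their original
unmatched = 12
other_matches = 134

def pct(part, whole):
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

recall = pct(approx_matches, morphed_names)
unmatched_pct = pct(unmatched, morphed_names)
other_pct = pct(other_matches, reference_names)  # assumed denominator
precision = pct(approx_matches, approx_matches + other_matches)  # assumes other matches are wrong
print(recall, unmatched_pct, other_pct, precision)
```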
The tables are accessible in the database, so anyone can run their own tests. The interface is set up to accommodate this as well.
The algorithms used: the Dice coefficient as a threshold algorithm, the Levenshtein Distance algorithm, Longest Common Subsequence, and Double Metaphone. The names were passed to the algorithms by way of the bigram model.
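The package's own VB code is in VBAlgorithms.txt and is not reproduced here. For readers new to two of the named algorithms, here is a Python sketch of textbook Levenshtein distance and Longest Common Subsequence, each turned into a 0 to 1 similarity and blended. The normalisation by the longer string and the equal weights are assumptions for illustration, not the package's weighting; Double Metaphone is not sketched.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def lcs_length(a, b):
    """Length of the longest common subsequence (characters need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def combined_score(a, b, w_lev=0.5, w_lcs=0.5):
    """Hypothetical weighted blend of two similarities, each in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    lev_sim = 1 - levenshtein(a, b) / longest
    lcs_sim = lcs_length(a, b) / longest
    return w_lev * lev_sim + w_lcs * lcs_sim

print(combined_score("Nail Gun", "Nial Gun"))
```

For "Nail Gun" against "Nial Gun", the swapped letters cost two Levenshtein edits (similarity 0.75) but only one LCS character (similarity 0.875), so the blend gives 0.8125.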
I will email it to anyone who requests it.
It is available for two platforms, Office 97 and Office 2000, as FuzzyMatching97.zip (692 KB) and FuzzyMatching2k.zip (721 KB).
The zip files include ApprxStrMatchingEngine97.pps or ApprxStrMatchingEngine2k.pps respectively, StrMatching97.mde or StrMatching2k.mde respectively, IEEESoundexV5.pdf, and VBAlgorithms.txt.
IEEESoundexV5.pdf is an abstract about Approximate String Matching that sparked my curiosity about the subject, and it pertains to the package.
VBAlgorithms.txt contains the entire suite of algorithms in Visual Basic extracted from the MDB modules.
The PowerPoint presentations describe the workings of the MDE and give a good overview of Fuzzy Matching.
Aug 22, 2006
Hi all,
I've got two vast tables of data which I need to link. However, the field that is unique to each was typed by hand at its source, and as such both have errors, typos, formatting problems, known deviations, etc.
An example would be something like this:
Table1: SFOC0912JB3
Table2: F0CO9I2JB3
(These are hardware serial numbers, for what it's worth.) I need to create a link between the two tables that returns true based on a number of possibilities, such as:
Match if:
- String matches with prepended 'S', 'C' and/or
- String matches with substituted 'I' and '1' in any or all positions and/or
- String matches with substituted 'O' and '0' and/or
etc.
From what I've been reading, I think Levenshtein distance is the right answer, but I haven't yet found a freely available implementation for Access.
Any ideas?
Thanks,
Alex
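Before reaching for edit distance, rules like these can often be applied as a normalisation step: reduce each serial to a canonical key and join on the key. A minimal Python sketch, assuming the three rules listed (O/0 and I/1 are interchangeable, and a leading "S", "C" or "SC" is optional); the function names are hypothetical. In Access the same idea would mean storing a computed key column in each table and joining on it.

```python
import re

# Assumed rules from the post: O/0 and I/1 are interchangeable,
# and a leading "SC", "S" or "C" may or may not be present.
CONFUSABLES = str.maketrans({"O": "0", "I": "1"})
OPTIONAL_PREFIX = re.compile(r"^(SC|S|C)")

def serial_key(serial):
    """Reduce a hand-typed serial number to a canonical comparison key."""
    s = serial.strip().upper().replace(" ", "")
    s = s.translate(CONFUSABLES)
    return OPTIONAL_PREFIX.sub("", s, count=1)

def link_tables(table1, table2):
    """Pair serials from the two lists whose canonical keys are equal."""
    by_key = {}
    for s in table2:
        by_key.setdefault(serial_key(s), []).append(s)
    return [(s1, s2) for s1 in table1 for s2 in by_key.get(serial_key(s1), [])]

print(link_tables(["SFOC0912JB3"], ["F0CO9I2JB3", "XYZ123"]))
```

Both example serials reduce to F0C0912JB3. The prefix rule also strips a genuine leading S or C, so some false links are possible; pairs whose keys still differ could then be ranked by Levenshtein distance.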
Jun 1, 2005
I have a form with code tied to the 'On Open' event. Users will access this form, and we want them to have access only to certain fields, which we want them to fill out. The fields that are locked will change based on the field called 'Item Number'. The code will be long because there are 30 different Item Numbers and about 10 to 20 fields that we will disable based on the Item Number. The code is like:
' Me.Item_Number refers to the form's Item Number control, so no separate variable is needed
If Me.Item_Number = "32000" Then
    Me.Batch_Lot_Number.Enabled = False
End If
This all works, except that disabling a field makes it hard to read because of the grayed-out color it gives it. I don't want to use the Locked property because it doesn't give you a visual clue that the field is locked.
Is there a way to change the color of the field background using VBA?
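In Access, one common approach is to leave the control enabled, set its Locked property to True and change its BackColor from VBA, for example Me.Batch_Lot_Number.BackColor = RGB(224, 224, 224). To avoid 30 separate If blocks, the item-number-to-fields rules can live in a lookup table and be applied in one loop. The Python sketch below models only that lookup logic; item number 32001 and every field name except Batch_Lot_Number are hypothetical, and rgb() mirrors how VBA's RGB function packs a colour into a Long.

```python
# Hypothetical lookup table: which controls to lock for each Item Number.
LOCKED_FIELDS = {
    "32000": ["Batch_Lot_Number"],
    "32001": ["Batch_Lot_Number", "Expiry_Date"],
}

def rgb(red, green, blue):
    """Same packing as VBA's RGB(): red + green * 256 + blue * 65536."""
    return red + green * 256 + blue * 65536

LOCKED_COLOR = rgb(224, 224, 224)  # light grey, text stays readable
NORMAL_COLOR = rgb(255, 255, 255)  # white

def control_settings(item_number, all_fields):
    """Return {field: (locked, back_color)} for every control on the form."""
    locked = set(LOCKED_FIELDS.get(item_number, []))
    return {field: (field in locked, LOCKED_COLOR if field in locked else NORMAL_COLOR)
            for field in all_fields}

print(control_settings("32000", ["Batch_Lot_Number", "Qty"]))
```

Keeping the rules in a table means adding an item number is a data change rather than a code change; in Access that table could be read with a recordset inside the form's event code.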
Apr 14, 2005
I need to link two tables on the Name field. The trouble is that the names are not entered the same way in each table, so I can't do a direct = comparison.
For instance, one table might have "The Heart Center of Indiana" while the other has "Heart Center of Indiana", or one might have "St. John's Medical Center" and the other "St Johns Medical Center" (or, god help me, "St John's Hospital").
My only thought is to build a match ranking, e.g. saying that 85% of the characters in "The Heart Center of Indiana" match "Heart Center of Indiana". There are thousands of names in each list, and I would much rather not have to spot them manually.
I doubt there is a direct solution to my problem, so any tips on how I can build a translation table are appreciated.
Thanks,
David
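The "85% of the characters match" idea is close to what Python's difflib.SequenceMatcher.ratio() measures (2 * matching characters / total characters). A minimal sketch, assuming that names are first normalised (lower case, punctuation removed, a leading "The" dropped) and that 0.85 is an acceptable cutoff; names below the cutoff are left for a person to map by hand into the translation table.

```python
import difflib
import re

def normalize(name):
    """Lower-case, strip punctuation and drop a leading 'the'."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)

def best_match(name, candidates):
    """Return (candidate, score) with the highest SequenceMatcher ratio."""
    key = normalize(name)
    scored = [(c, difflib.SequenceMatcher(None, key, normalize(c)).ratio())
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

def translation_table(names_a, names_b, threshold=0.85):
    """Map each name in A to its best match in B, or None when below the cutoff."""
    table = {}
    for a in names_a:
        b, score = best_match(a, names_b)
        table[a] = (b if score >= threshold else None, round(score, 2))
    return table

print(translation_table(["The Heart Center of Indiana", "St. John's Medical Center"],
                        ["Heart Center of Indiana", "St Johns Medical Center"]))
```

Normalisation alone resolves both examples from the post exactly; the ratio then handles the remaining spelling noise. Comparing thousands of names against thousands is slow in pure Python, so blocking on a first word or first letter may be needed.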
Feb 2, 2005
Hello,
My problem is rather complicated and I am not sure if Access is even capable of addressing it. As I said, this is a bit tricky and I understand if no one is willing to tackle it, however, I would really appreciate it if someone could tell me it is impossible if that is the case. Thanks in advance.
I have attached a table to better explain my dilemma.
I would like to use the information in the “Category”, “Range Start” and “Range Stop” fields to generate new identifiers for each record in the table. The simplest criterion would be to assign the same new identifier to two records if they have the same values in all of “Category”, “Range Start” and “Range Stop” (this is the case for the first two records).
I am able to use this approach, but would much rather use a more sophisticated set of criteria. Specifically, to be assigned the same ID, two (or more) records must:
A) Belong to the same category.
B) Have ranges that overlap by more than x (where x is some amount of overlap). E.g. records 3 and 4 should be assigned the same ID because their categories are the same and their ranges overlap by 333 (13222-12889).
C) Finally, if A is satisfied and B is not, the records could still be assigned the same identifier if the gap between their ranges is less than some value y. E.g. records 5 and 6 should be grouped because A is satisfied and their ranges are only 5 apart: (119-300)…(305-700).
There is no limit to the number of records that may be assigned the same identifier, provided they satisfy criteria A+B or A+C.
Many thanks,
Matt
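Because grouping is transitive (if 5 joins 6 and 6 joins 7, all three share an ID), this is a natural fit for a union-find structure rather than a single query. A minimal Python sketch under one reading of the rules: rule C is applied only when the ranges do not overlap, so a small overlap of x or less does not group two records. Records 5 and 6 use the ranges from the post; the categories and the other ranges are hypothetical.

```python
def range_overlap(a, b):
    """Positive: length of overlap. Zero or negative: minus the gap between ranges."""
    return min(a[2], b[2]) - max(a[1], b[1])

def assign_group_ids(records, x, y):
    """records: list of (category, range_start, range_stop). Returns one group ID per record."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if records[i][0] != records[j][0]:
                continue  # rule A: categories must match
            ov = range_overlap(records[i], records[j])
            if ov > x or (ov <= 0 and -ov < y):  # rule B, or rule C for disjoint ranges
                parent[find(i)] = find(j)

    ids, out = {}, []
    for i in range(len(records)):
        root = find(i)
        ids.setdefault(root, len(ids) + 1)  # number groups in order of first appearance
        out.append(ids[root])
    return out

sample = [("A", 100, 200), ("A", 100, 200),          # identical ranges
          ("B", 12889, 14000), ("B", 12000, 13222),  # overlap of 333
          ("C", 119, 300), ("C", 305, 700),          # gap of 5
          ("C", 5000, 6000)]                         # on its own
print(assign_group_ids(sample, x=50, y=10))
```

The pairwise loop is quadratic within the whole table; for large tables, sorting each category by Range Start and comparing neighbours would be faster.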
Dec 22, 2004
A very elementary question - but I'd be grateful for an answer.
I have two tables (or perhaps two queries) each with a key field. If all is well, there should be complete correspondence between the two sets of records. That is, if there's a record with key 12345 in one table, there should also be a record with key 12345 in the other table.
I'm looking for the simplest way of checking whether or not this is the case, and, if it's not, detecting which records in one table are unmatched by any record in the other.
Will
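In Access, the Find Unmatched Query Wizard builds this kind of check as a LEFT JOIN with an Is Null condition, run once in each direction. The underlying logic is just two set differences, sketched here in Python with a hypothetical compare_keys() helper.

```python
def compare_keys(keys_a, keys_b):
    """Report keys present in only one collection, and whether the two sets agree."""
    a, b = set(keys_a), set(keys_b)
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "all_match": a == b,
    }

print(compare_keys([12345, 1, 2], [12345, 2, 3]))
```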
Sep 7, 2005
Hello,
I am using this query to get matching data:
SELECT NewMyEstartChild.yordob, NewMyEstartChild.firstname, NewMyEstartChild.surname, NewMyEstartChild.postcode
FROM NewMyEstartChild INNER JOIN For2003 ON (NewMyEstartChild.postcode = For2003.POSTCODE) AND (NewMyEstartChild.yordob = For2003.MyDOB);
But it's giving me too many records, so to narrow the results I was thinking of adding another field, Firstname. However, some children's names are spelled incorrectly.
How can I match on the first letter only? I have tried the following query, but it doesn't work. Please help!
SELECT NewMyEstartChild.yordob, NewMyEstartChild.firstname, NewMyEstartChild.surname, NewMyEstartChild.postcode
FROM NewMyEstartChild INNER JOIN For2003 ON (NewMyEstartChild.postcode = For2003.POSTCODE) AND (NewMyEstartChild.yordob = For2003.MyDOB) AND (NewMyEstartChild.Firstname = For2003.Firstname);
In the last part of this query, (NewMyEstartChild.Firstname = For2003.Firstname), how can I match children whose first names start with the same letter?
Thank you
Viral
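In Access SQL, the first-letter condition is usually written with Left(), e.g. Left(NewMyEstartChild.Firstname, 1) = Left(For2003.Firstname, 1), and Access text comparison ignores case by default. The runnable sketch below uses SQLite through Python, where SUBSTR() plays the role of Left() and UPPER() is needed because SQLite's = is case-sensitive; the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NewMyEstartChild (yordob TEXT, firstname TEXT, surname TEXT, postcode TEXT);
CREATE TABLE For2003 (MyDOB TEXT, Firstname TEXT, POSTCODE TEXT);
INSERT INTO NewMyEstartChild VALUES ('2001-05-04', 'Katherine', 'Jones', 'AB1 2CD'),
                                    ('2001-05-04', 'Liam', 'Smith', 'AB1 2CD');
INSERT INTO For2003 VALUES ('2001-05-04', 'kathrine', 'AB1 2CD');
""")

rows = con.execute("""
SELECT c.yordob, c.firstname, c.surname, c.postcode
FROM NewMyEstartChild AS c
INNER JOIN For2003 AS f
  ON c.postcode = f.POSTCODE
 AND c.yordob = f.MyDOB
 AND UPPER(SUBSTR(c.firstname, 1, 1)) = UPPER(SUBSTR(f.Firstname, 1, 1))
""").fetchall()
print(rows)
```

Matching on the first letter still lets through siblings who share an initial, so a similarity score on the full first name may be worth adding for the remaining duplicates.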
Nov 15, 2005
My problem is this:
I have a large table with about 8,000+ records and a smaller table with 2,000+ records.
The large table has been exported from an ACT! database.
The smaller table has 4 fields that I need to add to the larger table, and then I need to import the updated records back into the ACT! database.
I created a simple select query and matched the tables on the only 2 fields they have in common, "Company" and "PostCode".
This should have been OK, but instead of updating 2,000+ records it only updated 1,000. The reason is that some of the company names weren’t an exact match, e.g. "company ltd" and "company limited".
If I link only "postcode" to "postcode", quite a few different companies share the same postcode.
Is there a way of trying to match just the first 5 characters of the company name but leaving the "company" field intact?
Or is there a different way to go about this?
Thanks
Darren
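In Access, a first-5-characters join can usually be written in SQL view with Left([Company], 5) on both sides of the join condition, leaving the Company field itself unchanged. Another option is to compare a normalised key instead of the raw name. A minimal Python sketch of that second idea, assuming a hypothetical synonym map (limited/ltd, and/&) and that postcodes differ only in spacing and case; the field name Rep is also hypothetical.

```python
import re

# Hypothetical synonym map: spell each variant the same way on both sides.
SYNONYMS = {"limited": "ltd", "and": "&"}

def match_key(company, postcode):
    """(company, postcode) key ignoring case, punctuation and suffix spelling."""
    words = re.sub(r"[^a-z0-9& ]", " ", company.lower()).split()
    words = [SYNONYMS.get(w, w) for w in words]
    return (" ".join(words), postcode.replace(" ", "").upper())

def merge(large_rows, small_rows, fields):
    """Copy `fields` from small_rows into matching large_rows; return the match count."""
    lookup = {match_key(r["Company"], r["PostCode"]): r for r in small_rows}
    matched = 0
    for row in large_rows:
        src = lookup.get(match_key(row["Company"], row["PostCode"]))
        if src is not None:
            for f in fields:
                row[f] = src[f]
            matched += 1
    return matched

large = [{"Company": "Acme Limited", "PostCode": "AB1 2CD"},
         {"Company": "Other Co", "PostCode": "AB1 2CD"}]
small = [{"Company": "Acme Ltd.", "PostCode": "ab1 2cd", "Rep": "JB"}]
print(merge(large, small, ["Rep"]), large)
```

Unlike a fixed 5-character prefix, the key does not merge two different companies whose names merely start the same way; rows that still fail to match can be reviewed by hand.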
May 9, 2006
I'm trying to work something out in Access at the moment to score some brownie points with my boss, and am hoping someone will be able to help me. I'm relatively confident using Access, but when it comes to tricky queries I get a bit confused. Basically, I need to do some matching: using a PO number and a unique ID, I need to update a column in one of my files with the unique ID. I have done it this way so far....
I linked the two files together, matched them on the PO number and then updated the field with the ID where they matched. This seems to work OK, but the problem is that about 10 of the PO numbers have between 2 and 6 different IDs. So how can I make it so that if a PO number appears 6 times in the file, it matches with all the IDs? I don't think this is too hard to do, but each line has a different cost and has to be matched to the right one. The problem is that the cost is normally different, as it fluctuates with the exchange rate. The best way I can think of is to use a function that looks at the cost and, if it's within, say, $20.00 either way, assumes it's a match. But I have no idea how to implement it.
Does that make sense? Is it likely to be easy to do? I'm relatively OK with SQL if it would be easier to use that.
If anyone has any suggestions it would be greatly appreciated...
Thanks :)
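The tolerance idea can be sketched as: for each PO line, consider only unused ID rows on the same PO, and take the one with the closest cost if it is within $20. A minimal Python sketch with hypothetical data and names; it is greedy, so the result depends on line order, and ambiguous cases (two IDs within tolerance of each other) are worth flagging for review.

```python
def match_po_lines(lines, ids, tolerance=20.00):
    """lines: [(po, cost)]; ids: [(po, unique_id, cost)].
    Returns {line_index: unique_id}, using each ID row at most once."""
    used = set()
    result = {}
    for i, (po, cost) in enumerate(lines):
        best = None  # (cost difference, index into ids)
        for j, (id_po, _unique_id, id_cost) in enumerate(ids):
            if j in used or id_po != po:
                continue
            diff = abs(id_cost - cost)
            if diff <= tolerance and (best is None or diff < best[0]):
                best = (diff, j)
        if best is not None:
            used.add(best[1])
            result[i] = ids[best[1]][1]
    return result

lines = [("PO1", 100.0), ("PO1", 250.0), ("PO2", 75.0)]
ids = [("PO1", "U-A", 245.0), ("PO1", "U-B", 110.0), ("PO2", "U-C", 200.0)]
print(match_po_lines(lines, ids))
```

Here the PO2 line stays unmatched because its only candidate differs by $125, which is outside the tolerance.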
Jun 19, 2007
Hi All,
I'm wondering if you can help. I have a table called "example" which has fields "a", "b", "c", "d"...
I then create another table called "importtable" with field "a" (imported from Excel)...
I would then like to create a query that shows all the records in my current table "example" that match the records I imported from Excel. I looked at joining them via "relationships" in a select query, but it doesn't quite show what I'm after...
I should add that I would like field "a" from both tables to show, but only if the number exists in "importtable"; if not, it shouldn't show.
Any tips?
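What is described here is an inner join: rows appear only when "a" exists in both tables. In the Access query designer this is the default join type (the first option in Join Properties). A runnable sketch of the same SQL using SQLite through Python, with hypothetical sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE example (a INTEGER, b TEXT, c TEXT, d TEXT);
CREATE TABLE importtable (a INTEGER);
INSERT INTO example VALUES (1, 'x', 'y', 'z'), (2, 'p', 'q', 'r'), (3, 'm', 'n', 'o');
INSERT INTO importtable VALUES (2), (3), (9);
""")

# INNER JOIN keeps only rows whose "a" value exists in both tables.
rows = con.execute("""
SELECT example.a, importtable.a AS imported_a, example.b, example.c, example.d
FROM example INNER JOIN importtable ON example.a = importtable.a
ORDER BY example.a
""").fetchall()
print(rows)
```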
Aug 6, 2007
We have two databases that I am trying to match on one variable. We can get it to match, take data from the first database and enter it in the second, but I was wondering how I can get a report on the ones that didn't match. I'm sorry if this is a simple problem, but I am fairly new to Access and didn't know where else to turn for help.
Oct 24, 2007
I have two tables with a Text field, ClientID, that holds the client's name.
When I query, my queries don't take case into account, so "K Smith" is treated the same as "K SMITH" and "k sMIth".
I am trying to write an unmatched query between the two tables based on this ClientID, but it turns up no unmatched records because it ignores case.
Any suggestions on how to match case, other than changing the table?
Thanks.
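In Access, StrComp(T1.ClientID, T2.ClientID, 0) = 0 performs a binary (case-sensitive) comparison, so an unmatched query could test that expression instead of using =. The runnable sketch below shows the same idea in SQLite through Python, where the collation controls case sensitivity: NOCASE behaves like Access's default comparison, BINARY is case-sensitive. Table names t1 and t2 are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (ClientID TEXT);
CREATE TABLE t2 (ClientID TEXT);
INSERT INTO t1 VALUES ('K Smith'), ('J Brown');
INSERT INTO t2 VALUES ('K SMITH'), ('J Brown');
""")

def unmatched(collation):
    """LEFT JOIN ... IS NULL unmatched query using the given SQLite collation."""
    return con.execute(f"""
        SELECT t1.ClientID FROM t1
        LEFT JOIN t2 ON t1.ClientID = t2.ClientID COLLATE {collation}
        WHERE t2.ClientID IS NULL
    """).fetchall()

print(unmatched("NOCASE"))  # case-insensitive: nothing unmatched
print(unmatched("BINARY"))  # case-sensitive: 'K Smith' is unmatched
```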
May 3, 2005
Hi. I am in the process of loading nursing license numbers into my database. The spreadsheet I am importing from does not use exactly the same names as my database, e.g. Smith, Deb in my database is Smith, Debora in the spreadsheet, and I can't figure out the code or procedure to tell the database that these names belong to the same record. Is it possible to do this, and if so, how?
Thank you!
KellyJo
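One simple rule that covers the Deb/Debora case: same surname, and one first name is a prefix of the other. In an Access query, a similar condition might use Like with a wildcard, e.g. [Sheet].[First] Like [DB].[First] & "*". A minimal Python sketch of the rule, with hypothetical helper names; the results are candidates for a person to confirm (Deb could also be Debbie), not automatic matches.

```python
def split_name(full):
    """'Smith, Deb' -> ('smith', 'deb')."""
    last, _, first = full.partition(",")
    return last.strip().lower(), first.strip().lower()

def candidate_matches(db_name, sheet_names):
    """Same surname, and one first name is a prefix of the other."""
    last, first = split_name(db_name)
    hits = []
    for other in sheet_names:
        o_last, o_first = split_name(other)
        if o_last == last and (o_first.startswith(first) or first.startswith(o_first)):
            hits.append(other)
    return hits

print(candidate_matches("Smith, Deb", ["Smith, Debora", "Smith, John", "Smyth, Debora"]))
```

An empty first name would match every entry with the same surname, so rows with a missing first name are worth handling separately.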
Feb 23, 2007
Hey all
I have a function that displays a client's opening hours for the next 7 days. I have also added functionality to create a "special day" with non-standard opening times, which are stored in the database.
I have a simple SQL statement to check whether the current day is a special day:
"SELECT * FROM SpecialDays WHERE SpecialDate = #" & currentDate & "#"
I have run into a problem, though. There is a special date in the database for 03/01/2007 (UK date format, 3rd January), but it is being retrieved for the 1st of March (01/03/2007).
Therefore this query:
SELECT * FROM SpecialDays WHERE SpecialDate = #01/03/2007#
retrieves the record with the date 03/01/2007.
Any ideas?
Thanks
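Access SQL reads an ambiguous #nn/nn/yyyy# literal as month/day/year regardless of regional settings, so the UK string "01/03/2007" becomes 3 January. A common fix is to build the literal in an unambiguous form, for example "#" & Format(currentDate, "yyyy-mm-dd") & "#" in VBA. The Python sketch below shows the misreading and the unambiguous literal; jet_date_literal() is a hypothetical helper.

```python
from datetime import date, datetime

def jet_date_literal(d):
    """Format a date as an unambiguous #yyyy-mm-dd# literal for an Access SQL string."""
    return f"#{d:%Y-%m-%d}#"

current = date(2007, 3, 1)                                   # 1 March 2007
uk_text = current.strftime("%d/%m/%Y")                       # what a UK locale produces
us_reading = datetime.strptime(uk_text, "%m/%d/%Y").date()   # how #01/03/2007# is read

sql = "SELECT * FROM SpecialDays WHERE SpecialDate = " + jet_date_literal(current)
print(uk_text, us_reading, sql)
```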
May 4, 2007
I have three tables: tblProducts1, tblProducts2 and tblProductSales.
tblProducts1
Code Cost
ABC 20
BVC 35
ABC 30
tblProducts2
Code Cost
ABC 10
BVC 55
ABC 20
tblProductSales
Code Rev
ABC 70
BVC 25
ABC 20
BVC 15
DCC 33
I want to produce a query that looks like this:
Code Rev Cost Profit
ABC 90 80 10
BVC 40 90 -50
DCC 33 0 33
How can I do this?
Thanks,
Jon
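The key step is to total revenue and cost separately before joining; joining the raw tables first would repeat each cost once per sales row. A runnable sketch in SQLite through Python using Jon's data: the two cost tables are stacked with UNION ALL, each side is summed per Code, and a LEFT JOIN keeps DCC, which has no cost. In Access, Nz() would replace COALESCE(), and it may be easier to save the UNION ALL part as its own query and build on it. Codes with costs but no sales would be dropped by this query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblProducts1 (Code TEXT, Cost INTEGER);
CREATE TABLE tblProducts2 (Code TEXT, Cost INTEGER);
CREATE TABLE tblProductSales (Code TEXT, Rev INTEGER);
INSERT INTO tblProducts1 VALUES ('ABC', 20), ('BVC', 35), ('ABC', 30);
INSERT INTO tblProducts2 VALUES ('ABC', 10), ('BVC', 55), ('ABC', 20);
INSERT INTO tblProductSales VALUES ('ABC', 70), ('BVC', 25), ('ABC', 20), ('BVC', 15), ('DCC', 33);
""")

# Aggregate each side first, then join, so costs are not repeated per sales row.
rows = con.execute("""
SELECT s.Code, s.Rev, COALESCE(c.Cost, 0) AS Cost, s.Rev - COALESCE(c.Cost, 0) AS Profit
FROM (SELECT Code, SUM(Rev) AS Rev FROM tblProductSales GROUP BY Code) AS s
LEFT JOIN (SELECT Code, SUM(Cost) AS Cost
           FROM (SELECT Code, Cost FROM tblProducts1
                 UNION ALL
                 SELECT Code, Cost FROM tblProducts2) AS u
           GROUP BY Code) AS c
  ON s.Code = c.Code
ORDER BY s.Code
""").fetchall()
print(rows)
```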
Oct 21, 2007
Hi, I have two tables with these sets of data:
Table1
Field1 Field2
1000 A
1001 B
1002 C
1003 D
1004 E
Table2
Field1 Field2
1000 A
1002 C
1003 D
1005 F
1006 G
I need to create 3 tables with the following output:
1. Data that are common to both Table1 & Table2
2. Data that are in Table1 but do not exist in Table2
3. Data that are in Table2 but do not exist in Table1
Can anyone help me find the answer, please?
Cheers
Bud
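The three outputs correspond to one inner join and two LEFT JOIN ... IS NULL queries. A runnable sketch in SQLite through Python with Bud's data; in Access, each SELECT could be turned into a make-table query (SELECT ... INTO NewTable FROM ...) to produce the three tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Table1 (Field1 INTEGER, Field2 TEXT);
CREATE TABLE Table2 (Field1 INTEGER, Field2 TEXT);
INSERT INTO Table1 VALUES (1000,'A'),(1001,'B'),(1002,'C'),(1003,'D'),(1004,'E');
INSERT INTO Table2 VALUES (1000,'A'),(1002,'C'),(1003,'D'),(1005,'F'),(1006,'G');
""")

queries = {
    # 1. In both tables.
    "common": """SELECT t1.Field1, t1.Field2 FROM Table1 t1
                 INNER JOIN Table2 t2 ON t1.Field1 = t2.Field1 ORDER BY t1.Field1""",
    # 2. In Table1 only.
    "only_in_1": """SELECT t1.Field1, t1.Field2 FROM Table1 t1
                    LEFT JOIN Table2 t2 ON t1.Field1 = t2.Field1
                    WHERE t2.Field1 IS NULL ORDER BY t1.Field1""",
    # 3. In Table2 only.
    "only_in_2": """SELECT t2.Field1, t2.Field2 FROM Table2 t2
                    LEFT JOIN Table1 t1 ON t2.Field1 = t1.Field1
                    WHERE t1.Field1 IS NULL ORDER BY t2.Field1""",
}
results = {name: con.execute(sql).fetchall() for name, sql in queries.items()}
print(results)
```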
Aug 6, 2005
I have a database called LettersDatabase. This database holds all the letters that have been created, including the path to each document. I use the SSN to link the letters to customers in the Contacts database.
The Contacts database also uses the SSN to identify contacts.
I have a form that creates new letters for customers. On this form I have a listbox that queries LettersDatabase for all records matching the form's contact SSN, to see how many letters have been created for that customer.
The problem is that my listbox shows only the first record matching that SSN, but there are more records in LettersDatabase with the same SSN that I need to have displayed in the listbox as well.
I may be writing the query incorrectly.
Here is what I have for the query on the listbox:
Like [Forms]![LetterMaker]![txtSSN]
I tried adding (&"*") to the end of the query, but that does not help.
If anyone out there has the solution to this problem it would be greatly appreciated
Thank you
Feb 27, 2006
Hi, can anyone please help me out? How can I delete records from one table when they have a matching record in a second table?
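One common form is a DELETE with an IN subquery on the matching key; Access should accept the same shape, e.g. DELETE FROM Orders WHERE ID IN (SELECT ID FROM ToRemove). A runnable SQLite sketch through Python; the table and field names are hypothetical, and it is wise to back up the table before running a delete query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Orders (ID INTEGER, Item TEXT);
CREATE TABLE ToRemove (ID INTEGER);
INSERT INTO Orders VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO ToRemove VALUES (2), (4);
""")

# Delete every Orders row whose ID also appears in ToRemove.
con.execute("DELETE FROM Orders WHERE ID IN (SELECT ID FROM ToRemove)")
remaining = con.execute("SELECT ID, Item FROM Orders ORDER BY ID").fetchall()
print(remaining)
```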
Nov 13, 2006
In a database I have two tables, one is linked to an excel sheet (our customers order) and the other is created via a "make-table query" from our business system.
In both tables I have the customers part numbers and neither contain a primary key.
What I need to do is compare the part numbers in both tables to find if a part number is present in the order but not in our business system.
I just cannot figure out how to do this.
Any ideas are greatly appreciated
/twallstr
Jul 8, 2005
Hello,
I am new to Access and have a question about the combo box. I think I have it set up correctly, but the problem is how it stores the data.
I created a table with a field of set responses that someone can pick from when using the form to enter data. The combo box then stores the answer in another field in that same table. When an answer is selected and stored in the separate field, only a number is put in the answer field.
For example:
My options are:
Day 1
Day 2
Day 3
Day 4
If someone uses the pull-down menu and chooses Day 1, it stores 1 as their answer. I would like it to store the name instead, so that choosing Day 1 puts "Day 1" in the answer field. I hope I am explaining this correctly. Any help would be greatly appreciated.
Thanks
Dec 18, 2004
I have 3 tables that all contain Car registration numbers.
Table 1 contains just Reg numbers. Table 2 contains Registration numbers with an additional 2 columns of data. Table 3 also contains Registration numbers with an additional 2 columns of data.
I need to compare the reg numbers in Table 1 with Tables 2 and 3 and where the same Reg number appears in either of Table 2 or 3 display the results in a new table / query.
i.e. Table1 Reg, Table2 data, Table3 data. Note that some Reg numbers will appear in all 3 tables.
Any help appreciated.
Malc
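Two LEFT JOINs from Table1, filtered to rows where at least one of the other tables matched, give one row per Reg with Table2 and Table3 data side by side (blank where a table has no match). A runnable sketch in SQLite through Python; the extra column names and sample rows are hypothetical. Access requires parentheses when chaining joins, e.g. FROM (Table1 LEFT JOIN Table2 ON ...) LEFT JOIN Table3 ON ....

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Table1 (Reg TEXT);
CREATE TABLE Table2 (Reg TEXT, Make TEXT, Colour TEXT);
CREATE TABLE Table3 (Reg TEXT, Owner TEXT, Town TEXT);
INSERT INTO Table1 VALUES ('AB12CDE'), ('XY34ZZZ'), ('LM56NOP');
INSERT INTO Table2 VALUES ('AB12CDE', 'Ford', 'Red'), ('LM56NOP', 'Fiat', 'Blue');
INSERT INTO Table3 VALUES ('AB12CDE', 'Smith', 'Leeds');
""")

rows = con.execute("""
SELECT t1.Reg, t2.Make, t2.Colour, t3.Owner, t3.Town
FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.Reg = t2.Reg
LEFT JOIN Table3 t3 ON t1.Reg = t3.Reg
WHERE t2.Reg IS NOT NULL OR t3.Reg IS NOT NULL
ORDER BY t1.Reg
""").fetchall()
print(rows)
```

If a Reg can appear more than once in Table2 or Table3, each combination produces its own row.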
Oct 2, 2005
Hello,
I have a form with buttons on them, every button opens a new form.
I have used an ID number to match the data in each form to the main form.
How can I make the ID, which is an AutoNumber, be entered automatically in the new form when the button is pushed?
Thanks
Jun 16, 2013
I have a table of sales (TBL_Sales) and another (TBL_Key_Customers) that lists information about specific customers, in particular whether they are part of a group, e.g. I would categorise Dave's Cars, Dave's Bikes and Dave's Coaches as part of the Dave group. I would like to query TBL_Sales to see how many sales were made to the Dave group, and also what else was sold, e.g. if Factory 1 sold 100 items of which 60 went to the Dave group, the remaining 40 would be shown as "Other".
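A LEFT JOIN from the sales to the key-customer table leaves the group blank for customers who are not listed, and replacing that blank with "Other" before grouping gives the split described. A runnable SQLite sketch through Python; the column names Factory, Customer, Qty and CustGroup and the sample rows are hypothetical. In Access, Nz(CustGroup, "Other") would play the role of COALESCE().

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TBL_Sales (Factory TEXT, Customer TEXT, Qty INTEGER);
CREATE TABLE TBL_Key_Customers (Customer TEXT, CustGroup TEXT);
INSERT INTO TBL_Sales VALUES ('Factory 1', 'Dave''s Cars', 30), ('Factory 1', 'Dave''s Bikes', 30),
                             ('Factory 1', 'Jo''s Vans', 25), ('Factory 1', 'Walk-in', 15);
INSERT INTO TBL_Key_Customers VALUES ('Dave''s Cars', 'Dave'), ('Dave''s Bikes', 'Dave'),
                                     ('Dave''s Coaches', 'Dave');
""")

rows = con.execute("""
SELECT s.Factory, COALESCE(k.CustGroup, 'Other') AS SalesGroup, SUM(s.Qty) AS Items
FROM TBL_Sales s LEFT JOIN TBL_Key_Customers k ON s.Customer = k.Customer
GROUP BY s.Factory, COALESCE(k.CustGroup, 'Other')
ORDER BY s.Factory, SalesGroup
""").fetchall()
print(rows)
```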
Jun 4, 2014
I have two tables. One has two fields, student ID and student name. It's like this:
1 Mark
2 Tom
3 Franklin
The other table has three fields, among them student name and student classes. It goes like this:
Mark calculus
Mark Biology
Tom Statistics
Franklin Calculus
Tom Chemistry
what I want is for the second table to have its related id from the first table so it could be like this:
1 calculus
1 biology
2 statistics
3 calculus
2 chemistry
I can't simply use find and replace because there are a lot of records. Is there another way? Should I relate the tables or something, and how would that work?
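Joining the two tables on the name gives each class row its student ID without any find and replace. A runnable SQLite sketch through Python using the data from the post; the table and column names are hypothetical. To store the ID permanently in Access, an update query joined on the name can fill a new StudentID column, e.g. UPDATE Classes INNER JOIN Students ON Classes.StudentName = Students.StudentName SET Classes.StudentID = Students.StudentID. Two students with the same name would be matched to each other's classes, which is a good reason to relate the tables by ID from then on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Students (StudentID INTEGER, StudentName TEXT);
CREATE TABLE Classes (StudentName TEXT, StudentClass TEXT);
INSERT INTO Students VALUES (1, 'Mark'), (2, 'Tom'), (3, 'Franklin');
INSERT INTO Classes VALUES ('Mark', 'calculus'), ('Mark', 'Biology'), ('Tom', 'Statistics'),
                           ('Franklin', 'Calculus'), ('Tom', 'Chemistry');
""")

# Look up each class row's student ID by name, keeping the original row order.
rows = con.execute("""
SELECT s.StudentID, c.StudentClass
FROM Classes c INNER JOIN Students s ON c.StudentName = s.StudentName
ORDER BY c.rowid
""").fetchall()
print(rows)
```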