How To Pick Nearby Text Of Lookup Terms With Help Of Term Extraction/Term Lookup
Oct 4, 2007
I am designing an SSIS package that is intended to mine text data (data extracted from websites).
Term Lookup and Term Extraction have been used as the mining tools.
I have lookup terms defined for the reference table, but the main problem lies in extracting the text/numbers/characters near these lookup terms during mining.
For example:
I found the noun "Email" 200 times (frequency score) in my text. Now I want to extract the nearby email address (the same applies to PhoneNumber and Address attributes). How can I achieve this with SSIS?
If you have any idea or suggestion for tackling this challenge, with or without Term Extraction/Term Lookup, please write here.
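For the nearby-text part, one minimal T-SQL sketch is shown below; it assumes a hypothetical table dbo.WebText(DocId, DocText) and only grabs the characters that follow the first occurrence of the term, so a fuller solution would still need to loop over every occurrence and parse out the actual address.
-- Pulls the 60 characters that follow the first occurrence of 'Email'
-- so the surrounding address can be inspected in a later parsing step.
SELECT DocId,
       SUBSTRING(DocText, CHARINDEX('Email', DocText) + LEN('Email'), 60) AS NearbyText
FROM dbo.WebText
WHERE CHARINDEX('Email', DocText) > 0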
SQL Server Data Mining comes with "Term Extraction" and "Term Lookup" for phrase detection. Rather than using the GUI tools, how can I utilize these two features in code? Please assist! Thanks!
Although this SP intends to surround the search text in double quotes, it seems that when called from Management Studio it throws a syntax error even before entering the SP.
create proc fts (@t nvarchar(1000) = null) as begin
    select @t = '"' + @t + '"'
    select @t
    select * from dbo.products where CONTAINS(name, @t)
end
GO
exec fts @t = 'my product name'
GO
Msg 7630, Level 15, State 3, Procedure fts, Line 4
Syntax error near 'product' in the full-text search condition 'my product name'.
-------------------------
If I pass the string in double quotes I get a different error:
exec fts @t = '"my product name"'
Go
Msg 7630, Level 15, State 3, Procedure fts, Line 4
Syntax error near 'my' in the full-text search condition '""my product name""'.
Now, if I remove the quotes again and make the original call:
I've just started using SSIS and I would like to know if it's possible to change or update the dictionary of the Term Extraction tool. That's important to me because I may have to look for words that don't exist in the tool's default English dictionary.
When I extract nouns from a text with the Term Extraction Transformation, the destination does keep the correct nouns, but they arrive in a random order.
Is it possible to keep the order of these nouns, and perhaps also keep duplicate occurrences?
Is there a way to perform term extraction on each row in a table, and have the term extraction report the frequency of the term on a per-row basis rather than a per-table basis?
Otherwise, is there a way I can take the extracted terms and apply a SQL function that returns the occurrence of each term in each row?
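One way to get a per-row count in plain T-SQL is the REPLACE/LEN trick; this is a minimal sketch assuming a hypothetical table dbo.Comments(CommentId, CommentText) and a single literal term.
DECLARE @term NVARCHAR(100)
SET @term = 'ice maker'

-- Removing every occurrence of the term and comparing string lengths
-- gives the number of occurrences per row.
SELECT CommentId,
       (LEN(CommentText) - LEN(REPLACE(CommentText, @term, ''))) / LEN(@term) AS TermCount
FROM dbo.Comments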
Hello, I want to search a column where all the words are delimited by underscores, e.g. User_id, Community_name, author_id, etc. It seems like FREETEXT only deals with strings that use a blank delimiter. How should I do a full-text search on a column like this? Here is the code:
declare @var varchar(2000)
set @var = 'id'
select [name], definition, version_code from dbo.base where freetext([name], @var)
Thanks.
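If the full-text word breaker does not split on underscores, a plain LIKE pattern is one fallback; the sketch below reuses the same dbo.base table and uses [_] to match a literal underscore, since a bare underscore is a single-character wildcard in LIKE.
-- Finds names containing the literal substring '_id', e.g. User_id or author_id.
select [name], definition, version_code
from dbo.base
where [name] like '%[_]id%'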
Hi, I need to categorize a lot of HTML or text files according to a list of terms, and I wonder whether Term Lookup is adequate for this. The problem is that Term Lookup can only take an OLE DB source as input. My files can be up to 80 KB and are not column-structured.
Should I import my files into a table? But if so, how can I import a column with more than 8,000 characters?
Is it possible for the Term Lookup function to manage the differences between US and British English spelling? For example, if I search for the terms "color" and "categorization", I would like Term Lookup to also count the "colour" and "categorisation" occurrences in the text.
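On the more-than-8,000-characters point, SQL Server 2005 supports NVARCHAR(MAX) columns; a minimal sketch of a hypothetical staging table (the table and column names are placeholders) would be:
-- Holds one imported file per row; NVARCHAR(MAX) can store far more than 8,000 characters.
CREATE TABLE dbo.DocumentStaging (
    DocId   INT IDENTITY(1,1) PRIMARY KEY,
    DocText NVARCHAR(MAX)
)
In the data flow, an NVARCHAR(MAX) column generally surfaces as DT_NTEXT, which the Term Lookup input column type can accept.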
What is the purpose of a 'weighted term'? Please explain it.
SELECT ID, firstname, lastname
FROM [contain-1]
WHERE CONTAINS(firstname, 'ISABOUT(mohsen weight(.8), yaser weight(1.0))')
Table [contain-1] contents:
ID  FIRSTNAME
1   mohsen
2   mohsen
3   yaser
4   mehdi
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it finished building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem in line with expectations? Any thoughts on improving performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).
Hi, can you search a column of a database table to find all the rows that have a word in it? Example: I have a row that contains 'adventure st north'; other rows in that table are supposed to be the same but read 'adventure street N.' or 'adventure st. N.' or 'North Adventure st.'. Could I search for rows that contain 'adventure' in the searched column (let's call it columnA)? I tried:
select * from tbl_test where columnA LIKE 'adventure'
and got no results. What is the way to do this?
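Without wildcards, LIKE 'adventure' only matches a value that is exactly 'adventure'. A minimal sketch of the wildcard form, against the same hypothetical tbl_test and columnA, would be:
-- '%' matches any run of characters before or after the word.
select * from tbl_test where columnA LIKE '%adventure%'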
I'm not really sure how to explain this, so please bear with me. I have a SQL statement such as: SELECT TOP (10) FROM chartTracks. This works with SQL Server Express 2005, but when I moved my site over to work with MSSQL Server 2000, the statement had to be changed in order for it to work: SELECT TOP 10 FROM chartTracks. I was just wondering if there is a technical term for this and, if possible, where to find any more sources of information regarding the above. I'm just writing a report and would like to include this, if possible. Thanks in advance!
Just learning full-text searching in SQL Server 2005 and have questions about the proximity term "near".
1. How near is near? Is it measured in characters, words, or something else?
2. How do you know? Is this documented? I can't find it anywhere.
3. Can it be adjusted? At design time? At runtime?
I have used a program called Sonar which has powerful proximity options that allow the user to specify proximity in terms of words at runtime. I would like to be able to do that, but I can't find much on "near" in the documentation other than that it seems to be relative, provides for left and right nearness, and allows chaining of multiple search terms.
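For reference, a minimal sketch of the NEAR syntax in SQL Server 2005 full-text search, against a hypothetical dbo.Docs(DocId, Body) table with a full-text index on Body:
-- NEAR (or the ~ shorthand) asks for the two terms to appear close together;
-- in SQL Server 2005 the proximity distance itself is not a tunable parameter.
SELECT DocId, Body
FROM dbo.Docs
WHERE CONTAINS(Body, 'ice NEAR maker')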
Say I want to look up a value in another dataset, but there is a grouping that requires you to know the values at each level in order to get to the correct detail record. Can you still use the Lookup function with more than one field to compare against? So, for example:
Department
  \___ SalesPerson
         \___ Measure
I want to be able to add a new row at the Measure level, but look up each field from another dataset. In order to do that I will need both the Department AND the SalesPerson values to do the lookup, but I don't think the Lookup function will let us do that, will it?
I have a field called URL in my table. I want to get the SEARCH TERM from a given URL and create a report based on that information. I'm having difficulties because the URLs have different formats depending on the search engine the user used to browse. Some of the search engines are "google", ".excite.com", "search.msn.", "search.netscape", "search.lycos", "altavista", "search.yahoo" and many more.
Examples of the URLs from Google:
http://www.google.com/search?q=S26+Collet+Chuck&hl=en&client=firefox-a&rls=org.mozilla:en-USfficial&start=30&sa=N -- the search term is S26 Collet Chuck
http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLG,GGLG:2006-02,GGLG:en&q=kt21+kia -- the search term is kt21 kia
http://www.google.com/search?hl=en&q=Slagger+burning+Tables -- the search term is Slagger burning Tables
Does anybody have a SQL query, or has anyone used a CLR function, to get the SEARCH TERM from the URLs of different search engines?
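For the Google-style URLs above, a minimal T-SQL sketch is shown below; the dbo.WebLog table and its Url column are assumptions, and the logic only handles a q= parameter, so it is a starting point rather than a general parser.
-- Looks for the q= parameter, takes everything up to the next '&' (or the end
-- of the string), and swaps '+' back to spaces. Other engines would need their
-- own parameter names.
SELECT Url,
       REPLACE(
           SUBSTRING(Url,
                     CHARINDEX('q=', Url) + 2,
                     CASE WHEN CHARINDEX('&', Url, CHARINDEX('q=', Url)) > 0
                          THEN CHARINDEX('&', Url, CHARINDEX('q=', Url)) - (CHARINDEX('q=', Url) + 2)
                          ELSE LEN(Url)
                     END),
           '+', ' ') AS SearchTerm
FROM dbo.WebLog
WHERE CHARINDEX('q=', Url) > 0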
This is the one.txt file:
Customer called to complain that the ice maker on her fridge has stopped working model XXYY-3
Door to refrigerator is coming off model XX-1
Ice maker is making a funny noise XXYY-3
Handle on fridge falling off model XXZ-1
Freezer is not getting cold enough XX-1
Ice maker grinding sound fredge XXYY-3
Customer asking how to get the ice maker to work model XXYY-3
Customer complaining about dent in side panel model XXZ-1
Dent in model XXZ-1
Customer wants to exchange because of dent in door model XXZ-1
Handle is wiggling model XXZ-1
Customer happy with us. Best fridge yet!
I created the table term_result(term_id varchar2(50)); term_id holds values like xxyy-3.
Now I want to find the number of times xxyy-3 is repeated.
The error I am getting says that only DT_NTEXT or DT_WSTR types are allowed here.
The data flow is: Flat File Source (one.txt) -> Data Conversion (I chose DT_STR) -> Term Lookup -> OLE DB destination; the error occurs at the Term Lookup.
So what data type do I have to use for the matching lookup column?
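For the counting part of the question, a minimal sketch against the term_result table described above, assuming one row per extracted term, would be:
-- Counts how many times each extracted term (e.g. xxyy-3) appears.
SELECT term_id, COUNT(*) AS occurrences
FROM term_result
GROUP BY term_id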
Actually this is in regard to an SCD Type 2 dimension. The scenario is that I am moving a fact table from an old source, and the fact contains a DimensionA description value that I want to replace with the appropriate id from the dimension table. That dimension table is SCD Type 2, based on StartDate and EndDate, and the fact table doesn't contain a direct date value, only a TimeId. So to update the value in the fact table I have to join the Time dimension table and the other dimension table in order to replace the fact description with the proper id.
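A minimal T-SQL sketch of that join is below; every table and column name (FactSales, DimTime, DimensionA, CalendarDate, and so on) is an assumption standing in for the real schema.
-- Resolve the SCD Type 2 surrogate key by matching the description and the
-- fact's date (via the time dimension) against the dimension row's validity range.
UPDATE f
SET f.DimensionAId = d.DimensionAId
FROM dbo.FactSales AS f
INNER JOIN dbo.DimTime AS t
    ON t.TimeId = f.TimeId
INNER JOIN dbo.DimensionA AS d
    ON d.Description = f.DimensionADesc
   AND t.CalendarDate >= d.StartDate
   AND t.CalendarDate < ISNULL(d.EndDate, '99991231')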
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup". The SQL that I captured in SQL Profiler does find the record when I run it in Management Studio. I have already tried trimming everything, to no avail.
Why is this happening?
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I've got a problem: I'd like to update my production DB, in terms of both design and data, from the test DB. In other words, I added many tables to my design.
Therefore, I'd have to update the real DB on the server. Is there a way to do so with MSSQL, without doing it manually?
I'd like to get some ideas for the following: I am writing a quick mini-application that searches for records in a database, which is easy enough. However, if the search term comes up empty, I need to return the 10 records before the position the search term would be in if it existed, and the 10 records after. (Obviously the results are ordered on the search term column.) So for example, if I am searching for "Microsoft" and it doesn't exist in my table, I need to return the 10 records that come before Microsoft alphabetically, and then the 10 that come after it. I have an SP that does this, but it is pretty messy and I'd like to see if anyone else has ideas that might be better. Thanks!
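One common shape for this is two TOP 10 queries glued together with UNION ALL; this is a minimal sketch assuming a hypothetical dbo.Companies(CompanyName) table and ordering on that column.
DECLARE @term NVARCHAR(100)
SET @term = 'Microsoft'

-- 10 rows alphabetically before the term, plus 10 rows after it.
SELECT CompanyName FROM
    (SELECT TOP 10 CompanyName
     FROM dbo.Companies
     WHERE CompanyName < @term
     ORDER BY CompanyName DESC) AS before_rows
UNION ALL
SELECT CompanyName FROM
    (SELECT TOP 10 CompanyName
     FROM dbo.Companies
     WHERE CompanyName > @term
     ORDER BY CompanyName ASC) AS after_rows
ORDER BY CompanyName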
Hi, I have a question regarding calling SQL table columns dynamically. The workflow would go as follows:
1. The user enters a search term into a textbox.
2. The user checks a checkbox to choose which column of the SQL database to search (e.g. firstname or surname).
The pseudo-SQL would go something like:
SELECT +%column1(checkbox1.value)%+ OR +%column2(checkbox2.value)%+ OR +%column3(checkbox3.value)%+ WHERE column1 = +%TextBox.Text%+ OR column2 = +%TextBox.Text%+
3. Display the results in a GridView.
My SQL needs to improve greatly, so any code insight (a good book or link) would be terrific. Thanks.
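One hedged way to build that query safely in T-SQL is dynamic SQL via sp_executesql; the dbo.People table and the column names below are placeholders for whatever the checkboxes map to.
DECLARE @columnName SYSNAME, @searchTerm NVARCHAR(100), @sql NVARCHAR(MAX)
SET @columnName = 'firstname'   -- chosen from the checked checkbox
SET @searchTerm = 'smith'       -- from the textbox

-- QUOTENAME guards the column name against injection; the search term
-- itself is passed as a real parameter rather than concatenated in.
SET @sql = N'SELECT * FROM dbo.People WHERE ' + QUOTENAME(@columnName) + N' = @term'
EXEC sp_executesql @sql, N'@term NVARCHAR(100)', @term = @searchTerm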
Hi, I tested the following SQL statement and found that an error occurs:
Msg 7630, Level 15, State 2, Line 3 Syntax error near '"' in the full-text search condition '"dsg SDRGDG " OR "sdfsdfsdfsdafdsafdsfds'.
DECLARE @searchTerm NVARCHAR(40)
SET @searchTerm = '"dsg SDRGDG " OR "sdfsdfsdfsdafdsafdsfdsafdsafdsafsafdfdsafdf"';
SELECT [JobTitle], [JobDes], [OpenDate], j.[URLRef], c.[CompanyName], c.[URLRef], c.[URLSource]
FROM JobWanted AS j
INNER JOIN Company AS c ON c.CompanyID = j.CompanyID
WHERE CONTAINS((JobTitle, JobDes), @searchTerm)
It seems that too lengthy a string causes an error for the full-text engine. I find that the sdfsdfsdfsdafdsafdsfdsafdsafdsafsafdfdsafdf value is truncated, as shown in the error message.
How can I avoid this issue? Can I configure this limitation?
I have a Conditional Split with 3 outputs. On the first output I have a lookup. When I execute the package, 56 rows go through the Conditional Split; all of them go to the 2nd and 3rd outputs, but the lookup on the first output generates the error "Row yielded no match during lookup".
I don't understand why the lookup generates an error when no rows are going through it.
In a calculated column I am trying to get a scalar text value from a lookup to another table. This works quite well when getting numerical values with the following formula:
But as soon as I replace the numerical column with a string column, #error results.
I also want to mention that the above query yields only one row as a result. It should be simple to return the value of one of the columns but after searching for quite some time, I could not find any function for that.
I am currently designing an SSIS package to integrate data into a data warehouse fact table. This fact table has about 70 columns, among which 17 are foreign keys to dimension tables.
To insert data in that table, I have to make several transformations and lookups. Given the fact that the lookups I have to make are a little complicated, I have about 70 tasks in my Data Flow. I know it's a lot, but I can't find a way to make it simpler. It seems I really need all these tasks.
Now, the problem is that every new action I try to make on the package takes a lot of time. At design time, everything is very slow. My processor is heavily loaded each time I change a single setting in one of the tasks, and executing the package in debug mode takes ages. If I look at the size of my package file on disk, it's more than 3MB.
Hence my question: are there any limitations in terms of the number of columns or the number of tasks that can be processed within a Data Flow?
If not, then do you have any idea why it's so slow?
Does any kind person have a simple example Package for undertaking 'Text Extraction' on one or two columns of text data in a SQL Server table (the data is not in Unicode)?
I want to look up values from one database in another database, both of which are on the same SQL Server 2000 instance. One database is called GamingCommissiondb and the other is called LicensingActions. I need some of the tables to communicate with each other, to look up values from one another. For example, I need the Termination table to look up values from the Revocations table. Would using linked servers suffice?
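Since both databases sit on the same instance, a linked server shouldn't be needed; three-part naming is enough. A minimal sketch, assuming Termination lives in LicensingActions, Revocations lives in GamingCommissiondb, and both share a hypothetical LicenseNumber column:
-- Cross-database join using database.schema.table names on the same instance.
SELECT t.*, r.*
FROM LicensingActions.dbo.Termination AS t
INNER JOIN GamingCommissiondb.dbo.Revocations AS r
    ON r.LicenseNumber = t.LicenseNumber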