Can Terms Lookup Take Into Account The US/English Spelling Differences?
Feb 15, 2007
Is it possible for the Term Lookup function to manage the differences between US and English
spelling? For example, if I search for the terms "color" and "categorization", I would like the term lookup to also count the "colour" and "categorisation" occurrences in the text.
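One possible approach, since Term Lookup matches literal terms only: list both spellings in the reference table and fold the variants back together after the lookup. A rough T-SQL sketch, where dbo.TermLookupResults and its Term/Frequency columns are hypothetical names for the lookup output:

    -- Map each UK variant onto its US canonical term, then sum the counts.
    SELECT CASE Term
             WHEN 'colour'         THEN 'color'
             WHEN 'categorisation' THEN 'categorization'
             ELSE Term
           END AS CanonicalTerm,
           SUM(Frequency) AS TotalFrequency
    FROM dbo.TermLookupResults
    GROUP BY CASE Term
               WHEN 'colour'         THEN 'color'
               WHEN 'categorisation' THEN 'categorization'
               ELSE Term
             END;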
I am designing an SSIS package that is intended to mine text data (data extracted from websites). Term Lookup/Term Extraction are being used as the mining tools. I have lookup terms defined for the reference table, but the main problem lies in extracting the nearby text/numbers/characters around these lookup terms during mining. For example: I found the noun "Email" 200 times (frequency score) in my text; now I want to extract the nearby email addresses (the same applies to the PhoneNumber and Address attributes). So how can I achieve this with SSIS? If you have an idea or suggestion for carrying out this challenge, with or without Term Extraction/Term Lookup, please do write here.
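Term Lookup only counts occurrences; it won't capture surrounding text, so the usual SSIS answer is a Script Component running a .NET regular expression. As a rough T-SQL illustration of the idea only (dbo.SourceDocs, DocId, and DocText are hypothetical names), you can grab a window of text around the first '@' and post-process it downstream:

    -- Take a 60-character window around the first '@' in each document;
    -- a proper email regex would then be applied to the candidate string.
    SELECT d.DocId,
           SUBSTRING(d.DocText,
                     CASE WHEN CHARINDEX('@', d.DocText) > 30
                          THEN CHARINDEX('@', d.DocText) - 30
                          ELSE 1 END,
                     60) AS NearbyEmailCandidate
    FROM dbo.SourceDocs d
    WHERE d.DocText LIKE '%Email%'
      AND CHARINDEX('@', d.DocText) > 0;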
Hi, I need to categorize a lot of HTML or text files according to a list of terms, and I wonder if Term Lookup is adequate for this. The problem is that Term Lookup can only take an OLE DB source as input. My files can be up to 80 KB and aren't column-structured.
Should I import my files into a table? But if so, how can I import a column with more than 8000 characters?
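On SQL Server 2005 you can sidestep the 8000-character limit with varchar(max) and load each file as a single value via OPENROWSET(BULK ...). A minimal sketch, with hypothetical table and file names:

    CREATE TABLE dbo.SourceDocs
    (
        DocId    int IDENTITY(1,1) PRIMARY KEY,
        FileName nvarchar(260) NOT NULL,
        DocText  varchar(max)  NOT NULL   -- no 8000-character ceiling
    );

    -- SINGLE_CLOB reads the entire file as one character value.
    INSERT INTO dbo.SourceDocs (FileName, DocText)
    SELECT 'page1.html', doc.BulkColumn
    FROM OPENROWSET(BULK 'C:\docs\page1.html', SINGLE_CLOB) AS doc;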
During install of SQL Server 2005, we can of course use a domain account or the built-in System account for running the services. I lean toward a domain account for obvious reasons, but I would like to know the pros and cons of each option, why I'd choose one over the other, and what consequences or limitations I may encounter with either choice.
There are several terms used with MS SQL Server that I don't know and cannot find in my books. Does MS provide definitions anywhere besides BOL, where it is difficult to find good explanations or even find definitions at all?
thx,
Kat
ps. a glossary would be a nice feature if they don't have one currently.
Hopefully I'm asking this in the right place; sorry if it's not. Maybe you could point me in the right direction.
I have been informed that use of MDF file (SQL Server Express) databases on the net was restricted, as this was classed as multiple connections and therefore fell outside the free license agreement.
I am looking at commercially developing and marketing a web-based system with a relatively small database footprint (well under 1 GB) using ASP.NET 2.0, and I like the look of SQL Server Express.
Could anyone clear up whether or not this is allowed under the SQL Server Express terms of use, or point me in the direction of somewhere I can find this information?
I have a table that contains 10 million records. Of the following 2 statements, which one provides better performance? Frankly, I have no idea how to compare the execution plans...
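A quick way to compare them without reading plans at all: run each statement with I/O and timing statistics switched on and compare the logical reads and CPU time (or press Ctrl+M in Management Studio to include the actual execution plan with each run):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- run statement 1 here, note the logical reads and CPU time,
    -- then run statement 2 and compare the two sets of numbers

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;

Lower logical reads is generally the better indicator on a 10-million-row table, since it is independent of caching.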
I have been running a script in SQL Server 2000, both as sa and as an Active Directory user with administrator rights (I tested both SQL Server and Windows Authentication), in Query Analyser. The script grants execute rights on the stored procedures within the database, and Query Analyser does not give any errors when I run it. I have made sure that each transaction has a GO after it. I then return to Enterprise Manager and check the rights (I apply them to roles, so that when we create another SQL Server user we just grant him/her membership in the role), and discover that the role has not been granted the rights. It seems to be occurring with only 2 of the procedures. Is there a known bug that might be causing this?
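One way to rule out an Enterprise Manager display problem is to check the permissions directly from Query Analyser after the script runs; sp_helprotect lists what was actually granted. A sketch, where the procedure and role names are placeholders:

    GRANT EXECUTE ON dbo.usp_MyProc TO MyAppRole
    GO
    -- list the effective object permissions recorded for the procedure
    EXEC sp_helprotect @name = 'usp_MyProc'
    GO

If sp_helprotect shows the grant but Enterprise Manager does not, the problem is in EM's display rather than in the script.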
Basically, what I am trying to achieve is 2 types of search functions...
Search for All terms (easy and complete) and search for Any Terms...
The way I have gone about this so far is, in my ASP.NET app, to split the search string by spaces, then search for each word and merge the resulting dataset into the main return dataset.
This, however, has a few problems: the result dataset will contain duplicate values, and I am running queries in a loop.
What I am looking for is a one-stop-shop stored procedure that will split the search string, loop through each word, and add the results to a return table, ONLY if they do not already exist within the return table.
Can anyone point me in the right direction... basically with the splitting of the string and the looping through the words... the rest I think I can handle.
Any other hints/tips/tricks would also be helpful.
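A sketch of such a procedure (SQL Server 2000 compatible; dbo.Articles, ArticleId, and Body are placeholder names). It walks the string with CHARINDEX and uses NOT EXISTS so the return table stays duplicate-free:

    CREATE PROCEDURE dbo.SearchAnyTerm
        @SearchString varchar(1000)
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @Word varchar(100), @Pos int;
        DECLARE @Results TABLE (ArticleId int PRIMARY KEY);

        -- trailing space guarantees CHARINDEX finds the last word too
        SET @SearchString = LTRIM(RTRIM(@SearchString)) + ' ';
        SET @Pos = CHARINDEX(' ', @SearchString);

        WHILE @Pos > 0
        BEGIN
            SET @Word = LTRIM(RTRIM(LEFT(@SearchString, @Pos - 1)));

            IF LEN(@Word) > 0
                INSERT INTO @Results (ArticleId)
                SELECT a.ArticleId
                FROM dbo.Articles a
                WHERE a.Body LIKE '%' + @Word + '%'
                  AND NOT EXISTS (SELECT 1 FROM @Results r
                                  WHERE r.ArticleId = a.ArticleId);

            -- drop the word just processed and find the next delimiter
            SET @SearchString = SUBSTRING(@SearchString, @Pos + 1, 1000);
            SET @Pos = CHARINDEX(' ', @SearchString);
        END

        SELECT a.*
        FROM dbo.Articles a
        JOIN @Results r ON r.ArticleId = a.ArticleId;
    END

This replaces the per-word round trips from the app with a single procedure call, and the deduplication happens server-side in the table variable.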
I have several DTS jobs that run well with my NT login account as the SQL Agent service startup account, but if I use the System account they fail with this error: "Error opening datafile: Access is denied. Error source: Microsoft Data Transformation Services Flat File Rowset Provider"
The System account has Change access to the data files under NT security.
Basically, a DTS package has been set up that pulls in data from another company's server. This data needs to be available on demand, i.e. individual users can pull in updates of the data whenever they require it.
I am using xp_cmdshell and dtsrun to pull in the data. This obviously works fine for me, as I am a member of sysadmin.
Books Online states: "SQL Server Agent proxy accounts allow SQL Server users who do not belong to the sysadmin fixed server role to execute xp_cmdshell"
So I went to the SQL Server Agent Properties 'Job System' tab, unchecked 'Non-sysadmin job step proxy account', and entered a proxy account.
The proxy account has been set up as a Windows user with local administrator privileges, and is even a member of the sysadmin server role - just in case.
Now when I log onto the DB with my test account - a non-sysadmin - and attempt to run the stored proc that imports the data, I receive the message 'EXECUTE permission denied on object 'xp_cmdshell', database 'master', owner 'dbo''.
Hmm... so basically I have either misunderstood BOL or there is something not quite right in my setup.
I have searched the net for a few days now, and yet I can find no solution.
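One thing the BOL passage doesn't spell out: the proxy account only controls which Windows credentials xp_cmdshell runs under; the non-sysadmin login still needs EXECUTE permission on the procedure itself, granted in master. That matches the exact error above. For example (the login name is a placeholder):

    USE master
    GO
    -- let the test login reach the extended proc; the proxy account
    -- then supplies the Windows security context it executes under
    GRANT EXECUTE ON xp_cmdshell TO [DOMAIN\TestUser]
    GO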
First and foremost, thanks for reading and responding! Does it matter how big a stored procedure is if you do things in the stored procedure such as:

declare the parameters
IF @Parm_Select = '<ALL>' do a select
IF @Parm_Select <> '<ALL>' and @Parm_Report = '1' do a select
IF @Parm_Select <> '<ALL>' and @Parm_Report = '2' do a select

This goes on and on, and I have written a couple of stored procedures that are about 1500 lines of code based upon the parameters passed. I do not create any tables - they are all just select statements based upon the parameters passed. I thought I was doing the right thing, because I did not want to write a procedure that calls a procedure (I read about this and got confused about the return parameters, since there is a lot of data being returned from the select - I don't think I said that correctly!). I am just learning this SQL stuff, and it is cool and I am excited - but I don't want to develop any bad habits at the beginning, and I try to look these things up on the web, but I just don't get explicit answers from reading all of this stuff. Thanks to all in advance!
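For what it's worth, the length itself is not the problem, but one common refactoring is to keep a thin dispatcher and move each report's SELECT into its own procedure. A called procedure's result set flows straight back to the client, so no return parameters are needed. A sketch with placeholder names:

    CREATE PROCEDURE dbo.GetReport
        @Parm_Select varchar(50),
        @Parm_Report char(1)
    AS
    BEGIN
        -- each branch delegates; the sub-procedure's SELECT is
        -- returned to the caller directly, no OUTPUT params needed
        IF @Parm_Select = '<ALL>'
            EXEC dbo.GetReport_All
        ELSE IF @Parm_Report = '1'
            EXEC dbo.GetReport_Type1 @Parm_Select
        ELSE IF @Parm_Report = '2'
            EXEC dbo.GetReport_Type2 @Parm_Select
    END

Each sub-procedure stays small enough to read and test on its own, which is the main win over one 1500-line block.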
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I want to hear about your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10 and 100 chars. We set CopyReferenceTable=0, MatchIndexOptions=GenerateAndPersistNewIndex, and WarmCaches=true. It took about 60 minutes to build the index table, during which dtexec got up to 4.5GB of memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by, all the way up to 100% progress, when it peaked at 4.5GB.) We left the MaxMemoryUsage setting blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it finished building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem in line with expectations? Any thoughts on improving performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).
Say I want to look up a value in another dataset, but there is a grouping that requires you to know the values at each level in order to get to the correct detail record. Can you still use the lookup function with more than one field to compare against? So, for example:
Department
 \___ SalesPerson
      \___ Measure
I want to be able to add a new row at the Measure level, but look up each field from another dataset. In order to do that, I will need both the Department AND SalesPerson values to do the lookup, but I don't think the Lookup function will let us do that, will it?
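If the lookup only accepts a single field, one workaround is to build a composite key in each dataset's query so a single-field comparison is enough. A sketch (table and column names are placeholders):

    SELECT Department,
           SalesPerson,
           -- delimiter guards against 'AB'+'C' colliding with 'A'+'BC'
           Department + '|' + SalesPerson AS LookupKey,
           Measure
    FROM dbo.SalesDetail;

Both datasets would expose the same LookupKey expression, and the lookup then compares LookupKey to LookupKey.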
Hi there,
BOL notes that in order for replication agents to run properly, the SQLServerAgent must run as a domain account which has privileges to log into the other machines involved in replication (under "Security Considerations" and elsewhere). This makes sense; however, I was wondering if there are any repercussions to using duplicate local accounts to establish replication where a domain is not available. In other words: create a local Windows account "johndoe" on both machines (with the same password), grant that account access to SQL Server on both machines, and then have SQL Server Agent run as "johndoe" on both machines. I do not feel this is an ideal solution, but I have circumstances under which I may not have a domain available; my preliminary tests seem to work.
Also, are there any similar considerations regarding the MSSQLSERVER service, or can I always leave that as Local System?
Dave
Actually, this is in regard to an SCD Type 2 dimension. The scenario is that I am moving a fact table from an old source, and I have a DimensionA description value in the fact which I want to replace with the appropriate ID from the dimension table. That dimension table is SCD Type 2 based on StartDate and EndDate, and the fact table doesn't contain a direct date value; rather, there is a TimeId in the fact. So to update the value in the fact table, I have to join the time dimension table and the other dimension table to replace the fact's description with the proper ID.
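A sketch of that three-way join as a single UPDATE (all table and column names here are placeholders, and whether EndDate is inclusive depends on how your SCD rows are closed out):

    -- Resolve each fact's description to the surrogate key whose
    -- validity window contains the fact's date, via the time dimension.
    UPDATE f
    SET    f.DimensionAId = d.DimensionAId
    FROM   dbo.FactTable f
    JOIN   dbo.DimTime t
           ON t.TimeId = f.TimeId
    JOIN   dbo.DimensionA d
           ON d.Description = f.DimensionADescription
          AND t.CalendarDate >= d.StartDate
          AND t.CalendarDate <  d.EndDate;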
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup". The SQL that I captured in SQL Profiler does find the record when I run it in Management Studio. I have already tried trimming everything, to no avail.
Why is this happening?
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged:
I have a situation that I have discovered in our QA database that I need to resolve. When I looked at the Activity Monitor for our server, I discovered that a process is running under a domain user account for one of our .Net applications. The problem is that that domain user account has not been created as a SQL login account on the server. I am trying to figure out how someone can log in to the database server with a domain user account that has not been added to SQL Server as a login account.
Does anyone have any insight on this? I don't like the idea of someone being able to create a domain account that can access the database without me granting it specific access.
I have a Conditional Split with 3 outputs. On the first output I have a lookup. When I execute the package, I have 56 rows going through the Conditional Split; all rows then go to the 2nd and 3rd outputs, but the lookup on the first output generates the error "Row yielded no match during lookup".
I don't understand why the lookup is generating an error when there are no rows going through it.
Hi, I'm installing software which uses SQL Server 2000 in English, but currently I have on all the computers another program that uses an ODBC driver for SQL 2000 in Spanish. If I install the SQL client components in English, the other application gets an ODBC error, something related to the driver.
So, is there any way to have a client that can use SQL in Spanish and in English at the same time? I mean, I've got two servers, one in Spanish and one in English, and just one client computer with the two applications.
Hello all,
We are developing an add-on for GoldMine; however, my problem is a generic SQL Server problem. The situation is this: we have a database with collation set to SQL Latin. When we connect to the database via GoldMine, Enterprise Manager, or Delphi (through ADO), we cannot see Turkish characters. When I set the SQL Server machine's "Language for non-Unicode programs" setting to Turkish, GoldMine works fine; however, there is still a problem in EM and Delphi.
Does anybody know how to solve this issue?
Hi,
We are developing a small web interface to a local ERP application, which uses SQL Server 2000 as its database. The database uses SQL_Latin1_CP1 collation and the fields are varchar (not nvarchar); however, the main program inserts and reads non-English (Turkish) characters in these columns. When we connect to the database with ADO.NET, these characters are not read correctly. (The situation is the same when I check the tables with Enterprise Manager and Query Analyzer.) In a past situation (which was about a Win32 application), I had heard about the character-conversion behaviour of ADO (and many other DB libraries) and solved that problem by using BDE instead of ADO, so that the connection was made via DB-Library instead of OLEDB. But that approach cannot be applied to my ASP.NET situation, and there is NO way to change the database collation. Must I use an ADO.NET property, use another provider, or maybe another library? Any advice? Thanks...
I have developed a tool to allow project developers to easily re-create the entire schema for our base product. The current issue involves setting the correct collation for the customer's region. Our sister company in Germany uses the same DB creation tool and scripts, and we here in the US also have customers in South America. My ultimate question is: what subset of collation names would it be necessary to offer the project developer? I could query the database to get all the collation names, but I think there were around 1000 names. Can I query to get a smaller subset of the most relevant collation names?
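fn_helpcollations() returns the full list, so one option is to filter it down to the language families your customers actually use. For example (the particular families shown are just a guess based on the regions above):

    SELECT name, description
    FROM   ::fn_helpcollations()
    WHERE  name LIKE 'SQL_Latin1_General%'   -- US default family
        OR name LIKE 'Latin1_General%'       -- general Western European
        OR name LIKE 'Modern_Spanish%'       -- South American customers
        OR name LIKE 'German_PhoneBook%';    -- German phone-book sorting

The tool could run this once and present the (much shorter) result as the pick list.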
I have this dumb problem with dates in my mobile application. The problem is that when saving a short date in English format, e.g. 20/08/2006, SQL Server 2000 Windows CE Edition displays an error stating:
THERE IS AN ERROR IN THE DATEPART FORMAT. [,,,Expression,,]
When changing the date to American format, e.g. 08/20/2006, the problem is solved.
The problem is that I require the dates in English format only!
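One workaround that avoids regional settings entirely is to pass dates in a culture-neutral format: the ISO 8601 form parses the same way whether the device is set to English (dd/MM) or American (MM/dd) formats. The UI can still display dd/MM/yyyy; only the literal sent to the engine changes (table and column names here are placeholders). Parameterized commands are safer still, since no string conversion happens at all:

    -- '2006-08-20T00:00:00' is unambiguous, unlike 20/08/2006 vs 08/20/2006
    INSERT INTO Orders (OrderDate)
    VALUES ('2006-08-20T00:00:00');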