Fulltext Large (500,000) Count Query Performance Too Slow
Feb 14, 2008
Hi,
I am struggling with the response time for a simple count on a fulltext search; it is too slow.
Even using the most simple query on a good server (64-bit dual Opteron, 4 GB RAM, high-speed 16-disk RAID storage):
select count(*) from content_books where contains(searchData,'"english"')
It takes 4 seconds to count the roughly 500,000 results. I have removed all the joins with real table data so that the query runs entirely inside the fulltext engine.
I would expect this to be down to 4 milliseconds. Isn't it just getting the size of the "english" word result index?
It seems the engine is going through all the results, because if I do a more complex search that returns fewer results, the performance is better.
Any clues on how to make this faster? I never read the thousands of records, but I need to count them...
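One variant worth timing (a minimal sketch against the same table and column; whether it helps depends on the version and the plan it gets) counts directly off CONTAINSTABLE, so nothing but the fulltext key list is touched:

select count(*)
from containstable(content_books, searchData, '"english"') ct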
I currently have a large table (35 million rows, over 80GB). I have one varchar(max) column on the table that is used in the fulltext index.
To query the complete index is fast, for example:
SELECT 'ipod', COUNT(*)
FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], 'ipod') CT
This took 70 seconds (which I can live with). However, I seldom run queries like this; most are more like:
SELECT 'ipod', COUNT(*)
FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], 'ipod') CT
JOIN Pages ITP ON ITP.PageID = CT.[Key]
JOIN Feeds ITF ON ITP.IPID = ITF.IPID
JOIN Buyers ITB ON ITB.IBID = ITF.IBID
WHERE ITB.ID IN (1342,246)
These queries are much slower (this example took 17 minutes). I understand that fulltext searches the index and returns all rows that match the query to SQL Server, which then performs the joins and counts only the correct results. (Correct me if I'm wrong here.)
One solution I've seen for this is to put data or "tags" into the FT column, so my Body column would become something like:
'{ID:1342}' + [Body]
That sounds like a very good idea. I could then change the 2nd query above to be:
SELECT 'ipod', COUNT(*)
FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], '("ID:1342" OR "ID:246") AND "ipod"') CT
That all works well until I want to select 1000 different IDs, because the FT query becomes very long and complex. Also, I'm only including one column (ID) in this example, but I have about 7 or 8 columns that I would need to include in these "tags". Querying multiple columns becomes very complex quickly, and no doubt I will reach a query length limit at some point.
If anyone has any other suggestions to the above, I'd love to hear them. Another thought I'm having is to partition the table. I can find very little online about how fulltext behaves on partitioned tables. I fear it behaves exactly the same; what I'd like to think is that I could partition the table on an ID, say 100 per partition or something, and then fulltext would only search the relevant partitions. If it behaves like this it may work. If no-one knows then I'll give it a go, but this will take me a while due to the table size, so I'm hoping one of you clever lot knows!
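One alternative to embedding tags in the fulltext column (a sketch only; the optimizer may or may not produce a better plan than the join form) is to keep the ID filtering in regular SQL but phrase it as a semi-join against the fulltext keys:

SELECT 'ipod', COUNT(*)
FROM CONTAINSTABLE(MyDB.dbo.Contents, [Body], 'ipod') CT
WHERE CT.[Key] IN (SELECT ITP.PageID
                   FROM Pages ITP
                   JOIN Feeds ITF ON ITP.IPID = ITF.IPID
                   JOIN Buyers ITB ON ITB.IBID = ITF.IBID
                   WHERE ITB.ID IN (1342, 246))

This scales to 1000 IDs (or a join to a temp table of IDs) without the fulltext query string growing at all.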
I have tried many things, even the worst-case option I was trying to avoid, i.e. uninstalling MSDE SP3 and Analysis Services and the service packs, but the problem is still not solved...
Whenever I open a DTS package in design view and double-click on the link (the line connecting the two) between the source and destination servers, the PC goes to sleep and comes back after a long time, and then the same problem occurs when I press the transformation tab...
I know this may sound weird but that really is the case :o
I am asking this question on behalf of a friend. I have little knowledge of SQL 2005, but my friend is quite knowledgeable, although this is the first time he is dealing with a large database for a client. So here's the story.
His client has a database containing 1.5 million books. Now he is setting up a website which will enable users to search books. Searching by ISBN is no problem as it only takes 1 second. The problem is, searching by Title takes more than 20 seconds, which is unacceptable. My friend has only worked with smaller databases; he just recently thought of implementing indexing and is now looking for other ideas.
Each row contains book details such as Title, Author1, Author2, Author3, Publisher, Publication Date, ISBN, etc.
Can anyone more experienced with large databases share some design ideas? His client is aiming for 8 seconds or less.
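If the slow title search is a LIKE '%word%' scan, one standard approach on SQL 2005 is a fulltext index on Title (a sketch with hypothetical object names; PK_Books stands for whatever unique index keys the table):

CREATE FULLTEXT CATALOG BookCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.Books (Title) KEY INDEX PK_Books;

SELECT Title, Author1, ISBN
FROM dbo.Books
WHERE CONTAINS(Title, '"harry*"');

A plain index on Title also covers prefix-only searches (WHERE Title LIKE 'harry%'), just not mid-string matches.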
I am having performance issues with a SQL query in Access. My query is accessing and joining several tables (one very large one). The tables are linked via ODBC. The client submits the query to the server, which is separated from it by several states. It appears the query is retrieving gigs of data from the table and processing the joins on the client. Is there a way to perform more of the work on the server, thereby minimizing the amount of extraneous table data moving across the network and improving performance (woefully slow, about 6 hours)?
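In Access, the usual way to force the work onto the server (a sketch with hypothetical table and column names) is a pass-through query (Query > SQL Specific > Pass-Through), whose SQL is sent to SQL Server verbatim, so the joins and filters run there and only the final rows cross the network:

-- runs entirely on SQL Server; Access only receives the result rows
SELECT b.OrderID, b.Amount, c.CustomerName
FROM BigTable b
JOIN Customers c ON c.CustomerID = b.CustomerID
WHERE b.OrderDate >= '20070101';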
Dear MS SQL Experts,
I have to get the number of rows in several tables in my MSSQL 2000 SP4 database. Among these tables is one with about 13 million entries. If I perform a "select count(*) from table" it takes about 1-2 minutes to perform that task. Since I know other databases like MySQL that take less than 1 second for the same task, I'm wondering whether I have a bug in my software or whether there are other mechanisms to get the row count for a table, or for the whole database.
Can you give me some hints?
Best regards,
Daniel Wetzler
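On SQL Server 2000, the usual fast alternative (a sketch; the figure comes from stored metadata and can drift slightly unless you run DBCC UPDATEUSAGE first) reads the row count from sysindexes instead of scanning the table:

SELECT o.name, i.rows
FROM sysobjects o
JOIN sysindexes i ON i.id = o.id
WHERE o.xtype = 'U' AND i.indid < 2

This returns a count for every user table in the database in milliseconds.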
I am using a fulltext search engine to search members of my site, and I am having a problem with paging. Basically I want the fulltext search to return both the rows it has found and the total size of the resultset; I don't want to write a second query just to get the total. Can anyone help me?
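On SQL Server 2005 or later, one way to get the total alongside each page (a sketch with hypothetical table and column names) is a windowed count in the same statement:

SELECT CT.[KEY], CT.RANK, COUNT(*) OVER () AS TotalRows
FROM CONTAINSTABLE(dbo.Members, SearchText, '"smith"') CT
ORDER BY CT.RANK DESC

Every row carries TotalRows, so the paging code can read the total from the first row without a second query.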
We are inserting into a table that includes an identity primary key column. When the table gets really large (i.e. 1.5 million records), the performance of the inserts degrades.
I noticed that when we insert into the table an exclusive lock on the table is obtained. Do inserts into tables with identities always lock the table?
Given the table size is unavoidable, does anyone have a suggestion to improve the performance?
I'm working with a table with about 60 million records. This monster is growing every minute of the day as well, by 200,000 - 300,000 records/day. It's 11 columns wide, and has one index on a datetime column. My task is to create some custom reports based on three of these columns, including the datetime one.
The problem is response time. Any query executed on this table takes forever--anywhere between 30 seconds and 4 minutes. Queries such as this one below, as simple as it is, can take a minute or more:
select count(dt_date) as Searches from SearchRecords where datediff(day,getdate(),dt_date)=0
As the table gets larger and larger, the response time is going to get worse and worse. Long story short, what are my options for getting query times down to just a few seconds on a table this big? So far the best I can come up with is to index any other appropriate columns (of which there is one for sure, maybe two).
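Part of the problem with the query above is that datediff() is applied to the column, which prevents the index on dt_date from being used. A minimal sargable rewrite of the same "today" filter (assuming dt_date is the indexed datetime column) keeps the column bare on one side of the comparison:

select count(*) as Searches
from SearchRecords
where dt_date >= dateadd(day, datediff(day, 0, getdate()), 0)
  and dt_date <  dateadd(day, datediff(day, 0, getdate()) + 1, 0)

With the filter expressed as a range, the engine can seek the datetime index instead of scanning all 60 million rows.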
Hi, I'm doing a search function for a recipe database and have the query:

SELECT K.RANK, tRecipe.sHeadline, tRecipe.sIngredients, tRecipe.sImagePath
FROM tRecipe
INNER JOIN FaktaRecipe ON tRecipe.iRecipeID = FaktaRecipe.iRecipe
INNER JOIN CONTAINSTABLE(tRecipe, *, 'ISABOUT (chick* WEIGHT(0.2))') AS K
    ON tRecipe.iRecipeID = K.[KEY]
WHERE (FaktaRecipe.iRecipeFakta = 5)
ORDER BY RANK DESC

I want to return records like 'chicken pie' etc., hence using the wildcard in chick*, BUT the wildcard doesn't work! It works fine if I use the whole word 'chicken', but of course a user won't always do that... I am using SQL Server 2000. Any ideas? I'm tearing my hair out!
Thanks, Paul
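The likely culprit: in CONTAINS/CONTAINSTABLE a prefix term must be enclosed in double quotes inside the search string; without them the * is treated as a literal character rather than a wildcard. A minimal fix to the join above:

INNER JOIN CONTAINSTABLE(tRecipe, *, 'ISABOUT ("chick*" WEIGHT(0.2))') AS K
    ON tRecipe.iRecipeID = K.[KEY]

With "chick*" quoted, rows containing 'chicken', 'chickpea', etc. should match.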
So async cursor population is supposed to create the cursor and return the cursor ID quickly, while the server works on populating the results asynchronously. For a keyset-driven cursor, SQL Server stores the keysets in tempdb, which it then uses to fetch data for cursor results. Anyway, this works fine for smaller tables, but I'm finding that for large result sets, the async cursor population is very slow and indeed seems to approximate synchronous time. The wait stat I get while it is running (supposedly asynchronously) is TRANSACTION_MUTEX.
Example:

--enable async cursor
exec dbo.sp_configure 'cursor threshold', 0;
reconfigure;

declare @cursor int, @stmt nvarchar(max), @scrollopt int, @ccopt int, @rowcount int;

--example of giant result set
set @stmt = 'select * from sys.all_objects o1, sys.all_objects o2';
[code]...
Note that using the SQL "select * from sys.all_objects o1" is much faster than "select * from sys.all_objects o1, sys.all_objects o2". However, if cursor population is async, I'd expect the time to return a cursor id to be similar between the two.
I have a distribution database that has grown over 50 GB. Distribution cleanup takes over 15 hours to run, and there are always very few rows in msrepl_Commands, usually only a few hundred. Everything is replicating fine, but I fear a storm coming. Why would distribution be so large with so few rows? Also, Ghost Cleanup has been running all week, so I'm thinking maybe the records marked for deletion aren't being deleted?
I have a table with a couple hundred billion records (SQL Server 2005). When I do a select count(*) from tblx, it takes this side of forever. Is it possible to count partitions and then add them up to make it faster?
How could I improve the performance of count(*) on this huge table? Note: if the partition idea sounds viable, what would that look like?
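On SQL Server 2005 you can skip the scan entirely and sum the per-partition row counts from the catalog (a sketch; the counts come from metadata and can be marginally stale under concurrent activity):

SELECT SUM(ps.row_count) AS total_rows
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id = OBJECT_ID('dbo.tblx')
  AND ps.index_id IN (0, 1)   -- heap or clustered index only

This is also the literal form of the "count each partition and add them up" idea.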
For demonstration, I created a fulltext index on the Employees table in the Northwind database.
The following query gives an error:
SELECT * FROM employees WHERE CONTAINS (FirstName, 'Barbe')
Replacing 'Barbe' by 'Barb' or other words it works fine.
The error message, translated from the French version of SQL Server I have installed, is: "A clause of the query contains only ignored words" ("Une clause de la requête ne contient que des mots ignorés").
The wordbreaker language for the fulltext index is French, and the error happens only with French; with English it works.
I have a task to count distinct records in a big table with roughly 30M records; performance is an important factor. The query is to be written to calculate weekly stats, and the weekly record count could be as high as 1M.
The actual result is like:

ID         Policy
350235744  Credit Cards
350235744  PCI
350235744  PCI Audit
So the final number for this particular ID is 3.
I can write the query like:
select count(distinct Incident_id) policy_name
from Reporting_DailyDlpDetail
where (year(INSERT_DETECT_TS) = 2015)
  and (month(INSERT_DETECT_TS) = 6)
  and (day(INSERT_DETECT_TS) between 2 and 9)

This returns 526254 and takes 11 seconds to complete.
or a query like:

select distinct Incident_id, policy_name
from Reporting_DailyDlpDetail
where (year(INSERT_DETECT_TS) = 2015)
  and (month(INSERT_DETECT_TS) = 6)
  and (day(INSERT_DETECT_TS) between 2 and 9)

This returns 749687 and takes roughly 1 minute to complete.
The results differ between the two queries; I believe the latter gives the correct number. How can I count the distinct values based on a combination of columns? And considering the size of the data, what is the best and most efficient way to run the stats calculation across over 30 different scenarios (different policies and alert types) without timing out?
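A sketch of the combination count, wrapping the distinct pairs in a derived table, with the date filter rewritten as a plain range so an index on INSERT_DETECT_TS can be used (the year()/month()/day() wrappers in the originals force a scan):

select count(*) as combo_count
from (select distinct Incident_id, policy_name
      from Reporting_DailyDlpDetail
      where INSERT_DETECT_TS >= '20150602'
        and INSERT_DETECT_TS <  '20150610') d

For the 30-odd scenarios, grouping once (GROUP BY policy and alert type) over the same derived table is usually cheaper than 30 separate passes.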
All, I have a problem regarding SQL Server 2000 SP3. I have an SP that calls another SP and inserts about 30,000 records at a time. In the development environment (MS Windows 2003 Enterprise, 256 MB RAM, 3.0 GHz Intel processor) it takes about 6 seconds to run this SP. But with the same software on a 2.6 GHz Intel with 1 GB RAM, it runs very slowly; it takes more than 135 seconds. I have read a lot of articles about expanding the SQL memory and giving it a higher process priority, but with no use. I don't know where the problem is; do you have any idea what the problem is? Thank you in advance, MAG
I am able to run a query which runs fast in Query Analyzer but slow in the application. It takes about 16 ms in QA but 1000 ms in the application. What I wanted to know is: why would the query take a long time in the application when it runs fast on SQL Server? How should we try debugging it? Ajay
Can anyone give me some guidelines as to when to choose JOINs over returning multiple resultsets in a stored procedure? For example, I have two tables, Orders and OrderDetails, which are linked by a primary key field. There can be orders without a corresponding record in OrderDetails.

1.) I can return all orders and their details using a stored procedure that has:

SELECT o.order_id as OrderId, o.customername, od.order_id, od.orderdate
FROM orders AS o
LEFT OUTER JOIN orderdetails AS od ON (o.order_id = od.order_id)

2.) I can do the same by returning two resultsets in a different stored procedure:

SELECT order_id, customername FROM orders
SELECT order_id, orderdate FROM orderdetails

I think the client processing time for the second option will be slightly less, because the resultset I need to filter will only be as big as the OrderDetails table (it will not include records for orders that have no details). Regardless, I think this would only make for a small performance gain, so if option 1 is better in database performance, I would probably go with that. I assume the method to choose also depends on table size and the number of JOINs. Any guidance would be appreciated. Thanks, Al
I have an openquery query like this:

Select * from openquery([db2], 'Select * from tableA')

To only return and process records for a given date range, I changed it to something like this:
Select * from openquery([db2],'Select * from tableA') where datefield > '06/06/2001'
While this works fine, my question is whether it copies all the records from the DB2 server to the SQL server before filtering them. I think it does. The DB2 table will eventually have over 1,000,000 records, and the SQL server will only use records for a given day/date range.
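If the filtering does happen locally, the usual workaround is to push the predicate into the pass-through string so DB2 only ships the matching rows (a sketch; the exact date-literal format must be one DB2 accepts, and the doubled single quotes are T-SQL escaping):

Select * from openquery([db2], 'Select * from tableA where datefield > ''2001-06-06''')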
When using Named Pipes the issue is no longer there.
On massive batch inserts we sometimes get a long pause at the end of one insert, before beginning the next one. Example:
1000 inserts into the same table, then repeat. This will work fine for 3 or 4 iterations, then pause during the 5th iteration for up to 40 seconds, and then simply continue.
When this exact same procedure is done using Named Pipes as the connection method this never happens.
While this is happening, neither the server nor the workstation is doing anything: 0% CPU, 0% network; it just sits there.
All this using the SQL Native Client 2005 and ADO.
I created two tables, one with a partitioned structure and one with a non-partitioned structure.
Filegroups: Jan, Feb, ..., Dec
Partition function boundaries: '20060101', '20060201', ..., '20061201'
I am using a RIGHT range in the partition function. Then I defined a partition scheme on the partition function.
I have more than 700,000 rows in my database. I checked the filegroups and counted the rows. It works fine.
But when I check the estimated execution plan time for a query, it is the same for both the partitioned table and the non-partitioned table.
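Partition elimination only happens when the query actually filters on the partitioning column, so identical plans are expected for unfiltered queries. A quick check (with hypothetical names: pfMonth is the partition function, OrderDate the partitioning column):

-- rows per partition: confirms data is spread as intended
SELECT $PARTITION.pfMonth(OrderDate) AS partition_no, COUNT(*) AS rows_in_partition
FROM dbo.PartitionedTable
GROUP BY $PARTITION.pfMonth(OrderDate);

-- a range on the partitioning column should touch only the relevant partitions
SELECT COUNT(*)
FROM dbo.PartitionedTable
WHERE OrderDate >= '20060301' AND OrderDate < '20060401';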
The following query returns a value of 0 for the unit percent when I divide a count by a subquery count. Is there a way to get the percent using a subquery? Another section of the query using sum() works.
Here is a test code snippet:
--Test Count/Count subquery
declare @Date datetime
set @Date = '8/15/2007'

select
    -- count returns unit data
    Count(substring(m.PTNumber,3,3)) as PTCnt,
    -- count returns total for all units
    (select Count(substring(m1.PTNumber,3,3))
     from tblVGD1_Master m1
     left join tblVGD1_ClassIII v1 on m1.SlotNum_ID = v1.SlotNum_ID
     where left(m1.PTNumber,2) = 'PT' and m1.Denom_ID <> 9
       and v1.Act = 1 and m1.Active = 1 and v1.MnyPlyd <> 0
       and not (v1.MnyPlyd = v1.MnyWon and v1.ActWin = 0)
       and v1.[Date] between DateAdd(dd,-90,@Date) and @Date) as TotalCnt,
    -- attempting to calculate the percent by PTCnt/TotalCnt returns 0
    (Count(substring(m.PTNumber,3,3)) /
     (select Count(substring(m1.PTNumber,3,3))
      from tblVGD1_Master m1
      left join tblVGD1_ClassIII v1 on m1.SlotNum_ID = v1.SlotNum_ID
      where left(m1.PTNumber,2) = 'PT' and m1.Denom_ID <> 9
        and v1.Act = 1 and m1.Active = 1 and v1.MnyPlyd <> 0
        and not (v1.MnyPlyd = v1.MnyWon and v1.ActWin = 0)
        and v1.[Date] between DateAdd(dd,-90,@Date) and @Date)) as AUPct
-- main select
from tblVGD1_Master m
left join tblVGD1_ClassIII v on m.SlotNum_ID = v.SlotNum_ID
where left(m.PTNumber,2) = 'PT' and m.Denom_ID <> 9
  and v.Act = 1 and m.Active = 1 and v.MnyPlyd <> 0
  and not (v.MnyPlyd = v.MnyWon and v.ActWin = 0)
  and v.[Date] between DateAdd(dd,-90,@Date) and @Date
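The 0 almost certainly comes from integer division: COUNT() returns int, and int divided by int truncates toward zero, so any ratio below 1 becomes 0. A minimal fix (same expression, one side promoted to decimal):

    (Count(substring(m.PTNumber,3,3)) * 1.0 /
     (select Count(substring(m1.PTNumber,3,3))
      from tblVGD1_Master m1
      left join tblVGD1_ClassIII v1 on m1.SlotNum_ID = v1.SlotNum_ID
      where left(m1.PTNumber,2) = 'PT' and m1.Denom_ID <> 9
        and v1.Act = 1 and m1.Active = 1 and v1.MnyPlyd <> 0
        and not (v1.MnyPlyd = v1.MnyWon and v1.ActWin = 0)
        and v1.[Date] between DateAdd(dd,-90,@Date) and @Date)) as AUPct

Multiplying by 1.0 (or CASTing either count to decimal) makes the division decimal, so AUPct comes back as a fraction instead of 0.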
I am new to SQL Server 2000. I am eager to learn which factors/parameters are key for obtaining good retrieval performance from SQL Server 2000 (prompt response to user queries).
I recall someone telling me that a recordset with the adOpenStatic cursor type is faster than recordsets with other cursor types.
Is this true or false? Are there really some key parameters for performance tuning?
I have an update query that as of right now has been running for 22 hours. It runs on two tables: a lookup table that was just created within the batch, and a denormalised table used for data analysis.
The query that's causing the problem is:
--//////////////////////////////////// this is the one that's running
Print 'Update Provider 04-05 EmAdmsCount12mths : ' + CAST(GETDATE() AS varchar)
GO
Update Provider_APC_2004_05
set EmAdmsCount12mths =
    (Select COUNT(*) - 1
     from Combined_Admissions
     where ((Combined_Admissions.NHSNumber = Provider_APC_2004_05.NHSNumber)
         or (Combined_Admissions.PASNUMBER = Provider_APC_2004_05.PDDISTNO))
       and (Combined_Admissions.AdmDate BETWEEN DateAdd(yyyy,-1,Provider_APC_2004_05.AdmDate)
                                            AND Provider_APC_2004_05.AdmDate)
       and Combined_Admissions.AdmMethod like 'Emergency%')
    -- and CA.NHSorPrivate = 'NHS'
FROM Provider_APC_2004_05, Combined_Admissions
Any help in improving the speed would be most welcome, as there are 3 more of these updates to run right after this one, and the analysis tables are almost double the size of this one.
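One thing worth checking in the statement above (a guess from the shape of the query, not a tested fix): the trailing FROM clause lists Combined_Admissions with no join predicate, which introduces a cross join even though the count is already computed by the correlated subquery. A sketch of the same update without it:

Update Provider_APC_2004_05
set EmAdmsCount12mths =
    (Select COUNT(*) - 1
     from Combined_Admissions
     where (Combined_Admissions.NHSNumber = Provider_APC_2004_05.NHSNumber
         or Combined_Admissions.PASNUMBER = Provider_APC_2004_05.PDDISTNO)
       and Combined_Admissions.AdmDate between DateAdd(yyyy,-1,Provider_APC_2004_05.AdmDate)
                                           and Provider_APC_2004_05.AdmDate
       and Combined_Admissions.AdmMethod like 'Emergency%')

Indexes on Combined_Admissions(NHSNumber, AdmDate) and (PASNUMBER, AdmDate) would also help; the OR across two different columns often forces a scan per row.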
I have a VB6 application using classic ADO (MDAC 2.8) for connecting to a MS SQL 2000 server. The application uses a lot of server-side cursors. Now I want to switch to MS SQL 2005 server, but I have noticed a very serious performance problem. SQL Profiler results of executing the following commands:
declare @p1 int
set @p1 = 180150131
declare @p3 int
set @p3 = 1
declare @p4 int
set @p4 = 16388
declare @p5 int
set @p5 = 22221
exec sp_cursoropen @p1 output, N'Select ... from ... where ... order by ...', @p3 output, @p4 output, @p5 output
select @p1, @p3, @p4, @p5
on sql server 2000:
CPU: 234 Reads: 82515 Writes: 136 Duration: 296
and on sql server 2005:
CPU: 4703 Reads: 678751 Writes: 1 Duration: 4867
Both databases are identical; the servers run on the same machine (Pentium 2.8 GHz, 2 GB RAM) with only one client connected. On forums I've read that Microsoft doesn't recommend using server-side cursors on SQL 2005, but is there any way to increase performance to some acceptable level?
However, as you can see, the original select query is run twice and joined together. What I was hoping for is for this to be done in the original query, without the need to duplicate it.
This sounds like a pretty easy one. I have a SQL 2000 server with two 3.4 GHz CPUs and 1 GB of RAM, with one database on it. I go into Query Analyzer on another machine and run a simple query like SELECT * FROM USERS, which should return 15,000 rows.
It takes 30 (thirty) seconds to finish this query. OMG
Where do I start to decipher why on Earth this takes more than .01 seconds?
Hi, I have a query which has suddenly started responding slowly. Can anyone tell me what the possibilities could be? I tried updating stats (I am on SQL 7.0; though it's done automatically, I did it manually again). I used UNION ALL in place of UNION, but it had no big effect. Any other thoughts? Thanks!
I have a query that takes minutes to execute, even though only about 300,000 records are being processed. I would appreciate any help with optimizing the query. I have two tables: User and Usage. The User table has two fields, User_Id and Date_Created, and a non-clustered index on User_Id. The Usage table also has two fields, User_Id and Date_Used, and non-clustered indexes on both fields. The User table is populated when the user registers. The Usage table is populated every time the user opens a document.
Here is what I need to do: for each day in a variable time frame, get the number of users from the Usage table who opened a document at least once in the preceding 30 days, after they had registered. For example, if the time frame is 8/01/00 - 8/31/00, I need to get the following data:
date     returns
----     -------
8/01/00  10   (10 users returned to the document between 7/2/00 and 8/1/00)
8/02/00  15   (15 users returned between 7/3/00 and 8/02/00)
...
8/31/00  20   (20 users returned between 8/1/00 and 8/31/00)
Here is my query:
SELECT [date],
       (SELECT count(distinct user_id)
        FROM usage u
        JOIN [user] ON u.[user_id] = [user].[user_id]
        WHERE u.[date] BETWEEN usage.[date]-30 AND usage.[date]
          AND u.[date] > [user].date_created
        GROUP BY usage.[date]) AS returns
FROM usage
WHERE [date] BETWEEN @date1 AND @date2
This query works fine, but it is too slow. We use MS SQL Server 7.0.
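One possible rewrite (a sketch under the stated schema; verify it returns the same counts before trusting it) computes each day's distinct-user count once via a join, instead of re-running the correlated subquery for every usage row:

SELECT d.[date],
       COUNT(DISTINCT u.[user_id]) AS returns
FROM (SELECT DISTINCT [date]
      FROM usage
      WHERE [date] BETWEEN @date1 AND @date2) d
JOIN usage u
  ON u.[date] BETWEEN d.[date] - 30 AND d.[date]
JOIN [user] usr
  ON usr.[user_id] = u.[user_id]
 AND u.[date] > usr.date_created
GROUP BY d.[date]
ORDER BY d.[date]

A composite index on usage([date], [user_id]) should let each day's 30-day window be answered by an index seek.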