In Terms Of Performance: Select Top .. OR Set Rowcount ?
Apr 29, 2008
I have a table that contains 10 millions records. The following 2 statements, which one provide better performance? Frankly, i have no idea how to compare the execution plan...
hi, how to retrieve row count from an insert statement in dynamic sql?
CREATE PROCEDURE [dbo].[procTest] AS BEGIN SET NOCOUNT ON;
DECLARE @nsql NVARCHAR(MAX) DECLARE @i INT SET @nsql = 'select * into tbl_user_test from tbl_user; select @i = @@rowcount;' exec sp_executesql @nsql; END
I have a vb.net application that executes a simple flat file to sql table dtsx package. I want to capture the rowcount to display back to the user to verify the number of rows that were inserted or updated to the table. I have a Row Count component placed between the flat file source(without errors) and the destination component. I have assigned a variable named RecordCount to the Row Count component. So far so good I hope : )
Now, I also use a variable to "feed" the package the flat file source. This works fine, but I cannot figure out how to retrieve the row count information and how to assign that to the variable RecordCount.
Also, if anyone has any insight on the way to work with the OnProgress method in SSIS I would appreciate that as well. In SQL 2000 using DTS I create a "PackageEventsSink" that I had found online, and it worked great for monitoring the progress of the DTS. Can't seem to figure out how to get it to work in SSIS.
I've created a Stored Procedure which performs a Select against my table, and displays the rows returned via these stmts - @RowCount int Output SELECT @rowcount = @@RowCount This Works fine when Executed from SQL Server, but when trying to invoke the SP from my ASP page it complains that the SP expects parameter '@RowCount' which was not supplied. I don't need to supply it when invoking the SP directly, why do I need to supply it from ASP? I tried defining it as NULL within my SP, but can't seem to get it to accept both the NULL & Output parms. And while I'm at it, how do I get my ASP page to display this @RowCount value?
In sql I perform the following SELECT * FROM xlsdci x LEFT OUTER JOIN fffenics f ON f.[derived deal code] = x.[manual dcd id]
which gives me a row count of 2709 rows
In SSIS I have a merge join component (left outer) left input = xlsdci with a sort order of 1 ASC on [manual dcd id] (OLE DB source component) right input = fffenics with a sort order of 1 ASC on [derived deal code] (OLE DB source component)
which when run in the IDE gives me a rowcount of only 2594 rows
Why is this so?
Also if I change the join to INNER in the merge join, the number of rows drops dramatically to only 802. Fair enough, I hear you cry, maybe there are IDs in the 'xlsdci' table that are not in the 'fffenics' table. Ok. But the following SQL reveals that there are only 14 rows(IDs) in 'xlsdci' that are not in 'fffenics'
SELECT * FROM xlsdci WHERE [manual dcd id] NOT IN (SELECT [derived deal code] FROM dbo.fffenics)
I have a problem where my users complain that a select statement takes too long, at 90 seconds, to read 120 records out of a database. The select statement reads from 9 tables three of which contain 1000000 records, the others contain between 100 and 250000 records. I have checked that each column in the joins are indexed - they are (but some of them are clustered indexes, not unclustered). I have run the SQL Profiler trace from the run of the query through the "Database Engine Tuning Advisor". That just suggested two statistics items which I added (no benefit) and two indexes for tables that are not involved at all in the query (I didn't add these). I also ran the query through the Query window in SSMS with "Include Actual Execution Plan" enabled. This showed that all the execution time was being taken up by searches of the clustered indexes. I have tried running the select with just three tables involved, and it completes fast. I added a fourth and it took 7 seconds. However there was no WHERE clause for the fourth table, so I got a cartesian product which might have explained the problem. So my question is: Is it normal for such a type of read query to take 90 seconds to complete? Is there anything I could do to speed it up. Any other thoughts? Thanks
There are several terms using ms Server that I don't know and cannot find in my books. Does MS provide that, besides BOL where it is difficult to find good explanations or even find definitions?
thx,
Kat
ps. would be a nice feature if they don't have it currently.
Hopefully im asking this in the right place, sorry if its not, maybe you could point me in the right direction
I have been informed that use of MDF Files (SQL Server Express) Databases on the net was restricted as this was classed as multi connections and therefore was outside the free license agreement.
I am looking at commercially developing and marketing a web based system for with a relatively small database footprint (well under 1gb) with ASP.NET 2.0 and like the look of SQL Server Express.
Could anyone clear up whether or not this is allowed under the SQL Server Express terms of use, or point me in the direction of somewhere i can find information.
basically, what I am trying to achieve to 2 types of search functions...
Search for All terms (easy and complete) and search for Any Terms...
the way I have gone about this so far is to in my asp.net app, split the search string by spaces, and then search for each word, and merging the resulting dataset into the main return dataset.
this, however has a few problems. the result dataset will contain duplicate values, and i am running queries in a loop.
What i am looking for is a one-stop-shop stored procedue that will split the search string, loop through each word, and add the results to a return table, ONLY if it does not exist already within the return table.
Can anyone point me in the right direction... basically with the splitting of the string and the looping through the words...the rest i think i can handle...
or any other hints/tips/tricks would also be helpful.
Hi, I need to categorize a lot of html or text files according to a list of terms and I wonder if terms lookup is adequate for this. The problem is that terms lookup can only take an Oledb source as input. My files can be up to 80 Kb big and aren't columns structured.
Should I import my files in a table ? But if so, how can I import a column with more than 8000 characters ?
First and foremost, thanks for reading and responding!Does it matter how big a stored procedure is if you do things in the stored procedure such as:declare the parametersIF @Parm_Select = '<ALL>'do a select IF @Parm_Select <> '<ALL> and @Parm_Report = '1'do a selectIF @Parm_Select <> '<ALL> and @Parm_Report = '2'do select This goes on and on and on and I have written a couple of stored procedures that are about 1500 lines of code based upon parameters passed I do not create any tables - they are just all select statements based upon the parameters passed.I thought I was doing the right thing cause I did not want to have to write a procedure that called a procedure, (I read this and got confused on the return prarmeters cause there is a lot of data being returned from the select ----- I don't think I said that correctly! . I am just learning this SQL stuff and I it is cool and I am excited - but I don't want to develop any bad habits in the beginning - and I try to look these things up on the www - but I just don't get explicit answers from reading all of this stuff. Thank to all in advance!
Is it possible for the terms lookup function to manage the differences between US and english spelling ? For example if I search for the terms "color" and "categorization", I'd would like that the terms lookup also count the "colour" and "categorisation" occurences in the text.
I have a table which when i execute a select statement the execution lucks performance. The table currently has 1000 rows.
Table: CREATE TABLE [dbo].[Listings]( [ListingID] [int] NOT NULL, [UserID] [int] NOT NULL, [NodeID] [int] NOT NULL, [Title] [varchar](100) NOT NULL, [Description] [varchar](500) NOT NULL, [Keywords] [varchar](200) NOT NULL, [URL] [varchar](150) NOT NULL, [DateSubmited] [datetime] NOT NULL ) ON [PRIMARY]
Even a simple sql statement without any where clause will take 3 seconds to execute.
select LISTINGID, TITLE, DESCRIPTION from listings
I have a table of products which contains a parent-child hierarchy (a self-relationship, creating a built-in recursive structure using a foreign key). The point is: I must create a query that shows only products which are not parents. I can achieve this using a "not in" or a "full join".
Can anybody tell me which of the 2 below is the best option, and why?
Code Snippet SELECT ProductID FROM CRP.Product WHERE ProductID NOT IN (SELECT ProductFatherID FROM CRP.Product WHERE ProductFatherID is not NULL)
Execution plan:
Select (cost 0%) <--- Merge join (25%) <--- Clustered index scan (54%) ^--- Stream aggregate (1%) <--- Index seek (19%)
Code SnippetSELECT PC.ProductID FROM -- Product Child CRP.Product PC FULL JOIN -- Product Parent (SELECT ProductFatherID ProductID FROM CRP.Product WHERE ProductFatherID is not NULL) PP ON (PC.ProductID = PP.ProductID) WHERE PP.ProductID is NULL
I have only been working with sql server for sever months so I would consider myself a novice. I am trying to return 3 different datasets from the same query information. for example a person will do a search for articles on "flowers red annuals" I first want to return a datase with item details. Name of book, copyright, author etc...
I then want to populate 2 drop down lists to help narrow down the search
1. Article type Newspaper(count) Magazine(count) Hardback Book(count) Softback book(count) Etc...
I am currently running 3 separate stored procedures against the master file to return this data. 1. select ... for the detail 2. Select ...count() group by for the other 2 Is there a better way to get this information without having to run the query 3 times? Is there a way to run the (select..group by) against initial detail information to populate the 2 drop down controls. Is there a way to return more that one dataset of information from one stroed procedure? Because the data source has millions of records I would assume that this is inefficient or does it matter? Thanks in advance for any advice.
I want to determine the performance impact caused by the extensive use of the 'select into #' statement in a production environment. The current situation is that our reports team extensively uses the 'select into #' statement to build smaller subsets of data. These subsets are then used as the basis to create summary style reports and exports. All this is accomplished via the use of SQL pass-through.
After these reports/exports are completed and tested, they are then released to our operations department and the users. The reports/exports then can be run against the production server at the discretion of the user, provided they have the appropriate permissions. These reports/exports target the live data on the primary production server that already has been designated for the use of the application software.
Now I know that reporting against a transactional-based server, where the users run the application, is not a very good idea. (Inherited) I am currently migrating all reports/exports to a reporting server. Although it will still be transaction-based, the reports/exports will be isolated from user activity. Eventually we will be moving toward a warehouse scenario.
I also know that the extensive use of the 'select into #' statement is not a coding practice for use in production. I provided several alternatives to this practice
1) insert..select 2) insert..execute - from stored procedure
I have read that in the in sql 6.5 that this may cause severe performance and locking behaviors in system db's and tempdb. However, in the following document on the Microsoft Knowledge Base, it indicates that SQL 7.0 may have corrected this issue.
Q153441 - FIX SELECT INTO Locking Behavior.htm
Despite the indication of being corrected, I am still not convinced. I am frequently seeing drastic performance hits, especially when several of the reports are running. (which is very common) My concern is that moving these reports/exports to a reporting server may save the users; I believe that it may be migrating the problem to another location. I will be working with the developers to optimize their code and will investigate index issues.
** To make a long story short. I would like someone who has experience with this provide me with the top 5+ reasons not to use the 'select into #' methodology in a production environment. Further, if anyone has any documentation, I would surely like the info.
We use an ASP/MS SQL 2000 system to send out our mass e-mailing to about 3,500 subscribers (and the list is growing). There are some duplicate entries in the DB and I was thinking about using this code
SELECT DISTINCT email FROM Subscribers
to remove the duplicates (at least until we can get around to cleaning up the data and then putting up new subscriber form to prevent duplicate entries).
I was wondering, though... Will this have a significant impact on our performance? I mean, that's a lot of e-mail addresses to process and I don't want to bog our system down unnecessarily.
I am running SQL 2000 SP4. I have a stored procedure which calls a function in an update statement. Previously, this SP which is pretty intensive and used to build a report would run in appox 10 minutes. Now it is taking 3 hours to execute. I have tracked the performance issue to the function. The function is taking 3 to 4 seconds to execute for each row and it is updating 2,600 rows.
I am including the select statement causing problems.
SELECT TOP 1
@m_ECodeMsg = tev.BillMsg
FROM tblUsageHistoryDetail tuhd
INNER JOIN tblMeterReadEcode tmre
ON tmre.MeterReadID = tuhd.CurrentMeterREadID
INNER JOIN tblUsageHistory tuh
ON tuh.UsageHistoryID = tuhd.UsageHistoryID
INNER JOIN tblTran tt
ON tt.TranID = tuh.TranID AND tt.ServiceLocationID = tuh.ServiceLocationID
INNER JOIN tblServiceLocation tsl
ON tsl.ServiceLocationID = tt.ServiceLocationID
INNER JOIN tblRateUsage tru ON tru.RateID = tsl.RateID
INNER JOIN tblECodeValue tev
ON tev.ECodeValueID = tmre.ECodeValueID AND tev.ECodeTypeID = tmre.ECodeTypeID
WHERE tt.StatementID = @i_StatementID AND tru.AltServiceID = tsl.ServiceID AND tt.TranTypeID = 2 AND tev.ECodeTypeID = 6
The function has been in use for over a year and SP4 has been installed on this server for serveral months now. If I run this on an old box with SP3a it runs fine. I have rewrote the statement without the "TOP 1" and execution of the function is less than 1 sec.
Any ideas as to why my performance has taken such a hit?
Hello -- thank you for taking the time to read this.
I have a very large table that is used both for archives and new information. To get the current information, the table is queried by many different users at various polling periods. The SELECT required includes about fifteen JOINS, and only returns about 200 rows at any given time.
So I got to thinking if it might be faster to periodically run the big query as a SELECT INTO into a smaller table and letting the polling clients query the smaller table with SELECT *. Periodically, the smaller table would be DROPPED and refereshed with another SELECT INTO.
Trouble is, the data would have to be updated once every 30 seconds, and there are inbound polls coming at the rate of about 200 per minute. It got me to thinking what might happen if a client attemtped to query the smaller table when it was in the process of being dropped and refilled.
So my question is three-part:
1) assuming a larger table of about 500,000 records and only 500 pertinent at any given time, is there any real potential of performance enhancement by switching to a SELECT INTO table?
2) if so, is there a chance of a client failing a query if the inbound query somehow collides with the DROP/SELECT INTO procedure?
3) if so, is there any way to prevent it or a better way of doing this?
Thanks again for reading, and in advance for any help you can provide. I apologize if I sound like a dummy - it's hard to fake intelligence!
Hi All,I am getting slower performance of select statements in MS SQL. I amfinding select statements in MS SQL are even slower than MS ACCESS. Isthere any way to improve the performance of select statements in MSSQL by tuning the database of anything else??Thanks in advance!Hoque
I've just inherited a system and have some concerns about the speed ofconnections to a remote server (SQL2000). If I do a simple selectstatement on the table below, it takes 14 minutes to retrive 6 millionrows across a 2Mb line. Obviously it's a reasonable amount of data toretrieve, but I would have thought this would be quicker if I'm honest.Run locally, this is 50 seconds.My thoughts are that there may be some issues with our connection (weget general network errors sporadically, which are being looked at),but wanted some thoughts if the performance is acceptable for what itis doing with what is available. I don't think there is a SQL issue,but want to check if this sounds about right.It's early days, so I'm after a general impression of the speed ofretrieval for the amount of data on the available bandwidth. Assuming abest performance scenario, what is the minimum time it should take as abest guess ?ThanksRyanCREATE TABLE [FIELD_VALUES] ([DEALER_DATA_ID] [int] NOT NULL ,[FIELD_CODE] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOTNULL ,[FIELD_VALUE] [numeric](15, 5) NULL ,[CHANGED_TYPE] [int] NULL ,CONSTRAINT [PK_FIELD_VALUES] PRIMARY KEY CLUSTERED([DEALER_DATA_ID],[FIELD_CODE]) WITH FILLFACTOR = 90 ON [PRIMARY]) ON [PRIMARY]GO
We are now working with SQL2000sp4, planning migration to SQL2005 in few months though.
I've faced performance issues with large tables (200-500 mln rows, 50-100Gb of data+indexes)
New data are uploaded into tables once a day, around 1mln rows. Thats the only time of inserting data, during daytime tables are used for SELECTs only.
The problem that daily INSERTs are taking too much time now, because of rebuilding few indexes for the table.
I noticed that partitioning solution looks like solving this problem well. So i splitted master data table into 4 tables:
old master table: CREATE TABLE [dbo].[DTB] ( [report_date] [smalldatetime] NOT NULL , [param] [char] (10) COLLATE Cyrillic_General_CI_AS NOT NULL , [param_value1] [decimal](18, 2) NULL , [param_value2] [decimal](18, 2) NULL , [param_value3] [smallint] NULL ) CREATE INDEX [IX_DTB_DT_ACC] ON [dbo].[DTB]([report_date], [param])
new partition1: CREATE TABLE [dbo].[be_data_DTB_part_2007_q1] ( [report_date] [smalldatetime] NOT NULL , [param] [char] (10) COLLATE Cyrillic_General_CI_AS NOT NULL , [param_value1] [decimal](18, 2) NULL , [param_value2] [decimal](18, 2) NULL , [param_value3] [smallint] NULL , CONSTRAINT [CK_be_data_DTB_part_2007_q1_report_date] CHECK ([report_date] >= '2007-Jan-01' and [report_date] <= '2007-Mar-31') ) CREATE CLUSTERED INDEX [idc_be_data_DTB_part_2007_q1_report_date_param] ON [dbo].[be_data_DTB_part_2007_q1]([report_date], [param])
Similar are definitons for other partitions - q2, q3 and q4.
And here is partitioned view itself: create view dbo.data_DTB as select * from dbo.be_data_DTB_part_2007_q1 union all select * from dbo.be_data_DTB_part_2007_q2 union all select * from dbo.be_data_DTB_part_2007_q3 union all select * from dbo.be_data_DTB_part_2007_q4
I want users to access data SELECTing from view data_DTB, while I perform daily inserts right into be_data_DTB_part_2007_q4.
In general, this solution works well. For example:
Code Block set statistics profile on go select * from data_DTB where report_date = '2007-Apr-16' go set statistics profile off go
1290674 1 SELECT * FROM [data_DTB] WHERE [report_date]=@1 1290674 1 |--Concatenation 0 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='Mar 31 2007 12:00AM' AND Convert([@1])>='Jan 1 2007 12:00AM'))) 0 0 | |--Clustered Index Seek(OBJECT:([MYDB].[dbo].[be_data_DTB_part_2007_q1].[idc_be_data_DTB_part_2007_q1_report_date_param]), SEEK:([be_data_DTB_part_2007_q1].[report_date]=Convert([@1])) ORDERED FORWARD) 1290674 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='Jun 30 2007 12:00AM' AND Convert([@1])>='Apr 1 2007 12:00AM'))) 1290674 1 | |--Clustered Index Seek(OBJECT:([MYDB].[dbo].[be_data_DTB_part_2007_q2].[idc_be_data_DTB_part_2007_q2_report_date_param]), SEEK:([be_data_DTB_part_2007_q2].[report_date]=Convert([@1])) ORDERED FORWARD) 0 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='Sep 30 2007 12:00AM' AND Convert([@1])>='Jul 1 2007 12:00AM'))) 0 0 | |--Clustered Index Seek(OBJECT:([MYDB].[dbo].[be_data_DTB_part_2007_q3].[idc_be_data_DTB_part_2007_q3_report_date_param]), SEEK:([be_data_DTB_part_2007_q3].[report_date]=Convert([@1])) ORDERED FORWARD) 0 1 |--Filter(WHERE:(STARTUP EXPR(Convert([@1])<='Dec 31 2007 12:00AM' AND Convert([@1])>='Oct 1 2007 12:00AM'))) 0 0 |--Clustered Index Seek(OBJECT:([MYDB].[dbo].[be_data_DTB_part_2007_q4].[idc_be_data_DTB_part_2007_q4_report_date_param]), SEEK:([be_data_DTB_part_2007_q4].[report_date]=Convert([@1])) ORDERED FORWARD)
As far as i see it checks filter parameter fitting CHECK constraint for each partition. Then it peforms clustered index seek for partition actually containing data and avoids using 3 other partitions. Thats great! This example just illustraits that partitioning actually works for me.
Unfortunately, there is another query with just awful performance on partitions comparing to single table. Lets try to get few rows entered last day:
Code Block set showplan_text on go select top 10 * from DTB order by report_date desc go set showplan_text off go
As one can see that€™s just awful . When I make graphical execution plan with Ctrl+L it says costs for Sort operations are thousands. I didn€™t run this query to check statistics profile, because on our server it will run for hours.
I found a topic regarding this problem: http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=1270479&SiteID=17
Mostafa Elhemali describes exactly my problem in the last post. I also though that getting top 10 * from partitioned view shouldn€™t be a problem €“ it€™s quite obvious just to grab top 10 from each partition and then find top 10 amongst them. Looks like it doesn€™t work this way though.
So the question is. Is there any new workarounds for this problem? Or maybe it is already solved in latest patches for SQL2005? I know that SQL2005 introduces new way of partitioning tables, maybe the problem will go if using SQL2005 partitioned tables instead of oldstyle partitioned views?
Thank you.
p.s. Upon reviewing my post i noticed that issued ORDER BY report_date DESC against unpartitioned table, and ORDER BY report_date against partitioned view. Well, specifying ORDER BY report_date DESC for partitioned view gives similar results, except for few ASCs are replaced with DESCs.
the cursor at the bottom  iterates only to print the number of rows.The problem is in the select. This takes 30 seconds to iterate through 1242 records.But if I add a TOP 1000000 or whatever number to the select, the same iteration takes less than a 1 second.I've tested each query without cursor, and  both have the same cost and performance. (Not exactly the same plan)Note that I got the same performance improvement declaring the cursor as STATIC.Why the top is affecting the cursor iteration so much?
SQL 2000 Connection String:user id=MyUserName;password=MyPassword;initial catalog=MyDB;server=MyServer;Connect Timeout=30 This SELECT statement returns its 10 results nearly instantly:SELECT * FROM MyTableDitto from above, but completes in 30-40 seconds:SELECT * FROM [dbo].[MyTable]Ditto from above, but completes nearly instantly: SELECT TOP 1000 * FROM [dbo].[MyTable] Obviously I have stopped using the [dbo] syntax in my SqlCommand's (SELECT's and EXECUTE's) but still would like to know why this is.vr, Rich
For some reasons I need to access the same field of the same table twice in a query, and each give out a diferrent valueLike this:"SELECT myTable.id, myTable1.id as subID FROM myTable INNER JOIN ... INNER JOIN myTable as myTable1 ..."The question is, when I write it as myTable as myTable1 will it affect the query performance if myTable is a large table? will it create another so big copy of myTable? or I should create a view like "CREATE VIEW myTable1 AS SELECT id FROM myTable" to reduce the side of myTable1?Thank you.
I have the following queries that run on a view called EntrySummary:
1)
exec sp_executesql N'SELECT COUNT (1) FROM [dbo].[EntrySummary] WHERE [EntrySummary].[SubmissionStatusID] = @SubmissionStatusID0 AND [EntrySummary].[CreatedBy] = @CreatedBy1',N'@SubmissionStatusID0 int,@CreatedBy1 nvarchar(20)',@SubmissionStatusID0=4,@CreatedBy1=N'domainaperson'
2) exec sp_executesql N'SELECT COUNT (1) FROM [dbo].[EntrySummary] WHERE [EntrySummary].[CreatedBy] = @CreatedBy1 AND [EntrySummary].[SubmissionStatusID] = @SubmissionStatusID0',N'@SubmissionStatusID0 int,@CreatedBy1 nvarchar(20)',@SubmissionStatusID0=4,@CreatedBy1=N'domainaperson'
(The only difference between the two queries being the order of the where clauses)
Both return the correct answer (4144), but Query 2 takes 10-15 seconds whereas Query 1 takes < second. If the same query is resubmitted several times, this doesn't affect the times.
(The vast majority of the records have a status of 4, but only about 3% will be created by the person).
Replacing Count(1) with count(*) makes both queries return quickly.
Is sp_executesql creating poor execution plans ? Can anyone explain this behaviour?