DB Engine :: Huge Difference Between Estimated And Actual Rows
Aug 21, 2015
There is a stored procedure that uses a linked server. As we will be migrating to the Amazon cloud, our architect instructed us not to replace the linked server with OPENQUERY.
I have found an execution plan with a significant difference between the actual and estimated number of rows (roughly actual/2 = estimated) in a non-clustered index seek. Statistics are up to date.
I have a stored procedure that executes with fewer than 1,000 reads one time (with a specified set of parameters), then with a different set of parameters executes with close to 500,000 reads (according to Profiler). Comparing the execution plans, they are the same except for the actual and estimated number of rows. When the proc runs with parameters that produce fewer than 1,000 reads, the actual and estimated number of rows both equal 1. When the proc runs with parameters that produce reads near 500,000, the actual rows are approximately 85,000 and the estimated rows equal 1.

Then I run:

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

If I then reverse the order of execution, running the procedure that initially executed with close to 500,000 reads first, the reads drop to fewer than 2,000. The execution plan shows the actual number of rows equal to 1 and the estimated rows equal to 2.27. When I then run the procedure that initially executed with fewer than 1,000 reads, it continues to run at fewer than 1,000 reads, with the actual number of rows equal to 1 and the estimated rows equal to 2.27. When run in this order, there is consistency between the actual and estimated number of rows, and the reads for both executions with differing parameters are within reason.

Do I need to run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE on production and then ensure that the procedure that ran close to 500,000 reads is run first to get the proper plan, as well as using a KEEP PLAN option? Or what other options might you recommend? I am running SQL 2000 SP4.
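What is described here is the classic parameter-sniffing pattern: whichever parameter set compiles first leaves its plan in cache for everyone else. Rather than flushing the caches on production, one option that already exists on SQL 2000 is to opt this one procedure out of plan reuse. A minimal sketch, with a hypothetical procedure name and body:

-- Hypothetical procedure: WITH RECOMPILE trades plan reuse for a
-- parameter-appropriate plan on every call (available since SQL 2000).
CREATE PROCEDURE dbo.usp_GetOrders
    @CustomerID int
WITH RECOMPILE
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM dbo.Orders              -- hypothetical table
    WHERE CustomerID = @CustomerID;
END
GO

-- Alternatively, flag an existing procedure for recompilation without
-- altering it:
EXEC sp_recompile 'dbo.usp_GetOrders';

The price is a compile on every call, which is usually acceptable when the bad-plan executions cost 500,000 reads.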
We have SQL Server 2012 Enterprise (x64) on Windows 2012 R2. We need a reliable way to know the number of processors SQL Server is using at a given time. We already know how many total processors are available to SQL Server from sys.dm_os_sys_info.
For instance, a server has 40 processors and we want to know how many of those are being used at a given time. Since the load on the server may not be that high, we would like to know how many processors we can eliminate while leaving the load unaffected.
After watching the server's performance for a while, we predict we may only need 16, but we would like some statistics before we reduce it to that number.
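One hedged starting point is sys.dm_os_schedulers, which exposes one row per scheduler (roughly one per CPU visible to SQL Server) along with live worker and task counts, so sampling it over time shows how many schedulers are actually busy:

-- One row per visible online scheduler (~ one per usable CPU).
-- runnable_tasks_count > 0 sustained over time suggests CPU pressure;
-- consistently idle schedulers suggest spare CPU capacity.
SELECT scheduler_id,
       cpu_id,
       current_tasks_count,
       runnable_tasks_count,
       active_workers_count,
       load_factor
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

Capturing this periodically (e.g., via an Agent job inserting into a table) gives the trend data to justify dropping to 16 cores.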
I have a stored procedure (SP) that creates the data required for a report that I show on a web page. The SP does all the work and just returns a result set that I dump into an ASP.NET DataGrid. The SP takes a product area and a start and end date as parameters. Here are the basics of the SP:

1. Create a temp table to store the report results; all columns that will be needed are created at this point.
2. Select products and general product data into the temp table.
3. Create a cursor that loops through all the products in the temp table, running a more complex query for each individual product.
4. The results of that query are updated on the temp table based on the current product of the cursor.
5. A complex "totals" query is run and the results from that are inserted into the temp table as the last 3 rows.

In all we are talking about 120 rows in the temp table, with 8 columns that are mostly numbers. I originally wrote this report SP about a month ago and it worked fine, running in about 10-20 seconds depending on server traffic and the amount of data in the temp table. For the example I'm running there are the same 120 products.

Just yesterday the SP started timing out, and when I ran it manually from Query Analyzer (QA) (exec SP_NAME ...) with the same parameters it was getting in the code, it took 6 minutes to complete. I was floored. I immediately copied the SQL out of the SP, pasted it into another QA window, changed the variables to hard-coded values, and ran it. It completed in 10 seconds.

I'm really confused now. I ran Profiler on the two when I ran them again. The SQL code in QA executed again in ~10 seconds with 65,000 reads. When the SP finished some 6 minutes later, it had completed with the right results, but it needed 150,000,000 reads to do its job. How can the exact same SQL code produce such different results (time, disk reads) based on whether it's in an SP or just run from QA, while still giving the exact same output? The reports both look correct and have the same number of rows. I asked my sysadmin if he had changed anything and he said no. I've been reading about recompiles and temp table indexes and all kinds of other things that could possibly be affecting it, but have gotten nowhere. Any ideas are appreciated.
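The hard-coded QA run compiles with literal values, while the SP compiles against whatever parameters it saw first, so this also smells like parameter sniffing. One widely used workaround on older versions is to copy the parameters into local variables, so the optimizer estimates from average density rather than the sniffed values. A hedged sketch with hypothetical object names:

-- Hypothetical report procedure. Copying parameters into local variables
-- stops the optimizer from building the plan around first-seen values.
CREATE PROCEDURE dbo.usp_ProductReport
    @ProductArea int,
    @StartDate   datetime,
    @EndDate     datetime
AS
BEGIN
    DECLARE @Area int, @Start datetime, @End datetime
    SET @Area  = @ProductArea
    SET @Start = @StartDate
    SET @End   = @EndDate

    SELECT ProductID, ProductName
    FROM dbo.Products            -- hypothetical table
    WHERE ProductArea = @Area
      AND CreatedDate BETWEEN @Start AND @End
END
GO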
I created a CLR UDF that returns a large number of rows. When I run it from my VPC (XP, SQL Server Developer Edition, 1 GB memory) it takes approximately 2 minutes 30 seconds to start displaying the rows (using Management Studio). When I run the same query on our development server (Windows 2003, SQL Server Enterprise Edition, 8 GB memory, 8 processors) it takes more than 15 minutes to start displaying the results. Does anybody have an idea why this is happening?
I have encountered a problem with a specific set of tables. The same select yields slightly differing execution plans in two different environments (instances), but the slight variation seems to contain huge differences in stats. I don't know the significance of these stats. The two tables have exactly the same indexes.
This is the select statement:
SELECT 'xx' FROM DUKS.dbo.Profiler WHERE DNA_Løbenummer IN (SELECT DNA_Løbenummer FROM DUKS.dbo.Effektregister WHERE Sagsnummer = '2015-00002')
I have two queries yielding the same result that I wanted to compare for performance. I entered both queries in one Management Studio query window and executed them as one batch with the actual query plan included. Query 1 took 8.2 seconds to complete and the query plan said its cost was 21% of the batch. Query 2 took 2.3 seconds to complete and the query plan said its cost was 79% of the batch. The queries were run on my local development machine; I was the only user, and no other programs were running at the time of this test. The results are repeatable. I understand that the query with the lowest cost is not necessarily the fastest query. On the other hand, the difference is quite big: the query that has approx. 80% of the cost takes 20% of the time, and the other way around. I have two questions:
Is such a discrepancy normal? Can conclusions be drawn from the cost distribution? For instance, does the query that takes 8.2 seconds but only costs 21% scale better?
The benefit of the actual execution plan is that you can see the actual number of rows passing through each step, compared to the estimated number of rows. But what about the "cost percentages"? I believe I've read somewhere that these percentages are still just estimates and are not based on the real execution. Does anyone know, and preferably have a link to something that documents it? Thanks
Question A: I need to truncate a table; it has 21 million rows and a size of 14 GB.
1. How do I find out whether this table is referenced by a FOREIGN KEY?
2. Does it participate in an indexed view?
3. Is it being published using transactional or merge replication?
Question B: How do I safely truncate that table?
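All three checks matter because TRUNCATE TABLE fails (or is disallowed) when the table is referenced by a foreign key, participates in an indexed view, or is published for replication. A hedged sketch of the metadata queries, using a hypothetical table name dbo.BigTable:

-- 1) Foreign keys referencing the table (TRUNCATE fails if any exist):
SELECT fk.name AS referencing_foreign_key
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.BigTable');

-- 2) Indexed views that reference the table:
SELECT v.name AS indexed_view
FROM sys.dm_sql_referencing_entities('dbo.BigTable', 'OBJECT') AS re
JOIN sys.views AS v
  ON v.object_id = re.referencing_id
WHERE EXISTS (SELECT 1 FROM sys.indexes AS i
              WHERE i.object_id = v.object_id AND i.index_id >= 1);

-- 3) Replication flags (1 = published):
SELECT is_published, is_merge_published
FROM sys.tables
WHERE object_id = OBJECT_ID('dbo.BigTable');

If all three come back empty or 0, TRUNCATE TABLE dbo.BigTable is safe (and minimally logged); otherwise a batched DELETE is the usual fallback.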
I have a view in SQL Server 2005 that took 30 seconds to finish. Then I deleted 4,500 records from one table that is used in the view, and it now takes 90 seconds. I compared the actual execution plans from before and after the delete; they are almost the same, the only difference being that the actual number of rows became smaller after the delete. So I wonder why the time went up when the data went down. Looking closely at the actual execution plan, the odd thing is that each step shows only an estimated operator cost, no actual operator cost. I suspect the optimizer is reusing the same execution plan, but how can I tell which step is wrong without actual operator costs?
I want to compare two tables and log the differences in a new table with the fields (old value, new value, column name), where column name identifies the column whose value changed.
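One way to get that shape is to join the two tables on their key and unpivot each column pair with CROSS APPLY (VALUES ...), keeping only the pairs that differ. A hedged sketch (SQL 2008+) with hypothetical tables OldT and NewT keyed on Id, and a hypothetical ChangeLog target:

-- Unpivot each changed column into (Id, ColumnName, OldValue, NewValue).
INSERT INTO dbo.ChangeLog (Id, ColumnName, OldValue, NewValue)
SELECT o.Id, d.ColumnName, d.OldValue, d.NewValue
FROM dbo.OldT AS o
JOIN dbo.NewT AS n
  ON n.Id = o.Id
CROSS APPLY (VALUES
    ('ColA', CAST(o.ColA AS nvarchar(255)), CAST(n.ColA AS nvarchar(255))),
    ('ColB', CAST(o.ColB AS nvarchar(255)), CAST(n.ColB AS nvarchar(255)))
) AS d (ColumnName, OldValue, NewValue)
-- Simplification: treats NULL and '' as equal; adjust if they must differ.
WHERE ISNULL(d.OldValue, N'') <> ISNULL(d.NewValue, N'');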
I have a CTE query against a table with 32K rows that runs fine in 2008R2. I am running it in 2014 Std Ed. against the same data and it runs very slowly. Looking at the execution plan I think I see what's contributing to the slowness.
Note that the "actual number of rows" is some 351M...how is this possible?
the query:
declare @amts table (
    claim      int,
    allowed    decimal(12,2),
    copay      decimal(12,2),
    deductible decimal(12,2),
    coins      decimal(12,2)
);

with unpaid (claimID) as (
    select claimID
    from claim
    where amt + copay + disct + mm + ded = 0
)
insert @amts
select lineID,
       sum(rc),
       sum(copay),
       sum(deduct),
       case when sum(mm) > 0 and sum(mm) < sum(mmamt) then sum(mm) else 0 end
from claimln
where status is null
  and lineID not in (select claimID from unpaid)
group by lineID;
It's as if there's some massively recursive process going on?
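Given that the plan regressed going from 2008 R2 to 2014, one quick way to test whether the new cardinality estimator is the culprit (hedged, since it depends on the database's compatibility level, and QUERYTRACEON requires sysadmin) is to rerun the statement under the legacy estimator and compare plans:

-- Same statement with the pre-2014 (legacy) cardinality estimator forced.
-- If the old plan and timing come back, the regression is CE-related.
insert @amts
select lineID, sum(rc), sum(copay), sum(deduct),
       case when sum(mm) > 0 and sum(mm) < sum(mmamt) then sum(mm) else 0 end
from claimln
where status is null
  and lineID not in (select claimID from claim
                     where amt + copay + disct + mm + ded = 0)
group by lineID
option (querytraceon 9481);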
The values in the final table are the days used by each ID transferring from status i to status i-1. E.g., an ID uses 8 days (10-May-13 minus 2-May-13) to go to status 3 from status 4.
It is hard for me to come up with a table like the final table, although I know that the difference between two adjacent rows can be computed using a self-join and timediff().
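On SQL Server 2012 or later, LAG avoids the self-join entirely: partition by the ID, order by the status date, and diff each row against its predecessor. A hedged sketch assuming a hypothetical table StatusHistory(ID, Status, StatusDate):

-- Days spent reaching each status from the previous row's status.
SELECT ID,
       Status,
       DATEDIFF(day,
                LAG(StatusDate) OVER (PARTITION BY ID ORDER BY StatusDate),
                StatusDate) AS DaysFromPrevStatus
FROM StatusHistory;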
For displaying data on the report I am using the following query
SELECT ReferenceNumber, ActivityID, ActivityTimeStamp, ActivityType, ActivityPerformedBy FROM ActivityDetails ORDER BY ReferenceNumber, ActivityID
The result set is
Issue Reference #   Activity ID   Activity Date/Time   Activity Type
100819              4521404       11/4/07 2:06 PM      INIT
100819              4521405       11/4/07 2:07 PM      LOG
100819              4521406       11/4/07 2:07 PM      LOG
100819              4521473       11/4/07 2:28 PM      TR
100819              4521501       11/4/07 2:33 PM      WIP
100819              4521839       11/4/07 3:25 PM      RE
100819              4521844       11/4/07 3:27 PM      RE_Method
100819              4522575       11/4/07 8:53 PM      CL
100820              4521412       11/4/07 2:10 PM      INIT
100820              4521419       11/4/07 2:13 PM      ATTACHTDOC
100820              4525856       11/5/07 2:49 PM      ATTACHTDOC
100820              4525859       11/5/07 2:49 PM      LOG
100820              4525869       11/5/07 2:49 PM      CL
100821              4521423       11/4/07 2:14 PM      INIT
100821              4521425       11/4/07 2:14 PM      LOG
100821              4521429       11/4/07 2:14 PM      TR
100821              4521432       11/4/07 2:14 PM      ACK
100821              4522219       11/4/07 4:58 PM      RE
100821              4522221       11/4/07 4:58 PM      RE_Method
100821              4522447       11/4/07 6:51 PM      CL
On the report I have grouped by 'Issue Reference #'. I want one more column that calculates the difference between two consecutive Activity Date/Time values of the same reference #.
e.g., the time difference between 4521404 and 4521405, 4521405 and 4521406, 4521406 and 4521473, etc. Please note that the difference between 4521412 and 4522575 will NOT be calculated, since they belong to different reference numbers.
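One way to hand the report a precomputed column (hedged; SQL 2005-era syntax, since Previous() across groups is awkward in SSRS) is to number the activities per reference and self-join each row to its predecessor:

-- Minutes between each activity and the previous one for the same reference.
WITH seq AS (
    SELECT ReferenceNumber, ActivityID, ActivityTimeStamp, ActivityType,
           ROW_NUMBER() OVER (PARTITION BY ReferenceNumber
                              ORDER BY ActivityID) AS rn
    FROM ActivityDetails
)
SELECT cur.ReferenceNumber, cur.ActivityID, cur.ActivityTimeStamp,
       cur.ActivityType,
       DATEDIFF(minute, prev.ActivityTimeStamp,
                cur.ActivityTimeStamp) AS MinutesSincePrev
FROM seq AS cur
LEFT JOIN seq AS prev
       ON prev.ReferenceNumber = cur.ReferenceNumber
      AND prev.rn = cur.rn - 1
ORDER BY cur.ReferenceNumber, cur.ActivityID;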
I have a table named Orders with two relevant fields: CustomerId and OrderDate. I am trying to construct a query that will give me the difference, in days, between each customer's orders, so that the results would be something like (using Northwind as the example):
At the moment, I have the following query that I think is on the right track:

…
SELECT dbo.Orders.CustomerID,
       dbo.Orders.OrderDate AS LowDate,
       Orders_1.OrderDate AS HighDate,
       DATEDIFF([day], dbo.Orders.OrderDate, Orders_1.OrderDate) AS Difference
FROM dbo.Orders
INNER JOIN dbo.Orders Orders_1
        ON dbo.Orders.CustomerID = Orders_1.CustomerID
       AND dbo.Orders.OrderDate < Orders_1.OrderDate
GROUP BY dbo.Orders.CustomerID, dbo.Orders.OrderDate, Orders_1.OrderDate,
         DATEDIFF([day], dbo.Orders.OrderDate, Orders_1.OrderDate)
ORDER BY dbo.Orders.CustomerID, dbo.Orders.OrderDate, Orders_1.OrderDate
…
So, do any of you have any ideas how I might achieve this? I know how to do it using a stored procedure, but I am trying to avoid that; I'd like to do this in a single query.
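The self-join above pairs every order with every later order, not just the adjacent one. Restricting each row to its immediate successor fixes that; a hedged single-query sketch using CROSS APPLY (SQL 2005+), assuming Northwind's dbo.Orders:

-- For each order, find the same customer's next order date, then diff.
SELECT o.CustomerID,
       o.OrderDate                                   AS LowDate,
       nxt.NextOrderDate                             AS HighDate,
       DATEDIFF(day, o.OrderDate, nxt.NextOrderDate) AS Difference
FROM dbo.Orders AS o
CROSS APPLY (SELECT MIN(o2.OrderDate) AS NextOrderDate
             FROM dbo.Orders AS o2
             WHERE o2.CustomerID = o.CustomerID
               AND o2.OrderDate > o.OrderDate) AS nxt
WHERE nxt.NextOrderDate IS NOT NULL
ORDER BY o.CustomerID, o.OrderDate;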
I have a matrix, and in that matrix I need one column which calculates the percentage change between a value on the current row and the same value on the previous row.

Is this possible? The RunningValue() function isn't of help, as it can't calculate the change between two rows, and Previous() doesn't work in a matrix (why?!). Also, calculating this as part of the query isn't possible, as there is a single row group on the matrix and the query is MDX.*
Thanks,
sluggy
*For those who are curious: the matrix shows data on a per-week basis, the row group is snapshot date, and I am trying to measure the change in sales at each snapshot.
I have here a query which delivers the user data from the last month. The problem I have is that if an employee has more than one row in this month, all of them are delivered, but exactly that is not wanted. I need only the last record from the last month.
SELECT a.FIRMA, a.PSNR, a.FELDNR, a.PFLFDNR, a.INHALT AS FTE, a.PFGLTAB,
As you can see, PSNR=364 has two rows, and I need only the row with the last month and latest date. Maybe we can use the field PFLFDNR as a counter to get only one row for every employee?
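ROW_NUMBER gives exactly that kind of counter: rank each employee's rows newest-first and keep rank 1. A hedged sketch reusing the post's column names against a hypothetical source table, with PFGLTAB assumed to be the date column:

-- Keep one row per employee (PSNR), newest PFGLTAB/PFLFDNR first.
WITH ranked AS (
    SELECT a.FIRMA, a.PSNR, a.FELDNR, a.PFLFDNR, a.INHALT AS FTE, a.PFGLTAB,
           ROW_NUMBER() OVER (PARTITION BY a.PSNR
                              ORDER BY a.PFGLTAB DESC, a.PFLFDNR DESC) AS rn
    FROM dbo.PersonalFelder AS a   -- hypothetical table name
)
SELECT FIRMA, PSNR, FELDNR, PFLFDNR, FTE, PFGLTAB
FROM ranked
WHERE rn = 1;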
How to measure a change in inventory over various stores: my SQL Server 2008 R2 Express DB gets a new row of data every day from each store (about 40 stores) with a single product's stock count "OnHand" and whether there is any new stock on order. When the new stock arrives it is added to the "OnHand" count. I want to measure the delta change per day, per store. I'm stuck on how to separate the stores and how to query the delta of stock. My database looks like this:

TimeStamp       Store         OnHand   OnOrder
2015/04/22 18   1 - Concord   12       0
2015/04/23 11   1 - Concord   11

[code]....
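Since 2008 R2 predates LAG, one portable approach is OUTER APPLY: fetch each row's most recent earlier reading for the same store and subtract. A hedged sketch assuming a hypothetical table dbo.Inventory matching the columns above:

-- For each daily reading, find the store's previous reading and diff.
SELECT cur.Store,
       cur.[TimeStamp],
       cur.OnHand,
       cur.OnHand - prev.OnHand AS OnHandDelta
FROM dbo.Inventory AS cur
OUTER APPLY (SELECT TOP (1) p.OnHand
             FROM dbo.Inventory AS p
             WHERE p.Store = cur.Store
               AND p.[TimeStamp] < cur.[TimeStamp]
             ORDER BY p.[TimeStamp] DESC) AS prev
ORDER BY cur.Store, cur.[TimeStamp];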
I am trying to create an exception report that will show the difference between two versions of the same row (a combination of two different sources in SQL, with source 1 having ChildID = 0 and the other source having ChildID = 1; ParentID is the link between them).
The results are as follows:
ParentID   ChildID   Col1   Col2   Col3
1          0         AA     BB     CC
1          1         AA     BF     CC
2          0         GG     NN     TT
2          1         DE     NN     TA
3          0         etc
3          1         etc
4          etc
id     type     timestamp
1001   start1   10:34:23:545
1001   start2   10:34:24:545
1001   end2     10:34:24:845
1001   end1     10:34:25:545
1002   start1   10:34:25:645
1002   start2   10:34:25:745
1002   end2     10:34:25:945
1002   end1     10:34:25:965
I need the result as follows
id     millisecond diff start1-end1        millisecond diff start2-end2
1001   end1 timestamp - start1 timestamp   end2 timestamp - start2 timestamp
1002   end1 timestamp - start1 timestamp   end2 timestamp - start2 timestamp
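Conditional aggregation folds the four event rows per id into one row, after which DATEDIFF in milliseconds gives both columns. A hedged sketch assuming a hypothetical table dbo.Events(id int, [type] varchar(10), ts time(3)), i.e. the timestamps stored as time(3):

SELECT id,
       DATEDIFF(millisecond,
                MAX(CASE WHEN [type] = 'start1' THEN ts END),
                MAX(CASE WHEN [type] = 'end1'   THEN ts END)) AS diff_start1_end1,
       DATEDIFF(millisecond,
                MAX(CASE WHEN [type] = 'start2' THEN ts END),
                MAX(CASE WHEN [type] = 'end2'   THEN ts END)) AS diff_start2_end2
FROM dbo.Events
GROUP BY id;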
Given a table that has three columns that together create a key, and two columns that together define name/value pairs, how can the difference between instances of values be calculated and displayed? One table is used to contain periodic dumps of data from various sources. Because this is an early stage of development, instead of having explicit columns that contain specific data, the table contains name/value pairs. This allows the software to export anything to the database table. When this data is imported, each row shares the same key (three columns containing a machine type, a serial number and a timestamp), a name that identifies the data, and a string that contains the actual data. While this arrangement makes it trivial to support the addition of any data the developers want to export, it makes it less obvious how to generate reports. Let's make an example: assume that there are two vending machines, each of which has just 3 snacks, and each of which generates two separate reports.
Type   Sn   Timestamp          Name    Value
A      1    2015-08-15 12:34   Snick   5
A      1    2015-08-15 12:34   Mars    10
A      1    2015-08-15 12:34   MandM   0
B      2    2015-08-15 15:31   Snick   1
B      2    2015-08-15 15:31   Mars    9
B      2    2015-08-15 15:31   MandM   0
A      1    2015-08-21 09:12   Snick   11
A      1    2015-08-21 09:12   Mars    18
[code]...
So, the names of the values become the report's columns. The reports are sorted by timestamp, then by type, then by serial number. The value associated with the previous row that shares the same name is subtracted from the value of the next row in which the same name occurs, and that becomes the displayed value in the report.
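One hedged way to build that report (SQL 2012+ for LAG; table and column names assumed from the sample) is to pivot the name/value pairs into columns per dump, then subtract each machine's previous dump:

-- Pivot the pairs, then diff against the previous dump per machine.
WITH pivoted AS (
    SELECT [Type], Sn, [Timestamp],
           MAX(CASE WHEN Name = 'Snick' THEN CAST(Value AS int) END) AS Snick,
           MAX(CASE WHEN Name = 'Mars'  THEN CAST(Value AS int) END) AS Mars,
           MAX(CASE WHEN Name = 'MandM' THEN CAST(Value AS int) END) AS MandM
    FROM dbo.Dumps            -- hypothetical table name
    GROUP BY [Type], Sn, [Timestamp]
)
SELECT [Timestamp], [Type], Sn,
       Snick - LAG(Snick) OVER (PARTITION BY [Type], Sn ORDER BY [Timestamp]) AS SnickDiff,
       Mars  - LAG(Mars)  OVER (PARTITION BY [Type], Sn ORDER BY [Timestamp]) AS MarsDiff,
       MandM - LAG(MandM) OVER (PARTITION BY [Type], Sn ORDER BY [Timestamp]) AS MandMDiff
FROM pivoted
ORDER BY [Timestamp], [Type], Sn;

In a real schema the name list would be generated dynamically rather than hard-coded per snack.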
Basically I want to calculate the time spent by S_Users on a particular S_ACTV_CODE (see the sketch after these rules):
- S_ACTV_CODE_PREV means the activity code of the previous record.
- S_START_TIME is the time of S_DATETIME when an S_ACTV_CODE starts.
- S_END_TIME is the time just before the S_ACTV_CODE changes to another S_ACTV_CODE.
- For the first record, S_ACTV_CODE is NULL, so there is no previous code and S_ACTV_CODE_PREV is NULL.
- For the second record, S_ACTV_CODE has some value, but the first record's S_ACTV_CODE is NULL, so the second record's S_ACTV_CODE_PREV is also NULL.
- For the last record (where S_ACTV_IND = 1), the user is currently working on it and S_ACTV_CODE has not changed, so S_END_TIME is open and we want to keep it NULL.
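LAG and LEAD (SQL 2012+) express these rules directly: LAG carries the previous record's code (naturally NULL for the first row), and LEAD supplies the next record's start time as this record's end time (naturally NULL for the last, still-active row). A hedged sketch against a hypothetical log table dbo.S_LOG(S_USER, S_DATETIME, S_ACTV_CODE, S_ACTV_IND):

-- Previous code, start/end time and elapsed minutes per activity row.
SELECT S_USER,
       S_ACTV_CODE,
       LAG(S_ACTV_CODE) OVER (PARTITION BY S_USER ORDER BY S_DATETIME) AS S_ACTV_CODE_PREV,
       S_DATETIME AS S_START_TIME,
       LEAD(S_DATETIME) OVER (PARTITION BY S_USER ORDER BY S_DATETIME) AS S_END_TIME,
       DATEDIFF(minute,
                S_DATETIME,
                LEAD(S_DATETIME) OVER (PARTITION BY S_USER ORDER BY S_DATETIME)) AS MinutesSpent
FROM dbo.S_LOG;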
I have a table with a set of records similar to those mentioned below, and need to find the time difference between two rows. Using the MsgOut column, I have to find the time taken between PS and PV, and some records don't have a PV.
equipmentid   downtimestartdate   downtimeenddate    downtime
a3er          2015-03-15 02:00    2015-03-17 23:00   69
b6e4          2015-03-18 13:00    2015-03-20 04:00   39
I have many rows like the above in a table (in our production table there are thousands of rows), and I want output like below (6 rows in total for the data above):
equipmentid   downtimestartdate   downtimeenddate    downtime
a3er          2015-03-15 02:00    2015-03-15 24:00   22
a3er          2015-03-16 00:00    2015-03-15 24:00   24
a3er          2015-03-17 00:00    2015-03-15 23:00   23
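Splitting each interval at midnight boundaries is a natural fit for a recursive CTE: the anchor takes each row up to its first midnight, and the recursive member walks forward a day at a time until the end date. A hedged sketch assuming a hypothetical table dbo.Downtime with datetime columns matching the sample (for a3er it yields segments of 22, 24 and 23 hours, summing to 69):

;WITH split AS (
    -- Anchor: from the real start to the first midnight (or the end).
    SELECT equipmentid,
           downtimestartdate AS segstart,
           CASE WHEN downtimeenddate < DATEADD(day, 1, CAST(CAST(downtimestartdate AS date) AS datetime))
                THEN downtimeenddate
                ELSE DATEADD(day, 1, CAST(CAST(downtimestartdate AS date) AS datetime))
           END AS segend,
           downtimeenddate
    FROM dbo.Downtime
    UNION ALL
    -- Recurse: one full day (or the remainder) at a time.
    SELECT equipmentid,
           segend,
           CASE WHEN downtimeenddate < DATEADD(day, 1, segend)
                THEN downtimeenddate
                ELSE DATEADD(day, 1, segend)
           END,
           downtimeenddate
    FROM split
    WHERE segend < downtimeenddate
)
SELECT equipmentid, segstart, segend,
       DATEDIFF(hour, segstart, segend) AS downtime
FROM split
ORDER BY equipmentid, segstart
OPTION (MAXRECURSION 0);   -- allow outages longer than 100 days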