DB Engine :: Forcing INSERT / SELECT To Use A Parallel Plan
Jul 13, 2015
I’ve been looking at a bug in an application stored proc. This merely inserts 7 million rows from one table into another (empty) table.
(Id, SinaiTrade)
AS varbinary(MAX))
FROM dbo.TradePrimary ts
Both tables have a clustered index with the same key order (on the first column) but the query optimizer insisted on placing a SORT step in the query plan. This uses a lot of tempdb, which I wanted to avoid.
I discovered that the bug was a data type mismatch. Fixing the bug and/or dropping the clustered index from the target table before the insert resulted in the plan I expected, that is, with no sort.
I was hoping it would run faster but, unfortunately, as can be seen from the new plan below, a serial plan is used. This results in the insert taking nearly three times as long as the original.
I’ve been looking at Paul White’s article: [URL] ....
and Adam Machanic’s: [URL] ....
Both are excellent articles, but neither approach (trace flag 8649 or make_parallel()) gives me a parallel plan. Am I missing a trick here? How can I force parallelism? I know there are no features that require a serial zone in the plan; otherwise the plan with the sort would not be parallel.
This is SQL Server 2012 Enterprise 11.0.5522.0, 512 GB of RAM and 16 procs.
SELECT acct.USERNAME, SUM(trans.CHARGES) - SUM(trans.CREDITS) AS [Charges - Credits], MAX(trans.ENDPERIOD) AS [Billed Through], acct.FULLNAME, bill.COMPANY, bill.BILLTOCOMPANY, bill.firstname, bill.lastname, bill.STREET1, bill.STREET2, bill.CITY, bill.STATE, bill.ZIPCODE, bill.COUNTRY, acct.PHONE1, acct.PHONE2, bill.EMAIL, acct.BILLPERIOD, acct.PLAN FROM TRANS trans, ACCTS acct, BILLING bill WHERE trans.ACCTNUM = acct.ACCTNUM and bill.ACCTNUM = acct.ACCTNUM and bill.ACCTNUM = trans.ACCTNUM AND acct.CLOSED = 0 AND acct.SUSPENDED = 0 GROUP BY acct.USERNAME, acct.FULLNAME, bill.COMPANY, bill.BILLTOCOMPANY, bill.firstname, bill.lastname, bill.STREET1, bill.STREET2, bill.CITY, bill.STATE, bill.ZIPCODE, bill.COUNTRY, acct.PHONE1, acct.PHONE2, bill.EMAIL, acct.BILLPERIOD, acct.PLAN HAVING SUM(trans.CHARGES) - SUM(CREDITS) > 0 ORDER BY [Billed Through] DESC
Incorrect syntax near the keyword 'PLAN'.
If I take acct.PLAN out of the SELECT and GROUP BY, it works fine.
I've googled a bit and found the 'EXPLAIN PLAN' command; I assume it's parsing 'PLAN' as a command and screwing stuff up. I don't get why it would take it for a command instead of a column. How does one select a keyword as a column name? Brackets and single quotes didn't do the trick.
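For the question above, a minimal sketch of the usual way to reference a column whose name is a reserved keyword in T-SQL: delimit only the column name, not the alias-qualified name. The poster reports that brackets did not help; one possible cause (an assumption, not confirmed by the post) is bracketing the whole expression as [acct.PLAN], which SQL Server reads as a single identifier. The column list is shortened here for readability.

```sql
-- PLAN is a reserved keyword in T-SQL, so the column name is delimited.
-- Only the column part goes inside the brackets: acct.[PLAN], not [acct.PLAN].
SELECT acct.USERNAME,
       acct.BILLPERIOD,
       acct.[PLAN]
FROM ACCTS AS acct
WHERE acct.CLOSED = 0
  AND acct.SUSPENDED = 0
GROUP BY acct.USERNAME,
         acct.BILLPERIOD,
         acct.[PLAN];
```

The same delimiter would be needed in both the SELECT list and the GROUP BY of the full query.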
I have been trying to get the query optimizer to generate a parallel execution plan, but no matter which MaxDOP (0) or Cost Threshold (5) settings I use, it will only execute in serial.
UPDATE [dbo].[Targus_201412_V7_B] SET [URBAN] =( CASE WHEN [METRO_STATUS] = 'Urban' THEN 1 ELSE 0 END)
I have a scenario where I have to run an update task on multiple servers in parallel and, once all of them have completed (success or failure), run another task on another server.
1. In a maintenance plan, if we add tasks that are not joined, will they run in parallel at the same time?
2. If we link the last task to all the other tasks with link type 'completed', will the last task start after all of the tasks have completed, or as soon as any one of them has completed? (I have a big doubt here.)
The business requirement behind this is to bring data from multiple servers into shadow copies locally and then process them together. It's OK if the data transfer from some server fails, but it's not OK to start processing centrally while a data transfer is still going on. Further, we want to run the data transfers from multiple servers in parallel to save time.
I have one maintenance plan for a full backup that runs at midnight daily, but somehow another one runs at 11:40 PM, which I don't have a plan for. I can see it happened twice by opening the job history. They all use the same maintenance plan.
The only difference I can see is in the message. One is "The job succeeded. The job was invoked by Schedule 112(Daily Backup.subplan_1)"; the one I did not expect has the message "The job succeeded. The job was invoked by user sa".
How do I find the job that was invoked by user sa and delete it? Again, I can only see one job for the full backup, but the job history shows it ran twice.
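One way to look into the question above is to query the SQL Server Agent history tables in msdb directly. This is only a sketch: it assumes the history rows have not been purged and that the message text matches the wording quoted in the post. A message of "invoked by User" normally means the existing job was started manually or by a call to sp_start_job rather than by a second, hidden job.

```sql
-- List job outcomes that were started by a user rather than by a schedule.
-- step_id = 0 is the overall job outcome row in sysjobhistory.
SELECT j.name      AS job_name,
       h.run_date,              -- integer, yyyymmdd
       h.run_time,              -- integer, hhmmss
       h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = h.job_id
WHERE h.step_id = 0
  AND h.message LIKE '%invoked by User%'
ORDER BY h.run_date DESC, h.run_time DESC;
```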
How to eliminate a key lookup from the execution plan?
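A common approach to the question above is a covering index: if the nonclustered index used for the seek also carries every column the query returns, the engine has no reason to go back to the clustered index. The table, columns and query below are hypothetical, since the post does not include any.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.DemoOrders (
    OrderId    int      NOT NULL PRIMARY KEY CLUSTERED,
    CustomerId int      NOT NULL,
    OrderDate  datetime NOT NULL,
    TotalDue   money    NOT NULL
);

-- An index on CustomerId alone would need a key lookup to fetch
-- OrderDate and TotalDue. Adding them as included columns makes the
-- index cover the query below.
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerId
ON dbo.DemoOrders (CustomerId)
INCLUDE (OrderDate, TotalDue);

SELECT OrderDate, TotalDue
FROM dbo.DemoOrders
WHERE CustomerId = 42;
```

The trade-off is a wider index to maintain on every write, so it is usually worth including only the columns the query actually needs.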
1. I created a maintenance plan using Visual Studio 2013 (nothing fancy, pretty basic).
2. Using SSMS 2014, I imported it (the dtsx file) under Integration Services and it appeared there successfully.
3. I connected to the Database Engine, again using SSMS 2014. My expectation was to see it under the Management > Maintenance Plans folder, but it was not present.
I have two queries yielding the same result that I wanted to compare for performance. I entered both queries in one Management Studio query window and executed them as one batch with the actual query plan included.
Query 1 took 8.2 seconds to complete and the query plan said that the cost was 21% of the batch.
Query 2 took 2.3 seconds to complete and the query plan said that the cost was 79% of the batch.
The queries were run on my local development machine. I was the only user. No other programs were running at the time of this test. The results are repeatable.
I understand that the query with the lowest cost is not necessarily the fastest query. On the other hand, the difference is quite big. The query that has approx. 80% of the cost takes 20% of the time, and the other way around. I have two questions:
1. Is such a discrepancy normal?
2. Can conclusions be drawn from the cost distribution? For instance, does the query that takes 8.2 seconds but only costs 21% scale better?
We know we can use the lock_deadlock and xml_deadlock_report events to capture deadlock information. However, I also want to capture the execution plans for all of the SPIDs in the deadlock graph. How can I output the execution plans to the Extended Events trace results as well? Is there an action for the execution plan, or a workaround? If there is no built-in action for the execution plan, can we add customized information to the Extended Events results file? For example, when a deadlock-related event happens, can we run a query to get some information and then add it, along with other information such as sql_text and dbname, to the trace results file? The reason is that knowing the execution plans at the time of the deadlock would be useful for tuning the queries to reduce deadlocks.
We have a PRODUCTION installation of the Database Engine and SSIS, and want to replace it with newer hardware. In "the old days", we built "boxname_new" and installed SQL with "sqlname_new", took PROD users off-line, quickly renamed the original boxes/SQL and the new boxes/SQL to the original name, copied the data, and off we went with the upgrade.
NOW, the "renaming" option is not supported for the SQL tools; it requires re-installation.
Has anyone developed game-plan steps for accomplishing a hardware upgrade, including the SQL environment swap, with MINIMAL downtime for the PRODUCTION environment? Can you share?
I am unable to access the table even after granting SELECT permission on it.
Here TEST is the schema, Card is the table, and the user is Satish.
To grant SELECT on the table, I used:
GRANT SELECT ON TEST.Card TO satish
Even after this it is not working, so I granted SELECT on the schema as well, using:
GRANT SELECT ON SCHEMA::TEST TO Satish
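When a GRANT appears to have no effect, a reasonable first check is what the user can effectively do and whether a DENY exists somewhere, since DENY takes precedence over GRANT. The sketch below uses the names from the post (schema TEST, table Card, user Satish) and assumes it is run in the same database by someone allowed to impersonate that user.

```sql
-- Effective permissions on the table, evaluated as the user in question.
EXECUTE AS USER = 'Satish';

SELECT entity_name, subentity_name, permission_name
FROM sys.fn_my_permissions('TEST.Card', 'OBJECT');

REVERT;

-- Any explicit DENY entries in this database (on the object, the schema,
-- or granted to a role the user belongs to) would override the GRANTs.
SELECT pr.name        AS grantee,
       pe.class_desc,
       pe.permission_name,
       pe.state_desc
FROM sys.database_permissions AS pe
JOIN sys.database_principals AS pr
    ON pr.principal_id = pe.grantee_principal_id
WHERE pe.state_desc = 'DENY';
```

If SELECT shows up in the first result and access still fails, the cause is more likely outside permissions, for example connecting to a different database or as a different login than expected.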
We have found deadlocks in our application. The deadlocks occur between a SELECT and an UPDATE. I captured the deadlock graph using Profiler and found that the SELECT takes an SIU lock. The SELECT statement is below:
select t1.* from MyTable t1 --self join on field1 and field2 left outer join (select field1, field2
What are the optimal values for these parameters? How do they depend on the characteristics of the queries? I am creating an application that inserts some data into a database. It will run on different servers with different load and performance, and I want to prevent timeout exceptions.
I have a full backup scheduled at 12:00 AM ET and a batch import at 11:55 PM (5 minutes before the full backup) that takes 30 minutes to complete. Will the backup cover the data that is being imported?
I get this error when inserting data:
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_Participant_ Log_BiometricInstance_ Participant_ Activities". The conflict occurred in database "ProvantCustomerPortal", table "dbo.Activities", column 'Id'. The statement has been terminated.
My query looks like this :
insert into [dbo].[Participant_BiometricInstance](ParticipantId, ActivityId, ProviderTypeId, Fasting, ExternalSystemId, ResultsDate, ModifiedBy, ModifiedDate) select participantID,'','','',NULL,getdate(),NULL,getdate() from [dbo].[Participant_Profile]
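A possible explanation for the error above, assuming ActivityId is an integer column (the post does not show the table definition): an empty string converts to 0 when it is assigned to an int, so the INSERT is really trying to store ActivityId = 0, and the foreign key rejects it unless dbo.Activities has a row with Id = 0. The sketch below only demonstrates the conversion and the check.

```sql
-- An empty string converts to 0 when cast to int.
SELECT CAST('' AS int) AS EmptyStringAsInt;   -- returns 0

-- Does the parent table actually contain the value being inserted?
-- dbo.Activities and Id are taken from the error message in the post.
SELECT COUNT(*) AS MatchingParentRows
FROM dbo.Activities
WHERE Id = 0;
```

If the count is 0, the insert would need either a real Activities Id or, if the column allows it, NULL in place of the empty string.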
The cursor at the bottom iterates only to print the number of rows. The problem is in the SELECT. It takes 30 seconds to iterate through 1242 records. But if I add a TOP 1000000 (or whatever number) to the SELECT, the same iteration takes less than 1 second. I've tested each query without the cursor, and both have the same cost and performance (though not exactly the same plan). Note that I got the same performance improvement by declaring the cursor as STATIC. Why does the TOP affect the cursor iteration so much?
I have a fundamental problem with how CDC works for bulk updates. When a CDC-enabled table is updated for a single row, my CDC system tables record it as an update (operations 3 and 4), which is exactly what it should be. No complaints! But when I do a bulk update on the same CDC-enabled table for the same columns, my CDC system tables record it as a delete followed by an insert (operations 1 and 2). This is not correct, and this is my problem. We used triggers before CDC and did not face this problem; everything was fine with triggers other than performance. The way CDC handles the bulk update is a big problem for me because we do some migration work to a legacy system based on the output of the CDC system tables.
It will be impossible for me to go and change my migration logic scripts because we have hundreds of procedures in it. Is this a known problem with CDC? Is there any way to make the CDC system tables record a bulk update as updates? I don't think CDC 'net changes' helps in this situation because the net change would show as a single inserted row. If this can't be done with CDC, then I will have to abandon CDC completely and go back to triggers.
I have a datetime field used to store a date of birth. When inserting the date of birth into the table it works fine. But when the date of birth is in the month of October, it takes an hour off. For example, if the date is 04-OCT-1993, it inserts as 03-OCT-1993 23:00.
It only happens for dates in October. It's being inserted via a .NET table adapter. It just so happens that October is the month when daylight saving time kicks in for parts of Australia (an hour is added), but I think this must be a coincidence.
I have an Excel file with a .csv extension. It has one sheet, named Sheet1.
Now I'm trying to insert this Excel data into a #temp table. I tried this syntax:
Exec sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
Exec sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1;
[Code] ...
But I'm getting this error:
The OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)" reported an error. The provider did not give any information about the error.
Cannot initialize the data source object of OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)".
If I execute this statement for an .xls file, it works fine and the rows are inserted into the #temp table. How can I load an Excel file with a .csv extension?
At my customer's site they get this error trying to run a stored procedure I wrote that does BULK INSERT.
-2147217900 [Microsoft ODBC SQL Server Driver][SQL Server] You do not have permission to use the bulk load statement. upImportFromICPMSRaw 'GSADC1CompanyInstrumentOutputFilesICPMSNew185367.csv', tblFromICPMSRaw
The customer has SQL Server 2008 R2 Express installed
The connection string to the database works on everything else and it is the sa account with password
On my own development system with SQL Server 2008 R2 Standard, it works perfectly OK.
Hi All, I am new to SQL Server and having trouble using SQL Server Management Studio. I am unable to select the Database Engine in management studio. I am able to see the instance of default database engine (MSSQLServer) running in Reporting Services manager as well as in Surface area configuration manager, but it is not visible in the drop down list in Management Studio's "Select Database Engine" menu. I had removed Sql server 2005 earlier ( I was able to select the database engine in Management Studio then). But when I installed it again, I was unable to install the Sql Server Tools (it said that my Upgrade is blocked). So, I cleaned the Windows Registry of all keys containing 'Sql'. After this I tried installing it again and successfully installed Sql Server 2005 + ALL TOOLS. But this time I am unable to select the database engine in management studio. Thanks and Regards to ALL
I have a column with multiple IDs stored separated by commas. I want to pass IDs and get the data along with the name from the table. How can I get a multi-select value into a variable in a SQL Server function?
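One version-independent way to match rows against a comma-separated list held in a variable is to wrap both sides in commas and use LIKE. This is a sketch under assumptions: the table variable and its columns are made up for illustration, and the list is assumed to contain no spaces. It will not use an index on the Id column, so it suits small tables or occasional lookups.

```sql
-- Hypothetical data standing in for the real table.
DECLARE @Employee TABLE (Id int PRIMARY KEY, Name varchar(50));
INSERT INTO @Employee (Id, Name)
VALUES (1, 'Ann'), (2, 'Bob'), (5, 'Cid'), (9, 'Dee'), (12, 'Eve');

-- The multi-select value passed in as a single string.
DECLARE @Ids varchar(200) = '2,5,9';

-- Surrounding commas stop Id 2 from matching inside 12 or 25.
SELECT e.Id, e.Name
FROM @Employee AS e
WHERE ',' + @Ids + ',' LIKE '%,' + CAST(e.Id AS varchar(12)) + ',%';
-- Matches Ids 2, 5 and 9 only.
```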
I'm doing a INSERT...SELECT where I'm dependent on the records SELECT:ed to be in a certain order. This order is enforced through a clustered index on that table - I can see that they are in the proper order by doing just the SELECT part.
However, when I do the INSERT, it doesn't work (nothing is inserted) - can the order of the records from the SELECT part be changed internally on their way to the INSERT part, so to speak?
Actually - it is a view that I'm inserting into, and there's an instead-of-insert trigger on it that does the actual insertions into the base table. I've added a "PRINT" statement to the trigger code and there's just ONE record printed (there should be millions).
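One point relevant to the behaviour described above: DML triggers in SQL Server fire once per statement, not once per row, so a single PRINT is what a multi-million-row INSERT...SELECT would produce. A trigger body written as if it received one row will then handle only one row. The sketch below uses hypothetical objects to show a set-based INSTEAD OF INSERT trigger that reads every row from the inserted pseudo-table. Rows in inserted carry no guaranteed order, so any ordering the logic depends on would have to be expressed explicitly.

```sql
-- Hypothetical objects, for illustration only.
CREATE TABLE dbo.DemoBase (
    Id      int         NOT NULL PRIMARY KEY,
    Payload varchar(50) NULL
);
GO
CREATE VIEW dbo.DemoView AS
SELECT Id, Payload FROM dbo.DemoBase;
GO
CREATE TRIGGER dbo.trg_DemoView_Insert ON dbo.DemoView
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- 'inserted' holds every row of the triggering statement.
    INSERT INTO dbo.DemoBase (Id, Payload)
    SELECT i.Id, i.Payload
    FROM inserted AS i;

    PRINT 'Trigger fired';   -- printed once per INSERT statement
END;
GO
-- One statement, three rows: the trigger runs once and inserts all three.
INSERT INTO dbo.DemoView (Id, Payload)
VALUES (1, 'a'), (2, 'b'), (3, 'c');

SELECT COUNT(*) AS RowsInBase FROM dbo.DemoBase;
```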
I have an Itanium 64-bit server to run SSIS packages on. I have one package with three parallel streams. When I run the package in 64-bit mode using dtexec, it runs through validation and exits with no reported errors. When I run it from a job, the job fails and says to see the job log, which has no errors.
When I run it in 32 bit mode using the GUI, it runs all the way through.
Does anyone know how to launch SSIS in 32 bit mode from a job on an Itanium?
This is a really widespread issue, discussed more than once on the SQL CE MSDN forums. Is there any way I can commit changes made at runtime (while I am developing the application), such as inserts, updates and deletes, to the .sdf DB on the machine?
As our DB has no primary keys or indexes, I've taken a copy of all populated tables and tried to force primary keys within a new DB.
The problem is that all of the tables have multiple data sets within them, one data set for each year. This means ID numbers are not unique, as they are repeated for every year they are active.
It's a school database, so a student who has been here for 3 years will have 3 instances of his ID number, one for each year's data set.
So how do I force primary keys if there is no unique identifier? I've been highlighting both the data set and ID columns and setting that combination as the primary key.
Essentially I need to analyse the relationships between the tables in a diagram and also run some speed tests to see how fast the DB works when it has indexes and primary keys.
The reason I'm writing is that I've done this on ten tables and, with another 160 to do, I'm just checking I'm doing the right thing.
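The approach described above (data set plus ID as the key) is a composite primary key, and it can be scripted rather than clicked through for each of the remaining tables. The sketch below uses a hypothetical table and column names. It first checks that the chosen combination really is unique, because the ALTER TABLE fails if any duplicate pair exists; both key columns also have to be NOT NULL.

```sql
-- Hypothetical table standing in for one of the copied tables.
CREATE TABLE dbo.DemoStudent (
    DataSetYear int          NOT NULL,
    StudentId   int          NOT NULL,
    FullName    varchar(100) NULL
);

-- Step 1: confirm the candidate key is unique. Any rows returned here
-- would block the primary key and need investigating first.
SELECT DataSetYear, StudentId, COUNT(*) AS Copies
FROM dbo.DemoStudent
GROUP BY DataSetYear, StudentId
HAVING COUNT(*) > 1;

-- Step 2: add the composite primary key.
ALTER TABLE dbo.DemoStudent
ADD CONSTRAINT PK_DemoStudent
    PRIMARY KEY CLUSTERED (DataSetYear, StudentId);
```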
CASE WHEN CAST(wo.start_date AS TIME) BETWEEN '00:00:00' AND '00:59:59' THEN 0 WHEN CAST(wo.start_date AS TIME) BETWEEN '01:00:00' AND '01:59:59' THEN 1 WHEN CAST(wo.start_date AS TIME) BETWEEN '02:00:00' AND '02:59:59' THEN 2 WHEN CAST(wo.start_date AS TIME) BETWEEN '03:00:00' AND '03:59:59' THEN 3 WHEN CAST(wo.start_date AS TIME) BETWEEN '04:00:00' AND '04:59:59' THEN 4
The purpose is to take a row and set it to the hour of the day that it occurred in. This works fine, however I would like to force it to display every hour 0-23 regardless of whether or not it has a corresponding row.
So, if no row exists for 0, display 0 with null values for the rest of the columns.
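A common pattern for showing every hour, whether or not rows exist, is to start from a fixed list of the 24 hours and LEFT JOIN the data to it. The sketch below assumes the table behind the wo alias is called dbo.WorkOrders, which is a made-up name since the post does not give it. It also swaps the long CASE expression for DATEPART(HOUR, ...), which gives the same 0 to 23 bucket and avoids gaps for times with fractional seconds after :59:59.

```sql
-- dbo.WorkOrders is a hypothetical name for the table aliased as wo.
SELECT h.hr          AS hour_of_day,
       wo.start_date               -- NULL when no row falls in that hour
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7),
             (8), (9), (10), (11), (12), (13), (14), (15),
             (16), (17), (18), (19), (20), (21), (22), (23)) AS h(hr)
LEFT JOIN dbo.WorkOrders AS wo
    ON DATEPART(HOUR, wo.start_date) = h.hr
ORDER BY h.hr;
```

Any filter on the work-order rows would belong in the ON clause rather than in WHERE, otherwise the empty hours get filtered out again.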
In the following procedure I write the results to a temp table called #temp1. I now want to count the results of #temp1; if the count of #temp1 = 0, I want to insert 'No Records Found' into #temp1.ERRORMSG, else return what is in the table.
Any idea how to do this?
ALTER PROC [dbo].[SPU_RPT_Savings_AnomalyDispatches] 40,'04/01/07|06/30/07' @PropertyID varchar(4000), @DropDown varchar(50)
AS SELECT Client.CLIENT, Client.CLIENTID, ErrorEmailLog.ID, ErrorEmailLog.SITEID, ErrorEmailLog.PROPID, ErrorEmailLog.DISTINCTERRORS, ErrorEmailLog.ERRORMSG, ErrorEmailLog.ERRORDATETIME, ErrorEmailLog.EMAILRECIPIENTS, Property.PROPERTY, Property.STREET, Property.CITY, Property.STATE, Property.ZIP, Property.PHONE INTO #TEMP1 FROM ErrorEmailLog INNER JOIN Property ON ErrorEmailLog.PROPID = Property.PROPID INNER JOIN Client ON Property.CLIENTID = Client.CLIENTID WHERE (ErrorEmailLog.ERRORDATETIME BETWEEN SUBSTRING(CONVERT(VARCHAR(12), @DropDown), 0, 9)
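For the #TEMP1 question above, one option is an existence check after the table has been populated. The sketch below uses a simplified, hypothetical stand-in for #TEMP1 so that it runs on its own. With the real table created by SELECT ... INTO, the single-column INSERT only works if the other columns allow NULLs and ERRORMSG is wide enough for the text, which are assumptions the post does not confirm.

```sql
-- Simplified, hypothetical stand-in for the #TEMP1 built by the procedure.
CREATE TABLE #TEMP1 (
    ID       int          NULL,
    PROPID   int          NULL,
    ERRORMSG varchar(255) NULL
);

-- ... the procedure's query would populate #TEMP1 here ...

-- EXISTS stops at the first row, so it is cheaper than COUNT(*) = 0.
IF NOT EXISTS (SELECT 1 FROM #TEMP1)
BEGIN
    INSERT INTO #TEMP1 (ERRORMSG)
    VALUES ('No Records Found');
END;

-- Returns either the real rows or the single placeholder row.
SELECT ID, PROPID, ERRORMSG
FROM #TEMP1;

DROP TABLE #TEMP1;
```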