SQL Server 2014 :: Left Join With A Large Partitioned Table?
Aug 3, 2015
I have a query that has a left join with a large partitioned table. The partitioned table has 10s of millions of records, and each partition has about 100,000 records.
The left join is part of an insert that gets a column from the partitioned table, if the column exists. The query contains the partition ID and all other joined columns are part of a non-clustered index.
Through the profiler, I found that there were millions of reads and the execution plan was giving me a table scan on the partitioned table.
I changed the query to do the insert followed by an update with inner join. That did the trick, but it worries me that SQL Server 2014 behaves differently from 2012 or 2008R2, which can make upgrading very time consuming.
Looking to improve performance of the following code.
It basically generates future days for each dog. So there is a dog table and a day table with every day.
These 2 table cross join and then fill in missing rows. As time moves i will fill in further future dates but will need the initial insert to be a reasonable query.
All columns are covered by index's but the queries at the end take quite a long time. I would hope for index scan to just point out the missing rows especially on the final query.
How to make the last query as fast as possible.
IF OBJECT_ID('dbo.[AllDates]', 'U') IS NOT NULL DROP TABLE dbo.[AllDates] CREATE TABLE dbo.[AllDates] ( [Date] date not null PRIMARY KEY ) ;WITH Dates AS
I need to join 2 tables but the join needs to account for 2 seperate columns.
for example:
select a. type a. prod_code a. prod_type b. division
from table1 a left join table2 b on a. prod_code = b. prod_code and a. prod_type = b. prod_type
The issue is that you may have only the prod_code or prod_type and null value for the other in table1.
Ideally I want it to check for both then if 1 isn't available then it draws the division of the available. having both or one or the other determines the division it falls under.
SELECT * FROM a LEFT OUTER JOIN b ON a.id = b.id instead of
SELECT * FROM a LEFT JOIN b ON a.id = b.id
generates a different execution plan?
My query is more complex, but when I change "LEFT OUTER JOIN" to "LEFT JOIN" I get a different execution plan, which is absolutely baffling me! Especially considering everything I know and was able to research essentially said the "OUTER" is implied in "LEFT JOIN".
I was writing a query using both left outer join and inner join. And the query was ....
SELECT S.companyname AS supplier, S.country,P.productid, P.productname, P.unitprice,C.categoryname FROM Production.Suppliers AS S LEFT OUTER JOIN (Production.Products AS P INNER JOIN Production.Categories AS C
[code]....
However ,the result that i got was correct.But when i did the same query using the left outer join in both the cases
i.e..
SELECT S.companyname AS supplier, S.country,P.productid, P.productname, P.unitprice,C.categoryname FROM Production.Suppliers AS S LEFT OUTER JOIN (Production.Products AS P LEFT OUTER JOIN Production.Categories AS C ON C.categoryid = P.categoryid) ON S.supplierid = P.supplierid WHERE S.country = N'Japan';
The result i got was same,i.e
supplier country productid productname unitprice categorynameSupplier QOVFD Japan 9 Product AOZBW 97.00 Meat/PoultrySupplier QOVFD Japan 10 Product YHXGE 31.00 SeafoodSupplier QOVFD Japan 74 Product BKAZJ 10.00 ProduceSupplier QWUSF Japan 13 Product POXFU 6.00 SeafoodSupplier QWUSF Japan 14 Product PWCJB 23.25 ProduceSupplier QWUSF Japan 15 Product KSZOI 15.50 CondimentsSupplier XYZ Japan NULL NULL NULL NULLSupplier XYZ Japan NULL NULL NULL NULL
and this time also i got the same result.My question is that is there any specific reason to use inner join when join the third table and not the left outer join.
I've been using partitioned views in the past and used the check constraint in the source tables to make sure the only the table with the condition in the where clause on the view was used. In SQL Server 2012 this was working just fine (I had to do some tricks to suppress parameter sniffing, but it was working correct after doing that). Now I've been installing SQL Server 2014 Developer and used exactly the same logic and in the actual query plan it is still using the other tables. I've tried the following things to avoid this:
- OPTION (RECOMPILE) - Using dynamic SQL to pass the parameter value as a static string to avoid sniffing.
To explain wat I'm doing is this:
1. I have 3 servers with the same source tables, the only difference in the tables is one column with the server name. 2. I've created a CHECK CONSTRAINT on the server name column on each server. 3. On one of the three server (in my case server 3) I've setup linked server connections to Server 1 and 2. 4. On Server 3 I've created a partioned view that is build up like this:
SELECT * FROM [server1].[database].[dbo].[table] UNION ALL SELECT * FROM [server2].[database].[dbo].[table] UNION ALL SELECT * FROM [server3].[database].[dbo].[table]5. To query the partioned view I use a query like this:
SELECT * FROM [database].[dbo].[partioned_view_name] WHERE [server_name] = 'Server2'
Now when I look at the execution plan on the 2014 environment it is still using all the servers instead of just Server2 like it should be. The strange thing is that SQL 2008 and 2012 are working just fine but 2014 seems not to use the correct plan.
To join the table but MUST follow the condition as bitActiv = TRUE: select emp.nvcEmpName, emp.nvcEmpAddress, ety. nvcEmployeeType from cst_EmpProfile emp left join cst_EmpType on emp.intEmployeeTypee = ety.intEmpType and emp.bitActiv = 1.
But, the sql statement doesnt output the my expected result. Because the data row return must be 1st and 2nd row as it bitActiv = true. So, how's I going achieve what i want. tq.
I am trying to use an indexed view to allow for aggregations to be generated more quickly in my test data warehouse. The Fact Table I am creating the indexed view on is a partitioned clustered columnstore index.
I have created a view with the following code:
ALTER view dbo.FactView with schemabinding as select local_date_key, meter_key, unit_key, read_type_key, sum(isnull(read_value,0)) as [s_read_value], sum(isnull(cost,0)) as [s_cost] , sum(isnull(easy_target_value,0)) as [s_easy_target_value], sum(isnull(hard_target_value,0)) as [s_hard_target_value] , sum(isnull(read_value,0)) as [a_read_value], sum(isnull(temperature,0)) as [a_temp], sum(isnull(co2,0)) as [s_co2] , sum(isnull(easy_target_co2,0)) as [s_easy_target_co2] , sum(isnull(hard_target_co2,0)) as [s_hard_target_co2], sum(isnull(temp1,0)) as [a_temp1], sum(isnull(temp2,0)) as [a_temp2] , sum(isnull(volume,0)) as [s_volume], count_big(*) as [freq] from dbo.FactConsumptionPart group by local_date_key, read_type_key, meter_key, unit_key
I then created an index on the view as follows:
create unique clustered index IDX_FV on factview (local_date_key, read_type_key, meter_key, unit_key)
I then followed this up by running some large calculations that required use of the aggregation functionality on the main fact table, grouping by the clustered index columns and only returning averages and sums that are available in the view, but it still uses the underlying table to perform the aggregations, rather than the view I have created. Running an equivalent query on the view, then it takes 75% less time to query the indexed view directly, to using the fact table. I think the expected behaviour was that in SQL Server Enterprise or Developer edition (I am using developer edition), then the fact table should have used the indexed view. what I might be missing, for the query not to be using the indexed view?
I am using stored procedure to load gridview but problem is that i am not getting all rows from first table[ Subject] on applying conditions on second table[ Faculty_Subject table] ,as you can see below if i apply condition :-
Faculty_Subject.Class_Id=@Class_Id
Then i don't get all subjects from subject table, how this can be achieved.
Sql Code:- GO ALTER Proc [dbo].[SP_Get_Subjects_Faculty_Details] @Class_Id int AS BEGIN
I have an problem with the order of the results after a join.
My first query works fine and the order of field Name ist correct.
Select * FROM (SELECT * FROM dtree A1 WHERE A1.Subtype=31356 AND A1.DataID IN (select DataID from dtreeancestors where AncestorID=9940974)) t
When I do a join the order of the left table changes
Select * FROM (SELECT * FROM dtree A1 WHERE A1.Subtype=31356 AND A1.DataID IN (select DataID from dtreeancestors where AncestorID=9940974)) t, llattrdata A4 WHERE t.DataID = A4.ID
How can I do a join and keep the order of the left table?
I have a query that include a single LEFT JOIN. I would like to be able to select at most 1 row of the second table (providing that the JOIN represents a one to many relationship).
Does anyone knows how to do that? Thanks in advance, Joannès
I am still new to SQL and I am having trouble obtaining the results I need from a query. I have worked on this command for some time and searched the internet but cannot seem to still get it correct.
I have a table called Patient. It's primary key is pat_id.
I have a second table called Coverage. It has no primary key. The foreign keys are pat_id, coverage_plan_id, and hosp_status.
I have a third table called Coverage_History. It has a primary key consisting of pat_id, hosp_status, copay_priority, and effective_from.
I want to get the pat_id and all the coverage information that is current. The coverage table contains specific insurance policy information. The coverage_history table will indicate the effective dates for the coverage. So the tables could contain something like this:
Patient (pat_id and lname) P123 Monto P124 Minto P125 Dento P126 Donto
Coverage (pat_id, coverage_plan_id, hosp_status, policy_num) P123 MED1 OP A1499 P123 ACT4 OP H39B P124 MED1 OP C90009 P124 RAC OP 99KKKK P124 RAC OP 99KKKK P124 MED1 OP C90009 P125 ARP OP G190 P126 BCB OP H88
Coverage_History (pat_id, hosp_status, copay_priority, effective_from, coverage_plan_id, effective_to) P123 OP 1 20150102 MED1 NULL P123 OP 2 20150102 ACT4 NULL P124 OP 1 20150203 RAC 20150430 P124 OP 2 20150203 MED1 20150430 P124 OP 1 20150501 MED1 NULL P124 OP 2 20150501 RAC NULL P125 OP 1 20150801 ARP NULL P126 OP 1 20150801 BCB 20160101
select p.pat_id, p.lname, ch.coverage_plan_id, ch.hosp_status, ch.effective_from, ch.effective_to, ch.copay_priority, from patient p left join ( coverage_history ch left join coverage c on ch.coverage_plan_id = c.coverage_plan_id and ch.patient_id = c.patient_id and (ch.effective_to is NULL or ch.effective_to >= getdate() ) ) on ch.patient_id = p.patient_id
where ( ch.effective_to is NULL or ch.effective_to >= getdate() )
So I want to see:
P123 Monto MED1 OP 20150102 NULL 1 P123 Monto ACT4 OP 20150102 NULL 2 P124 Minto MED1 OP 20150501 NULL 1 P124 Minto RAC OP 20150501 NULL 2 P125 Dento ARP OP 20150801 NULL 1 P126 Donto BCB OP 20150801 20160101 1
I am trying to tweak some code which is used to display the newest comments left on photos created by my members.
The existing code is this:
SELECT top 15 pnumber,pcomment,puser FROM photocomments order by pdate DESC
So the latest comment left was for photo #210879 from user "Cla" (redacted user names). The 2nd newest comment would be for photo #211072 from a member named "mo". pdate is a date field
However for the script I have coded I don't want all of the photo comments to show up. This is because I use access levels based on the type of location (higher levels mean more restricted galleries). I check the access levels as I go through the recordsets.
I use this method to get the top 15 comments:
SELECT top 15 pnumber,pcomment,puser FROM photocomments order by pdate DESC
Now I have to use two other tables to determine the access level. Since PHOTOCOMMENTS is just a list of photo #'s and the people who left comments for those photos, I need to:
a) determine what location the photo is from and b) determine the access level of that location
I use: select creator,access from locations where id=(select dir from photos where id="&pnumber&")"
This is a two step process as you can see. The first part is:
select dir from photos where id=(pnumber)
ID is the same value as pnumber seen in PHOTOCOMMENTS. That is to say PHOTOS.ID = PHOTOCOMMENTS.PNUMBER
If I haven't confused you yet, the executed code for the first example would be:
select dir from photos where id=210879
which would get me a value for DIR. DIR is the location number which would be:
select creator,access from locations where id=(dir value)
Just to simplify it a bit....
There are three tables (shown below)
PHOTOCOMMENTS PHOTOS LOCATIONS
I need to: SELECT top 15 pnumber,pcomment,puser FROM photocomments order by pdate DESC (first table shown)
but then also
select creator,access from locations (The last table shown) where id=(select dir from photos where id="&pnumber&")"
So the first table PHOTOCOMMENTS has to also join PHOTOS table where PHOTOS.DIR = PHOTOCOMMENTS.PNUMBER in order to get the value of "DIR" and then DIR is joined to the LOCATIONS tables where PHOTOS.DIR = LOCATIONS.ID
Here is the actual code, which I am trying to make into a single SQL command
strSQL = "SELECT top 15 pnumber,pcomment,puser FROM photocomments order by pdate DESC" set ors = oconn.Execute(strSQL) tl = 0 do until ors.eof or tl > 15 ' until we have 15 results because not every recordset will be of the proper security level
[Code] ....
Bonus points if you can also get it to select from LOCATIONS only WHERE userlevel >= 2
I have a query that joins two large partitioned tables and depending on the values in the where clause, I can get dramatically different performance results.
The first query completed in around 7s and has 47,000 logical reads.
select mo.monitor_id,
mo.site_id,
mo.testtime,
sum(mo.NumBytes),
sum(mo.DNSTime),
sum(mo.ConnectTime),
sum(mo.FirstByteTime),
sum(mo.ContentTime),
sum(mo.RelocTime)
from monitor_raw mr(nolock), monitor_object mo(nolock)
where mr.monitor_id in (5339, 5341, 5342, 943842, 943866)
and mr.testtime between 'Oct 31 2007 3:00:00:000PM' and 'Nov 30 2007 3:00:00:000PM'
and mo.returncode = 200
and mr.site_id in (101,102,105,109,110,112,115,117,119,122,126,151,132,139,129,135,121,138,143,142,159,148,128,171,176,177,178,111,113,116,118,120,127,133,131,130,174,179,185,205,200,202,203,204,210,211,208,209,212,213,216,199,214,224,225,229,230,232,235,241,245,247,250,254,261,267,264,265,266,268,269)
and mr.escalationlevel = 0
and mr.monitor_id = mo.monitor_id
and mr.testtime = mo.testtime
and mr.site_id = mo.site_id group by mo.monitor_id, mo.site_id, mo.testtime
The second query takes 188s to complete and has 1.8m logical reads. The only difference between the two is the value of the monitor_ids in the where clause.
select mo.monitor_id,
mo.site_id,
mo.testtime,
sum(mo.NumBytes),
sum(mo.DNSTime),
sum(mo.ConnectTime),
sum(mo.FirstByteTime),
sum(mo.ContentTime),
sum(mo.RelocTime)
from monitor_raw mr(nolock), monitor_object mo(nolock)
where mr.monitor_id in (152682, 5339, 5341, 5342, 268080)
and mr.testtime between 'Oct 31 2007 3:00:00:000PM' and 'Nov 30 2007 3:00:00:000PM'
and mo.returncode = 200
and mr.site_id in (101,102,105,109,110,112,115,117,119,122,126,151,132,139,129,135,121,138,143,142,159,148,128,171,176,177,178,111,113,116,118,120,127,133,131,130,174,179,185,205,200,202,203,204,210,211,208,209,212,213,216,199,214,224,225,229,230,232,235,241,245,247,250,254,261,267,264,265,266,268,269)
and mr.escalationlevel = 0
and mr.monitor_id = mo.monitor_id
and mr.testtime = mo.testtime
and mr.site_id = mo.site_id group by mo.monitor_id, mo.site_id, mo.testtime
The two tables have clustered indexes on monitor_id, testtime and site_id. Comparing the execution plan, I can see why there is such a difference in performance. The second query performs a clustered index seek on the monitor_object table starting at the lowest monitor_id, testtime & site_id through the highest monitor_id, testtime & site_id. The first query performs a clustered index seek where the monitor_id, testtime and site_id equals the same values from the monitor_raw table.
My question is, how can I force the second query to use the same execution plan as the first so that I can get better performance?
One possible workaround that I could use is to execute five individual queries, one for each monitor_id and then union the results together but this would require significant code changes to my stored procs.
I want to find a way to get partition info for all the tables in all the databases for a server. Showing database name, table name, schema name, partition by (maybe; year, month, day, number, alpha), column used in partition, current active partition, last partition (for date partitions I want to know if the partition goes untill 2007, so I can add 2008)
all I've come up with so far is:
Code Block
SELECT distinct o.name From sys.partitions p inner join sys.objects o on (o.object_id = p.object_id) where o.type_desc = 'USER_TABLE' and p.partition_number > 1
I am getting different results with LEFT outer join operator and *= operator. With *= I am getting the expected results. Can anyone look at SQL and tell what I am doing wrong?
SQL with Left Outer join operator:
select CurrentWeekFinMetrics.[Hub+], WeeklyMetricsFormat.line#, WeeklyMetricsFormat.MetricsType, WeeklyMetricsFormat.Metrics, WeeklyMetricsFormat.Measure, WeeklyMetricsFormat.jobs, case when dataformatchar is not null then case when IsPrefix = 'Y' then dataformatchar + convert (varchar, CurrentWeekFinMetrics.displayCol ) else convert (varchar, CurrentWeekFinMetrics.displayCol ) + dataformatchar end else convert (varchar, CurrentWeekFinMetrics.displayCol ) end from WeeklyMetricsFormat LEFT JOIN CurrentWeekFinMetrics on (WeeklyMetricsFormat.Line# = CurrentWeekFinMetrics.Line#) where CurrentWeekFinMetrics.WeekEndingDate = '10/09/04' and CurrentWeekFinMetrics.[Hub+] = 'Amstelveen' order by CurrentWeekFinMetrics.[Hub+], WeeklyMetricsFormat.Line#
SQL with *= operator select CurrentWeekFinMetrics.[Hub+], WeeklyMetricsFormat.line#, WeeklyMetricsFormat.MetricsType, WeeklyMetricsFormat.Metrics, WeeklyMetricsFormat.Measure, WeeklyMetricsFormat.jobs, case when dataformatchar is not null then case when IsPrefix = 'Y' then dataformatchar + convert (varchar, CurrentWeekFinMetrics.displayCol ) else convert (varchar, CurrentWeekFinMetrics.displayCol ) + dataformatchar end else convert (varchar, CurrentWeekFinMetrics.displayCol ) end from WeeklyMetricsFormat , CurrentWeekFinMetrics where CurrentWeekFinMetrics.WeekEndingDate = '10/09/04' and CurrentWeekFinMetrics.[Hub+] = 'Amstelveen' AND (WeeklyMetricsFormat.Line# *= CurrentWeekFinMetrics.Line#)
For Left outer join operator, I am getting 54 rows, *= I am getting 69 rows.
OLEDB source 1 SELECT ... ,[MANUAL DCD ID] <-- this column set to sort order = 1 ... FROM [dbo].[XLSDCI] ORDER BY [MANUAL DCD ID] ASC
OLEDB source 2 SELECT ... ,[Bo Tkt Num] <-- this column set to sort order = 1 ... FROM ....[dbo].[FFFenics] ORDER BY [Bo Tkt Num] ASC
These two tasks are followed immediately by a MERGE JOIN
All columns in source1 are ticked, all column in source2 are ticked, join key is shown above. join type is left outer join (source 1 -> source 2)
result of source1 (..dcd column) ... 4-400-8000119 4-400-8000120 4-400-8000121 4-400-8000122 <--row not joining 4-400-8000123 4-400-8000124 ...
result of source2 (..tkt num column) ... 4-400-1000118 4-400-1000119 4-400-1000120 4-400-1000121 4-400-1000122 <--row not joining 4-400-1000123 4-400-1000124 4-400-1000125 ...
All other rows are joining as expected. Why is it failing for this one row?
We are in the middle of re-designing few tables (namely transaction tables) that would store very large data and would be hosted on cloud (Azure). The old design of this product breaks transaction tables into monthly tables. i.e. say ORDERS Table would be physically broke into twelve monthly tables over a year like ORDERS0115 (mmyy), ORDERS0215 and so on.
We are in the opinion that keeping the entire transactions in one Table is better. Would like to know what's the best practices for transaction tables like the one mentioned above? Is it better to use one table with partitions. I read somewhere that partitions can slow down SELECT queries if not designed and thought properly.Since this would be hosted on cloud (Azure), do you think some additional things are to be taken care? How a site like Amazon keeps their transactions tables?
SELECT m.lID FROM Message m inner join Message_Cc mCC on m.lID=mCC.lMessage and mCC.lOfficeRecipient = 200321 INNER JOIN UserRole d on mCC.szRecipient=d.szRoleName inner Join Map_UserAtOfficeToRole a2 on a2.lUserRole = d.lid AND d.nRecordStatus = 1
[Code] ....
If I run this without the LEFT OUTER JOIN and the is null statement I get 648 rows. But If I include it I get 0 rows. I can't understand why I get 0 rows with the outer join.
Why would I use a left join instead of a inner join when the columns entered within the SELECT command determine what is displayed from the query results?
I want to analyze procedure cache, to find inefficient plans and parameter issues.
I do it trow DMV But my requests to DMV are very slow and demand resources because procedure cache is about several GB Actually I dont need on-line analysis.
Is it possible to have fast snapshot of procedure cache?
I have a 2 node cluster having 4 cores each wherein having 3 instances of SQL 2008 R2 enterprise comprising of 60 databases, 20 on each instance. I need to setup mirroring for each of the databases to a secondary server having 4 cores and 3 instances. What i understand is that in this case the mirror server will be providing max of 512 worker threads and the 60 mirror databases would consume 240 threads.what all needs to be checked for looking into the feasabilty of going ahead with a async mirror setup as mentioned above.