I have a table that contains log data, usually around a million records. The
table has about 10 columns with various attributes of the logged data,
nothing special. We're using SQL Server 2000.
Some of the columns (for example "category") have duplicate values
throughout the records. We have a web page that queries the table to show
all the unique values of such a column, for example:
select distinct CATEGORY from TEST
Obviously the server has to scan all rows in order to find the unique values,
which takes quite a while, especially since that web page contains several
of these types of queries. We also have MAX(DATE) and MIN(DATE) queries that
add to the load.
I already created indexes on the CATEGORY column (actually on all the category
columns), which might help a little, but I'm pretty sure that there has got to
be a better way.
I also created a view (select distinct CATEGORY from TEST) and tried to
index it, but SQL Server won't let me index a view that contains a DISTINCT
clause.
Isn't there a way to create an index that contains only the distinct values?
Is there another way to speed this up?
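A DISTINCT over a million rows will always cost something. One common workaround is to keep the distinct values in their own small table and maintain that table on insert. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server 2000. The CATEGORIES table, the trigger name and the LOGDATE column are hypothetical, and a T-SQL trigger would read the new rows from the inserted pseudo-table rather than NEW. SQL Server's indexed views also offer an alternative worth testing. They reject DISTINCT, but they accept a schema-bound view that uses GROUP BY CATEGORY together with COUNT_BIG(*).

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; TEST and CATEGORY follow the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TEST (ID INTEGER PRIMARY KEY, CATEGORY TEXT, LOGDATE TEXT);
-- Hypothetical lookup table holding one row per category value.
CREATE TABLE CATEGORIES (CATEGORY TEXT PRIMARY KEY);
-- Keep the lookup table current as log rows arrive (SQLite trigger syntax;
-- a SQL Server trigger would select from the "inserted" pseudo-table instead).
CREATE TRIGGER trg_test_category AFTER INSERT ON TEST
BEGIN
    INSERT INTO CATEGORIES (CATEGORY)
    SELECT NEW.CATEGORY
    WHERE NOT EXISTS (SELECT 1 FROM CATEGORIES WHERE CATEGORY = NEW.CATEGORY);
END;
""")

conn.executemany(
    "INSERT INTO TEST (CATEGORY, LOGDATE) VALUES (?, ?)",
    [("error", "2007-01-01"), ("info", "2007-01-02"), ("error", "2007-01-03")],
)

# The web page reads the small table instead of scanning the whole log.
categories = [r[0] for r in conn.execute("SELECT CATEGORY FROM CATEGORIES ORDER BY CATEGORY")]
print(categories)  # ['error', 'info']
```

The trade-off is a small extra cost on every insert into the log table in exchange for a cheap read on the page.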
I am having a heck of a time figuring out what controls how/when the generated SQL for a report puts a DISTINCT clause in front of it.
For instance (not that this report makes any sense), I have 58 rows in my fact table/entity. If I pull in a lookup field and execute, the DISTINCT is put in the query and I basically get a list of the possible domain values. It runs the whole joined-table query to get them, but it lists (in this case) just 4 records. If I then add the primary ID of the fact entity, the DISTINCT goes away and I get my 58 rows. If I pull in two lookup fields, the DISTINCT is back. If I pull in the description field (a text string, just a direct source field mapping, not part of the identifying attributes), the DISTINCT is there. If I pull in the Company Name field on a different entity (which is essentially the same as pulling in Description, except that it is part of the identifying attributes), there is no DISTINCT. I can pull in all my fields on this entity and none of them drive the DISTINCT. And I swear (OK, I am probably lying, but not on purpose) the field/attribute and role properties are all the same on the attributes. But you get my general question/situation...
Any insight for me? Does it have to do with how I am building the report rather than the underlying model?
I was wondering if anyone can explain the positives and negatives of using a single stored procedure that contains several separate queries. I know there are problems with dynamic SQL, but I am not proficient enough to know whether this falls under that umbrella.
For clarification, this is what I am referring to: in a single stored procedure, I have a parameter called Query_ID that identifies which query in the sproc I want to execute. Then, from my ASP page, I simply pass the appropriate value for Query_ID. So:
IF @QUERY_ID = 1
BEGIN
    SELECT [whatever] FROM [tbl1] WHERE [conditions] GROUP BY [something] ORDER BY [somethingelse]
END
ELSE IF @QUERY_ID = 2
BEGIN
    SELECT [whatever] FROM [tbl2] WHERE [conditions] GROUP BY [something] ORDER BY [somethingelse]
END
END
INSERT INTO #LatLong SELECT DISTINCT Latitude, Longitude FROM RGCcache
When I run it I get the following error: "Violation of PRIMARY KEY constraint 'PK__#LatLong__________7CE3D9D4'. Cannot insert duplicate key in object 'dbo.#LatLong'."
I'm not sure how this can fail. When I create another table with two decimal columns and repeated values, SELECT DISTINCT returns only distinct pairs of values.
The failure may be related to the fact that RGCcache has about 10 million rows, but I can't see why.
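The post does not show how #LatLong is declared, so what follows is only one possible explanation, offered as an assumption. If the #LatLong columns have a smaller scale than the RGCcache columns, two pairs can be distinct at full precision and still become equal when they are converted on insert. That conversion happens after DISTINCT has already run. This would also explain why a small test table with exact repeats behaves correctly while 10 million real coordinates do not. The sketch imitates the conversion with SQLite's ROUND() through Python's sqlite3 module. In T-SQL the equivalent fix would be to convert inside the DISTINCT, for example SELECT DISTINCT CAST(Latitude AS decimal(9,6)), CAST(Longitude AS decimal(9,6)), where decimal(9,6) is a hypothetical target type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RGCcache (Latitude REAL, Longitude REAL);
-- Assumed narrower target: its key is built from values rounded to 4 places.
CREATE TABLE LatLong (Latitude REAL, Longitude REAL, PRIMARY KEY (Latitude, Longitude));
""")
# Two pairs that are distinct at full precision but equal after rounding.
conn.executemany("INSERT INTO RGCcache VALUES (?, ?)",
                 [(51.500001, -0.12), (51.500002, -0.12)])

# Mimics the implicit conversion: DISTINCT runs first, rounding happens afterwards.
try:
    conn.execute("""INSERT INTO LatLong
                    SELECT ROUND(Latitude, 4), ROUND(Longitude, 4)
                    FROM (SELECT DISTINCT Latitude, Longitude FROM RGCcache)""")
    collided = False
except sqlite3.IntegrityError:
    collided = True

# Rounding inside the DISTINCT removes the near-duplicates before they can collide.
conn.execute("""INSERT INTO LatLong
                SELECT DISTINCT ROUND(Latitude, 4), ROUND(Longitude, 4) FROM RGCcache""")
count = conn.execute("SELECT COUNT(*) FROM LatLong").fetchone()[0]
print(collided, count)  # True 1
```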
I need to run a SELECT DISTINCT query across multiple fields, but I need to add another field that is NON-DISTINCT to my record set. Here is my query:
SELECT DISTINCT lastname, firstname, middleinitial, address1, address2, city, state, zip, age, gender
FROM gpresults
WHERE age>='18' and serviceline not in ('4TH','4E','4W')
and financialclass not in ('Z','X') and age not in ('1','2','3','4','5','6','7','8','9','0')
and (CAST(ADMITDATE AS DATETIME) >= DATEDIFF(day, 60, GETDATE()))
ORDER BY zip
This query runs perfectly. No problems whatsoever. However, I also need to include another field called "admitdate" that should be treated as NON-DISTINCT. How do I add this to the query?
I've tried this, but it doesn't work:
SELECT admitdate
FROM (SELECT DISTINCT lastname, firstname, middleinitial, address1, address2, city, state, zip, age, gender from gpresults)
WHERE age>='18' and serviceline not in ('4TH','4E','4W')
and financialclass not in ('Z','X') and age not in ('1','2','3','4','5','6','7','8','9','0')
and (CAST(ADMITDATE AS DATETIME) >= DATEDIFF(day, 60, GETDATE()))
ORDER BY zip
This has to be simple, but I do not know the syntax to accomplish this.
Thanks
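Each output row has to carry a single admitdate. The usual approach is to replace DISTINCT with GROUP BY on the person columns and choose the date with an aggregate: MAX gives the most recent admission, MIN the first. The sketch runs on Python's sqlite3 module with a trimmed-down gpresults table, an assumption made for brevity. The post's WHERE clause would go unchanged before GROUP BY, and the same GROUP BY/MAX syntax works in T-SQL. If you want every admission date instead, add admitdate to the DISTINCT list and accept one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down gpresults with only a few of the post's columns (an assumption for brevity).
conn.execute("CREATE TABLE gpresults (lastname TEXT, firstname TEXT, zip TEXT, admitdate TEXT)")
conn.executemany("INSERT INTO gpresults VALUES (?, ?, ?, ?)", [
    ("Smith", "Ann", "10001", "2007-03-01"),
    ("Smith", "Ann", "10001", "2007-04-15"),
    ("Jones", "Bob", "10002", "2007-04-02"),
])

# GROUP BY replaces DISTINCT; admitdate is collapsed with an aggregate
# (MAX = most recent admission) instead of being part of the grouping key.
rows = conn.execute("""
    SELECT lastname, firstname, zip, MAX(admitdate) AS last_admit
    FROM gpresults
    GROUP BY lastname, firstname, zip
    ORDER BY zip
""").fetchall()
print(rows)  # [('Smith', 'Ann', '10001', '2007-04-15'), ('Jones', 'Bob', '10002', '2007-04-02')]
```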
Hello, I have written a small ASP.NET application that keeps a record of the proposals coming from the branch offices of a bank, in a table created as:
CREATE TABLE Proposals ( ID smallint identity(7,1), BranchID char(5), Proposal_Date datetime )
The app also calculates the total number of proposals coming from a specific branch on a given date with
SELECT COUNT(BranchID) FROM Proposals WHERE BranchID=@prmBranchID AND Proposal_Date=@prmDate
and prints the results in a table (my target table). The target table has as many rows as the result of "SELECT COUNT(DISTINCT Proposal_Date) FROM Proposals". Apart from the first column, which displays those distinct Proposal_Dates, it has as many columns as the result of "SELECT DISTINCT BranchID FROM Proposals". It converts the DateTime values with ToShortDateString so that we can easily see how many proposals each branch office sent on a given day.
So far so good, and everything works fine except one thing. Some DateTime values in the Proposals table fall on the same day but at different times (for example 11.11.2005 08:30:45 and 11.11.2005 10:45:30). These cause trouble in the target table, where "SELECT COUNT(DISTINCT Proposal_Date) FROM Proposals" is executed, because (as you might already guess) it displays two identical dates in ShortDateString form. That doesn't make much sense, since it creates redundant rows.
What I need is a result like this (in a neat fashion :) "SELECT COUNT(DISTINCT Proposal_Date) <<DISTINCT ONLY ON THE DAY, NOT ON HOURS, MINUTES OR SECONDS>> FROM Proposals". How can I do this properly? Thanks in advance.
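The fix is to count distinct days rather than distinct timestamps. SQL Server 2000 has no date-only type, but a common T-SQL idiom truncates a datetime to midnight: SELECT COUNT(DISTINCT DATEADD(day, DATEDIFF(day, 0, Proposal_Date), 0)) FROM Proposals. The runnable sketch below uses SQLite's date() through Python's sqlite3 module as a stand-in for that truncation, with made-up branch IDs and timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Proposals (ID INTEGER PRIMARY KEY, BranchID TEXT, Proposal_Date TEXT)")
conn.executemany("INSERT INTO Proposals (BranchID, Proposal_Date) VALUES (?, ?)", [
    ("B0001", "2005-11-11 08:30:45"),
    ("B0002", "2005-11-11 10:45:30"),
    ("B0001", "2005-11-12 09:00:00"),
])

# date() drops the time part, so both 11 November entries count as one day.
day_count = conn.execute(
    "SELECT COUNT(DISTINCT date(Proposal_Date)) FROM Proposals").fetchone()[0]

# Per-branch, per-day counts group on the same truncated value.
per_day = conn.execute("""
    SELECT date(Proposal_Date) AS day, BranchID, COUNT(*)
    FROM Proposals
    GROUP BY day, BranchID
    ORDER BY day, BranchID
""").fetchall()
print(day_count, per_day)
```

The same truncation matters for the per-branch count. Proposal_Date=@prmDate only matches rows whose time part equals the parameter's, so filtering or grouping on the truncated value keeps the rows and cells of the target table consistent.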
Okay, I've been working on this for a couple of hours with no success. I'm trying to find the number of telephone numbers that are associated with multiple students at different school sites. I've created a temp table that lists all phone numbers that are associated with more than one student. I'm now trying to query that table and count the number of telephone numbers that are associated with more than one site. Essentially, I'm looking for parent/guardians that have students at different sites.
Here's an example of what I'm hoping to accomplish:
*In this example, I'm just trying to get a count of the different/distinct school sites associated with each number. If I can, at the same time, limit it to a count of > 1 (essentially excluding parents with students at the same site), even better :)
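Grouping by phone number and counting distinct sites answers both parts of the question. COUNT(DISTINCT site) gives the number of sites per number, and HAVING keeps only the numbers that span more than one site. The temp table's layout is not shown in the post, so phone_students and its columns are hypothetical. The sketch runs through Python's sqlite3 module. The GROUP BY/HAVING syntax is the same in T-SQL, which also requires the alias on the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical shape of the temp table: one row per phone/student pair with the student's site.
conn.execute("CREATE TABLE phone_students (phone TEXT, student_id INTEGER, site TEXT)")
conn.executemany("INSERT INTO phone_students VALUES (?, ?, ?)", [
    ("555-0100", 1, "North"), ("555-0100", 2, "South"),   # two different sites
    ("555-0101", 3, "North"), ("555-0101", 4, "North"),   # same site twice
])

# Distinct sites per number, keeping only numbers linked to more than one site.
rows = conn.execute("""
    SELECT phone, COUNT(DISTINCT site) AS site_count
    FROM phone_students
    GROUP BY phone
    HAVING COUNT(DISTINCT site) > 1
""").fetchall()

# Wrapping the grouped query gives the single total that was asked for.
total = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT phone FROM phone_students
        GROUP BY phone HAVING COUNT(DISTINCT site) > 1
    ) AS multi_site
""").fetchone()[0]
print(rows, total)  # [('555-0100', 2)] 1
```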
Hello,
When I use a PreparedStatement (in JDBC) with the following query:
SELECT store_groups_id FROM store_groups WHERE store_groups_id IS NOT NULL AND type = ? ORDER BY group_name
it takes significantly longer to run (the time it takes for executeQuery() to return) than if I use
SELECT store_groups_id FROM store_groups WHERE store_groups_id IS NOT NULL AND type = 'M' ORDER BY group_name
After tracing the problem down, it appears that this is not precisely a Java issue. Rather, it has to do with the underlying cost of running parameterized queries. When I open MS Enterprise Manager and type the same query in, the version with bind (?) parameters also takes far longer to run.
This only happens when the table in question is large: I am seeing this behaviour for a table with more than 1,000,000 records. It doesn't make sense to me why a parameterized query would run SLOWER than a completely ad-hoc query when it is supposed to be more efficient.
Furthermore, someone might say the reason for this behaviour is that the query is first compiled and the parameters are sent over afterwards, resulting in a longer perceived execution time. If that were the case, I would respond that A) it shouldn't make any difference whether it runs against a large or a small table, B) the performance hit should only be experienced the first time the query is run, and C) the performance hit should be at most 2x the time the non-parameterized query takes. The actual difference is more like 4-10 times the time the non-parameterized version takes!!!
Is this a SQL Server-specific problem, or something that would pertain to other databases as well?
Is there something about the correct use of bind parameters that I don't understand? If I can provide some hints in Java, that would be great. Otherwise, do I need to turn certain settings on or off in the database itself?
If nothing else works, I will have to either find or write a wrapper around the Statement object that acts like a prepared statement but actually sends regular Statement objects to the JDBC driver. I would then put some intelligence in the database layer to decide whether to use this special hack object or a regular prepared statement, depending on the expected overhead. (Obviously this logic would only be written in one place... etc... IoC...) HOWEVER, I would desperately like to avoid doing this.
Please help :)
If I know this SELECT will get me unique username-to-configname records:
Select DISTINCT configname, username FROM EtechModelRequests
JOIN CC_host.dbo.usr_smc ON username = user_id
JOIN Webservices.dbo.PartNumberPricingImport ON PartNumber = configname
Where RequestDateTime > '9/26/2007' And country = 'US' And interfacename Like '%download%' And result = 0
How do I show the other fields I need? The field I need is List Price, but I don't want to DISTINCT on it too.
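Whether List Price needs special handling depends on the data. The sketch assumes PartNumberPricingImport holds one price per PartNumber. In that case ListPrice is determined by configname, so adding it to the DISTINCT list cannot add rows. The sketch below shows this using Python's sqlite3 module, with the post's other join and filters left out and made-up sample rows. If a part can have several prices, GROUP BY configname, username and pick one price with MAX(ListPrice) or MIN(ListPrice) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EtechModelRequests (username TEXT, configname TEXT, RequestDateTime TEXT);
-- Assumption: exactly one List Price per PartNumber.
CREATE TABLE PartNumberPricingImport (PartNumber TEXT PRIMARY KEY, ListPrice REAL);
INSERT INTO EtechModelRequests VALUES
  ('ann', 'CFG-1', '2007-09-27'), ('ann', 'CFG-1', '2007-09-28'), ('bob', 'CFG-2', '2007-09-29');
INSERT INTO PartNumberPricingImport VALUES ('CFG-1', 100.0), ('CFG-2', 250.0);
""")

# ListPrice depends only on configname (through PartNumber), so including it
# in the DISTINCT list still yields one row per configname/username pair.
rows = conn.execute("""
    SELECT DISTINCT r.configname, r.username, p.ListPrice
    FROM EtechModelRequests r
    JOIN PartNumberPricingImport p ON p.PartNumber = r.configname
    ORDER BY r.configname
""").fetchall()
print(rows)  # [('CFG-1', 'ann', 100.0), ('CFG-2', 'bob', 250.0)]
```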
Hi, I have two tables linked by MemberID. I'd like to produce a list from the two tables, but I also want one of the tables to contribute only its top 50 distinct records.
Table 1 (top 50 distinct): MemberID (distinct), Picture, DateTaken, GalleryID; ORDER BY DateTaken
Table 2: MemberID, FirstName, LastName, Private
List: MemberID, Picture, FirstName, LastName, GalleryID, Private
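One way to read the requirement: pick the 50 members with the most recent pictures, one row each, then join that short list to the picture and member details. The table names Pictures and Members stand in for the post's Table 1 and Table 2, and "most recent first" is an assumption about the intended order. The sketch uses Python's sqlite3 module, so it writes LIMIT where T-SQL would write SELECT TOP 50. If a member has two pictures with the same latest DateTaken, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names for the post's "Table 1" and "Table 2".
CREATE TABLE Pictures (MemberID INTEGER, Picture TEXT, DateTaken TEXT, GalleryID INTEGER);
CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Private INTEGER);
INSERT INTO Members VALUES (1, 'Ann', 'Lee', 0), (2, 'Bob', 'Ray', 1);
INSERT INTO Pictures VALUES
  (1, 'a.jpg', '2008-01-01', 10), (1, 'b.jpg', '2008-02-01', 10),
  (2, 'c.jpg', '2008-01-15', 20);
""")

TOP_N = 50  # T-SQL would write SELECT TOP 50 instead of LIMIT
rows = conn.execute("""
    SELECT p.MemberID, p.Picture, m.FirstName, m.LastName, p.GalleryID, m.Private
    FROM (SELECT MemberID, MAX(DateTaken) AS LastTaken   -- one row per member
          FROM Pictures GROUP BY MemberID
          ORDER BY LastTaken DESC LIMIT ?) AS latest
    JOIN Pictures p ON p.MemberID = latest.MemberID AND p.DateTaken = latest.LastTaken
    JOIN Members m ON m.MemberID = p.MemberID
    ORDER BY p.DateTaken DESC
""", (TOP_N,)).fetchall()
print(rows)  # [(1, 'b.jpg', 'Ann', 'Lee', 10, 0), (2, 'c.jpg', 'Bob', 'Ray', 20, 1)]
```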
I have a query that is bringing up multiples of the same data. What I would like to know is whether there is a way to use something like DISTINCT as a clause, such as
Select * from Table Where Distinct ColumnName
I know that DISTINCT doesn't work in this situation. I would like to know if there is a command to do this, or a way to fix it. If I use this
select distinct columnName,columnName2 from table
it returns every row where the combination of columnName and columnName2 is unique. My purpose is that I need to select more than one column, but I would like to eliminate duplicates based on ONE column only.
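There is no WHERE DISTINCT clause. You can get the same effect by deciding which row represents each ColumnName value and joining back to it. This sketch, run through Python's sqlite3 module, assumes the table has a unique id column (hypothetical here) and keeps the row with the lowest id for each ColumnName. The derived-table join works the same way in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a unique id; the id decides which duplicate row survives.
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, ColumnName TEXT, ColumnName2 TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)",
                 [(1, "a", "x"), (2, "a", "y"), (3, "b", "z")])

# The derived table picks one id per ColumnName; joining back returns the full row.
rows = conn.execute("""
    SELECT t.*
    FROM T t
    JOIN (SELECT ColumnName, MIN(id) AS first_id FROM T GROUP BY ColumnName) AS pick
      ON pick.first_id = t.id
    ORDER BY t.id
""").fetchall()
print(rows)  # [(1, 'a', 'x'), (3, 'b', 'z')]
```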
I have 6 fields in my table and I want to display only distinct values from one of those columns. However, I also need to use the fields from the remaining 5 columns in my asp page. For example,
Columns: CompID CompanyName Ticker Industry CEOName MarketCap
In one table on the page, I want to display only unique entries for 'CompanyName' which I've been doing with this statement:
SqlText = "SELECT DISTINCT CompanyName FROM [tablename] WHERE UserID=" & intUserID & ""
However, I also need to use the other values associated with each 'CompanyName' in my ASP page after opening my recordset. I need to display <%=Rs("CompID")%>, <%=Rs("Ticker")%>, <%=Rs("Industry")%>, etc. while keeping the DISTINCT part of my statement.
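A correlated subquery can keep one full row per company, here the one with the lowest CompID, so the page still has CompID, Ticker and Industry available. Companies is a hypothetical name for [tablename], and the sketch runs through Python's sqlite3 module. It also binds the user id as a parameter instead of concatenating it into the SQL text as the post's code does. In classic ASP, that would mean an ADO Command with a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Companies (CompID INTEGER PRIMARY KEY, UserID INTEGER,
                CompanyName TEXT, Ticker TEXT, Industry TEXT, CEOName TEXT, MarketCap REAL)""")
conn.executemany("INSERT INTO Companies VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 7, "Acme", "ACM", "Tools", "Wile", 1.5),
    (2, 7, "Acme", "ACM", "Tools", "Wile", 1.6),   # duplicate company for the same user
    (3, 7, "Globex", "GBX", "Energy", "Hank", 9.0),
])

user_id = 7
# For each company, keep the row with the lowest CompID; the user id is bound
# as a parameter rather than concatenated into the SQL string.
rows = conn.execute("""
    SELECT c.CompID, c.CompanyName, c.Ticker, c.Industry
    FROM Companies c
    WHERE c.UserID = ?
      AND c.CompID = (SELECT MIN(c2.CompID) FROM Companies c2
                      WHERE c2.UserID = c.UserID AND c2.CompanyName = c.CompanyName)
    ORDER BY c.CompanyName
""", (user_id,)).fetchall()
print(rows)  # [(1, 'Acme', 'ACM', 'Tools'), (3, 'Globex', 'GBX', 'Energy')]
```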
I am trying to grab the distinct categories so that only 'male' and 'female' are output, but I also need to grab the id and file_name to use later on the page.
When I try
select distinct category, id, file_name from tbl_pictures
it outputs all the records, since no two rows match on all 3 fields,
but I want the result of this:
select distinct category from tbl_pictures
while still grabbing the other two fields, because I need to use them in the next part of the page.
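If the page needs every id and file_name anyway, another option is to skip DISTINCT altogether. Select all rows ordered by category and let the page print each category heading once. The sketch uses Python's sqlite3 module and itertools.groupby to stand in for the page logic, with made-up sample rows.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_pictures (id INTEGER PRIMARY KEY, category TEXT, file_name TEXT)")
conn.executemany("INSERT INTO tbl_pictures VALUES (?, ?, ?)", [
    (1, "female", "f1.jpg"), (2, "male", "m1.jpg"), (3, "female", "f2.jpg"),
])

# One query sorted by category; the page walks the rows, shows each category
# once, and keeps every id/file_name under it for later use.
rows = conn.execute(
    "SELECT category, id, file_name FROM tbl_pictures ORDER BY category, id").fetchall()
by_category = {cat: [(pid, fname) for _, pid, fname in grp]
               for cat, grp in groupby(rows, key=lambda r: r[0])}
print(list(by_category))       # ['female', 'male']
print(by_category["female"])   # [(1, 'f1.jpg'), (3, 'f2.jpg')]
```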
Hi, I have data in several tables and I'm having trouble filtering the data.
This is the statement I'm executing: 1. SELECT DISTINCT Countries.Name, Companies.ShortName, Persons.FirstName, Persons.LastName, PersonSkills.Skills
This is the statement that gives me the results I want: 2. SELECT DISTINCT Countries.Name, Companies.ShortName, Persons.FirstName, Persons.LastName
The problem is that I need PersonSkills.Skills in the statement, and DISTINCT doesn't filter the data the way I need it to. The values in PersonSkills.Skills differ, so I get duplicated results.
Is it possible to do something like SELECT DISTINCT column1,..,column5,(SELECT PersonSkills.Skills)
by this I mean to apply distinct to some of the columns in the SELECT statement?
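DISTINCT always compares whole rows. To get one row per person with the skills included, the skills have to be combined into a single value. The sketch, run through Python's sqlite3 module, folds them together with SQLite's GROUP_CONCAT. PersonID is a hypothetical join column, and the Countries and Companies joins are left out. SQL Server 2000 has no built-in equivalent, so there this would need a user-defined function or concatenation in the page. Alternatively, pick one skill per person with MIN(Skills) and GROUP BY the other columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (PersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE PersonSkills (PersonID INTEGER, Skills TEXT);
INSERT INTO Persons VALUES (1, 'Ann', 'Lee');
INSERT INTO PersonSkills VALUES (1, 'SQL'), (1, 'Java');
""")

# One row per person; all of that person's skills are folded into one string.
# GROUP_CONCAT is SQLite-specific here, and its ordering is not guaranteed.
rows = conn.execute("""
    SELECT p.FirstName, p.LastName,
           (SELECT GROUP_CONCAT(s.Skills, ', ') FROM PersonSkills s
            WHERE s.PersonID = p.PersonID) AS Skills
    FROM Persons p
""").fetchall()
print(rows)
```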
I'm trying to get distinct CodeIDs, but the query still returns duplicate data. What is the best way to write this query so that the results have distinct CodeIDs?
SELECT DISTINCT dbo.AgreementDurationCode.CodeID, dbo.Item.PartNumber, dbo.ItemShadow.AgreementDurationCode, dbo.AgreementDurationCode.CodeCategory, dbo.AgreementDurationCode.CodeCategoryID, dbo.AgreementDurationCode.Code, dbo.AgreementDurationCode.CodeName, dbo.AgreementDurationCode.CodeAbbreviation, dbo.AgreementDurationCode.CodeShortAbbreviation, dbo.Item.ItemName, dbo.AgreementDurationCode.CodeMaxcimAbbreviation, dbo.ItemShadow.ItemStatusCode, dbo.ItemShadow.LicenseTypeCode FROM dbo.Item INNER JOIN dbo.ItemShadow ON dbo.Item.ItemID = dbo.ItemShadow.ItemID INNER JOIN dbo.AgreementDurationCode ON dbo.Item.AgreementDurationCodeID = dbo.AgreementDurationCode.CodeID WHERE (dbo.ItemShadow.LicenseTypeCode IN ('SEL', 'OLP', 'OLV')) AND (dbo.ItemShadow.ItemStatusCode = 'COM')
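DISTINCT removes a row only when every selected column matches. This query joins each code to its items and also selects item-level columns such as dbo.Item.PartNumber, so a code used by two items yields two different rows. The sketch shows the duplication and one fix: keep only code-level columns and summarize the items with an aggregate. It uses a heavily simplified schema run through Python's sqlite3 module. If item columns are needed, you first have to decide which item represents each code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AgreementDurationCode (CodeID INTEGER PRIMARY KEY, CodeName TEXT);
CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, PartNumber TEXT, AgreementDurationCodeID INTEGER);
INSERT INTO AgreementDurationCode VALUES (1, '1 Year');
INSERT INTO Item VALUES (10, 'P-10', 1), (11, 'P-11', 1);
""")

# Two items share CodeID 1, so rows that include PartNumber differ and DISTINCT keeps both.
with_items = conn.execute("""
    SELECT DISTINCT c.CodeID, i.PartNumber FROM Item i
    JOIN AgreementDurationCode c ON i.AgreementDurationCodeID = c.CodeID""").fetchall()

# Keeping only code-level columns, plus an aggregate over items, gives one row per CodeID.
per_code = conn.execute("""
    SELECT c.CodeID, c.CodeName, COUNT(i.ItemID) AS ItemCount FROM Item i
    JOIN AgreementDurationCode c ON i.AgreementDurationCodeID = c.CodeID
    GROUP BY c.CodeID, c.CodeName""").fetchall()
print(len(with_items), per_code)  # 2 [(1, '1 Year', 2)]
```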
I am using SQL Server 2000. I have 3 fields in my table. I want to do distinct on one field but also want to show the remaining two fields in the result. How can I do that?
Is there some way to use the distinct keyword so that it applies ONLY to a subset of the items in the select list???
For example, suppose I want to select col_1, col_2, col_3, col_4
but I do not want distinct applied against all 4 items... Maybe I want all 4 items in the selection list, but I want distinct to use only col_3 as its filtration criteria...
I know the syntax shown below is not valid, but I am showing it anyway because I am hoping it will illustrate what I am looking for...
Is there a VALID syntax that is something like this???
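Neither SQL Server 2000 nor standard SQL has a DISTINCT that applies to only part of the select list; PostgreSQL's DISTINCT ON is a vendor extension. A portable substitute is to keep a row only if no other row with the same col_3 beats it on some tie-breaker. The sketch uses the lowest col_1 as that tie-breaker and assumes col_1 is unique within each col_3 group. It runs through Python's sqlite3 module, and the NOT EXISTS form is valid T-SQL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_1 INTEGER, col_2 TEXT, col_3 TEXT, col_4 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "a", "red", "x"), (2, "b", "red", "y"), (3, "c", "blue", "z"),
])

# Keep a row only if no other row with the same col_3 has a smaller col_1,
# i.e. "distinct on col_3" with the lowest col_1 winning.
rows = conn.execute("""
    SELECT col_1, col_2, col_3, col_4 FROM t AS a
    WHERE NOT EXISTS (SELECT 1 FROM t AS b
                      WHERE b.col_3 = a.col_3 AND b.col_1 < a.col_1)
    ORDER BY col_1
""").fetchall()
print(rows)  # [(1, 'a', 'red', 'x'), (3, 'c', 'blue', 'z')]
```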
I am trying to pull information out of a table using DISTINCT, but instead of pulling just one column, I want to pull multiple columns in a row.
However, when I use the command below, I only get the "workitem_number" column, whereas there are approximately 4 other columns that I need (workitem_title, workitem_comment, etc.). When I add the additional columns (after the DISTINCT keyword), it doesn't work due to the type of data.
"SELECT DISTINCT workitem_number from workitem_cost_view where assigned_to_worker_nt_id = '" & Request.QueryString("Name") & "' AND workitem_start_on = '" & Request.QueryString("schDate") & "' order by workitem_number"
Is there any way to pull multiple columns, based only on distinct for one column?
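One possible reading of "due to the type of data", offered as an assumption, is that workitem_comment is a text or ntext column. SQL Server does not allow those types in a DISTINCT select list. Either way, you can avoid the problem by applying DISTINCT only to workitem_number in a derived table and joining the wide columns afterwards. The sketch uses Python's sqlite3 module. The workitem table holding the title and comment is hypothetical, and a plain table stands in for the view. It also binds the query-string values as parameters, because concatenating Request.QueryString values into SQL, as the post does, is open to SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base table with one row per work item (title and comment live here).
CREATE TABLE workitem (workitem_number INTEGER PRIMARY KEY, workitem_title TEXT, workitem_comment TEXT);
-- Simplified stand-in for workitem_cost_view: several cost rows per work item.
CREATE TABLE workitem_cost_view (workitem_number INTEGER, assigned_to_worker_nt_id TEXT,
                                 workitem_start_on TEXT, cost REAL);
INSERT INTO workitem VALUES (1, 'Fix login', 'long text...'), (2, 'Add report', 'more text...');
INSERT INTO workitem_cost_view VALUES
  (1, 'jdoe', '2008-05-01', 10), (1, 'jdoe', '2008-05-01', 15), (2, 'jdoe', '2008-05-01', 5);
""")

name, sch_date = "jdoe", "2008-05-01"   # values that would come from the query string
# DISTINCT touches only the key inside the derived table; the wide columns are
# joined afterwards, and the user input is bound as parameters.
rows = conn.execute("""
    SELECT w.workitem_number, w.workitem_title, w.workitem_comment
    FROM (SELECT DISTINCT workitem_number FROM workitem_cost_view
          WHERE assigned_to_worker_nt_id = ? AND workitem_start_on = ?) AS k
    JOIN workitem w ON w.workitem_number = k.workitem_number
    ORDER BY w.workitem_number
""", (name, sch_date)).fetchall()
print(rows)  # [(1, 'Fix login', 'long text...'), (2, 'Add report', 'more text...')]
```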
Hello - I am having trouble with my sql statement:
SELECT * FROM (SELECT DISTINCT rfs_id FROM tbl_comment) DT INNER JOIN summary_rfs t1 ON DT.rfs_id = t1.rfs_id INNER JOIN tbl_callStatus as t2 on t1.callstatus_id=t2.callstatus_id INNER JOIN tbl_user as t3 on t1.requestor=t3.user_name INNER JOIN tbl_user as t4 on t1.assigned_to=t4.user_name WHERE date_opened BETWEEN '1/1/2008' AND '1/1/2009' AND ASSIGNED_TO='Elaine Tran' ORDER BY priority_id ASC
This runs fine, but I also want to retrieve data from tbl_comment. How can I display info from tbl_comment without seeing the duplicates? Can I flatten or merge the data?
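Flattening the comments means choosing what to show for each request. One option, sketched below, returns each rfs_id once with its comment count and its newest comment. That derived table could take the place of the post's (SELECT DISTINCT rfs_id FROM tbl_comment) DT. The post does not list tbl_comment's columns, so comment_id, comment_date and comment_text are hypothetical, and "newest" assumes comment_id increases over time. The sketch runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tbl_comment columns; the post does not list them.
conn.execute("""CREATE TABLE tbl_comment (comment_id INTEGER PRIMARY KEY, rfs_id INTEGER,
                comment_date TEXT, comment_text TEXT)""")
conn.executemany("INSERT INTO tbl_comment VALUES (?, ?, ?, ?)", [
    (1, 100, "2008-02-01", "Opened"), (2, 100, "2008-03-01", "Waiting on vendor"),
    (3, 200, "2008-02-10", "Done"),
])

# One row per rfs_id: how many comments it has, plus the newest one's date and text.
rows = conn.execute("""
    SELECT c.rfs_id, agg.comment_count, c.comment_date, c.comment_text
    FROM tbl_comment c
    JOIN (SELECT rfs_id, COUNT(*) AS comment_count, MAX(comment_id) AS last_id
          FROM tbl_comment GROUP BY rfs_id) AS agg
      ON agg.last_id = c.comment_id
    ORDER BY c.rfs_id
""").fetchall()
print(rows)  # [(100, 2, '2008-03-01', 'Waiting on vendor'), (200, 1, '2008-02-10', 'Done')]
```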
Hello, I have a table:
ItemID  Version
12      1.0
12      1.1
12      2.0
13      2.0
13      1.0
14      1.0
15      1.0
15      5.0
15      2.1
How do I write a SELECT query to get all distinct item IDs, each with its latest version? Like this:
ItemID  Version
12      2.0
13      2.0
14      1.0
15      2.1
Any help would be appreciated. Thanks
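Grouping by ItemID and taking MAX(Version) gives one row per item. Note that with the sample data the numeric maximum for item 15 is 5.0, while the expected output lists 2.1. If "latest" means most recently added, the table needs a date or sequence column to order by. The sketch stores Version as a number, which is an assumption. Dotted version strings such as 1.10 would need to be split into numeric parts before comparing. It runs through Python's sqlite3 module with the post's sample rows, and the table name items is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Version stored as a number; if it were text, MAX would compare strings ('10.0' < '2.0').
conn.execute("CREATE TABLE items (ItemID INTEGER, Version REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    (12, 1.0), (12, 1.1), (12, 2.0), (13, 2.0), (13, 1.0),
    (14, 1.0), (15, 1.0), (15, 5.0), (15, 2.1),
])

# One row per item with its highest version number.
rows = conn.execute(
    "SELECT ItemID, MAX(Version) FROM items GROUP BY ItemID ORDER BY ItemID").fetchall()
print(rows)  # [(12, 2.0), (13, 2.0), (14, 1.0), (15, 5.0)]
```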
I love reporting services 2005 BUT have struck a major limitation!
Basically I need a SUM DISTINCT function. I have various duplicate detail lines and just need to sum the unique values. This is not possible, and a number of people are stuck with this. Yes, you can write another SQL statement using DISTINCT, but then how can you easily integrate that into a table with scope? You can't!
Has anyone been able to achieve this nicely in Reporting Services? I was thinking of calling a DISTINCT SQL statement from an expression in a text box on a header field, passing another text box as a parameter, to get around this limitation. Is this possible?
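SQL itself has SUM(DISTINCT ...), but it removes duplicate values, not duplicate rows. Two different orders that happen to share an amount are therefore counted only once. Deduplicating on a key in the dataset query avoids that problem. The sketch contrasts both approaches using Python's sqlite3 module and a hypothetical details table. This works in the dataset query rather than in a report expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Detail rows repeat the order amount once per order line (hypothetical schema).
conn.execute("CREATE TABLE details (order_id INTEGER, line_no INTEGER, order_amount REAL)")
conn.executemany("INSERT INTO details VALUES (?, ?, ?)", [
    (1, 1, 100.0), (1, 2, 100.0),   # order 1 appears twice
    (2, 1, 100.0),                  # order 2 happens to have the same amount
    (3, 1, 50.0),
])

# SUM(DISTINCT ...) collapses equal amounts, even when they belong to different orders.
sum_distinct_values = conn.execute(
    "SELECT SUM(DISTINCT order_amount) FROM details").fetchone()[0]

# Deduplicating on the order key first counts each order exactly once.
sum_per_order = conn.execute("""
    SELECT SUM(order_amount)
    FROM (SELECT DISTINCT order_id, order_amount FROM details) AS o""").fetchone()[0]
print(sum_distinct_values, sum_per_order)  # 150.0 250.0
```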
Hi, I want to write a proc that counts the distinct CustomerIDs in a table and returns the result in an output parameter. When I try this, I get an error: incorrect syntax near distinct. Any ideas?
ALTER PROCEDURE proc_Report_CountCustomers_Sept
( @CustCount int OUTPUT )
AS
SET NOCOUNT ON
select @CustCount = distinct(CustomerID)
from Orders
Where OrderDate > '2006-09-01' and OrderDate < '2006-10-01'
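DISTINCT belongs inside the aggregate: COUNT(DISTINCT CustomerID). In the procedure, that line would read SELECT @CustCount = COUNT(DISTINCT CustomerID) FROM Orders WHERE ... . The sketch checks the query logic with Python's sqlite3 module and made-up orders. Note that OrderDate > '2006-09-01' skips orders stamped exactly at midnight on 1 September, so >= may be what was intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, OrderDate TEXT)")
conn.executemany("INSERT INTO Orders (CustomerID, OrderDate) VALUES (?, ?)", [
    ("ALFKI", "2006-09-05"), ("ALFKI", "2006-09-20"), ("BONAP", "2006-09-11"),
    ("CACTU", "2006-10-02"),   # outside the September range
])

# DISTINCT goes inside the aggregate: COUNT(DISTINCT col), not distinct(col).
cust_count = conn.execute("""
    SELECT COUNT(DISTINCT CustomerID) FROM Orders
    WHERE OrderDate > '2006-09-01' AND OrderDate < '2006-10-01'
""").fetchone()[0]
print(cust_count)  # 2
```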
OK, I have a forum on my website made up of 3 tables: Topics, Threads and Messages. I show a list of the 10 most recently changed threads. My problem is that my Subject field is in the Messages table. If I link Threads to Messages and then try to use SELECT DISTINCT, I get multiple Subject fields, because the messages are not unique (obviously). So I want to get the top 10 threads by post date and link to the Messages table to get the subject header. Any help? Or questions to explain it better?
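Rather than applying DISTINCT to the joined rows, group the messages by thread. MAX(PostDate) orders the threads, and a correlated subquery picks the subject from one message per thread, here the first one posted. The post names only the tables, so every column name is hypothetical. The sketch uses Python's sqlite3 module, so it writes LIMIT where T-SQL would use TOP 1 and TOP 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns; the post names only the Topics, Threads and Messages tables.
CREATE TABLE Threads (ThreadID INTEGER PRIMARY KEY, TopicID INTEGER);
CREATE TABLE Messages (MessageID INTEGER PRIMARY KEY, ThreadID INTEGER, Subject TEXT, PostDate TEXT);
INSERT INTO Threads VALUES (1, 1), (2, 1);
INSERT INTO Messages VALUES
  (1, 1, 'Welcome', '2007-01-01'), (2, 1, 'RE: Welcome', '2007-01-05'),
  (3, 2, 'Bug report', '2007-01-03');
""")

# Each thread appears once: MAX(PostDate) gives its latest activity, and the
# subject comes from the thread's first message.
rows = conn.execute("""
    SELECT t.ThreadID,
           (SELECT m1.Subject FROM Messages m1
            WHERE m1.ThreadID = t.ThreadID ORDER BY m1.MessageID LIMIT 1) AS Subject,
           MAX(m.PostDate) AS LastPost
    FROM Threads t JOIN Messages m ON m.ThreadID = t.ThreadID
    GROUP BY t.ThreadID
    ORDER BY LastPost DESC
    LIMIT 10
""").fetchall()
print(rows)  # [(1, 'Welcome', '2007-01-05'), (2, 'Bug report', '2007-01-03')]
```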
OK, here's what I have so far:
SELECT TOP (100) PERCENT dbo.EVENTS.EVENTIME, dbo.EMP.LASTNAME, dbo.EMP.FIRSTNAME, dbo.UDFEMP.EXT, dbo.READER.READERDESC, dbo.EVENTS.DEVID,
CASE WHEN (dbo.EVENTS.DEVID = '23' OR dbo.EVENTS.DEVID = '24' OR dbo.EVENTS.DEVID = '25' OR dbo.EVENTS.DEVID = '26') THEN 'OUT' ELSE 'IN' END AS STATUS
FROM dbo.READER
INNER JOIN dbo.EVENTS ON dbo.READER.READERID = dbo.EVENTS.DEVID
INNER JOIN dbo.UDFEMP INNER JOIN dbo.EMP ON dbo.UDFEMP.ID = dbo.EMP.ID ON dbo.EVENTS.EMPID = dbo.EMP.ID
WHERE (CONVERT(CHAR, dbo.EVENTS.EVENTIME, 101) = CONVERT(CHAR, GETDATE(), 101)) AND (dbo.EVENTS.EMPID <> 0)
ORDER BY dbo.EVENTS.EVENTIME
It works great. However, I need to display only one instance of each employee, and that instance should be the latest one found. So instead of several different employees with several different "IN" and "OUT" times:
EventTime | FirstName | LastName | Ext | ReaderDesc | DevID | Status
12/28/2006 7:22:01 AM | Rechie | Michael | 75766 | SE Glass Door | 21 | IN
12/28/2006 7:10:01 AM | Rechie | Michael | 75766 | SE Glass Door | 21 | OUT
12/28/2006 7:01:01 AM | Rechie | Michael | 75766 | SE Glass Door | 21 | IN
I just want the latest record for any given employee, regardless of whether its status is "IN" or "OUT":
EventTime | FirstName | LastName | Ext | ReaderDesc | DevID | Status
12/28/2006 7:22:01 AM | Rechie | Michael | 75766 | SE Glass Door | 21 | IN
EVENTS.EMPID is what I would want to be DISTINCT, but I don't know how to use it in the code above, and I don't know how I'd specify DISTINCT based on the latest time found. Any help/direction would be greatly appreciated. TIA, Stue
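DISTINCT cannot choose which row to keep, but a derived table can. First compute each employee's latest EVENTIME, then join back to EVENTS on both EMPID and that time to fetch the rest of the row. The sketch trims the query to EVENTS and runs through Python's sqlite3 module with made-up events. In T-SQL, the post's joins to EMP, UDFEMP and READER and its today-only filter would be added around the same derived table. Two events with exactly the same latest time would both be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed EVENTS table; the post's joins to EMP, UDFEMP and READER are left out.
conn.execute("CREATE TABLE EVENTS (EMPID INTEGER, EVENTIME TEXT, DEVID INTEGER)")
conn.executemany("INSERT INTO EVENTS VALUES (?, ?, ?)", [
    (7, "2006-12-28 07:01:01", 21), (7, "2006-12-28 07:10:01", 23),
    (7, "2006-12-28 07:22:01", 21), (9, "2006-12-28 07:05:00", 24),
])

# Find each employee's latest time, then join back to get that row's other columns.
rows = conn.execute("""
    SELECT e.EMPID, e.EVENTIME, e.DEVID,
           CASE WHEN e.DEVID IN (23, 24, 25, 26) THEN 'OUT' ELSE 'IN' END AS STATUS
    FROM EVENTS e
    JOIN (SELECT EMPID, MAX(EVENTIME) AS LastTime FROM EVENTS
          WHERE EMPID <> 0 GROUP BY EMPID) AS latest
      ON latest.EMPID = e.EMPID AND latest.LastTime = e.EVENTIME
    ORDER BY e.EMPID
""").fetchall()
print(rows)  # [(7, '2006-12-28 07:22:01', 21, 'IN'), (9, '2006-12-28 07:05:00', 24, 'OUT')]
```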
I have this stored procedure:
ALTER PROCEDURE usp_My_Procedure
( @Country varchar(5) )
AS
SELECT DISTINCT City, Short FROM Table1 WHERE Country = @Country
RETURN
I want to select just one of each 'City' and 'Short' in the database... but this is not working correctly... What's wrong? Let's say that I have a table that looks something like this:
City | Short
New York | NY
Los Angeles | LA
Lake Alice | LA
Los Angeles | LosAng
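The procedure is doing what DISTINCT promises: it removes duplicate (City, Short) pairs, and all four pairs in the example differ. To get one row per city, group by City and pick a single Short; the sketch uses MIN(Short) as an arbitrary choice. If each Short should also appear only once, the data conflicts, because 'LA' belongs to two cities, so one of the two columns has to be the deciding key. The sketch runs through Python's sqlite3 module with the post's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (City TEXT, Short TEXT, Country TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, 'US')", [
    ("New York", "NY"), ("Los Angeles", "LA"), ("Lake Alice", "LA"), ("Los Angeles", "LosAng"),
])

# DISTINCT compares whole (City, Short) pairs, so all four pairs survive.
pairs = conn.execute(
    "SELECT DISTINCT City, Short FROM Table1 WHERE Country = ?", ("US",)).fetchall()

# One row per City: pick a single Short (here the smallest) with GROUP BY.
per_city = conn.execute("""
    SELECT City, MIN(Short) FROM Table1 WHERE Country = ?
    GROUP BY City ORDER BY City""", ("US",)).fetchall()
print(len(pairs), per_city)  # 4 [('Lake Alice', 'LA'), ('Los Angeles', 'LA'), ('New York', 'NY')]
```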