CREATE TABLE [dbo].[Audit] (
[id] [int] identity (1,1) NOT NULL ,
[col1] [char] (10) NOT NULL ,
[col2] [char] (15) NOT NULL ,
[col3] [int] NOT NULL ,
[col4] [varchar] (50) NOT NULL ,
[col5] [datetime] NOT NULL ,
[col6] [varchar] (4000) NULL ,
[col7] [char] (3) NULL
)
GO
This table grows to about 40 million rows during the course of the month. The table has a clustered index on the id field and a non clustered index on the col2 and col3. The id column is not used in queries. At the moment we run weekly dbcc reindexes on all the indexes. We are running into a space issue on the reindex of the clustered index (copying the whole table out , ordering etc) and are considering dropping the index or changing to a non clustered index. (The DBCC utility that we have built will only rebuilt all the indexes or none at all.)
I feel this is not a good idea and know my reasons. I would like some input as to why this might prove a bad idea.
Will it increase page splitting? Will the table performance be impacted even if the queries are not specifically using the clustered index?
There is now a NEW SQL discussion Group currently being tested; http://www.baysignia.com/discussions/discussions.xml Try it out and send comments to faq@baysignia.com
I'm creating a discussion forum for my website, using Sql Server 2000. I need to display 'Number of threads', 'Number of posts', 'Last post by' (username/id and date) for each forum, and 'Number of replies' and 'Last post by' (username/id and date) for each thread.Here are a couple of ideas I have come up with to solve this problem:1) Poll the database (using a stored procedure that returns the number I'm looking for) for each forum when I loop all the fourms. - I suspect this approach isn't optimal, since it creates more traffic to the database.2) Have fields in my Forum and ForumPost tables for 'Number of threads' and so on. Now, create triggers that updates these fields every time a post is made. - I guess this would be much more effective than the first apporach, since everything is done on the database server directly.Are there any other ways that are better? Please advice! Thanks a bunch for any help!
I'm sure this has been a topic in the past. I would like to build a fact sheet about each one and do a compairison. Are there any articles/links/resources out there that speak to this issue?
OK, I`ve been researching the use of the TEXT datatype all day and would like opinions on what I`ve found.
First, a little background. I have been tasked with writing an ASP application to handle the display of FAQs for a company`s products. I would like to store all info in a table much like
faqID int question TEXT answer TEXT
Simple enough, right? I then tried to create a stored procedure to add a new FAQ and all hell broke loose. ASP would not pass anything larger than 255 chars to the stored procedure.
I read in the "ADO and SQL Server Developer`s Guide" from Microsoft about using varchar datatypes of 255 chars (instead of TEXT) and chunking large text up to fit in these smaller datatypes. This seems like a lot of work.
I also read in "Inside SQL Server 6.5" that "The text datatype is sometimes awkward to work with. Many functions don`t operate against text, stored procedures are limited in what they can do with text, and some tools don`t deal with it well." (page 632). This statement concerns me greatly. How are stored procedures limited in dealing with TEXT? Do the standard SQL UPDATE and INSERT commands work or must READTEXT and UPDATETEXT be used instead?
I guess my question is, what is the best way to accomplish this? I have a feeling that others have had to do this before. Is SQL Server not meant to handle large textual objects? Is chunking the best way to go? Will version 7.0 handle this scenario better?
I would like to hear your thoughts on a philosophy I adhere to.
As a rule of thumb I've always preached that Unique Indexes are for linking tables and Primary Keys are used to ensure that records aren't duplicated. I’ve embraced this philosophy for a couple reasons, the main one being that I don’t have to create numerous foreign key fields in the foreign key table.
However I’ve done most of my programming in Access and am now in need of something more robust (SQL Server v7) and I’m wondering if I need to reconsider.
I do also have a how to question; that being is it possible to create a table join on a unique index in SQL Server v7 and if so how? I would like to have an Auto Number / Auto Incremented / Unique Identifier field in the Primary Key table that links to a numeric field in the Foreign Key table.
I have been doing relational database forever(or a long time) and have been intruduces to a team that uses a highly normalized database(propietary) to manage workflow.
We are capturing data in an AUDIT Trail EAV format.(500 million rows)
It is my task to build this into a data warehouse for reporting and I need to have with my team a relational database discussion. The relational database knowledge on this team is DB2 based, IDMS, and other past evolutions.
The common processes used are recieve a flat file and process this file sequentially using C# or VB doing lookups of other databse tables and writing out another flat file to be converted in XML for load to the propritary system.
My goal is to attempt to introduce new design concepts to my team and these are some talking points that I have come up with for a lunch and learn session.
can anyone else add to this list I don't want to get into a deep discussion about 3rd NF, Star Schemas vs Snowflake, etc.. I want to keep is informational and light to eliceit discussion and relat it back to older technologies.
some of the topics we can discuss are: Why the data warehouse Real-time tables what needs to stay in prod What is going to happen to reporting database Interaction between database on the same cluster/server Interaction between databases on different servers (linked servers not allowed) Set processing as opposed to cursor processing. Table types EAV Type1 Type2 Fact Dimensions Code
I'm a DB teacher in a University. Planning and developing SQL exercises for my students I found a "tricky" or extrange sql exercise.
The sql exercise using Northwind DB reads as follows: "Wich is the total amount of the freight that corresponds to all orders containing products of seafood category ?"
The first sql code that comes to my mind was:
select sum(freight) from orders o, "order details" d, products p, categories c where o.orderid=d.orderid and d.productid=p.productid and p.categoryid=c.categoryid and categoryname='seafood'
This is equivalent code constructed with Query constructor in Enterprise Manager: SELECT SUM(dbo.Orders.Freight) AS Expr1 FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID WHERE (dbo.Categories.CategoryName = N'seafood')
These 2 equivalent queries output 27722.9600 as result, but the correct result (total freight) is 23791.1400.
Did you also think (as I did) that the exposed code is correct?
Now I know the correct sql code that gives the corect answer and also the explanation why the code exposed fails, but before share it under your posible request, want to know your tech comments about this exercise in order to know if this particular exercise is "tricky" (make people to fail) or is my total fault and need to review my strategies of applying/constructing sql (and teaching).
I am about to prepare a paper concerning the field of real-time data mining. Real-time here means the process of incremental training of an existing model as soon as the data arrives.
There is a number of papers introducing algorithms for incremental association analysis, incremental clustering etc. Stream mining Ãs a field which is closely related to that. The main reason for the implementation of incremental algorithms is a) the large amount of data to be mined and b) the high rate of new data that is evolving every day.
Using classical batch mining algorithms, models that are outdated for some reason, would have to be re-trained, which could be very time consuming for billions of records. And once the training is completed, the training would have to be restarted once again because a bulk of new data has been arrived.
The question that I would like to discuss now is: For what real world applications would it be a meaningful or even essential to use real-time training of models?
Two main reasons could determine the answer to that question:
You just want to incorporate new data into existing models in order to increase the prediction accuracy of your model or Your underlying data is subject to more or less massive changes (also refered to as concept drift) and you want to adapt your mining model continuously to that reality.
I'm looking for some examples or ideas where one of these cases apply and it would be a good idea to have incremental mining algorithms involved.
I'm looking forward to inspiring some discussion on that issue.
please explain the differences btween this logical & phisicall operations that we can see therir graphical icons in execution plan tab in Management Studio
I am using Full Text Index to index emails stored in BLOB column in a table. Index process parses stored emails, and, if there is one or more files attached to the email these documents get indexed too. In result when I'm querying the full text index for a word or phrase I am getting reference to the email containing the word of phrase if interest if the word was used in the email body OR if it was used in any document attached to the email.
How to distinguish in a Full Text query that the result came from an embedded document rather than from "main" document? Or if that's not possible how to disable indexing of embedded documents?
My goal is either to give a user an option if he or she wants to search emails (email bodies only) OR emails AND documents attached to them, or at least clearly indicate in the returned result the real source where the word or phrase has been found.
Web Base application or PDA devices use to initiate the order from all over the country. The issue is this table is not Partioned but good HP with 30 GB RAM is installed. this is main table that receive 18,0000 hits or more. All brokers and users are using this table to see the status of their order.
The always search by OrderID, or ClientID or order_SubNo, or enter any two like (Client_ID+Order_Sub_ID) or any combination.
Query takes to much time when ever server receive more querys. some orther indexes are also created on the same table like (OrderDate, OrdCreate Date and Status)
My Question are:-
Q1. IF Person "A" query to DB on Client_ID, then what Index will use ? (If any one do Query on any two combination like Client_ID+Order_ID, So what index will be uesd.? How does MS-SQL SERVER deal with these kind of issues.?
Q2. If i create 3 more indexes on ClientID, ORderID and OrdersubID. will this improve the performance of query.if person "A" search record on orderNo so what index will be used. (Mind it their would be 3 seprate indexes for Each PK columns) and composite-Clustered index is also available.?
Q3. I want to check what indexes has been used? on what search?
Q4. How can i check what table was populated when, or last date of update (DML)?
My Limitation is i Dont Create a Partioned table. I dont have permission to do it.
In Teradata we had more than 4 tb record of CRM data with no issue. i am not new baby in db line but not expert in sql server 2003.
My SSIS package is running very slow taking so much time to execute, One task is taking 2hr for inserting 100k records, i have disabled unused index still it is taking time.I am rebuilding/Refreshing indexes and stats once in month if i try to execute on daily basis will it improve my SSIS Package performance?Â
hello friends i have table1 and 200 coulumn of table1 :) i have 647.600 records. i entered my records to table1 with for step to code lines in one day :) i select category1 category2 and category3 with select code but i have just one index.. it is productnumber and it is primarykey..So my select code lines is so slow.. it is 7-9 second.. how can i select in 0.1 second ? Should i create index for category1 and category2 and category3 ? But i dont know create index.. My select code lines is below.. Could you learn me and show me index for it ?? or Could you learn me and show me fast Select code lines and index or etc ??? Also my search code line have a dangerous releated to attaching table1 with hackers :) cheersi send 3 value of treview1 node and childnode and child.childnode to below page.aspx :) Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load If Not Me.IsPostBack Then If Request("TextBox1") IsNot Nothing ThenTextBox1.Text = Request("TextBox1") End If If Request("TextBox2") IsNot Nothing ThenTextBox2.Text = Request("TextBox2") End If If Request("TextBox3") IsNot Nothing ThenTextBox3.Text = Request("TextBox3") End If End If Dim searchword As String If Request("TextBox3") = "" And Request("TextBox2") = "" Then searchword = "Select * from urunlistesi where kategori= '" & Request("TextBox1") & "'" End If If Request("TextBox3") = "" Then searchword = "Select * from urunlistesi where kategori= '" & Request("TextBox1") & "' and kategori1= '" & Request("TextBox2") & "'" End If If Request("TextBox3") <> "" And Request("TextBox2") <> "" And Request("TextBox1") <> "" Then searchword = "Select * from urunlistesi where kategori= '" & Request("TextBox1") & "' and kategori1= '" & Request("TextBox2") & "' and kategori2= '" & Request("TextBox3") & "'" End If SqlDataSource1.SelectCommand = searchword End Sub
I'm running a merge replication on a sql2k machine to 6 sql2k subscribers. Since a few day's only one of the merge agents fail's with the following error:
The merge process could not retrieve generation information at the 'Subscriber'. The index entry for row ID was not found in index ID 3, of table 357576312, in database 'PBB006'.
All DBCC CHECKDB command's return 0 errors :confused: I'm not sure if the table that's referred to in the message is on the distribution side or the subscribers side? A select * from sysobjects where id=357576312 gives different results on both sides . .
Hi everyone, When we create a clustered index firstly, and then is it advantageous to create another index which is nonclustered ?? In my opinion, yes it is. Because, since we use clustered index first, our rows are sorted and so while using nonclustered index on this data file, finding adress of the record on this sorted data is really easier than finding adress of the record on unsorted data, is not it ??
I have a clustered index that consists of 3 int columns in this order: DateKey, LocationKey, ItemKey (there are many other columns in this data warehouse table such as quantities, prices, etc.).
Now I want to add a non-clustered index on just one of the other columns, say LocationKey, like this: CREATE INDEX IX_test on TableName (LocationKey)
I understand that the clustered index keys will also be added as key columns to any NC indexes. So, in this case the NC index will also get the other two columns from the clustered index added as key columns. But, in what order will they be added?
Will the resulting index keys on this new NC index effectively be:
LocationKey, DateKey, ItemKey OR LocationKey, ItemKey, DateKey
Do the clustering keys get added to a NC index in the same order as they are defined in the clustered index?
Quick question about the primary purpose of Full Text Index vs. Clustered Index.
The Full Text Index has the purpose of being accessible outside of the database so users can query the tables and columns it needs while being linked to other databases and tables within the SQL Server instance. Is the Full Text Index similar to the global variable in programming where the scope lies outside of the tables and database itself?
I understand the clustered index is created for each table and most likely accessed within the user schema who have access to the database.
Is this correct?
I am kind of confused on why you would use full text index as opposed to clustered index.
One of the major syntax inside the SELECT statment is ..
WHERE FIELDA IN (SELECT PARAVALUE FROM PARATABLE WHERE SESSIONID = "XXXXX" AND PARATYPE='A') AND WHERE FIELDB IN (SELECT PARAVALUE FROM PARATABLE WHERE SESSIONID = "XXXXX" AND PARATYPE='B') AND WHERE FIELDC IN (SELECT PARAVALUE FROM PARATABLE WHERE SESSIONID = "XXXXX" AND PARATYPE='C') AND WHERE FIELDD IN (SELECT PARAVALUE FROM PARATABLE WHERE SESSIONID = "XXXXX" AND PARATYPE='D') AND WHERE FIELDE IN (SELECT PARAVALUE FROM PARATABLE WHERE SESSIONID = "XXXXX" AND PARATYPE='E') AND WHERE FIELDF IN (SELECT PARAVALUE FROM PARATABLE WHERE SESSIONID = "XXXXX" AND PARATYPE='F')
(It's to compare the field content with some user input parameter inside a parameter table... )
I think properly is that the SELECT ... IN is causing much slowness in the sql statement. I have indexed FIELDA , FIELDB, FILEDC etc and those PARAVALUE and PARATYPE in the PARATABLE table. But perfromance is still slow and execution takes >20 seconds for 200000 rows of records.
Do any one know if still any chance to improvide the performance like this?
I'm trying to find whether there is a dmv or system view that can help me see the last time an index was rebuilt or created. Assuming I rebuilt an index using tsql commands (not a job with a history), is there a way to find out the last time that index was rebuilt?
SELECT a.AssetGuid, a.Name, a.LocationGuid FROM Asset a WHERE a.AssociationGuid IN ( SELECT ada.DataAssociationGuid FROM AssociationDataAssociation ada WHERE ada.AssociationGuid = '568B40AD-5133-4237-9F3C-F8EA9D472662')
takes 30-60 seconds to run on my machine, due to a clustered index scan on our an index on asset [about half a million rows]. For this particular association less than 50 rows are returned.
expanding the inner select into a list of guids the query runs instantly:
SELECT a.AssetGuid, a.Name, a.LocationGuid FROM Asset a WHERE a.AssociationGuid IN ( '0F9C1654-9FAC-45FC-9997-5EBDAD21A4B4', '52C616C0-C4C5-45F4-B691-7FA83462CA34', 'C95A6669-D6D1-460A-BC2F-C0F6756A234D')
It runs instantly because of doing a clustered index seek [on the same index as the previous query] instead of a scan. The index in question IX_Asset_AssociationGuid is a nonclustered index on Asset.AssociationGuid.
The tables involved:
Asset, represents an asset. Primary key is AssetGuid, there is an index/FK on Asset.AssociationGuid. The asset table has 28 columns or so... Association, kind of like a place, associations exist in a tree where one association can contain any number of child associations. Each association has a ParentAssociationGuid pointing to its parent. Only leaf associations contain assets. AssociationDataAssociation, a table consisting of two columns, AssociationGuid, DataAssociationGuid. This is a table used to quickly find leaf associations [DataAssociationGuid] beneath a particular association [AssociationGuid]. In the above case the inner select () returns 3 rows.
I'd include .sqlplan files or screenshots, but I don't see a way to attach them.
I understand I can specify to use the index manually [and this also runs instantly], but for such a simple query it is peculiar it is necesscary. This is the query with the index specified manually:
SELECT a.AssetGuid, a.Name, a.LocationGuid FROM Asset a WITH (INDEX (IX_Asset_AssociationGuid)) WHERE a.AssociationGuid IN ( SELECT ada.DataAssociationGuid FROM AssociationDataAssociation ada WHERE ada.AssociationGuid = '568B40AD-5133-4237-9F3C-F8EA9D472662')
To repeat/clarify my question, why might this not be doing a clustered index seek with the first query?
Found out a while back that my facts-tabel has an non-clustered index on its facts_id. In a bunch of procedures an update is executed against a facts_id unfortunately on it's facts-table. I was wondering if changing it into a clustered index is worth the effort / would make sense considering a +110 million facts and re-indexing the other indexes as well? Facts are loaded sequentially, so I would suspect them facts are in the ordered already?