I am developing an application that has a table with lots of records (network traffic), but the data is summarized every so often to create summary records (old records are deleted). The problem is that I have a PK based on an autoincrement ID (int) that will run out of numbers. However, this ID is not referenced anywhere (it is not a foreign key from another table, it is not used for deletion, and there are no updates to this table whatsoever).
So my possibilities are:
1.- reseed the id when it is about to run out.
2.- make the id bigint
3.- remove the id and change the PK to 2 other fields
4.- remove the id and go without a PK
I am leaning toward option 4, because I do not see the need for a PK, but I understand that it is quite out of the ordinary. So I would like to hear from other people (I do not have much experience with databases).
I also like option 3. I already have an index on one of the other fields (time).
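For reference, a minimal sketch of what options 1 and 3 could look like in T-SQL, assuming a hypothetical table dbo.TrafficLog with an identity column ID and two other columns TrafficTime and SourceIP (all names are illustrative, not from the original post):

-- Option 1: reseed the identity once the low-numbered rows have been summarized away.
-- Only safe if no surviving row still carries a low ID value.
DBCC CHECKIDENT ('dbo.TrafficLog', RESEED, 0);

-- Option 3: drop the identity column and promote two other fields to a composite PK.
-- Only valid if (TrafficTime, SourceIP) really is unique for every row.
ALTER TABLE dbo.TrafficLog DROP CONSTRAINT PK_TrafficLog;   -- hypothetical constraint name
ALTER TABLE dbo.TrafficLog DROP COLUMN ID;
ALTER TABLE dbo.TrafficLog
    ADD CONSTRAINT PK_TrafficLog PRIMARY KEY CLUSTERED (TrafficTime, SourceIP);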
I am wondering whether tempdb temporarily stores all results whenever I query a large fact table with over 4 million records joined to another dimension table. Each time I run the query, tempdb grows to nearly 1 GB, which nearly runs out all the space on my local system drive, and as a result performance drops badly. Is there any way to fix this problem? Thanks a lot in advance; I look forward to hearing your advice.
If I use BCP to export a very large table, will that table be blocked for writes during the export process? I don't want to prevent users from accessing that table during the bcp process. Thank You, TFD.
Hi, I have this page that uploads PDFs to a table. In principle this works fine, until I try to upload large files (3 to 4 MB), and I need to upload even larger files than that (I don't really know yet what users are going to come up with). I get timeout problems. Now some people say it is not possible to exceed a limit of about 4 MB, but that there is a workaround by changing something in the web.config file. Can somebody give me info about that? (I am quite a novice really.) I tried to change it like this, but to no avail: <system.web><httpRuntime maxRequestLength="102400" enable="True" requestLengthDiskThreshold="102400" useFullyQualifiedRedirectUrl="True" executionTimeout="102400" /></system.web> Thanks for any help!
On a nightly basis, this table gets rebuilt in a temporary database. Once the table has been built and scrubbed, I need to move it into our web server's db.
I'd like to do this with minimal interruption to the website.
Possible techniques:
1) I could set up a DTS package to copy the table object overwriting the destination table
2) I could export to a flat file and then bulk import into the live table (after truncating it)
3) I could run a process to update smaller chunks of data at a time running delete queries and insert queries.
Anybody have a thought on the best way to do this so that the web users would be virtually unaware that anything was happening?
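As a rough sketch of a swap-based variant of options 1/2 (not a definitive recommendation, and all object names here are hypothetical), the scrubbed data could be loaded into a staging table and then swapped in with sp_rename, so the live table is only unavailable for the instant the renames run:

-- Load the freshly scrubbed data into a staging table while the live table keeps serving reads
SELECT * INTO dbo.MyTable_Staging FROM TempDatabase.dbo.MyTable;
-- (re-create any indexes on dbo.MyTable_Staging here)

-- Swap the tables inside a short transaction; the renames are metadata operations
BEGIN TRAN;
    EXEC sp_rename 'dbo.MyTable', 'MyTable_Old';
    EXEC sp_rename 'dbo.MyTable_Staging', 'MyTable';
COMMIT;

DROP TABLE dbo.MyTable_Old;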
I am absolutely innocent as far as T-SQL is concerned. I need to detect all duplicates (the key consists of 5 fields) in the table and delete them. I tried different approaches, like joins, etc., but with no luck. Any help is appreciated. Thanks.
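One common pattern for this (a sketch only, assuming SQL Server 2005 or later and hypothetical column names Key1 through Key5) is to number the duplicates with ROW_NUMBER and delete everything past the first copy of each key:

;WITH Dupes AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Key1, Key2, Key3, Key4, Key5
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.MyTable
)
DELETE FROM Dupes
WHERE rn > 1;   -- keeps one row per 5-field key and removes the rest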
OK, I imported 680 million records into an unindexed table. That went well.
Then, I went into Enterprise Manager and added a two column non-unique clustered index to that table to speed access.
It's been running for ~36 hours and I have no idea when it will complete. I have deadlines that I'm going to miss and am very nervous; what can I do?
SQL Server 2000 Enterprise Edition (8.00.818 - SP3 + hotfixes)
Dual 3GHz Xeon (two physical CPUs, each with HyperThreading enabled)
Windows 2000 SP4
4GB RAM (although I just noticed the /3GB OS switch wasn't on)
SCSI boot drive
tempdb, data, and transaction log are on a FibreChannel RAID SAN
Hi folks! I'm looking for advice on partitioning a large table. In the DDL below I've changed names to protect the guilty.
My table has this schema:
CREATE TABLE [dbo].[BigTable] (
    [TimeKey]   [int] NOT NULL,
    [SegmentID] [int] NOT NULL,
    [MyVal]     [tinyint] NOT NULL
) ON [BigTablePS1] (TimeKey) -- see below for partition scheme
-- will evaluate whether this one is needed; my thinking is yes,
-- based on the expected select queries.
create index NCI_SegmentID on BigTable(SegmentID asc)
The TimeKey column is sort of like a unix time. It's the number of minutes since 2001/01/01, but always floored to a 5-minute boundary, so only multiples of 5 are allowed.
Now, this table will be rather big. There are about 20k possible SegmentIDs. For every TimeKey from 2008/01/01 to 2009/01/01 (12 months), I'll have on the order of 20000 rows, one for each SegmentID.
For the 12 month period, there are 365*24*60/5=105120 possible TimeKey values. So the total rowcount is over 2 billion. (20k * 105120)
Select queries are expected to be something like this:
-- fetch just one particular row...
select MyVal
from BigTable
where TimeKey=5555 and SegmentID=234234
-- fetch for a certain set of SegmentID and a particular time...
select b.SegmentID, b.MyVal
from BigTable b
join OtherTable t on t.SegmentID=b.SegmentID
where b.TimeKey=5555 and t.SomeColumn='SomeValue'
Besides selects, I also need to be able to efficiently issue update statements against the table, setting new values in the MyVal column based on a range of TimeKey values (a contiguous span of a few days) and a set of about 1000 SegmentIDs. Updates would always look like this:
update t
set t.MyVal=p.MyVal
from BigTable t
join #myTempTable p on t.TimeKey=p.TimeKey and t.SegmentId=p.SegmentId
where #myTempTable would have on the order of 1000*24*60 rows in it, all with contiguous TimeKey values, and about 1000 different SegmentID values. #myTempTable also has a clustered PK on (TimeKey asc, SegmentId asc).
After the table is loaded, it would never get any inserts or deletes; only selects and updates.
Given the size, and the nature of the select and update queries, this table seems like a good candidate for partitioning. I'm thinking it makes sense to partition on TimeKey.
So my question is, is it stupid to create a separate partition for each day in the year long span of TimeKeys this table covers? That would mean 365 partitions in the partition function and partition scheme. Something like this:
CREATE PARTITION FUNCTION [BigTableRangePF1] (int)
AS RANGE LEFT FOR VALUES (
    3680640 + 0*1440,   -- 3680640 is the number of minutes between 2001/01/01 and 2008/01/01
    3680640 + 1*1440,
    3680640 + 2*1440,
    3680640 + 3*1440,
    ...snip...
    3680640 + 363*1440,
    3680640 + 364*1440,
    3680640 + 365*1440
);
GO
CREATE PARTITION SCHEME [BigTablePS1]
AS PARTITION [BigTableRangePF1]
TO (
    [PRIMARY], [PRIMARY], [PRIMARY],
    ...snip...
    [PRIMARY], [PRIMARY], [PRIMARY]
);
GO
Does anyone have any experience with partitioned tables with so many partitions? Is a few hundred partitions too many? From my understanding of partitions, it seems like having so many will be OK. Is it somehow worse than having hundreds of tables in a database?
Even with one partition for each day, I'll still have 24*60*20000/5 ≈ 5.8 million rows in each one.
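If it helps sanity-check the layout once the table is loaded, the $PARTITION function can report the actual row count per partition; a small sketch against the DDL above:

SELECT $PARTITION.BigTableRangePF1(TimeKey) AS PartitionNumber,
       COUNT_BIG(*)                         AS RowCnt
FROM dbo.BigTable
GROUP BY $PARTITION.BigTableRangePF1(TimeKey)
ORDER BY PartitionNumber;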
Greetings All, I was wondering what would happen if I were to do a "select * from table" on a table that has about 5 million rows. Would my read block other writers to the same table? Would it block other readers? I know SQL uses optimistic locking by default, but I am not sure what this means for other users trying to access the same table. Any advice would be greatly appreciated. TFD
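For context (a hedged note, not part of the original question): out of the box, SQL Server readers under READ COMMITTED take shared locks, so a long scan can indeed block writers. A sketch of the two usual workarounds, with a hypothetical table and database name:

-- Dirty read: takes no shared locks, but can return uncommitted data
SELECT * FROM dbo.Orders WITH (NOLOCK);

-- Or turn on row versioning (SQL Server 2005 and later) so readers don't block writers at all
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;   -- needs a moment with no other active connections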
Quick question: Does SQL do table/schema changes "in place"? I've got a large table (140+ million rows of very wide data) that we want to change the schema on -- basically to remove a number of the unused data elements that we don't use. Anyway, does anyone know if SQL will do an in-place change, or if it will copy the table to a new table, thereby increasing my space allocation needs? I'd effectively, temporarily, need space for two tables while the change is happening if it copies the table first. This is not good, as I do not have enough available space at the moment. If you've got pointers to specific MS docs regarding this issue, please let me have 'em. Thanks in advance.
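For what it's worth, a hedged sketch of the usual in-place path (hypothetical names): dropping a column with ALTER TABLE is a metadata change and does not rewrite the existing rows, so no second copy of the table is needed; the space is only reclaimed later when the table is rebuilt or cleaned:

-- Metadata-only: existing rows are not rewritten at this point
ALTER TABLE dbo.WideTable DROP COLUMN UnusedCol1, UnusedCol2;

-- Space from dropped variable-length columns can be reclaimed later, when convenient:
DBCC CLEANTABLE ('MyDb', 'dbo.WideTable');
-- or by rebuilding the clustered index, which does need its own work space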
I have a query that takes 12 minutes to execute. The query uses around 9 tables, but I have narrowed down the problem to one table that has over 65 million rows. The problem table has only 3 fields.
The query uses the primary key of this table to perform the join. FieldTwo and FieldThree are only used as output parameters.
I noticed if I remove FieldTwo and FieldThree from the output (but still leave the table in the query), the query executes in 1 second. However if I include FieldTwo and FieldThree in the output, the query takes over 12 minutes to execute.
I cannot index FieldTwo and FieldThree because of the field size, and I cannot reduce the size of the fields because of the data that needs to be stored in them. How can I index them, or do something similar, to speed up the table lookup?
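If the server is SQL Server 2005 or later and the two fields are not legacy text/image types, one hedged option (a sketch with hypothetical names) is a covering index that keeps the primary key as the index key and carries the two wide fields as INCLUDEd columns, which are not subject to the 900-byte index key limit:

CREATE NONCLUSTERED INDEX IX_ProblemTable_Covering
ON dbo.ProblemTable (KeyField)
INCLUDE (FieldTwo, FieldThree);   -- the wide columns live only at the leaf level, so the join can be answered from this index alone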
I have been asked to look at some performance issues with an application that utilises an 800GB table. This table is huge and contains 4 int columns and 1 decimal column. The table has a clustered index that covers the 4 int columns; it is heavily fragmented and has not been maintained for a long time. The system has limited free space to even attempt rebuilding the index. Does anyone have any experience of running the ALTER INDEX REORGANIZE command on such a large table? Any information on what storage would be required to attempt this, and how long it would take?
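For reference, a sketch of the command in question (assuming SQL Server 2005+ and a hypothetical index name). REORGANIZE works in small transactions, is always online, can be stopped and resumed, and needs very little free space compared with a full rebuild:

ALTER INDEX CIX_BigTable
ON dbo.BigTable
REORGANIZE;

-- optionally compact LOB data as well:
-- ALTER INDEX CIX_BigTable ON dbo.BigTable REORGANIZE WITH (LOB_COMPACTION = ON);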
I have to migrate an Oracle DB to SQL Server 2005, including a 450 GB table with images. The estimate is that it will take about 24 hours to move this data. I'm using SSIS with just one OLEDB input and sending to one OLEDB output. The SSIS process will be running on the destination SQL Server with 2 GB of memory and at least half that memory in use by other apps (including SQL Server). I tried the SQL Server Destination but received errors trying to use it.
Are there any suggestions on settings or the best way to do this?
I have 4 tables with the respective amounts of records: 1) 6755 2) 2021 3) 2021 4) 355
They all have the same columns. However, they need to be separate, or at least when I query them. I'll be accessing this database via the web. I was first afraid that a large database would cause major slowdown when accessing the db, so I broke it up into 4 tables. If I combined all 4 tables into one large table and just had a column that differentiated the 4, how significant would the change in speed be when accessing the table? It's not a big deal to keep them separate, it's just that when I have to add or remove a column from one table I have to remove it from all the tables. Furthermore, I'm using a module from DEVEXPRESS (don't know if anyone has heard of it), but when you use a gridview it loads up the entire table even though you're paging (which I think is ridiculous), so for that reason I was afraid it would slow up my access to the db. Any thoughts?
We have a table that we BCP into; the data is then processed and inserted into its appropriate table. Then the table or its data needs to be removed. This seems to be a very slow operation. I have tried DROP TABLE and TRUNCATE TABLE, and each takes nearly as long as the BCP operation. The table has 12 million rows. I didn't think either operation wrote to the transaction log except for page/extent management. Why are the drop and truncate so slow? Suggestions?
The purchased application (MSSQL 6.5, SP4) that I am working on has one large table with 13 million rows. It is the largest table, considering the next-largest table is only 1.75 million rows.
The vendor has made a change to this largest table, recommending changing a data type from char to varchar. To make this change easier to do, I want to "archive" older data not necessary for the current year or current processing to another table.
What is the best way to do this archiving?
Any information you can provide will be greatly appreciated. Thanks.
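A hedged sketch of the usual archive pattern, batched so the transaction log stays manageable (table, column, and date values are hypothetical; SET ROWCOUNT is used because this is a 6.5-era server):

-- 1. Copy the old rows into an archive table
INSERT INTO dbo.BigTable_Archive
SELECT * FROM dbo.BigTable
WHERE TranDate < '19990101'      -- cutoff value is illustrative only

-- 2. Delete them from the live table in small batches
SET ROWCOUNT 5000
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.BigTable WHERE TranDate < '19990101'
    IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0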
I have a table that currently holds about 5 million records. We add an average of 5,000 new records per day, all of them from overnight batch jobs. I guess it's not that big, but there are two text columns that hold a couple of KB each, so the total size isn't exactly small either. The data is created from medical billing data we receive overnight. We get two reports: patient demographic information and a physician's dictation relating to that patient. This data is always one to one, and the purpose of this table is to store the data as we originally receive it, which is why both reports are in the same table. After we extract the details from the report (which by this point is always reduced to text documents), we need to keep not only the data but the original documents, hence the two text columns. We considered moving the large columns to their own table, which would just have an ID field and the column, but the powers that be really wanted all this in the same table. Nothing new goes into the table during the day; it's all SELECT statements.
I need to add a column to this table. It's just a small char(7) column, NULLS allowed, of course. We bill for several clients, and reports from different clients become available at different times, so there's really no down time overnight. Altering the table during the day is out of the question. So how can I add a column while the table is active?
My best idea so far is to use SELECT *, NULL AS NewColumn INTO NewTable (using a cast to get the correct datatype) to create a copy of the table during the day, when no new data is going in, and then replace the old table with the new one by simply changing the names right after everyone goes home. But this could still cause slowdowns while it builds the copy, and it leaves the problem of re-creating indexes (there are several). There ought to be some graceful way to tell it to add the column to the existing table and play nice with ongoing traffic.
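A hedged note that may make the copy unnecessary: adding a NULLable column with no default is normally a metadata-only change in SQL Server, so it does not rewrite the existing rows and only needs a brief schema lock. A sketch with a hypothetical table name:

-- Metadata-only for a NULLable column with no default;
-- takes a short schema-modification lock but does not touch the existing 5 million rows.
ALTER TABLE dbo.BillingDocuments
    ADD ClientCode char(7) NULL;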
I have a few hundred users, maybe a dozen or two active at any given time, accessing the same database via ASP. The database has many tables, one being a very large orders table with a few million records, against which I have created a view. A view only because I need to allow the user to filter quite extensively against the results. The users typically only need to view records for the last 30 days, and results for each user might be five thousand records or less.
My question is this: would I be better off writing each user's resultset to a temp table for that user's session, letting that user's filtering and sorting go against the temp table, and increasing my hardware requirements to accommodate that (possibly to the point of creating a database cluster)? Or would I be better off leaving it as is, where each user uses the same view?
FYI... each user may need visibility to only a handful of fields, but overall the view must maintain many fields.
Any thoughts on this would be greatly appreciated. Thanks in advance.
Using SQL Server 2000, SP1, with 4 GB max memory allocated to the instance. The problem is that one large table is hogging cache and it's dragging down overall query performance. I realise it's in cache because it's getting queried regularly. However, I need to know what options exist to get around this problem -- to free up some cache for other tables and indexes. Of course, there is the option of archiving off some of the data in the table to reduce its size, and we will look at doing this, although it will not be as easy as it sounds.
I can imagine that there must be many databases that have at least one large table that is getting hit regularly and is left in cache more-or-less permanently. Therefore, I can't believe I have an unusual problem.
What's the most efficient way to store the following information:
* Table contains 1 million listings
* Each listing can be geo-targeted to any of the 200+ countries
* Searches return listings based on geo-location
Storage options:
Option #1 (normalized)
* ListingsTable (PK listingID int) [1 million rows]
* ListingGeoLocations (listingID, geoLocationID) [could be up to 200 million rows]
Option #2 (denormalized)
* ListingsTable (PK listingID int, binary(32) with a bit-mask consisting of 200 bits, one for each location)
Does anyone have experience with similar structures? Which option is more efficient?
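For concreteness, a sketch of what option #1 could look like as DDL (all names are illustrative):

CREATE TABLE dbo.Listings (
    ListingID int NOT NULL PRIMARY KEY
    -- other listing columns...
);

CREATE TABLE dbo.ListingGeoLocations (
    ListingID     int      NOT NULL REFERENCES dbo.Listings (ListingID),
    GeoLocationID smallint NOT NULL,
    CONSTRAINT PK_ListingGeoLocations
        PRIMARY KEY (GeoLocationID, ListingID)   -- leading on location supports "listings visible in country X"
);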
What would be the best way to select summary data from a big table with detail records and insert it into another table?
The source table contains approx 100 million detail records for a couple of months. I have a select statement that selects a summary of the latest month's data, averaging about 15 million records for the month, which I want to insert into another table. In the past I just used a standard INSERT INTO statement, but that is not the best way of doing it.
If a view is created with just the last month's summary data and I select from the view, will the performance be better, or will it just add more overhead?
Would an SSIS package work better for inserting the summary data?
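As a rough sketch (hypothetical names and dates, and assuming a version/recovery model where INSERT ... WITH (TABLOCK) qualifies for minimal logging), a plain set-based insert of the summary is usually hard to beat; a view over the same SELECT adds no performance by itself, since it is expanded into the query anyway:

INSERT INTO dbo.MonthlySummary WITH (TABLOCK)   -- TABLOCK allows minimal logging when the table and recovery model are eligible
SELECT  AccountID,
        SUM(Amount) AS TotalAmount,
        COUNT(*)    AS DetailCount
FROM    dbo.DetailRecords
WHERE   TranDate >= '20240101' AND TranDate < '20240201'   -- the latest month; dates illustrative
GROUP BY AccountID;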
I just want to know: I have a table with a large number of records, almost 7 lakh (about 700,000). When I set auto-increment on it, it gives me a timeout error. Is there any easy way to turn auto-increment on?
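If the timeout comes from doing this in the table designer (which copies and rebuilds the whole table), a hedged alternative is to add the identity column directly in T-SQL; a sketch with hypothetical names:

-- Adds a new identity column; existing rows get numbered as part of the operation,
-- so it still takes time on ~700,000 rows, but it avoids the designer's full table rebuild and its timeout.
ALTER TABLE dbo.MyBigTable
    ADD RecordID int IDENTITY(1,1) NOT NULL;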
SQL Server 7/2000: We have reasonably large tables (3,000,000 rows) that we need to add some indexes to. In a test, it took over 12 hours to CREATE a new INDEX against this table. One of us suggested that we create a temp table with the new index and copy the data from the old table into the new one, then rename it. I understand this took 15 minutes. Why the heck would it be faster to move the data and build multiple indexes incrementally vs adding an index?
I'm working with a table with about 60 million records. This monster is growing every minute of the day as well, by 200,000 - 300,000 records/day. It's 11 columns wide, and has one index on a datetime column. My task is to create some custom reports based on three of these columns, including the datetime one.
The problem is response time. Any query executed on this table takes forever--anywhere between 30 seconds and 4 minutes. Queries such as this one below, as simple as it is, can take a minute or more:
select count(dt_date) as Searches
from SearchRecords
where datediff(day,getdate(),dt_date)=0
As the table gets larger and larger, the response time is going to get worse and worse. Long story short, what are my options to get the speed of queries down to just a few seconds with a table this big? So far the best I can come up with is to index any other appropriate columns (of which there is one for sure, maybe two).
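Part of the slowness in the example above is that wrapping dt_date in DATEDIFF stops the existing index on the datetime column from being used for a seek. A hedged rewrite of that same query with a sargable range predicate:

-- Same "today's searches" count, expressed as a range on dt_date
-- so the index on dt_date can be seeked instead of scanning every row.
DECLARE @today datetime, @tomorrow datetime;
SET @today    = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);
SET @tomorrow = DATEADD(day, 1, @today);

SELECT COUNT(*) AS Searches
FROM SearchRecords
WHERE dt_date >= @today
  AND dt_date <  @tomorrow;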
I have a table with a couple hundred billion records (SQL Server 2005). When I do a select count(*) from tblx -- it takes this side of forever. Is it possible to count partitions and then add them up to make it faster?
How could I improve the performance of count(*) on this huge table? Note: if the partition idea sounds viable, what would that look like?
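One hedged way to get the total without scanning the table at all (SQL Server 2005+) is to sum the per-partition row counts the storage engine already tracks; a sketch:

-- Sums the row counts tracked in metadata for every partition of the table.
SELECT SUM(ps.row_count) AS TotalRows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.tblx')
  AND ps.index_id IN (0, 1);   -- heap or clustered index only, so rows aren't double-counted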
We have a table that is 800GB. We are planning to rebuild the clustered index on this table to a different filegroup. The new filegroup and the files associated with it will sit on a SAN which will have a 1.5TB allocation. Does anyone have any suggestions regarding how many files to associate with the filegroup to provide optimal performance? Apparently we could have 3 LUNs (500GB each), so would one file on each LUN provide additional performance as opposed to one file on one LUN?
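For illustration only, a sketch of the one-file-per-LUN layout and the rebuild onto it (database name, drive letters, sizes, and index name are all hypothetical):

ALTER DATABASE MyDb ADD FILEGROUP FG_BigTable;

ALTER DATABASE MyDb ADD FILE (NAME = 'BigTable_1', FILENAME = 'E:\Data\BigTable_1.ndf', SIZE = 400GB) TO FILEGROUP FG_BigTable;
ALTER DATABASE MyDb ADD FILE (NAME = 'BigTable_2', FILENAME = 'F:\Data\BigTable_2.ndf', SIZE = 400GB) TO FILEGROUP FG_BigTable;
ALTER DATABASE MyDb ADD FILE (NAME = 'BigTable_3', FILENAME = 'G:\Data\BigTable_3.ndf', SIZE = 400GB) TO FILEGROUP FG_BigTable;

-- Rebuild the existing clustered index onto the new filegroup, which moves the table there
CREATE CLUSTERED INDEX CIX_BigTable
ON dbo.BigTable (KeyCol1, KeyCol2)
WITH (DROP_EXISTING = ON)
ON FG_BigTable;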
I need to do a 4-column lookup against a large table (1 million rows) that contains 4 different record types. The first lookup will match on columns A, B, C, and D. If no match is found, I try again with columns A, B, C, and '99' in column D. If no match, try again with columns A, B, D, and '99' in column C. Finally, if no match in any of the above, use column A with '99' in B, '99' in C, and '99' in D. I will retrieve 2 columns from the lookup table.
My thought is that breaking this sequence out into 4 different tables/lookups would be most efficient. The other option would be to write a script that handles this logic in a single transform with an in-memory table. My concern is that the size of the table would be too large to load into memory.
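If the lookup were instead pushed down to the database, a hedged sketch of the priority logic in a single T-SQL statement using OUTER APPLY (all table and column names are hypothetical):

SELECT s.*, m.OutCol1, m.OutCol2
FROM dbo.SourceRows AS s
OUTER APPLY (
    SELECT TOP (1) l.OutCol1, l.OutCol2
    FROM dbo.LookupTable AS l
    WHERE l.A = s.A
      AND (   (l.B = s.B  AND l.C = s.C  AND l.D = s.D)
           OR (l.B = s.B  AND l.C = s.C  AND l.D = '99')
           OR (l.B = s.B  AND l.C = '99' AND l.D = s.D)
           OR (l.B = '99' AND l.C = '99' AND l.D = '99'))
    ORDER BY CASE
                 WHEN l.B = s.B  AND l.C = s.C  AND l.D = s.D  THEN 1   -- exact match first
                 WHEN l.B = s.B  AND l.C = s.C  AND l.D = '99' THEN 2
                 WHEN l.B = s.B  AND l.C = '99' AND l.D = s.D  THEN 3
                 ELSE 4                                                 -- all-'99' fallback
             END
) AS m;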
I need to delete data from a particular table which has more than half a million records. More than 200,000 records need to be deleted from the table. What is the best way to delete the data from the table, other than importing into a temporary table and performing the same operation?
Let me know if the strategy to be followed is okay.
1. Drop all the triggers.
2. Drop all the indexes.
3. Write a procedure with a loop that sets ROWCOUNT to 1000 and deletes the records (since if I try to delete all the rows at once it gives a timeout error). The procedure will delete 1000 records per batch inside the loop until it wipes out all the data for the specified condition.
4. Recreate the indexes and triggers.
Please let me know if there are any other optimal solutions.
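For reference, a sketch of step 3 using DELETE TOP (SQL Server 2005+) rather than SET ROWCOUNT, with hypothetical names; each batch commits on its own, which keeps locks short and the log small, and the index that supports the WHERE clause is arguably worth keeping rather than dropping:

DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (1000)
    FROM dbo.MyTable
    WHERE SomeDate < '20050101';    -- stands in for "the specified condition"

    SET @rows = @@ROWCOUNT;         -- loop ends once nothing is left to delete
END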
I need to change the datatype of a column in a very large table from smallint to int. What would be an ideal solution to get this done in the least amount of time? Maybe I can try ALTER, but I am not sure about the time it would take, the page splits, etc.
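A hedged sketch of the usual lower-impact alternative to a single ALTER COLUMN (which rewrites every row in one long transaction): add a new int column, backfill it in batches, then swap the columns. All names are hypothetical, and it assumes the old column is NOT NULL (otherwise the backfill would need to track progress by key range instead):

-- 1. Add the new column (metadata-only, since it is NULLable with no default)
ALTER TABLE dbo.BigTable ADD NewValue int NULL;

-- 2. Backfill in batches to keep transactions and log growth small
DECLARE @rows int;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) dbo.BigTable
    SET NewValue = OldValue
    WHERE NewValue IS NULL;
    SET @rows = @@ROWCOUNT;
END

-- 3. Drop the old column and rename the new one into its place
ALTER TABLE dbo.BigTable DROP COLUMN OldValue;
EXEC sp_rename 'dbo.BigTable.NewValue', 'OldValue', 'COLUMN';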
I have a table that's about 3 gigs, and using this table and a few others I'm making another table. The problem is that when making the new table my transaction log inflates so much that I'm running out of disk space. What can I do to prevent this, or to keep the transaction log size under control?
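One hedged approach (names illustrative, and assuming the database can temporarily run in BULK_LOGGED or SIMPLE recovery): build the new table with SELECT ... INTO, which is minimally logged under those models, then switch back and back up the log so the backup chain stays usable:

ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;     -- or SIMPLE, if point-in-time recovery isn't needed

SELECT  a.KeyCol, a.ColA, b.ColB
INTO    dbo.NewTable                              -- SELECT INTO is minimally logged under BULK_LOGGED/SIMPLE
FROM    dbo.BigSourceTable AS a
JOIN    dbo.OtherTable     AS b ON b.KeyCol = a.KeyCol;

ALTER DATABASE MyDb SET RECOVERY FULL;
BACKUP LOG MyDb TO DISK = 'D:\Backups\MyDb_log.trn';   -- captures the bulk-logged work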
* SQL Server 2008 R2
* Database was created by a third-party product. The product writes to the 3 tables that I need to make changes to 24/7, and downtime is not an option. All changes must be done live.
* Database overall size is ~200 GB.
* The 3 tables I must update make up ~190 GB of that space.
* Tables have no primary key or ID columns. Therefore, the data is highly fragmented.
* Of the ~190 GB of space allocated for the tables, there is roughly 70 GB of actual data.
* Rows of the tables are not guaranteed to be unique. In fact, on one of the tables, tests were run with a small sample of data and duplicates were very much evident.
What I'm trying to accomplish here is to get an ID column added to the 3 tables and set that ID field as the primary key. Doing so will force the data to become much less fragmented than it is currently and with purging and new inserts, eventually fragmentation will be nearly non-existent.
Problem: Making table changes on tables this large while data is constantly being added poses many risks and can cause data loss. This was tried on a smaller table than these three and the entire table was lost in the process. Restore from backup was needed to get back to most recent log backup point.
Original Solution: My original plan was to create a backup of each table and run the script below to migrate the majority of the data temporarily into the new table. I could then update the original table (which now would contain much less data) and then migrate the data back.
Original Solution Problem: The problem with the solution above is that it calls the DELETE function on the original table using the values from the temporary table. When there are duplicate rows, which have not all been inserted into the backup table yet, they will all be removed from the original table because there is nothing unique to separate them out. In my testing, I had 10,000 rows in the original table and ended up with 9,959 rows in the backup table.
Question 1: Is my approach to making these table changes reasonable? Question 2a: If so, how can I make sure I don't lose data as part of this temporary migration of the data to my backup tables? Question 2b: If not, what would be a better approach that isn't going to cause disruption to the application that INSERTs data 24/7 and won't have any risk of data loss?
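On Question 2a specifically, one hedged workaround for the duplicate-collision problem (a sketch, with hypothetical column names standing in for the table's real columns, and assuming the columns are not NULLable; NULLs would need ISNULL handling in the join) is to number the physical duplicates with ROW_NUMBER so the delete removes exactly as many copies of each row as were migrated to the backup table:

;WITH Ranked AS (
    SELECT Col1, Col2, Col3,
           ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.OriginalTable
)
DELETE r
FROM Ranked AS r
JOIN (SELECT Col1, Col2, Col3, COUNT(*) AS copied
      FROM dbo.BackupTable
      GROUP BY Col1, Col2, Col3) AS b
  ON  b.Col1 = r.Col1 AND b.Col2 = r.Col2 AND b.Col3 = r.Col3
WHERE r.rn <= b.copied;    -- delete only as many duplicates as actually exist in the backup copy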
I have a task to count distinct records in a big table with roughly 30M records; performance is an important factor. The query is to be written to calculate weekly stats, and the weekly record count could be as high as 1M.
The actual result is like:
ID         Policy
350235744  Credit Cards
350235744  PCI
350235744  PCI Audit
So the final number for this particular Policy is 3
I can write the query like:
select count(distinct Incident_id) policy_name
from Reporting_DailyDlpDetail
Where (year(INSERT_DETECT_TS)=2015)
  and (month(INSERT_DETECT_TS)=6)
  and (day(INSERT_DETECT_TS) between 2 and 9)
This returns 526254 and costs 11 seconds to complete.
or a query like:
Select distinct Incident_id, policy_name
from Reporting_DailyDlpDetail
Where (year(INSERT_DETECT_TS)=2015)
  and (month(INSERT_DETECT_TS)=6)
  and (day(INSERT_DETECT_TS) between 2 and 9)
This returns 749687 and costs roughly 1 minute to complete.
The result differs between the two queries; I believe the latter gives the correct number. How can I count the distinct values based on a combination of columns? Considering the size of the data, what is the best and most efficient way to run the stats calculation against over 30 different scenarios (different policies and alert types) without timing out?
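A hedged sketch of both fixes at once: count the distinct (Incident_id, policy_name) pairs through a derived table, and replace the year()/month()/day() wrappers with a plain date range so an index on INSERT_DETECT_TS can actually be used:

SELECT COUNT(*) AS DistinctCombos
FROM (
    SELECT DISTINCT Incident_id, policy_name
    FROM Reporting_DailyDlpDetail
    WHERE INSERT_DETECT_TS >= '20150602'
      AND INSERT_DETECT_TS <  '20150610'   -- same window as days 2-9 of June 2015
) AS d;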