DataWarehouse - 140 Million Rows - Load Performance Issue
Mar 10, 2000
Need help. I am managing a data warehouse (80 GB database size), and on a daily basis I purge data older than 6 months from a table which has more than 140 million rows. The daily data load performance is degrading. The table has no clustered index (only non-clustered indexes).
I tried dropping and rebuilding the non-clustered indexes; that didn't work.
One way to solve the problem is to drop the non-clustered indexes, bcp out the data, truncate the table, bcp the data back in, and rebuild the non-clustered indexes. But this is too risky, and it takes 14 hours just to bcp out the data.
This was not an issue in SQL Server 6.5, because 6.5 always inserted new records at the end of the heap (heap = a table with non-clustered indexes but no clustered index). In contrast, SQL Server 7.0 first checks for available space in existing pages by using the PFS (Page Free Space) pages, and this is where it is killing the performance.
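If the goal is to get back to append-only inserts plus a cheap purge, one commonly suggested option is to cluster the table on an ever-increasing date column, so new rows always land at the end and the 6-month purge becomes a contiguous range delete. A minimal sketch, assuming a hypothetical load_date column on a hypothetical dbo.FactTable:

-- Cluster on the ever-increasing date so daily inserts append at the end
CREATE CLUSTERED INDEX CIX_FactTable_LoadDate
    ON dbo.FactTable (load_date)

-- The daily purge then scans one contiguous range instead of the whole heap
DELETE FROM dbo.FactTable
WHERE load_date < DATEADD(month, -6, GETDATE())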
Could not load file or assembly 'Microsoft.DataWarehouse, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'. The system cannot find the file specified.
I get this message when I try to do a forecast from the Data Mining - Forecast menu.
Forecast from Table Tools, Analyze, Forecast works perfectly.
Running Vista and using the DMAddins_SampleData workbook.
I ran the following statement and it will not update beyond 7 million-plus rows, and I have about 38 million to complete. I keep checking updated row counts, and after half a day it's still the same, so I know something is wrong, because it was rolling through with no problem when I initiated it. I need to complete this ASAP, so it's adding to my frustration. The 'Acct_Num_CH' field is an encrypted field (FYI).
SET ROWCOUNT 10000

UPDATE [dbo].[CC_Info_T]
SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
WHERE [Acct_Num_CH] IS NOT NULL

WHILE @@ROWCOUNT > 0
BEGIN
    SET ROWCOUNT 10000

    UPDATE [dbo].[CC_Info_T]
    SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
    WHERE [Acct_Num_CH] IS NOT NULL
END

SET ROWCOUNT 0
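One likely cause: the WHERE clause never stops matching, because rows already set to the constant are still NOT NULL, so each batch can keep re-updating the same rows and the loop has no way to terminate. A minimal sketch of a fix, excluding rows that already hold the new value:

SET ROWCOUNT 10000

-- Only touch rows that still hold an old value
UPDATE [dbo].[CC_Info_T]
SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
WHERE [Acct_Num_CH] IS NOT NULL
  AND [Acct_Num_CH] <> 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'

WHILE @@ROWCOUNT > 0
BEGIN
    UPDATE [dbo].[CC_Info_T]
    SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
    WHERE [Acct_Num_CH] IS NOT NULL
      AND [Acct_Num_CH] <> 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
END

SET ROWCOUNT 0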
I'm new to using a DB and have a few questions about what I'm trying to do. I have some historical options data and want to place it into a SQL Express database. (I understand I might need to use a non-Express version once the DB gets too big.) A month's worth of data is over 5.5 million rows, so six years' worth is ~400 million rows. Is it possible to put this into a SQL DB and be able to search it very fast? I have a month's worth in a DB now and it is pretty slow. Should I use a new table for each month, and so have 6 years * 12 months = 72 tables, to increase the search speed? I search by date and stock_symbol, and the data looks like this: Date, Stock_Symbol, Option_Symbol, Strike, BidPrice, AskPrice, Volume, OpenInterest (and a few others). The select statement is simple:

SELECT * FROM Options WHERE Date = @Date AND StockSymbol = @Symbol

Thanks
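A single table with an index that matches the query's equality predicates usually beats splitting the data into 72 monthly tables. A minimal sketch against the table and columns described above (bracketing Date since it is a reserved word):

-- Seeks straight to the (Date, StockSymbol) slice instead of scanning
CREATE CLUSTERED INDEX CIX_Options_Date_Symbol
    ON dbo.Options ([Date], StockSymbol)

With that in place, the SELECT above becomes an index seek that reads only the requested rows, even at 400 million rows.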
In our database, we have a very large table that gets updated every morning. The start of day involves copying 4 million rows from the previous date to today's date within the same fact table, followed by some other processing. It takes 1.5 to 2 hours to do this. There is a DTS package that copies these rows into a temp table and then into the fact table.
This table has more than 200 million rows.
Any ideas on how to accomplish this without doing the copy twice and without running into locking problems?
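One way to avoid the double copy is to insert straight from the fact table into itself, shifting the date in a single pass. A sketch with hypothetical column names; the TABLOCK hint speeds up the insert but blocks concurrent readers, so drop it if the table must stay available during the load:

DECLARE @Today datetime, @Prev datetime
SET @Today = CONVERT(char(8), GETDATE(), 112)   -- midnight today
SET @Prev  = DATEADD(day, -1, @Today)

INSERT INTO dbo.FactTable WITH (TABLOCK) (TradeDate, Col1, Col2)
SELECT @Today, Col1, Col2
FROM dbo.FactTable
WHERE TradeDate = @Prev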
I need to update about 1.3 million rows in a table of mine. I am getting the data from one of the columns of the same table and updating a new column. I am doing this using a cursor which I have put in a stored procedure. As this is a production table which users might be accessing, and it is a web-based application, I can't slow the system down, so I am willing to run the stored procedure during off-peak hours. However, do I need to put this in a transaction? If I did put it in a transaction, what type of isolation level should I opt for? Data integrity is very important to me, and I don't mind compromising on performance. I am doing this because the column which holds the "short description" entry has become too small for business purposes, and we want to increase its length from varchar(100) to varchar(150). As this is SQL 6.5, I can't increase the length of the column, so I added a new column and will run the stored procedure. What precautions should be taken? This is high priority and very important too.
Thanks in advance...
Stored procedure code:
USE DB_Registration_Dev
GO

IF EXISTS (SELECT name FROM sysobjects WHERE name = 'usp_update_product' AND type = 'P')
    DROP PROCEDURE usp_update_product
GO

CREATE PROC usp_update_product
AS
DECLARE @short_desc varchar(100)
DECLARE @prod_id int

DECLARE sdesc_curs CURSOR FOR
    SELECT [Product].[product_id], [Product].[short_description]
    FROM Product

OPEN sdesc_curs

FETCH NEXT FROM sdesc_curs INTO @prod_id, @short_desc

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Product
    SET [Product].[sdesc] = @short_desc
    WHERE product_id = @prod_id

    FETCH NEXT FROM sdesc_curs INTO @prod_id, @short_desc

    IF @@FETCH_STATUS <> 0
        PRINT ' Finished Updating the table...go ahead and have fun ...! '
END

CLOSE sdesc_curs        -- was missing in the original; close the cursor before deallocating
DEALLOCATE sdesc_curs
GO
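Since the update just copies one column into another within the same row, the cursor is unnecessary; a single set-based statement does the same work in one pass and is far faster on 1.3 million rows. A minimal sketch, assuming the new sdesc column starts out NULL:

UPDATE Product
SET sdesc = short_description
WHERE sdesc IS NULL

Run during off-peak hours this is one transaction; if the 6.5 log can't hold it, the statement can be broken into ranges on product_id.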
Hi, I used the /e option in my bcp command, yet I did not get all the rows from the mainframe into the SQL tables. Here is the case: I have 11 million rows in a file on an FTP server, and I use bcp to load it into SQL Server. Can anyone check whether this process is sound? I am missing one million rows in the bcp process and do not know why. I added /e to see if there were any errors, but I could not find any error file on my hard drive. Please check it out and let me know.
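Two things worth checking: /e requires a file path argument (otherwise no error file is written), and bcp aborts the load after 10 errors by default unless /m raises the limit, which can silently cut off rows. A sketch with hypothetical names and placeholder credentials:

bcp MyDB.dbo.MyTable in C:\data\mainframe.txt /c /S MyServer /U myuser /P mypassword /e C:\data\bcp_errors.txt /m 1000000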
I'm looking for some performance assistance on updating a column value in a table that contains approximately 50 million rows. I have a permanent table in another database that has the key column and the value to be set. My query is listed below, but I'm afraid it will run for quite a while. Any suggestions would be appreciated.
UPDATE a
SET a.column2 = b.column2
FROM mytable AS a
JOIN mytable1 AS b
    ON a.column1 = b.column1
There is a one-to-one relationship between the two tables.
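On 50 million rows a single statement can bloat the log and hold long locks; a common alternative is to update in keyed batches (SQL Server 2005+ syntax, reusing the names above). The extra WHERE clause makes each pass skip rows already in sync, so the loop terminates:

WHILE 1 = 1
BEGIN
    UPDATE TOP (50000) a
    SET a.column2 = b.column2
    FROM mytable AS a
    JOIN mytable1 AS b
        ON a.column1 = b.column1
    WHERE a.column2 <> b.column2
       OR (a.column2 IS NULL AND b.column2 IS NOT NULL)

    IF @@ROWCOUNT = 0 BREAK
END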
I have 1+ CSV files (processed with a Foreach Loop) on which I'm doing a lot of transform work before inserting into a SQL database table. Each CSV file usually contains about 2 days' worth of data (it contains date stamps) - somewhere in the region of 60k records per day. The destination table currently contains 3 million+ rows and will get bigger. I need to make sure that the data doesn't already exist before inserting into the destination table.
I've read the following article: http://www.sqlis.com/311.aspx. While the lookup method works, it takes ages and eats up memory, as it caches the 3m+ records before running for each CSV. Obviously this will only get worse as the table grows in size.
To make things a little more efficient, what I'd like to do is first derive the dates I'm dealing with in the current file - essentially storing the max(date) and min(date) in variables. Then, in the lookup SQL, I'd use those variables to reduce the amount of data that needs to be brought into the transformation to check against before inserting into the destination table. Lookup SQL, e.g.:

SELECT * FROM MyTable WHERE Date BETWEEN varMinDate AND varMaxDate
Ideally I'd use an Aggregate transformation and then use its output either in the lookup query or stored in variables, but I don't think you can do that, and I get the feeling I'm approaching this with the wrong mindset.
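The usual SSIS 2005 workaround is to compute the min/max dates into package variables with an Execute SQL Task before the data flow, then build the lookup query with an expression on the Data Flow Task (the [Lookup].[SqlCommand] property path is an assumption worth verifying in your designer). A sketch of the expression, with hypothetical variable names:

"SELECT * FROM MyTable WHERE Date BETWEEN '"
    + (DT_WSTR, 30) @[User::varMinDate] + "' AND '"
    + (DT_WSTR, 30) @[User::varMaxDate] + "'"

This keeps the cached lookup set limited to the dates in the current file instead of all 3m+ rows.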
I have a requirement to delete 1 million records from a table holding 10 million rows, and it's being queried on a 24/7 basis (we don't have a downtime window). How can I achieve that?
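A common approach is to delete in small batches so each transaction stays short and other queries get in between batches. A minimal sketch with hypothetical names; the predicate is whatever marks the 1 million rows for deletion:

WHILE 1 = 1
BEGIN
    DELETE TOP (5000)
    FROM dbo.BigTable
    WHERE PurgeFlag = 1          -- hypothetical purge predicate

    IF @@ROWCOUNT = 0 BREAK

    WAITFOR DELAY '00:00:01'     -- brief pause lets concurrent queries through
END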
I am doing performance testing for the In-Memory OLTP option in SQL Server 2014. As part of it, I want to insert 500 million rows into a memory-optimized test table I have created.
I need a sample script to insert 500 million records into a table.
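A minimal sketch that generates the rows in 500 batches of one million from a row-number generator; the table and column names are assumptions, so point it at the real memory-optimized table (and note that 500 million rows must fit entirely in memory):

SET NOCOUNT ON;
DECLARE @batch int = 0;

WHILE @batch < 500                              -- 500 x 1,000,000 = 500 million
BEGIN
    INSERT INTO dbo.InMemTest (Id, Payload)     -- hypothetical columns
    SELECT @batch * 1000000 + n, REPLICATE('x', 20)
    FROM (
        SELECT TOP (1000000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_columns AS a
        CROSS JOIN sys.all_columns AS b         -- cheap source of > 1M rows
    ) AS nums;

    SET @batch += 1;
END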
I have a table that is being used to log track plays on our website.
Here's the table:
CREATE TABLE [dbo].[Music_BandTrackPlays](
    [ListenDate] [datetime] NOT NULL DEFAULT (getdate()),
    [TrackId] [int] NOT NULL,
    [IPAddress] [varchar](20)
) ON [PRIMARY]
There's a CLUSTERED INDEX on ListenDate ASC and a NON CLUSTERED INDEX on the TrackId.
I have a TRIGGER on the Music_BandTrackPlays table that looks like the following:
CREATE TRIGGER [trig_Increment_Music_BandTrackPlays_PlayCount]
ON [dbo].[Music_BandTrackPlays]
AFTER INSERT
AS
UPDATE Music_BandTracks
SET Music_BandTracks.PlayCount = Music_BandTracks.PlayCount + TP.PlayCount
FROM (
    SELECT TrackId, COUNT(*) AS PlayCount
    FROM inserted
    GROUP BY TrackId
) AS TP
WHERE Music_BandTracks.TrackId = TP.TrackId
When a simple INSERT statement is done on the Music_BandTrackPlays table, it can take quite a long time. When I remove the TRIGGER, the INSERTs are immediate. The execution plan for the TRIGGER shows that an 'Inserted Scan' is taking up most of the resources.
How exactly is the pseudo 'inserted' table formed?
For now, I think the easiest thing to do is update my logging page so it performs 2 queries: one to UPDATE the Music_BandTracks table and increment the counter, and a separate INSERT into the Music_BandTrackPlays table.
I'm OK with that solution, but I would really like to understand why the TRIGGER is taking so long. The 'inserted' pseudo-table will be 1 row 99% of the time. Does SQL Server perform a table scan on all 20 million rows in order to determine what's new and put it in the inserted pseudo-table?
IF NOT EXISTS (SELECT TOP 1 1 FROM dbo.syscolumns
               WHERE id = OBJECT_ID(N'dbo.Employee') AND name = 'DoNotCall')
BEGIN
    ALTER TABLE [dbo].[Employee]
        ADD [DoNotCall] bit NOT NULL
        CONSTRAINT DoNot_Call_Default DEFAULT 0

    IF (@@ERROR <> 0) GOTO QuitWithRollback
END
It just takes a LOT of time in SQL Server Management Studio. I have to cancel the query, and cancelling takes a whole lot of time too. I am using SQL Server 2008.
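Two things are probably at play. Adding a NOT NULL column with a default needs a schema-modification lock, so the ALTER often just sits waiting behind other sessions using the table; and on SQL Server 2008 the new NOT NULL column is physically written to every row (the metadata-only optimization arrived in SQL Server 2012 Enterprise), so on a large table the statement also does real work. A quick check for blocking, run from a second window while the ALTER is stuck:

SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;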
Hi, I tried designing an SSIS package which loads only those rows that differ from the existing rows in the table. I need to timestamp the existing row with an inactive date when an update of that row is inserted (e.g., same StudentID), and stamp the newly inserted row with an insert timestamp to mark it as currently active. In short, I need to maintain history and current rows in the same table. I tried using the Slowly Changing Dimension transformation but could not figure it out. Anyone with experience or knowledge of such data loads, please respond.
An example of the data would be as follows.

Existing data:

StudentID  Name  AGE  Sex  ADDRESS  INSERTTIME  UPDATETIME
12         DDS   14   M    XYZ ST   2/4/06      NULL
14         hgS   17   M    ABC ST   3/4/07      NULL

The new row to insert would be:

12         DDS   15   M    DFG ST   4/5/07

After the load, the data should reflect:

StudentID  Name  AGE  Sex  ADDRESS  INSERTTIME  UPDATETIME
12         DDS   14   M    XYZ ST   2/4/06      4/5/07
12         DDS   15   M    DFG ST   4/5/07      NULL
14         hgS   17   M    ABC ST   3/4/07      NULL
Please provide your input as much as you can, even if it might not be a 100% solution.
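A minimal sketch of the two steps in plain T-SQL; the warehouse and staging table names are assumptions, and in SSIS the same logic can run as a post-load Execute SQL Task:

-- 1) Close off the currently active row when a changed version arrives
UPDATE t
SET t.UPDATETIME = s.INSERTTIME
FROM dbo.Student AS t
JOIN dbo.StudentStage AS s
    ON s.StudentID = t.StudentID
WHERE t.UPDATETIME IS NULL                       -- only the active row
  AND (t.Name <> s.Name OR t.AGE <> s.AGE
       OR t.Sex <> s.Sex OR t.ADDRESS <> s.ADDRESS)

-- 2) Insert the new version as the active row
INSERT INTO dbo.Student (StudentID, Name, AGE, Sex, ADDRESS, INSERTTIME, UPDATETIME)
SELECT s.StudentID, s.Name, s.AGE, s.Sex, s.ADDRESS, s.INSERTTIME, NULL
FROM dbo.StudentStage AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Student AS t
                  WHERE t.StudentID = s.StudentID
                    AND t.UPDATETIME IS NULL
                    AND t.Name = s.Name AND t.AGE = s.AGE
                    AND t.Sex = s.Sex AND t.ADDRESS = s.ADDRESS)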
I'm trying to improve the loading of some tables with large amounts of data that form part of an ETL. I was going to try removing any indexes before the insert to speed up the process, but I had some questions about whether or not I should include the clustered index (assuming one exists).
I was originally planning on including a step to disable all indexes on the destination table using the following:
ALTER INDEX ALL ON MyTable DISABLE
Once the load had finished I'd simply rebuild all the indexes.
Or should I disable only the non-clustered indexes?
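Disabling the clustered index disables access to the table entirely (both reads and writes fail until it is rebuilt), so for a load you disable only the non-clustered indexes. A sketch with hypothetical index names:

-- Disable each non-clustered index individually
ALTER INDEX IX_MyTable_ColA ON dbo.MyTable DISABLE;
ALTER INDEX IX_MyTable_ColB ON dbo.MyTable DISABLE;

-- ... perform the bulk load ...

-- Rebuilding re-enables the disabled indexes in one go
ALTER INDEX ALL ON dbo.MyTable REBUILD;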
I think this question has been asked a number of times. However, I am looking for some specific information. Perhaps some of you can help close the gap, or point me in the right direction. Perhaps this group can help me fill in the following MS SQL Server questions:

1. Does this database have data clustering capabilities?
1a. If yes, what mechanism is used, such as shared disk, shared nothing, etc.?
2. Does this DB have security features?
2a. If yes, what security features are supported? For instance, does it support encryption or SSL connections?
3. How does the database perform, and what are the criteria for the performance matrix?
4. Does it have built-in load-balancing capabilities?

I want to thank everyone for taking the time to read this correspondence. I will also greatly appreciate your efforts in sharing your thoughts.

Regards,
Manish
I am using SQL Server 2005 Enterprise Edition in a clustered environment with NetApp storage (one server). I monitor server performance, basically using Windows Performance Monitor counters such as Avg. Disk Queue Length, Avg. Disk Reads/sec, Avg. Disk Writes/sec, % Processor Time, Available MBytes, cache hit ratio, etc., and using some SQL Server DMVs.
At this point all looks good.
My problem is that I don't know how much more load SQL Server can handle. How much of its capacity is being utilized now? Has it reached its limits, and how would I know when it is just about to reach them?
How much throughput is SQL Server handling at present, and how much more could it handle if incoming traffic increases?
Basically, I want to establish performance baselines and find out how much headroom remains, so that I can plan for the future.
How do I measure all this? And what different methods are available if I want to handle increased incoming traffic (e.g., adding multiple servers)? It would be really great if someone could share their experience with this.
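For the SQL Server side of a baseline, one starting point is sampling sys.dm_os_performance_counters on a schedule and trending the results; the counter names below are standard. For '/sec' counters the stored value is cumulative, so capture it twice and divide the delta by the elapsed seconds:

SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec',
                       N'Page life expectancy',
                       N'User Connections');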
I have a SQL Server on a Win2000 box. The server is located behind a screened subnet and has no access to the outside world except for 1433 and 445 inbound from a web server located in a DMZ, forward of the internal firewall. Rules are very tight, and the IIS box is patched and has been hardened further, including running URLScan (allowed verbs = GET, HEAD, POST; all executable and script types disallowed; permissions very tight). I have written an application that monitors bad HTTP requests in real time to check that URLScan is working, and it seems to be. The only ways to get to the SQL box would be running injection against page code (the majority written by myself, no raw SQL) on the web server, or launching an exploit over HTTP against known ADODB flaws on the web server. The SQL Server is a domain member; the IIS server is standalone. If there's a way to compromise the SQL box without first breaching both firewalls, or first compromising the IIS server, I don't know how to do it.

Still, the SQL box began acting up, refusing to accept a Terminal Services connection (the error stated that there was a failure to load Win32.sys). There was no sign that the machine had been compromised. I restored from a complete backup, which seemed to fix the Terminal Services problem, but now the SQL performance counters fail to load, specifically MQPERF.dll. Any ideas on what may have caused the server to have 'issues' (the prevailing theory centers around evil spirits), and what can I do to get the performance counters to load? I have already tried replacing the library.
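For the counters themselves, a commonly suggested repair is rebuilding the performance counter registry and resynchronizing WMI. Note the /R switch ships with Windows XP/Server 2003 and later, so treat its availability on a Windows 2000 box as an assumption to verify:

rem Rebuild the performance counter registry from the system backup store
cd /d %windir%\system32
lodctr /R

rem Resynchronize the WMI performance classes with the rebuilt registry
winmgmt /resyncperf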
I need to use a BULK INSERT statement to copy a table with 200 million rows to another table on the same server. The table has no primary key or identity column. I need a script for the BULK INSERT.
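Worth noting: BULK INSERT reads from a data file, so for a table-to-table copy on the same server the usual route is either bcp out/in or a single INSERT...SELECT. A sketch with hypothetical names; on SQL Server 2008 and later, WITH (TABLOCK) into a heap is minimally logged under the simple or bulk-logged recovery model, which matters at 200 million rows:

INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT *
FROM dbo.SourceTable;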
We are working on a data warehouse app. The DW has been loaded with transactional data from the start of September, and we want to refresh the DW with a full load from the original source. This full load will consist largely of the same records that we loaded initially into the DW, but some records will be new and others will have changed.
During the load I want to direct input records NOT already in the DW to a "mods" table and ignore those input records that already exist in the DW. Can SSIS help out with this task?
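Yes; in SSIS this is typically a Lookup transformation against the DW's business key, with the non-matching rows redirected to the mods table. The set-based equivalent, if the full load is staged first (a sketch with hypothetical names):

-- Route only the rows whose business key is not yet in the warehouse
INSERT INTO dbo.Mods (BusinessKey, Col1, Col2)
SELECT s.BusinessKey, s.Col1, s.Col2
FROM dbo.StagingFullLoad AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DwFact AS d
                  WHERE d.BusinessKey = s.BusinessKey);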
Hi guys, I am looking for a large data warehouse database implemented in SQL Server to use for running some examples in my thesis. If someone knows where I can find a large data warehouse database (e.g., a sales database), please tell me.
I have been advised to design a multidimensional model based on the operational data model. What I have is: a couple of documents about the application, and the data model of the operational data. Now I have been told to create a data warehouse from which any type of question can be answered. This seems horrendous. Please guide.
I am re-engineering the data warehouse, and my client is currently using auto-generated keys. Their concern is that after a certain number of keys (I can't remember the figure) SQL Server starts having problems. Does anyone know how I should handle this in the design? Thanks; any input will be appreciated.
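The figure the client may be remembering is the int maximum of 2,147,483,647; an IDENTITY column that exhausts its type starts failing inserts with arithmetic overflow errors. The usual design answer is simply to size the surrogate key as bigint up front (a sketch with a hypothetical table):

-- bigint raises the ceiling to roughly 9.2 quintillion values
CREATE TABLE dbo.FactSales (
    SalesKey  bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SaleDate  datetime NOT NULL,
    Amount    money NOT NULL
);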
I am writing a graduation thesis about a clickstream data warehouse with SQL Server. Who can give me some samples of this? I just want to learn it. Thank you! :)
I am testing different RAID setups in a data warehousing environment, e.g.:
3 x RAID 5 (with four filegroups)
3 x RAID 10 (with four filegroups)
At the moment I have the chance to trial different hardware configurations, and I want to measure the performance differences.
I need to stress test these configs. Are there any tools I can use to do this? Any loads I can use? What should I measure? Are there any articles for this environment?
I am using SQL Server 2005 Integration Services to transfer data from a source database to the data warehouse DB. My main problem comes after transferring the data from the main database to the warehouse: if any record is deleted from the main database, how can we delete the same record in the data warehouse DB? And if I want to delete a record in the data warehouse, that record is mapped to some other tables [foreign keys].
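One common pattern, assuming the current source keys can be staged on the warehouse server (all names here are hypothetical): delete warehouse rows whose key no longer exists in the source, children first so the foreign keys are satisfied:

-- Remove orphaned child rows first, then the parents
DELETE c
FROM dw.OrderLine AS c
WHERE NOT EXISTS (SELECT 1 FROM stage.SourceOrderKeys AS s
                  WHERE s.OrderID = c.OrderID);

DELETE p
FROM dw.[Order] AS p
WHERE NOT EXISTS (SELECT 1 FROM stage.SourceOrderKeys AS s
                  WHERE s.OrderID = p.OrderID);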