Updating A Column In A Table That Contains 50 Million Rows
Feb 27, 2008
I'm looking for some performance assistance on updating a column value in a table that contains approximately 50 million rows. I have a permanent table in another database that has the key column and the value to be set. My query is listed below, but I'm afraid it will run for quite a while. Any suggestions would be appreciated.
update a    -- target the alias so the join in the FROM clause drives the update
set column2 = b.column2
from mytable as a
join mytable1 as b
on a.column1 = b.column1
There is a one to one relationship between the two tables.
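If a single 50-million-row UPDATE turns out to be too heavy on the log or on concurrency, one common alternative is to break it into key-range batches so each statement touches a bounded slice. This is only a sketch under assumptions the post does not state: that column1 is an integer key, indexed (ideally clustered) on both tables.

DECLARE @BatchSize int, @MinKey int, @MaxKey int;
SET @BatchSize = 50000;
SELECT @MinKey = MIN(column1), @MaxKey = MAX(column1) FROM mytable;

WHILE @MinKey <= @MaxKey
BEGIN
    UPDATE a
    SET a.column2 = b.column2
    FROM mytable AS a
    JOIN mytable1 AS b
        ON a.column1 = b.column1
    WHERE a.column1 >= @MinKey
      AND a.column1 < @MinKey + @BatchSize;   -- one contiguous key range per statement

    SET @MinKey = @MinKey + @BatchSize;       -- move on to the next range
END

Each batch is its own short transaction, so the log can be reused (or backed up) between batches instead of growing to hold the whole update.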
IF NOT EXISTS (SELECT TOP 1 1 FROM dbo.syscolumns WHERE id = OBJECT_ID(N'dbo.Employee') AND name = 'DoNotCall')
BEGIN
    ALTER TABLE [dbo].[Employee]
        ADD [DoNotCall] bit NOT NULL CONSTRAINT DoNot_Call_Default DEFAULT 0
    IF ( @@ERROR <> 0 ) GOTO QuitWithRollback
END
It just takes a LOT of time in SQL Server Management Studio. I have to cancel the query, and cancelling takes a whole lot of time as well. I am using SQL Server 2008.
I need to update about 1.3 million rows in a table of mine. I am getting the data from one of the columns of the same table and updating the new column. I am doing this using a cursor which I have put in a stored procedure. This is a production table which users might be accessing; it is a web-based application and I can't slow the system down, so I am willing to run the stored procedure during off-peak hours. However, do I need to put this in a transaction? If I did put it in a transaction, what type of isolation level should I opt for? Data integrity is very important to me and I don't mind compromising on performance.

I am doing this because the column which holds the "short description" entry has become too small for business purposes and we want to increase its length from varchar(100) to varchar(150). As this is SQL 6.5, I can't increase the length of the column, so I added a new column and will run the stored proc. What precautions are to be taken? This is high priority and very important.
Thanks in advance...
Stored procedure code:
USE DB_Registration_Dev
GO
IF EXISTS (SELECT NAME FROM SYSOBJECTS WHERE NAME = 'usp_update_product' AND TYPE = 'P')
    DROP PROCEDURE usp_update_product
GO
CREATE PROC usp_update_product
AS
DECLARE @short_desc varchar(100)
DECLARE @prod_id int

DECLARE sdesc_curs CURSOR FOR
    SELECT [Product].[product_id], [Product].[short_description]
    FROM Product

OPEN sdesc_curs

FETCH NEXT FROM sdesc_curs INTO @prod_id, @short_desc

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Product
    SET [Product].[sdesc] = @short_desc
    WHERE Product_id = @prod_id

    FETCH NEXT FROM sdesc_curs INTO @prod_id, @short_desc

    IF @@FETCH_STATUS <> 0
        PRINT ' Finished Updating the table...go ahead and have fun ...! '
END

CLOSE sdesc_curs       -- the cursor must be closed before it is deallocated
DEALLOCATE sdesc_curs
GO
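Since every row simply gets its own short_description copied into the new column, a single set-based UPDATE avoids the per-row cursor overhead entirely; on 6.5, SET ROWCOUNT can still batch it if the log is a concern. A sketch, assuming the new column is named sdesc as in the procedure and starts out NULL:

SET ROWCOUNT 100000                       -- batch size; SQL 6.5 has no TOP in UPDATE
WHILE 1 = 1
BEGIN
    UPDATE Product
    SET sdesc = short_description
    WHERE sdesc IS NULL
      AND short_description IS NOT NULL   -- rows not copied yet
    IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0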
I am trying to update a large table which consists of 45 million records. The update is taking more than 2 days. Below is my approach:
1. The table has only one clustered index and no other indexes.
2. I am updating in batches of 20,000 records.
3. I changed the recovery model to bulk-logged, the auto-growth size is set to 300MB, and there is enough space on my disk for the transaction log.
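One reason batch jobs like this slow to a crawl is that each batch has to rescan the table to find the next 20,000 rows still needing the change. A sketch of an alternative (assuming the clustered index is on a single ascending integer key, here called Id, and with a hypothetical column and value in the SET): walk the clustered key so every batch is a range seek rather than a scan.

DECLARE @LastId int, @MaxId int, @BatchSize int;
SET @LastId = 0;
SET @BatchSize = 20000;

WHILE 1 = 1
BEGIN
    SET @MaxId = NULL;

    -- find the upper key of the next batch with one seek on the clustered index
    SELECT @MaxId = MAX(Id)
    FROM (SELECT TOP (@BatchSize) Id
          FROM dbo.BigTable
          WHERE Id > @LastId
          ORDER BY Id) AS nextBatch;

    IF @MaxId IS NULL BREAK;              -- nothing left to update

    UPDATE dbo.BigTable
    SET SomeColumn = 'new value'          -- hypothetical column and value
    WHERE Id > @LastId AND Id <= @MaxId;

    SET @LastId = @MaxId;
END

Note also that the bulk-logged recovery model only reduces logging for minimally logged operations (BULK INSERT, SELECT INTO, index builds), not for ordinary UPDATE statements, so the main lever here is keeping each batch's transaction small.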
I am doing performance testing of the In-Memory OLTP option in SQL Server 2014. As part of this I want to insert 500 million rows into an in-memory enabled test table I have created.
I need a sample script to insert 500 million records into a table ....
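For generating test rows in that volume, a cross-joined tally (numbers) CTE is a common approach. Below is only a sketch: the target table dbo.InMemTest and its columns (Id bigint, Payload varchar(30)) are made up, and the insert is chunked because loading 500 million rows into a memory-optimized table in one statement needs a very large transaction and a lot of memory.

DECLARE @Chunk bigint = 50000000, @i int = 0;

WHILE @i < 10    -- 10 chunks of 50 million rows = 500 million
BEGIN
    ;WITH
    L0 AS (SELECT 1 AS c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t(c)),  -- 10 rows
    L1 AS (SELECT 1 AS c FROM L0 a CROSS JOIN L0 b),   -- 100
    L2 AS (SELECT 1 AS c FROM L1 a CROSS JOIN L1 b),   -- 10,000
    L3 AS (SELECT 1 AS c FROM L2 a CROSS JOIN L2 b),   -- 100,000,000
    nums AS (SELECT TOP (@Chunk) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L3)
    INSERT INTO dbo.InMemTest (Id, Payload)
    SELECT n + @i * @Chunk,
           'row ' + CAST(n + @i * @Chunk AS varchar(20))
    FROM nums;

    SET @i = @i + 1;
END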
I have a table that is being used to log track plays on our website.
Here's the table:
CREATE TABLE [dbo].[Music_BandTrackPlays](
    [ListenDate] [datetime] NOT NULL DEFAULT (getdate()),
    [TrackId] [int] NOT NULL,
    [IPAddress] [varchar](20)
) ON [PRIMARY]
There's a CLUSTERED INDEX on ListenDate ASC and a NON CLUSTERED INDEX on the TrackId.
I have a TRIGGER on the Music_BandTrackPlays table that looks like the following:
CREATE TRIGGER [trig_Increment_Music_BandTrackPlays_PlayCount]
ON [dbo].[Music_BandTrackPlays]
AFTER INSERT
AS
UPDATE Music_BandTracks
SET Music_BandTracks.PlayCount = Music_BandTracks.PlayCount + TP.PlayCount
FROM (SELECT TrackId, COUNT(*) AS PlayCount
      FROM inserted
      GROUP BY TrackId) AS TP
WHERE Music_BandTracks.TrackId = TP.TrackId
When a simple INSERT statement is done on the Music_BandTrackPlays table, it can take quite a long time. When I remove the TRIGGER the INSERTs are immediate. The execution plan for the TRIGGER shows that an 'Inserted Scan' is taking up most of the resources.
How exactly is the pseudo 'inserted' table formed?
For now, I think the easiest thing to do is update my logging page so it performs 2 queries: one to UPDATE the Music_BandTracks table and increment the counter, and one to perform the INSERT into the Music_BandTrackPlays table separately.
I'm ok with that solution but I would really like to understand why the TRIGGER is taking so long. The 'inserted' pseudo table will be 1 row 99% of the time. Does SQL Server perform a table scan on all 20 million rows in order to determine what's new and put it in the inserted pseudo table?
I have a requirement to delete 1 million records from a table holding 10 million rows that is queried on a 24/7 basis (we don't have a downtime window). How can I achieve that?
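With no downtime window, the usual pattern is to delete in small batches so each transaction stays short and lock escalation is unlikely. A sketch (the table name, column, and cutoff value are all assumptions; the WAITFOR is optional breathing room for concurrent queries):

DECLARE @CutoffDate datetime, @Deleted int;
SET @CutoffDate = '20150101';             -- hypothetical purge criterion
SET @Deleted = 1;

WHILE @Deleted > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.MyBigTable
    WHERE CreatedDate < @CutoffDate;      -- whatever predicate identifies the 1 million rows

    SET @Deleted = @@ROWCOUNT;

    WAITFOR DELAY '00:00:01';             -- let concurrent readers get a turn
END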
I need to use a BULK INSERT statement for copying a table with 200 million rows to another table on the same server. The table has no primary key or identity column. I need a script for the BULK INSERT ...
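One thing to note: BULK INSERT reads from a data file rather than from another table, so for a same-server table-to-table copy the usual substitute is a minimally logged INSERT ... SELECT. A sketch with made-up names, assuming the destination is a heap (no primary key or identity, as described) and the database is in SIMPLE or BULK_LOGGED recovery while the copy runs:

INSERT INTO dbo.TargetTable WITH (TABLOCK)    -- table lock allows minimal logging into a heap
    (Col1, Col2, Col3)                        -- assumed column list
SELECT Col1, Col2, Col3
FROM dbo.SourceTable;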
I have a table that has 4+ million records. I need to update those records, and I am facing a performance issue. Can someone please advise?
update stage
set batch_status = 1
where update_status = 0

update [transaction]
set aId = s.aId,
    b = s.b
from stage s
where s.aId = [transaction].aId
  and s.batch_status = 1

update stage
set update_status = 1,
    batch_status = 2
where batch_status = 1
When I run the above query with "set rowcount 1000", it runs in one minute. When I run the query for "set rowcount 10000", it runs in 1 hour 56 minutes. Can someone help me to optimize it?
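One thing that sometimes explains this kind of cliff is the lack of an index supporting the join and the batch_status filters, so every batch rescans stage. Purely as a sketch (column names taken from the statements above; whether it pays off depends on how often batch_status changes, since the index has to be maintained on every status flip):

CREATE NONCLUSTERED INDEX IX_stage_batch_status
    ON dbo.stage (batch_status)
    INCLUDE (aId, b);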
Hi, I need to update column week14 in table PastWeeks with data from Eng_Goal, and with the result of a calculation from table AverageEngTime, in the rows Goal and Used respectively. I used the following and was not successful. Please advise. Thank you.
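Without the statement that failed it is hard to say what went wrong, but the usual shape is one targeted UPDATE per row label (or a single UPDATE with CASE). Everything below other than week14, PastWeeks, Eng_Goal and AverageEngTime is a guess at the schema, so treat it as a sketch only:

-- row labelled 'Goal' gets the value from Eng_Goal
UPDATE PastWeeks
SET week14 = (SELECT Goal FROM Eng_Goal)                  -- assumed single-row, single-value source
WHERE RowLabel = 'Goal';                                  -- assumed label column

-- row labelled 'Used' gets the calculated value from AverageEngTime
UPDATE PastWeeks
SET week14 = (SELECT SUM(UsedTime) FROM AverageEngTime)   -- assumed calculation
WHERE RowLabel = 'Used';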
I want to update tableToUpdate in batches of 5000 per batch and set lastenecryptionDT to null based on the join to tableValues using the column ENCRYPTIONID, and also output the updated rows into another table in case I would need to do a rollback.
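A sketch of that pattern, using UPDATE TOP with an OUTPUT clause so the pre-update values land in an audit table you could restore from. The audit table name is made up; the other names are copied from the description. The IS NOT NULL predicate is what lets each pass pick up only rows not yet processed:

DECLARE @Rows int;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    UPDATE TOP (5000) u
    SET u.lastenecryptionDT = NULL
    OUTPUT deleted.ENCRYPTIONID,
           deleted.lastenecryptionDT                                 -- the pre-update value, for rollback
    INTO dbo.tableToUpdate_Audit (ENCRYPTIONID, lastenecryptionDT)   -- assumed audit table
    FROM dbo.tableToUpdate AS u
    JOIN dbo.tableValues AS v
        ON v.ENCRYPTIONID = u.ENCRYPTIONID
    WHERE u.lastenecryptionDT IS NOT NULL;    -- each pass picks up only unprocessed rows

    SET @Rows = @@ROWCOUNT;
END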
I run the following statement and it will not update beyond 7 million plus rows, and I have about 38 million to complete. I keep checking updated row counts and after half a day it's still the same, so I know something is wrong because it was rolling through with no problem when I initiated it. I need to complete this ASAP so it's adding to my frustration. The 'Acct_Num_CH' field is an encrypted field (FYI).
SET rowcount 10000

UPDATE [dbo].[CC_Info_T]
SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
WHERE [Acct_Num_CH] IS NOT NULL

WHILE @@ROWCOUNT > 0
BEGIN
    SET rowcount 10000
    UPDATE [dbo].[CC_Info_T]
    SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
    WHERE [Acct_Num_CH] IS NOT NULL
END

SET rowcount 0
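One thing worth checking (just a guess from the code above): because every updated row still satisfies Acct_Num_CH IS NOT NULL, each 10,000-row pass can keep picking up rows that were already set, so the loop never runs out of qualifying rows and stops making progress. A sketch that excludes already-converted rows from each pass:

SET rowcount 10000

UPDATE [dbo].[CC_Info_T]
SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
WHERE [Acct_Num_CH] IS NOT NULL
  AND [Acct_Num_CH] <> 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'   -- skip rows already converted

WHILE @@ROWCOUNT > 0
BEGIN
    SET rowcount 10000
    UPDATE [dbo].[CC_Info_T]
    SET [Acct_Num_CH] = 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
    WHERE [Acct_Num_CH] IS NOT NULL
      AND [Acct_Num_CH] <> 'ayIWt6C8sgimC6t61EJ9d8BB3+bfIZ8v'
END

SET rowcount 0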
So to the above table I have added a new field/column named "ProdLongDescr(varchar, Null)"
So I need to populate this newly added column with a specific value for each row depending on "ProductCode", which is different for every row. The problem is that I have 25 rows, so instead of writing 25 individual update scripts, is there a way in which a single query will do the same job? If so, can someone guide me on how to achieve that, or point me to a good resource?
Below are a couple of the individual update scripts I wrote. "ProductCode" is different for all 25 rows.
Update tblValAdPackageElement
SET ProdLongDescr = 'Slideshows'
WHERE ProductCode = 'SLID' And szElementDescr = 'Slideshow'
if @@error <> 0 begin goto ErrPos end

Update tblValAdPackageElement
SET ProdLongDescr = 'CategorySlideshows'
WHERE ProductCode = 'SLDC' And szElementDescr = 'CategorySlideshow'
if @@error <> 0 begin goto ErrPos end
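If this is SQL Server 2008 or later, one way to fold all 25 updates into a single statement is to join against an inline VALUES list that maps each ProductCode/szElementDescr pair to its long description; on older versions a CASE expression does the same job. A sketch, showing only the two mappings from above (the remaining 23 rows would be added to the list):

UPDATE e
SET e.ProdLongDescr = m.ProdLongDescr
FROM tblValAdPackageElement AS e
JOIN (VALUES
        ('SLID', 'Slideshow',         'Slideshows'),
        ('SLDC', 'CategorySlideshow', 'CategorySlideshows')
        -- ... add the other ProductCode / szElementDescr / ProdLongDescr rows here
     ) AS m (ProductCode, szElementDescr, ProdLongDescr)
    ON  m.ProductCode    = e.ProductCode
    AND m.szElementDescr = e.szElementDescr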
I'm new to using a DB and have a few questions about what I'm trying to do. I have some historical options data and want to place it into a SQL Express database. (I understand I might need to use a non-Express version once the db gets too big.) A month's worth of data is over 5.5 million rows, so six years' worth is ~400 million rows. Is it possible to put this into a SQL db and be able to search it very fast? I have a month's worth in a db now and it is pretty slow. Should I use a new table for each month and then have 6 years * 12 months = 72 tables to increase the search speed?

I search by date and stock_symbol, and the data looks like this: Date, Stock_Symbol, Option_Symbol, Strike, BidPrice, AskPrice, Volume, OpenInterest, (and a few others). The select statement is simple:

SELECT * FROM Options WHERE Date = @Date and StockSymbol = @Symbol

Thanks
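400 million rows in one table is workable if the index matches the query: with the SELECT above filtering only on Date and StockSymbol, an index leading on those two columns turns each search into a narrow range seek instead of a scan, which usually beats splitting the data into 72 monthly tables. A sketch (names taken from the query; whether to make it the clustered index depends on what else the table needs):

CREATE CLUSTERED INDEX IX_Options_Symbol_Date
    ON dbo.Options (StockSymbol, [Date]);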
I am trying to figure out whether what I am attempting to do is possible, and whether or not my approach is wrong to begin with.
I am trying to build a custom report for our accounting system, which is Traverse from Open Systems. This is what I have done in the stored procedure thus far:
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO

ALTER PROCEDURE rptArFLSalesByCustItemized_sp
    @custId     pCustID,
    @dateFrom   datetime,
    @dateThru   datetime,
    @itemIdFrom pItemId,
    @itemIdThru pItemId
AS
SET NOCOUNT ON

-- define some variables for previous year
DECLARE @LYqty int, @LyAmt money, @LYfrom datetime, @LYthru datetime

-- set defaults
SET @itemIdFrom = ISNULL(@itemIdFrom, (SELECT MIN(itemId) FROM tblInItem))
SET @itemIdThru = ISNULL(@itemIdThru, (SELECT MAX(itemId) FROM tblInItem))
SET @LYfrom = DATEADD(YEAR, -1, @dateFrom)
SET @LYthru = DATEADD(YEAR, -1, @dateThru)

-- create small temp table to hold customer info
CREATE TABLE #tmpArCustInfo
(
    custId   pCustID,
    custName VARCHAR (30)
)
-- populate customer temp table with info
INSERT INTO #tmpArCustInfo
SELECT custId, custName
FROM tblArCust
WHERE custId = @custId

-- create a temp table to hold the Data for each Item
CREATE TABLE #tmpArSalesItemized
(
    itemId      pItemId,
    productLine VARCHAR (12),
    pLineDesc   VARCHAR (35),
    descr       VARCHAR (35),
    LYQtySold   int,
    LYTDQtySold int,
    QtySold     int,
    LYTDsales   money,
    totalSales  money,
    LastInvDate datetime
)

-- populate the temp table with all of the inventory items
INSERT INTO #tmpArSalesItemized
SELECT ii.itemId, ii.productLine, ip.Descr, ii.Descr, 0, 0, 0, 0, 0, NULL
FROM tblInItem ii, tblInProductLine ip
WHERE ip.productLine = ii.productLine
  AND ii.itemId BETWEEN @itemIdFrom AND @itemIdThru

-- update table with this year's quantities
UPDATE #tmpArSalesItemized
SET QtySold = (SELECT SUM(QtyOrdSell)
               FROM tblArHistDetail hd
               WHERE TransId IN (SELECT TransId FROM tblArHistHeader WHERE custId = @custId)
                 AND orderDate IN (SELECT OrderDate FROM tblArHistHeader WHERE OrderDate BETWEEN @dateFrom AND @dateThru)
                 AND hd.partId BETWEEN @itemIdFrom AND @itemIdThru
               GROUP BY hd.partId)
-- Return the temp tables results
select * from #tmpArSalesItemized, #tmpArCustInfo
drop table #tmpArSalesItemized, #tmpArCustInfo
return
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
My problems begin where I want to start updating all of the quantities in the QtySold field. I have managed to get it to write the same sum in every field, but I cannot figure out how to update each row based on the sum of the qty found for that item in the tblArHistDetail table. The trouble is that there is no reference to the custId in that table either; the custId resides in tblArHistHeader and is linked to the detail table via the TransId column. So really I need to update many rows based on criteria from 2 other tables.
Can anyone please help? I don't have a clue how to make this work, and most of what I have learned about SQL thus far has been from opening other stored procs etc. in the accounting system and just reading to see how the developers have done things.
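A sketch of the correlated form of that update: the sum is restricted per item by correlating on partId = itemId, and the customer/date filters move onto the header joined via TransId (this assumes partId in tblArHistDetail is the same key as itemId in the temp table):

UPDATE t
SET QtySold = ISNULL((SELECT SUM(hd.QtyOrdSell)
                      FROM tblArHistDetail hd
                      JOIN tblArHistHeader hh
                          ON hh.TransId = hd.TransId
                      WHERE hh.custId = @custId
                        AND hh.OrderDate BETWEEN @dateFrom AND @dateThru
                        AND hd.partId = t.itemId), 0)   -- one sum per item, 0 when no sales
FROM #tmpArSalesItemized AS t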
I have to write a couple of scripts that will update a couple of columns in two separate tables and also insert a new row with the same data except for a few calculated or provided values. See the specs below; a rough sketch of script one follows the spec.
1. tGradeHist Table Script One (Needs to be run first)
a. Read tGradeHist Table and Select rows with GradeEndDate = NULL and GradeStartDate = '1/1/2007 12:00:00 A.M.'
b. Calculate New Step Amount = StepAmount * Incr% (Round To Nearest Whole Dollar)
c. Create New Row for this table using information from row read above and insert new information where indicated :
GradeCode - Same
GradeLocationCode - Same
Step - Same
GradeStartDate - '7/1/2007 12:00:00 A.M.'
GradeEndDate - NULL
GradeCurrencyCode - Same
StepAmount - Result of b (above)
GradeFrequencyCode - Same
RangeMaximumAmount - Same
RangeMidAmount - Same
RangeMinimumAmount - Same
GradeCurrentFlag - 'True'
MarketMaximumAmount - Same
MarketMidAmount - Same
MarketMinimumAmount - Same
GradeGUID - Same
TSCOL - Same
d. Update Row read in a (above) with GradeEndDate = '6/30/2007 12:00:00 A.M.' and GradeCurrentFlag = 'False'
2. tPersonBasePayHist Table Script Two (Needs to be run second)
a. Read tPersonBasePayHist Table and Select rows with PersonBasePayEndDate = NULL
b. Calculate New PersonBasePayAmount = PersonBasePayAmount * Incr% (Round To Nearest Whole Dollar)
c. Create New Row for this table using information from row read above and insert new information where indicated :
PersonGUID - Same
PersonBasePayStartDate - '7/1/2007 12:00:00 A.M.'
PersonBasePayEndDate - NULL
PersonBasePayCurrencyCode - Same
PersonBasePayAmount - Result of b (above)
PersonBasePayFrequency - Same
PersonBasePayPayrollFrequencyCode - Same
BasePayReasonCode - 'SA'
ConductedBasePayReviewDate - Same
ScheduledBasePayReviewDate - Same
PayrollCode - Same
PersonBasePayCurrentFlag - 'True'
ApprovedByPersonGUID - Same
PersonBasePayGUID - Same
TSCol - Same
d. Update Row read in a (above) with PersonBasePayEndDate = '6/30/2007 12:00:00 A.M.' and PersonBasePayCurrentFlag = 'False'
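A sketch of script one, with a few assumptions called out: the increase percent is a variable @IncrPct applied exactly as the spec words it (StepAmount * Incr%), the 'True'/'False' literals follow the spec and may need to become 1/0 if GradeCurrentFlag is a bit column, and TSCOL is assumed to be a rowversion/timestamp column that SQL Server fills in itself, so it is omitted from the insert list.

DECLARE @IncrPct decimal(9, 4)
SET @IncrPct = 1.03   -- hypothetical: Incr% expressed as a multiplier

BEGIN TRANSACTION

-- c. build the new 7/1/2007 rows from the current open-ended 1/1/2007 rows
INSERT INTO tGradeHist
    (GradeCode, GradeLocationCode, Step, GradeStartDate, GradeEndDate,
     GradeCurrencyCode, StepAmount, GradeFrequencyCode,
     RangeMaximumAmount, RangeMidAmount, RangeMinimumAmount, GradeCurrentFlag,
     MarketMaximumAmount, MarketMidAmount, MarketMinimumAmount, GradeGUID)
SELECT GradeCode, GradeLocationCode, Step, '7/1/2007', NULL,
       GradeCurrencyCode,
       ROUND(StepAmount * @IncrPct, 0),    -- b. new step amount, nearest whole dollar
       GradeFrequencyCode,
       RangeMaximumAmount, RangeMidAmount, RangeMinimumAmount, 'True',
       MarketMaximumAmount, MarketMidAmount, MarketMinimumAmount, GradeGUID
FROM tGradeHist
WHERE GradeEndDate IS NULL
  AND GradeStartDate = '1/1/2007'

-- d. close out the rows read in step a (the new rows start 7/1/2007, so this filter skips them)
UPDATE tGradeHist
SET GradeEndDate = '6/30/2007',
    GradeCurrentFlag = 'False'
WHERE GradeEndDate IS NULL
  AND GradeStartDate = '1/1/2007'

COMMIT TRANSACTION

Script two would follow the same insert-then-close-out shape against tPersonBasePayHist.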
In our database, we have a very large table that gets updated every morning. The start-of-day step copies 4 million rows in the fact table from the previous date to today's date in the same table, followed by some other processing. It takes 1 1/2 to 2 hrs to do this. There is a DTS package created to copy these rows into a temp table and then into the fact table.
This table has more than 200 million rows.
Any ideas on how to accomplish this without copying the data twice and without running into locking problems?
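One option, sketched with assumed table and column names: skip the intermediate temp table and insert yesterday's rows straight back into the fact table under today's date. If a single 4-million-row insert holds locks against readers for too long, it can be broken into key-range batches.

DECLARE @Today datetime, @Yesterday datetime
SET @Today = CONVERT(varchar(8), GETDATE(), 112)    -- today's date at midnight
SET @Yesterday = DATEADD(DAY, -1, @Today)

INSERT INTO dbo.FactTable (FactDate, Col1, Col2)    -- assumed column list
SELECT @Today, Col1, Col2
FROM dbo.FactTable
WHERE FactDate = @Yesterday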
And I would like to produce the following outcome in the same table (using an update statement): as you can observe, it merges all overlapping dates based on the same promotion ID by taking the minimum start date and the maximum end date. Only the first row of an overlapping date range is updated to the desired value and its flag changes to 1. The other overlapping rows are set to NULL and their flag becomes 2.
The second part I would like to achieve is based on the first table as well. However, this time I would like to merge the dates so that the result has the minimum start date and the End Date of the last overlapping row. Since the End Date of the last overlapping row of promotion ID 1 is the row with ID 4, with End Date 2015-05-29, the table will look as follows after the update.
Hi, I used the /e option in my bcp command, yet I did not get all the rows from the mainframe into the SQL tables. Here is the case: I have 11 million rows on an FTP server and I use the code below to bcp them into SQL Server. Can anyone check whether this code is good for the process? I am missing one million rows in the bcp process and do not know why. I put the /e option in to see if there is any error, but I could not find any error file on my hard drive. Please check it out and let me know.
Need help. I am managing a Data Warehouse (80 GB database size). I purge data older than 6 months from a table which has more than 140 million rows, on a daily basis. The daily data load performance is degrading. The table has no clustered index (only non-clustered indexes).
Tried dropping and rebuilding the non-clustered indexes, didn't work.
One way to solve the problem is to drop the non-clustered indexes, bcp out the data, truncate the table, bcp the data back in, and rebuild the non-clustered indexes. This is too risky and takes 14 hours just to bcp out the data.
This was not an issue in SQL Server 6.5, because 6.5 always inserted new records at the end of the heap (heap = table with non-clustered indexes but no clustered index). In contrast, SQL Server 7.0 first checks for available free space in existing pages, and this is where it is killing the performance.
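One commonly suggested change for exactly this pattern (daily purges plus daily loads on a heap) is to give the table a clustered index on the purge/load date column, so new rows append at the end and freed space comes back as whole ranges instead of scattered slots the insert has to hunt for. This is only a sketch with an assumed table and column name, and building it on 140 million rows is itself a long, space-hungry operation that needs a maintenance window (SQL Server 7.0 has no online index builds):

CREATE CLUSTERED INDEX IX_FactHistory_LoadDate
    ON dbo.FactHistory (LoadDate)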
I have 1+ CSV files (processed with a foreach loop) on which I'm doing a lot of transform work before inserting into a SQL database table. Each CSV file usually contains about two days' worth of data (it contains date stamps), somewhere in the region of 60k records per day. The destination table currently contains 3 million+ rows and will get bigger. I need to make sure that, before inserting into the destination table, the data doesn't already exist.
I've read the following article: http://www.sqlis.com/311.aspx While the lookup method works, it takes ages and eats up memory as it caches the 3m+ records before running for each CSV. Obviously this will only get worse as the table grows in size.
To make things a little more efficient what I'd like to do, is first derive the dates I'm dealing with in the current file - essentially storing the max(date) and min(date) in variables. Then in the lookup SQL use those vars, to reduce the amount of data that needs to be brought into the transformation to check against before inserting into the destination table. Lookup SQL eg. SELECT * FROM MyTable WHERE Date BETWEEN varMinDate AND varMaxDate.
Ideally I'd use an aggregate transformation and then use the subsequent output from that either in the lookup query or store the output in vars, but I don't think you can do that and I get the feeling I'm approaching this with the wrong mindset.
Hi, I want to update a column with a new value in a table. Is it possible to do so in a stored procedure? The name of the column will be an input to the stored procedure, i.e. at the time of writing the stored procedure I don't know which column the user will be updating.
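Yes, but inside a stored procedure the column name has to be spliced into dynamic SQL, since parameters can only stand in for values, not identifiers. A minimal sketch (the table name and parameter names are made up; QUOTENAME guards against injection through the column-name argument):

CREATE PROCEDURE dbo.usp_UpdateColumn
    @ColumnName sysname,
    @NewValue   varchar(100),
    @Id         int
AS
BEGIN
    DECLARE @sql nvarchar(4000)

    SET @sql = N'UPDATE dbo.MyTable SET ' + QUOTENAME(@ColumnName)
             + N' = @NewValue WHERE Id = @Id'

    -- the value and the key are still passed as real parameters
    EXEC sp_executesql @sql,
         N'@NewValue varchar(100), @Id int',
         @NewValue = @NewValue, @Id = @Id
END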
Does anyone know the SQL statement to add additional columns to an existing table? I know I can do it through Enterprise Manager, but I want to see if I can do it from Query Analyzer or some script, so I can add columns without having to recreate the table. Or is there a way to save off the data, recreate the table, and then place the data back in? I hope that makes sense. I am thinking of this from the sense of doing all updates through SourceSafe.
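ALTER TABLE ... ADD does this from Query Analyzer without recreating the table or touching the existing data. A small sketch with made-up names:

-- a nullable column can be added without rewriting existing rows
ALTER TABLE dbo.Customers ADD MiddleName varchar(50) NULL

-- a NOT NULL column needs a default, which is applied to the existing rows
ALTER TABLE dbo.Customers ADD IsActive bit NOT NULL CONSTRAINT DF_Customers_IsActive DEFAULT (1)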
Hi, I have an application where I'm filling a dataset with values from a table. This table has no primary key. Then I iterate through each row of the dataset, compute the value of one of the columns, and update that value in the dataset row. The problem I'm having is that when the database gets updated by the SqlDataAdapter.Update() method, the same value shows up under that column for all rows. I think my update command is not correct, since I'm not specifying a where clause, and hence it is using just the value lastly computed in the dataset to update the entire table. But I do not know how to specify a where clause for an update statement when I'm actually updating every row in the dataset. Basically I do not have an update parameter since all rows are meant to be updated. Any suggestions?

SqlCommand snUpdate = conn.CreateCommand();
snUpdate.CommandType = CommandType.Text;
snUpdate.CommandText = "Update TestTable set shipdate = @shipdate";
snUpdate.Parameters.Add("@shipdate", SqlDbType.Char, 10, "shipdate");
string jdate = "";
for (int i = 0; i < ds.Tables[0].Rows.Count; i++)   // visit every row, including the last
{
    jdate = ds.Tables[0].Rows[i]["shipdate"].ToString();
    ds.Tables[0].Rows[i]["shipdate"] = convertToNormalDate(jdate);
}
da.Update(ds, "Table1");
conn.Close();
My goal is, with one update statement, to fill TABLE1.counter (currently empty) with the total count of TABLE2.e# (employee) for each employee. Also, if TABLE1.e# isn't in TABLE2.e#, then it should be set to 0 (TABLE1.e# 8 and 9 should have a counter of 0). This is for SQL*Plus.
e.g. TABLE2:
e#
--
1, 2, 3, 4, 5, 5, 6, 7, 7, 1, 2, 3, 4, 5

UPDATE TABLE1
SET counter = (SELECT COUNT(TABLE2.e#)
               FROM TABLE2
               INNER JOIN TABLE1 ON (TABLE2.e# = TABLE1.e#)
               GROUP BY TABLE2.e#);
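A sketch of a correlated form that does both things at once: each TABLE1 row gets its own count, and COUNT(*) naturally returns 0 for employees (e# 8 and 9) with no TABLE2 rows:

UPDATE TABLE1 t1
SET counter = (SELECT COUNT(*)
               FROM TABLE2 t2
               WHERE t2.e# = t1.e#);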