SQL Server 2008 :: Migrate Contents Of Large Table From One DB To Another?
Mar 3, 2015
I have a large table containing about 800 million rows with an average row length of about 1K. The columns in the table are char columns. I need to move the contents of this table into a similar table where the target columns are varchar. The original table column definitions are compatible with the target table but the reverse is not necessarily true. For example, one column is being changed from int to bigint. The table is partitioned.
So, what is the fastest way to migrate the data. I was thinking to unload each partition into a flat file and load the target table running multiple load streams? Is this a good way?
* SQL Server 2008 R2 * Database was created from a third party product. The product writes to the 3 tables that I need to make changes to 24/7 and downtime is not an option. All changes must be done live. * Database overall size is ~200 GB * The 3 tables I must update make up ~190 GB of that space. * Tables have no primary key or ID columns. Therefore, the data is highly fragmented. * Of the ~190 GB of space allocated for the tables, there is roughly 70 GB of actual data. * Rows of the table are not guaranteed to be unique. In fact, on one of the tables, tests were ran with a small sample of data and duplicates were very much evident.
What I'm trying to accomplish here is to get an ID column added to the 3 tables and set that ID field as the primary key. Doing so will force the data to become much less fragmented than it is currently and with purging and new inserts, eventually fragmentation will be nearly non-existent.
Problem: Making table changes on tables this large while data is constantly being added poses many risks and can cause data loss. This was tried on a smaller table than these three and the entire table was lost in the process. Restore from backup was needed to get back to most recent log backup point.
Original Solution: My original plan was to create a backup of each table and run the script below to migrate the majority of the data temporarily into the new table. I could then update the original table (which now would contain much less data) and then migrate the data back.
Original Solution Problem: The problem with the solution above is that it calls the DELETE function on the original table using the values from the temporary table. When there are duplicate rows, which have not all been inserted into the backup table yet, they will all be removed from the original table because there is nothing unique to separate them out. In my testing, I had 10,000 rows in the original table and ended up with 9,959 rows in the backup table.
Question 1: Is my approach to making these table changes reasonable? Question 2a: If so, how can I make sure I don't lose data as part of this temporary migration of the data to my backup tables? Question 2b: If not, what would be a better approach that isn't going to cause disruption to the application that INSERTs data 24/7 and won't have any risk of data loss?
I have some huge tables (think 200+GB for a single table) which are excellent candidates for sparse columns. The tables have many columns which are defined with decimal datatypes (13,2) with a large percentage of them (over 50% in most cases- some as much as 99%) being 0.00. Since this is very expensive in terms of storage my idea is to set all the 0.00 values equal to NULL then set them as sparse. Across 100 or so identical databases, I have 5 such tables, with 20-40 columns in each table.
1.) three steps for each column in each table in each db.
Step 1: update table to allow for nulls
Step 2: update tabe set column=null where column =0.00
Step 3 update table set sparse columns
2.)
Step 1: Create entirely new table with sparse column definitions
Step 2: copy entire table, transforming 0.00 to null for affected columns via SSIS
Step 3: drop original table, rename new table to original name
Is there an easy way to compare the contents of objects between 2 different databases? For example, say I had 2 databases, My_DB_1 and My_DB_2, each with a SEC_User table. Say I wanted to do an object-to-object comparison between databases to see if there were any differences. Here's some sample SQL:
use My_DB_1 select * from sys.dm_sql_referencing_entities('dbo.sec_user', 'object')
use My_DB_2 select * from sys.dm_sql_referencing_entities('dbo.sec_user', 'object')
Say that the result sets above both returned a SEC_GetUser sproc object ref. Is there a way to write SQL that will compare the SEC_GetUser sprocs (and other objects in the above rowsets) from both databases? For example, if My_DB_1.SEC_GetUser returns an extra column in the result set than My_DB_2.SEC_GetUser then I'd like my comparison SQL to return a single column "IsDifferent" with SEC_GetUser as a row....
I have a requirement to create a package that takes all files in a given folder and adds them to a single archive (.zip) file. I've tried several methods using 7zip and while I can create archives for every file in the directory I can't seem to get them into a single .zip file.
hi. I have a question. I work in BI (with sql server 2000) and this year (2008) my idea is migrate to 2005, but, consulting to expert in Microsoft technology, they suggested wait to 2008, sql 2005 have many problem, sql server 2008 was completed re-write.
The question is, what problem have sql server 2005?
I've got two databases on the same server and replicate some tables from one database to another.The replication is configured so not to drop the table if it exists, but to delete the data based on the filter if one exists. There are two tables on the subscriber that have some extra columns.
I get "field size too large" error when trying to replicate them. Is there a workaround without having to make the publisher and the subscriber tables identical by schema?
We have an existing BI/DW process that adds large chunks of data daily (~10M rows) to an existing table, as well as using Deletes to remove stale data. This scenario seems to beg for partitioning to support switching in/out data.
After lots of reading on this, I have figured out the mechanics of the switching, bit I still have some unknowns about the indexes needed to support this.
The table currently has several non-clustered indexes, including one on the partitioning column - let's call that column snapshotdate. Fortunately there are no FKs involved, and no constraints.
Most of the partitioning material I see focuses on creating a clustered PK to assist with switching. Not sure if this is actually necessary, but assume I create one using an Identity column (currently missing) plus snapshotdate.
For the other non-clustered, non-unique indexes, can I just add the snapshotdate to the end of the index? i.e. will that satisfy the switching requirement?
I have the default trace on a SQL Server 2008R2 instance enabled and found today that there is a gap of nearly 4 minutes in the trace during a time of the day when there most certainly is not going to be a 4 minute window of nothing.
What if anything could cause the default trace to have a gap like this? The SQL Server Instance (against my preferences) is hosted on VMware however it has its on HOST and so its resources are not being shared with any other server. The data & log files reside on different parts of the SANS. Our IT & Network admins are looking into the issue on their end but when I looked and found a near 4 minute gap in the default trace it hit me that this could be something above/outside of SQL Server.
I have a query below which filters detail field in the #TempLogins table. The details field is a text field which contains many types of text strings, some containing urls that have parts like "ResultID=5" which is what is contained in the ResultIDSearch and ResultSetIDSearch fields. The records with entries like "ResultID=5" are the ones I'm trying to filter for.
The problem I have is that the query takes way too long to run. The TempLogin table has around 200 K records and the TempSearch table has around 80 K records.
select * from #TempLogins a where exists (select 1 from #TempSearch t1 where a.detail like '%' + t1.ResultIDSearch + '%' or a.detail like '%' + t1.ResultSetIDSearch + '%')
I have a well-structured but also very large binary data-set that is generated by a C++ application every five minutes. The data needs to be accessed by SQL applications. Since data is generated every five minutes, performance is key, both for write and read. The data set is about 500MB.If data is written to the file system, the write performance doesn't involve SQL server. For reading it, I have a CLR to read the portions of the data that I need based on offset and length. That works and is very fast. The problem is that data is stored in the file system, so it is not self-contained within the database.
A second option that I haven't explored yet, is to write the data into a table as VARBINARY(MAX). I would read the data using SUBSTRING with appropriate offset and length. Performance of SQL write/read of binary data of this size, and whether there is a third option I haven't thought off. I'm using SQL Server 2014.
Currently we has a database of size about 300G. Because our backup system failed some time past we were left with a transaction log file which grew to about 160G. However our backups are working again and everything is working fine. My understanding is that now the transaction log file is practically empty but the capacity remains at 160G.
When you delete records the deleted transactions are going to get logged to the transaction file. My understanding is when a backup is done these transactions get discarded out of the transaction file.
could I make use of this relatively large transaction file and start deleting transactions without out actually adding to the transaction file size.
The plan is to delete records from logging tables that are not referenced to by any other table without this increasing the transaction log file.For example over a period of a few weeks we can delete a chunk of records from a table. Then after it has completed a backup we can delete another chunk of records out of this table until we have got the table down to the records that we now need.Will this work?
I downloaded Sql server 2008 Nov CTP and am gonna use that. Just wanna make sure if I can migrate from this version to the release version? Coz the release won't come till the end of next year (?!) so...
We have a database. It is enabled for mirroring. We need to delete the old records. That is around 500k records from a table. But it has foreign key relation. How to do in Production servers these kind of deletes?
Our monitoring tool shows that our production system periodically experiencing large rate - up to 800 memory pages/sec. How to find out which particular queries, S.P., processes that initiate this?
I am attempting to populate an empty table from a csv file. I tried to use BULK insert but i got an error saying it wasn't supported. I am using Visual Web Developer 2008 Express Edition and I am using the query window. If someone can direct me to a link, or give me the simple sql script to do it, that would be most appreciated. Thanks in advance!
I have a Windows 2008 R2 Always on Cluster with 3 nodes (two in the primary site and one in the DR site).
Primary Site: -Primary Site Server1 -Primary Site Server2
DR Site 1 (to be decommed): -DR Site Server1
Our company is planning on decommissioning the DR site. But before we do this, we want to add a 4th site to the cluster. Migrate the data...and then decommission the original DR Site.
Is it possible to have this configuration:
Primary Site: -Primary Site Server1 -Primary Site Server2
DR Site 1 (to be decommed): -DR Site Server1
DR Site 2 (NEW DR Site): -DR Site Server1
IF this is possible, do I simply add the new DR site to the existing cluster (same steps as adding the first DR node to the cluster when the cluster was originally configured? or are there special steps?
Hi All,I have come up against a wall which i cannot get over.I have an sql db where the date column is set as a varchar (i know, should have used datetime but this was done before my time and i've got to work with what is there). The majority of values are in the format dd/mm/yyyy. However, some values contain the word 'various'.I'm attempting to compare the date chosen on a c# .net page with the values in the db and also return all the 'various' values as well.I have accomplished casting the varchar to a datetime and then comparing to the selected date on the .net page. However, it errors when it comes across the 'various' entrant.Is there anyway to carry out a select statement comparing the start_date values in the db to the selected date on the .net page and also pull out all 'various' entrants at the same time without it erroring? i thought about replacing the 'various' to a date like '01/01/2010' so it doesn't stumble over the none recognised format, but am unsure of how to do it.This is how far i have got: casting the varchar column to datetime and comparing. SELECT * FROM table1 WHERE Cast(SUBSTRING(Start_Date,4,2) + '/' + SUBSTRING(Start_Date,1,2) + '/' +SUBSTRING(Start_Date,7,4) as datetime) '" + date + "'"Many thanks in advance!
I'm inserting from TempAccrual to VacationAccrual . It works nicely, however if I run this script again it will insert the same values again in VacationAccrual. How do I block that? IF there is a small change in one of the column in TempAccrual then allow insert. Here is my query
INSERT INTO vacationaccrual (empno, accrued_vacation, accrued_sick_effective_date, accrued_sick, import_date)
Hi, I am relatively new to this stuff. I am using Microsoft SQL Server Management Studio Express (9.00.1399.00) Can someone tell me the way to get a table and its content to another database (I use two at webhosts4life) Or perhaps a way to export the data of a table so I can do it at a later stage.
Is that at all possible with this program or do I have to use the non-express version?
Hello, Its hard trying to explain this. I have 3 tables Table 1 is where the users are stored, each user has a username and a userrank Table 2 is where the points that decides the userrank are stored Table 3 contains the available userranks like this
Table 1 (user_list) looks briefly like this:username nvarchar(20),userrank int, -- Reference to Table3 id... alot more fields Table 2 (settings_profile) looks like this:username nvarchar(20),total_active_points int,... some more fields Table 3 (data_ranks) looks like this:id int primary key auto inc,rankname nvarchar(20),min_pts int,max_pts int
Points get added to table 2 whenever they do something that generates points on the site. Points also get withdrawn every 7 days, so a user can only collect points for 7 days, on the 8th day, all points he earned on the 1st day is reduced from the current points with this code: WHILE (SELECT @username = username, @id = id, @temp1 = ap_sentmails, @temp2 = ap_createdthreads, @temp3 = ap_createdanswers, @temp4 = ap_signguestbook, @temp5 = ap_blogcomment, @temp6 = ap_createblogentry, @temp7 = ap_profilefirsttime, @temp8 = ap_profilephoto, @temp9 = ap_activateguestbook, @temp10 = ap_addnewfriend, @temp11 = ap_superguruvote, @temp12 = ap_forumtopicvote, @temp13 = ap_labervote, @temp14 = ap_funstuffitemvote, @temp15 = ap_movievote, @temp16 = ap_actorvote, @temp17 = ap_money_new WHERE (created < Dateadd(dd, -7, @todaysdate))BEGINSET @sum = 0SET @sum = @temp1 + @temp2 + @temp3 + @temp4 + @temp5 + @temp6 + @temp7 + @temp8 + @temp9 + @temp10 + @temp11 + @temp12 + @temp13 + @temp14 + @temp15 + @temp16 + @temp17UPDATE settings_profile SET total_active_points = total_active_points - @sum WHERE (username = @username)DELETE FROM konto_daylist WHERE (id = @id)END Now my question is this, i want to loop thru the table A, collect all usernames inside of it, then run it against table b and table c to determine the current rank of the user.Something like this... DECLARE @username nvarchar(20)DECLARE @pts int, @rank int ...something that starts a loop thru table A (user_list) and get the username into @username... SELECT @pts = total_active_points FROM settings_profile WHERE (username = @username)-- Determine the rank here, by compairing the points the user have against the pointstabel in table data_ranksSELECT @rank = id FROM data_ranks WHERE (pts_min => @pts AND pts_max < @pts)UPDATE user_list SET rank = @rank WHERE (username = @username) ...next persion in the loop... This SP runs once a day and will first reduce the points from 8days ago, then it will run thru all the users and determine their new rank... But how do i loop thru all the users? with a cursor?
Can anybody help me with the generation of Table of Contents for a report using SQL Server 2005 Reporting Services. Let me ellaborate. The scenerio is...i m having a report of lets say 500 pages grouped on employees, showing the performance of each employee between specific date range. Now if the manager prints the report of 500 pages he will be more intersted to jump directly to a perticular employee's page which means my printed report had to have a Table of Contents on my grouped criteria (which in this case is employee number). I would really appriciate if someone can suggest me a solution for this.