SQL Server 2008 :: Changing Large / Existing Table To Sparse Columns?
Sep 21, 2015
I have some huge tables (think 200+GB for a single table) which are excellent candidates for sparse columns. The tables have many columns which are defined with decimal datatypes (13,2) with a large percentage of them (over 50% in most cases- some as much as 99%) being 0.00. Since this is very expensive in terms of storage my idea is to set all the 0.00 values equal to NULL then set them as sparse. Across 100 or so identical databases, I have 5 such tables, with 20-40 columns in each table.
1.) Three steps for each column in each table in each db:
Step 1: alter the table to allow NULLs in the column
Step 2: update table set column = NULL where column = 0.00
Step 3: alter the table to mark the column as sparse
2.)
Step 1: Create entirely new table with sparse column definitions
Step 2: copy entire table, transforming 0.00 to null for affected columns via SSIS
Step 3: drop original table, rename new table to original name
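A minimal T-SQL sketch of option 1 for a single column, assuming a placeholder table dbo.BigTable and column Amount01 (the real tables/columns would be substituted, or the statements generated from sys.columns):

-- Hedged sketch for option 1, one column at a time (dbo.BigTable / Amount01 are placeholder names).
-- Step 1: allow NULLs
ALTER TABLE dbo.BigTable ALTER COLUMN Amount01 decimal(13,2) NULL;

-- Step 2: convert the zero values, in batches to keep the log manageable
WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) dbo.BigTable
    SET Amount01 = NULL
    WHERE Amount01 = 0.00;

    IF @@ROWCOUNT = 0 BREAK;
END

-- Step 3: mark the now-nullable column as sparse
ALTER TABLE dbo.BigTable ALTER COLUMN Amount01 ADD SPARSE;

Note that marking a column SPARSE changes the row format, so the ALTER in step 3 has to touch every existing row anyway; for tables of this size that is one argument in favour of option 2's rebuild-into-a-new-table approach.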
We have an existing BI/DW process that adds large chunks of data daily (~10M rows) to an existing table, as well as using Deletes to remove stale data. This scenario seems to beg for partitioning to support switching in/out data.
After lots of reading on this, I have figured out the mechanics of the switching, but I still have some unknowns about the indexes needed to support this.
The table currently has several non-clustered indexes, including one on the partitioning column - let's call that column snapshotdate. Fortunately there are no FKs involved, and no constraints.
Most of the partitioning material I see focuses on creating a clustered PK to assist with switching. Not sure if this is actually necessary, but assume I create one using an Identity column (currently missing) plus snapshotdate.
For the other non-clustered, non-unique indexes, can I just add the snapshotdate to the end of the index? i.e. will that satisfy the switching requirement?
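My understanding (hedged): SWITCH requires every index to be aligned, i.e. partitioned on the same function over snapshotdate, rather than requiring snapshotdate in every key. For a non-unique nonclustered index built on the partition scheme, SQL Server adds snapshotdate as an included column automatically if it is not already in the key, so appending it is optional; a unique index (such as the proposed clustered PK on Identity + snapshotdate) does need snapshotdate in its key. A sketch with placeholder names:

-- Hedged sketch: align an existing non-unique nonclustered index by rebuilding it on the partition scheme
-- (dbo.FactTable, OtherColumn and ps_snapshotdate are placeholder names).
CREATE NONCLUSTERED INDEX IX_FactTable_OtherColumn
    ON dbo.FactTable (OtherColumn)
    WITH (DROP_EXISTING = ON)
    ON ps_snapshotdate (snapshotdate);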
I have recently converted a table with many columns so that they now use sparse columns. However, I have now found that a lot of my queries are not working. It seems to be due to the fact that I am using a lot of #temp tables in my queries along with the SELECT ... INTO ... method to get data into the temporary table. This previously worked fine, but the sparse columns are not coming across to the #temp table. Here is an example of what previously worked fine (it obviously just grabbed everything in the Client table):
Select * INTO #temptable1 from Client
I have tried the following without any success (knowing I now need to specify the column names):
Select ClientId, Val1, Val2, Val3, Val4, Val5, Val6 INTO #temptable1 from Client
Val3, Val4, Val5 and Val6 are sparse columns.
The query runs fine, however when I try to select any of the sparse columns from #temptable1, it just says they do not exist. After reading up, it would seem I cannot use SELECT ... INTO ..., but I have not found any alternative to convert my code to. I have a lot of quite complex queries that use this method and I am looking for something I can convert them all over to.
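A hedged workaround that keeps the queries otherwise intact: define the temp table explicitly with ordinary (non-sparse) columns and use INSERT ... SELECT instead of SELECT ... INTO. The column types below are assumptions:

-- Hedged sketch: explicit temp table plus INSERT ... SELECT (data types are assumed).
CREATE TABLE #temptable1
(
    ClientId int         NOT NULL,
    Val1     varchar(50) NULL,
    Val2     varchar(50) NULL,
    Val3     varchar(50) NULL,   -- sparse on the source table, plain here
    Val4     varchar(50) NULL,
    Val5     varchar(50) NULL,
    Val6     varchar(50) NULL
);

INSERT INTO #temptable1 (ClientId, Val1, Val2, Val3, Val4, Val5, Val6)
SELECT ClientId, Val1, Val2, Val3, Val4, Val5, Val6
FROM dbo.Client;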
The only way I know to add a new column to an existing mapping is to go to the advanced editor and refresh. However, that keeps only the default mapping (where the field names match); the rest is wiped out, so the mapping has to be restored manually afterwards. Risky and annoying at the same time. Is there any alternative?
* SQL Server 2008 R2
* The database was created by a third-party product. The product writes to the 3 tables that I need to change 24/7, and downtime is not an option; all changes must be done live.
* Overall database size is ~200 GB.
* The 3 tables I must update make up ~190 GB of that space.
* The tables have no primary key or ID columns, so the data is highly fragmented.
* Of the ~190 GB of space allocated for the tables, there is roughly 70 GB of actual data.
* Rows are not guaranteed to be unique. In fact, on one of the tables, a test with a small sample of data showed clear duplicates.
What I'm trying to accomplish here is to get an ID column added to the 3 tables and set that ID field as the primary key. Doing so will force the data to become much less fragmented than it is currently, and with purging and new inserts, fragmentation will eventually be nearly non-existent.
Problem: Making changes to tables this large while data is constantly being added poses many risks and can cause data loss. This was tried on a smaller table than these three, and the entire table was lost in the process. A restore from backup was needed to get back to the most recent log backup point.
Original Solution: My original plan was to create a backup of each table and run the script below to migrate the majority of the data temporarily into the new table. I could then update the original table (which now would contain much less data) and then migrate the data back.
Original Solution Problem: The problem with the solution above is that it uses a DELETE against the original table driven by the values in the temporary table. When there are duplicate rows which have not all been inserted into the backup table yet, they will all be removed from the original table, because there is nothing unique to separate them. In my testing, I had 10,000 rows in the original table and ended up with 9,959 rows in the backup table.
Question 1: Is my approach to making these table changes reasonable? Question 2a: If so, how can I make sure I don't lose data as part of this temporary migration of the data to my backup tables? Question 2b: If not, what would be a better approach that isn't going to cause disruption to the application that INSERTs data 24/7 and won't have any risk of data loss?
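One way to avoid losing duplicates during the temporary migration is to delete and capture in a single statement, so the rows removed are exactly the rows written to the backup table. A hedged sketch, assuming the backup table has an identical structure and using placeholder names (dbo.BigTable, dbo.BigTable_Backup, SomeDateColumn):

-- Hedged sketch: move rows in batches, capturing exactly the deleted rows via OUTPUT,
-- so duplicate rows cannot be double-deleted or lost (names and the cutoff predicate are placeholders).
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) src
    OUTPUT deleted.*
    INTO dbo.BigTable_Backup        -- must have the same column list/order as the source
    FROM dbo.BigTable AS src
    WHERE src.SomeDateColumn < '20150101';   -- assumed cutoff

    IF @@ROWCOUNT = 0 BREAK;
END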
Can we create a partition on an existing table? e.g. Create table t ( col1 number(10,0), Col2 Varchar(10)); After creating the table, can we alter it to partition the table?
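In SQL Server there is no single ALTER TABLE ... PARTITION command; the usual approach is to create a partition function and scheme and then build (or rebuild) the clustered index on that scheme. A hedged sketch, with names and boundary values as placeholders:

-- Hedged sketch: partition an existing table by moving its clustered index onto a partition scheme
-- (function/scheme/table/column names and boundaries are placeholders).
CREATE PARTITION FUNCTION pf_ByYear (int)
    AS RANGE RIGHT FOR VALUES (2013, 2014, 2015);

CREATE PARTITION SCHEME ps_ByYear
    AS PARTITION pf_ByYear ALL TO ([PRIMARY]);

-- Rebuilding (or creating) the clustered index on the scheme moves the data.
CREATE CLUSTERED INDEX CIX_t_col1
    ON dbo.t (col1)
    WITH (DROP_EXISTING = ON)      -- only if a clustered index already exists; omit otherwise
    ON ps_ByYear (col1);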
I have a large table containing about 800 million rows with an average row length of about 1K. The columns in the table are char columns. I need to move the contents of this table into a similar table where the target columns are varchar. The original table column definitions are compatible with the target table but the reverse is not necessarily true. For example, one column is being changed from int to bigint. The table is partitioned.
So, what is the fastest way to migrate the data? I was thinking of unloading each partition into a flat file and loading the target table with multiple parallel load streams. Is this a good way?
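If both tables live on the same instance, a hedged alternative to flat files is a per-partition INSERT ... SELECT with TABLOCK (minimally logged under the simple or bulk-logged recovery model when the target is a heap), letting the char-to-varchar and int-to-bigint conversions happen implicitly. All names below are placeholders, and different partition ranges can be run from separate sessions for parallel streams:

-- Hedged sketch: per-partition copy (dbo.SourceTable / dbo.TargetTable / pf_Source are placeholder names,
-- and SELECT * assumes the column order matches the target).
DECLARE @p int = 1;

WHILE @p <= 12   -- assumed number of partitions
BEGIN
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
    SELECT *
    FROM dbo.SourceTable
    WHERE $PARTITION.pf_Source(PartitionColumn) = @p;

    SET @p += 1;
END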
I have a scenario where I need to add a blank column to a table that is being published. This table contains over 100 million records. What is the best way to add the column? In the past, when I had to make an update, it broke replication, because the update would take forever while jobs were continuously updating the table, so replication couldn't catch up.
If I alter a table and add a column, would this column automatically get picked up in replication?
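For what it's worth, adding a nullable column with no default is a metadata-only change, and in 2005/2008 transactional replication the ALTER is propagated automatically when the publication's replicate_ddl property is 1 (the default). A hedged sketch with placeholder publication/table names:

-- Hedged sketch: check whether DDL replication is on for the publication (run in the publication database).
EXEC sp_helppublication @publication = N'MyPublication';   -- result includes a replicate_ddl column

-- Adding a NULLable column without a default should be metadata-only,
-- and with replicate_ddl = 1 the ALTER is sent to the subscribers as well.
ALTER TABLE dbo.BigPublishedTable
    ADD NewColumn varchar(50) NULL;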
I need to recover some data in a table but I'm not 100% sure of the right way to do this safely.
I'll need to query the two tables to compare the before and after, but how do I go about restoring/attaching the backup database to SQL Server without causing conflicts?
If I restore over the live database, I assume this would just overwrite it, which is obviously the worst thing that can happen. If I attach the backup, how does this affect the current live DB? How do I make sure it isn't accessed and mistaken for the live DB?
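One safe approach (hedged sketch, placeholder names and paths) is to restore the backup as a brand-new database with a different name and different file paths, so the live database is never touched, and then make the copy read-only so nobody mistakes it for the live DB:

-- Hedged sketch: restore the backup side by side under a new name (logical file names can be
-- confirmed first with RESTORE FILELISTONLY).
RESTORE DATABASE MyDb_Compare
FROM DISK = N'D:\Backups\MyDb.bak'
WITH MOVE N'MyDb'     TO N'D:\Data\MyDb_Compare.mdf',
     MOVE N'MyDb_log' TO N'D:\Data\MyDb_Compare_log.ldf',
     STATS = 10;

-- Optional: keep the copy read-only so it cannot be mistaken for the live database.
ALTER DATABASE MyDb_Compare SET READ_ONLY;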
Hi, is it possible to change the name of a field in an existing table, I mean with a T-SQL statement? We know we can alter the data type and width etc., but I haven't found any info about changing a field name. So if it is possible, please help... And is there any T-SQL command to alter multiple columns in a single statement?
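Renaming a column is done with sp_rename rather than ALTER TABLE, and ALTER COLUMN only takes one column per statement, although a single ADD can add several columns at once. A hedged sketch with placeholder names:

-- Rename a column (table/column names are placeholders).
EXEC sp_rename 'dbo.MyTable.OldColumnName', 'NewColumnName', 'COLUMN';

-- ALTER COLUMN works on one column per statement...
ALTER TABLE dbo.MyTable ALTER COLUMN SomeColumn varchar(100) NULL;

-- ...but ADD can take several columns in one statement.
ALTER TABLE dbo.MyTable
    ADD ColA int NULL,
        ColB datetime NULL;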
I need to add a datetime column to an existing table that has about 1.2 million records and is being accessed frequently, but I can't afford to stop the db at all.
Whenever I do: alter table mytable add Updated_date datetime
it just takes too long and I have to stop executing the query after a couple of minutes. I am running SQL Express 2005 SP2. The db size is over 3 GB but still under the 4 GB limit.
Can you please advise on how to add this column? It's urgent!
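Adding a nullable datetime column with no default is a metadata-only change, so the ALTER itself should be near-instant; the long runtime is most likely the statement waiting on a schema-modification lock behind the busy workload. A hedged sketch that fails fast instead of queueing indefinitely:

-- Hedged sketch: bail out quickly if the schema lock can't be obtained,
-- instead of blocking (and being blocked by) the busy workload.
SET LOCK_TIMEOUT 5000;   -- give up after 5 seconds

ALTER TABLE dbo.mytable
    ADD Updated_date datetime NULL;   -- nullable, no default: metadata-only

SET LOCK_TIMEOUT -1;     -- back to waiting indefinitely

If it times out, retry in a quieter moment; the ALTER itself should complete in milliseconds once it gets the lock.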
I need to change the datatype of a very large table from smallint to int... What would be an ideal solution to get this done in the least amount of time? Maybe I can try ALTER, but I am not sure about the time it would take... and the page splits etc.
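ALTER COLUMN from smallint to int is a size-of-data operation (every row grows by 2 bytes), so on a very large table it can run for a long time and is fully logged. A hedged alternative is to stage the change: add a new nullable int column (metadata-only), backfill it in batches, then drop and rename. Sketch with placeholder names (dbo.BigTable, OldCol):

-- Hedged sketch: staged smallint -> int conversion.
ALTER TABLE dbo.BigTable ADD OldCol_new int NULL;   -- metadata-only

WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) dbo.BigTable
    SET OldCol_new = OldCol
    WHERE OldCol_new IS NULL AND OldCol IS NOT NULL;

    IF @@ROWCOUNT = 0 BREAK;
END

-- Any indexes or constraints on OldCol must be dropped/recreated around these steps.
ALTER TABLE dbo.BigTable DROP COLUMN OldCol;
EXEC sp_rename 'dbo.BigTable.OldCol_new', 'OldCol', 'COLUMN';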
Hi, I'm trying to create a VERY wide table, with 1,000 columns of type varchar(MAX), nullable. The CREATE TABLE statement (in both SQL 2005 & 2008) gives the following warning:
Warning: The table "WIDE_TABLE" has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit.
When I insert data into the table, filling all columns with small, 10-byte string values, I get the following error:
Msg 50000, Level 16, State 1, Procedure sp_pivot, Line 118
Cannot create a row of size 15034 which is greater than the allowable maximum of 8060.
I'd like to verify this observation: each row is created with 2,000 bytes of column-offset data (2 bytes * 1,000 columns), 125 bytes for the null bitmap (1,000 columns / 8 bits) and some more row-overhead information. This leaves less than 6K for the data itself. But since not all the columns can fit within the page, pointers to off-row data need to be created, 24 bytes per column, which very quickly adds up to more than 8K, hence the error. So the 8K limit is hit with far fewer columns than the 1,024-column maximum.
Furthermore, in SQL 2008 SPARSE columns will not solve the problem (they may save some metadata space when the columns are NULL, but if they are not, I'm back to the same problem, or even worse, since now each value takes more storage space). The 30,000-column maximum in 2008 applies only when the column values really are sparse.
Is this the right observation? If so, is there a workaround besides splitting into multiple tables?
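For completeness, this is roughly how the 30,000-column case is declared in 2008: SPARSE columns plus a column set. As noted above it only pays off when the overwhelming majority of values are NULL, so it is illustrative rather than a fix for densely populated rows (all names below are placeholders):

-- Hedged illustration: a wide table using SPARSE columns plus a column set.
CREATE TABLE dbo.WideSparse
(
    Id   int IDENTITY(1,1) PRIMARY KEY,
    Col1 varchar(100) SPARSE NULL,
    Col2 varchar(100) SPARSE NULL,
    Col3 varchar(100) SPARSE NULL,
    -- ... further sparse columns, up to the 30,000-column limit ...
    AllCols xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);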
The query I'm running so far is wrong, but here it is...
SELECT t.FromUserID, t.ToUserID, t.msg,
       u.UserName AS UserFrom, u.GroupID AS FromGroup,
       u2.UserName AS UserTo, u2.GroupID AS ToGroup
FROM tmp_Messages t
LEFT JOIN (SELECT UserID, GroupID, UserName
           FROM tmp_users
           WHERE GroupID = 3) u
[Code] .....
I'm missing the details of one of the users. I know what the problem is, I just can't figure out how to get this working without using temp tables, which I can't do in the production version.
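A hedged guess at a rewrite: filtering inside the derived table removes rows before the outer join can see them, so move the GroupID filter into the join condition and join tmp_users once per side of the message (GroupID = 3 is assumed from the fragment above):

-- Hedged sketch: keep the LEFT JOINs outer by filtering in the ON clause.
SELECT  t.FromUserID,
        t.ToUserID,
        t.msg,
        u.UserName   AS UserFrom,
        u.GroupID    AS FromGroup,
        u2.UserName  AS UserTo,
        u2.GroupID   AS ToGroup
FROM    tmp_Messages t
LEFT JOIN tmp_users u  ON u.UserID  = t.FromUserID AND u.GroupID  = 3
LEFT JOIN tmp_users u2 ON u2.UserID = t.ToUserID   AND u2.GroupID = 3;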
I have multiple databases on the server, and all of them have the tables stdVersions and stdChangeLog. The stdVersions table has a field called DatabaseVersion which stores the version of the database. The stdChangeLog table has a field called ChangedOn which stores the date of any change made in the database.
I need to write a query/stored procedure/function that will return all the database names, the version and the date changed on. The results should look something like this:
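A hedged sketch of one way to do it: loop over sys.databases and collect the values with dynamic SQL. It assumes the two tables live in the dbo schema of every user database (databases that lack them are simply skipped) and that the version column fits in varchar(50):

-- Hedged sketch: gather version and last change date from every database that has the tables.
DECLARE @results TABLE (DatabaseName sysname, DatabaseVersion varchar(50), LastChangedOn datetime);
DECLARE @db sysname, @sql nvarchar(max);

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE database_id > 4 AND state_desc = 'ONLINE';

OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'SELECT ''' + REPLACE(@db, '''', '''''') + N''' AS DatabaseName,
        (SELECT TOP (1) DatabaseVersion FROM ' + QUOTENAME(@db) + N'.dbo.stdVersions) AS DatabaseVersion,
        (SELECT MAX(ChangedOn) FROM ' + QUOTENAME(@db) + N'.dbo.stdChangeLog) AS LastChangedOn;';
    BEGIN TRY
        INSERT INTO @results EXEC (@sql);
    END TRY
    BEGIN CATCH
        PRINT 'Skipped ' + @db;   -- database does not have the two tables
    END CATCH;

    FETCH NEXT FROM dbs INTO @db;
END
CLOSE dbs;
DEALLOCATE dbs;

SELECT * FROM @results ORDER BY DatabaseName;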
Is there a good way to add columns to a table type?
I built several procs which make use of table-valued-parameters, and they work pretty nicely, until I need them to accept additional columns. Then I have to drop all the procs that use them, alter the types, and rebuild all the procedures, which is a huge pain in the rear.
Is there any good way (built in, or custom) to alter the def of a table type that's used as a parameter to multiple stored procedures?
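There is no ALTER TYPE for table types, so the usual pattern stays create-new / repoint / drop-old; what can at least be scripted is finding everything that references the type before starting. A hedged sketch (the type name is a placeholder):

-- Hedged sketch: list every module that references the table type,
-- so the alter/recreate order can be scripted before touching the type itself.
SELECT referencing_schema_name, referencing_entity_name
FROM sys.dm_sql_referencing_entities('dbo.MyTableType', 'TYPE');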
What is the best and quickest way to add 500 columns to an existing table that already has 145 columns?
Is there any way to avoid the manual work of adding the columns one by one in design mode or via a script?
I have a TXT file (comma delimited) that contains all those column names as its first row, but I am not sure whether I can use a DTS package to create the table design from such a source TXT file.
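One hedged option is to skip DTS for the schema change and generate a single ALTER TABLE ... ADD from the comma-delimited header row. The sketch below assumes the header has been pasted into @header, that every new column can be varchar(255) NULL, that the names contain no XML-special characters, and that dbo.MyWideTable is the placeholder table name:

-- Hedged sketch: build one ALTER TABLE ... ADD statement from a comma-delimited list of names.
DECLARE @header nvarchar(max) = N'NewCol1,NewCol2,NewCol3';   -- paste the real header row here
DECLARE @sql    nvarchar(max);

SELECT @sql = N'ALTER TABLE dbo.MyWideTable ADD ' + STUFF(
(
    SELECT N', ' + QUOTENAME(LTRIM(RTRIM(x.c.value('.', 'nvarchar(128)')))) + N' varchar(255) NULL'
    FROM (SELECT CAST(N'<c>' + REPLACE(@header, N',', N'</c><c>') + N'</c>' AS xml) AS cols) AS src
    CROSS APPLY src.cols.nodes('/c') AS x(c)
    FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)'), 1, 2, N'') + N';';

PRINT @sql;       -- review the generated statement first
-- EXEC (@sql);   -- then run it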
I have a table. I want to add 2 date columns: one that is set when a record is inserted, and another that records whenever the record is updated.
I also want to insert dummy dates for the rows that already exist. How do I backfill those dummy dates in batches?
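A hedged sketch: a CreatedOn column with a default for new inserts, a batched backfill of a dummy date for the existing rows, and a trigger that maintains UpdatedOn from then on (dbo.MyTable, the Id key column and the dummy date are all placeholders):

-- Hedged sketch (dbo.MyTable / Id are placeholder names).
ALTER TABLE dbo.MyTable ADD CreatedOn datetime NULL CONSTRAINT DF_MyTable_CreatedOn DEFAULT (GETDATE());
ALTER TABLE dbo.MyTable ADD UpdatedOn datetime NULL;
GO

-- Backfill a dummy historical date for the existing rows, in batches.
WHILE 1 = 1
BEGIN
    UPDATE TOP (50000) dbo.MyTable
    SET CreatedOn = '20150101', UpdatedOn = '20150101'
    WHERE CreatedOn IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END
GO

-- From now on, keep UpdatedOn current on every update.
CREATE TRIGGER trg_MyTable_Updated ON dbo.MyTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET UpdatedOn = GETDATE()
    FROM dbo.MyTable AS t
    JOIN inserted AS i ON i.Id = t.Id;   -- assumes an Id key column
END
GO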
As I understand it, if SPARSE is used on a column which has many NULL marks, then storage can be used efficiently (we need less space to store the NULL marks), hence a table with many NULLs in a SPARSE column should need less storage than the same table without SPARSE. I created two tables as follows:
/******* Table with Sparse ******/
CREATE TABLE Sprstb
(
    unsprsid  INT IDENTITY(1,1) NOT NULL,
    Firstname varchar(20) NOT NULL,
    Lastname  varchar(20) NOT NULL,
    Tel       int NOT NULL,
    address   nvarchar(60) SPARSE NULL
)

/***** Table without Sparse *******/
CREATE TABLE Unsprstb
(
    unsprsid  INT IDENTITY(1,1) NOT NULL,
    Firstname varchar(20) NOT NULL,
    Lastname  varchar(20) NOT NULL,
    Tel       int NOT NULL,
    address   nvarchar(60) NULL
)
I populated Sprstb with 5 million records. It needs 509,961 MB of storage. Then I copied this table into Unsprstb:
SET IDENTITY_INSERT [dbo].[Unsprstb] ON;

INSERT [dbo].[Unsprstb] (unsprsid, Firstname, Lastname, Tel, address)
SELECT unsprsid, Firstname, Lastname, Tel, address
FROM [dbo].[Sprstb];

SET IDENTITY_INSERT [dbo].[Unsprstb] OFF;

The Unsprstb needs only 466,031 MB!
That means the table with the SPARSE column needs more storage. Why?
By the way, in table Sprstb the address column has 1,666,198 NULL marks (out of 5,000,000).
I have 3 columns. I would like to update a table based on the job_cd and permit_nbr columns: if rows share the same job_cd and permit_nbr, the reference number should be the same; otherwise it should take max(reference_nbr) from the table + 1, for all rows where the reference_nbr column is NULL.
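A hedged sketch of one way to express that, assuming a placeholder table dbo.Permits and that each distinct (job_cd, permit_nbr) pair with a NULL reference should get its own new number above the current maximum:

-- Hedged sketch (dbo.Permits is a placeholder table name).
DECLARE @maxref int;
SELECT @maxref = ISNULL(MAX(reference_nbr), 0) FROM dbo.Permits;

;WITH NullRefs AS
(
    SELECT reference_nbr,
           DENSE_RANK() OVER (ORDER BY job_cd, permit_nbr) AS grp
    FROM   dbo.Permits
    WHERE  reference_nbr IS NULL
)
UPDATE NullRefs
SET reference_nbr = @maxref + grp;   -- same (job_cd, permit_nbr) => same new number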
Let's say I have a function named MyCustomFunction and I want to ensure that it exists in the database. Let's say I have a create script for the function. I want the script to be runnable multiple times if needed. A common way to do this is to check for an object_id at the top of the function like this:
IF OBJECT_ID('MyCustomFunction') IS NOT NULL
    DROP FUNCTION MyCustomFunction
GO
CREATE FUNCTION MyCustomFunction...
But is there a more elegant way to do this? For example, instead of dropping and recreating the function, is there a way to simply exit from the script and do nothing if the function already exists? Something like this:
IF OBJECT_ID('MyCustomFunction') IS NOT NULL
    RETURN
GO
CREATE FUNCTION MyCustomFunction...
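SQL Server 2008 has no CREATE OR ALTER, so one common rerunnable pattern is to create a stub only when the function is missing and then always ALTER it to the real definition, which also preserves any permissions already granted. A hedged sketch:

-- Hedged sketch: create a placeholder once, then ALTER to the real body on every run.
IF OBJECT_ID('dbo.MyCustomFunction') IS NULL
    EXEC ('CREATE FUNCTION dbo.MyCustomFunction() RETURNS int AS BEGIN RETURN 0 END');
GO
ALTER FUNCTION dbo.MyCustomFunction()
RETURNS int
AS
BEGIN
    RETURN 42;   -- real implementation goes here
END
GO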
We have a SQL 2008 Instance existing on active/passive cluster with 2 nodes running on Windows server 2008 R2 Ent. Edn.
Now we need to install another SQL instance on this cluster. What are the prerequisites, apart from a new IP address? Do we need new shared disks, or can the existing disks be utilized?
After reading some comments here I decided to look at my tables to see if any had a clustered index on a uniqueidentifier. Yep. So if I have a table with a uniqueidentifier as the primary key/clustered index and an identity column that is indexed, I would like to make the identity column the clustered index (maybe even the primary key) and make the uniqueidentifier a unique non-clustered index (not the primary key).
Does this sound reasonable? If I do this, will I need to drop and recreate the other indexes, or maybe just rebuild them?
Currently:
CREATE TABLE Payments
(
    IDX  INT IDENTITY(1,1) NOT NULL,
    GUID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID()),
    .....
    -- many other columns
);
GO
ALTER TABLE [dbo].[PAYMENTS] ADD CONSTRAINT [PK_PAYMENTS_GID] PRIMARY KEY CLUSTERED ([GUID] ASC);
GO
CREATE NONCLUSTERED INDEX [IX_Payments_ID] ON [dbo].[PAYMENTS] ([IDX] ASC);
GO
Would like:
-- The existing clustered PK and the old nonclustered index have to go first.
ALTER TABLE [dbo].[PAYMENTS] DROP CONSTRAINT [PK_PAYMENTS_GID];
GO
DROP INDEX [IX_Payments_ID] ON [dbo].[PAYMENTS];
GO
ALTER TABLE [dbo].[PAYMENTS] ADD CONSTRAINT [PK_PAYMENTS_IDX] PRIMARY KEY CLUSTERED (IDX ASC);
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_Payments_GUID] ON [dbo].[PAYMENTS] (GUID ASC);
GO
We are using SQL 2008 R2 Standard Edition. One of our production databases is using the default isolation level, read committed, and the transactions are also using read committed. But we want to change the isolation level to read committed snapshot isolation and test it to avoid deadlocks.
Is it possible to set this at the transaction level for some queries, or do we need to change the entire database isolation level by using ALTER DATABASE: "ALTER DATABASE AdventureWorks2008R2 SET READ_COMMITTED_SNAPSHOT ON"
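READ_COMMITTED_SNAPSHOT is a database-level setting that changes the behaviour of every read-committed transaction; the per-transaction alternative is SNAPSHOT isolation, which first has to be allowed at the database level and then opted into per session. A hedged sketch of both:

-- Database-wide option: every read-committed transaction starts using row versioning.
-- (Switching this on needs the database to have no other active connections at that moment.)
ALTER DATABASE AdventureWorks2008R2 SET READ_COMMITTED_SNAPSHOT ON;

-- Per-transaction alternative: allow snapshot isolation once at the database level...
ALTER DATABASE AdventureWorks2008R2 SET ALLOW_SNAPSHOT_ISOLATION ON;

-- ...then opt in only in the sessions/queries being tested.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
    SELECT COUNT(*) FROM Sales.SalesOrderHeader;   -- placeholder query: versioned reads, no shared locks
COMMIT;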
I've got two databases on the same server and replicate some tables from one database to the other. The replication is configured not to drop the table if it exists, but to delete the data based on the filter if one exists. There are two tables on the subscriber that have some extra columns.
I get a "field size too large" error when trying to replicate them. Is there a workaround that doesn't require making the publisher and subscriber tables identical in schema?
I have the default trace enabled on a SQL Server 2008 R2 instance and found today that there is a gap of nearly 4 minutes in the trace, during a time of day when there most certainly should not be a 4-minute window of nothing.
What, if anything, could cause the default trace to have a gap like this? The SQL Server instance (against my preferences) is hosted on VMware; however, it has its own host, so its resources are not being shared with any other server. The data & log files reside on different parts of the SAN. Our IT & network admins are looking into the issue on their end, but when I looked and found a nearly 4-minute gap in the default trace, it hit me that this could be something above/outside of SQL Server.