For my work I am now learning Sql server 2005 and I have been given a database that has been set up by someone else to work with. It is my job to get the database ready for use in reports.
My problem is that the current database has one huge table with almost 8GB of data. The table contains data from 2004 to present (and growing) from 14 different countries. The reports we use are mostly per country, but we also want to compare the 14 countries to eachother for say, whole 2006. At the moment the table is stored in one single file instead of using partitions.
I believe partitions can give a good performance boost when running the queries. But how do I do this? Currently the country codes are just plain text, can they be used for partitions?
We have a huge table with around 250 million records and have implemented SQL server 2005's new table partitioning feature. Now the data seems to be evenly spread across 20 different filegroups ( each 5 GB approx ) for the same table that was occupying 100 GB itself in the PRIMARY filegroup earlier.
Still the query response times have not come down drastically but we could see a good improvement in the execution plans now.
WE ARE USING RAID 5 IN OUR PRODUCTION ENVIRONMENT. ANY IDEA / THOUGHT ON HOW TO PLACE THE PARTITIONED FILEGROUPS AND THE LOG FILES IN THE RAID 5 (BTW , I'm very new to RAID concepts , any detailed instruction would be helpful ).
We have a very huge database that stores 12 years of data(120 Million records). But our application mainly accesses past 3 years data i.e , the queries would scan the 120 million records even when it actually has to scan 30 million records alone (for 3 years).
Since few other important applications needs access to all the 12 years data, we are in a position to have 12 years data in the same database.
Right now we are looking for an approach that would help us to efficiently access the 3 years data alone and boost the performance.
1. Will SQL server table paritioning help in this scenario ?
Or
2. Indexed views would help us ? Is it possible to create indexed views based on year range and access the views in the stored procedures ?
Hi guys , assuming right now I already create partition function (PF_Date) and partition scheme (PS_Date). Let say I would like to implement the partition on the existing tables ( eg: transaction table which is in PRIMARY filegroup), how am I switch it from PRIMARY to PS_Date ? Is it I have to re-create the particular table then only able to put in the partition scheme? Hope can get any assistance here. Thanks alot.
I have a table that contains records of transactions with ID column is primary key
I use partition follow ID column, each partition have 1 million records.
CREATE PARTITION FUNCTION [pfTBLTRANS_ID](int) AS RANGE LEFT FOR VALUES (1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000, 8000000, 9000000, 10000000)
CREATE PARTITION SCHEME [psTBLTRANS_ID] AS PARTITION [pfTBLTRANS_ID] TO ([GROUP01], [GROUP02], [GROUP03], [GROUP04], [GROUP05], [GROUP06], [GROUP07], [GROUP08], [GROUP09], [GROUP10], [GROUP11])
But now I have more records with IDs that are greater than 11.000.000. So how can I add more partitions to this table ?
First off, I'm not very familiar with SQL Server. I need some guidance on what the best path to take is for this as it may not even be table partitions.
I have a huge table (155 million rows) and it's gotten so large than I can't even delete a large set of rows from it (i.e. delete everything older than 6 mo, which would be ~100 million rows). When trying to run a delete like this, it just goes for a LONG time and then just eventually runs out of memory.
The current data in this table can actually be completely cleared out soon (after Feb 1st) and I plan to do this with TRUNCATE TABLE, or just DROP and recreate. Once I do this, I want to create a way to keep this table moderately sized so it never grows that large again and it seems table partitions may be the way to go for this?
I'd like to keep the last 6mo of data in it (I have a datetime column to keep track of this). Anything older I'd like automatically removed. Can I do this with table partitioning? Create 6 partitions that store the 6 most recent months of data and everything older automatically gets dropped off?
If not partitions, what do you suggest to keep this DB modest size?
SQL 2014 I've inherited a db that has several partitioned tables. They are partitioned by month. We're approaching the last partition, 11-30-2015, so we need to extend the tables. My question is how do I do this? There are Partition Schemes and Partition Functions setup at the db level. I've figured out how to ALTER those. Next I go to a table that I know is partitioned, right-click Storage and select Manage Partitions.Â
My only option is to "create a staging table for partition switching". Not knowing what switching is, I'm not sure if this is what I want to do. All I want to do is add new partitions to the table - and remove some of the old ones since they are empty due to archiving of data.
So, what is the proper steps to adding new partitions to a table that is already partitioned?
I have 3 columns. I would like to update a table based on job_cd and permit_nbr column. if we have same job_cd and permit_nbr, reference number should be same else it should take max(reference number) from the table +1 for all rows where reference_nbr column is null
I have table having around 100 million rows.Everyday we have an ETL process in which table will be trucnated and relaoded. Will creating a partition on the table increase the inserting speed?
I imported a SQL Table into SQL DataBase, But I can not update this table even with SQL Server management Studio When I change any data on mentioned table above, Red exclamation sign appears left of the record . How can I correct this problem? Thanks.
Does anyone have any statistics on the performance gains one can get using raw partitions. The database in question is very IO intensive and performs about 1,000,000 inserts/updates per select.
I have a server that has SQL Server installed on both C and D drives. The SQL Server software is currently running from the C drive and the live databases and backups are stored on the D drive.
I need to have everything on the D drive. Is there an easy way to make the registry point to the D drive without reinstalling SQL Server? The software will needs to run from the D drive because the C drive is running out of disk space. I will also need to delete the whole C:mssql directory.
assuming that you have two databases, the OLTP db and the OLAP db (take not that both have the same structure -- archiving purposes)... using table partitioning, is there a way where we can move 1 partition from the OLTP db to the OLAP db???
i'm actually trying to use this example with both tables in the DB.. I tried to modify to use two databases but sql server is unable to move the partition...
ALTER TABLE [Production].[TransactionHistory] SWITCH PARTITION 1 TO [Production].[TransactionHistoryArchive] PARTITION 2;
In SQL Server 2005 database we have partitioned a very big table into 30 partitions each holding few million of records.
Im just curious to know whether there are some configuration related to processors or system hardware in order to benefit from partitioning ? (Ex : If we have multiple processors Whether they need be configured to do a parallel processing ? )
Any real time experience (other than referring links) would be really helpful for me.
I have a quick question regarding merging cube partitions. If I create partitions sliced by date (let's assume we have year level partitions like 2006,2007,2008...) Later, if I want to merge selected partition to another partition , for example I have history partition and 2006 partition and I want to merge 2006 to history partition then I can simply merge them using ' merge partition' through Management Studio.
My question is that in script, History partition has condition which is where clause and restricted by year level (i.g. WHERE date < '01-01-2006' ) ; however after merging , script won't change like WHERE date <'01-01-2007').
If so, whenever I merge partitions then I have to alter the script as well based on selected merge partion? I need to refresh history partition once a month;however even if I merge 2006 partition , once I reprocess history partition then it only process what it was wrote in script. So, after reprocessing 2006 data won't appear in this history partition. So, wondering it's mentatory to alter the script once partition is merged. Please give me some comments on this issue. Thanks in advance.
I am testing horizontal partitions to see whether it is a feasible option for a project. IF I have a composite Primary Key and the constraint column (a part of the Primary Key) that helps the partitioned view is defined with DateTime Data Type, select on a restricted set of data through a partitioned view still tries to access all the tables instead of just one table that contains the data. Is this the case or am I missing something ?
CREATE TABLE [dbo].[tst01] ( [Dt] datetime NOT NULL , [TID] int NOT NULL , [Nm] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY] GO
CREATE TABLE [dbo].[tst02] ( [Dt] datetime NOT NULL , [TID] int NOT NULL, [Nm] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY] GO
ALTER TABLE [dbo].[tst01] ADD CONSTRAINT [PK_tst01] PRIMARY KEY CLUSTERED ( [Dt], [TID] ) ON [PRIMARY] GO
ALTER TABLE [dbo].[tst02] ADD CONSTRAINT [PK_tst02] PRIMARY KEY CLUSTERED ( [Dt], [TID] ) ON [PRIMARY] GO
ALTER TABLE [dbo].[tst01] ADD CONSTRAINT [CK_tst01] CHECK (Dt between '11/1/2002' and '11/30/2002') GO
ALTER TABLE [dbo].[tst02] ADD CONSTRAINT [CK_tst02] CHECK (Dt between '12/1/2002' and '12/31/2002') GO
insert into tst01 values('11/1/2002', 1, 'SS') insert into tst01 values('11/2/2002', 2, 'KK') insert into tst01 values('11/3/2002', 3, 'DD') Go
insert into tst02 values('12/1/2002', 1, 'LL') insert into tst02 values('12/2/2002', 2, 'MM') insert into tst02 values('12/3/2002', 3, 'GG') Go
CREATE VIEW vtst AS SELECT * FROM tst01 UNION ALL SELECT * FROM tst02 Go
I'm considering using horizontal partitions to separate my data by year. For example, SomeTable_2004, SomeTable_2003, etc. This works well for backups, maintenance, etc. because I'm working with 150+ GB of data. I'll be a partitioned view for queries.
However, I'm new at this and have a few questions. I would also like to do partitioned updates or inserts. But I need to make sure that the tables don't use similar primary keys. Does that make sense? I need to make sure that the primary keys from the first table are not used again in the second table.
SomeTable_2003 primary keys: 1,5,8,9,15
SomeTable_2004 primary keys: 2,3,4,10
I don't really care what keys are used on what table, as long as they are different. I have apps that already use this data, and I don't want to change the application logic.
how to set up the partitions.I have a transaction table with 50 million records that's very hard to query. it holds data for the last 4 years but the application only ever looks at the last 6 months so i believe this is and ideal candidate for partitioning.
Would it be better to
1) create a partition based on each year for all data so would have a 2015, 2014, 2013, 2012? 2) create 1 partition based on month for this years data then 3 based on year so would have jan,feb,march,april,may..., 2014, 2013, 2012
For 1) would you have to perform some maintenance at the turn of each year for accommodating the next years data. For 2) although this would give better performance as query's are mostly in the last 6 months wouldn't this have more maintenance to move month data to year partitions come the turn of the year and then create the next years months partitions.
I just inherited a dev box, and need to do some performance analyzing on a 40 gig db for a client. Time is of the essence!
My question is that this dev box only has one disk partition (c: drive). Is it a huge deal that I don't have the db system files on one drive, with the data files on another, and tempdb on another,etc.....
I have a requirement that I need to reload the last seven days worth of data each night to ensure that we pick up late arriving and updated records. To avoid having to do updates we delete the last seven days data and reload.
I was wondering if it is possible to set up the table as a partition, paritioned on a value (OLD, NEW) or similar.
The job would set the last day in the NEW partition to be old, the theory being that this would cause the rows to move to the OLD parition, and then truncate the new partion rather than deleting. The last seven days data could then be inserted into the empty new partition.
My questions is 1. Is my theory about the data moving from one partition to another correct. 2. Can I actually truncate and individual parition, 3. Do you think it will perform any quicker. We would expect data in the range of 100K to 500K rows in the seven days and will store up to 4 years of historical data.
I have been asked to move the indexes on our membership database tables to seperate partitions on the server. This is a new concept to me and thought I could use some advice on how to go about doing it.
I have 3 partitions using a year grouping. Current year, previous 4 years, older than 5 years. I have two measure groups, one is a distinct count, so I actually have 6 partitions.I also use usage based optimization to build my aggregations. Should each partition have a separate aggregation or should there be one for each measure group?
We are currently developing an OLTP application, which will need to purge data when it becomes older than 1 hour.Rather than having a process which deletes rows periodically (and risks locking the tables), I am considering using partitioning on a rolling 15 minute window.The idea is to have 5 active partitions, with the 5<sup>th</sup> one being swapped out, merged and a new one split in. This will allow data to live to a max of 1 hour 15 mins,which is acceptable.
Actually, I will have 8 partitions; there will be 4 partitions set in the future, just to ensure when the last partition is split, there isn’t any data movement, as the newest partition will be empty.I am wondering if there will be any performance issues due to partition swapping, merging and splitting every 15 minutes? The application will have a high volume of users when live. I think this should be a better option that continually deleting from the tables.
I am using Sql 2005 and merge replication with push subscriptions. I have several dynamic join filters on some of my tables.
The join filters all use a central table that maps say a server location name (something that is returned from HOTNAME() in my case) to an for a store branch ID. This is a retail system database.
When I add a new new subscription I update this table with the new server location name and it's corresponding branch ID. My filtered tables all have a foreign key in them that is the branch ID. I can then effectively join from the server location name to a Branch ID.
What I have noticed is that if I update one row in the map table, sql server will re-generate all partitioned rows for all subscribers, even for rows that haven't been updated.
The net result is that when I add a subscription, my existing subscriptions all get about 52,000 row updates.
Am I seeing this because I said my partitions will overlap when I created the table articles?
First of all, we are using SQL Server 2005 with a SQL Mobile subscriber and we are attempting to use Data Partitions on our current database schema which contains associative tables for many-to-many relationships.
We have two tables, a User table and an Audit table. A user can be assigned more than one Audit. An Audit can be assigned to more than one User. So an AuditUser associative table exists. If data partitions are used based on User, then any Audits that are assigned to one or more users should be copied to the proper partition for each User (the msmerge_current_partition_mappings table with the proper partition_id values).
In order to insert records with such a schema, the following steps occur in order:
Insert new row into Audit table with new rowguidInsert entry into AuditUser table associating the auditguid with every userguid that is assigned this audit.
Merge replication triggers are fired on insert of the Audit row and another one for the insert of the AuditUser row.
When the Audit row is inserted, the replication trigger follows the following logic:
Inserts a copy of that row into the msmerge_contents table. Evaluates the row to determine which partition(s) this row should be copied to as well (msmerge_current_partition_mappings table). To do this, it checks to see if the AuditGuid is referenced in one or more AuditUser rows. Since we haven€™t inserted the AuditUser row at this point, the trigger€™s logic doesn€™t find a partition to copy this row to.
When the AuditUser row is inserted, the replication trigger performs the same logic as with the Audit row, it:
Inserts a copy of that row into the msmerge_contents table.Evaluates the row to determine which partition(s) this row should be copied to as well (msmerge_current_partition_mappings table). Since the row meets the criteria for one or more partitions, it is copied to the msmerge_current_partition_mappings table for each partition that exists.
When replication occurs, we see only the AuditUser rows copied down to our device, and not the corresponding Audit rows. Now that we understand the triggers, it is plain to see why. If the AuditUser row could be inserted first, then the trigger on the Audit row would copy that row into the proper partitions and all would work well. However, the Audit row must be inserted first, so that foreign key relationship constraints are preserved.
It seems that the Update trigger on the AuditUser row actually walks the relationships and copies any related child rows to the msmerge_current_partition_mappings table.
I am looking for information/suggestions in regards to portioning data across a large SAN. The database is several TB in size, and we are looking to partition the data on a date so that the current data is always the fastest available and history can be set as read only but still be available.
If the server has 8 drive letters how can I ensure that the 4 data drives are stripped across the most physical drives for the most through-put??
Are there any specific questions I need to ask the server technicians?
The environment here is SSIS ETL feeding a Fact Table. The Fact Table is pulled into SSAS as a cube and reporting services are handled there. I am on the ETL side and don't pretend to know all the processing that happens with cubes, etc.
What we are trying to accomplish is to add partitions to a Cube via the ETL processing. The partitions should be incremented by Day, i.e. 20070501, 20070502, etc.
This is currently processed manually by the Reporting developer and we are looking for an automated process to reduce errors and hand work.
I have explored the following objects: Partition Processing DataFlow Destination and did not find much documentation or examples on it's use. If you have any information on this stage, please reply with such.
The other option is the Analysis Services Processing Control Flow. I understand that we can process Analysis Server objects as part of our package. Is there a way to incorporate a Partitioning Script in this object? If so, how. Again, I did not find detailed examples on the use of this object.
If you have experience with either of these, please feel free to reply. I appreciate any and all comments.
I am currently using SSIS ,MS Sql Server 2000 database and 2000 Analysis Services for the cube. I am creating a new table everyday and giving name like day_20080504, day_20080505 etc... So then I go to Analysis Services and process dimensions(incremental) AND Create a new partition using old partition as a template.
My first question is how to create a new partition everday and use old partition as template...(Almost same except database table) My Second question : Can I do this on 2000 Analysis services or Should I convert my cube into SSAS?
I have a table with millions of rows and about 70 columns that move through a number of states (11 possible states in all) from "New" via various states to "Processed" and eventually to "Archive" (there's a complicated state diagram that I won't bore you with)
Movement between states is based on a heap of business logic including the move to Archive (not just dates).
Different sorts of processing (querying and update both by users and overnight processing) are carried out on the data according to its state.
Maintaining the indexes for optimum performance across the board is a headache.
We have two problems in that we want better query performance and want to be able to easily switch out objects that are in the Archive state.
I had in mind partitioning the table (and its indexes) on state so that : (a) Queries would be directed only at the appropriate partition (that is always use "where state=" as part of the query) (b) The Archive partition could be swapped out of the table periodically
In my test setup 10 of 11 partitions are in [PRIMARY] but Archive is in a different filegroup.
Query performance is OK - execution plans look good.
However my update performance is now appalling when moving between any two states (10 times as long as on the unpartitioned table).
I understand that when you update a column which is used as a partition key it will cause the row to "move from one partition to another" as it says in another post.
Fine - because that's exactly what I want - logically.
I can also understand that moving from one filegroup (and hence the underlying file) to another must mean that the data has to physically move.
However is the data physically moving whenever you move between partitions or what's going on to cause such a degradation in performance ?