SQL Server 2012 :: Fast Data Loading With Partition Switching Strategy
Jul 28, 2015
I’m looking for clarity on partition switching. The idea is to use many BULK INSERT statements into tables dbo.X_n in parallel and, when the BULK INSERT for table dbo.X_n completes, switch dbo.X_n into dbo.bigdaddy. I think this is the fastest way to upload a couple hundred GB of data.
In learning about partition switching (in part) from The Data Loading Performance Guide under Partition SWITCH, I read the instructions as saying to copy the main table exactly to become a target. But in that same step (#1), I read that we need to move the target (dbo.X_n) off the default filegroup. Then it says I need to match indexes, and it lists the filegroup as something we need to match with the main table.
As an overview of the partition switching strategy, I think the whole point of BULK INSERT with partitioning is to have separate files (in the same filegroup) to enable concurrent uploading, where each table has its own file. Once the upload to a table (dbo.X_n) is complete, we do the partition switch into the main table (dbo.bigdaddy). The data we just uploaded doesn’t actually move; only its metadata does.
When I read the instructions linked above, I hear “Don’t have the same filegroup on your target as the main table. You must have the same filegroup on your target as the main table.”
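Setting that contradiction aside, here is a minimal sketch of the load-and-switch flow as I understand it; the table names, file path, and partition number are hypothetical, and dbo.X_1 must match dbo.bigdaddy exactly (columns, indexes, constraints, and the filegroup of the target partition):

BULK INSERT dbo.X_1
FROM 'C:\load\chunk_1.dat'
WITH (TABLOCK);  -- TABLOCK helps achieve a minimally logged bulk load

-- X_1 needs a trusted CHECK constraint confining its rows to
-- the range of partition 1; the switch itself is metadata-only.
ALTER TABLE dbo.X_1 SWITCH TO dbo.bigdaddy PARTITION 1;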
When you load the data into a new partitioned table, can it be done online without any downtime? I ask because I have a few tables that are around 250 GB and more.
I have been developing a genealogy application using a SQL Server 2000 database and ASP.NET 2.0. In this application a process, Ged.Parse, converts data from the GEDCOM standard format (a hierarchical file format that looks as if it was designed for 80-column cards) into my SQL Server database.

As we started to load reasonable quantities of data into the system, we found that the online response became abysmal. This problem was fixed by defining a number of secondary indexes (response times dropped to under a second, from previously exceeding 2 minutes and often timing out). Unfortunately, the processing time of Ged.Parse then tripled, and it may now take up to an hour to process a GEDCOM. I believe that this is a byproduct of defining several indexes that are not needed by Ged.Parse itself, but which are of course maintained as Ged.Parse inserts new records into the database.

I am wondering what my best strategy is, apart from putting Ged.Parse into a background task and just letting it trickle away. (I will probably do this anyway.) What I'd like to be able to do is to have Ged.Parse load records without creating the secondary indexes, and then create the indexes for the newly added records as a penultimate step, just before it makes them available for general use. Of course there is no way that you can do this: records in a table are either indexed or they are not.

Proposed change: recode Ged.Parse to load data into temporary tables, say NewPeople, NewFacts, etc., with these tables having only the indexes required by Ged.Parse. Then, as the last step, have Ged.Parse run a SQL procedure with code like:

Insert into People Select * From NewPeople
Delete from NewPeople
etc.

This is a reasonable amount of programming, so before I make this change could somebody tell me: will this be significantly faster overall, or is it likely to make little or no improvement compared to the present process, in which Ged.Parse loads data directly into People, Facts, etc.?

Two facts that may influence the answer. First, all record relationships are through GUIDs, so records in NewPeople, NewFacts, etc. would already have their final key values. Second, although Ged.Parse needs to form relationships between records, these relationships are only within the new records (created from the same GEDCOM), and Ged.Parse does not need to relate any of these new records to earlier records.

Thank you, Robert Barnes.
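For reference, a minimal sketch of the proposed final step, assuming NewPeople has exactly the same columns as People (SQL Server 2000 syntax):

BEGIN TRANSACTION

-- One set-based insert; the secondary indexes on People are
-- maintained here in a single pass, rather than row by row.
INSERT INTO People
SELECT * FROM NewPeople

DELETE FROM NewPeople
-- or: TRUNCATE TABLE NewPeople, if permissions allow

COMMIT TRANSACTION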
I am looking for a way to convert the following format into a SQL table. The format is BibTeX.
Essentially, a new row in the table would be created for each entry, denoted by an @ sign, and each column is denoted by an =. As you can see from the example data, no single entry contains all the possible columns, and some fields can run over two lines.
To load this, I was considering loading the file into a table with each line as a row, adding a row number, then adding a column counting the @ signs in order, essentially grouping the lines of each record. Then, for each group, I would run through looking for the column keywords 'author', 'title', etc. and split the data into its constituent parts using SUBSTRING and CHARINDEX. (A sketch of the grouping step follows the example entry below.)
@Book{hicks2001,
    author = "von Hicks, III, Michael",
    title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST Spacecraft Using a Novel Manufacturing Technique",
    publisher = "Stanford Press",
    year = 2001,
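A minimal sketch of the grouping step described above; the staging table #BibLines is hypothetical (one raw line per row), the windowed SUM needs SQL Server 2012 or later, and multi-line fields would need the record's lines concatenated first:

-- One raw BibTeX line per row, loaded with BULK INSERT or similar.
CREATE TABLE #BibLines (LineId int IDENTITY(1,1), LineText nvarchar(max));

WITH Grouped AS (
    -- A running count of '@' entry markers assigns each line to a record.
    SELECT LineId, LineText,
           SUM(CASE WHEN LineText LIKE '@%' THEN 1 ELSE 0 END)
               OVER (ORDER BY LineId ROWS UNBOUNDED PRECEDING) AS RecordNo
    FROM #BibLines
)
-- Pull the quoted value of one keyword per record, e.g. author:
-- the value sits between the first and last double quote on the line.
SELECT RecordNo,
       SUBSTRING(LineText,
                 CHARINDEX('"', LineText) + 1,
                 LEN(LineText) - CHARINDEX('"', LineText)
                               - CHARINDEX('"', REVERSE(LineText))) AS Author
FROM Grouped
WHERE LineText LIKE '%author%=%';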
I am using SQL Server 2012 SE. I am trying to delete rows from a couple of tables (GetPersonValue has 250 million rows and I am trying to delete 50 million; GetPerson has 35 million rows and I am trying to delete 20 million). These tables are in transactional replication. The plan is to delete data older than 400 days.
I tried to move the last 400 days of data to new tables, and it took about 11 hours. If I delete data in chunks of 500,000, it takes a long time to rebuild the indexes (delete plus rebuild indexes: 13 hours). Since I am using Standard Edition, partitioning won't work.

Find the DDL below:
GO
CREATE TABLE [dbo].[GetPerson](
    [GetPersonId] [uniqueidentifier] NOT NULL,
    [LinedActivityPersonId] [uniqueidentifier] NOT NULL,
    [CTName] [nvarchar](100) NULL,
    [SNum] [nvarchar](50) NULL,
    [PHPrimary] [nvarchar](50) NULL,
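For reference, a minimal sketch of the chunked delete, assuming a hypothetical CreatedDate column holding the row age (the actual date column isn't shown in the truncated DDL):

DECLARE @cutoff datetime = DATEADD(DAY, -400, GETDATE());

WHILE 1 = 1
BEGIN
    -- Small batches keep each transaction, and the replication
    -- log reader's backlog, manageable.
    DELETE TOP (500000)
    FROM dbo.GetPerson
    WHERE CreatedDate < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;

    -- In FULL recovery, take log backups between batches
    -- so the transaction log does not grow unchecked.
END;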
I have one stored proc that uses ROW_NUMBER() OVER (PARTITION BY ...) and looks like this:
SELECT TargetID, Academic_Year_id, Course_Mode, UK_Enrol, Int_Enrol, Notes, Revision_Number
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY [Academic_Year_id] ORDER BY [Revision_Number] DESC) AS [RevNum],
           TargetID, Academic_Year_id, Course_Mode, Target_Year, UK_Enrol, Int_Enrol, Notes, Revision_Number
    FROM tbl_targets
    WHERE course_mode = @course_mode
) RV
WHERE RV.RevNum = 1
Now the next stored proc needs to use the above, but I need to add the Academic_Year from the tbl_acyear_lookup table and also filter on target_year = 'year 1'.
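A minimal sketch of one way to do it; the join key Academic_Year_id and the Academic_Year column in tbl_acyear_lookup are assumptions on my part:

SELECT RV.TargetID, AY.Academic_Year, RV.Course_Mode,
       RV.UK_Enrol, RV.Int_Enrol, RV.Notes, RV.Revision_Number
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY [Academic_Year_id]
                              ORDER BY [Revision_Number] DESC) AS RevNum,
           TargetID, Academic_Year_id, Course_Mode, Target_Year,
           UK_Enrol, Int_Enrol, Notes, Revision_Number
    FROM tbl_targets
    WHERE course_mode = @course_mode
      AND Target_Year = 'year 1'   -- new filter
) RV
JOIN tbl_acyear_lookup AY
    ON AY.Academic_Year_id = RV.Academic_Year_id
WHERE RV.RevNum = 1;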
I would like to know how I can switch from one database to another within the same script. I have a script that reads the header information from a SQL BAK file and loads the information into a Test database. Once the information is in the temp table (Test database) I run the following script to get the database name.
This part works fine.
INSERT INTO @HeaderInfo
EXEC('RESTORE HEADERONLY FROM DISK = N''I:TESTdatabase.bak'' WITH NOUNLOAD')

DECLARE @databasename varchar(128);
SET @databasename = (SELECT DatabaseName FROM @HeaderInfo);
The problem is that when I try to run the following script, nothing happens. The new database is never selected, and the script is still on the Test database.
EXEC ('USE '+ @databasename)
The goal is to switch to the new database (USE NewDatabase) so that the other part of my script (DBCC CHECKDB) can run. This script checks the integrity of the database and saves the results to a temp table.
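The likely cause: a USE inside EXEC only changes context for the duration of that dynamic batch, then the context snaps back. A minimal sketch of the workaround is to run the dependent command inside the same batch:

DECLARE @sql nvarchar(max);
SET @sql = N'USE ' + QUOTENAME(@databasename) + N'; DBCC CHECKDB;';
EXEC (@sql);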
We have a typical issue with a columnstore index. We have a procedure which does two activities: Switch & Reverse Switch.
Switch:
1. Fetch the partitions that need to be switched
2. Switch the data from the main table to the switch table
3. Disable the columnstore index on the switch table

SSIS package:
4. Load data into the switch table (insert / update)

Reverse switch:
5. Re-enable the columnstore index on the switch table
6. Switch the data back from the switch table to the main table
Issue: sometimes the columnstore index does not get disabled, and the package fails, complaining that we should disable the columnstore index before loading data.
If we re-run the procedure, the columnstore index gets disabled.
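A minimal sketch of a defensive disable step, with hypothetical names dbo.SwitchTable and CSIX_Switch; checking sys.indexes.is_disabled and retrying mirrors what the re-run of the procedure achieves:

ALTER INDEX CSIX_Switch ON dbo.SwitchTable DISABLE;

-- Verify, and retry once if the disable did not take effect.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE object_id = OBJECT_ID('dbo.SwitchTable')
             AND name = 'CSIX_Switch'
             AND is_disabled = 0)
BEGIN
    ALTER INDEX CSIX_Switch ON dbo.SwitchTable DISABLE;
END;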
DECLARE @DatePartitionFunction nvarchar(max) =
    N'CREATE PARTITION FUNCTION DatePartitionFunction (datetime) AS RANGE RIGHT FOR VALUES (';
DECLARE @i datetime = '2007-09-01 00:00:00.000';

WHILE @i < '2008-10-01 00:00:00.000'
BEGIN
    SET @DatePartitionFunction += '''' + CAST(@i as nvarchar(10)) + '''' + N', ';
[Code] ....
Msg 7705, Level 16, State 2, Line 1
Could not implicitly convert range values type specified at ordinal 1 to partition function parameter type.
However, if I change the type to datetime2, it works:
DECLARE @DatePartitionFunction nvarchar(max) =
    N'CREATE PARTITION FUNCTION DatePartitionFunction (datetime2) AS RANGE RIGHT FOR VALUES (';
DECLARE @i datetime2 = '2007-09-01 00:00:00.000';

WHILE @i < '2008-10-01 00:00:00.000'
BEGIN
    SET @DatePartitionFunction += '''' + CAST(@i as nvarchar(10)) + '''' + N', ';
[Code] ...
The documentation for CREATE PARTITION FUNCTION says of input_parameter_type: "Is the data type of the column used for partitioning. All data types are valid for use as partitioning columns, except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types."
In this case, why doesn't datetime work?
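My reading (an assumption, but easy to verify): the failure is in the string conversion, not the partition function. CAST from datetime to nvarchar(10) uses the legacy default style ('Sep  1 2007 12:00AM'), so ten characters yield 'Sep  1 200', which is not a valid date literal; datetime2 converts using the ISO format, so ten characters yield '2007-09-01'. Forcing an ISO style makes the datetime version work too, as in this sketch:

DECLARE @DatePartitionFunction nvarchar(max) =
    N'CREATE PARTITION FUNCTION DatePartitionFunction (datetime) AS RANGE RIGHT FOR VALUES (';
DECLARE @i datetime = '2007-09-01';

WHILE @i < '2008-10-01'
BEGIN
    -- Style 120 produces 'yyyy-mm-dd', which is a valid datetime literal.
    SET @DatePartitionFunction += '''' + CONVERT(nvarchar(10), @i, 120) + N''', ';
    SET @i = DATEADD(MONTH, 1, @i);
END;

-- Trim the trailing comma and close the statement.
SET @DatePartitionFunction = LEFT(@DatePartitionFunction, LEN(@DatePartitionFunction) - 1) + N');';
EXEC sp_executesql @DatePartitionFunction;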
The version is as follows:
Microsoft SQL Server 2012 (SP1) - 11.0.3128.0 (X64) Dec 28 2012 20:23:12 Copyright (c) Microsoft Corporation Enterprise Evaluation Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
from [URL] .....
Table and index partitioning is supported in this edition
I have a report that uses three tables from a database. As long as I only use two of the tables, it runs fine. I need the data from the third table for a line chart, and of course there is a great deal of data in that third table. I have a WHERE clause for start date and end date. Is there a way I could only search the third table after I know what I need from it?
I have a directory with images and a table in my DB with the path of each file. The main application allows me to create reports where I can display an image, so I was thinking of using a query like:
SELECT [ID], [PT_CODE], [FILE_PATH],
    CASE WHEN [FILE_PATH] IS NOT NULL
         THEN (SELECT * FROM OPENROWSET(BULK [FILE_PATH], SINGLE_BLOB) TT)
    END AS IMAGE_LOADED
FROM [DB].[dbo].[TABLE_MR_FILES]

But I keep getting the error:
Incorrect syntax near 'FILE_PATH'.

I have tried multiple combinations, without luck, to make OPENROWSET read the path stored in the [FILE_PATH] column. What am I missing?
Note: I am using MSSQL 2012. I don't want to import the images into the DB, just load them on the fly as needed by the report run from the application. I have full access to the DB, so if a stored procedure is the solution, I can go with it.
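One thing I know about OPENROWSET(BULK ...) is that it accepts only a string literal for the file path, not a column or variable, which is why the syntax error appears. A minimal sketch of a dynamic SQL workaround for a single row (a stored procedure could loop over rows the same way):

DECLARE @path nvarchar(260), @sql nvarchar(max);

SELECT TOP (1) @path = [FILE_PATH]
FROM [DB].[dbo].[TABLE_MR_FILES]
WHERE [FILE_PATH] IS NOT NULL;

-- Double up embedded quotes, then embed the path as a literal.
SET @sql = N'SELECT BulkColumn AS IMAGE_LOADED
FROM OPENROWSET(BULK ''' + REPLACE(@path, N'''', N'''''') + N''', SINGLE_BLOB) AS TT;';

EXEC sp_executesql @sql;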
This table has the same granularity as the fact table, as it's one row per booking. However, due to the nature of the data, I would not want to incorporate it into the fact table. The Originating and Destination addresses are populated for each booking and are required for reporting.
Question: should this be moved into a fast-changing dimension table, or would there be a better way to incorporate this data?
I have a scenario where a customer is going to be using log shipping to the DR site; however, we need to maintain the normal backup strategy on the current system (i.e. nightly full, every-6-hour differential, and hourly transaction log backups). I know how to set up transaction log shipping, fail over to DR, and back, but now the local backup strategy is going to be an issue. I currently use the [URL] .... maintenance solution.
Is it even possible to do regular local backups, keeping the integrity of your backup strategy, with transaction log shipping enabled?
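What I believe to be the key fact (worth verifying for your version): log shipping owns the transaction log backup chain, so any additional local log backup must use COPY_ONLY to avoid breaking it, while full and differential backups can stay as they are. A minimal sketch, with a hypothetical database name and path:

-- Extra local log backup that does not disturb the log shipping chain.
BACKUP LOG YourDb
TO DISK = N'L:\Backup\YourDb_log_copyonly.trn'
WITH COPY_ONLY, INIT;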
I have a heavy database, more than 100 GB for only six months. Every query on it takes a long time, and I don't have enough space to add more indexes, so I decided to do partitioning. I created a partition function on a date field, and all data records for each month were appointed to a separate file. Also, is partitioning only for future data entry?
The project is a C/S data analysis system built with .NET 2.0 in a Windows environment. OS: Microsoft Windows 2003 R2 Standard Edition Service Pack 2; the database used in this project is SQL Server 2005. As a data analysis system, we need to load large amounts of data from file to database; we do it by creating a DTS package and then performing the data load by executing "m_Package.Execute(null, variables, m_PackageEvents, null, null)".
The problem is, we found that DTS misses some data randomly sometimes, and we can't find the rule so far. For example, we have the following record in the data file, with all fields split by '|':

11234|26341|2007-09-03 00:00|0|0|0.0|0|0.011470833793282509|1|0.045497223734855652|0|0|1|0|3|2929|13130|43|0|2|0|0|40|1|0|0|0|0|0|1||0|0|3|0|0|0|0|0||0|3|0|0|43|43|0|41270|0|3|0|0|10|3|0|0|0|0|0||0|1912|0|0|3|0|0|0|0|0|0|0|3|0|0|5|0|40|0|9|0|0|0|0|0|0|0|0|29|1|1|24|24.0|16|16.0|0|0|0.0|0|0|24|23.980693817138672|0|0.0|0|0.0|0|0.0|0|0.0|11|2.0764460563659668|43|2|0|0|30|11|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|3|3|0|0|0|0|0|0|0|0|0|6|0|0|0|0|0|6|0|0|45|1|0|0|0|2|42|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|2|0|0|0|2|0|0|0|0|0|0|51|47|85|0|0|||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|||||||||||||0|0|0|0|0|97.117401123046875|0|0|83|57|||0.011738888919353485|0|1|0.065955556929111481|0|4|||0.00026658669230528176|1|0.00014440112863667309|1|68|53|12|2|1|2.0562667846679688|10|94|2|0|0|30|11|47|4|13902|7024|6878|18|85|4.9072666168212891|5|0.0|0|0.0|0|0.0|0|0.0|0|358|6448|6324|0|0|0|0||0||462|967|0|41|39|2|0|0|0|1|0|0|0|0|0|0|0|0|3|0|0|3|0|0|0|0|0|0|0|0|0|3|0|3|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|46|0|1|0|1|37|0|0|46|0|1|0|1|37|0|0|0|0|0|0|0|0|0.0|0|0|6|4|2|0|0|2|1|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|0|1|0.012290795333683491|0|44|44.0|0|0.0|0|0|0|30|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|2|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|2|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|27|0|0|2112|411|411|45|437|2|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|4|0|4|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|0|0|0|0|6|6|0|3|2|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|5|5.0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|600|600|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|6|0|0|0|0|0|0|6|0|9|1|2|2|3|0|1|0|0|0|0|0|0|0|0|0|0|0|13|3|2|5|1|1|1|0|0|0|102|0|1|1|0|0|0|3|3|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||||||0|0|0|0|0|0|0|0|0|0|0||0|0|0|0|0|0|0|0|0||||||||||0|0|0|0|0|0|0||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|46.0|46|0.0|0|0.0|0|0.011469460092484951|1|0.0|0|0.0|0|3|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|0|0.0|0|0|0|0|0|0|0|0|0|0|0|0|0|||0|100.0|100.0|0|1|0|1|0|0|0.02481505274772644|1|0.014951236546039581|1|0|0|0|0|0|0|0|0|0|0|0|0|0|||||||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|||0|||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|4695.5556640625|42260|7126.66650390625|62040|||||||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||||||||||0|0||||||||||
We found that some of the data fields become null after the load action finishes. If we load the same data again, the problem disappears. We can't reproduce this issue 100% of the time, and we don't know why. Can anybody here help us solve this issue or give us some clue?
How can I make partitions on a table for a particular value and ranges together?
For example, for customer id 12345 I need a separate partition, then for 56789 I need a separate partition, and if I have a range of values like 1000 to 1020, then a separate partition for that.
For certain IDs I need a unique partition, and for certain IDs I need ranges.
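A partition function only takes boundary points, so a single-value partition is expressed by placing boundaries immediately on both sides of the value. A minimal sketch, assuming an int customer id column and RANGE RIGHT:

CREATE PARTITION FUNCTION pf_CustomerId (int)
AS RANGE RIGHT FOR VALUES (1000, 1021, 12345, 12346, 56789, 56790);
-- Resulting partitions include [1000..1020] (a range),
-- [12345] alone, and [56789] alone.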
How do I add some more ranges to an existing partition scheme and function?
My table is already partitioned on date ranges: 6 partitions, each containing 6 months of data, so 3 years of data in total.
i.e.
1st partition: data from Jan 2012 to Jun 2012
2nd partition: data from Jul 2012 to Dec 2012
3rd partition: data from Jan 2013 to Jun 2013
4th partition: data from Jul 2013 to Dec 2013
5th partition: data from Jan 2014 to Jun 2014
6th partition: data from Jul 2014 to Dec 2014
After Jan 2015, data will go to the primary filegroup (default).
Now the customer wants to add two more filegroups with partition ranges for Jan 2015 to Jun 2015 and Jul 2015 to Dec 2015.
Adding the filegroups and .ndf files is OK, but how do I alter the partition scheme and partition function to add these additional ranges to the existing ones?
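A minimal sketch with hypothetical names pf_Date / ps_Date and new filegroups FG2015H1 and FG2015H2; each new boundary is added by naming the next filegroup and then splitting the range:

ALTER PARTITION SCHEME ps_Date NEXT USED FG2015H1;
ALTER PARTITION FUNCTION pf_Date() SPLIT RANGE ('2015-07-01');

ALTER PARTITION SCHEME ps_Date NEXT USED FG2015H2;
ALTER PARTITION FUNCTION pf_Date() SPLIT RANGE ('2016-01-01');

-- Splitting is fastest while the partition being split is empty,
-- so add future boundaries before the data arrives.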
Please find the code below (which I am using).

1)
CREATE TABLE rssFeeds (feedXML XML)

SELECT * FROM RSSFEEDS

DECLARE @xmlDoc XML
SET @xmlDoc = (SELECT * FROM OPENROWSET(BULK 'C:\Test.xml', SINGLE_CLOB) AS xmlData)
2)
INSERT INTO rssFeeds (feedXML) VALUES (@xmlDoc)
I am able to get the contents of the XML file into only one field of a SQL Server table. Now I have been asked to make a table whose schema matches the XML file's. How do I proceed? Please point me to some URL for this.
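A minimal sketch of the usual approach: shred the stored XML into columns with nodes() and value(). The element names below (/rss/channel/item with title and link) are assumptions about the feed's structure:

SELECT
    itm.x.value('(title/text())[1]', 'nvarchar(200)') AS Title,
    itm.x.value('(link/text())[1]',  'nvarchar(400)') AS Link
FROM rssFeeds AS f
CROSS APPLY f.feedXML.nodes('/rss/channel/item') AS itm(x);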
Hi,
SQL Server 2000 SP3. Has anyone ever successfully loaded data into SQL Server from a SAS dataset? I have tried using DTS and the SAS OLE DB drivers but get the following error:

Error Description: A provider specific error occurred (%1:%2)
Context: Error calling GetRowSet to get DBSCHEMA_TABLES schema info. Your provider does not support all the schema rowsets required by DTS.

It does seem to me to be a problem with the OLE DB providers, but if anyone has seen this issue with DTS previously, let me know. Any help is appreciated. Thanks in advance,
Reg
Don't know if this is the right forum to be asking this, but I'll give it a try...
I'm relatively new to SQL Server and T-SQL in general. The problem I'm trying to solve is the following:
The big picture is that I have data coming from different data sources which I need to store on a database for later reference. Each data source might have a different set of measurements. For example, data source 1 might log Pressure and Humidity while data source 2 logs Pressure and Temperature. Once the data is present on the DB, the users can go ahead and retrieve data for a given [datasource/measurement/time interval] to generate reports or charts.
My implementation so far consists of two tables: series_info and series_data. series_info holds general information for a given series of measurements for a given data source (Pressure for data source 1, Pressure for data source 2, Humidity for data source 1 and Temperature for data source 2, in our example). Each series has a bigint index as primary key.
The table series_data contains all data belonging to the series from series_info. Each piece of data has a bigint as a primary key, an associated time (which is always increasing), and a foreign key to the series it represents (in series_info).
Alright, everything is cool so far. However, whenever a user wants to retrieve data for a given [data source/measurement/time interval], it takes a very long time, since all data is interleaved in series_data and for every search it's necessary to find where the desired data actually lies.
One obvious solution for this would be to dynamically create a new table to hold the data for each series, but that would just make my database disorganized, since there would be thousands and thousands of tables.
Another thing that comes to mind is to create a table recording where the data for a given [data source/measurement] lies for given dates. So when the user requests data for a given [data source/measurement] between, say, January and February, we would first look at this intermediate table and find out that the data lies between indexes 1000 and 2000 in the series_data table, so the next SELECT against series_data would already contain a restriction like WHERE index >= 1000 AND index <= 2000. This should probably improve the speed of retrieval.
What do you guys (or girls) think? Maybe there's simply a classical solution for such a case.
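For what it's worth, the classical fix for this pattern is usually a composite index, which gives the engine the same "jump straight to the range" behavior without a hand-maintained lookup table. A minimal sketch, with the series_data column names assumed from the description above:

CREATE NONCLUSTERED INDEX IX_series_data_series_time
ON series_data (series_fk, data_time)
INCLUDE (data_value);

-- Retrieval for one [data source/measurement/time interval]
-- becomes a single index seek plus a range scan:
SELECT data_time, data_value
FROM series_data
WHERE series_fk = @seriesId
  AND data_time BETWEEN @startTime AND @endTime;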
Regarding SQL Server data, I am looking to implement the best data-archive and purge policy. Normally, we do SQL backups and keep the history for some period, for example 8 weeks, so we can go back and restore to any point in time up to 8 weeks in the past. We also do tape backups.
The question is: where can I find a good article or documentation on how best to design such a policy, making sure that I am covered for point-in-time recovery of the database (via SQL backups) and for point-in-time recovery in the far past, say 3 years ago, using tape backups, without duplicating the same efforts?
I am so new to SQL Server 2005 and just studying. Saying that... we use SQL Server 2005 Express Edition. Someone sent me a file (info.mdb) and asked me to load the data in this file into a table called Products, and also to load another table (ProdCat) where id = 'X05'. Not knowing anything about data loading etc., how should I proceed? Does .mdb mean it's an Access database file? If that is the case, I don't have Access on my machine, so what should I do?
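Yes, .mdb is an Access (Jet) database file, and you don't need Access installed to read it from SQL Server. A minimal sketch using the Jet OLE DB provider via OPENROWSET; the file path and the table names inside the .mdb are assumptions, ad hoc distributed queries must be enabled first, and the Jet provider only exists for 32-bit SQL Server instances:

EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE;

-- Copy everything from the Access table into a new SQL Server table.
SELECT *
INTO dbo.Products
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                'C:\data\info.mdb'; 'admin'; '',
                'SELECT * FROM Products');

-- Same idea for the filtered load.
SELECT *
INTO dbo.ProdCat
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                'C:\data\info.mdb'; 'admin'; '',
                'SELECT * FROM ProdCat WHERE id = ''X05''');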