Transact SQL :: Fast Data Loading With Partition Switching Strategy
Jul 28, 2015
I’m looking for clarity on partition switching. The idea is to run many BULK INSERT statements into tables dbo.X_n in parallel and, when the BULK INSERT for table dbo.X_n is completed, switch dbo.X_n into dbo.bigdaddy. I think this is the fastest way to upload a couple hundred GB of data.
In learning about partition switching (in part) from The Data Loading Performance Guide under Partition SWITCH, I understand the instructions to say: copy the main table exactly to become a target. But in that same step (#1), I read that we need to change the default filegroup of the target (dbo.X_n) away from the default filegroup. Then it says I need to match indexes, and it lists the filegroup as something we need to match with the main table.
As an overview of the partition switching strategy, I think the whole point of BULK INSERT with partitioning is to have separate files (in the same group) to enable concurrent uploading, where each table has its own file. Once the upload to a table (dbo.X_n) is completed, we do the partition switch into the main table (dbo.bigdaddy). The data we just uploaded doesn’t actually move; only its metadata does.
When I read the instructions linked above, I hear “Don’t have the same filegroup on your target as the main table. You must have the same filegroup on your target as the main table.”
Where am I disconnected?
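A minimal sketch of the pattern as I understand it, with hypothetical filegroup, column, file, and partition names: the staging table lives on the same filegroup as the destination partition, matches the main table's indexes, and carries a CHECK constraint covering that partition's range.
-- Staging table must match dbo.bigdaddy's structure, indexes and filegroup,
-- and constrain its rows to the destination partition's range (names are illustrative).
CREATE TABLE dbo.X_1
(
    LoadDate DATETIME NOT NULL,
    Payload VARCHAR(100) NOT NULL,
    CONSTRAINT CK_X_1_Range CHECK (LoadDate >= '20150701' AND LoadDate < '20150801')
) ON FG_201507 -- same filegroup that holds the destination partition of dbo.bigdaddy
CREATE CLUSTERED INDEX CIX_X_1 ON dbo.X_1 (LoadDate) ON FG_201507
-- One data file per staging table; these BULK INSERTs can run in parallel.
BULK INSERT dbo.X_1 FROM 'D:\loads\file_1.dat' WITH (TABLOCK)
-- The switch is a metadata-only operation; the data pages never move.
ALTER TABLE dbo.X_1 SWITCH TO dbo.bigdaddy PARTITION 7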
View 5 Replies
View Related
Apr 22, 2007
I have been developing a genealogy application using a SQL Server 2000 database and ASP .NET 2.0. In this application a process, Ged.Parse, converts data from the GEDCOM standard format (a hierarchical file format that looks as if it was designed for 80-column cards) into my SQL Server database.
As we started to load reasonable quantities of data into the system we found that the on-line response became abysmal. This problem was fixed by defining a number of secondary indexes (response times dropped to under a second, from previously exceeding 2 minutes and often timing out). Unfortunately however the processing time of Ged.Parse then tripled, and it may now take up to an hour to process a GEDCOM. I believe that this is a byproduct of defining several indexes that are not needed by Ged.Parse itself, but which are of course maintained as Ged.Parse inserts new records into the database.
I am wondering what my best strategy is, apart from putting Ged.Parse into a background task and just letting it trickle away. (I will probably do this anyway). What I'd like to be able to do is to have Ged.Parse load records without creating the secondary indexes, and then create the indexes for the newly-added records as a penultimate step just before it makes them available for general use. Of course there is no way that you can do this: records in a table are either indexed or they are not.
Proposed change: recode Ged.Parse to load data into temporary tables, say NewPeople, NewFacts, etc., with these tables having only the indexes required by Ged.Parse. Then, as the last step in Ged.Parse, run a SQL procedure with code like:
Insert into People Select * From NewPeople
Delete from NewPeople
etc.
This is a reasonable amount of programming, so before I make this change could somebody tell me: will this be significantly faster overall, or is this likely to make little or no improvement compared to the present process in which Ged.Parse loads data directly into People, Facts, etc? Two facts that may influence the answer. First, all record relationships are through GUIDs, so records in NewPeople, NewFacts, etc would already have their final key values. Second: although Ged.Parse needs to form relationships between records, these relationships are only within the new records (created from the same GEDCOM), and Ged.Parse does not need to relate any of these new records to earlier records.
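A minimal sketch of that last step (the column lists are hypothetical), wrapped in a transaction so the new records appear all at once:
BEGIN TRANSACTION
-- Set-based move from the lightly indexed staging tables into the production tables.
INSERT INTO People (PersonID, Surname, GivenName)
SELECT PersonID, Surname, GivenName FROM NewPeople
INSERT INTO Facts (FactID, PersonID, FactType, FactValue)
SELECT FactID, PersonID, FactType, FactValue FROM NewFacts
-- Clear the staging tables for the next GEDCOM.
DELETE FROM NewFacts
DELETE FROM NewPeople
COMMIT TRANSACTION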
Thank you,
Robert Barnes.
View 2 Replies
View Related
Apr 8, 2006
I am searching for a way to fast-load related data. I know how to load data fast, but how can I store related data fast?
For example :
Table1 ( table1Id int identity, name varchar(255) )
Table2 ( table2Id int identity, table1Id int, name varchar(255) )
When I insert 50 records into Table1, I can't get the 50 identity values back to insert the related data into Table2.
I think one of the solutions could be returning a selection of Table1 joined with syslockinfo, but I have no idea how to do it.
Does anyone have an idea?
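One possible approach on SQL Server 2005 or later (the #SourceRows staging table is hypothetical): the OUTPUT clause can capture the generated identity values during the insert so they can be reused for Table2.
-- Capture the generated identity values as the rows are inserted.
DECLARE @NewIds TABLE (table1Id int, name varchar(255))
INSERT INTO Table1 (name)
OUTPUT inserted.table1Id, inserted.name INTO @NewIds (table1Id, name)
SELECT name FROM #SourceRows -- hypothetical staging table holding the 50 new rows
-- Insert the related rows using the captured identity values.
INSERT INTO Table2 (table1Id, name)
SELECT table1Id, name FROM @NewIds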
View 3 Replies
View Related
Jul 10, 2007
Hi everyone,
How can I load or copy, say, millions of rows into a table in the database faster?
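One common approach, sketched with hypothetical file and table names: BULK INSERT with TABLOCK, which allows minimal logging under the SIMPLE or BULK_LOGGED recovery model.
-- Bulk load from a flat file; TABLOCK enables minimal logging, BATCHSIZE commits in chunks.
BULK INSERT dbo.TargetTable
FROM 'D:\loads\bigfile.dat'
WITH (TABLOCK, BATCHSIZE = 100000, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')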
Thanks,
Mejo George
View 6 Replies
View Related
Mar 1, 2015
When you load data into a new partitioned table, can it be done online without any downtime? I ask because I have a few tables that are around 250 GB and more.
View 5 Replies
View Related
Aug 27, 2015
I am writing a query where I am identifying different scenarios where data changes between one week and the next. I've set up my result set in the following manner:
PrimaryID SKUChange DateChange LocationIdChange StateChange
10003 TRUE FALSE TRUE FALSE
etc...
The output I'd like to see would be like this:
PrimaryID Field Changed Previous Value New Value
10003 SKUName SKU12345 SKU56789
10003 LocationId Den123 NYC987
etc...
The key here being that in the initial resultset ID 10003 is represented by one row but indicates two changes, and in the final output those two changes are being represented by two distinct rows. Obviously, I will bring in the previous and new values from a source.
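One way to get that shape, sketched here with assumed Prev*/New* source columns and bit change flags: CROSS APPLY (VALUES ...) emits one candidate row per field, and the WHERE clause keeps only the fields whose flag is set.
SELECT c.PrimaryID,
       x.FieldChanged,
       x.PreviousValue,
       x.NewValue
FROM dbo.WeeklyChanges AS c -- hypothetical name for the initial result set
CROSS APPLY (VALUES
       ('SKUName',    c.PrevSKUName,    c.NewSKUName,    c.SKUChange),
       ('LocationId', c.PrevLocationId, c.NewLocationId, c.LocationIdChange)
) AS x (FieldChanged, PreviousValue, NewValue, ChangeFlag)
WHERE x.ChangeFlag = 1 -- assumes the change flags are bit columns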
View 3 Replies
View Related
Jul 7, 2015
I have a table which has been partitioned on a BIGINT column.
Partitioned_Table (ID BIGINT, Name VARCHAR(10), Gender VARCHAR(2))
I have a RANGE LEFT partition function on the ID column.
CREATE PARTITION FUNCTION Partition_Function ( BIGINT )
AS RANGE LEFT
FOR VALUES ( '20150601000', '20150602000', '20150603000' );
With RANGE LEFT, each boundary value belongs to the partition on its left, so the first partition holds ID <= 20150601000 and the second holds ID > 20150601000 and ID <= 20150602000.
I have to switch in a table into an empty partition.
Switching_In_Table(ID BIGINT, Name VARCHAR(10), Gender VARCHAR(2))
Before the switch in, I am creating a CHECK constraint on Switching_In_Table: CHECK (ID LIKE '20150625%'). Can I use a LIKE clause in this scenario?
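For comparison, a sketch of a range-style constraint (the boundary values are illustrative): a plain range predicate on the BIGINT column avoids the implicit conversion to character data that LIKE forces and lines up directly with the partition boundaries.
-- Range predicate matching a RANGE LEFT partition (lower boundary exclusive, upper inclusive).
ALTER TABLE Switching_In_Table WITH CHECK
ADD CONSTRAINT CK_Switching_In_Range
CHECK (ID IS NOT NULL AND ID > 20150602000 AND ID <= 20150603000)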
View 6 Replies
View Related
Sep 8, 2015
If I have the first two columns from a SQL query and would like to create the third (row):
personid new_commsstream row
1 0 1
1 0 1
1 0 1
1 1 2
1 0 2
2 0 1
3 0 1
4 0 1
5 0 1
5 1 2
5 1 3
Is this possible? I have tried using ROW_NUMBER() OVER (PARTITION BY ...) but I'm not sure how to group it correctly; basically, every time a new 1 appears in new_commsstream within a personid, the row number should go up by one.
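A sketch that reproduces the sample above (SQL Server 2012 or later; OrderingColumn and the table name are placeholders): a running total of new_commsstream, plus one, bumps the number at each new 1 within a personid.
SELECT personid,
       new_commsstream,
       SUM(new_commsstream) OVER (PARTITION BY personid
                                  ORDER BY OrderingColumn
                                  ROWS UNBOUNDED PRECEDING) + 1 AS [row]
FROM dbo.CommsStreams -- hypothetical table name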
View 3 Replies
View Related
Jul 31, 2015
I am new to Partitioning tables. My scenario is as listed below.
I am getting monthly transaction data on the first Monday of every month, and I want to partition that data.
For example: let's say I will get my next monthly data on August 3rd, 2015, which is the first Monday of August.
I want that transaction data to go into a new partitioned filegroup in my existing partitioned table. How can I set up partitioning for this kind of scenario? Can we create one or more stored procedures that will create the new partition and load data into that partition?
FYI, this monthly data will be loaded into a staging table, and that table has a LoadDate column which will contain 2015-08-03. I am using SQL 2012 Enterprise edition.
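A sketch of the monthly maintenance step (the partition function, scheme, filegroup, and table names are assumptions); a stored procedure could run this before each load, using the staging table's LoadDate as the new boundary.
-- Point the scheme at the filegroup that will hold the new month, then split.
ALTER PARTITION SCHEME MonthlyPartScheme NEXT USED [FG_2015_08]
ALTER PARTITION FUNCTION MonthlyPartFunction() SPLIT RANGE ('2015-08-03')
-- If the staging table is aligned (same structure, indexes, filegroup and a
-- matching CHECK constraint), it can then be switched into the new partition.
ALTER TABLE dbo.Staging SWITCH TO dbo.TransactionsPartitioned
    PARTITION $PARTITION.MonthlyPartFunction('2015-08-03')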
View 17 Replies
View Related
Sep 22, 2015
I have the following query
WITH summary AS
(SELECT tu.SequenceNumber,
tu.trialid,
tu.SBOINumber,
tu.DisplayFlag,
[Code] ....
I am having trouble with the ROW_NUMBER() OVER (PARTITION BY ...) portion of the query. I would like the query to return only the first occurrence of each sboinumber in the table for each trialid, but it is only giving me the first occurrence of each sboinumber overall. I tried including the trialid in the PARTITION BY clause, but that is not working.
Sample Data
SequenceNumber TrialID SBOINumber
1 1 5000
2 1 5000
3 2 5000
4 2 5000
5 1 5001
6 3 5001
7 3 5001
Should return SequenceNumbers 1, 3, 5, and 6.
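A sketch of the shape that returns those rows (the table name and omitted columns are placeholders): partition by both trialid and SBOINumber, order by SequenceNumber, and keep row 1 of each group.
WITH summary AS
(
    SELECT tu.SequenceNumber,
           tu.trialid,
           tu.SBOINumber,
           ROW_NUMBER() OVER (PARTITION BY tu.trialid, tu.SBOINumber
                              ORDER BY tu.SequenceNumber) AS rn
    FROM dbo.TrialUnits AS tu -- hypothetical table name
)
SELECT SequenceNumber, trialid, SBOINumber
FROM summary
WHERE rn = 1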
View 11 Replies
View Related
Jul 22, 2015
I am trying to split records into days by the start and end datetime.
I would send an image and data but because I am new to the forum, I am blocked sending images.
"Body text cannot contain images or links until we are able to verify your account"
How can I forward an image?
View 15 Replies
View Related
Jun 9, 2015
I have a non-partitioned table (TableToPartition) and I want to apply an existing partition scheme (PartSch) to it using a query. I didn't find any option, so I used the Storage > Create Partition wizard to generate the script below. Why is this clustering magic needed if the index is dropped at the end? Isn't there another way to partition a table without indexing, say something with ALTER TABLE? (SQL Server 2012)
BEGIN TRANSACTION
CREATE CLUSTERED INDEX [ClusteredIndex_on_PartSch_635694324610495157] ON [dbo].[TableToPartition]
(
[ID]
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PartSch]([ID])
DROP INDEX [ClusteredIndex_on_PartSch_635694324610495157] ON [dbo].[TableToPartition]
COMMIT TRANSACTION
View 2 Replies
View Related
Dec 16, 2007
Hello there,
Don't know if this is the right forum to be asking this, but I'll give it a try...
I'm relatively a beginner in SQL Server and T-SQL in general. The problem I'm trying to solve is the following:
The big picture is that I have data coming from different data sources which I need to store on a database for later reference. Each data source might have a different set of measurements. For example, data source 1 might log Pressure and Humidity while data source 2 logs Pressure and Temperature. Once the data is present on the DB, the users can go ahead and retrieve data for a given [datasource/measurement/time interval] to generate reports or charts.
My implementation so far consists of two tables: series_info and series_data. series_info holds general information for a given series of measurements for a given data source (Pressure for data source 1, Pressure for data source 2, Humidity for data source 1 and Temperature for data source 2, in our example). Each series has a bigint index as primary key.
The table series_data contains all data relative to the series from series_info. Each piece of data has a bigint as a primary key, an associated time (which is always increasing) and a foreign key to the series it represents (in series_info).
Alright, everything is cool so far. However, whenever a user wants to retrieve data for a given [data source/measurement/time interval], this takes very long, since all series are interleaved in series_data and for every search it's necessary to find where the desired data actually lies.
One obvious solution for this would be to dynamically create a new table to hold the data for each series, but that would just make my database disorganized, since there would be thousands and thousands of tables.
Another thing that comes to my mind is to create a table with information about where the data for a given [data source / measurement] lies for given dates. So when the user requests data for a given [data source/measurement] between, say, January and February, we would first look at this intermediate table and find out that the data lies between indexes 1000 and 2000 in the series_data table, so the next SELECT command to series_data would already contain a restriction like WHERE index >= 1000 AND index <= 2000. This should probably improve the speed of retrieval.
What do you guys (or girls) think? Maybe there's simply a classical solution for such a case.
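One classical option, sketched with hypothetical column names: a composite index on the series foreign key and the time column keeps each series' rows together in the index, so a [data source/measurement/time interval] request becomes a range seek instead of a scan of series_data.
-- Index the foreign key plus time so each series' rows are contiguous in the index.
CREATE NONCLUSTERED INDEX IX_series_data_series_time
ON series_data (series_info_id, sample_time)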
Thanks in advance!
View 6 Replies
View Related
Mar 31, 2008
Regarding SQL Server data, I am looking to implement the best data archive and purge policy. Normally, we do SQL backups and keep the history for some period, for example 8 weeks, so we can go back and restore to any point in time up to 8 weeks in the past; we also do tape backups.
The question is: where can I get a good article or documentation on how to best design such a policy, so that I am covered for point-in-time recovery of the database (via SQL backups) and for point-in-time recovery in the far past, say 3 years ago, using tape backups? I also need to make sure that I don't duplicate the same efforts.
Any advice or suggestions on this topic?
Thanks,
View 5 Replies
View Related
Sep 2, 2015
I have a stored procedure that attempts to INSERT @BatchSize number of records at a time into a table. Currently, I have @BatchSize set to load 50,000 at a time. The table I am inserting from has a little over 67,000 records.
When I execute the procedure with NOCOUNT left off, the procedure seems to run indefinitely, and the count of records returned surpasses what I have in the source table. However, only 50,000 records are inserted into the table.
Below is my code:
begin try
--error catching variables:
declare @Error_NumberLocal int
,@Error_MessageLocal varchar(4000)
,@Error_SeverityLocal int
,@Error_StateLocal int
,@Error_ProcedureLocal varchar(200)
,@Error_LineLocal int
,@User_NameLocal varchar(200)
[code]....
What is the problem with the looping structure that would cause this issue?
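For comparison, a sketch of a loop shape that terminates cleanly (table and column names are placeholders): capture @@ROWCOUNT immediately after the INSERT, before any other statement resets it, and stop when a batch comes back short.
DECLARE @BatchSize int = 50000,
        @Rows int = 1
WHILE @Rows > 0
BEGIN
    -- Insert the next batch of rows that are not yet in the target table.
    INSERT INTO dbo.TargetTable (KeyCol, DataCol)
    SELECT TOP (@BatchSize) s.KeyCol, s.DataCol
    FROM dbo.SourceTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.KeyCol = s.KeyCol)

    SET @Rows = @@ROWCOUNT -- must be read before any other statement resets it
END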
View 6 Replies
View Related
Jan 23, 2004
Hello everybody
We need to move table T1 from database A to database B on the same server.
Table T1 is 15 GB and has 40,000,000 rows.
Database B was just created and will act as a warehouse.
could it be done simply by
1.creating table T1 on db B and then
2.set db to simple recovery
3.
insert into B.dbo.T1
select * from A.dbo.T1
4. create all the indexes on table T1 in db B
free disk space is 35GB
Any idea how to optimize the import?
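A sketch of one variation: under the SIMPLE (or BULK_LOGGED) recovery model, SELECT ... INTO is minimally logged, which usually beats a fully logged INSERT ... SELECT for 40 million rows; the indexes are still created afterwards as in step 4.
USE B
GO
-- Minimally logged copy into a new table in database B.
SELECT *
INTO dbo.T1
FROM A.dbo.T1
GO
-- Then create all the indexes on B.dbo.T1.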
Thank you
View 5 Replies
View Related
Jun 10, 2015
Why is bcp out (exporting data from a SQL table to a text file using the bcp utility) faster?
View 6 Replies
View Related
Jul 20, 2005
I'm currently working with a 10 million plus row database with the data residing on a Unix box with Cache 5.0. The problem is that it can take five days to pull one table from Cache to SQL 2000 using the ODBC connection provided by Cache in a SQL 2000 DTS package. I think the real problem is converting the data from the post-relational format (Cache) to a relational format (SQL 2000)???
Does anyone have any ideas / suggestions on how to speed this transfer of data? I'm very new to Cache and any help would be greatly appreciated.
Thanks,
-p
View 3 Replies
View Related
Jul 5, 2007
Hi,
For this scenario, what is the best method of exporting data to SQL 2005?
I want to export data from a desktop app across the internet to SQL Server. I can do this on a row-by-row basis, but it is very slow, and if the connection goes down halfway through then I'm pretty much buggered.
What is the best, most reliable and fastest way to copy data across the internet (several thousand rows)? I have read about Bulk Insert etc., but how would I get around an upload that crashes halfway? Is there a way of uploading so that the data is only inserted into the database once the whole upload goes through?
Would appreciate any guidance.
Richard
View 3 Replies
View Related
Mar 29, 2007
I created an SSIS package and several data flow components for this package.
What strategy exists to deploy the SSIS package and data flow components onto an enterprise server?
Thanks in advance.
View 2 Replies
View Related
Mar 4, 2005
Does anyone know how to upload (bulk) data from a client (written in Excel VBA) to a remote SQL 2000 database? Of course I tried "INSERT INTO" and rst.AddNew, but I noticed this is much, much slower than downloading from the same remote database.
Thanks.
View 3 Replies
View Related
May 28, 2015
I have the DB structure below in MSSQL for a small application which follows a relational approach. Data retrieval (for Hostels) will need several joins; maybe a key-value approach would make data retrieval faster.
Hostels
------------
HostelId,
Name,
Address,
CategoryId,
SubCategoryId,
FoodCategoryId,
LandLordId
Data:
1 H1 Address1 1 1 2 20
2 H2 Address2 1 2 2 21
3 H3 Address3 2 2 1 17
Category
----------
CategoryId,
CategoryName
[code]...
View 10 Replies
View Related
Jul 17, 2001
Hi folks,
Recently I've been working on a new project that would partition a large table into 2 smaller tables. I then create a view to union the 2 smaller tables (tables A and B). I've been getting a strange error when I try to update, insert, or delete a record through the view: "View needs partitioning column"... I find this strange. Both of my tables have a clustered primary key consisting of 3 columns, and one of the 3 columns (a date field) has a check constraint. The constraint is used to determine which record goes into which table. Am I missing anything else? The really strange part is sometimes it works, and sometimes I get the error message.
Any thoughts?
Joe R.
View 1 Replies
View Related
Dec 31, 2003
Hi,
I have a question regarding partitioned views.
Please see below a sample from BOL plus a sample of the execution plan.
I would like to ask how to avoid the optimizer scanning tables out of scope (I would expect that the only table touched by this query would be SUPPLY1).
Thanks,
Eyal
--This example uses tables named SUPPLY1, SUPPLY2, SUPPLY3, and SUPPLY4, which correspond to the supplier tables from four offices, located in different countries/regions.
USE tempdb
GO
--create the tables and insert the values
CREATE TABLE SUPPLY1 (
supplyID INT PRIMARY KEY CHECK (supplyID BETWEEN 1 and 150),
supplier CHAR(50)
)
CREATE TABLE SUPPLY2 (
supplyID INT PRIMARY KEY CHECK (supplyID BETWEEN 151 and 300),
supplier CHAR(50)
)
CREATE TABLE SUPPLY3 (
supplyID INT PRIMARY KEY CHECK (supplyID BETWEEN 301 and 450),
supplier CHAR(50)
)
CREATE TABLE SUPPLY4 (
supplyID INT PRIMARY KEY CHECK (supplyID BETWEEN 451 and 600),
supplier CHAR(50)
)
GO
--create the view that combines all supplier tables
CREATE VIEW all_supplier_view
AS
SELECT *
FROM SUPPLY1
UNION ALL
SELECT *
FROM SUPPLY2
UNION ALL
SELECT *
FROM SUPPLY3
UNION ALL
SELECT *
FROM SUPPLY4
GO
INSERT all_supplier_view VALUES ('1', 'CaliforniaCorp')
INSERT all_supplier_view VALUES ('5', 'BraziliaLtd')
INSERT all_supplier_view VALUES ('231', 'FarEast')
INSERT all_supplier_view VALUES ('280', 'NZ')
INSERT all_supplier_view VALUES ('321', 'EuroGroup')
INSERT all_supplier_view VALUES ('442', 'UKArchip')
INSERT all_supplier_view VALUES ('475', 'India')
INSERT all_supplier_view VALUES ('521', 'Afrique')
GO
/* */
SELECT * FROM all_supplier_view WHERE supplyID BETWEEN 1 and 150
View 7 Replies
View Related
Jan 22, 2014
I'm moving a set of data from one partition to another. What is the best way?
What are all the things that need to be considered?
Note: the set of data will all move from one partition to another single partition.
My current query:
UPDATE table1
SET table1.partitioncolumn = @newpartitioncolumn
FROM table1
INNER JOIN table2
ON table1.id = table2.id
AND table1.partitioncolumn = @oldpartitioncolumn
View 7 Replies
View Related
Oct 7, 2015
I have partitions that I have filled with data. I am now trying to figure out exactly how much data the partitions contain, so I can see if any of them are close to hitting their autogrow conditions. If I were looking at a single unpartitioned table, then I could look at the table properties to determine data and index sizes and compare that to the size of the mdf file, but for partitions I am not sure how I would query this information out. Any pointers on how this information could be queried out of the system?
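A sketch of one way to query it (the table name is hypothetical): sys.dm_db_partition_stats reports rows and reserved pages per partition, and each page is 8 KB.
-- Per-partition row counts and approximate reserved size in MB.
SELECT partition_number,
       row_count,
       reserved_page_count * 8 / 1024.0 AS reserved_mb
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.MyPartitionedTable') -- hypothetical table name
  AND index_id IN (0, 1) -- heap or clustered index
ORDER BY partition_number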
View 3 Replies
View Related
May 26, 2004
Hi,
I have a SQL Server 7 instance running on a machine with two disk partitions (D: and E:).
The data files xxx.mdf and xxx.ldf are stored on D:, which has very little space available. I want to copy these files to E:, but I get an error saying that it is not possible to change the source file of a database. Is it possible to do this, or do I have to create another data file on E: and keep the old one on D:?
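A sketch of the usual approach for that version (database and file names are hypothetical): detach the database, move the files with the operating system, then attach them from the new location.
-- Detach, move the files in the file system, then re-attach from E:.
EXEC sp_detach_db 'MyDb'
-- copy D:\MyDb.mdf and D:\MyDb_log.ldf to E:\ using the operating system
EXEC sp_attach_db 'MyDb', 'E:\MyDb.mdf', 'E:\MyDb_log.ldf'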
Thanks in advance,
browser
View 3 Replies
View Related
Jun 5, 2008
Hi All,
I am trying to understand when I would do a vertical partition in a dimensional data warehouse. What are the things I need to consider before I take the decision?
Necessity is the mother of all inventions!
View 7 Replies
View Related
Feb 11, 2008
Hi
A column with the TEXT datatype is not stored in the same data row anyway. I am wondering if there is any performance gain from putting it in a separate table. Thanks
View 4 Replies
View Related
Feb 12, 2007
Does anyone have a helpful link for using the partition processing data flow task in SSIS? I am trying to process a monthly partition from within my package and am getting the following error:
Error: 0xC113000A Errors in the high-level relational engine. Pipeline processing can only reference a single table in the data source view.
If anyone has used this before and could point me in the right direction, I would appreciate it.
Thanks,
Nick
View 3 Replies
View Related
May 8, 2015
I am using a writeback partition to receive data from various inputs, and it appends any new data that I add to the WB partition.
I am able to read the data immediately in the WB partition through a Fact partition query. This is working at this point as desired.
Eventually I want to move the data from the WB partition into the Fact partition. How can I do this, both manually and through automation?
View 5 Replies
View Related
Nov 10, 2015
I'm using a Script Component to load data into an Oracle DB due to a poor performance issue. Now I have found that it is missing some data during the transmission. Please see the screenshots below:
SQL Server:
Oracle:
DDL:
create table Person
(
BusinessEntityID Integer,
FirstName nvarchar2(50),
MiddleName nvarchar2(50),
LastName nvarchar2(50)
);
Result:
I followed this article: [URL] ....
VB Script:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
[Code] ..........
View 8 Replies
View Related