Best Way To Split A Dataset Into Manageable Chunks?
Sep 28, 2007
I have a table that's 25,000,000 records... about 10 fields. I need to export this data to a flat file in no more than 500,000 record chunks. I've tried the following algorithm, adding a flag field called "exported" with default value 0.
do:
- mark random 500,000 records, setting exported = -1
- export everything in that table where exported = -1
- set exported = 1 where exported = -1
loop
This was pretty slow, taking about 10 hours last night to run.
I find myself wanting a sort of split-dataset task in SSIS: something that can split a chunk of records out of a dataset and handle them. Anyone have ideas for me?
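One common alternative to flagging random rows is to walk the table in key order and start each chunk after the last key already written, so no UPDATE pass is needed. The sketch below only illustrates that idea: it uses Python's sqlite3 module as a stand-in for SQL Server, and the table `records`, its unique integer key `id`, and the column `payload` are hypothetical names that do not come from the question. It assumes the real table has some indexed unique key to order by.

```python
import csv
import sqlite3
from pathlib import Path


def export_in_chunks(conn, out_dir, chunk_size=500_000):
    """Write the hypothetical table `records` to numbered CSV files,
    at most `chunk_size` rows per file, walking the key `id` in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    last_id = None  # highest key exported so far
    files = []
    while True:
        if last_id is None:
            sql = "SELECT id, payload FROM records ORDER BY id LIMIT ?"
            args = (chunk_size,)
        else:
            # Resume strictly after the last exported key.
            sql = "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?"
            args = (last_id, chunk_size)
        rows = conn.execute(sql, args).fetchall()
        if not rows:
            break
        path = out_dir / f"chunk_{len(files) + 1:04d}.csv"
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        files.append(path)
        last_id = rows[-1][0]
    return files
```

Because each chunk is an ordered range read on the key, every row is read once and nothing is written back to the source table.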
What is required is to split data of this format into 3 separate datasets:
1. One dataset for DividendRequirement of 100, i.e. select * from tableName where DividendRequirement = 100
2. One dataset for DividendRequirement > 100 i.e. select * from tableName where DividendRequirement > 100
3. One dataset for DividendRequirement < 100 i.e. select * from tableName where DividendRequirement < 100
I know that I can do it with 3 separate stored procedures, using a different operator ('=', '>' and '<') in each one, and that I can combine the 3 stored procedures into 1 by using dynamic SQL and passing the operator (or some number that maps to a particular operator) as a parameter to the stored procedure. What I'm after, though, is a way to avoid dynamic SQL but still keep it as one stored procedure. Possibly some clever use of CASE statements or something along those lines?
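One way to keep a single static query is to pass a mode value and let each branch of the WHERE clause check it, so only one branch can ever match. The sketch below shows the shape of such a predicate using Python's sqlite3 module as a stand-in; the table `holdings`, its `id` column, and the mode names 'eq', 'gt' and 'lt' are hypothetical. A stored procedure could use the same WHERE clause with a procedure parameter in place of `:mode`.

```python
import sqlite3

# One static statement: the mode parameter selects which comparison applies.
QUERY = """
SELECT id, DividendRequirement
FROM holdings
WHERE (:mode = 'eq' AND DividendRequirement = 100)
   OR (:mode = 'gt' AND DividendRequirement > 100)
   OR (:mode = 'lt' AND DividendRequirement < 100)
ORDER BY id
"""


def fetch_by_mode(conn, mode):
    """Return the rows for one of the three hypothetical modes."""
    if mode not in ("eq", "gt", "lt"):
        raise ValueError("mode must be 'eq', 'gt' or 'lt'")
    return conn.execute(QUERY, {"mode": mode}).fetchall()
```

The trade-off is that one cached plan has to serve all three modes, which may or may not matter for a given table.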
I have a big table (120 million records) and I want to copy all of it into another table. Since a single bulk insert of that size can cause all kinds of performance problems, I would like to do the insert in small chunks. The table does not have an identity column.
Can someone give me an example, using ROWCOUNT or a loop, that runs an INSERT INTO ... SELECT statement repeatedly and inserts, for example, 5000 rows each time?
I need to export records to a flat file using a dataflow task, but want no more than 50,000 records in each file. What's the best way to automate this?
I have a table based around requisitions, and each requisition has a number of positions. That number can change over time through updates to pertinent rows rather than through transaction-like records that record an entire history, and I'm only able to get a monthly snapshot of the table. What I decided to do is still use one table for OLAP (fact_requisitions) but add a column called period_key that refers to the month the data comes from. So if I have two months of data then the table has each requisition twice, possibly with differing position counts, and new requisitions from the second month are only present once. Then I tried to filter the MDX query like so:
SELECT
  { ([Dim TimeRequestClosed].[Year - MonthNumber].[Year_Text].&[2008].&[1], [Dim Requisitions].[Period].[Period Key].&[200801]) } ON COLUMNS,
  NON EMPTY { ([Dim Location].[Region Name].MEMBERS, [Dim Location].[Period Key].&[200801]) } ON ROWS
FROM [Requisitions]
WHERE [Measures].[Request Closed Date Count]
This query doesn't work even though the data is there, it just returns nulls. Am I going about this all wrong? If not, what might I be doing wrong, and how would I get the query to return more than one period (e.g. tell Dim Requisition to match up with Dim Location on the period key)?
From what I can see, the 'varbinary(max)' data type is not supported, and the 'image' data type is supposed to go away. Is there some other way to store large chunks (10MB to 100MB) of data into an SSEv DB?
If I have to use the 'image' data type to do this, does anyone have a code sample that would let me push an array() of numbers into an 'image' field, and unload an 'image' field into an array()?
Here's a little SP to break up those long-running, massively-locking, bring-app-to-a-halt queries. By default it does 500 rows at a time and allows for a maximum SQL query size of 4000 characters; it should be trivial to adjust those.
Cheers -b
CREATE PROCEDURE p_BatchExecute (@vcSQL varchar(4000))
AS
set nocount on
DECLARE @iRows int
select @iRows=1
SET ROWCOUNT 500
WHILE @iRows>0
BEGIN
    print 'Executing batch of 500...'
    exec (@vcSQL)
    set @iRows=@@ROWCOUNT
END
GO
We have data that consists of an employee number, a start time and a finish time, similar to the example below
EMP   STARTTIME            ENDTIME
00001 10-Feb-2012 06:00:00 10-Feb-2012 10:00:00
00002 10-Feb-2012 07:15:00 10-Feb-2012 10:00:00
00003 10-Feb-2012 08:00:00 10-Feb-2012 10:00:00
I am trying to come up with a procedure in SQL that will give me each 15 minute block throughout the day and a count of how many employees are expected to be at work at the start of that 15 minute block. So, given the example above I would like to return
10-Feb-2012 00:00:00    0
10-Feb-2012 00:15:00    0
10-Feb-2012 06:00:00    1
10-Feb-2012 06:15:00    1
[code]....
I'm not too worried if the date part is not included in the result as this could be determined elsewhere, but how can I do this grouping/counting?
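The counting rule can be stated as: for each 15-minute block start, count the employees whose start time is at or before it and whose end time is after it. The Python sketch below illustrates that rule only; the function name is hypothetical, and treating the end time as exclusive (an employee finishing at 10:00 is not counted in the 10:00 block) is an assumption the question does not settle.

```python
from datetime import date, datetime, timedelta


def headcount_by_block(shifts, day, minutes=15):
    """Return [(block_start, count)] for every block of `day`.

    `shifts` is a list of (start, finish) datetimes. An employee counts
    toward a block when start <= block_start < finish (finish exclusive).
    """
    step = timedelta(minutes=minutes)
    block = datetime(day.year, day.month, day.day)  # midnight of that day
    end_of_day = block + timedelta(days=1)
    result = []
    while block < end_of_day:
        on_shift = sum(1 for start, finish in shifts if start <= block < finish)
        result.append((block, on_shift))
        block += step
    return result


if __name__ == "__main__":
    sample = [(datetime(2012, 2, 10, 6, 0), datetime(2012, 2, 10, 10, 0))]
    for block_start, count in headcount_by_block(sample, date(2012, 2, 10))[24:26]:
        print(block_start, count)
```

In SQL the same shape is usually a table of block start times joined to the shift table on that range condition, followed by a GROUP BY on the block start.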
I'm using the code below to send files that are in a blob file in my database to the browser client. The code sends the file in chunks in order to increase performance. The file I'm using to test with is 7MB. It works great on Windows XP with any browser. It takes virtually the same amount of time compared to downloading the file directly from the webserver. However, Windows 2000 and Mac OS X both take about 4x the amount of time it takes to download the file on XP machines. Why the performance difference? Is there anything I can do to fix this? I tried downloading the file directly from the webserver instead of getting it out of the database and it takes the same amount of time on all 3 OS. I had the same problem on Windows XP when I wasn't sending the file in chunks, but after using the code below, it started working for XP only.
Dim bufferSize As Integer = 24000
Dim outbyte(bufferSize - 1) As Byte
Dim retval As Long
Dim startIndex As Long = 0

Dim sql As String = "SELECT ..."
Dim cmd As New SqlCommand(sql, conn)
conn.Open()
Dim dr As SqlDataReader = cmd.ExecuteReader(CommandBehavior.SequentialAccess)
If dr.Read() Then
    ' Reset the starting byte for a new BLOB.
    startIndex = 0

    ' Read bytes into outbyte() and retain the number of bytes returned.
    retval = dr.GetBytes(DocCol, startIndex, outbyte, 0, bufferSize)
    Current.Response.Clear()
    Current.Response.Buffer = True
    Current.Response.ContentType = "application/octet-stream"
    Current.Response.AddHeader("Content-Disposition", "attachment; filename=" & myfile & "." & myextension)

    Do While retval = bufferSize
        Current.Response.BinaryWrite(outbyte)
        Current.Response.Flush()

        ' Reposition the start index to the end of the last buffer and fill the buffer.
        startIndex += bufferSize
        retval = dr.GetBytes(DocCol, startIndex, outbyte, 0, bufferSize)
    Loop

    ' Write the remainder of the last chunk.
    ' The array bound is retval - 1 so that exactly retval bytes are sent.
    Dim remaining(CInt(retval) - 1) As Byte
    Array.Copy(outbyte, 0, remaining, 0, retval)
    Current.Response.BinaryWrite(remaining)
    Current.Response.Flush()
    Current.Response.Close()
End If
dr.Close()
conn.Close()
Using the SqlClient provider, I'm trying to write big data chunks of maybe 20 MB each to SQL Server, storing them in BLOBs with blobColumn.Write(...). I'm using a .NET 2.0 DbCommand object to call a stored procedure:
CREATE PROCEDURE [dbo].[putBlobByPK]
(
@id dKey
, @value VARBINARY(MAX)
, @offset bigint
, @length bigint
, @ModDttm dModDttm OUT
, @ModUser dModUser OUT
, @ModClient dModClient OUT
, @ModAppl dModAppl OUT
)
....
When doing this, I can do it exactly 3 times, then the application hangs (forever).
When looking in the SQL Server log, I find the following two entries:
Error: 4014, Severity: 20, Status: 2.
A fatal error occurred while reading the input stream from the network. The session will be terminated.
I don't get this error on the client! OK, the session died.
What may be the problem?
I write big chunks like this to avoid many writes, because the data will be replicated later using peer-to-peer replication, and the more writes it takes to write the whole BLOB, the larger the subscriber database's transaction log becomes.
I have a report with multiple datasets, the first of which pulls in data based on user entered parameters (sales date range and property use codes). Dataset1 pulls property id's and other sales data from a table (2014_COST) based on the user's parameters. I have set up another table (AUDITS) that I would like to use in dataset6. This table has 3 columns (Property ID's, Sales Price and Sales Date). I would like for dataset6 to pull the Property ID's that are NOT contained in the results from dataset1. In other words, I'd like the results of dataset6 to show me the property id's that are contained in the AUDITS table but which are not being pulled into dataset1. Both tables are in the same database.
I have a small number of rows in a dataset, Table 1. There is a CLOB on a large dataset, Table 2. They join on a PK. I would like to retrieve this CLOB and add it to the data flow for Table1. In short I want to emulate the following:
Table 1: Small table without CLOB, 10 rows. Table 2: Large table with CLOB, 10,000,000 rows
select CLOB from table2 where pk in (select pk from table1)
I want this to return the CLOBs for the small number of rows in Table 1. The PK is indexed obviously so it should be a fast look up.
Table 1 and Table 2 live on different Oracle databases. How do I perform this operation efficiently in SSIS? It seems the Lookup and Merge Join won't do this.
I found out the data I need for my SQL Report is already defined in a dynamic dataset on another web service. Is there a way to use web services to call another web service to get the dataset I need to generate a report? Examples would help if you have any, thanks for looking
Hi, I have a stored procedure, attached below. It returns 2 rows in SQL Management Studio when I execute MyStorProc 0,28. But in my program, which uses ADOHelper, it returns a dataset with tables.count=0. If I comment out the line If @Status = 0, then it returns the rows. Apparently execution does not enter the If @Status = 0 branch even when I pass @Status = 0. What am I doing wrong? Any help is appreciated.
ALTER PROCEDURE [dbo].[MyStorProc]
(
@Status smallint,
@RowCount int = NULL,
@FacilityId numeric(10,0) = NULL,
@QueueID numeric (10,0)= NULL,
@VendorId numeric(10, 0) = NULL
)
AS
SET NOCOUNT ON
SET CONCAT_NULL_YIELDS_NULL OFF
If @Status = 0
BEGIN
SELECT ......
END
If @Status = 1
BEGIN
SELECT......
END
I have two datasets. One holds old data from another database; the second holds the current data from a SQL Server 2005 database. Both have the same fields, with id as the primary key. I want to transfer all the data from the first dataset into the second while retaining the existing data: if a row in the old dataset has the same id (primary key) as a row in the new one, that row should not be transferred, but if the values for that id have changed, the fields should be updated with that data. How can I do that?
D2 is a list of data. Each row in D2 has a classid. D2 may or may not have all the classids in D1. All classids in D2 must be in D1.
I want to show fields in D2, group the data by the classids in D1, and show every group as a separate table. If no data in D2 is available for a classid, it should show an empty table.
=CountDistinct(IIF(Fields!Released_DT.Value = Fields!Date2.Value, Fields!Name.Value, Nothing))
Released_DT = a date, 09/03/2015 or 09/02/2015
Date2 = returns another date value, in this case 09/03/2015
What I'm trying to do is count the distinct number of people (Fields!Name.Value) where Released_DT = Date2. My IIF statement is returning a zero value.
Hi everybody, I have a problem. I have a web service that contains a method getValue(IDEq (int), idIndicator (int), startTime (dateTime), endTime (dateTime)), and I need to call this method. My problem is how to pass the parameters. I have seen the Param tab, but it doesn't work as I expect; maybe I am making a mistake.
I want startTime and endTime to be selected by the user, via a calendar for example. idIndicator and idEq are the result of another dataset from an XML data source.
But I don't know how to integrate them dynamically. I tried entering a parameter via the Param tab and creating an expression: =First(Fields!idEq.Value, "EquipmentDataSet"), but when I execute the query the prompt displays <NULL>. So I don't know how to do this, or whether it is possible. I hope someone can help me. Thank you!
I've got a CHAR (70) field called NAME that has a first and last name separated by a space. I want to split it into two fields FIRST and LAST -- with all the characters to the left of the space a first name and all the characters to the right of the space as last name. I couldn't find a string function that would let me do this simply (it may be right in front of me and I missed it).
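The rule described here is "everything before the first space is the first name, everything after it is the last name". The Python sketch below illustrates only that rule; the function name is hypothetical, and trimming the padding that a CHAR(70) value carries is an assumption about the data. In T-SQL the same idea is usually written with CHARINDEX to find the space, then LEFT and SUBSTRING to take each side.

```python
def split_name(name):
    """Split a padded 'First Last' value at the first space.

    Returns (first, last); `last` is '' when there is no space.
    Anything after the first space, including middle names, goes to `last`.
    """
    first, _, last = name.strip().partition(" ")
    return first, last.strip()
```

Names with more than two parts need a decision the question does not make, so this sketch simply leaves the remainder in the last-name field.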
I have a database with a "large" table containing date-based information. Basically they're reservations. I've thought about creating a new table and moving any records from past years to this table. For the most part only current reservations need to be searchable, but in some circumstances it would be useful to be able to search through the archive too. So, my questions!
Is 8,000 or so rows of data "large" and unwieldy in SQL terms?
Would splitting this data into 2 tables (one small table for current and future reservations and one larger archive table), then using a UNION SELECT query to make archive information searchable, be a significant improvement in server resources/load? Or am I making the whole thing more complicated than it needs to be, as 8,000 rows of data is nothing to worry about?
What did they say about a little bit of knowledge being a dangerous thing?
Thanks in advance for any guidance to a neophyte!
I have a problem with a stored procedure (SP) and the value I pass to it. I am getting a value like Abc,Def,Ghi,
Now I want to split the whole passed value by "," and run a loop to store each value in the database. This is currently done in an ASP.NET web form, but I want to do the whole process in the SP. Please guide me on how to write the SP. The purpose is to pass the value only once, so that connection time decreases and performance improves.
The objective of this article is to help SQL developers with a UDF that can be used within a stored procedure or function to split a string (based on a given delimiter) and extract the required portion of the string.
Scripting languages like VB script and Java script have in-built split() functions but there is no such function available in SQL server. In my experience this function is really handy when you’re working on an ASP application with SQL server as backend, whereby you’ll need to pass the ASP page submitted values to the SQL stored procedure.
To give a simple example, in a typical Monthly reporting ASP page – the users would select a range of months and extract the information pertaining to this date range. Classic implementation of this model is to have an ASP page to accept the input parameters and pass the values to the SQL stored procedure (SP). The SP would return a result set which is then formatted in the ASP page as results.
If the date range is continuous, i.e. JAN07 to MAR07, then the SP can typically accept ‘From’ and ‘To’ range variables. But I’ve encountered situations whereby the users select 3 months from the current year and 2 months from the previous year (non-continuous date ranges). In such a scenario the SP cannot have a date range as input parameters.
What an ASP programmer would typically do is have a single date input parameter in the SP and call the SP within a loop in the ASP page. This is an inefficient way of programming, as contacting the database server within an ASP loop could cause performance overhead, especially if the table being queried is an online transaction processing table.
Here is how I handled the above situation.
1. Declared one string input parameter of type varchar(8000) (if you’re using SQL 2005 then it is advisable to use Varchar(Max))
2. Pass the ASP submitted values as a string; in this case the months selected by the user would be supplied to the SP as a string
3. Within the stored procedure I’ll call the split() function to extract each month from the string and query the corresponding data
The basic structure of the stored procedure is as pasted below:-
CREATE PROCEDURE FETCH_SALES_DETAIL ( @MONTH VARCHAR(MAX) )
AS
BEGIN
    DECLARE @MONTH_CNT INT, @MTH DATETIME
    SET @MONTH_CNT = 1
    WHILE DBO.SPLIT(@MONTH,',',@MONTH_CNT) <> ''
    BEGIN
        SET @MTH = CAST(DBO.SPLIT(@MONTH,',',@MONTH_CNT) AS DATETIME)
        --<<Application specific T-SQLs>>-- (BEGIN)
        SELECT [SALES_MONTH],[SALES_QTY],[PRODUCT_ID],[TRANSACTION_DATE]
        FROM SALES (NOLOCK)
        WHERE [SALES_MONTH] = @MTH
        --<<Application specific T-SQLs>>-- (END)

        SET @MONTH_CNT = @MONTH_CNT + 1
    END
END
The dbo.SPLIT() function takes 3 parameters:
1) The main string with the values to be split
2) The delimiter
3) The Nth occurrence of the string to be returned
The functionality of the UDF is as explained STEP by STEP:
1. Function Declaration
CREATE FUNCTION [dbo].[SPLIT] ( @nstring VARCHAR(MAX), @deliminator nvarchar(10), @index int )
RETURNS VARCHAR(MAX)

The function is declared with 3 input parameters:
- @nstring of type VARCHAR(MAX) holds the main string to be split
- @deliminator of type NVARCHAR(10) holds the delimiter
- @index of type INT holds the index of the string to be returned

2. Variable Declaration
DECLARE @position int
DECLARE @ustr VARCHAR(MAX)
DECLARE @pcnt int

Three variables are needed within the function. @position is an integer variable used to traverse the main string, @ustr stores the string to be returned, and @pcnt is an integer variable that tracks the index of the word currently being read.

3. Variable Initialization
SET @position = 1
SET @pcnt = 1
SELECT @ustr = ''

This initializes the variables.

4. Main Functionality
WHILE @position <= DATALENGTH(@nstring) and @pcnt <= @index
BEGIN
    IF SUBSTRING(@nstring, @position, 1) <> @deliminator
    BEGIN
        IF @pcnt = @index
        BEGIN
            SET @ustr = @ustr + CAST(SUBSTRING(@nstring, @position, 1) AS nvarchar)
        END
        SET @position = @position + 1
    END
    ELSE
    BEGIN
        SET @position = @position + 1
        SET @pcnt = @pcnt + 1
    END
END

4.1 The main WHILE loop traverses the main string for as long as the end of the string has not been reached and the word index is less than or equal to the index passed as the input parameter.
4.2 Within the loop, each character of the string is compared with the delimiter. If it does not match, the local word count variable is checked against the input index parameter.
4.3 If the values are the same, i.e. the input index and the word being processed in the loop are the same, then the character is appended to the @ustr variable. Whether or not the values match, the @position variable is incremented.
4.4 If the character matches the delimiter, then the word count variable @pcnt is incremented along with the @position variable.

5. Return the Value
RETURN @ustr
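To make the control flow of the UDF easier to follow, here is a character-by-character sketch of the same loop in Python. It is an illustration rather than a replacement for the T-SQL: the name `split_nth` is hypothetical, positions are 0-based as Python requires, and like the UDF's one-character comparison it assumes a single-character delimiter.

```python
def split_nth(text, delimiter, index):
    """Return the index-th (1-based) piece of `text` split on a
    one-character `delimiter`, or '' when there is no such piece."""
    position = 0   # current character (0-based, unlike the T-SQL version)
    piece_no = 1   # which piece the current character belongs to
    piece = []     # characters collected for the requested piece
    while position < len(text) and piece_no <= index:
        ch = text[position]
        if ch != delimiter:
            if piece_no == index:
                piece.append(ch)
        else:
            piece_no += 1  # a delimiter ends the current piece
        position += 1
    return "".join(piece)
```

Calling it with increasing indexes until it returns an empty string reproduces the WHILE loop of the stored procedure shown earlier; note that each call rescans the string from the start, so the total work grows with the square of the number of pieces.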
I hope this article would benefit those who are looking for a handy function to deal with Strings.
Feel free to send your feedback at dearhari@gmail.com
I have 5 dynamic rows, each row consisting of 5 checkboxes and 5 dropdowns. I am concatenating the values of the controls in each row using the separator character "~", and concatenating the rows using "|". The complete string is then assigned to one hidden field and passed as a SQL parameter to the backend.
Please help in writing the split function to get the values of each checkboxes and dropdowns from the string in order to save them in separate columns.
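The packed string has two levels: rows separated by "|" and, inside each row, values separated by "~". A two-level split recovers them, as the Python sketch below illustrates; the function name and the sample values in the usage note are hypothetical, and the sketch assumes the control values themselves never contain either separator.

```python
def parse_rows(packed, row_sep="|", field_sep="~"):
    """Split a packed string into rows, then each row into its values.

    Empty rows (for example from a trailing row separator) are skipped.
    """
    return [row.split(field_sep) for row in packed.split(row_sep) if row]
```

For example, `parse_rows("1~0~A|0~1~B")` yields two rows of three values each, and each inner list maps positionally onto the columns to be saved.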
Is it possible to split the following value in sql server ?
I have values like 25 Email Accounts,50 Email Accounts in my SQL Server database. Here I need only the numeric values, i.e. 25,50. Is it possible? Can anyone give me a solution?
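If the extraction can happen in application code rather than in the database, pulling out every run of digits is enough for values of this shape. The Python sketch below illustrates that; the function name is hypothetical, and it assumes the only digits in the text are the numbers wanted.

```python
import re


def extract_numbers(value):
    """Return every run of digits in `value` as an integer, in order."""
    return [int(match) for match in re.findall(r"\d+", value)]
```

It handles the value whether it is stored as one comma-separated string or as separate rows, since each call just returns whatever numbers it finds.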
I am using ASP.Net and C# backend is SQL Server 2000.
CREATE PROCEDURE [dbo].[ShowComboLocation]
@Keyword varchar(50)
AS
SELECT TOP 100 PERCENT PropertyAreaID, PropertyArea
FROM dbo.iViewAll
WHERE (PropertyArea LIKE '%' + @Keyword + '%')
GO

Question 1 is: if Keyword = "London WestEnd Harrods", I know my query will end up like this:
(PropertyArea LIKE 'London WestEnd Harrods')
But I want to search individually for each of the words (3 here, or any number from 1 to n), so my query should end up like this:
(PropertyArea LIKE 'London') OR (PropertyArea LIKE 'WestEnd') OR (PropertyArea LIKE 'Harrods')
I want to perform this in my SQL stored procedure. Can anybody provide code or links, please?
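One way to get a LIKE test per word is to split the keyword on whitespace in the calling code and build a parameterized clause with one placeholder per word. The sketch below illustrates this with Python's sqlite3 module as a stand-in for SQL Server; the function names are hypothetical, and doing the split on the client instead of inside the stored procedure is a substitution, not what the question asked for. Only placeholders are joined into the SQL text, and the words themselves travel as parameters.

```python
import sqlite3


def keyword_filter(keyword):
    """Build '(PropertyArea LIKE ?) OR ...' with one placeholder per word."""
    words = keyword.split()
    if not words:
        raise ValueError("keyword must contain at least one word")
    clause = " OR ".join("(PropertyArea LIKE ?)" for _ in words)
    params = ["%" + word + "%" for word in words]
    return clause, params


def search_areas(conn, keyword):
    """Return rows of iViewAll whose PropertyArea contains any of the words."""
    clause, params = keyword_filter(keyword)
    sql = (
        "SELECT PropertyAreaID, PropertyArea FROM iViewAll "
        "WHERE " + clause + " ORDER BY PropertyAreaID"
    )
    return conn.execute(sql, params).fetchall()
```

Keeping the logic inside the procedure would instead require a string-splitting function on the server, which is a different design.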
I have a procedure that is going to be called from ASP pages. This procedure carries out instructions depending on whether a customer wants to insert, update or delete a portfolio. The rules are as follows:
1. It should not allow a duplicate portfolio name to be inserted.
2. If a customer has reached the maximum limit of 20 portfolios, they can't add another. They may have to delete or update an existing portfolio first.
3. All the error handling is done in the procedure and returned as output parameters.
Now, coming to the question: at present I have one procedure that does all these things. Should I split up the procedure and have three procedures handling the events separately (1. insert, 2. delete, 3. update)? The reason I am concerned is that 1 procedure will be hit so many times by concurrent users with varying events. I am worried about performance issues and the page slowing down. I do not have exact numbers of users at this point, but they would be in the thousands or more. Thanks for any suggestions or advice you all might have to share. Hiku
What I have is a table with a primary key. Then I have 5 other tables with a related key. No problems there.
I need to create a relationship with the primary table's key, whose data field is 25 characters. I need to parse that out and have 3 characters go to one table, 2 to another, and so on.