Best Way To Split A Dataset Into Manageable Chunks?
Sep 28, 2007
I have a table that's 25,000,000 records... about 10 fields. I need to export this data to a flat file in no more than 500,000 record chunks. I've tried the following algorithm, adding a flag field called "exported" with default value 0.
do:
- mark random 500,000 records, setting exported = -1
- export everything in that table where exported = -1
- set exported = 1 where exported = -1
loop
This was pretty slow, taking about 10 hours last night to run.
I find myself wanting a sort of split-dataset task in SSIS that could split a chunk of records out of a dataset and handle them. Does anyone have ideas for me?
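One possible approach, sketched under assumptions: instead of flagging random rows on every pass, number the rows once by a unique key and derive a chunk number from it, so each export becomes a simple filter. The names `dbo.BigTable`, `Id` and `dbo.BigTable_Chunks` are hypothetical, and `ROW_NUMBER()` needs SQL Server 2005 or later.

```sql
-- Hypothetical names: dbo.BigTable, Id (a unique, indexed key).
-- Assign every row a chunk number 0, 1, 2, ... with 500,000 rows per chunk.
SELECT  Id,
        (ROW_NUMBER() OVER (ORDER BY Id) - 1) / 500000 AS ChunkNo
INTO    dbo.BigTable_Chunks
FROM    dbo.BigTable;

CREATE CLUSTERED INDEX IX_BigTable_Chunks
    ON dbo.BigTable_Chunks (ChunkNo, Id);

-- Source query for one export file; loop @ChunkNo from 0 to MAX(ChunkNo).
DECLARE @ChunkNo bigint;
SET @ChunkNo = 0;

SELECT  t.*
FROM    dbo.BigTable AS t
JOIN    dbo.BigTable_Chunks AS c ON c.Id = t.Id
WHERE   c.ChunkNo = @ChunkNo;
```

With 25,000,000 rows this yields chunks 0 through 49. In SSIS, a For Loop container could supply the chunk number to the data flow's source query, though that wiring is not shown here.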
May 16, 2008
I have some data in the following format
Timestamp Value DividendRequirement
01/01/2008 100 100
01/01/2008 200 90
02/01/2008 123 100
02/01/2008 436 90
03/01/2008 399 100
03/01/2008 5046 90
03/01/2008 45 130
04/01/2008 100 100
04/01/2008 233 90
04/01/2008 12 130
What is required is to split data of this format into 3 separate datasets:
1. One dataset for DividendRequirement of 100,
i.e. select * from tableName where DividendRequirement = 100
2. One dataset for DividendRequirement > 100
i.e. select * from tableName where DividendRequirement > 100
3. One dataset for DividendRequirement < 100
i.e. select * from tableName where DividendRequirement < 100
I know that I can do it with 3 separate stored procedures, using a different operator ('=', '>' and '<') in each one, and that I can combine the 3 stored procedures into 1 by using dynamic SQL and passing the operator (or some number that maps to a particular operator) as a parameter to the stored procedure. What I'm after, though, is a way to avoid dynamic SQL but still keep it as one stored procedure. Possibly some clever use of CASE statements or something along those lines?
Any ideas?
Thanks
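A minimal sketch of one way to do this without dynamic SQL, assuming the comparison is always against 100: `SIGN()` returns -1, 0 or 1, so a single integer parameter can pick the dataset. The procedure name and the `@Mode` convention are made up for illustration.

```sql
-- @Mode: 0 -> DividendRequirement = 100
--        1 -> DividendRequirement > 100
--       -1 -> DividendRequirement < 100   (hypothetical convention)
CREATE PROCEDURE dbo.GetByDividendRequirement
    @Mode int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT  [Timestamp], Value, DividendRequirement
    FROM    tableName
    WHERE   SIGN(DividendRequirement - 100) = @Mode;
END
```

Wrapping the column in a function stops an index on DividendRequirement from being used for a seek. If that matters, the same idea can be written as three OR-ed conditions, for example `(@Mode = 0 AND DividendRequirement = 100) OR (@Mode = 1 AND DividendRequirement > 100) OR (@Mode = -1 AND DividendRequirement < 100)`.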
Jun 15, 2007
Can I ask how to split the dataset into training and validation sets when running a decision tree model?
Oct 19, 2006
Hi,
I have a big table (120 million records) and I want to copy all of it into another table. Since a bulk insert of this size can cause all kinds of performance problems, I would like to do the insert in small chunks. The table does not have an identity column.
Can someone give me an example, using ROWCOUNT or a loop, that runs an INSERT INTO ... SELECT statement repeatedly and inserts, for example, 5,000 rows each time?
Help is appreciated,
thx,
Tomer
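A rough sketch of the ROWCOUNT loop, under one important assumption: even without an identity column, the source table has some unique key (called `KeyCol` here) that can tell copied rows from uncopied ones. All table and column names are hypothetical.

```sql
-- Hypothetical names: dbo.SourceTable, dbo.TargetTable, KeyCol, Col1.
-- Cap every statement at 5,000 rows, then copy until nothing is left.
SET ROWCOUNT 5000;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.TargetTable (KeyCol, Col1)
    SELECT  s.KeyCol, s.Col1
    FROM    dbo.SourceTable AS s
    WHERE   NOT EXISTS (SELECT 1
                        FROM   dbo.TargetTable AS t
                        WHERE  t.KeyCol = s.KeyCol);

    -- No rows copied in this pass means the copy is complete.
    IF @@ROWCOUNT = 0 BREAK;
END

-- Lift the cap again for the rest of the session.
SET ROWCOUNT 0;
```

Each pass commits on its own, which keeps locks and log growth per batch small. The NOT EXISTS check gets slower as the target grows unless `KeyCol` is indexed on the target table.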
Sep 13, 2007
I need to export records to a flat file using a dataflow task, but want no more than 50,000 records in each file. What's the best way to automate this?
Jan 21, 2004
Hi,
I have a text file (5 MB). It appears as a single line in a text editor. But actually it has records of 1500 byte length each.
I want to strip it down to 1500 byte records. So 1500*3500 = 5 MB (approx). The record size is always 1500 bytes.
Does anyone have a script that I can run on this file to break it up this way?
Thanks
Apr 3, 2008
I have a table based around requisitions, and each requisition has a number of positions. That number can change over time through updates to pertinent rows rather than through transaction-like records that record an entire history, and I'm only able to get a monthly snapshot of the table. What I decided to do is still use one table for OLAP (fact_requisitions) but add a column called period_key that refers to the month the data comes from. So if I have two months of data then the table has each requisition twice, possibly with differing position counts, and new requisitions from the second month are only present once. Then I tried to filter the MDX query like so:
SELECT {
([Dim TimeRequestClosed].[Year - MonthNumber].[Year_Text].&[2008].&[1],[Dim Requisitions].[Period].[Period Key].&[200801])
}
ON COLUMNS,
NON EMPTY
{
([Dim Location].[Region Name].MEMBERS, [Dim Location].[Period Key].&[200801])
}
ON ROWS
FROM
[Requisitions]
WHERE
[Measures].[Request Closed Date Count]
This query doesn't work even though the data is there; it just returns nulls. Am I going about this all wrong? If not, what might I be doing wrong, and how would I get the query to return more than one period (e.g. tell Dim Requisition to match up with Dim Location on the period key)?
Sep 7, 2006
From what I can see, the 'varbinary(max)' data type is not supported, and the 'image' data type is supposed to go away. Is there some other way to store large chunks (10MB to 100MB) of data into an SSEv DB?
If I have to use the 'image' data type to do this, does anyone have a code sample that would let me push an array() of numbers into an 'image' field, and unload an 'image' field into an array()?
TIA
Pat
May 7, 2002
Here's a little SP to break up those long-running, massively-locking, bring-app-to-a-halt queries. By default it does 500 rows at a time and allows for a maximum SQL query size of 4000 characters; it should be trivial to adjust those.
Cheers
-b
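CREATE PROCEDURE p_BatchExecute (@vcSQL varchar(4000)) AS
set nocount on
DECLARE @iRows int
select @iRows=1
-- Cap every statement run from here on at 500 rows per execution
SET ROWCOUNT 500
WHILE @iRows>0
BEGIN
print 'Executing batch of 500...'
exec (@vcSQL)
-- Rows affected by the batch just run; 0 means nothing is left to do
set @iRows=@@ROWCOUNT
END
-- Explicitly lift the 500-row cap again before returning
SET ROWCOUNT 0
GO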
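A usage sketch, with a hypothetical table and filter: each pass of the loop runs the statement against at most 500 rows.

```sql
-- Hypothetical table and filter; each pass deletes at most 500 rows.
EXEC p_BatchExecute 'DELETE FROM dbo.AuditLog WHERE LogDate < ''20020101''';
```

The statement passed in has to shrink its own working set, as a DELETE does, or as an UPDATE does when its WHERE clause excludes rows it has already changed. Otherwise every pass affects 500 rows again and the loop never ends.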
Feb 16, 2012
We have data that consists of an employee number, a start time and a finish time, similar to the example below
EMP   STARTTIME            ENDTIME
00001 10-Feb-2012 06:00:00 10-Feb-2012 10:00:00
00002 10-Feb-2012 07:15:00 10-Feb-2012 10:00:00
00003 10-Feb-2012 08:00:00 10-Feb-2012 10:00:00
I am trying to come up with a procedure in SQL that will give me each 15 minute block throughout the day and a count of how many employees are expected to be at work at the start of that 15 minute block. So, given the example above I would like to return
10-Feb-2012 00:00:00    0
10-Feb-2012 00:15:00    0
10-Feb-2012 06:00:00    1
10-Feb-2012 06:15:00    1
[code]....
I'm not too worried if the date part is not included in the result as this could be determined elsewhere, but how can I do this grouping/counting?
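One way this could be sketched in T-SQL, assuming SQL Server 2008 or later and a hypothetical table `dbo.Shifts` holding the EMP, STARTTIME and ENDTIME columns: generate the 96 quarter-hour slots of the day, then count the shifts that cover the start of each slot.

```sql
-- Hypothetical table: dbo.Shifts (EMP, STARTTIME, ENDTIME).
DECLARE @day datetime = '20120210';

WITH slots AS (
    -- 96 slot start times: 00:00, 00:15, ..., 23:45.
    -- sys.all_objects is used only as a convenient row source.
    SELECT TOP (96)
           DATEADD(minute,
                   15 * (CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) - 1),
                   @day) AS slot_start
    FROM   sys.all_objects
)
SELECT   s.slot_start,
         COUNT(e.EMP) AS employees_at_work   -- 0 for slots nobody covers
FROM     slots AS s
LEFT JOIN dbo.Shifts AS e
       ON e.STARTTIME <= s.slot_start
      AND e.ENDTIME   >  s.slot_start
GROUP BY s.slot_start
ORDER BY s.slot_start;
```

An employee counts for a slot when the shift has started by the slot's start time and has not yet ended. That gives 1 at 06:00 and 06:15 for the sample rows, matching the expected output above.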
May 6, 2004
I'm using the code below to send files that are in a blob file in my database to the browser client. The code sends the file in chunks in order to increase performance. The file I'm using to test with is 7MB. It works great on Windows XP with any browser. It takes virtually the same amount of time compared to downloading the file directly from the webserver. However, Windows 2000 and Mac OS X both take about 4x the amount of time it takes to download the file on XP machines. Why the performance difference? Is there anything I can do to fix this? I tried downloading the file directly from the webserver instead of getting it out of the database and it takes the same amount of time on all 3 OS. I had the same problem on Windows XP when I wasn't sending the file in chunks, but after using the code below, it started working for XP only.
Dim bufferSize As Integer = 24000
Dim outbyte(bufferSize - 1) As Byte
Dim retval As Long
Dim startIndex As Long = 0
Dim sql As String = "SELECT ..."
Dim cmd As New SqlCommand(sql, conn)
conn.open()
Dim dr As SqlDataReader = cmd.ExecuteReader(CommandBehavior.SequentialAccess)
If dr.Read() Then
' Reset the starting byte for a new BLOB.
startIndex = 0
' Read bytes into outbyte() and retain the number of bytes returned.
retval = dr.GetBytes(DocCol, startIndex, outbyte, 0, bufferSize)
Current.Response.Clear()
Current.Response.Buffer = True
Current.Response.ContentType = "application/octet-stream"
Current.Response.AddHeader("Content-Disposition", "attachment; filename=" & myfile & "." & myextension)
Do While retval = bufferSize
Current.Response.BinaryWrite(outbyte)
Current.Response.Flush()
' Reposition the start index to the end of the last buffer and fill the buffer.
startIndex += bufferSize
retval = dr.GetBytes(DocCol, startIndex, outbyte, 0, bufferSize)
Loop
'Write the remainder of the last chunk (exactly retval bytes, so the upper bound is retval - 1)
Dim remaining(retval - 1) As Byte
Array.Copy(outbyte, 0, remaining, 0, retval)
Current.Response.BinaryWrite(remaining)
Current.Response.Flush()
Current.Response.Close()
End If
dr.Close()
conn.Close()
Jun 1, 2007
Using the SqlClient provider, I'm trying to write big data chunks of maybe 20 MB each to SQL Server, to be stored in BLOBs using blobColumn.Write(...), via a .NET 2.0 DbCommand object that calls a stored procedure:
CREATE PROCEDURE [dbo].[putBlobByPK]
(
@id dKey
, @value VARBINARY(MAX)
, @offset bigint
, @length bigint
, @ModDttm dModDttm OUT
, @ModUser dModUser OUT
, @ModClient dModClient OUT
, @ModAppl dModAppl OUT
)
....
When I do this, it works exactly 3 times and then the application hangs (forever).
Looking in the SQL Server log, I find the following errors:
Error: 4014, Severity: 20, Status: 2.
A fatal error occurred while reading the input stream from the network. The session will be terminated.
I don't get this error on the client! OK, the session died.
What may be the problem?
I write big chunks like this to avoid many writes, because the data will later be replicated using peer-to-peer replication, and the more writes are used to write the total BLOB, the larger the transaction log of the subscriber database becomes.
TIA
Hannoman
May 26, 2015
I have a report with multiple datasets, the first of which pulls in data based on user entered parameters (sales date range and property use codes). Dataset1 pulls property id's and other sales data from a table (2014_COST) based on the user's parameters. I have set up another table (AUDITS) that I would like to use in dataset6. This table has 3 columns (Property ID's, Sales Price and Sales Date). I would like for dataset6 to pull the Property ID's that are NOT contained in the results from dataset1. In other words, I'd like the results of dataset6 to show me the property id's that are contained in the AUDITS table but which are not being pulled into dataset1. Both tables are in the same database.
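A sketch of what the dataset6 query might look like, under loud assumptions: the column names (`PropertyID`, `SalesPrice`, `SalesDate`, `PropertyUseCode`) and the report parameter names (`@StartDate`, `@EndDate`, `@UseCodes`) are invented, and the inner filter has to repeat whatever conditions dataset1 actually applies.

```sql
-- Hypothetical column and parameter names; adjust to the real schema.
-- Returns AUDITS rows whose property id is NOT among dataset1's results.
SELECT  a.PropertyID, a.SalesPrice, a.SalesDate
FROM    dbo.AUDITS AS a
WHERE   NOT EXISTS (
            SELECT 1
            FROM   dbo.[2014_COST] AS c
            WHERE  c.PropertyID = a.PropertyID
              AND  c.SalesDate BETWEEN @StartDate AND @EndDate
              AND  c.PropertyUseCode IN (@UseCodes)
        );
```

Because both tables live in the same database, the comparison can be done in the query itself rather than between report datasets. The table name needs square brackets because it starts with a digit.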
Oct 1, 2015
I have a small number of rows in a dataset, Table 1. There is a CLOB on a large dataset, Table 2. They join on a PK. I would like to retrieve this CLOB and add it to the data flow for Table1. In short I want to emulate the following:
Table 1: Small table without CLOB, 10 rows.
Table 2: Large table with CLOB, 10,000,000 rows
select CLOB
from table2
where pk in (select pk from table1)
I want this to return the CLOBs for the small number of rows in Table 1. The PK is indexed obviously so it should be a fast look up.
Table 1 and Table 2 live on different Oracle databases. How do I perform this operation efficiently in SSIS? It seems the Lookup and Merge Join won't do this.
May 27, 2015
I have a report with multiple datasets, the first of which pulls in data based on user entered parameters (sales date range and property use codes). Dataset1 pulls property id's and other sales data from a table (2014_COST) based on the user's parameters.
I have set up another table (AUDITS) that I would like to use in dataset6. This table has 3 columns (Property ID's, Sales Price and Sales Date). I would like for dataset6 to pull the Property ID's that are NOT contained in the results from dataset1. In other words, I'd like the results of dataset6 to show me the property id's that are contained in the AUDITS table but which are not being pulled into dataset1. Both tables are in the same database.
May 21, 2007
I found out the data I need for my SQL Report is already defined in a dynamic dataset on another web service. Is there a way to use web services to call another web service to get the dataset I need to generate a report? Examples would help if you have any, thanks for looking
Oct 12, 2007
Is there any way to display this information in the report?
Thanks
May 7, 2008
Hi,
I have a stored procedure, attached below. It returns 2 rows in SQL Server Management Studio when I execute MyStorProc 0,28, but in my program, which uses ADOHelper, it returns a dataset with Tables.Count = 0.
If I comment out the line If @Status = 0, it returns the rows. Apparently execution does not enter the If @Status = 0 block even though I pass @Status = 0. What am I doing wrong?
Any help is appreciated.
ALTER PROCEDURE [dbo].[MyStorProc]
(
@Status smallint,
@RowCount int = NULL,
@FacilityId numeric(10,0) = NULL,
@QueueID numeric (10,0)= NULL,
@VendorId numeric(10, 0) = NULL
)
AS
SET NOCOUNT ON
SET CONCAT_NULL_YIELDS_NULL OFF
If @Status = 0
BEGIN
SELECT ......
END
If @Status = 1
BEGIN
SELECT......
END
Apr 11, 2008
I have two datasets. One holds old data from another database; the second holds the original data from a SQL Server 2005 database. Both have the same fields, with id as the primary key. I want to transfer all the data from the first dataset into the second while retaining the existing data: if the old dataset has the same id (primary key) as a row in the new one, that row should not be transferred.
But if the row with that id (primary key) has changed values, the fields should be updated with that data. How can I do that?
Dec 19, 2006
Hi,
I have two datasets in my report, D1 and D2.
D1 is a list of classes with classid and title.
D2 is a list of data. Each row in D2 has a classid. D2 may or may not have all the classids in D1, but all classids in D2 must be in D1.
I want to show the fields in D2, group the data by the classids in D1, and show every group as a separate table. If no data in D2 is available for a classid, it should show an empty table.
Is there any way to do this in RS2005?
Sep 3, 2015
Using this IIF statement:
=CountDistinct(IIF(Fields!Released_DT.Value = Fields!Date2.Value, Fields!Name.Value,
Nothing))
Released_DT = a date - 09/03/2015 or 09/02/2015
Date2 = returns another date value in this case 09/03/2015
What I'm trying to do is count the distinct number of people (Fields!Name.Value) if Released_DT = Date2. My IIF statement is returning a zero value.
Apr 5, 2007
Hi everybody...
I have a problem.
I have a web service which contains a method getValue(IDEq (int), idIndicator (int), startTime (dateTime), endTime (dateTime)).
I need to call this method, but my problem is how to pass the parameters.
I see the Parameters tab, but it doesn't work as I expect... maybe I am making a mistake...
I want startTime and endTime to be selected by the user, via a calendar for example...
idIndicator and idEq are the result of another dataset from an XML data source...
But I don't know how to integrate them dynamically... I tried to enter a parameter via the Parameters tab and create an expression:
=First(Fields!idEq.Value, "EquipmentDataSet")
but when I execute the query, the prompt displays <NULL>...
So I don't know how to do it, or whether it is possible!
I hope someone can help me!
Thank you!
Dec 3, 2007
Hi experts,
I'm not sure whether my design is normal or not. Please give me some advice.
I have a dataset that is queried by a field named 'companyid':
select * from companyid where companyid = @icompany, where @icompany is an input field.
If the user selects "all", I'll send 0 to @icompany, and then I need to select all records.
Question 1: How can I get all records? (I'm thinking of a query like select * from companyid where companyid <> 0.)
If the user selects, for example, companyid = 1, I'll send 1 to @icompany and the query works fine.
Question 2: How can I change the query to handle both conditions?
Thanks a lot,
Jeff
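A common pattern that covers both questions in one query, sketched with the names from the post: treat 0 as "no filter" by OR-ing the parameter test with the normal comparison.

```sql
-- @icompany = 0 returns every row; any other value filters on it.
SELECT *
FROM   companyid
WHERE  @icompany = 0
   OR  companyid = @icompany;
```

This assumes 0 is never a real companyid. On large tables this kind of catch-all predicate can lead to a less efficient cached plan, so it is worth testing against real data.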
Feb 16, 2005
Quick question.
I've got a CHAR (70) field called NAME that has a first and last name separated by a space. I want to split it into two fields FIRST and LAST -- with all the characters to the left of the space a first name and all the characters to the right of the space as last name. I couldn't find a string function that would let me do this simply (it may be right in front of me and I missed it).
Thanks in advance.
Ray
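There is no single built-in for this in older versions, but `CHARINDEX` with `LEFT` and `SUBSTRING` gets there. A minimal sketch, assuming a hypothetical table `dbo.People`, no leading spaces in NAME, and that everything after the first space belongs to the last name:

```sql
-- NAME is CHAR(70), so it is padded with trailing spaces; RTRIM first
-- so the padding is not mistaken for the separator.
SELECT  RTRIM(NAME) AS FullName,
        CASE WHEN CHARINDEX(' ', RTRIM(NAME)) > 0
             THEN LEFT(NAME, CHARINDEX(' ', NAME) - 1)
             ELSE RTRIM(NAME)
        END AS FirstName,
        CASE WHEN CHARINDEX(' ', RTRIM(NAME)) > 0
             THEN LTRIM(SUBSTRING(RTRIM(NAME), CHARINDEX(' ', NAME) + 1, 70))
             ELSE ''
        END AS LastName
FROM    dbo.People;
```

A value with no space at all ends up entirely in FirstName. Names with more than two parts, such as a middle name, would need a rule of their own.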
May 9, 2007
I have a database with a "large" table containing date-based information. Basically they're reservations. I've thought about creating a new table and moving any records from past years into it. For the most part only current reservations need to be searchable, but in some circumstances it would be useful to be able to search through the archive too. So, my questions!!!
Is 8,000 or so rows of data "large" and unwieldy in SQL terms?
Would splitting this data into 2 tables (one small table for current and future reservations and one larger archive table), then using a UNION SELECT query to make the archive information searchable, be a significant improvement in server resources/load? Or am I making the whole thing more complicated than it need be, as 8,000 rows of data is nothing to worry about?
What did they say about a little bit of knowledge being a dangerous thing?
Thanks in advance for any guidance to a neophyte!
Jul 14, 2007
Hi to all,
I have a problem regarding a stored procedure and passing a value to it.
I am getting a value like Abc,Def,Ghi,
Now I want to split the whole passed value on ","
and run a loop to store each value in the database.
This is currently done in an ASP.NET web form, but I want to do the whole process in a stored procedure.
So please guide me on how to write the stored procedure.
The purpose is to pass the value one time, so that connection time is decreased and performance is faster.
Jan 8, 2008
SQL UDF split()
The objective of this article is to help SQL developers with a UDF that can be used within a stored procedure or function to split a string (based on a given delimiter) and extract the required portion of the string.
Scripting languages like VB script and Java script have in-built split() functions but there is no such function available in SQL server. In my experience this function is really handy when you’re working on an ASP application with SQL server as backend, whereby you’ll need to pass the ASP page submitted values to the SQL stored procedure.
To give a simple example, in a typical Monthly reporting ASP page – the users would select a range of months and extract the information pertaining to this date range. Classic implementation of this model is to have an ASP page to accept the input parameters and pass the values to the SQL stored procedure (SP). The SP would return a result set which is then formatted in the ASP page as results.
If the date range is continuous ie. JAN07 to MAR07 then the SP can typically accept a ‘From’ and ‘To’ range variables. But I’ve encountered situations whereby the users select 3 months from the current year and 2 months from previous year (non-continuous date ranges). In such scenario the SP cannot have a date range as input parameters.
Typically an ASP programmer would handle this by having a single date input parameter in the SP and calling the SP within a loop in the ASP page. This is inefficient, as contacting the database server within an ASP loop can cause performance overhead, especially if the table being queried is an online transaction processing table.
Here is how I handled the above situation.
1. Declared one string input parameter of type varchar(8000) (if you’re using SQL 2005 then it is advisable to use Varchar(Max))
2. Pass the ASP submitted values as a string; in this case the months selected by the user are supplied to the SP as a string
3. Within the stored procedure I call the split() function to extract each month from the string and query the corresponding data
The basic structure of the stored procedure is as pasted below:-
CREATE PROCEDURE FETCH_SALES_DETAIL (
@MONTH VARCHAR(MAX)
)
AS
BEGIN
DECLARE @MONTH_CNT INT,@MTH DATETIME
SET @MONTH_CNT=1
WHILE DBO.SPLIT(@MONTH,',',@MONTH_CNT) <> ''
BEGIN
SET @MTH = CAST(DBO.SPLIT(@MONTH,',',@MONTH_CNT) AS DATETIME)
--<<Application specific T-SQLs>>-- (BEGIN)
SELECT [SALES_MONTH],[SALES_QTY],[PRODUCT_ID],[TRANSACTION_DATE]
FROM SALES (NOLOCK)
WHERE [SALES_MONTH]= @MTH
--<<Application specific T-SQLs>>--(END)
SET @MONTH_CNT=@MONTH_CNT+1
END
END
Dbo.SPLIT() function takes 3 parameters
1) The main string with the values to be split
2) The delimiter
3) The Nth occurrence of the string to be returned
The functionality of the UDF is as explained STEP by STEP:
1. Function Declaration
CREATE FUNCTION [dbo].[SPLIT]
(
@nstring VARCHAR(MAX),
@deliminator nvarchar(10),
@index int
)
RETURNS VARCHAR(MAX)
Function is declared with 3 input parameters:-
@nstring of type VARCHAR(MAX) will hold the main string to be split
@deliminator of type NVARCHAR(10) will hold the delimiter
@index of type INT will hold the index of the string to be returned
2. Variable Declaration
DECLARE @position int
DECLARE @ustr VARCHAR(MAX)
DECLARE @pcnt int
Three variables are needed within the function. @position is an integer variable that will be used to traverse along the main string. @ustr will store the string to be returned and the @pcnt integer variable to check the index of the delimiter.
3. Variable initialization
SET @position = 1
SET @pcnt = 1
SELECT @ustr = ''
Initialize the variables
4. Main functionality
WHILE @position <= DATALENGTH(@nstring) and @pcnt <= @index
BEGIN
IF SUBSTRING(@nstring, @position, 1) <> @deliminator BEGIN
IF @pcnt = @index BEGIN
SET @ustr = @ustr + CAST(SUBSTRING(@nstring, @position, 1) AS nvarchar)
END
SET @position = @position + 1
END
ELSE BEGIN
SET @position = @position + 1
SET @pcnt = @pcnt + 1
END
END
4.1 The main while loop traverses the main string for as long as the word index is less than or equal to the index passed as the input parameter.
4.2 Within the while loop, each character of the string is compared against the delimiter; if it does not match, the local word count variable is checked against the input index parameter.
4.3 If the values are the same, i.e. the word being processed in the while loop is the one requested by the input index, the character is appended to the @ustr variable. Whether or not they match, the @position variable is incremented.
4.4 If the character matches the delimiter, the word count variable @pcnt is incremented along with the @position variable.
5. Return the value
RETURN @ustr
I hope this article benefits those who are looking for a handy function to deal with strings.
Feel free to send your feedback at dearhari@gmail.com
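A short usage sketch, assuming the fragments above have been assembled into a complete dbo.SPLIT function (the article shows the pieces but not the AS BEGIN ... END wrapper around the body):

```sql
-- Assumes dbo.SPLIT has been created from the fragments above.
SELECT dbo.SPLIT('JAN07,FEB07,MAR07', ',', 2) AS SecondItem;  -- FEB07
SELECT dbo.SPLIT('JAN07,FEB07,MAR07', ',', 4) AS PastTheEnd;  -- empty string
```

Asking for an index past the last element returns an empty string, which is what the WHILE loop in FETCH_SALES_DETAIL relies on to stop. The flip side is that an empty element in the middle of the list, as in 'A,,C', also returns an empty string and would end that loop early.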
Feb 4, 2008
I have 5 dynamic rows, each consisting of 5 checkboxes and 5 dropdowns. I am concatenating the values of the controls in each row using the delimiter character "~", and concatenating the rows using "|". The complete string is then assigned to a hidden field and passed as a SQL parameter to the backend.
Please help me write the split function to get the value of each checkbox and dropdown out of the string, in order to save them in separate columns.
Thanks
Jan 28, 2004
Hi,
Is it possible to split the following value in SQL Server?
I have values like 25 Email Accounts,50 Email Accounts in my SQL Server database.
Here I need only the numeric values, i.e. 25,50. Is it possible? Can anyone give me a solution?
I am using ASP.NET and C#; the backend is SQL Server 2000.
Thanks and Regards
Arul
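One way to pull the leading number out, sketched under assumptions: each row holds a single value such as '25 Email Accounts', and the table and column names (`dbo.Plans`, `PlanName`) are hypothetical. `PATINDEX` and `LEFT` are both available in SQL Server 2000.

```sql
-- Find the first non-digit character and keep everything before it.
-- The appended space guarantees a match even when the value is all digits.
SELECT  PlanName,
        CAST(LEFT(PlanName,
                  PATINDEX('%[^0-9]%', PlanName + ' ') - 1) AS int) AS AccountCount
FROM    dbo.Plans;
```

A value that does not start with a digit yields an empty string, which SQL Server casts to 0. If a single column really holds the whole comma-separated list, it would have to be split into items first.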
Oct 17, 2005
CREATE PROCEDURE [dbo].[ShowComboLocation]
@Keyword varchar(50)
AS
SELECT TOP 100 PERCENT PropertyAreaID, PropertyArea
FROM dbo.iViewAll
WHERE (PropertyArea LIKE '%' + @Keyword + '%')
GO
Question 1:
If Keyword = "London WestEnd Harrods", I know my query will end up like this:
(PropertyArea LIKE 'London WestEnd Harrods')
But I want it to search individually for each of the 3 (or 1 to n) words, so my query should end up like this:
(PropertyArea LIKE 'London')
OR (PropertyArea LIKE 'WestEnd')
OR (PropertyArea LIKE 'Harrods')
I want to perform this in my SQL stored procedure. Can anybody provide code or links, please?
Mar 21, 2001
I have a procedure that is going to be called from ASP pages. This procedure carries out instructions depending on whether customers want to insert, update or delete their portfolios. The rules are as follows: 1. It should not allow a duplicate portfolio name to be inserted. 2. If a customer has reached their maximum of 20 portfolios, they can't add another; they have to delete or update an existing portfolio first. 3. All the error handling is done and returned as output parameters.
Now, coming to the question: at present I have one procedure that does all these things. Should I split it up and have three procedures handling the events separately (1. insert, 2. delete, 3. update)? The reason I ask is that one procedure will be hit many times by concurrent users with varying events, and I am concerned about performance and slowing down the page. I do not have exact user numbers at this point, but they would be in the thousands or more. Thanks for any suggestions or advice you all might have to share. Hiku
Mar 22, 2001
I want to know how to parse a fullname into a fname and lname.
View 2 Replies
View Related
Sep 19, 2002
What I have is a table with a primary key. Then I have 5 other tables with a relating key. No problems there.
I need to create a relationship with the primary table's (primary) key, whose data field is 25 characters. I need to parse that out and have 3 characters go to one, 2 to the other, and so on.
I don't know how to do that. Can you help?