Integration Services :: Updating A Table With Billion Rows
Aug 5, 2014
It was an interview question: what's the best way to update a table with 10 billion rows?
How do I add only new rows to a destination table when copying a table from another database every night? Every night I am copying a number of tables from one database to another. I only want to insert new rows (rows that are in the source table but not yet in the destination table) into the destination table. I might normally drop the destination table and just copy over the whole table, but in this case rows can be deleted from the source table, and I want to keep those old rows in the destination table to maintain history. So I only want to add rows to the destination table that have been added to the source table since last time. I guess I could copy the whole source table to a temporary table in the warehouse and then use a T-SQL MERGE to compare and add just the new rows to the destination table, but I suspect that this is not the best way.
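A minimal sketch of the insert-only pattern being described, using hypothetical table and key names and assuming the source is staged nightly into the warehouse:

-- Hypothetical names: dbo.Staging_SourceTable is the nightly copy,
-- dbo.DestinationTable keeps history, KeyCol is the business key.
INSERT INTO dbo.DestinationTable (KeyCol, Col1, Col2)
SELECT  s.KeyCol, s.Col1, s.Col2
FROM    dbo.Staging_SourceTable AS s
WHERE   NOT EXISTS (SELECT 1
                    FROM   dbo.DestinationTable AS d
                    WHERE  d.KeyCol = s.KeyCol);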
I have a flat file with 13K columns which I need to load into a wide table.
The flat file does not even have column names, and no data types are defined.
How do I load the data into the wide table?
Alternatively, I could choose to load the data into 13 different work tables.
Either way, how do I define data types in the Flat File connection manager in SSIS for 13,000 columns?
I've an SSIS 2008 parent/child package solution to manage data transfers between two different data sources, so we can copy multiple tables and capture how many rows were transferred and the duration of each transfer. This solution was working fine up until last week, when I made some changes to allow the package to perform a source count using standard SQL determined by an expression, or SQL provided from configuration tables. I also changed the package to truncate (or not) the destination table, again controlled by configuration settings in a table. The child packages which perform the data flows have not changed!
The day after the controlling package was promoted to live, I saw the bizarre behaviour of the package log stating that all rows were transferred, but the actual table counts were not what the log stated (see attached file). The package solution works OK on other servers and was OK in DEV, but there were fewer tables and rows transferred. Re-running the package gave the same errors, but on some of the same tables and some different ones.
Since it is the child packages doing the transfers, and nothing has changed in them, I cannot see how the log could say all rows were transferred and yet not all of the rows were actually moved.
Attached are the process output (where you can see the counts and the log), the table transfer controller (as txt, not dtsx), and an example of the data transfer child packages (as txt, not dtsx).
When I set ExecuteOutOfProcess = True the package worked fine. Unfortunately, this is not a good solution, as SSIS 2008 does not tidy up the DtsHost.exe processes it starts, and I'd be left with a memory issue after a very short time; we transfer hundreds of tables each day. (I could write a .NET script in the controlling package to kill the child processes, but that would still leave hundreds of processes running before I could end them, as we have three parallel streams to allow a bit better performance.)
I have a conditional split in an SSIS package. One split is: if rows are returned according to a specific rule, then insert those rows into a Recordset Destination, which points to a variable of Object type.
How can I use this variable to email fellow users? For example, what I would like is: if ANY rows are returned to the Object variable (1 or more), then I would like to execute an email SP that we have on our server.
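One hedged option, assuming the split rule can also be evaluated in T-SQL, is to let an Execute SQL Task make the decision and send the mail; msdb.dbo.sp_send_dbmail is shown as a stand-in for the in-house SP, and the table, flag, profile and recipients below are hypothetical:

IF EXISTS (SELECT 1 FROM dbo.SourceTable WHERE SomeFlag = 1)   -- SomeFlag = 1 stands in for the split rule
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = 'DefaultProfile',      -- hypothetical Database Mail profile
         @recipients   = 'team@example.com',    -- hypothetical recipients
         @subject      = 'Rows matched the rule',
         @body         = 'At least one row matched the conditional split rule; please review.';
END

Staying inside SSIS instead, a Row Count transformation into an Int32 variable plus an expression-based precedence constraint (row count > 0) in front of a Send Mail Task or Execute SQL Task achieves the same effect.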
Hi friends,
Can anybody tell me how I can update a particular table with Integration Services? I just need to update some of the column values.
If I write queries, I need to write approximately 35 UPDATE statements.
So is there any way I can replace the existing data with my current data?
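If the 35 updates all touch the same table, one hedged option is a single Execute SQL Task running one joined UPDATE (hypothetical names; the VALUES constructor needs SQL Server 2008 or later, while a UNION ALL derived table or a staging table works on 2005):

UPDATE  t
SET     t.SomeColumn = v.NewValue
FROM    dbo.TargetTable AS t
JOIN    (VALUES ('KEY1', 'Value 1'),
                ('KEY2', 'Value 2'),
                ('KEY3', 'Value 3')    -- one row per key, instead of one UPDATE per key
        ) AS v (KeyColumn, NewValue)
  ON    t.KeyColumn = v.KeyColumn;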
So I just got an email from Production Support saying an hour and a half of downtime is unacceptable for moving half a billion rows between two partitions, because I am moving to a new clustered index and space is a consideration.
I cannot use partition switching because the clustered index is changing.
This is what I am doing...
1. I am creating a new table with the new cluster on a new partition.
2. I am moving the records in 5K set based batches by doing a range search on the existing clustered index on the existing table.
3. I then reapply all of the nonclustered indexes from the original table to the new one.
4. I do a sp_rename swap out.
The same way I have done this many times before. Is there some new secret special sauce (other than partition switching) I can use?
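For reference, a hedged sketch of the 5K range batching in step 2; the names are hypothetical and the existing clustered key (OldKey) is assumed to be numeric and reasonably dense:

DECLARE @LastKey bigint, @MaxKey bigint;
SELECT  @LastKey = MIN(OldKey) - 1, @MaxKey = MAX(OldKey) FROM dbo.ExistingTable;

WHILE @LastKey < @MaxKey
BEGIN
    INSERT INTO dbo.NewTable WITH (TABLOCK)    -- TABLOCK can allow minimal logging under the right recovery model
    SELECT  *                                  -- the column lists must line up between the two tables
    FROM    dbo.ExistingTable
    WHERE   OldKey >  @LastKey
      AND   OldKey <= @LastKey + 5000;

    SET @LastKey = @LastKey + 5000;
END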
I am facing an issue where a Data Flow task fails after loading 29,000 rows out of 200,000 rows.
I am loading data from .csv file to OLE DB Destination.
This data flow task is placed inside For each loop container.
Is this issue caused by a performance setting in the SSIS package, such as buffer size?
The error is below:
DFT Load Data from FlatFile:Error: The conditional operation failed.
DFT Load Data from FlatFile:Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR.
The "DER Add Calc Columns" failed because error code 0xC0049063 occurred, and the error row disposition on "DER Add Calc Columns.Outputs[Derived Column Output].Columns[M_VALUE_NUM]" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
DFT Load Data from FlatFile:Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "DER Add Calc Columns" (48) failed with error code 0xC0209029 while processing input "Derived Column Input" (49). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
[code]....
We have Visual Studio 2008 R2 with SP2 installed. Due to a merger we now have a MySQL database that we need to update from SSIS. Everything works except for the table insert or update. Would upgrading to SP3 or SP4 perhaps help with that?
We have installed the latest driver from MySQL, and have tried the ADO.Net and ODBC drivers with similar results when we try to update the database.
There is a requirement to insert new records into, or update existing records in, SharePoint. I am able to do the insert/update to SharePoint if the data volume is small, i.e., hundreds of records.
link: [URL] ....
If there are more records (in the thousands), I get the error below while updating SharePoint. I tried keeping the batch size minimal, but I still get the same error. Is there any setting on SharePoint that can be changed to increase its update capacity?
Error details:
System.ServiceModel.ProtocolException: The content type text/html; charset=utf-8 of the response message does not match the content type of the binding (text/xml; charset=utf-8). If using a custom encoder, be sure that the IsContentTypeSupported method
is implemented properly. The first 1024 bytes of the response were: '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
[code].....
what is the cause of this error?
I have an existing old SSIS solution that needs to be updated with an MDX query. In the query I am using everything as is, changing only the filter condition. When I try to preview the data in the data flow task, I get the error below.
Outputs[OLE DB Source Output] references an external data type that cannot be mapped to a Data Flow task data type. The Data Flow task data type DT_WSTR will be used instead.
The package without the MDX query update runs fine with no issues. For reference, I am using SQL Server 2014 and SSDT version 12.0.50318.0.
I tried updating the SSAS connection string with "Format=Tabular", which was missing earlier, and it still didn't work.
If, in an SSIS package, you put an instance of an 'Execute SQL Task' task in the Control Flow, in the Properties window, you can see the properties of the task, for example CodePage.
If you double click on the task, the Execute SQL Task Editor appears, with several of the properties which are also in the Properties window, including CodePage.
If, in the Editor, you update the value of CodePage, then click OK, the value of CodePage in the Properties window is updated immediately.
I have written a custom SSIS task, which also has the same properties in the Properties window and in the Editor. The Editor also has an OK button. When OK is clicked, the values of the task properties are updated. An example property is FolderToArchive. If I open the Editor, change the value of FolderToArchive and click the OK button, the value of FolderToArchive in the Properties window is NOT immediately updated.
If, however, I select the FolderToArchive field in the Properties window, it is then updated with the value I entered in the Editor.
How do I get my task to update the values in the Properties window, after changing a value in the Editor, when I click the OK button?
I would have thought I would need something like, in pseudo-code,
Task.Parent.PropertiesWindow.Refresh
where task is of type Microsoft.SqlServer.Dts.Runtime.Task and Task.Parent is of type Microsoft.SqlServer.Dts.Runtime.Package.
Here is a requirement: I need to update columns in a set of tables with the latest values available in a CSV file.
The file is organized by department, so all of the tables listed under that department need to be updated.
The CSV file is dynamic in nature: the number of columns changes, and it has a header naming the columns that need to be updated.
The destination tables are listed under a department table for each department. So I have to update the columns in those tables with the values in the CSV.
Hi,
I want to know: if a table has 1 billion records, will it work easily with .NET objects like DataSets, etc.?
If not, what is a normal amount of data for SQL Server that is easy for the system to manage?
Adnan
I will be receiving data from a company called Loan Performance that has one file/table that will hold 1 billion rows. They send data by period, and I plan to load the data via BCP in NT/DOS scripts. The 1 billion rows represent data for 200+ periods. Are the following design plans feasible?

1. Partition the table by period value. I'm not sure of the max number of partitions per table in 2005, but I think we have period data back to 1992 and a new one gets created every month, so the possibility of having > 1,000 partitions exists. I plan on just pre-creating partitions for future data, instead of dynamically creating them when a new period is sent.

2. Load data via BCP in DOS shell scripts that will drop the index (by partition), BCP in the data, and then re-create the indexes by partition. Is this possible? And will I see a performance increase as opposed to one huge table (I'm pretty sure I will)? There is usually one period's data present per day, but sometimes the vendor resends all data (which would get loaded on the weekend).

I'm a bit unsure of where to start, as I have never worked with this amount of data. I worked with partitioning in Oracle a long time ago. I plan on having a 2x quad-core 2.66 GHz CPU with 32 GB of RAM and SQL Server 2005 EE 64-bit connected to 1 terabyte of SAN disk.

Thanks all,
PMA
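On point 1: SQL Server 2005 caps a table at 1,000 partitions, so monthly boundaries back to 1992 plus a healthy number of pre-created future months stay under the limit, but there is not unlimited headroom. A minimal sketch of the period partitioning, with hypothetical object names and assuming an int period key such as 200701:

CREATE PARTITION FUNCTION pfPeriod (int)
AS RANGE RIGHT FOR VALUES (199201, 199202, /* ...one boundary per month... */ 200712);

CREATE PARTITION SCHEME psPeriod
AS PARTITION pfPeriod ALL TO ([PRIMARY]);   -- or map ranges to dedicated filegroups on the SAN

CREATE TABLE dbo.LoanPerformance
(
    period_id  int    NOT NULL,
    loan_id    bigint NOT NULL
    -- ...remaining columns...
) ON psPeriod (period_id);

On point 2, disabling or dropping the nonclustered indexes per load, bulk loading, and rebuilding them by partition is a common pattern; loading each period into an identically structured staging table and switching it in is another option worth weighing.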
I have a .xlsx file in which the data starts on the 3rd row. Using SSIS I am converting the .xlsx file into a .csv file. I am able to convert it, but in the .csv file the data is populated from the first row. I want the data to stay on the 3rd row, as in the source.
I'm working with Integration Services 2008 and I have a package where I select a certain set of rows from an Oracle database and then insert this data set into a SQL Server database. When the package is inserting into the SQL Server database, it shows that all the rows were inserted ("green"), but when I check the amount of data (with a count), it's not the same as it was in Oracle.
For example: there are 15390 rows in Oracle, and the package sometimes inserts 9801, 8310, 9952, 9934, 9975, 5437, or 9909 rows into SQL Server. The package does not abort! It simply does not insert all the rows!
I have data rows ( 7 rows and hundreds of colums) obtained by using execute sql task and i placed the output in CSV. Then I also moved this data in csv into excel (first row of excel)using simple data flow task with flat file source(CSV) and excel destination connections. However, I need to push data into 3rd row of excel sheet(I need to write some description and titles in the first two rows) .I need to do this inorder to automate the process of producting the excel which has predefined pivot tables. I only need to update the excel sheet with raw data( 3rd row) which drives the pivot tables. How do I do this? How do i push into the third row instead of first?
View 6 Replies View RelatedWhile handling errors in a flat file I am routing the rows into the excel sheet to business for further processing. I need to send a mail attaching the excel sheet which contains the errored-out/In-Complete definitons. The problem is I need to send a mail if and only if there is at least one row in Excel sheet. How Can I count the rows exist in excel sheet? and send the rowcount to a variable and use that value of a variable to send a mail?
I got a flat file like:
Location CurrencyRates
ALBERTA #15U.S.$ 0.2930 1 Can$ 0.0900
BRITISH COLUMBIA #14U.S.$ 0.6891 2 Can$ 0.2117
MANITOBA #18U.S.$ 0.4557 3 Can$ 0.1400
Is there a way I can use SSIS to transform this file into something like:
location               USD     USDrate  flag  CAD    CADrate
ALBERTA #15            U.S.$   0.2930   1     Can$   0.0900
BRITISH COLUMBIA #14   U.S.$   0.6891   2     Can$   0.0900
MANITOBA #18           U.S.$   0.4557   3     Can$   0.1400
Hi,
I have a table called "tblProducts" with following fields:-
ProductID(Pk, AutoIncrement), ProductCode(FK), ProdDescr.
So to the above table I have added a new field/column named "ProdLongDescr(varchar, Null)"
So, I need to populate this newly added column with specific values for each row, depending on "ProductCode", which is different for every row. The problem is that I have 25 rows. So instead of writing 25 individual update scripts, is there a way a single query can do the same job instead of one update query for each row? If so, can someone guide me on how to achieve that, or point me to a good resource?
Below are a couple of the individual update scripts I wrote ("ProductCode" is different for all 25 rows); a single-statement alternative is sketched after them.
Update tblValAdPackageElement SET ProdLongDescr = 'Slideshows' WHERE ProductCode = 'SLID'
And szElementDescr='Slideshow'
if @@error <> 0
begin
goto ErrPos
end
Update tblValAdPackageElement SET ProdLongDescr = 'CategorySlideshows' WHERE ProductCode = 'SLDC'
And szElementDescr='CategorySlideshow'
if @@error <> 0
begin
goto ErrPos
end
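A hedged sketch of collapsing all 25 scripts into one UPDATE, pairing each ProductCode (and element description) with its long description in a value list; it assumes SQL Server 2008 or later for the VALUES constructor, while a UNION ALL derived table works the same way on 2005:

UPDATE  t
SET     t.ProdLongDescr = v.ProdLongDescr
FROM    tblValAdPackageElement AS t
JOIN    (VALUES ('SLID', 'Slideshow',         'Slideshows'),
                ('SLDC', 'CategorySlideshow', 'CategorySlideshows')
                -- ...one row for each of the 25 ProductCode values...
        ) AS v (ProductCode, szElementDescr, ProdLongDescr)
  ON    t.ProductCode    = v.ProductCode
 AND    t.szElementDescr = v.szElementDescr;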
Thanks,
I'm looking for some performance assistance on updating a column value in a table that contains approximately 50 million rows. I have a permanent table in another database that has the key column and value to be set. My query is listed below, but I'm afraid it will run quite awhile. Any suggestions would be appreciated.
update a
set a.column2 = b.column2
from mytable as a
join mytable1 as b
on a.column1 = b.column1
There is a one to one relationship between the two tables.
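If log growth and blocking are the concern, a hedged sketch of batching the same join-update so each transaction stays small; the extra WHERE keeps already-updated rows from being touched again:

DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) a
    SET    a.column2 = b.column2
    FROM   mytable  AS a
    JOIN   mytable1 AS b ON a.column1 = b.column1
    WHERE  a.column2 <> b.column2
       OR (a.column2 IS NULL AND b.column2 IS NOT NULL)
       OR (a.column2 IS NOT NULL AND b.column2 IS NULL);   -- only rows that still differ

    SET @rows = @@ROWCOUNT;

    CHECKPOINT;   -- helps truncate the log between batches if the database uses the simple recovery model
END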
I have an Execute SQL Task which gets a result set from SQL Server using an OLE DB connection.
The result set is mapped to an Object variable, say @[User::FilePath].
There are 33 rows in the above result set.
Then I have a For Each Loop, inside which I have a Script Task.
I am trying to put the above @[User::FilePath] recordset into a DataTable using the DataAdapter.Fill() function, and I perform some read operations on its rows.
The problem is that in the first iteration of the For Each Loop, the number of rows in the data table shows 33.
But from the next iteration onwards, it comes out to be 0 (zero!).
This causes my package to fail.
I need to do something like this in SSIS: from one SQL table I need to get some ID values. I am using a simple SQL query:
Select ID from Identifier where value is not null
I've got this result, and as a final result I need to generate and set a variable in SSIS with the final value:
@var = '198','120','ACP','120','PQU'
I need to use this later in an ODBC expression. How can I do this in SSIS?
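If the list can be built on the SQL side, a hedged sketch using FOR XML PATH (available on SQL Server 2005 and later); the single-row result can be mapped to an SSIS String variable through an Execute SQL Task with ResultSet set to "Single row":

DECLARE @var varchar(max);

SELECT @var = STUFF((SELECT ',''' + CAST(ID AS varchar(20)) + ''''
                     FROM   Identifier
                     WHERE  value IS NOT NULL
                     FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),
                    1, 1, '');

SELECT @var AS FilterList;   -- e.g. '198','120','ACP','120','PQU'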
I'm having a problem with a flat file source in that the package throws an error straight away. The error I get is as follows:
Information: 0x4004300C at Data Flow Task, SSIS.Pipeline: Execute phase is beginning.
Error: 0xC0202091 at Data Flow Task, Flat File Source [2]: An error occurred while skipping data rows.
Error: 0xC0047038 at Data Flow Task, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED.
The PrimeOutput method on Flat File Source returned error code 0xC0202091. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
I've tried multiple things to resolve this: changing the OutputColumnWidth in the flat file connection manager, checking the file for any irregularities in Notepad++ (each line ends as it should with CRLF), and confirming the file is encoded in UTF-8 without BOM.
Is there a way to write an IF statement for a derived column to count rows?
for example:
Row 35: PRI 7010
Row 100: PRI 7011
I need a formula so that when it gets to row 100, it will go back and look at row 35 and use that data if the 7010 = 7010; if not, use 7011...
I'm doing a group by in an aggregate transformation. I have say 6 columns in the output and I'm grouping on all of them - how can I get duplicate rows in the output? If I do the same select and group by in SQL on the source data I don't get any duplicate rows. In fact out of 6000+ rows I only get 2 duplicates.
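By default the SSIS Aggregate transformation groups case-sensitively and treats trailing spaces as significant, whereas a typical case-insensitive SQL Server collation does not, so values that a T-SQL GROUP BY collapses can land in separate SSIS groups. A hedged diagnostic, with hypothetical column names (apply COLLATE only to the character columns):

SELECT  col1 COLLATE Latin1_General_BIN AS col1_bin,
        col2 COLLATE Latin1_General_BIN AS col2_bin,
        -- ...the remaining grouping columns the same way...
        COUNT(*) AS grp_count
FROM    dbo.SourceTable
GROUP BY col1 COLLATE Latin1_General_BIN,
         col2 COLLATE Latin1_General_BIN;

If this returns more groups than the plain GROUP BY, case or trailing-space differences are the likely culprits (comparing DATALENGTH(col) with LEN(col) is a quick way to spot trailing spaces).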
Hello All
I am trying to figure out if what I am attempting to do is possible, and whether or not my approach is wrong to begin with.
I am trying to build a custom report for our accounting system, which is Traverse from Open Systems. This is what I have done in the stored procedure thus far:
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
ALTER PROCEDURE rptArFLSalesByCustItemized_sp
@custId pCustID,
@dateFrom datetime,
@dateThru datetime,
@itemIdFrom pItemId,
@itemIdThru pItemId
as
set nocount on
-- define some variables for previous year
declare @LYqty int, @LyAmt money, @LYfrom datetime, @LYthru datetime
-- set defaults
SET @itemIdFrom=ISNULL(@itemIdFrom,(SELECT MIN(itemId) FROM tblInItem))
SET @itemIdThru=ISNULL(@itemIdThru,(SELECT MAX(itemId) FROM tblInItem))
SET @LYfrom=DATEADD(YEAR,-1,@dateFrom)
SET @LYthru=DATEADD(YEAR, -1, @dateThru)
-- create small temp table to hold customer info
Create Table #tmpArCustInfo
(
custId pCustID,
custName VARCHAR (30)
)
-- populate customer temp table with info
Insert into #tmpArCustInfo
select custId, custName
from tblArCust
WHERE custId = @custId
-- create a temp table to hold the Data for each Item
Create Table #tmpArSalesItemized
(
itemId pItemId,
productLine VARCHAR (12),
pLineDesc VARCHAR (35),
descr VARCHAR (35),
LYQtySold int,
LYTDQtySold int,
QtySold int,
LYTDsales money,
totalSales money,
LastInvDate datetime
)
-- populate the temp table with all of the inventory items
insert into #tmpArSalesItemized
select ii.itemId, ii.productLine, ip.Descr, ii.Descr, 0,0,0,0,0, NULL
from tblInItem ii, tblInProductLine ip
where ip.productLine = ii.productLine
AND ii.itemId BETWEEN @itemIdFrom AND @itemIdThru
-- update table with this years quantities
update #tmpArSalesItemized
SET QtySold = (select SUM(QtyOrdSell) from tblArHistDetail hd
where TransId IN (select TransId from tblArHistHeader where custId = @custId)
AND orderDate IN (select OrderDate from tblArHistHeader where OrderDate BETWEEN @dateFrom AND @dateThru)
AND hd.partId BETWEEN @itemIdFrom AND @itemIdThru
GROUP BY hd.partId
)
-- Return the temp tables results
select * from #tmpArSalesItemized, #tmpArCustInfo
drop table #tmpArSalesItemized, #tmpArCustInfo
return
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
My problems begin where I want to start updating the quantities in the QtySold field. I have managed to get it to write the same sum in every row, but I cannot figure out how to update each row based on the sum of the quantity found for that item in the tblArHistDetail table. The trouble is that there is no reference to the custId in that table either; the custId resides in tblArHistHeader and is linked to the details table via the TransId column. So really I need to update many rows based on criteria from two other tables.
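A hedged sketch of the kind of correlated update this seems to call for, assuming OrderDate lives on tblArHistHeader and that TransId bridges the header (which carries custId) to the detail rows:

UPDATE  t
SET     t.QtySold = s.Qty
FROM    #tmpArSalesItemized AS t
JOIN    (SELECT  hd.partId, SUM(hd.QtyOrdSell) AS Qty
         FROM    tblArHistDetail AS hd
         JOIN    tblArHistHeader AS hh ON hh.TransId = hd.TransId
         WHERE   hh.custId    = @custId
           AND   hh.OrderDate BETWEEN @dateFrom AND @dateThru
           AND   hd.partId    BETWEEN @itemIdFrom AND @itemIdThru
         GROUP BY hd.partId
        ) AS s
  ON    s.partId = t.itemId;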
Can anyone please help? I don't have a clue how to make this work, and most of what I have learned about SQL thus far has been from opening other stored procs etc. in the accounting system and just reading to see how the developers have done things.
Thanks
Jamie
I have to write a couple of scripts that will update a couple of columns in two separate tables and also insert a new row with the same data, except for a few calculated or provided values. See the specs below; a sketch of the first script follows the specs.
1. tGradeHist Table Script One (Needs to be run first)
a. Read tGradeHist Table and Select rows with GradeEndDate = NULL and GradeStartDate = '1/1/2007 12:00:00 A.M.'
b. Calculate New Step Amount = StepAmount * Incr% (Round To Nearest Whole Dollar)
c. Create New Row for this table using information from row read above and insert new information where indicated :
GradeCode - Same
GradeLocationCode - Same
Step - Same
GradeStartDate - '7/1/2007 12:00:00 A.M.'
GradeEndDate - NULL
GradeCurrencyCode - Same
StepAmount - Result of b (above)
GradeFrequencyCode - Same
RangeMaximumAmount - Same
RangeMidAmount - Same
RangeMinimumAmount - Same
GradeCurrentFlag - 'True'
MarketMaximumAmount - Same
MarketMidAmount - Same
MarketMinimumAmount - Same
GradeGUID - Same
TSCOL - Same
d. Update Row read in a (above) with GradeEndDate = '6/30/2007 12:00:00 A.M.' and GradeCurrentFlag = 'False'
2. tPersonBasePayHist Table Script Two (Needs to be run second)
a. Read tPersonBasePayHist Table and Select rows with PersonBasePayEndDate = NULL
b. Calculate New PersonBasePayAmount = PersonBasePayAmount * Incr% (Round To Nearest Whole Dollar)
c. Create New Row for this table using information from row read above and insert new information where indicated :
PersonGUID - Same
PersonBasePayStartDate - '7/1/2007 12:00:00 A.M.'
PersonBasePayEndDate - NULL
PersonBasePayCurrencyCode - Same
PersonBasePayAmount - Result of b (above)
PersonBasePayFrequency - Same
PersonBasePayPayrollFrequencyCode - Same
BasePayReasonCode - 'SA'
ConductedBasePayReviewDate - Same
ScheduledBasePayReviewDate - Same
PayrollCode - Same
PersonBasePayCurrentFlag - 'True'
ApprovedByPersonGUID - Same
PersonBasePayGUID - Same
TSCol - Same
d. Update Row read in a (above) with PersonBasePayEndDate = '6/30/2007 12:00:00 A.M.' and PersonBasePayCurrentFlag = 'False'
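A hedged sketch of script one only, assuming the column names in the spec, that the increase percentage arrives as a multiplier, and that TSCOL is a rowversion column that cannot be inserted explicitly; script two would follow the same insert-then-close-out pattern against tPersonBasePayHist:

DECLARE @IncrPct decimal(9,4);
SET @IncrPct = 1.03;   -- hypothetical 3% increase

BEGIN TRANSACTION;

-- 1c. insert the new 7/1/2007 rows based on the current 1/1/2007 rows
INSERT INTO tGradeHist
      (GradeCode, GradeLocationCode, Step, GradeStartDate, GradeEndDate,
       GradeCurrencyCode, StepAmount, GradeFrequencyCode,
       RangeMaximumAmount, RangeMidAmount, RangeMinimumAmount, GradeCurrentFlag,
       MarketMaximumAmount, MarketMidAmount, MarketMinimumAmount, GradeGUID)
SELECT GradeCode, GradeLocationCode, Step, '20070701', NULL,
       GradeCurrencyCode, ROUND(StepAmount * @IncrPct, 0), GradeFrequencyCode,
       RangeMaximumAmount, RangeMidAmount, RangeMinimumAmount, 'True',
       MarketMaximumAmount, MarketMidAmount, MarketMinimumAmount, GradeGUID
FROM   tGradeHist
WHERE  GradeEndDate IS NULL
  AND  GradeStartDate = '20070101';

-- 1d. close out the rows that were read (the rows just inserted start on 7/1/2007, so they are not touched)
UPDATE tGradeHist
SET    GradeEndDate = '20070630',
       GradeCurrentFlag = 'False'
WHERE  GradeEndDate IS NULL
  AND  GradeStartDate = '20070101';

COMMIT TRANSACTION;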
I need to generate a CSV file from another CSV file. It seems simple, but here is the tricky part:
Each output file needs to have a maximum of 1000 lines; if I reach that limit, I need to create another CSV and fill that new one.
For example:
I have a CSV file called fileA which has 2000 lines, and another CSV called fileB with 1500 lines.
I need to loop over a folder and get fileA, create an output called FileAOutput and start to fill it; when I reach 1000 lines, I need to create a FileAOutput_2 and fill it with the other 1000 lines. Then I'll go to fileB and do the same thing, but in that case I'll have 500 lines in the second output.
I am using SSIS in SQL Server Enterprise 2005. I have two OLE DB data sources from two disparate databases (IBM DB2 and Microsoft SQL Server), some columns from each of which are to be included in the merged output results. I have noted the various requirements in the forum postings with regard to sorting the OLE DB sources and specifying the output source columns as being sorted, as well as the requirement that the join fields in the two sources be close/exact matches. Yet, when I run this in VS, while the work area reflects the expected number of rows being input into the Merge Join transformation, no count is reflected as output from that transformation into the final destination table. Specifically, my two data sources (IBM DB2 and MS SQL) are configured as follows:
IBM DB2 contains an SQL statement that uses Cast operations to create the result columns, and an ORDER BY clause to ensure that the output is sorted by the desired two columns. The OLE DB source property setting for IsSorted is set to true; the Output Columns folder column definitions for "key_source_dtsy" and "key_source_dtrt" have their SortKeyPosition properties set to 1 and 2, respectively. Those fields are both defined as data type DT_STR, with lengths of 4 and 2, respectively. Below is the path metadata from the Data Flow Path editor for the path from this source:
IBM DB2 source"Name" "Data Type" "Precision" "Scale" "Length" "Code Page" "Sort Key Position" "Comparison Flags" "Source
Component""ID_CODE" "DT_STR" "0" "0" "10" "1252" "0" "" "Source F0005 User Defined Codes""CODE_DESCR_1" "DT_STR" "0" "0" "30" "1252" "0" "" "Source F0005 User Defined Codes""CODE_DESCR_2" "DT_STR" "0" "0" "30" "1252" "0" "" "Source F0005 User Defined Codes""key_source_dtsy" "DT_STR" "0" "0" "4" "1252" "1" "" "Source F0005 User Defined Codes""key_source_dtrt" "DT_STR" "0" "0" "2" "1252" "2" "" "Source F0005
User Defined Codes:
MS SQL contains an SQL statement that takes the columns as they are in the MS SQL table (no Cast operations needed); it also uses an ORDER BY clause to ensure the output is sorted by the join columns. The OLE DB source property setting for IsSorted is set to true; the Output Columns folder columns for "key_source_dtsy" and "key_source_dtrt" have their SortKeyPosition properties set to 1 and 2, respectively. Those fields are both defined as data type DT_STR, with lengths of 4 and 2, respectively. Below is the path metadata from the Data Flow Path editor for the path from this source:
MS SQL source"Name" "Data Type" "Precision" "Scale" "Length" "Code Page" "Sort Key Position" "Comparison Flags" "Source Component""id_code_name" "DT_I2" "0" "0" "0" "0" "0" "" "Source CodeName in db dwVdFY""key_source_dtsy" "DT_STR" "0" "0" "4" "1252" "1" "" "Source CodeName in db dwVdFY""key_source_dtrt" "DT_STR" "0" "0" "2" "1252" "2" "" "Source CodeName in db dwVdFY"
The Merge Join transformation specifies an INNER JOIN using the columns named "key_source_dtsy" and "key_source_dtrt" from the respective data sources. I know there are alternative ways of accomplishing my intent (Lookup, porting the MS SQL table to IBM DB2 so the join can occur in the SELECT statement, etc.); however, I'd like to use this functionality and assume that it should work.
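One thing worth checking, offered as an assumption rather than a diagnosis: when IsSorted is set by hand, the Merge Join silently drops rows whenever the incoming order does not match the order SSIS expects, and case, trailing spaces and code-page differences between DB2 and SQL Server are the usual suspects. Forcing an explicit, byte-wise order on the SQL Server side (the table name below is hypothetical) and placing a Sort transformation after the DB2 source instead of relying on its ORDER BY is one way to rule this out:

SELECT   id_code_name,
         key_source_dtsy,
         key_source_dtrt
FROM     dbo.CodeName                                    -- hypothetical name for the dwVdFY table
ORDER BY key_source_dtsy COLLATE Latin1_General_BIN,
         key_source_dtrt COLLATE Latin1_General_BIN;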
I have an SSIS package which identifies duplicate records in an Access database. I have staged the Access database into SQL Server and created the SSIS package. Now I have the final list of records which need to be deleted from the Access database and the new records which are to be inserted into it.
What do I need to do if I want to delete those duplicate records directly from the Access database using SSIS? I cannot truncate the whole Access database and reload; I just have to delete the duplicate rows from the Access DB and add the new records.
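One hedged approach: land the duplicate keys in a SQL Server staging table, push them through a data flow whose destination is an OLE DB Command against the Access connection manager, and let that command run a parameterized delete (hypothetical table and column names; the ? marker is how the Jet/ACE OLE DB provider binds parameters):

DELETE FROM [AccessTable]
WHERE  [ID] = ?

The new records can then be loaded through a separate data flow whose OLE DB Destination points at the same Access connection.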
This question is an extension of the topic "Updating table rows with overlapping dates": [URL] .....
I actually have a table as follows:
Table Name: PromotionList
Note that the NULL in endDate means no end date or infinite end date.
ID PromotionID StartDate EndDate Flag
1 1 2015-04-05 2015-05-28 0
2 1 2015-04-23 NULL 0
3 2 2015-03-03 2015-05-04 0
4 1 2015-04-23 2015-05-29 0
5 1 2015-01-01 2015-02-02 0
And I would like to produce the following outcome in the same table (using an UPDATE statement). As you can observe, it merges all overlapping dates for the same promotion ID by taking the minimum start date and maximum end date. Only the first row of each group of overlapping dates is updated to the desired values, and its flag changes to 1. The other overlapping rows are set to NULL and their flag becomes 2.
Flag = 1, success merge row. Flag = 2, fail row
ID PromotionID StartDate EndDate Flag
1 1 2015-04-05 NULL 1
2 1 NULL NULL 2
3 2 2015-03-03 2015-05-04 1
4 1 NULL NULL 2
5 1 2015-01-01 2015-02-02 1
The second part that I would like to achieve is based on the first table as well. However, this time I would like the merge to use the minimum start date and the end date of the last of the overlapping rows. Since the end date of the last overlapping row for promotion ID 1 is the row with ID 4, with end date 2015-05-29, the table will look as follows after the update.
ID PromotionID StartDate EndDate Flag
1 1 2015-04-05 2015-05-29 1
2 1 NULL NULL 2
3 2 2015-03-03 2015-05-04 1
4 1 NULL NULL 2
5 1 2015-01-01 2015-02-02 1
Note that the above is just sample data. The actual data might contain thousands of records, and hopefully this can be done in a single UPDATE statement.
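A hedged sketch of one way to do the first part in a single UPDATE, assuming SQL Server 2012 or later for the windowed MAX with a ROWS frame, and using '99991231' as a stand-in for the open-ended NULL end date:

;WITH Ordered AS
(
    SELECT  ID, PromotionID, StartDate,
            ISNULL(EndDate, '99991231') AS EndDate,
            MAX(ISNULL(EndDate, '99991231'))
                OVER (PARTITION BY PromotionID
                      ORDER BY StartDate, ID
                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxEnd
    FROM    dbo.PromotionList
),
Islands AS
(
    SELECT  *,
            SUM(CASE WHEN PrevMaxEnd IS NULL OR StartDate > PrevMaxEnd THEN 1 ELSE 0 END)
                OVER (PARTITION BY PromotionID
                      ORDER BY StartDate, ID
                      ROWS UNBOUNDED PRECEDING) AS IslandNo   -- new island whenever a gap appears
    FROM    Ordered
),
Merged AS
(
    SELECT  *,
            MIN(StartDate) OVER (PARTITION BY PromotionID, IslandNo) AS IslandStart,
            MAX(EndDate)   OVER (PARTITION BY PromotionID, IslandNo) AS IslandEnd,
            ROW_NUMBER()   OVER (PARTITION BY PromotionID, IslandNo ORDER BY ID) AS rn
    FROM    Islands
)
UPDATE  p
SET     p.StartDate = CASE WHEN m.rn = 1 THEN m.IslandStart END,
        p.EndDate   = CASE WHEN m.rn = 1 THEN NULLIF(m.IslandEnd, '99991231') END,
        p.Flag      = CASE WHEN m.rn = 1 THEN 1 ELSE 2 END
FROM    dbo.PromotionList AS p
JOIN    Merged AS m ON m.ID = p.ID;

For the second variant, replacing MAX(EndDate) over the island with the end date of the island's last row (for example via LAST_VALUE ordered by StartDate, ID) would yield 2015-05-29 instead of NULL for promotion 1.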
Extending the above question: two extra columns have now been added to the table, ShouldDelete and PromotionCategoryID.
Original table:
ID PromotionID StartDate EndDate Flag ShouldDelete PromotionCategoryID
1 1 2015-04-05 2015-05-28 0 Y 1
2 1 2015-04-23 2015-05-29 0 NULL NULL
3 2 2015-03-03 2015-05-04 0 N NULL
4 1 2015-04-23 NULL 0 Y 1
5 1 2015-01-01 2015-02-02 0 NULL NULL
Should Delete can take Y, N and NULL
PromotionCategoryID can take any integer and NULL
Now the data should be partitioned by PromotionID, ShouldDelete and PromotionCategoryID (all three must match for rows to group together).
By taking the minimum start date and maximum end date within each group, the following table should be achieved after the update.
Final outcome:
ID PromotionID StartDate EndDate Flag ShouldDelete PromotionCategoryID
1 1 2015-04-05 NULL 1 Y 1
2 1 2015-04-23 2015-05-29 1 NULL NULL
3 2 2015-03-03 2015-05-04 1 N NULL
4 1 NULL NULL 2 Y 1
5 1 2015-01-01 2015-02-02 1 NULL NULL
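Assuming the same gaps-and-islands approach sketched earlier in the thread, the only change is that every OVER clause partitions by all three columns; window functions already treat NULLs in the partition columns as equal, which is exactly the grouping wanted here. A runnable fragment of the first step:

SELECT  ID, PromotionID, ShouldDelete, PromotionCategoryID, StartDate,
        ISNULL(EndDate, '99991231') AS EndDate,
        MAX(ISNULL(EndDate, '99991231'))
            OVER (PARTITION BY PromotionID, ShouldDelete, PromotionCategoryID
                  ORDER BY StartDate, ID
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxEnd
FROM    dbo.PromotionList;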