Using SSIS With SqlBulkCopy For ETL: How To Delete Processed Records From Staging Table
Jul 27, 2007
I'm looking at SSIS and SqlBulkCopy as a possible method to quickly insert and process large amounts of data. My current method uses sp_xml_preparedocument and OPENXML to parse an XML document of the data I want to process and insert into the database, but I'm noticing that SQL Server's performance parsing the XML document isn't very good.
I found the C# SqlBulkCopy class (new in .NET 2.0) and I was thinking it would be faster to use that to load my data into a staging table, and then use SSIS to extract the data from the staging table, process and transform it as necessary, and finally load it into the final destination tables. I was able to create an Integration Services project that selects the data from the staging table, does a bit of processing on one of the fields (by calling a stored procedure), and finally loads the processed data into the final table.
The problem is that I need to clean out the rows that were extracted from the staging table, and I'm not sure how to accomplish this (I can't just issue a "delete from staging_table" because there may be new records in the staging table that were not processed). Perhaps I could delete each record as it is processed, or somehow get the last processed identity id from the staging table and delete all records less than or equal to that id.
Thanks in advance for any help you can provide; maybe there is an easy way to accomplish what I'm trying to do.
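A minimal T-SQL sketch of the second idea (capture a high-water identity value before the package runs, have the package extract only rows up to that value, then delete the same range afterwards); the table and column names here are hypothetical:

-- Hypothetical staging table dbo.staging_table with identity column id.
DECLARE @last_id INT;

-- Capture the highest id present before the SSIS extract starts.
SELECT @last_id = MAX(id) FROM dbo.staging_table;

-- ... SSIS extracts and processes rows WHERE id <= @last_id ...

-- Remove only the processed rows; anything bulk-copied in while the
-- package was running survives for the next run.
DELETE FROM dbo.staging_table
WHERE id <= @last_id;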
What's the best way to write the key values of records processed in my SSIS 2012 package to the chosen log provider? My SSIS package deactivates widgets as well as thingies. It was just released into production this week, runs daily, and we'd like to keep a close eye on what it's doing for a couple of weeks; by that I mean being able, on any day, to quickly see which thingies and widgets were deactivated that morning. It typically deactivates fewer than 5 widgets and thingies per day.
We could dig through the database to see which were deactivated, but that only works if somebody hasn't manually reactivated a record since it was deactivated. We need a log. This is a temporary watch, so we don't want to write to a table or make any significant package changes, such as adding new tasks. It seems like writing the 5-or-so deactivated thingy and widget key values to the log is the best way to watch this package. What's the most efficient way to do this? I'm hoping to avoid a new loop and script component with "Dts.Log" calls, but I don't know any other way.
I have a large table with 100 Million records that has around 1 million duplicate records that need to be deleted.
I am running a script that creates a staging table called DuplicateTable that collects all the duplicates, and then I want to write an efficient delete statement.
Is it possible to write something like:
delete from OrigTable O join DuplicateTable D on O.Key = D.key
Or do I have to run a loop over DuplicateTable and run a delete statement record by record?
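A minimal sketch of the joined DELETE form in T-SQL, assuming DuplicateTable holds the keys of the rows to remove; the alias after DELETE tells SQL Server which side of the join to delete from, so no row-by-row loop is needed:

DELETE O
FROM OrigTable AS O
JOIN DuplicateTable AS D
    ON O.[Key] = D.[Key];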
I have a flat file with columns from a geographical hierarchy such as:
Country Zone State County City Store Sub Store, etc.
The file also has data columns for months to the right of the above columns such as:
Jul Aug Sept ......... basically 25 of these columns for two years' data for one product and another set of 25 columns for another kind of product. A typical record in the file looks like:
Country Zone State County City Store Substore
USA Southeast FL Hillsborough Tampa walmart Fletcher
I need to upload this data into a staging table in SQL Server 2005 using SSIS. I created a table with the geographical hierarchy columns but am trying to figure out a way to load the monthly data. I could create 50 columns for the 50 months (25 months for each product), but that would be very crude.
Is there a better way of inserting data from this flat file into a destination table? I need all the data in the staging table in one upload.
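One common pattern is to load the file wide into the staging table and then unpivot the month columns into rows in a second step; a minimal sketch, with hypothetical month column names (Jul07, Aug07, ...) standing in for the 50 real ones:

-- Staging_Wide is the staging table already loaded from the flat file.
SELECT Country, Zone, State, County, City, Store, SubStore,
       MonthBucket, Amount
FROM dbo.Staging_Wide
UNPIVOT
(
    Amount FOR MonthBucket IN (Jul07, Aug07, Sep07)  -- list all 50 month columns here
) AS u;

The result has one row per location per month, which is usually easier to load into the destination table than 50 separate columns.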
Hello, maybe someone has done this before? I have a table where I store SOURCE_TABLE_NAME and DESTINATION_TABLE_NAME; there are about 120+ tables. I need to make an SSIS package which selects SOURCE_TABLE_NAME from the source OLE DB connection and loads it into DESTINATION_TABLE_NAME in the destination OLE DB connection.
I made such an SSIS package: I set the OLE DB source data access mode to "table name or view name variable", set the OLE DB destination data access mode to "table name or view name variable", and gave the variables default values (names of existing tables). But when I loop and the table names change, it reports an error that it cannot map columns, because the new tables have different columns.
Hi, is there any way of finding updated/deleted records using other data flow transformations, without using an Execute SQL task? I can find the new records using the Merge Join transformation.
Is there a better way to merge the master table with the staging table?
I have been working on this without any success. Anybody out there to help?
1. New files are uploaded to the FTP site every day by other regions: 3 regions and 1 file each, which means 3 files at 3 different times. The file name will be that day's date, as a .csv. For example, today's files will be A_20071106.csv, B_20071106.csv and C_20071106.csv; tomorrow's will be A_20071107.csv, B_20071107.csv and C_20071107.csv. How do I set this up to run every day and take only that day's file from FTP straight into the SQL staging table?
2. Every day, immediately after each file is uploaded, a marker file (20071106.dummy, and so on for all three files) shows up in the same FTP folder to indicate that the FTP file was loaded successfully. I must make this package runnable only when the marker file for that day's date has arrived, and only then process the .csv file. Otherwise, check again after 30 minutes whether the marker is there yet, and keep doing this until it arrives, then execute the package at that time. Then do the same for the other two files for that particular day. The times are 4pm for A, 6pm for B and 8:30pm for C.
3. Then, if there is a new row it should INSERT; if anything has changed (based on 5 columns) it should UPDATE; and if a row was there yesterday but is not there today, DELETE (see the sketch after this list).
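For step 3, a MERGE against the staging table is one way to get all three behaviours in a single statement; a minimal sketch with hypothetical names (Target, Staging, key column Id, compared columns C1..C5), assuming SQL Server 2008 or later (on 2005 the same logic needs separate INSERT/UPDATE/DELETE statements):

MERGE dbo.Target AS t
USING dbo.Staging AS s
    ON t.Id = s.Id
WHEN MATCHED AND (t.C1 <> s.C1 OR t.C2 <> s.C2 OR t.C3 <> s.C3
                  OR t.C4 <> s.C4 OR t.C5 <> s.C5) THEN
    UPDATE SET C1 = s.C1, C2 = s.C2, C3 = s.C3, C4 = s.C4, C5 = s.C5
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, C1, C2, C3, C4, C5)
    VALUES (s.Id, s.C1, s.C2, s.C3, s.C4, s.C5)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;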
I'm trying to use an Excel source in SSIS to import data from a spreadsheet into a staging table. The package runs fine from the web server using SSMS, but when I deploy it and try to execute the package, I get the error below. My question: do I have to install the AccessDatabaseEngine driver on the SQL database server or on the web server where I'm executing the SSIS package?
Error: The requested OLE DB provider Microsoft.Jet.OLEDB.4.0 is not registered. If the 64-bit driver is not installed, run the package in 32-bit mode.
SqlBulkCopy does not seem to have much flexibility. If your table has columns that don't allow duplicates, you are apparently stuck. It's a shame there is no switch or parameter you can pass to the SqlBulkCopy class (if there is, please let me know!). Example: say I have a table "Cars" with fields "CarId" (identity) and "CarName" (varchar), and the CarName field has a unique constraint. Now I have a DataTable that contains a bunch of CarNames to insert. If there are duplicates on CarName, the entire insert fails; this has nothing to do with the PK or identity field. The problem I have: I would much rather have an option to ignore or silently skip the duplicate row but continue to insert the rest of my data. Any known workarounds would be much appreciated. Maybe I am missing something?
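One common workaround, sketched below, is to point SqlBulkCopy at a constraint-free staging table and then move the rows across with T-SQL that skips duplicates (the staging table name is hypothetical):

-- Staging table with no unique constraint; SqlBulkCopy loads into this.
CREATE TABLE dbo.Cars_Staging (CarName VARCHAR(100) NOT NULL);

-- After the bulk copy, insert only names not already in Cars,
-- collapsing duplicates inside the staged batch as well.
INSERT INTO dbo.Cars (CarName)
SELECT DISTINCT s.CarName
FROM dbo.Cars_Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Cars AS c WHERE c.CarName = s.CarName);

TRUNCATE TABLE dbo.Cars_Staging;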
I need to delete records from a table (Table1) whose key appears as a foreign key column in a related table (Table2).
Table1 columns are: table1Id; Name. Table2 columns include Table2.table1Id which is the foreign key to Table1.
What is the syntax to delete records from Table1 using Table1.Name='some name' and remove any records in Table2 that have Table2.table1Id equal to Table1.table1Id?
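A minimal sketch: delete the child rows in Table2 first, then the parent rows in Table1, so the foreign key is never violated:

-- Remove child rows that reference the parents about to be deleted.
DELETE t2
FROM Table2 AS t2
JOIN Table1 AS t1 ON t2.table1Id = t1.table1Id
WHERE t1.Name = 'some name';

-- Now the parents can be deleted.
DELETE FROM Table1
WHERE Name = 'some name';

Alternatively, defining the foreign key with ON DELETE CASCADE lets a single DELETE on Table1 remove the matching Table2 rows automatically.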
Urgent! Hi there. I use MS SQL Server. I would like to split the data from one table into two tables, based on two reference tables and the following conditions:
Let's say these two reference tables are called Table A and Table B.
Group A:
1. Same date in Table A & Table B
2. Same ID in Table A & Table B (ID is not unique)
3. Same name in Table A & Table B (Name is not unique)
All three conditions combined form the unique identifier.
I used the following SQL to copy the data that matches the above conditions into a new table:

select a.Project, a.Site, a.S_number, a.Field_ID, a.Method, a.Analyte, a.Result, a.Units, a.Qualifier,
       a.Dilution_Factor, a.Reporting_limit, a.Recovery_, a.Matrix, a.CAS_Number, a.Sample_Date,
       a.Received_Date, a.Prep_Date, a.Analysis_Date, a.Batch_ID, a.Data_Package_num_SDG,
       a.Lab_Sample_ID, a.Lab
into APPL_union_exist
from APPL_union_update a, Before_01012004_report b
where a.Field_ID = b.[Field Sample]
  and a.Sample_Date = b.Collected
  and a.Analyte = b.Analyte
However, I don't know how to delete the rows that were copied to the new table from the original table, i.e. how to move them rather than just copy them. I wish someone could help me. Thanks a lot.
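A minimal sketch of the follow-up delete, reusing the same match conditions so only the copied rows are removed from the original table:

delete a
from APPL_union_update a
join Before_01012004_report b
    on a.Field_ID = b.[Field Sample]
   and a.Sample_Date = b.Collected
   and a.Analyte = b.Analyte;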
I uploaded some data about 2 or 3 times and it kept appending it to the table. Now I want to keep only the first copy of each duplicate and delete the rest. Suppose part number 123 has been added 3 times; I want to keep only 1 record. Thanks.
Hello friends, can anybody answer this question: how do you delete duplicate records from a table? I know that with a check option and also with a unique constraint we can prevent duplicate records from being entered, but how do you delete them from a table which does not have any constraints?
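A common pattern for both of the questions above, assuming SQL Server 2005 or later and a hypothetical table Parts that is logically keyed by PartNumber: number the duplicates and delete everything after the first copy.

-- Keep the first row per PartNumber and delete the rest.
WITH numbered AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PartNumber ORDER BY (SELECT 0)) AS rn
    FROM dbo.Parts
)
DELETE FROM numbered
WHERE rn > 1;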
Hello, I'm developing an application which monitors network packets. The monitoring data is saved into a table, which keeps data for a fixed window of time, for example one hour. So every minute, before or after inserting new data, I delete the expired data. I worry that this endless delete operation may cause problems (index growth, etc.).
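A minimal sketch of that kind of sliding-window purge, assuming the table has a datetime column such as CaptureTime (the names are hypothetical):

-- Remove rows older than the one-hour retention window.
DELETE FROM dbo.PacketLog
WHERE CaptureTime < DATEADD(HOUR, -1, GETDATE());

Frequent small deletes keep each operation cheap; the index fragmentation caused by the constant churn is usually handled with periodic index maintenance rather than by avoiding the deletes.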
Short Question: How do I delete all records from a destination table prior to appending new data to that table?
I am working with a SQL database that was migrated from MS Access. All relationships, primary keys, and identity columns have been set identically to the MS Access database values. The MS Access database is still being used as the database of record until the SQL database is fully functional with front-end, etc.
I want to delete the information stored in all the SQL tables and then append the MS Access values to the SQL tables. I was able to write delete and append queries in MS Access that correctly transfer data to the SQL tables. However, I would prefer doing this through SSIS, because I have several other sources of data to move to a SQL Server database and most of those other sources are not MS Access databases.
Due to referential integrity settings, I need to delete the records from 8 tables in a specific order. I have tried independent control flow objects for each of the 8 tables with data flow objects of either "OLE DB Command" or "OLE DB Source", with the SQL command set to "Delete From TableName". The debug results indicate everything is "green", but no records were deleted from the tables.
I have a scenario where I need to run a scheduled package every 10 minutes. Let me give some examples. Once my ETL process is done, one row gets inserted into my SS_Batch table. It has two columns, SS_Batch_CD and SS_Create_TS. When the ETL process runs successfully, the SS_Batch_CD column has the value 'C', meaning 'Completed'; when the ETL process fails it has 'F', meaning 'Failed'; and while the ETL process is running it has 'P', meaning 'In Progress'. The SS_Create_TS column holds a date such as 2008-04-15.
Note: the SS_Batch table gets a single row when the ETL job finishes (whether the job ran successfully, failed, or is still in progress), along with the date.
Actually, I need to run the package every 10 minutes because I don't know when the cube was last processed. If I can get this last-processed-cube date, I can compare it with the SS_Create_TS column and SS_Batch_CD in the SS_Batch table. Say the last-processed-cube date is greater than SS_Create_TS; then I can refresh or process the cube. This is the validation I need to do to achieve my goal.
What control flow tasks would I need in SSIS to achieve this scenario? Please explain briefly how to solve this problem.
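For the comparison itself, a minimal T-SQL sketch of the check the package could run each cycle, following the rule described above; where the last-processed-cube date actually comes from is an assumption, so a literal is used here as a placeholder:

DECLARE @LastProcessedCube DATETIME;
SET @LastProcessedCube = '2008-04-15';  -- in practice supplied by the package

-- Returns 1 when the latest completed batch says the cube can be refreshed.
SELECT CASE WHEN @LastProcessedCube > MAX(SS_Create_TS) THEN 1 ELSE 0 END AS RefreshCube
FROM SS_Batch
WHERE SS_Batch_CD = 'C';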
Hello, I have been searching for weeks to resolve this problem. I am new to ASP.NET and trying to make the migration from classic ASP, which I have programmed in for years. I am using Microsoft Visual Web Developer 2005 Express Edition and SQL Express Edition. I have been working through the Microsoft video training at http://msdn.microsoft.com/vstudio/express/beginner/learningpath/ and created a web site using Tier 3 Lesson 8 as the model. My new web site, which is a simple phone book application, lets me read the table and select a record without any problem. But the update form lets me edit, and when I attempt to apply the update I get the following error:
Server Error in '/Phonebook' Application. ObjectDataSource 'ObjectDataSource1' could not find a non-generic method 'Update' that has parameters: FirstName, LastName, PhoneNumber, BossGroup, Department, BossPickup, ShowInPhonebook, Type, Original_FirstName, Original_LastName, Original_PhoneNumber, Original_BossGroup, Original_Department, Original_BossPickup, Original_ShowInPhoneBook, Original_Type, Original_ItemID. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
The stack trace basically shows the same error as above. Also, when I attempt to delete a record I do not get an error, but the record does not delete. What is interesting is that I can add a record, so I do not believe it is a security or permissions issue. I have the IIS authentication method set to Enable Anonymous Access with full control. If anyone has any insight as to why this is occurring, please let me know.
I have a scheduled job which imports a text file into my database. The data gets appended to my table every day by this import job.
Since my table is growing every day, I want to truncate the table after the data has been collected for three months, i.e. 90 days. The table will then be empty and new data will flow in through the import.
Any thoughts on how to do this with a query and schedule it?
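If the goal is to purge only rows older than 90 days (rather than emptying the whole table), a minimal sketch that can be scheduled as a SQL Server Agent job step, assuming the table has a load-date column such as ImportDate (the names are hypothetical):

-- Delete anything older than 90 days; run this as a daily Agent job step.
DELETE FROM dbo.ImportData
WHERE ImportDate < DATEADD(DAY, -90, GETDATE());

TRUNCATE TABLE dbo.ImportData would empty the table in one shot, but it cannot be filtered by date.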
I have a task to create a DTS package which will delete records from a SQL table that are present on a sheet tab in Excel. I know how to transfer records from Excel to a SQL table and vice versa, but I'm not sure how to delete the records in the SQL table that are present in Excel. Note that there are just 2 columns: one is a key and the other a name.
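One approach, sketched below: load the Excel sheet into a small staging table first (the transfer step already described as known), then delete by key. The table and column names here are hypothetical:

-- ExcelStaging is loaded from the spreadsheet by the package.
DELETE t
FROM dbo.TargetTable AS t
JOIN dbo.ExcelStaging AS x
    ON t.KeyCol = x.KeyCol;

-- Clear the staging table for the next run.
TRUNCATE TABLE dbo.ExcelStaging;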
DELETE FROM [SCS_NAV2009R2_PROD].[dbo].[Payroll Ledger Entry]
gives this error
Msg 9002, Level 17, State 4, Line 1
The transaction log for database 'SCS_NAV2009R2_PROD' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
~ 7,000,000 records
I have now set the log to autogrow and set its max size to 2 TB.
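Even with a larger log, deleting 7 million rows in one statement logs the whole delete as a single transaction. A common alternative, sketched below, is to delete in batches so that log space can be reused between batches (via log backups, or automatically under the simple recovery model):

-- Delete in chunks of 100,000 rows until nothing is left.
WHILE 1 = 1
BEGIN
    DELETE TOP (100000)
    FROM [SCS_NAV2009R2_PROD].[dbo].[Payroll Ledger Entry];

    IF @@ROWCOUNT = 0
        BREAK;
END

If every row in the table is going anyway and no foreign keys reference it, TRUNCATE TABLE is minimally logged and far cheaper.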
We have a large test database with millions of records spanning more than one company site code. Sometimes we want to refresh the data in that database for one or more site codes.
In order to do that, I first have to delete all records for the site code(s) we want to refresh in the test database, then copy a new set of data over from the production database. Since we refresh data based on the site code, I have to use DELETE instead of TRUNCATE.
Since this is a huge database with thousands of tables and millions of records per table, I have performance issues with the DELETE command. So what would be the best way to delete a large number of records without writing any information to the database log file?
FYI: The Recovery model of this database is Simple
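Every DELETE is logged regardless of recovery model, so the logging cannot be switched off entirely; the practical options are batched deletes (so the simple recovery model can reuse log space after each batch) or, when most of a table is being removed, copying the rows to keep into a new table with a minimally logged SELECT ... INTO, truncating, and re-inserting. A sketch of the second approach, with hypothetical table and column names:

-- Keep only the site codes that are NOT being refreshed.
SELECT *
INTO dbo.SomeTable_keep
FROM dbo.SomeTable
WHERE SiteCode NOT IN ('SITE1', 'SITE2');   -- site codes being refreshed

TRUNCATE TABLE dbo.SomeTable;               -- minimally logged

INSERT INTO dbo.SomeTable
SELECT * FROM dbo.SomeTable_keep;

DROP TABLE dbo.SomeTable_keep;

Identity columns and foreign keys referencing the table need extra handling (IDENTITY_INSERT, dropping or disabling constraints), so this works best on simple tables.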
Hi all, I need some clarification on stopping my package once the cube is refreshed or processed.
Here are the tasks I have in my package:
1. Data Flow Task
2. Script Task 1 - code that gets the last processed cube date (a global variable, lastProcessedCube, has been declared for it)
3. Script Task 2 - code that reads the SS_Batch table to get the Create_TS date, which is assigned to another global variable, create_ts
4. Analysis Services Processing Task
Between Script Task 2 and the Analysis Services Processing Task, I have set @lastProcessedCube > @create_ts as the expression, with "Expression and Constraint" selected in the Precedence Constraint Editor.
Actually, I need to run the package every 10 minutes, which I can do with a job schedule, and I need to refresh or process the cube daily. Is there any way to stop the package once my cube has been processed for that day, and then start it again the next day? Is it possible to do this? Please let me know.
Is there a way that I can reset the key field to 1 when using DELETE to clear a table? (Note: if there is a separate command that I could use after the DELETE, that is fine too.) Thank you for your help with this, -DJ
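Two standard options, assuming the key is an identity column (the table name below is a placeholder):

-- Option 1: after the DELETE, reseed so the next insert gets identity 1.
DELETE FROM dbo.MyTable;
DBCC CHECKIDENT ('dbo.MyTable', RESEED, 0);

-- Option 2: TRUNCATE both empties the table and resets the identity,
-- but it requires that no foreign keys reference the table.
TRUNCATE TABLE dbo.MyTable;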
I am using SQL Server Agent jobs that run each morning to update the records in a table to match what they should be for that day. I built and tested them using a test table called "testtable1", and that worked fine. But when I switched over to our production table, it fails, saying the table variable has to be declared. What would be the difference? The production table has a "@" in front of its name; is that causing the issue?
USE [Live_build]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
BEGIN
    DELETE FROM @ZIPLIST
    INSERT INTO @ZIPLIST SELECT * FROM tblZip3DSWed;
END
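The "@" prefix means @ZIPLIST is a table variable rather than a table, and a table variable only exists in the batch that declares it, which is why the Agent job reports that it must be declared. A sketch of one fix is to declare it in the same batch (the column list below is a placeholder; use the real columns of tblZip3DSWed); alternatively, switch to a permanent table whose name has no "@":

DECLARE @ZIPLIST TABLE
(
    -- placeholder columns; replace with the actual column list
    Zip3 CHAR(3),
    Description VARCHAR(100)
);

INSERT INTO @ZIPLIST
SELECT * FROM tblZip3DSWed;

Since the variable is re-declared on every run, the DELETE FROM @ZIPLIST step is no longer needed.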