I have a simple data flow task that has an OLE DB source and an OLE DB destination.
The source is a SQL query that returns 200m records, which are then written straight out to a table. I have used a data flow task so that I can chunk up the inserts rather than using an INSERT INTO ... SELECT FROM.
I have run it multiple times and it hangs once it reaches 18,961,020 records.
There is no locking on the database, and I have even restarted the SQL instance to ensure that nothing else was contending for resources.
I have changed the buffer size and rows per buffer to 100m and 100,000. Now it hangs before the 18.9m mark, presumably because of the increased buffer size.
I notice that the SELECT statement continues to clock up I/O and CPU cycles, but the BULK INSERT process has gone.
I have an SSIS (CTP June 2005) package with several data flow tasks. One data flow has 2 OLE DB data sources which are unioned together, followed by a conditional split, then a derived column transformation, which feeds into an OLE DB destination. When I run the package, this particular data flow seems to run fine and then just stops. There are no errors and the record counts don't move. I've used data viewers to look at the data and it seems fine. I've swapped the OLE DB destination for a flat file and execution runs fine, then switched back to OLE DB and it hangs again (over and over). There's nothing unusual about the OLE DB destination, and all the other OLE DB destinations work fine. It also seems to hang at the same destination row count, but that row has been verified as valid. In fact, I dropped the DB table and recreated it with no constraints and all fields nullable, but the problem persists. Help?
We're experiencing a problem where our SSIS packages intermittently hang. There are no log errors or events in the event viewer. It happens whether the package is executed from SQL Server Agent or run from BIDS. When running from BIDS it appears to hang inside one of the data flows (several parallel pipes with sorts, merge joins etc.). It appears to hang in multiple pipes within the data flow component. The problem is reproducible: we just kill it and re-run, and it appears to hang in the same places.
Now here's the odd thing: if we simply open and close some of the components in the pipeline after the place it hangs, a subsequent run will go further in the pipeline before hanging. If we open and close all the components after the point where it initially hung, the data flow will run fine from there on out. When I say "open and close" I mean no changes are made; we simply double-click the component, like a merge join, then click 'Close'.
To me this does not seem like a memory problem but likely something is wrong with the metadata, where opening a component and closing it somehow alters the metadata to "right it".
This seems to occur intermittently after we make modifications to the package. It's like if you make any mod, even unrelated to the data flow, you then have to go through and open and close every component in your package to ensure it will work. Again, no errors or warnings are fired.
Hello, I am new to SSIS. I am trying to write a simple package to export data from some SQL 2005 tables and into a flat file. In my data flow, I am using the OLE-DB data source and then the flat file destination.
This all works fine except that I can't get the package to write the columns out in the order I want. Even when I drive the OLE-DB source by a query, the columns are written to the flat file in a different order than I want.
How is SSIS determining what order to write the columns in and, more importantly, how can I change it to do it in the order I want? Please help if you can. As mentioned, I am new to SSIS, so please give clear, simple answers.
In Excel they are showing Sale as -ve quantity and purchase as +ve quantity.
The database has quantity always as +ve figure and a separate column "isPurchase" set to true or false depending on whether purchase or sale.
So I need a Derived Column to return a boolean (or int) depending on whether the quantity is positive or negative. I tried each of the following in the Expression, but all of them were invalid expressions.
CASE WHEN [TotalQuantity] > 0 THEN 0 ELSE 1 END
If [TotalQuantity] > 0 THEN 1 ELSE 0 END
IIF ([TotalQuantity] > 0, 1,0)
Can anyone help me with the correct syntax, or the correct Data Flow transformation if Derived Column is the wrong one?
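A hedged note on the three attempts above: the SSIS expression language has no CASE, IF, or IIF; it uses the C-style conditional operator, so something along the lines of `[TotalQuantity] > 0 ? 1 : 0` is probably what the Derived Column needs, with `ABS([TotalQuantity])` for the always-positive quantity. The same logic as a minimal C# sketch follows. The class and method names are hypothetical and are not part of SSIS, and treating a zero quantity as a sale is an assumption.

```csharp
using System;

// Hypothetical helper (not an SSIS API): mirrors the conditional-operator
// logic a Derived Column expression would use.
public static class QuantitySplitter
{
    // Excel convention from the question: purchases are positive, sales negative.
    // Returns 1 for a purchase and 0 for a sale (zero is treated as a sale here,
    // which is an assumption).
    public static int PurchaseFlag(decimal signedQuantity)
    {
        return signedQuantity > 0 ? 1 : 0;
    }

    // The database stores the quantity as a positive figure.
    public static decimal StoredQuantity(decimal signedQuantity)
    {
        return Math.Abs(signedQuantity);
    }
}
```

Calling `QuantitySplitter.PurchaseFlag(-10m)` gives the flag for a sale row, and `QuantitySplitter.StoredQuantity(-10m)` gives the positive quantity to store alongside it.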
Has anyone done this? I can't find anything in the documentation that describes this. The closest I get is to the InnerObject property of the TaskHost class. There is an example of programming a bulk insert task. But I can't find anything on programmatically setting the column mappings (source to dest) of a simple data flow task. Any help is appreciated!
I've built a simple custom data flow transformation component following the Hands On Lab (http://www.microsoft.com/downloads/details.aspx?familyid=1C2A7DD2-3EC3-4641-9407-A5A337BEA7D3&displaylang=en) and the Books Online (ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.SQL.v2005.en/dtsref9/html/adc70cc5-f79c-4bb6-8387-f0f2cdfaad11.htm and ms-help://MS.MSDNQTR.v80.en/MS.MSDN.v80/MS.SQL.v2005.en/dtsref9/html/b694d21f-9919-402d-9192-666c6449b0b7.htm).
All it is supposed to do is create an output column and set its value to the result of calling a web service method (the transformation is synchronous). Everything seems fine, but when I run the data flow task that contains it, it doesn't generate any output. The Visual Studio debugger displays it as yellow, with 1,385 rows going into it, but the data viewer attached to its output is empty. The output metadata looks just like I expect: all of my input columns plus the new column, correctly typed. No validation or run-time warnings or errors are reported.
I'll include the entire C# file below, which only overrides the ProvideComponentProperties, Validate, PreExecute, ProcessInput, and PostExecute methods of the parent PipelineComponent class.
Since this is effectively a specialization of the DerivedColumn transformation, could I inherit from the class that implements the DC component instead of PipelineComponent? How do I even find out what that class is?
Thanks! Here's the code:

using System;
// using System.Collections.Generic;
// using System.Text;

using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

namespace CustomComponents
{
    [DtsPipelineComponent(DisplayName = "GID", ComponentType = ComponentType.Transform)]
    public class GidComponent : PipelineComponent
    {
        /// <summary>
        /// Column indexes for faster processing.
        /// </summary>
        private int[] inputColumnBufferIndex;
        private int outputColumnBufferIndex;

        /// <summary>
        /// The GID web service.
        /// </summary>
        private GID.WS_PDF.PDFProcessService gidService = null;

        /// <summary>
        /// Called to initialize/reset the component.
        /// </summary>
        public override void ProvideComponentProperties()
        {
            base.ProvideComponentProperties();

            // Remove any existing metadata:
            base.RemoveAllInputsOutputsAndCustomProperties();

            // Create the input and the output:
            IDTSInput90 input = this.ComponentMetaData.InputCollection.New();
            input.Name = "Input";
            IDTSOutput90 output = this.ComponentMetaData.OutputCollection.New();
            output.Name = "Output";

            // The output is synchronous with the input:
            output.SynchronousInputID = input.ID;

            // Create the GID output column (16-character Unicode string):
            IDTSOutputColumn90 outputColumn = output.OutputColumnCollection.New();
            outputColumn.Name = "GID";
            outputColumn.SetDataTypeProperties(Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType.DT_WSTR, 16, 0, 0, 0);
        }

        /// <summary>
        /// Only 1 input and 1 output with 1 column is supported.
        /// </summary>
        public override DTSValidationStatus Validate()
        {
            bool cancel = false;
            DTSValidationStatus status = base.Validate();
            if (status == DTSValidationStatus.VS_ISVALID)
            {
                // The input and output are created above and should be exactly as specified
                // (unless someone manually edited the persisted XML):
                if (ComponentMetaData.InputCollection.Count != 1)
                {
                    this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component accepts 1 Input.", string.Empty, 0, out cancel);
                    status = DTSValidationStatus.VS_ISCORRUPT;
                }
                else if (ComponentMetaData.OutputCollection.Count != 1)
                {
                    this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component provides 1 Output.", string.Empty, 0, out cancel);
                    status = DTSValidationStatus.VS_ISCORRUPT;
                }
                else if (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1)
                {
                    this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component Output must be 1 column.", string.Empty, 0, out cancel);
                    status = DTSValidationStatus.VS_ISCORRUPT;
                }
                // And the output column should be a Unicode string:
                else if ((ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].DataType != DataType.DT_WSTR) ||
                         (ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].Length != 16))
                {
                    ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata: component Output column data type must be (DT_WSTR, 16).", string.Empty, 0, out cancel);
                    status = DTSValidationStatus.VS_ISBROKEN;
                }
            }
            return status;
        }

        /// <summary>
        /// Called before executing, to cache the buffer column indexes.
        /// </summary>
        public override void PreExecute()
        {
            base.PreExecute();

            // Get the index of each input column in the buffer:
            IDTSInput90 input = ComponentMetaData.InputCollection[0];
            inputColumnBufferIndex = new int[input.InputColumnCollection.Count];
            for (int col = 0; col < input.InputColumnCollection.Count; col++)
            {
                inputColumnBufferIndex[col] = BufferManager.FindColumnByLineageID(input.Buffer, input.InputColumnCollection[col].LineageID);
            }

            // Get the index of the output column in the buffer:
            IDTSOutput90 output = ComponentMetaData.OutputCollection[0];
            outputColumnBufferIndex = BufferManager.FindColumnByLineageID(input.Buffer, output.OutputColumnCollection[0].LineageID);

            // Get the GID web service:
            gidService = new GID.WS_PDF.PDFProcessService();
        }

        /// <summary>
        /// Called to process the buffer:
        /// Get a new GID and save it in the output column.
        /// </summary>
        public override void ProcessInput(int inputID, PipelineBuffer buffer)
        {
            if (!buffer.EndOfRowset)
            {
                try
                {
                    while (buffer.NextRow())
                    {
                        // Set the output column value to a new GID:
                        buffer.SetString(outputColumnBufferIndex, gidService.getGID());
                    }
                }
                catch (System.Exception ex)
                {
                    bool cancel = false;
                    ComponentMetaData.FireError(0, ComponentMetaData.Name, ex.Message, string.Empty, 0, out cancel);
                    throw new Exception("Could not process input buffer.");
                }
            }
        }

        /// <summary>
        /// Called after executing, to clean up.
        /// </summary>
        public override void PostExecute()
        {
            base.PostExecute();

            // Resign from the GID service:
            gidService = null;
        }
    }
}
I have a master securities table which has 7 fields. As part of the daily process I am uploading flat files into database tables. The flat files contain the master (static) security data as well as the analytics (transaction) data. I need to:
1) separate the master (static) data from the flat files,
2) check whether that data is present in the master table, if not then insert that data into the master table
3) if the data is present, move the existing record to a history table and then update the main master table.
All the 7 fields need to be checked to uniquely identify a single record in the master table.
How can this be done? Can we use a combination of data flow items, or should we write a SQL procedure to do all this?
I need to pass a parameter from the control flow to a data flow. The data flow will use this parameter to get data from an Oracle source.
I have an Execute SQL task in the control flow that assigns a value to the parameter. The next step is a data flow, which needs to take the parameter in the SQL statement that queries the Oracle source.
The SQL Looks like this:
select * from ccst_acctsys_account
where to_char(LAST_MODIFIED_DATE, 'YYYYMMDD') >?
The problem is that the OLE DB source editor doesn't have anything for mapping parameters.
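One workaround that is often suggested when the provider cannot map `?` parameters is to build the whole statement in a string variable and set the OLE DB source's data access mode to "SQL command from variable". Below is a minimal C# sketch of just the string-building step, assuming the value coming from the control flow is a date. `AccountQueryBuilder` and `BuildQuery` are hypothetical names; in a real package the same concatenation would more likely live in a variable expression or a Script Task.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not an SSIS API): produces the statement that would be
// stored in the package variable read by "SQL command from variable".
public static class AccountQueryBuilder
{
    public static string BuildQuery(DateTime lastModifiedAfter)
    {
        // Format the cutoff the same way the query formats LAST_MODIFIED_DATE,
        // so the string comparison in the WHERE clause lines up.
        string cutoff = lastModifiedAfter.ToString("yyyyMMdd", CultureInfo.InvariantCulture);

        // The literal is a formatted date, not free text, so plain
        // concatenation is acceptable in this sketch.
        return "select * from ccst_acctsys_account " +
               "where to_char(LAST_MODIFIED_DATE, 'YYYYMMDD') > '" + cutoff + "'";
    }
}
```

The design choice here is to keep the query text identical to the original and only substitute the literal, so the result can be pasted into the Oracle client to check it before it goes into the package.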
I have a For Each Loop container, and inside the container I have an Execute SQL task, which runs first, and then I need to kick off 5 Data Flow Tasks. Do I need to connect the 5 DFTs to each other using the green arrows (pipelines)? How would you usually do this? Thx.
I have an Execute SQL Task that returns a Full Rowset from a SQL Server table and assigns it to a variable objRecs. I connect that to a foreach container with an ADO enumerator using objRecs variable and Rows in first table mode. I defined variables and mapped them to the columns.
I tested this by placing a Script task inside the foreach container and displaying the variables in a messagebox.
Now, for each row, I want to write a record to an MS Access table and then update a column back in the original SQL Server table where I retrieved the data in the Execute SQL task (I have the primary key). If I drop a Data Flow Task inside my foreach container, how do I pass the variables as input to an OLE DB Destination in the Data Flow?
Also, how would I update the original source table where source.id = objRecs.id?
Thank you for your assistance. I have spent the day trying to figure this out (and thought it would be simple), but I am just not getting SSIS. Sorry if this has been covered.
Dear all! My package has a Data Flow Task. In the Data Flow Task, I use a Script Component and an OLE DB Destination to load data from a txt file into a database. From within the Data Flow Task, I want to call a File System Task to move the file to a folder, or call any other task from the "Control Flow" tab. Does SSIS support this? If so, please show me how. Thanks.
I'm currently setting variables at the package level with an ExecuteSQL task. This works fine. However, I'm now starting to think about restartability midway through a package. It would be nice to have the variable(s) needed in a data flow set within the data flow so that I only have to restart that task.
Is there a way to do that using an SQL statement as the source of the value in a data flow?
OR, when using checkpoints will it save variable settings so that they are available when the package is restarted? This would make my issue a moot point.
Hi all! I recently started working with SSIS and one of the things that is puzzling me the most is what's the best way to go:
1) A small control flow, with large data flow tasks
2) A control flow with more, but smaller, data flow tasks
Any help will be greatly appreciated. Thanks, Ricardo
Why isn't there some documentation on how to do this? This should be really simple, and it has taken me 2 weeks and I still haven't gotten an answer. Please help! Does anyone know the answer, or some place where there is some documentation?
I get the following error when I try to substitute the strings in the databasedetails collection with variables: Error: Object reference not set to an instance of an object. StackTrace: at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.CheckLocalandDestinationStatus(Database srcDatabase, DatabaseInfo dbDetail) at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.TransferDatabasesUsingSpAttachDetach()
I created the following variables:
strDestinationDB = AirCL2Exp_new3
strDestinationDBPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDATAAirCL2Exp_new3_Data.mdf
strDestinationLGPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDATAAirCL2Exp_new3_Data.ldf
strSourceDB = AirCL2Exp
strSourceDBPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDataDataNewAirCL2Exp_Data.mdf
strSourceLGPath = C:Program FilesMicrosoft SQL ServerMSSQL.2MSSQLDataDataNewAirCL2Exp_Log.ldf
I then assigned those variables to the DatabaseDetails collection:
In addition, I also assigned the following to the two DatabaseFiles collection entries. For 0:
DatabaseFileSize = 0
DestinationFilePath = @strDestinationDBPath
FileType = DatabaseFile
SourceFilePath = @strSourceDBPath
SourceSharePath = @strSourceDBPath
Hi, I'm trying to implement an incremental data pull (Oracle to SQL) based on Andy's blog: http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx
My development machine is decent: 1.86 GHz, Intel core 2 CPU, 3 GB of RAM. However it seems the data flow task gets hung whenever I test the package against the ~6 million row source, as can be seen from these screenshots. I have no memory limitations on the lookup transformation. After the rows have been cached nothing happens. Memory for the dtsdebug process hovers around 1.8 GB and it uses 1-6 percent of CPU resources continuously. I am not using fast load to insert new records into my sql target table. (I am right clicking Sequence Container 3 and executing this container NOT the entire package in the screenshots)
The same package works fine against a similar test table with 150k rows. http://i248.photobucket.com/albums/gg168/boston_sql92/7.jpg http://i248.photobucket.com/albums/gg168/boston_sql92/8.jpg
The weird thing is that it only takes 24 minutes for a full refresh of the entire source table from Oracle to the SQL target table. Any hints or advice would be appreciated.
I am wondering whether it is possible to use SSIS to split a data set into a training set and a test set and feed them directly to my data mining models, without saving them somewhere first, since that would take up too much space. I really need guidance on this.
I am working on importing an Excel workbook, saved as multiple CSV flat files, that has both group level data and related detail row on the same sheet. I have been able to import the group data into a table. As part of the Data Flow task, I want to be able to save the key value for the group, which I will use when I insert the detail rows.
My Data Flow has the following components: The flat file with the data, which goes to a derived column transformation to strip out extraneous dashes, which leads to the OLEDB Destination component.
I want to save the value as a package level variable, so that I can reference it in another dataflow.
Is this possible, and if so, at what point do I save the value?
Hi! I am developing a CF 2.0 application for WM 6.0. In the application I replicate between a SQL Server 2005 database and a SQL Server 2005 Compact Edition database. When I synchronize the databases for the first time, i.e. create a new database with AddSubscription(AddOption.CreateDatabase), I cannot save after the synchronization. The synchronization works fine and I get the right data on my device, but when I try to save, the database hangs during the commit().
If, on the other hand, I restart the application and then call ReinitializeSubscription(true), i.e. do not create a new database with AddSubscription(AddOption.CreateDatabase), and call Synchronize(), everything works fine. Does anyone have an explanation for this? (I call Dispose() each time.)
Hello, I have not been able to locate information on the following problem. The first step I have in a package (an Execute SQL task) deletes the data from an MS Access database table. The package hangs at this step after all validation is complete. In the package, once the table data is deleted, it is repopulated in a later step. The deletion step and the repopulation step use the same connection manager.
There is no information in the log about an error. At the time the package ran, there was a lock file on the database with about six users connected. I'm not sure what version of Access the database was created in, but I have 2003 on my machine, and I cannot open the database.
I have several DTS packages that connect to various Oracle databases. One of the databases was recently upgraded from 7.3 to 8i. The other databases were always 8i. Last week I could edit data transfer tasks normally; this week DTS hangs and I have to use Task Manager to kill the process. I can successfully run the packages, I just can't edit them. I have no trouble editing or running packages that connect to databases other than the one recently upgraded. I have tried both OLE DB and ODBC connections with the same results. Does anyone have any ideas on how to fix this?
Every day I have to extract a list of contacts from an Exchange server into a table in our EDW on SQL Server 2005. Is it possible to get the information directly in a data flow, or will I have to develop a Script Task?
I want to export data from SQL Server 2005 to an Excel spreadsheet through a Data Flow Task. I am using OLE DB for SQL Server as the source connection and an Excel connection as my destination. The Excel spreadsheet (2003) exists and has column names in the first row. I don't get any warnings before trying to execute.
While executing the tasks, I got the error Error: 0xC0202025 at Data Flow Task, Excel Destination [427]: Cannot create an OLE DB accessor. Verify that the column metadata is valid. Error: 0xC004701A at Data Flow Task, DTS.Pipeline: component "Excel Destination" (427) failed the pre-execute phase and returned error code 0xC0202025.
After analysing, I found that in Data Flow --> Excel Destination --> Advanced Editor for Excel Destination, the default data type for txtRemarks shows as "Unicode string [DT_WSTR]", but it is supposed to be "Unicode text stream [DT_NTEXT]". Even if I change the data type at design time, the change is not accepted.
I need to use a newly installed SSIS component inside an SSIS 2012 project, but in SSDT 2010 I cannot see the SSIS Data Flow Items tab in the Choose Toolbox Items dialog for adding a data source or data destination.
Hi, all experts here, Do we always have to use SCD component for the loading of data into data warehouse to handle changes of rows? I am looking forward to hearing from you and thank you very much in advance for your help. With best regards,
I need to call a stored procedure to insert data into a table in SQL Server from an SSIS data flow task. I am currently trying to use an OLE DB Destination, but I am not sure how to map the inputs of the OLE DB Destination to my stored procedure insert. Thanks
I am getting the following error running a data flow that splits the input data into multiple streams and writes the results of each stream to the same destination table:
"This operation conflicts with another pending operation on this transaction. The operation failed."
The flow starts with a single source table with one row per student and multiple scores for that student. It does a few lookups and then splits the stream (using Multicast) in several layers, ultimately generating 25 destinations (one for each score to be recorded), all going to the same table (like a fact table). This all is running under a transaction at the package level, which is distributed to a separate machine.
Apparently, I cannot have all of these streams inserting data into the same table at one time. I don't understand why not. In an OLTP system, many transactions are inserting records into the same table at once. Why can't I do that within the same transaction?
I suppose I can use a UnionAll to join them back together before writing to a single destination, but that seems like an unnecessary waste and clutters the flow. Can anyone offer a different solution or a reason why this fails in the first place?
I need to know what a table's max row Identity is part way thru a data flow. I can't get it at the beginning of the data flow. I need to either (1) add it to the data buffer part way thru or (2) set it into a package variable and then reference the var in a script component.
I've not found a way to add a database column to the data buffer without doing a lookup for each row (too slow and not appropriate here) or some goofy oledb source and then merge join into the data buffer on a contrived join.
I've read questions about referencing package vars in scripts but I can't get that to work. DTS.Variables("varname").Value isn't recognised when I code it up.
Anyone have an idea or solution for either one of these? If you're gonna explain the script code, please include the entire snippet, including the INCLUDEs, etc.
We have an install of SQL MDS 2016 CTP3 which I can access via the web browser and create models etc. I have uninstalled the Excel add-in for 2012 and installed the MDS 2016 CTP3 add-in on Excel 2010. I am able to create an MDS connection and the "Test" reports success in Excel.
However, when I try to connect to a model via the Master Data Explorer to load an entity, Excel hangs with the message "Loading objects for xxxx_Model". My only option is to close Excel via the Task Manager.
Where do I look for more detailed logs? Is there a switch I can use to provide a better debugging experience?
I am new to SSIS programming, so bear with me if my question seems naive to you gurus. I need to set the data source for a data flow from an external .NET application ('external' means that the application runs in a different process from SSIS). I am trying to supply the data the data flow works on from my C# application, in DataSet format. Ideally I would not save the DataSet to a file on the hard disk (I know that will work, but it has the overhead of writing, reading and managing the temp file). What I want to achieve is that the business logic of picking data for the SSIS Data Flow to process is controlled inside my C# application, and the Data Flow just does what it does best: transformation. Have any of you successfully done this before? Thanks!