Dates In Slow Changing Dimension Dataflow Transformations Component
Mar 31, 2008
I€™m trying to populate a table with fields of date type [DT_DATE] using the Slow Changing Dimension Transformation component. When I add the date fields to the component it would not build the stream. The wizard fails and tells me the date fields are not of the same type. The fields in the destination table are of type €œdate€? and the input columns are of type [DT_DATE]. Am I missing something?
Has anyone written or come across a routine that validates start and enddate in a slow changing dimension? (eg verifies there is no overlap in any of the records).
I have a dimension table with effective dating; I'm loading historical transactional data and want to associate the correct surrogate key from the dimension with the fact table transaction. My dimension table has a start date but no end date, the end date is assumed by the start date of the next record with the same id. So I want the surrogate key which is the most current but whose date is not before the transaction date. I know I need a subquery but not too sure how to write it.
Is there a way to change the data source for a SCD Component without having to go back to reinsert the matches for Source and Destination columns. Note the underlying data table hasn't changed, just the server the table resides on. Whenever I change the data source I am noticing that I have to painfully go back and match columns one by one.
I am using the Ralph Kimball approach of having DIM_Effective_date, DIM_Expirey_date and Current_Flag.
When I use the SCD transformation it works very well for populating the Effective and Expiry dates but how do I make it insert an 'O' for the Current Flag (for expired records) and a 'C' for non expired dimension records?
I would like to know whether it is possible to add and updated date column in a slow changing dimension table using the slow changing dimension data flow transformation.
I would like to keep track of what record is updated in the dimension table based on the data being processed.
Hi all, In my DataFlow i set the "OLEDB Source" which is a table in my Extract Server and need to do some transformations and stage the table which will be a Dimension in the staging DB,
Q1-Now i need only 3 columns from the Source table, which transformation do i need to use to just extract the the 3 columns?
Q2- Two Columns of 3,which i will need to transform as it is-no changes at all and One of the column which has values like "BOSTON...." (I have a vague idea of what i need to do,need something solid suggestions/advices to kickoff,plan is to use this city column with a Replace function (as one of the forum member's Spirit1 adviced..thanks..!!))to take out the dots and then need to write a condition if BOSTON then Assign Code "BOS" which will be City_Code and this "City_Code" will have to be looked in City_Dimension to get the "City_Key_Number" for "Boston" and lastly the City_Code and City Key Number both have to be transformed to the destination Dimension.
I'm trying to find if there is a combination of dataflow transformations that will produce the following result
SELECT
period,
project,
task,
employee = CASE
when empid in (SELECT DISTINCT empid FROM EmpTable) then empid
else 'Deleted Employee'
end
FROM ProjectTable
I know I can create a dataflow task with this query as a data source and then send it to a destination, but I was wondering if that is the best way to do it or if there was a better way to do this using the data transformations available in SSIS.
I am writing a custom dataflow transformation component and I need to get the name of the preceeding component.
I have been trying to find a way to get a reference to the Package object, MainPipe object or IDTSPath90 object (connecting to the IDTSInput90 of my component) from my component because I think from there I can get to the information I want.
I was transfering more that 100,000 records from flat file to sql table
It took about 1 hour.Is this the way it is?????i used oledb command.
As the data passes by i got to insert to several table.Like i insert some of incoming data to one table then get the key from that table and insert rest of the data with the key field from previous table to another table.
In this case i felt OLedb would be best as we can use query.
I cannot use oledb destination as it has only error output(to insert some of incoming data and i want to have a look up to get the key but oledb des has only error output)
i cannot use sql destination as the database is sql server 2000.It dosent let me.
How can i increase the performance????Please let me know
This is trivial I'm sure but I'll be dogged if I can find someone who mentions how to do it. I am attempting to develop a Data Flow Transformation that appends a new column (a string value) into the current stream.
I have found plenty of references on how to replace an existing column but I'd really like to just add my new column in there. It doesn't need to be configurable, it can be a static column name. I'll take a solution that allows the column name to be set at design time, don't get me wrong but the magic I'm looking for is how to implement a new column in a stream.
Yes, I am well aware of the derived column task but I will be replacing a few hundred instances and I'd much rather just drag an item onto the designer than to drag a derived column, double click it, type in the column name, set the expression and then set the datatype, etc.
Anyone spare a moment to enlighten me?
Pardon the lack of formatting, this BB doesn't play with Opera (I know, I'm a heretic)
using System; using System.Collections; using System.Runtime.InteropServices; using Microsoft.SqlServer.Dts.Pipeline; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; using Microsoft.SqlServer.Dts.Runtime;
namespace Microsoft.Samples.SqlServer.Dts { [ DtsPipelineComponent ( DisplayName = "Nii", Description = "This is the component that says Nii.", ComponentType = ComponentType.Transform ) ] public class Nii : PipelineComponent {
public override void ProcessInput(int inputID, PipelineBuffer buffer) { if (!buffer.EndOfRowset) { while (buffer.NextRow()) { try { // do something here to } catch (Exception e) { ComponentMetaData.FireInformation(0, ComponentMetaData.Name, "There was an error on row " + buffer.CurrentRow.ToString() + ". The error is: " + e.Message + " : " + e.Source + " : " + e.StackTrace, "", 0, ref fireEventAgain); } } } } }
I can't find anything on how to get to a global variable in a script component in the dataflow. I can get to it in a script task with no problem by using dts.variables but i doesn't appear you can do the dts variables in the script component.
I did add it to the readwrite variable list but I haven't been able to access it.
I allowed the SQL Destination Editor to design my table from Output of a flat text file. Everything was varchar(50), but that was cool, because I got to see the data in the new staging tables it created. I went back and tweaked the data types and sizes for various columns to be more appropriate through the table designer in SQL Server Management Studio.
After doing so I get an error trying to edit the package, specifically parts of the destination in the data flow. I get the error "An error occurred due to no connection. A connection is required when requesting metadata... blah blah". I pick the TEST CONNECTION in the Connection Manager, and it works fine.
I am sure this is probably a basic issue of mechanics of use that I don't yet know because I am completely new to SSIS. Can someone please provide a hint, perhaps what I did wrong, and also, if you can see it how to redeem the error. Thanks!
I have a custom component that takes in unicode stream and converts it to ascii text. However I would like to make my default string length and code page editable in the standard GUI editor. Right now I can set the default to 1000 characters, but when I try to change it, it says "Property value is not valid"
I've created a stand alone custom dataflow component in VB and I need to set up the connection to the Input and Output components and instantiate it. The only way I've seen this done is to create an entire package and Task then use the TaskHost wrapper object to instantiate the Mainpipe (IDTSPipline90 interface) so that you can create the IDTSPath90 interface and setup the connection to the input and output components...
After all that, all that I would like to know is whether it is possible to instantiate the mainpipe interface without creating a package programmatically? I've seen something Darren Green put in an answer to a thread, about accessing the Mainpipe interface in the UI, to the effect that you can access it through IServiceProvider using the interface IDTSPipelineEnvironmentService - I think that's it... But I'd like to know if there is a more straightforward route to instantiating and accessing the Task or data flow directly?
I am developing SSIS dataflow component. Extended user interface is based on class IDtsComponentUI. Connection properties are created in both standart and my extended editor (Extended user interface). To set up designtime connection I use standart and my extended editor.
A main PipelineComponent component have two runtime connection:
to set up IDTSRuntimeConnection90[0] connection using connections from a current package. After this operation the RuntimeConnectionCollection[0] is not null within the method PipelineComponent.AcquireConnections((object transaction)). ! But during next launching of Extended user interface the RuntimeConnectionCollection[0] is null within the method PipelineComponent.AcquireConnections((object transaction)). Why do I lose the connection? And why the connections which set up in my Advanced editor do not save in standart editor?
I have implemented a custom source component that can be used as the data source in the Data Flow task.
I have also created a custom UI for this component by using the IDtsComponentUI .
But my component does not have the capability of setting the custom properties via the DTS Variables using the Expression Builder.
I have looked around for samples on how to do this, but I can only find samples of how to do this for custom Control Tasks, i.e. IDtsTaskUI.
My question is, How can implement the Expression Builder in my custom Source component + custom Source UI. Or do you know of any samples which I can look at.
I've run into a problem building a new olap cube. It's taking 15 minutes or more when pulling on a dimension before I've even pulled on any measures. This happens in Excel as well as Management Studio. When it does display it seems to only show dimension members that it has data for. In addition, running profiler seems to indicate that it queries each of my partitions when it's doing this. I know that in my last cube if you pulled on a dimension, no members would show up yet and it would be fast. However I can't seem to find the property that's telling it not to query the partitions to determine what members to show. Any ideas?
I have one question regarding Slowly Changing Dimension component in SSIS. Does SCD also delete records in warehouse if they does not exist in source anymore, or does SCD only insert new and update existing records? Can someone explain me a little bit more about inferred members? Thanks.
This Table has the same granularity as the fact table as it’s one row per booking.However due to the nature of the data I would not want to incorporate this into the fact table.The Originating and Destination addresses are populated for each booking and are required for reporting.
Question:Should this be moved into a fast changing Dimension table.? or would there be a better way to incorporate this data.
I have a Type I SCD situation, ie, insert if new (by checking the business ID) and update if any attributes for a given business ID has changed.
The way I usually do this, (and I believe this is how most people do it), is I use a LOOKUP TASK to determine if the business ID exsist in the target table. If it doesn't then I insert. If the business ID exists, then I bring back the associated attributes and use a CONDITIONAL SPLIT TASK to compare if any of the incoming attributes are different. If there are changes, then update.
In doing this comparision, I often run into situations where I end up comparing a NULL value to something, which does not result in FALSE, but a NULL result. To get around this, I first check for NULLs and convert them into something valid before I do the comparision, but this results in a messy comparison expression, especially if I have to compare a lot of attributes.
So, how do you guys handle this?
As an alternative, I am looking into the SLOWLY CHANGING DIMENSION TASK, which I also have some questions on, but I would like to first address the above. Thank you.
I have a package using Slowly changing dimension in the data flow task. It works fine if the number of records are less but for a large file the package fails with the "Violation of Primar Key" error even though there are no duplicate records in the table. for eg i have a table with employee database with a composite primary key comprsing of Name and Employee Id. I need to do an UPSERT depending on the Name and Employee Id combination. I have a file with 100,000 records and when i try to execute the package it gives an error 'cannot insert duplicate data' even though the combination does not exist in the database.
We have been using tasks generated from the SCD wizard. We have smaller dimensions (< 30,000 rows) that work well. Our Product Dimension package is giving us performance problems (taking 7 hours to do 600,000 rows when 80,000 records are updated; the rest new inserts). It is similar to the smaller dimensions. Several columns are type 1 and are doing update statements; several are type 2 doing updates and inserts. The package had a complicated view as the initial task, but we have since modified to use a SQL command with variable and now the initial read appears quick, but is chunking in 10,000 record increments and taking the 7 hours (never let finish previously). So the package is pretty basic now (reading a source, a small derive and data conversion, a small lookup (cached 30,000 records) for a description, then the SCD). Before I start replacing what the SCD generates with stored procedures, anyone have any suggestions as to what might be the issue? We believe we have increased the number of type 2 columns and the SCD definately has more to do than just an insert or update, but 7 hours for 600,000 records seems excessive. Interestingly, the source task never turns green. Previously when we had a Merge Join it completed the read and bottlenecked at a sort and a Merge Join. Now that has been removed and simplified, and all tasks remain yellow with the 10,000 (actually 9,990 I think) chunks appearing at the source, then the SCD before the next chunk appears to be read. On the general release (not the beta). Thanks in advance!
i am using SCD to insert or update .my source and destination table are Oracle and i am using Orcale OLEDB provider . i am getting the following error while executing the package.what could be the solution
[Slowly Changing Dimension [58]] Error: An OLE DB error has occurred. Error code: 0x80040E5D. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E5D Description: "Parameter name is unrecognized.".
Just wondering if any of you implemented a (Kimball type 2) dimension structure, in which a ParentID column exists which points to a record from the same dimension table, using a SCD objects in SSIS. The ParentID column would have to be "Historical".
The challange here is that you would need to go through the table twice somehow, because if I would do a lookup of the parent record in the first run, I wouldn't be sure if I got the right parent record.
I have a Company Dimension table that consists of various sources. One source will provide me address information, another source will provide industry info, etc. I created a historical load package that will pull all of this together so that I have all the necessary data related to a company in one record. All is well.
Since my company data is coming from various sources, how can I tell the SCD to update certain fields but not others for a type 2 change. In essence, I would like to "pull forward" the data that was in the original database row and then update it with only the changes coming from the proper source data. For example, if an address changed I will get the new address from the source but will not have the industry info. I would like to create the new record with the new address but also keep the industry data in tact. Is this possible?
Currently I will get the new record with the new address but will have null values for the industry data.
I am trying to move data from a transactional database to a data warehouse using a slowly changing dimension. The transactional data comes from a view in SQL server that takes <60 seconds to run and returns about 60k rows. The warehouse table is currently 80k rows long (and growing), and contains 7 historical (type 2) dimensions. When I execute the package in BIDS the DataFlow Task begins to execute, and shows that between 20k and 30k rows have been pulled from the data source into the SCD Transform in the first hour before it simply stops doing anything. This is not to say execution stops; it continues. There is no error thrown. No warning given. System resources are 98% free. The database is not being hit at all. And yet, I have let the package sit 'still' as it were for over 8 hours, and nothing ever happens.
4/8/2008 9:36 4/8/2008 9:36 PrimeOutput will be called on a component. : 1715 : Union All 4/8/2008 9:36 4/8/2008 9:36 A component has returned from its PrimeOutput call. : 1715 : Union All 4/8/2008 9:36 4/8/2008 9:36 PrimeOutput will be called on a component. : 2912 : Staged Queues 4/8/2008 9:36 4/8/2008 9:36 Rows were provided to a data flow component as input. : : 2970 : DataReader Output : 70 : Slowly Changing Dimension : 81 : Slowly Changing Dimension Input : 9947 4/8/2008 9:37 4/8/2008 9:37 A component has returned from its PrimeOutput call. : 2912 : Staged Queues 4/8/2008 9:37 4/8/2008 9:37 A component has returned from its PrimeOutput call. : 2912 : Staged Queues 4/8/2008 9:59 4/8/2008 9:59 Rows were provided to a data flow component as input. : : 1718 : New Output : 1715 : Union All : 1716 : Union All Input 1 : 3825 4/8/2008 9:59 4/8/2008 9:59 Rows were provided to a data flow component as input. : : 1688 : Historical Attribute Inserts Output : 1682 : Get End Date : 1683 : Derived Column Input : 645 4/8/2008 9:59 4/8/2008 9:59 Rows were provided to a data flow component as input. : : 1702 : Derived Column Output : 1692 : Update End Date : 1697 : OLE DB Command Input : 645 4/8/2008 10:01 4/8/2008 10:01 Rows were provided to a data flow component as input. : : 1759 : OLE DB Command Output : 1715 : Union All : 1758 : Union All Input 2 : 645 4/8/2008 10:01 4/8/2008 10:01 Rows were provided to a data flow component as input. : : 2970 : DataReader Output : 70 : Slowly Changing Dimension : 81 : Slowly Changing Dimension Input : 9947 4/8/2008 10:24 4/8/2008 10:24 Rows were provided to a data flow component as input. : : 1718 : New Output : 1715 : Union All : 1716 : Union All Input 1 : 3859 4/8/2008 10:24 4/8/2008 10:24 Rows were provided to a data flow component as input. : : 1688 : Historical Attribute Inserts Output : 1682 : Get End Date : 1683 : Derived Column Input : 641 4/8/2008 10:24 4/8/2008 10:24 Rows were provided to a data flow component as input. : : 1702 : Derived Column Output : 1692 : Update End Date : 1697 : OLE DB Command Input : 641 4/8/2008 10:26 4/8/2008 10:26 Rows were provided to a data flow component as input. : : 1759 : OLE DB Command Output : 1715 : Union All : 1758 : Union All Input 2 : 641 4/8/2008 10:26 4/8/2008 10:26 Rows were provided to a data flow component as input. : : 2970 : DataReader Output : 70 : Slowly Changing Dimension : 81 : Slowly Changing Dimension Input : 9947 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1718 : New Output : 1715 : Union All : 1716 : Union All Input 1 : 3969 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1688 : Historical Attribute Inserts Output : 1682 : Get End Date : 1683 : Derived Column Input : 662 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1702 : Derived Column Output : 1692 : Update End Date : 1697 : OLE DB Command Input : 662 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1793 : Union All Output 1 : 1787 : Get Start Date : 1788 : Derived Column Input : 9947 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1814 : Derived Column Output : 1797 : Insert Destination : 1810 : OLE DB Destination Input : 9947 4/8/2008 15:34 4/8/2008 15:34 The pipeline received a request to cancel and is shutting down. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown or an error in another thread is causing the pipeline to shutdown. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown or an error in another thread is causing the pipeline to shutdown. 4/8/2008 15:34 4/8/2008 15:34 The pipeline received a request to cancel and is shutting down. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" has exited with error code 0xC0047039. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" has exited with error code 0xC0047039.
Notice the time difference between the last OnPipelineRowsSent event and the first OnError event (when I clicked the stop button): 5 hours! In all that time, SSIS did not log a single event, or use more than 2% of my processor or exceed 1GB page file or hit the database even once! I am assuming this means it is simply not doing anything. It is not failing, nor is it executing, it is just sitting there.
Has anyone experienced a similar problem? Does anyone know how I might troubleshoot this? Thanks in advance for any help, and let me know if I need to clarify. Also, I am new to SSIS, so if I am missing something obvious, go easy on me! Thanks.
The warehouse I am writing packages for has sort of a "Type 1.5" design for most of its DIMs that I am trying to get to work with the slowly changing dimension object.
Basically it should behave like a type 1 with updates in place BUT send the old prior rows/values to an "archive" server to hold the historical data. Unlike a Type 2 this data will not be used for any processes - but it needs to be kept for historical reseach and auditing.
Any ideas to easily do with the SCD wizard? I thought using the wizard as a type 1 and then after the wizard is done attaching the "Historical Attribute Inserts Output" to the archive db/table would do the trick but that output from the SCD object never has data. I could manually do it with a lookup and so forth but I thought I'd check in here first to see if I am just overlooking something with the SCD object.