Slowly Changing Dimension - Always Shows Rows As Changed?!
Aug 29, 2006
I created a simple type 1 slowly changing dimension, setting all the columns to "Changing Attibute". The first time I run the package it sees all rows as new and imports them into the dim as it should. Next time I run it put 100% of the rows into the "Changing Attribute Updates" and runs an update on all 90,000 rows - updating the rows to exactly what they were before
If I take the DIM and the Source in SQL and join on every row the join succeeds (meaning the rows match perfectly).
Shouldn't the SCD object just ignore the rows if they match? Or does it assume that ALL incoming rows are either new or changed? (if so why is there an output called "Unchanged Output"?). Is there some "gotcha" I am missing??
We have been using tasks generated from the SCD wizard. We have smaller dimensions (< 30,000 rows) that work well. Our Product Dimension package is giving us performance problems (taking 7 hours to do 600,000 rows when 80,000 records are updated; the rest new inserts). It is similar to the smaller dimensions. Several columns are type 1 and are doing update statements; several are type 2 doing updates and inserts. The package had a complicated view as the initial task, but we have since modified to use a SQL command with variable and now the initial read appears quick, but is chunking in 10,000 record increments and taking the 7 hours (never let finish previously). So the package is pretty basic now (reading a source, a small derive and data conversion, a small lookup (cached 30,000 records) for a description, then the SCD). Before I start replacing what the SCD generates with stored procedures, anyone have any suggestions as to what might be the issue? We believe we have increased the number of type 2 columns and the SCD definately has more to do than just an insert or update, but 7 hours for 600,000 records seems excessive. Interestingly, the source task never turns green. Previously when we had a Merge Join it completed the read and bottlenecked at a sort and a Merge Join. Now that has been removed and simplified, and all tasks remain yellow with the 10,000 (actually 9,990 I think) chunks appearing at the source, then the SCD before the next chunk appears to be read. On the general release (not the beta). Thanks in advance!
Hi , I am using the Integration Services slowly changing dimension to move data from a SQL Server 2000 database table to a SQL Server 2005 table.
The problem is the package is not tracking changes, it is spending a lot of time doing lookups (i'm guessing this cost it's real slow), but ends up creating new records when there has not been a change.
I'm quite sure the business key is set up correctly (I'm using the PK from the source table).
The database I am transferring from has non Unicode data types (ie varchar and char) and the destination database has Unicode data types (ie nvarchar).
Also some of the fields in the dB are NULL - does this have an effect (ie one null doesn't equal another null)? Or shouldn't that matter?
Hi , I am using the Integration Services slowly changing dimension to move data from a SQL Server 2000 database table to a SQL Server 2005 table.
The other problem is the package is not tracking changes it is spending a lot of time doing lookups (it's slow), but ends up creating new records when there has not been a change.
I'm quite sure the business key is set up correctly (I'm using the PK from the source table).
The database I am transferring from has non Unicode data types (ie varchar and char) and the destination database has Unicode data types (ie nvarchar).
Also some of the fields in the dB are NULL - does this have an effect (ie one null doesn't equal another null)? Or shouldn't that matter?
Can you help me to resolve my problem ? I have to do a simple daily backup system. Source : Flat File; Destination : SQL Server. I want to use the Slowly Changing Dimension component to backup only the new and updated row from my source (Flat File) and put them into SQL Server.
But how can I do to detect deleted rows from my source ?
Any suggestions ?
If it's not clear enough, please ask for more details !
I have one question regarding Slowly Changing Dimension component in SSIS. Does SCD also delete records in warehouse if they does not exist in source anymore, or does SCD only insert new and update existing records? Can someone explain me a little bit more about inferred members? Thanks.
I have a Type I SCD situation, ie, insert if new (by checking the business ID) and update if any attributes for a given business ID has changed.
The way I usually do this, (and I believe this is how most people do it), is I use a LOOKUP TASK to determine if the business ID exsist in the target table. If it doesn't then I insert. If the business ID exists, then I bring back the associated attributes and use a CONDITIONAL SPLIT TASK to compare if any of the incoming attributes are different. If there are changes, then update.
In doing this comparision, I often run into situations where I end up comparing a NULL value to something, which does not result in FALSE, but a NULL result. To get around this, I first check for NULLs and convert them into something valid before I do the comparision, but this results in a messy comparison expression, especially if I have to compare a lot of attributes.
So, how do you guys handle this?
As an alternative, I am looking into the SLOWLY CHANGING DIMENSION TASK, which I also have some questions on, but I would like to first address the above. Thank you.
I have a package using Slowly changing dimension in the data flow task. It works fine if the number of records are less but for a large file the package fails with the "Violation of Primar Key" error even though there are no duplicate records in the table. for eg i have a table with employee database with a composite primary key comprsing of Name and Employee Id. I need to do an UPSERT depending on the Name and Employee Id combination. I have a file with 100,000 records and when i try to execute the package it gives an error 'cannot insert duplicate data' even though the combination does not exist in the database.
i am using SCD to insert or update .my source and destination table are Oracle and i am using Orcale OLEDB provider . i am getting the following error while executing the package.what could be the solution
[Slowly Changing Dimension [58]] Error: An OLE DB error has occurred. Error code: 0x80040E5D. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E5D Description: "Parameter name is unrecognized.".
Just wondering if any of you implemented a (Kimball type 2) dimension structure, in which a ParentID column exists which points to a record from the same dimension table, using a SCD objects in SSIS. The ParentID column would have to be "Historical".
The challange here is that you would need to go through the table twice somehow, because if I would do a lookup of the parent record in the first run, I wouldn't be sure if I got the right parent record.
I have a Company Dimension table that consists of various sources. One source will provide me address information, another source will provide industry info, etc. I created a historical load package that will pull all of this together so that I have all the necessary data related to a company in one record. All is well.
Since my company data is coming from various sources, how can I tell the SCD to update certain fields but not others for a type 2 change. In essence, I would like to "pull forward" the data that was in the original database row and then update it with only the changes coming from the proper source data. For example, if an address changed I will get the new address from the source but will not have the industry info. I would like to create the new record with the new address but also keep the industry data in tact. Is this possible?
Currently I will get the new record with the new address but will have null values for the industry data.
I am trying to move data from a transactional database to a data warehouse using a slowly changing dimension. The transactional data comes from a view in SQL server that takes <60 seconds to run and returns about 60k rows. The warehouse table is currently 80k rows long (and growing), and contains 7 historical (type 2) dimensions. When I execute the package in BIDS the DataFlow Task begins to execute, and shows that between 20k and 30k rows have been pulled from the data source into the SCD Transform in the first hour before it simply stops doing anything. This is not to say execution stops; it continues. There is no error thrown. No warning given. System resources are 98% free. The database is not being hit at all. And yet, I have let the package sit 'still' as it were for over 8 hours, and nothing ever happens.
4/8/2008 9:36 4/8/2008 9:36 PrimeOutput will be called on a component. : 1715 : Union All 4/8/2008 9:36 4/8/2008 9:36 A component has returned from its PrimeOutput call. : 1715 : Union All 4/8/2008 9:36 4/8/2008 9:36 PrimeOutput will be called on a component. : 2912 : Staged Queues 4/8/2008 9:36 4/8/2008 9:36 Rows were provided to a data flow component as input. : : 2970 : DataReader Output : 70 : Slowly Changing Dimension : 81 : Slowly Changing Dimension Input : 9947 4/8/2008 9:37 4/8/2008 9:37 A component has returned from its PrimeOutput call. : 2912 : Staged Queues 4/8/2008 9:37 4/8/2008 9:37 A component has returned from its PrimeOutput call. : 2912 : Staged Queues 4/8/2008 9:59 4/8/2008 9:59 Rows were provided to a data flow component as input. : : 1718 : New Output : 1715 : Union All : 1716 : Union All Input 1 : 3825 4/8/2008 9:59 4/8/2008 9:59 Rows were provided to a data flow component as input. : : 1688 : Historical Attribute Inserts Output : 1682 : Get End Date : 1683 : Derived Column Input : 645 4/8/2008 9:59 4/8/2008 9:59 Rows were provided to a data flow component as input. : : 1702 : Derived Column Output : 1692 : Update End Date : 1697 : OLE DB Command Input : 645 4/8/2008 10:01 4/8/2008 10:01 Rows were provided to a data flow component as input. : : 1759 : OLE DB Command Output : 1715 : Union All : 1758 : Union All Input 2 : 645 4/8/2008 10:01 4/8/2008 10:01 Rows were provided to a data flow component as input. : : 2970 : DataReader Output : 70 : Slowly Changing Dimension : 81 : Slowly Changing Dimension Input : 9947 4/8/2008 10:24 4/8/2008 10:24 Rows were provided to a data flow component as input. : : 1718 : New Output : 1715 : Union All : 1716 : Union All Input 1 : 3859 4/8/2008 10:24 4/8/2008 10:24 Rows were provided to a data flow component as input. : : 1688 : Historical Attribute Inserts Output : 1682 : Get End Date : 1683 : Derived Column Input : 641 4/8/2008 10:24 4/8/2008 10:24 Rows were provided to a data flow component as input. : : 1702 : Derived Column Output : 1692 : Update End Date : 1697 : OLE DB Command Input : 641 4/8/2008 10:26 4/8/2008 10:26 Rows were provided to a data flow component as input. : : 1759 : OLE DB Command Output : 1715 : Union All : 1758 : Union All Input 2 : 641 4/8/2008 10:26 4/8/2008 10:26 Rows were provided to a data flow component as input. : : 2970 : DataReader Output : 70 : Slowly Changing Dimension : 81 : Slowly Changing Dimension Input : 9947 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1718 : New Output : 1715 : Union All : 1716 : Union All Input 1 : 3969 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1688 : Historical Attribute Inserts Output : 1682 : Get End Date : 1683 : Derived Column Input : 662 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1702 : Derived Column Output : 1692 : Update End Date : 1697 : OLE DB Command Input : 662 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1793 : Union All Output 1 : 1787 : Get Start Date : 1788 : Derived Column Input : 9947 4/8/2008 10:49 4/8/2008 10:49 Rows were provided to a data flow component as input. : : 1814 : Derived Column Output : 1797 : Insert Destination : 1810 : OLE DB Destination Input : 9947 4/8/2008 15:34 4/8/2008 15:34 The pipeline received a request to cancel and is shutting down. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown or an error in another thread is causing the pipeline to shutdown. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown or an error in another thread is causing the pipeline to shutdown. 4/8/2008 15:34 4/8/2008 15:34 The pipeline received a request to cancel and is shutting down. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" has exited with error code 0xC0047039. 4/8/2008 15:34 4/8/2008 15:34 Thread "WorkThread1" has exited with error code 0xC0047039.
Notice the time difference between the last OnPipelineRowsSent event and the first OnError event (when I clicked the stop button): 5 hours! In all that time, SSIS did not log a single event, or use more than 2% of my processor or exceed 1GB page file or hit the database even once! I am assuming this means it is simply not doing anything. It is not failing, nor is it executing, it is just sitting there.
Has anyone experienced a similar problem? Does anyone know how I might troubleshoot this? Thanks in advance for any help, and let me know if I need to clarify. Also, I am new to SSIS, so if I am missing something obvious, go easy on me! Thanks.
The warehouse I am writing packages for has sort of a "Type 1.5" design for most of its DIMs that I am trying to get to work with the slowly changing dimension object.
Basically it should behave like a type 1 with updates in place BUT send the old prior rows/values to an "archive" server to hold the historical data. Unlike a Type 2 this data will not be used for any processes - but it needs to be kept for historical reseach and auditing.
Any ideas to easily do with the SCD wizard? I thought using the wizard as a type 1 and then after the wizard is done attaching the "Historical Attribute Inserts Output" to the archive db/table would do the trick but that output from the SCD object never has data. I could manually do it with a lookup and so forth but I thought I'd check in here first to see if I am just overlooking something with the SCD object.
Is there a way to change the data source for a SCD Component without having to go back to reinsert the matches for Source and Destination columns. Note the underlying data table hasn't changed, just the server the table resides on. Whenever I change the data source I am noticing that I have to painfully go back and match columns one by one.
I have a problem with the SCD-transformation in SSIS. I have a variable that holds the batchid for the current batch and I want to add this variable to the datapipline in the Data Flow Task.
This is done by using a Derived Column, so far so good. The problem occurs in the Slowly Changing Dimension transformation where I do som evaluations of changed columns BUT I don´t want to do any evaluation of the batchid-variable because then all historical batchid will be updated.
I only want to update the batchid for row that have changed in the current batch.
Is it possible to do this in any way without adding the Derived Column after the SCD transformation??
We're using slowly changing dimensions to control a number of data tables in our system. Each table has five or six business keys, but the indexes of the tables are built so they're as efficient as possible (i.e. the fields with the highest diversity are listed first). How does the SCD wizard determine the order of the business key fields? Is there a way I can view or manipulate the statement the SCD task is using to make sure either (a) the indexes match the statement, or (b) the statement matches the indexes?
On the face of it, the Slowly Changing Wizard seems a great idea, but is it really any good in the longer term? After you have selected your Type 1 and/or Type 2 changing columns ... then you have "customised" the generated data flow and carefully saved it ... how do you change the column definitions in the future?
If I want to add another column and make it Type 2 how do I go about it? If I run the "Wizard" doesn't this just destroy all my previous customisation work?
I have tried looking in the .dtsx xml file ... hmmm. I notice it's not really editable without some inside information. All those magic numbers in there ... I've fallen foul of them in the past with cut/paste or trying to INSERT a data flow task on the IDE ... luckily I back up my projects on a regular basis. I now have quite a large collection of Projects with the suffix "_corrupt". Are things going to get better?
I have a SSIS package which contains a number of slowly changing dimension transformations. While the majority work I have one which gives me the following error 'Error: The variable "System::LocaleID" is already on the read list. A variable may only be added once to either the read lock list or the write lock list. '. This error only occurs if the destination table holds data. If I truncate the table and reload the data then the package complete successful. The only difference I can see between this dimension transformation and the other dimension transforms is that the one in question has 2 business keys while the rest have 1.
We're currently running into an interesting situation where it seems that a Slowly Changing Dimension Transformation believes that 'Å“' (ASCII #156) is the same as 'oe' (ASCII #111 + ASCII #101).
To make a long story short, one of our integration package updates some Product table based on the result of a Slowly Changing Dimension Transformation with three outputs: Unchanged, New and Changing Attribute Updates. Among the columns leading to a row redirection in the Changing Attribute Updates output is some French Description column defined in SQL Server as a varchar(60) (external column) and in the SSIS package as a (DT_STR, 60, 1252) (input column). Now, when the SCD Transformation compares the word 'coeur' (external column) with the word 'cœur' (input column) in this French Description column for a given row, the row is redirected to the Unchanged output (no other columns changed...) instead of to the Changing Attribute Updates output.
Is my example clear enough? Any idea what could explain this unfortunate result? Note that from a strict French point of view, 'Å“' instead of 'oe' is a typographical fantasy and that in my example above, the word 'coeur' ("heart") is really spelled 'c' + 'o' + 'e' + 'u' + 'r', but we're talking programmed comparison here, not linguistic, right?
In our application we have created a SSIS package which extracts data from staging table and places the same in destination table. We have created a slowly changing dimension for the same. Slowly changing dimension uses a composite business key of two columns to decide whether it is a old record or a new record.
Problem : On execution of the package it copies duplicate records with same business keys instead of updating the same. Also the same does not happen for all records. For few records update works fine but for others it inserts a new duplicate record.
I will appreciate if anybody can guide me where I am doing something wrong.
Hello I use SSIS to load a Unicode file into a single table I Use a "slowly changing dimension" task to load the destination table and when i map a column (DT_WSTR) to a column with the datatype nvarchar(max) i have an error message that say that i can't map theses columns because there have not the same datatype.
I find a workaround : i map all my cols except the colums that must fill the cols with datatype nvarchar(max) , and after i modify manually the 2 subtask generated by the "slowly changing dimension" task (the insert and the update) and with this way i don't have error messages It works fine but is it the good way?
I have as simple IS package with a Flat File input and SCD Transform. The package is being used to load a Dimension Table, obviously.
The Table was designed in Analyical Services using the Dimension Wizard which very nicely adds End Date, Start Date, Status and Original ID to the Dimension Table design. I then Generate Relational Scheme... and Viola there is a Dim Table in my data warehouse database!
I next attempt to use the IS package with SCD Transform to load said Dim Table. The SCD Transform Wizard does a nice job of using the End and Start dates and Status fields in my new Dim Table, but I can not find any way to use the Original ID field?
I guess I was expecting that since the SCD Transform Wizard 'knows' how to use all the other fields in this Dim table why not the Original ID field? I know what the Original ID field is for - that's not the issue. I just can not figure out how to 'make' my new IS Data Flow with said SCD Transform use the Original ID field ?
I think slowly changing dimension object is not a good choice to update my dimension. It's running slower than I expected. my dimension records has surrogate keys from a control table that SCD is looking up whenever it encounters a new record.
I have an existing SSIS that uses SCD on a rather large table. I am also migrating to SQL2014 Enterprise. My question is, is it possible to use SCD and Columnstore(CS) together? I know that I can drop the CS and add a business key then insert data. Then drop the business key and add the CS back.
We have successfully built a SSIS ETL implementation for a data mart. Most of our dimension loads are using the slowly changing dimensions task. We successfully built every package and had 1 complete successful load.
We have made a modification to one of the tables that now has 16 mil records in it. I open the dimension's package and it takes 30 sec - 1 minute to validate. When I go to the Data Flow tab and attempt to open the Slowly Changing Dimension task, SSIS will freeze up. I will get no response for hours. I my testing so far SSIS has yet to come back.
My CPU usage is below 5%. devenv.exe memory usage is at 113 Mb. My computer is using 400mb / 1Gb of memory. The status bar at the bottom says "ready". I have let it sit for up to 2 hours with no results. When I kill the process in the Windows Task Manager it ends instantly.
I have also tried deleting the task and adding a new Slowly Changing Dimension Task. When I select the table from the drop down I get the same results. I have noticed this with all my million + record tables that use the slowly changing dimension task but not the smaller tables. I am also seeing the problem when I am in "Work Ofline" mode.
I am accessing a SQL 2005 server over a VPN connection. I am connecting to the server using a Native OLE DB OLE DB for SQL Server connection. My computer is running Windows 2000 SP4, Intel Pentium M1.86 Ghz, 1 Gb of RAM.
Can a Slowly Changing Dimension Task be used on large tables? I can understand the process taking a while to run but I do not understand why I can not edit the task.
Probably not the right forum - pointers would be appreciated - but I'll give it a try anyway:
I'm in the process of designing a relational database to be used in a BI scenario - ie. dimension and fact tables. The data will eventually be used to feed cubes in Analysis services, however end users will probably be allowed to run reports aginst views of the relational database.
I'm currently looking at the employee dimensions and my first try would designate AGE as a SCD Type 2 attribute. As a result every employee gets at least one new record every year as AGE increases. Given that BIRTHDATE is specified should I drop AGE from the tables and recreate it as a computed attribute in database views and/or cubes?
Anybody who knows how inferred members update function in slowly changing dimension? I came across this when I started using some of the functions of the Slowly changing dimension object.
Apologies if this has been raised in the past, but 6 hours of web searching today hasn't turned up anything!
I'd like to use the Slowly Changing Dimension (SCD) Wizard to keep track of tables in my relational database. This means 200+ tables. I don't want to step through the UI Wizard for each table. Ideally I'd like to be able to create the SCD transformation in code, but I can find no good examples for doing this. The MSDN examples here are too brief and don't allow me to expand out to the level I need.
As in any database, columns come and (very rarely, go), and having a programmatic solution to this would mean that I could be flexible and cope with these situations.
So, my question is: Has anyone implemented SCD functionality in code, or have any code examples that do this, that I might learn from. Or, any tips/pointers if I'm barking up totally the wrong tree.
I am testing the SCD with some sample data and am seeing duplicate rows getting inserted into my table. When configuring the wizard I am prompted to select a key type (Business or Not a key column). I assumed that the SCD wizard would NOT insert a row that already has a business key in the dimension table but that doesn't seem to be the case??? Is that by design?
Often when I open an SCD component that was previously configured I find that the configurations are lost. The "Input Columns" and the "Key Type" columns on the first screen ("Select a Dimension Table and Keys") are blank. The "Dimension Columns" column is still populated.
If I run the package, the SCD component appears to function the way it was configurated to function. However, the settings cannot be viewed.
Has anyone experienced anything similar? Any ideas why this happens?
I have created a sample dataflow to parse the employee details (empid,empname,empaddr) from a flat file to oracle 9i database table named employee(columns : empid,empname,empaddress - All are varchar2(15)) using SLOWLY CHANGING DIMENSION transformation for insert/update on the table.
EMPID as Businees key
EMPNAME and EMPADDR as changing attributes.
Connection string is using Microsoft oledb provider for oracle.
TITLE: Microsoft Visual Studio ------------------------------
Error at Data Flow Task [OLE DB Command 1 [2007]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Error at Data Flow Task [OLE DB Command 1 [2007]]: Unable to retrieve destination column descriptions from the parameters of the SQL command.
Warning at {CF5DCB64-279E-45A4-A9A8-FF2FBB130980} [Insert Destination [1972]]: Cannot retrieve the column code page info from the OLE DB provider. If the component supports the "DefaultCodePage" property, the code page from that property will be used. Change the value of the property if the current string code page values are incorrect. If the component does not support the property, the code page from the component's locale ID will be used.
Errors were encountered while generating the wizard results: Error at Data Flow Task [OLE DB Command [1996]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Error at Data Flow Task [OLE DB Command [1996]]: Unable to retrieve destination column descriptions from the parameters of the SQL command.
Error at Data Flow Task [OLE DB Command 1 [2007]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Error at Data Flow Task [OLE DB Command 1 [2007]]: Unable to retrieve destination column descriptions from the parameters of the SQL command.
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.42&EvtSrc=Microsoft.DataTransformationServices.Design.SR&EvtID=ScdWizardGenerationErrors&LinkId=20476
I have a Slowly changing dimension that I am using to populate a dimension table. My problem is this when I run the package and any of the fields are marked as Historical Attributes it will add an additional row regardless of the fact that the incoming data and the data in the warehouse match exactly.
I've tried several things to fix this problem but so far none of them have worked. Some of the things I have tried that haven€™t worked are to match all the data types (which I have to do anyways) I've tried trimming the strings, I've also tried adding just one column
I am using a data conversion to convert them from varchar (the source datatype) to nvarchar(the warehouse datatype)
I'm at a dead end here and don't know where to go any help would be greatly appreciated.