Mapping Surrogate Keys Of Level 2 Dimensions To Fact Table In SSIS Data Flow
Aug 16, 2007
Hi,
I use lookups to map surrogate of level 1 dimensions to my fact tables in SSIS.
But how to handle a level 2 dimension with a ValidFrom and a ValidUntil date field?
I do not use an IsCurrent column, because this could problem with late arriving facts.
- In dts I used an SQL statement like this:
update SA
SET SA.DimProdRef = Dim.RecordID
FROM SAWarenEingang SA, DimProd Dim
where SA.ProduktNumber = Dim.ProduktNumber
and SA.ArtikelkontoBewegungsdatum between Dim.ValidFrom and Dim.ValidUntil
Now in SSIS I want to handle the whole thing in the data flow without using a staging table:
- Using Lookups: I would have to pass the date column for each inside the fact table into the lookup. That does not work.
- Using Execute SQL in the data flow: would be very slow, because the statement will be executed for any line in the dataflow
My question here is as the dimension has SCD type 2 on it and every time when there is a change the persn_key gets a new key value but the fact table still points to oldest key.how to update the surrogate key on fact table to the current key value? As per the requirement fact surrogate key must be pointing to current active record on the dimension.
Hi, all experts here, Do we always have to use SCD component for the loading of data into data warehouse to handle changes of rows? I am looking forward to hearing from you and thank you very much in advance for your help. With best regards,
What are the general guildlines for choosing these settings, like paralle vs. sequential, different error configurations? The default selection for processing multiple dimensions is paralle and use default error configuration. But the default for processing multiple partitions is sequential. I cannot find anything helpful other than the definitions from the online help. TIA.
Is there a way to define measure group on fact1-details-table using TimeDim. Date info is only in fact1 table and not in details table.
Is namedquery joining the 2 fact tables the best solution in this case? There is so much redundancy using one fact table, so underlying sqldb uses 2 tables.
I have developed some packages to load data into "Fact" tables in the data warehouse. Some packages are OK, other ones not. What is the problem?: some packages load fact tables with lots of "Lookup - Data Flow Transformation" into the "data flow task" (lookup against dimension tables) but they are very very slow, too much slow to be choosen as a solution.
Do you have any other solutions to avoid using "Lookup - Data Flow Transformation"? Any other solution (SSIS, TSQL and so on....) is welcome to speed up the Fact table loading process.
I am modelling two fact tables of Actuals and Budget which are at different granularity, Actuals are at day, customer and product sub category level. Budgets are at month, Region and Product category level.
Month, Region and Product Category is present in Date, Region and Product Category dimension respectively. I have only three dimensions as Customer, Product and Date. Linking those dimensions to Actual Fact table is not an issue, what is the best way and options are there to link budget fact table to those three dimensions.
I want to create an import table for daily rows with an integer column like 20150430 for the date, called DayKey. This table would do one date per day. It would then be imported into a STAGE table which would have the same columns and would have all of the import rows for every day.My question would be this: I want to be able to have an integer Primary Key unless there is a better idea. I could make the STAGE table use an auto-incremented value for the key. Then, when I load the import table which is truncated every day, I could take the NEXT value of the key from the STAGE table and increment by 1.
Let's say the last value in STAGE is 1000, then the next value that would be in IMPORT would be 1001 and incrementing up. Then these would be added to the STAGE table with the associated keys. There is no chance of anyone or anything else adding to the STAGE table any other way.
I have tables like the one below for my Stage and dimension tables:
Stage Table
accountid
name
address
Dimension Table
accountkey ---- surrogate key (DW key)
accountid ---- business key (transaction's primary key)
name
address
I used slowly changing dimension to detect the changes for the records inside my Dimension table. But I had a problem when a new record exists in the stage table. The accountkey is set as the primary key and it gets its value from a different table which stores the last account key that was created. I cannot load all the changes unless i have a business key. Is there a way that i can get the "last key" from a different table in the data flow area and then supply it together with the other fields in the new output branch of the slowly changing dimension?
I have run into an issue that seems very simple but I am new to SSIS and DBs in general and therefore cannot solve it. I would like to take data from a SQL Server table, lookup a key from a dimension table and update the same SQL Server table with this data. Is there anyway to do an update using SSIS?
I have a small problem in parameter mapping for Execute SQL Task. I am using a delete statement with 2 conditions. Followed by another Execute SQL Task which contains commit statement.
delete from tname where c1 = ? and c2 =?
where c1 is number(4) datatype and c2 is of varchar2(20) datatype in oracle.
The connection manager i am using is ORacle OLE DB provider. I am passing 2 global variables i.e g_v1 of Int32 and g_v2 of String Type.
In the parameter mapping of the Executing SQL task, i am mapping these 2 variables for c1 and c2 and changed the datatypes inside parameter mapping as Numeric for c1 and Varchar for c2.
I also set the property as ByPassPrepare = True.
When i am executing the package i getting INVALID NUMBER ERROR. i believe the SSIS is unable to perform the implict datatype converison.
For the next run, i changed the g_v1 varible datatype to Double and also i changed the parameter mapping for c1 as Doble datatype. This time it is working fine. I can see the Green signal for the 2 SQL Tasks.
But when i connected to Oracle check the count in the table, the data is not getting deleted.
Also, I set the property RetainSameConnection = TRUE for oracle connection manager. I am not able to trace this logical error.
The same is working fine in my local machine. But i am facing the problem when i deployed the same on the client machine.
Is there any problem with parameter mapping? What should be equialent Datatype for Oracle NUMBER datatype that should be used inside the SSIS package while declaring the global variable and inside the parameter mapping.
I created a Fact Table with 3 Keys from dimension tables, like Customer Key, property key and territory key. Since I can ONLY have one Identity key on a table, what do I need to do to avoid populating NULLs on these columns..
As u can see there is two company references in my fact table, and the schema is in snowflake. My customer requirements state that the Contracts' amounts can be aggregated/filtered for/by, ServiceProviderCompany, its city/profession or ClientCompay, its city/profession.
First thing came in to my mind is to dublicate whole dimension structure (one for serviceproviders, one for clients), which i thought that there should be another way around?
Hi, Is there a way to accomplish one- many or many -one or many - many column mappings in the SSIS data flow task or using any other tasks. We were able to do this in DTS Transform data task. Also is it possible to edit the mapping like: dest column1 = Right(dest column1, 3)
I am relatively new to SSIS/SSAS. I have searched the forums but cannot find an answer to my question.
I created a cube in SSAS and have deployed it. Now I am trying to use SSIS to populate the cube. I have setup a DS that points to the SSAS instance - it uses OLEDB Provider for Analysis services 9.0.
When I try to use a data flow task OLE DB source to truncate the dimension/cubes I do not see the DS in the list to select?
I am finding it hard to get into the SSIS way of organizing the processing.
I am building a health care application that marries transaction-level data (health care services provided) with person-level characteristics that have a time-dimension. The person-level characteristics are diseases that the person has (these disease all have a start and some have an end date). The diseases are stored in a table in which the foreign keys are a person-identifier, a time identifier (month/year) and a surrogate for the disease. Persons can have more than one disease at a time (the diseases are NOT mutually-exclusive). There are no measures in this table. The transaction table has a foreign key for person and time (month/day/year), a procedure code (the type of service rendered) and money (the cost of the services).
How do I answer the following questions:
What is the total cost of care (the sum of all service costs) last year for persons with "disease A"?
What is the total cost of care last year for persons with "disease A" AND "disease B"?
What is the total cost of care last year for persons with "disease A" OR "disease B"?
I've tried a factless fact table but can't get it to work. If anyone has the right solution and can communicate to me before I slit my wrists, I would be greatly appreciative!!!
I'm having my first go at developing a destination adapter which will send data to an update Web Service.
I've got some rather big gaps in my understanding. I've been following the various samples I've found on the net and have validated my mapping and picked up all the available column names and datatypes which are appearing in the Input and Output Properties tab of the Advanced Editor but I only have a tab for "Input Columns" and not "Column Mappings".
Which method defines the availble columns for the user to map?
Let me know if I haven't given enough information.
I am in the process of building a fact table in a staging area. The data in the host system has numerous composite keys, so I have replaced all the composite keys in the dimensions with surrogate keys (integer) which are generated using an identity at load time. When I load the staging (fact) table, I have set the default value of all the foreign keys to 0. What I must do now is update all the foreign key values with the surrogate key values from the dimensions. I'm using an update command and the original gid values from the source system in the where clause...i.e. UPDATE X SET x.key_1 = y.key_1 FROM TableA X WITH (NOLOCK) INNER JOIN TableB Y WITH (NOLOCK) ON x.org_id = y.org_id AND x.bus_id = y.bus_id AND x.prov_gid = y.prov_gid AND x.log_gid = y.loc_gid;
This seems to work fine for most tables. However, I am now trying to update a table that has over 10 million rows and approximately 30 foreign keys. The script runs for hours. I ususally stop it after about 8 hours when it still hasn't completed. Since the keys are dynamic and they could possibly change during each load process, I can't add them during the load process.
Is there a better way to update these keys. I need to regenerate the fact tables every night and taking this much time to reload a fact table is just not practicle. I've indexed the alternate keys on all the dimensions and have also indexed the gids on the target fact table. Am I doing something wrong? Have I over indexed the target table? Please help! Thanks Jerry
This is the code iam using to get the incremental surrogate keys:
Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain Inherits UserComponent 'Declare a variable scoped to class ScriptMain Dim counter As Integer
Public Sub New() 'This method gets called only once per execution 'Initialise the variable counter = 1093 End Sub
'This method gets called for each row in the InputBuffer Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer) 'Increment the variable counter += 1
'Output the value of the variable Row.instance = counter End Sub
End Class
--'Instance' is my surrogate feild name
but iam getting an error saying that InputBuffer is not defined ..Any idea?
If I want to add two more incrementive fileds ,where i have to add it?
Sorry if it sounds silly ,iam very new to this scripting.
I built my first tabular model and see that my fact tables are also appearing as dimensions. In Multi dimensional mode i could choose which are the dimensions. How do i do that in tabular model.
My Requirement is Update Table 1 set Column::No=Table 2.ID
based on Exact Match of
Table1.Name=Table2.Name and
Table1.Add=Table2.Add
It means Get back the Id for Source Table 1
2nd Data flow Source(Table1:Name, Add,No) |
--LOOKUP(Table2:Name, Add::Matched Look Columns Name, Add and Tick Mark on ID) |(Match)
-->OLEDB Command: update Table1 set N0=? where RowID=?(Here Param_0= NO ,Param_1=RowID)
Here My Issue is if Table 1 had Duplicates(same Name, Add, but Row Id is different it is Updating Same ID for Table 1.No It means Get Back ID correctly not updating Result::
Table 1:
------- ----- ---- ---- Name Add No RowID ------- ----- ---- ------- aa #a-1,India 1 10 bb #a-1,India 2 11 aa #a-1,India 1 12
I wish to use the same data to update 2 different tables.
There is no green arrow output from the OLE DB data destination, so I can't have another component following on from the first insert.
This means I have to use the Multicast to 'copy' the data prior to the first table insert.
I can then use the data to perform inserts to both tables.
However, there is an FK constraint between these two tables, so I need to wait until the first table insert has finished before performing the second table insert.
How can I do this? How can I make the second insert dependent on the first?
I am looking for a way to leave a Data Flow Task destination table name as-is, and have SSIS auto-create the table if it doesn't exist already.
I searched on this in the forums but based on the question it's difficult to kow if it has been answered or not.
Details:
I am writing some SSIS packages that need to be executable on another server. Many of the Data Flow Tasks copy data (such as from a Fuzzy Grouping transformation, and lots of other stuff) into a new table. But the other server will not have these tables set up for the first run.
My current solution is to check information_schema.tables and drop IF EXISTS. But, then the Data Flow Task will not work (becase table does not exist). So, I script to new window a create table statement based on the existing table that I use in my dev environment. This is a hack and I want to find a better method.
It is quite possible (although unlikely) that the source columns could be changed in the future, or some query used to pull the data might be modified. If this happens, then I would need to change the CREATE TABLE Execute SQL task. I want my package to accommodate without having to modify it.
When I use the Import/Export Wizard, I can select a table name from the drop down list OR type in a new name. When I type in the new name, it assumes I want to create the table. NOW, is there a way to mimic this in BI Developer Studio? Yep, I saved the Wizard version of the SSIS package and all it does is run a CREATE TABLE statement first.
I am looking for a way to leave a Data Flow Task destination table name as-is, and have SSIS auto-create the table if it doesn't exist already.
I'm trying to create a DM model using TS algorithm, to predict sales for different products and channels but I can only get it to work using one of those two "dimensions" or columns the other one is ignored (This is, my fact table contains a key for time, a key for channel a key for product and the metrics and the model only seems to allow working with time, the metrics and only one of the other dimensions product or channel ..) Am I missing something?
Say you have a fact table with a few columns that all reference the same key column in a dimension table, you want to write a view to return the information for those keys?
USE MyTestDB; GO SET NOCOUNT ON; IF OBJECT_ID ('dbo.FactTemp' ,'U') IS NOT NULL DROP TABLE dbo.FactTemp;
[Code] ....
I'm using very small data at the moment, and the query plan and statistics don't really say which way.
We have created SSIS package to load a text file into a table. Source system shares 10 text files and recently they stopped generating data for one of the text file (comping empty), after few months they will start generating the data for the empty file batch processing.
The Issue here is Data Flow task is getting failed while loading empty text file into table. How to handle this empty file load issue in SSIS package.
The all-level of dimensions doesn't show up in the PivotTable Field List? I have reports where I want to show one member of a dimensions compared to the total of the dimension (and not the total of the members shown). But I can't select the ALL-level. Is there any way to do this?
I need help from you data warehouse / SSIS experts out there! I have a Transaction Fact Table with dollar amounts as the measurements. The grain is one row per transaction. I want to roll this up into a Monthly Periodic Snapshot based on 5 keys. I am having no problem where there is transaction data for each month.
However, the problem I am having is - how do I gracefully insert the Monthly rows for the five keys where there was no activity in the transaction fact table - I am sure there is a slick way to do this with SSIS but I am definitely having a mental block on how to accomplish this. Any help would be appreciated!