Data Warehousing :: Populating Fact Tables With Surrogate Key From Dimension Table?
Sep 11, 2015How do I correctly populate a fact table with the surrogate key from the dimension table?
View 4 RepliesHow do I correctly populate a fact table with the surrogate key from the dimension table?
View 4 RepliesI created a Fact Table with 3 Keys from dimension tables, like Customer Key, property key and territory key. Since I can ONLY have one Identity key on a table, what do I need to do to avoid populating NULLs on these columns..
View 3 Replies View RelatedI have dimension data like this
persn_key persn_id address is_active updated_date
1 10 NYC 0 2015-11-04 14:19:54.817
2 10 Chicago 1 null
and Fact table like
fact_key persn_key units_purchased
1 1 10
persn_key is the surrogate key between tables.
My question here is as the dimension has SCD type 2 on it and every time when there is a change the persn_key gets a new key value but the fact table still points to oldest key.how to update the surrogate key on fact table to the current key value? As per the requirement fact surrogate key must be pointing to current active record on the dimension.
I have a Fact Table with a ID column as Primary key and clustered index is created. And also I have 4 dimensions FK's of data type INTEGER. And finally, I have one aggregation measure in the Fact Table.
Now, my situation is How can I improve the speed of querying the fact table by creating any of the below indexes?
1. XML
2. Spatial
3. Clustered
4. Non-Clustered
Hi All,
What is the best way to move data from Online system tro data warehouse?
I have created 3 dimension tables(product,date and customer tables) and
I wanna create fact table and get foreign keys from dimension tables.
What is the best method to do that in SSIS?
thanks
Hi,
I use lookups to map surrogate of level 1 dimensions to my fact tables in SSIS.
But how to handle a level 2 dimension with a ValidFrom and a ValidUntil date field?
I do not use an IsCurrent column, because this could problem with late arriving facts.
- In dts I used an SQL statement like this:
update SA
SET SA.DimProdRef = Dim.RecordID
FROM SAWarenEingang SA, DimProd Dim
where SA.ProduktNumber = Dim.ProduktNumber
and SA.ArtikelkontoBewegungsdatum between Dim.ValidFrom and Dim.ValidUntil
Now in SSIS I want to handle the whole thing in the data flow without using a staging table:
- Using Lookups: I would have to pass the date column for each inside the fact table into the lookup. That does not work.
- Using Execute SQL in the data flow: would be very slow, because the statement will be executed for any line in the dataflow
Any ideas?
Best regards,
Stefoon
Hi,
I have tables like the one below for my Stage and dimension tables:
Stage Table
accountid
name
address
Dimension Table
accountkey ---- surrogate key (DW key)
accountid ---- business key (transaction's primary key)
name
address
I used slowly changing dimension to detect the changes for the records inside my Dimension table. But I had a problem when a new record exists in the stage table. The accountkey is set as the primary key and it gets its value from a different table which stores the last account key that was created. I cannot load all the changes unless i have a business key. Is there a way that i can get the "last key" from a different table in the data flow area and then supply it together with the other fields in the new output branch of the slowly changing dimension?
cherriesh
thanks!
I am having a torrid time trying to populate a FACT table in ORACLE.
I have so far tried two approaches -
1. Identified the change type for records - meaning I mark the record as inserts or updates based on the availability of the key in the FACT table. I used a conditional split and then used two OLE DB destination tasks to load the FACT table. But this failed as I was pretty much inserting on both the conditions (inserts and updates). So this attempt would count as a batch update process attempt.
2. Tried to also write a stored procedure in ORACLE destination database and then use that stored procedure to execute record by record using OLE DB Transformation task. But this did not fly as well due to the fact that SSIS could not understand the oracle SP and could not parse the information out.
Thank you
Hi,
I am new at SSIS and I am trying to create a Datawarehouse using SSIS. I have the data files as flat files I have the Dimensional Model ready on Paper and Now I need to use the SSIS for the ETL process.
I am trying to figure out how to make dimension tables in SSIS? I mean I want to create the 5 Dimension tables and then create a Fact table out of it but I cant understand where to start? Can any one tell me how we create Dimesion tables in SSIS. Like one of the dimesion tables I need to create uses 2 flat files and is like a flattened dimension, How would I create this in SSIS?
Even if there is any tutorial which shows this step by step do let me know. I would really appreciate any guidance on this.
Thanks,
Sarang
Hi there, my question is really simple. I want to setup an automatic task in SSIS that drops the tables in the target database and substitutes them with tables from the source database. We are talking about two or three dimension tables and one fact table. The dimension tables are pretty small. The fact table will contain, at maximum, 300,000 rows and 12 columns. I do not use delta or flag historisation btw. What tasks in SSIS would you suggest to use?
BTW I'm new to SSIS... ;-) Thanks in advance!
Hi ,
I have situation where I get data from SRC Flat file and have to load Dimensional table and also fact table, using same data flow(have no other choice since I have to unpivot some src data). Since I have to load both tables in same data flow, I have to have a way to put load ordering constraint (I know informatica allows that). Does any one have any idea on how this can be done in SSIS?
I would be really grateful.
Thanks
While populating dimsnion table the requirement is that the 1st record in all dimension should be " Not Available"
and if the fact table has a null value for that dimension it shouldbe tied to " Not Available" record in dimension
Please help hwo to do both the things.
JJ
I am modelling cube in SSAS. Cube has around 20 dimensions and 6 fact tables. Some of the dimensions are common among the fact tables. e.g. Time dimension. Fact_PNL has 3 date columns for those we have 3 role playing dimensions in the dimension usages.
Another fact table has 5 date columns for them as well we have separate role playing dimensions in dimension usage tab. We have a common dimension Company which is foreign key in all fact tables. We might need to combine the data from multiple facts to get final output.
Should i create 6 role playing dimension for each of the fact table or use the same dimension for all fact tables?
Role playing dimensions should be created when we have multiple columns pointing to the same dimension ?
I have a table that is increasing quite largely each day. By now, I have average 300 million of records over 2.5 years. Before we received our new interface, the data we received was aggregated and thus not that big.The problem is that the table is so huge that I cannot use the Slowly Changing Component. I was thinking about making a temp table where I load the incremental data before I load it into the final data mart table.Based on this temporary table I use a script to compare the temp data with the already existing data in the data mart. However, this requires a compare of each records (300 mil records).
View 3 Replies View RelatedDear all,
Now I create datawarehouse for my client, I have SSIS a lot for ETL process, I a problem that some fact table need to be updatetable and there is a lot of data of this, I need some efficent way to load this data to data warehouse.
I have read your article about SCD in SSIS (Slowly Changing Dimensions in SQL Server 2005).
I think the purpose of SCD for Dimension table. If I have some fact table that need rows to be updatetable can you give me an example, best practice, the efficient way or fastet way to load fact table that can be updatetable?
If you have link or link about this problem please reply my email. Thanks
My datasource from ORACLE and my datawarehouse in mssql2005
Regards,
Hendrik Gunawan
I have a fact table with few flag columns.
What is the best way to bring them to dimension?
Do I need to create dimension(dummy) from fact table for each flag or all flags in single dimension?
Hi All,
I am just curious to know how I can load data from a data warehouse to an Analysis Service Cube (both to the fact tables and dimensions).
Does any body have some way to achieve this?
I appreciate if any body provide me a good material which describe this scenario.
Sincerely,
--Amde
I am working on a model where I have a sales fact table. Each fact record has four different customer fields (ship- to, sold-to, payer, and bill-to customer). I have one customer dimension table that joins to the sales fact table four times (once for each of the customer fields above). When viewing the data in Excel, I would like to have four hierarchies (ship -to, sold-to, payer, and bill-to customer) within Customer.
Is there a way to build hierarchies within my Customer dimension based on the same Customer table? What I want is to view the data in Excel and see the Customer dimension. Within Customer, I want four hierarchies.
I am developing a BI solution on SQL Server 2008 R2 and how to handle multiple referances to the same dimension from a fact table!
Here is the scenario;
Fact_Contracts (# M)
ServiceProvider_CompanyID, Client_CompanyID, Amount_USD
Dim_Company( hundreds)
ID, CityID, ProfessionID, CompanyName
Dim_City
ID, CityName
Dim_Profession
ID, ProfessionName
As u can see there is two company references in my fact table, and the schema is in snowflake. My customer requirements state that the Contracts' amounts can be aggregated/filtered for/by, ServiceProviderCompany, its city/profession or ClientCompay, its city/profession.
First thing came in to my mind is to dublicate whole dimension structure (one for serviceproviders, one for clients), which i thought that there should be another way around?
Hi,
we have a problem with "one-to-many relations between fact table and dimension table". Take the example of table "LOGGEDFLAW" which is related one-to-many to the table "LOGGEDREASON. "LOGGEDFLAW" includes the column "FLAWKEY" and "LOGGEDREASON" includes the column "REASONKEY" and essentiallay the column "FLAWKEY" as foreign key. Now assume that we have the following records in there:
LOGGEDFLAW
1) FLAW1
2) FLAW2
LOGGEDREASON
1) REASON1,FLAW1
2) REASON2,FLAW1
3) REASON3,FLAW2
Now assume, that "LOGGEDFLAW" is the facttable and "FLAWCOUNT" is the measure with the source column "FLAWKEY" in which we want to count the number of FLAWs. As you see in the example the number of FLAWs is 1 for "FLAW1" and "FLAW2". Microsoft Analysis Server generates the value of 2 for the number of FLAWs "FLAW1" because of the one-to-many relationship to the table "LOGGEDREASON". In the attached ZIP File you find :
- a MDB File with the described example
- a screenshot from the cube constructed in AS
- a screenshot from the result table generated with AS.
The question: How is it possible to calculate the measure "FLAWCOUNT" correctly, ignoring the records generated by the one-to-many relationship?
Best regards,
Thorsten
I have picked an exmple from this forum, to help me explain my current problem...
"I'm looking for a solution to import data from a flat file into an normalized data modell. To explain it a little simpler think about to following:
The Data Souce is a CSV-File with FirstName, LastName and Category. Sample data could be
Dirk; Bauer; sailing
Peter; Bauer; fishing
Marc; Bauer; reading
In my data modell I have defined the 2 tables "Person" and "Category":
Table "Person"
----------------
[PersonID] [int] IDENTITY(1,1) NOT NULL
[CategoryID] [int] NOT NULL
[FirstName] [nvarchar](50)
[LastName] [nvarchar](50)
Table "Category"
----------------
[CategoryID] [int] IDENTITY(1,1) NOT NULL
[CategoryName] [nvarchar](50)
Now I like to read my first row from the source and lookup a value for the CategoryID "sailing". As my data tables are empty right now, the lookup is not able to read a value for "sailing". Now I like to insert a new row in the table "Category" for the value "sailing" and receive the new "CategoryID" to insert my values in the table "Person" INCLUDING the new "CategoryID".
I think this is a normal way of reading data from a source and performing some lookups. In my "real world" scenario I have to lookup about 20 foreign keys before I'm able to insert the row read from the flat file source.
I really can't belief that this is a "special" case and I also can't belief that there is no easy and simple way to solve this with SSIS. Ok, the solution from Thomas is working but it is a very complex solution for this small problem. So, any help would be appreciated...
Thanks,
Dirk"
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=74752&SiteID=1
Could someone help me creating the dimension table?
Thanks!!
I have a large flat file that comes to me. I first import the flat data in to a SQL table for ease of use. Then i put it into a more permanent table with the proper references to dimension tables. I want to build a dimension table out of information from my flat file. I have a dimension table with columns, [Org Client], and [Client#] where [org client] is the name of the client. Both of these columns appear in my flat file but i want to use only the client# in my permanent table. How extract distinct values of client # and [org client] into a dimension table?
My idea was to select distinct values of client# and use some type of foreach loop to go through each client# and use a query to select the TOP(1) values of [org client] where client# = x. Would this work and if so how do I go about setting this up?
I'm really hoping there is a simpler way than this. Thank you all for your time.
I have a Fact table that contains several degenerate string values that I have pulled into a Fact Dimension.
When I browse the cube and cut one of the measures by an attribute from the Fact Dimension, I am getting incorrect data.
In other words, when I query the fact table directly via SQL and apply the same filters, I see the data I am expecting to see. But cube browse with same filters yields different results.
How can this happen since the fact dimension has a 1:1 relationship with the fact table.
I do have the Dimension Usage configured properly.
Is this an aggregation thing? Attribute key thing? What am I missing?
I need to copy data from warehouse tables to master tables of different SQL instances. Refresh need to done once in an hour. What is the best way to do this? SQL agent jobs or SSIS packages?
View 3 Replies View Related
Hi,
I have a dimension called 'Caller Type' with the following attributes:
CallerTypeKey ---- surrogate key
CallerTypeID
CallerTypeDesc
CreatedByKey ---- foreign surrogate key from User Dimension
I used Script Task to get the last used key and increment it so i can use it for new records in my dimension. however, my dimension is linked to a User Dimension and I need the surrogate key of that once I insert the new record to CallerType Dimension.
How would I do that?
cherriesh
can someone help me with th best way to look up a date in date dimension and populate the date id in fact.
in the source date is dd/mm/yyyy
and in date dimension columns are date id , year , quarter , month, day
I am sorry for asking such a broad question, but I have been working on this and from what I can gather it can be done. My problem is that much of it has gone right over my head and I am getting more confused the more I read... I'm really, really confused...
Basically, I have a dataGridView that is populated with a number of fields from Table1 (ID, NameID, Status, Phone, Notes). This works fine, BUT I would like to access Table2 and have, where ID in Table2 = NameID in Table1, it load the First Name & Last Name into the dataGridView. I am able to load the information from Table 2 like so: SELECT NameFirst + ' ' + NameLast from Table2", but I can't get both Tables to work correctly.
I would like the dataGridView to be layed out like this:
ID NameID Name (NameFirst + NameLast) Status Phone Notes
I can't for the life of me understand or get this to work (Or for that matter even understand what I am trying to do...
Also, I am using Access 2007.
I would greatly appreciate some help, and possibly some explanation in laymans terms so that I might be able to understand this. I have read a lot about this, but for whatever reason it is just soooooo over my head that I can't follow it whatsoever.
Here is the code as it stands now:
//Populate the DataGridView
string conString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + Environment.CurrentDirectory + @"DB.accdb;Jet OLEDBatabase Password=MyPassword;";
// create and open the connection
OleDbConnection conn = new OleDbConnection(conString);
OleDbCommand command = new OleDbCommand();
command = conn.CreateCommand();
// create the DataSet
DataSet ds = new DataSet();
// run the query
command.CommandText = "SELECT ID AS [#], NameID AS [Name], Status AS [Status], Phone AS [Phone], Notes AS [Notes] FROM Table1 WHERE ID = " + textBox13.Text + ";";
OleDbDataAdapter adapter = new OleDbDataAdapter();
adapter = new OleDbDataAdapter(command);
adapter.Fill(ds);
// close the connection
conn.Close();
bindingSource1.DataSource = ds.Tables[0];
dataGridView1.DataSource = bindingSource1;
// set the size of the dataGridView Columns
this.dataGridView1.Columns[0].Width = 10;
this.dataGridView1.Columns[1].Width = 100;
this.dataGridView1.Columns[2].Width = 100;
this.dataGridView1.Columns[3].Width = 100;
this.dataGridView1.Columns[4].Width = 176;
Any help and information is greatly appreciated.
Thanks Again,
Say you have a fact table with a few columns that all reference the same key column in a dimension table, you want to write a view to return the information for those keys?
USE MyTestDB;
GO
SET NOCOUNT ON;
IF OBJECT_ID ('dbo.FactTemp' ,'U') IS NOT NULL
DROP TABLE dbo.FactTemp;
[Code] ....
I'm using very small data at the moment, and the query plan and statistics don't really say which way.
In 2000, BCP seemed the way to go. DTS packages would also work. My question is, in 2005, what is the best choice? I seem to remember that BCP ignored all referential integrity constraints, and applying them afterwords was a royal pain. I'm not a BCP expert by any means. Running this at the command line means using the DOS prompt correct?
What is 2005's answer to this?
A few questions:
1) We have numerous fact tables with surrogate keys which reference just one dimensional surrogate key. How does this work?
2) Are the ‘facts’ feeding data TO the ‘dimensions’ (back end warehousing)? Or are the ‘Dimensions’ feeding facts to the ‘facts’ tables for lookups!?
Nb: Im very inexperienced at database design.
Im really also using this thread to get contacts for future harder questions!
Thanks kindly
I have a fact table that has terminations. Fields include EmployeeName, TermDate, TermReason, and HireDate, et al.
I need to make EmployeeName available to drillthrough, and since it's a varchar field I can't make it a measure, so it has to be a dimension attribute. My question is, should I leave the fact table as it is and use SSAS to create a dimension that contains only EmployeeName and the link to TerminationID? Or should I redesign the OLAP tables so that EmployeeName is in a separate table?
Hi,
I have a situation where I have 4 tables:
1. 2 Dimensional tables(Parent), DIM1 with 50000 rows, and DIM2 with 1000 rows
2. Fact 1 with 50 columns, 25 Million rows and with FK to DIM1 and DIM2
3. Fact 2 with 40 columns, and 25 Million rows and with FK to DIM1 & DIM2 tables.
Actually the fact 1 and fact 2 have same related data but since our Analysis cube person wanted the fact table not to have more than 50 columns we divided the tables into 2, but they have the same compound key.
Above said, I have a situation where I have to select all the columns, in both fact tables, and do a group by. I wrote the query and ran "Analyze Query in the Database Engine Tuning Advisor" for it. It gave bunch of recomendations about the statistics and indexes which I created. When I executed the query the result came up in matter of seconds, which was good.
In the query I had a condition having MarketName='Bridgeview' and DateID = 344 (FK of today-1).
When I wanted the data for last 30 days I changed to DateID in ( > FK of today -32 and < FK of today), the query responded and worked fine.
But when I changed the query to get MarketName='Aurora' (other than I used when I ran Tuning Advisor), the result returned is empty set. When I removed the MarketName condition, it is supposed to return all markets' data, but it returns only Bridgeview data.
I know the data is in the table for all markets, since reports are rendered from these fact tables for all of these markets(also ran queries to check the fact table data).
I am unable to point out the reason why the query behaves like this. It responds to the date change, but not to the MarketName change.
I really appreciate if anyone can help me point out the problem.
Thanks,
Venkat
Hello Everybody,
Can anybody please let me know the procedure for loading data into Dimensions & Facts? And what is the sequence of loading?
Thanks in Advance
Rajesh