Hi
I am working as datawarehous architect with a large concern and i designed a datamart witha fact table surrounded by 5 dimension tables. My PM who do not know datwarehousing has abruptly changed my design and instead of 5 dimension tables has increased to 17 tables which just 1 column each in them.
This is hell of a design because if it is a single column dimension then there can not be any hierarchy in the dimension table , better will be to push this column as a fact.
What should I do now. Moreover he has asked me to not to use SSIS and code everything in stored procedure.
Please guide.
Jigjan
I have added a Slowly Changing Dimension transformation to an SSIS package and have launched the Wizard to edit it. After selecting the source (a SQL Server 2005 instance), if I select a very large table (9+ million rows), I'm encountering two strange behaviors: 1. The wizard hangs for several minutes before displaying the columns from that table. 2. The wizard does not display the primary key column. This, of course, is the column I want to designate as the "business key", but can't because it's not displayed. I know this is more like a fact table than a dimension, but this is not a data warehouse. This is just a very large table, and I need to update a field in certain records based on the contents of a source text file. Is there another transformation I should use to perform updates?
When i add a dimension to the cube dimension without any relation in my dimension usage to any measure group my units are going down.However when i remove the dimension from the cube am getting the correct values.
I have been working with DTS and ETL in data warehousing projects for several years and my question is this. You can only update a dimension column with SSIS by using TSQL-update statements.
There is no way to do this except issuing TSQL from the control flow or the data flow?
This subject is not mentioned in Wrox SSIS book nore in Kirk Haseldens book.
When you run the SCD task in the data flow you will get an OLEDB command that actually do this, issue a TSQL-statement.
I am working on a model where I have a sales fact table. Each fact record has four different customer fields (ship- to, sold-to, payer, and bill-to customer). I have one customer dimension table that joins to the sales fact table four times (once for each of the customer fields above). When viewing the data in Excel, I would like to have four hierarchies (ship -to, sold-to, payer, and bill-to customer) within Customer.
Is there a way to build hierarchies within my Customer dimension based on the same Customer table? What I want is to view the data in Excel and see the Customer dimension. Within Customer, I want four hierarchies.
we have a problem with "one-to-many relations between fact table and dimension table". Take the example of table "LOGGEDFLAW" which is related one-to-many to the table "LOGGEDREASON. "LOGGEDFLAW" includes the column "FLAWKEY" and "LOGGEDREASON" includes the column "REASONKEY" and essentiallay the column "FLAWKEY" as foreign key. Now assume that we have the following records in there:
Now assume, that "LOGGEDFLAW" is the facttable and "FLAWCOUNT" is the measure with the source column "FLAWKEY" in which we want to count the number of FLAWs. As you see in the example the number of FLAWs is 1 for "FLAW1" and "FLAW2". Microsoft Analysis Server generates the value of 2 for the number of FLAWs "FLAW1" because of the one-to-many relationship to the table "LOGGEDREASON". In the attached ZIP File you find :
- a MDB File with the described example - a screenshot from the cube constructed in AS - a screenshot from the result table generated with AS.
The question: How is it possible to calculate the measure "FLAWCOUNT" correctly, ignoring the records generated by the one-to-many relationship?
In my data modell I have defined the 2 tables "Person" and "Category":
Table "Person" ---------------- [PersonID] [int] IDENTITY(1,1) NOT NULL [CategoryID] [int] NOT NULL [FirstName] [nvarchar](50) [LastName] [nvarchar](50)
Table "Category" ---------------- [CategoryID] [int] IDENTITY(1,1) NOT NULL [CategoryName] [nvarchar](50)
Now I like to read my first row from the source and lookup a value for the CategoryID "sailing". As my data tables are empty right now, the lookup is not able to read a value for "sailing". Now I like to insert a new row in the table "Category" for the value "sailing" and receive the new "CategoryID" to insert my values in the table "Person" INCLUDING the new "CategoryID".
I think this is a normal way of reading data from a source and performing some lookups. In my "real world" scenario I have to lookup about 20 foreign keys before I'm able to insert the row read from the flat file source.
I really can't belief that this is a "special" case and I also can't belief that there is no easy and simple way to solve this with SSIS. Ok, the solution from Thomas is working but it is a very complex solution for this small problem. So, any help would be appreciated...
I have a large flat file that comes to me. I first import the flat data in to a SQL table for ease of use. Then i put it into a more permanent table with the proper references to dimension tables. I want to build a dimension table out of information from my flat file. I have a dimension table with columns, [Org Client], and [Client#] where [org client] is the name of the client. Both of these columns appear in my flat file but i want to use only the client# in my permanent table. How extract distinct values of client # and [org client] into a dimension table?
My idea was to select distinct values of client# and use some type of foreach loop to go through each client# and use a query to select the TOP(1) values of [org client] where client# = x. Would this work and if so how do I go about setting this up?
I'm really hoping there is a simpler way than this. Thank you all for your time.
While populating dimsnion table the requirement is that the 1st record in all dimension should be " Not Available"
and if the fact table has a null value for that dimension it shouldbe tied to " Not Available" record in dimension Please help hwo to do both the things. JJ
I can manipulate this in a pivot table for reporting however I need this to be a table for reporting by filterable searchable information. What can I do short of defining each field with case logic, or union queries.
Now I create datawarehouse for my client, I have SSIS a lot for ETL process, I a problem that some fact table need to be updatetable and there is a lot of data of this, I need some efficent way to load this data to data warehouse. I have read your article about SCD in SSIS (Slowly Changing Dimensions in SQL Server 2005). I think the purpose of SCD for Dimension table. If I have some fact table that need rows to be updatetable can you give me an example, best practice, the efficient way or fastet way to load fact table that can be updatetable? If you have link or link about this problem please reply my email. Thanks My datasource from ORACLE and my datawarehouse in mssql2005
What is the best way to move data from Online system tro data warehouse? I have created 3 dimension tables(product,date and customer tables) and I wanna create fact table and get foreign keys from dimension tables. What is the best method to do that in SSIS?
I have date and float in attribute in a dimension table. If deploy my cube and I try to create a report with RS, I can't format this data. The value seems to be a String so I have to do a CDate or a CDbl before formating it. I have no problem when I try to format my measures.
I have a situation with a table that was created for a transactional
system with a 3 columns key. The table is similar to the following:
countrystatecitydescription 11221City A from country 1 and state 12 11321City A from country 1 and state 13 21422City B from country 2 and state 14 21522City B from country 2 and state 15
Now I'm trying to create a dts package that would allow me to build a
city dimension table with unique codes (keys) for each city. What kind of
transformation should I use to translate the old codes (based on the
country-state-city key) into the new ones and preserving the data
Is it possible to design a Cube with a Dimension "DimEmployee" (SCD Type2) and Query the following: What "title" had the person "xyz" at time "2015-01-01"? - See Example data on Images.
I have also a DimTime with Day granularity. The DimEmployee is not at day granularity.
I have some troubles getting done a lookup... data i get from a legacy system has to be cleaned up and split up into normalized form. i get invoice data and want to split up invoices into a separate table as customers. therefore i lookup if the customer of the current line is not already in my newly created customer-table. if it isn't there (error output of lookup transformation) i insert it into the customer-table.
The problem is that after the customer is inserted into the customer-table, it still is not found by the lookup transform because the lookup uses caching to hold the reference table (and so is the exact same customer inserted again and again and again). Is there any way to disable caching and let the lookup transformation do a select for every new row it gets? or at least refilling the cache if some event happens?
The scenario is the data comes from various sources and its staged into staging database. From this staging database it goes into data warehouse database. Everyday this staging database is truncated and repopulated from various sources. I've a dimension table called DimCustomers which consists of around 300,000 rows and has lots of different types of SCD columns. It takes around 4-5 hours to load data from staging to this dimension table. Currently I'm using a For Loop container which uses a store proc to extract 15000 rows each time and populate my dimension tables. First couple of loops it goes off quickly but as and when the number reaches half of the count it slows down and hence it takes around 4-5 hours to load data.
What would be the best approach to populate this kind of dimension table.
I want to create a table that has a datetime column, data type=datetime and I want the date to start form 01/01/2004 00:00:00 to 01/01/2008 00:00:00 the increment will be by minute like
01/01/2004 00:01:00
01/01/2004 00:02:00
01/01/2004 00:03:00 and so on up to 01/01/2008 00:00:00
I will use this time dimension in bussiness objects.
Has anyone written or come across a routine that validates start and enddate in a slow changing dimension? (eg verifies there is no overlap in any of the records).
Somebody suggested I use the slowly changing dimension object to keep a table current between servers and databases.
I have two databases servers. One 2000 and the other 2005. I was told because security reasons I should not link them
The two tables are not identical. The destination table only has a handle of the columns and in some cases columns are not the same size.
Once a day. When new rows are introduced in the source table, I'd like to introduce those rows into the destination table. When rows are modified, I'd like to apply changes to the distination table.
I started playing with the slowly changing dimensions object, but am frankly confused connected the dots. Will I have an output oledb object for each command object?
Am i using the right the object for objective?
Are there any good tutorials or videos out there that will cover this?
I created a Fact Table with 3 Keys from dimension tables, like Customer Key, property key and territory key. Since I can ONLY have one Identity key on a table, what do I need to do to avoid populating NULLs on these columns..
As u can see there is two company references in my fact table, and the schema is in snowflake. My customer requirements state that the Contracts' amounts can be aggregated/filtered for/by, ServiceProviderCompany, its city/profession or ClientCompay, its city/profession.
First thing came in to my mind is to dublicate whole dimension structure (one for serviceproviders, one for clients), which i thought that there should be another way around?
Bitmask fields! I am capturing row changes manually via a high frequency ETL task. It works effectively however i am capturing the movement of multiple fields. A simple example, for Order lines, i have a price, a discount and a date. I am capturing a 001, 010, 100 respectively for each change.
I would like my users to be able to select from a dimension which has the 3 members in it and they can select one, multiples, or all values (i.e. only want to see rows that have had the date and price changed).
Obviously if i only had 3 columns i would use bit's and be done with it, i have many different values (currently around 24 and growing).
I have SSAS 2008 Enterprise sp2 (not R2) running on a production server. One of the dimensions is defined as real-time Rolap - set to clear the cache whenever the table changes.
I placed triggers on that table to be sure, so I know no changes occur on that table sometimes for 10 or 15 minutes at a time - and yet I get 100's of change notifications (I noticed them on a profiler session). It seems as though the notifications come as often as the dimension gets queried.
I think this is causing one of my longer running queries to fail because of locking issues. I assume this is because the queries are getting killed when they're in middle of using an outdated cache.
The strange thing is that I cannot reproduce this on the dev servers. The main difference between the dev and production servers is that the production servers are behinfd a cluster. I don't know if that could make any difference.
I really need to know why I'm getting so many notifications.
For example,I have a table "authors" with a column "author_name",and it has three value "Anne Ringer,Ann Dull,Johnson White".Here I want to create a new table by using a select sentence,its columns come from the values of the column "author_name".
can you tell me how can I complete this with the SQL?
My question here is as the dimension has SCD type 2 on it and every time when there is a change the persn_key gets a new key value but the fact table still points to oldest key.how to update the surrogate key on fact table to the current key value? As per the requirement fact surrogate key must be pointing to current active record on the dimension.
We are trying to do some utilization calculations that need to factor in a given number of holiday hours per month.
I have a date dimension table (dimdate). Has a row for every day of every year (2006-2015)
I have a work entry fact table (timedetail). Has a row for every work entry. Each row has a worked date, and this column has a relationship to dimdate.
Our holidays fluctuate, and we offer floating holidays that our staff get to pick. So we cannot hard code which individual dates in dimdate as holidays. So what we have done is added a column to our dimdate table called HolidayHoursPerMonth.
This column will list the number of holiday hours available in the given month that the individual date happens to fall within, thus there are a lot of duplicates. Below is a brief example of dimdate. In the example below, there are 0 holiday hours for the month of June, and their are 8 holiday hours for the month of July.
I have a pivot table create based of the fact table. I then have various date slicers from the dimension table (i.e. year, month). If I simply drag this column into the pivot table and summarize by MAX it works when you are sliced on a single month, but breaks if anything but a single month is sliced on.
I am trying to create a measure that calculates the amount of holiday hours based on the what's sliced, but only using a single value for each month. For example July should just be 8, not 8 x #of days in the month.
Listed below is how many hours per month. So if you were to slice on an entire year, the measure should equal 64. If you sliced on Jan, Feb and March, the measure should equal 12. If you were to slice nothing, thus including all 15 years in our dimdate table, the measure should equal 640 (10 years x 64 hours per year).
Need some help building a query that does the following :
I have 2 Time Dimensions ; Time (Transdate) and ClosedDate (ClosedDate)
In my report/query, if [Time].CurrentMember = [Time].[YMD].[YMD].[2006].[200610].[20061031] I want to FILTER out all ClosedDate < [ClosedDate].[YMD].[YMD].[2006].[200610].[20061031]
Both Time Dimensions are Year -> Month -> Day and have the same Members.
I have every option available, using calculated Members and/or Measures to do this.
The report I'm creating is Aging of Receivables : Balance / 30 days / 60 days / etc.. But for the Aging, I need to filter like explained above.