Star Schema Design For Company Headcount With Headcount Targets
Nov 16, 2007
I am cutting my teeth on star schema design. I have a simple star schema I am building for Headounct analysis at work. I have a factless fact table where a row represents a head in the company. Each head is toed to a particulat week in a Date dimension tabel. There are additional dimensions for things like gender, ethnicity, marital status, age, etc. Now in my department dimension - it's hierarchical. In the DimDepartmnet there is a department which belongs to a company. Comapnies belong to divisions. Now the fun part. Each division has a headcount target for each year. Up to this point I am in a perfect star schema (no snow flaking). How would I integrate in this concept of a headcount target for each division for a given year?
We are using cognos on top of this star schema to provide reporting and analysis services if that is relevant. From the Star Schema design stand point... any thoughts?
Hi all I have the following table schema and their details as follows:
Code Block CREATE TABLE #Tbl_Project ( ID varchar(40), Name varchar(50) ) GO CREATE TABLE #Tbl_User_Project ( ID int, ProjectID varchar(40), Role varchar(20) ) GO INSERT INTO #Tbl_Project VALUES('PRJ1','Project 1') INSERT INTO #Tbl_Project VALUES('PRJ2','Project 2') INSERT INTO #Tbl_Project VALUES('PRJ3','Project 3') GO INSERT INTO #Tbl_User_Project VALUES(1,'PRJ1','PM') INSERT INTO #Tbl_User_Project VALUES(2,'PRJ1','TL') INSERT INTO #Tbl_User_Project VALUES(3,'PRJ1','TM') INSERT INTO #Tbl_User_Project VALUES(4,'PRJ2','PM') INSERT INTO #Tbl_User_Project VALUES(5,'PRJ2','TL') INSERT INTO #Tbl_User_Project VALUES(6,'PRJ3','PM') INSERT INTO #Tbl_User_Project VALUES(7,'PRJ3','TL') INSERT INTO #Tbl_User_Project VALUES(8,'PRJ3','TL') GO --Required Output --ProjectID ProjectName PM:TL:TM Ratio HeadCount --PRJ1 Project 1 1:1:1 3 --PRJ2 Project 2 1:0:1 2 --PRJ3 Project 3 1:0:2 3 DROP TABLE #Tbl_Project,#Tbl_User_Project
This is just sample of the values. The ratio obtained should be as follows: Assuming for a Project 1 there are 2 - PM's , 6-TL's , 15 - TM's then the ratio should be 1:3:7.5 . So please help me in this regard.
I have tried the following out and would appreciate feedback from experienced users regarding if the following is a good/bad approach:
After bring all the data in my Data Mart, I have created a view which has all the data in a big flat table (totally unnormalized). Then based on this BIG FLAT UNNORMALIZED VIEW :) I have created my various dimensions using the 1st option i.e. Star Schema.
Based on the little testing that I have done, I seem to be getting the correct results across various dimensions... However, can someone kindly comment on this approach and the pros/cons.
We have an OLTP database and operational reporting is carried out on a replica server / database. We have plans to build a new data warehouse and an analysis services cube.
Question 1:Should a cube be designed to extract data from a physical star schema rather than a logical one (3NF relational (ODS?) using a data source view to derive the star)? I'm guessing for performance it's better to pull data from similar structures (physical facts and dimensions as required by analysis services) but is the difference significant?
Question 2:Depending on the answer to q1, is it bad practice to ETL data from a staging database (replica > staging) directly to a star schema (multiple data sources and cleansing / business rules required)? Or should it be processed from staging to an ods and only then to a star schema (physical or logical). I still don't know if an ODS is required but I guess the consideration for this decision is whether the business would require daily operational (or ad hoc) reporting on the consolidated data sources (without needing historical DW functionality).
Let me explain what I'm trying to do and see if I can get any suggestions. I have some tables in a database to track "Required Reading" so that users who login must read documents by a certain date.
My first idea was to have a table called Reading that would store each UserID and every DocID from the Documents table. Then when the User logged in I would do a SELECT * FROM Reading WHERE UserID = 'User1' to filter down to all of the documents for the current user. So if I have 10 users and 10 documents then I would have 100 records in the Reading table.
The only problem with that is that every user logged in would have the Reading table open at the same time but no two users should ever be editing the same record. The user would have an exclusive recordset based on his UserID. I just don't know if this would cause a conflict.
The second idea is to have the same Documents table but when a user logs in, create them a unique table. So if User1 logs in I would check to see if a User1 table exists and if not create it. The same for User2 etc.
User1 ------------------ DocID nvarchar(16) (PrimaryKey) Read bit ReadDate smalldatetime
User2 ------------------ DocID nvarchar(16) (PrimaryKey) Read bit ReadDate smalldatetime
Can you tell me how to represent the problem below?
I want to be able to store different types of information for products in different categories.
So Category A maybe TV's and category B maybe Sofa's and Cateogry C maybe Car. There will however be some common data.
So for a TV I may want to store screen size etc.. and for a Sofa if its leather and for cars maybe whethers its a Saloon, 4x4 etc.. The common data maybe Manufacturer, Price etc..Obviousely each peice of info will be of a different data type.
How best do I represent this in a database because I will have lots of categories of products and the products will be used for filtering too. So I need to create a solution which is not too costly on performance.
To give you an example of what I'm intending on doing..lets look at desktop computers which will have the following product specific info to filter on: http://computing.kelkoo.co.uk/ctl/do/compareProducts?back=%2Fc-111801-desktop-computers.html&catId=111801&pid1=18445816&pid2=12461415 and for TV's we have http://audiovisual.kelkoo.co.uk/ctl/do/compareProducts?back=%2Fc-100311823-lcd-tvs.html&catId=100311823&pid1=18052959&pid2=18704336
or http://shopcompare.eu/ash/search.php?phrase=GPS and http://shopcompare.eu/ash/search.php?phrase=palmtops&cid=28
Thanks in advance, any guidance even to any online tutorial would be appreciated.
I wonder instead of just brainstorming, there probably isa very standard and a simple way to do database schema design.let's say we are doing a website. the user can go overand type in the movie name and zipcode, and the websitewill return all the theaters showing that movie and at whattime, for theaters in THAT zipcode only (for simplicity).so how do we just start and use a standard method thatcan be simple and very accurate to ensure good tables design?
QUESTIONs:What schema is the best for high speed search for a classified web application? Is our schema design looks OK? It is a STAR schema and will be used for OLTP type app. Is this OK? or Are we missing something? Please let me know if you need more diagramatic description. BACKGROUND:Our group is making a classified website (like classified.yahoo.com) where people can place online ad to sell items. like cars, computers, electronics etc. Users will fill out webform for each category(car or computers) with all attributes of the item forsell to post an add.
Main 3 operations the web users will perform: Quick Searching(most frequent): category=car, subcategory=sedan, country=USA city=LosAngeles Zip=empty Advanced Search(less frequent): User can include all fieldsattributes or then can user a subset to query to do advanced search like : category=car, subcategory=sedan, country=USA, city=Los Angeles, Year= in(00, 03), transmission=Auto, engine=V6, Maker=Honda, Model=Accord, color=Red and Price < 10000 category=car, subcategory=SUV, country=USA, city=NY, Year= in(99, 00, 03), transmission=Auto, Maker=Toyota, and Price between 11000 and 14500
Insert(least frequent): By filling out web form. For Car, the form will have different dimensionsattributes (year, make, model, transmission, mileage, color, price etc)
Current Schema design: Set of core dimensionlookup tables: stores corecommon attributes for lookups eg. status={open, new, expired}, country={USA, Canada, India,...} One custom Lookup table: for all custom attribute lookup: transmission{auto, manual}, engine type{V4, V6} for car, processor{PII, PIII, PIV), RAM(512MB, 1GB, 2GB} for Computer & so
3 Fact Tables: Main factPivot table that stores all sell common attributes eg. price, title, year_made, post_date, expire_date, user_name, description etc. One fact table to store custom string, int, float, date field values of ads. One fact table to store custom dropdown field selection values
Concerns and issues: Looking at the schema, it seems to be a STAR schema with multiple fact tables where all core lookup tables connected to the main Pivot table and custom lookup table connected to the 2nd and 3rd fact table.
Quick search only queries the Pivot fact table. While Advanced search query requires to join 3 fact tables. Both query requires to join 3 fact tables with all dimension tables(15 to 20 each having avg of 20 values) to get the look up names so that users sees text instead of ids. Search speed is the Main concerns. Insertionupdate speed doesn't matter that much as that is less frequently done.
Here is the current schema. product_category - product_category_id - other fields, ... product - product_id - product_category_id - ....other fields
Now, for each product type, it has a distinct set of specifications. For example: Books type, will have author, publisher, pages, etc. Apparels type will have color, size, materials, etc. I don't think I can put all those specifications into one table - the product table above. How to design the schema to store product specifications? Should I create a separate spec table for each product type, such as book_product_spec, apparel_product_spec, electronics_product_spec, etc.? Please advise. Thanks.
I get to page 116 in "Data Mining with SQL Server 2005", but when I try to deploy my model, I get these errors:
Error 1 Error (Data mining): The 'Movie Bayes' mining model cannot have more than one predictable attribute. 0 0
Error 2 Errors related to feature availability and configuration: The 'Multiple prediction targets' feature is not included in the 'Standard Edition' SKU. 0 0
Hello, we have some tables and views that are not assigned to the standard schema of dbo.Whenever I try to use a design time control and/or SQLDataSource connection control, it does not like objects other than those using the dbo. schema.Any ideas on how to resolve this issue?For example, her are a couple of views:dbo.vwCustomersdbo.vwCustomerDetailsord.vwOrdersord.vwOrderDetailsthe views with dbo.* I have no problem with.the views with ord. it tells me the view(s) do not exist.Thanks in advance!
in simple words it's about versioning at record level.ExampleTableEmployee - EmployeeId, EmployeeName,EmployeeAddress, DepartmentId,TableDesignationMap - EmployeeId, DesignationId, EffectiveDate,validityTableDepartment - DepartmentId, DepartmentTableDesignation - DesignationId, designationVia Modify-Employee-Details screen following are editableEmoyeeNameEmployeeAddressDepartmentDesignationthis screen should allow user to navigate through changes history.Example :Version -1EmoyeeName John SmithEmployeeAddress 60 NewYorkDepartment AccountsDesignation AccountantVersion -2EmoyeeName John SmithEmployeeAddress 60 NewYorkDepartment AccountsDesignation Chief Accountant - changedVersion -3EmoyeeName John SmithEmployeeAddress 60 NewYorkDepartment Sales - changedDesignation Marketing Manager - changedQuestion :What is the best proposed database design for maintaining historyrecords bound with version and retrieval techniqueBest RegardsSasanka
I'm using SQL Server 2008 R2 Management Studio. I have a view that I'd like to index. Obviously, that requires that the view be schema bound to the underlying table. I have found myriad explanations for how to programmatically create a schema bound view, but I've never created a view like that before, and since I'm more a mechanical engineer than a database manager, I'd like to be able to do it the 'easy' way, by just creating the view in Management Studio by going to the database, right clicking on 'Views', clicking on 'New View...' and then, hopefully, checking some box somewhere that schema binds the table to the view.
Hi There,This is related to a ms access database but since I use the SqlDataSource control I thought I should post here.I have a project that I was working on with this ms access db and using sql controls, everything was working just finesince one day I started getting "Object reference not set to an instance of an object" messages when I try to designa query or retrieve a schema, nothing works at design time anymore but at runtime everything is perfect, its a lotof work for me now to create columns,schemas and everything manually, I've tried reinstalling visualstudio, ado componentsbut nothing seems to fix it, did this ever happen to any of you guys?any tip is really appreciated thanks a lot
My clients are not intersted in using Auto generated keys. They are also get data from many sources they would like to use something like customerID. Dose anybody know of any reason why we should not do that? Also they are concern about sql server not being able to handle the datawarehouse in future because they are expecting it to grow in terabites. Dose anyone have advice on that? I was thinking of putting the fact table on a different file group don't if it will help.
I am re-engineering the data warehouse and my client is currently using autogenerate keys, their concern is that after a certain amount of keys (can't remember the figure) sql server starts having problems, dose anyone know how i should handle it when i am doing the designing? thanks any input will be appreciated
I'm designing a DW, and i have some doubts relative to the Distributed Transaction when modeling a star schemma.
My problem is: I have a main dtsx package in wich i call all the child packages in order to create the (Fact and Dimension Tables).
(1) First i have several child packages that create and populate all the Dimension Tables (with the latest values from the relational DB).
(2)Then i have several child packages that create all the fact tables, in this process i use the surrogate keys from the dimension tables (obtained in step 1).
The problem here is , " How do i use the multiple transaction ?" , if i put a "required" Transaction Option on the parent package, then after calling the child packages that creates the dimension tables. The values are not commited, so they are not available when i later execute the childs packages related with the fact tables.
How can i use transaction when modelling a star schemma, in order to have a full roll back or a full commit in all tables (Dimensions and Fact Tables).
we are just starting to do some testing on sql server EE with dimensional models.....we have had one or two problems we have been able to solve using the new peformance dashboards etc.
However, as is inevitable, we are seeing strange behaviour of a query....in a star join it seems to be doing an eager spool and trying to spool the entire fact table to tempdb....hhmmm....
Rather than ask one question at a time.....we have DBAs who went to classes etc at MSFT and the client is some level of MSFT partner.
Could anyone point me to the best documentation for understanding the optimiser and how to influence it to get it to do the right thing in optimising plans for star joins?
So this has got to be considered a major, major flaw in how SSRS interacts with Oracle. I'm using the "Oracle" data provider, but I've also tried using Microsoft's OLE DB data source, and some others, and in no case does SSRS hand off to Oracle a query that does NOT have bind variables. In other words, typically query parameters get passed off to Oracle as bind variables.
The incredibly major problem that this causes is that it disallows Oracle's use of star transformation queries which is the primary method by which to get fast responses to a data warehouse/star schema, in fact a prime authority on this subject (Bert Scalzo, Oracle DBA Guide to Data Warehouse and Star Schemas, p.86 -- obvioulsy not using Oracle 7x was the first) lists it as in effect the #1 consideration.
So what gives? In effect SSRS cannot be used against large scale Oracle data warehouses? I've had success with Business Objects being able to access Oracle star transformations.
So a guess my question is how the heck can use SSRS in a big, Oracle-based data warehouse?
For star_transformation join plans, the following parameters must also be considered: ... No BIND VARIABLE in SELECT statement
http://www.orafaq.com/usenet/comp.databases.oracle.server/2003/09/28/2305.htm Star transformation is not supported for tables with any of the following characteristics: * Queries that contain bind variables
I have a company table and a address table. I have created a linked table with two foreign keys from the company table and the address table respectively.
This set up allows me to apply more than one address per company which is fine. My problem is that I need a way to prevent a company from entering exactly the same address twice.
In the address table i have the following fields:
AddressKey - PK CompanyKey - FK Address 1 - 3 Town County Country PostCode
In the company table I have the following fields.
CompanyKey - PK FirstName Secondname
And finally in the link table I have the following:
CompanyKey - FK AddressKey - FK.
Now, if i entered the following into the address table Assuming that company id of 1 was already entered into the company table.
Dear All,How to store a separated company-calendar for each client-company (about500).Company-calendar: workingdays, non-workingdays, meetings, etc.Thanks,Filips
We deal with multiple vendors who provide us information via text/xml files. Vendor A may provide financial data, vendor b provides litigation data, vendor c provides ratings data. Our current structure has databases for each vendor with its own company table which basically makes all this data disconnected. Of course each vendor has its own proprietary company id to make records unique.
All of the data is based on companies so the grain of data would be at a company level. I would like to be able to link this information together by creating a dimensional model that has a single company table (DimCompany) and has facts populated based on the type of data we receive. Would this be the right sequence of events?
1. My initial load (historical) would have to look at all these data sources and create one company record in my DimCompany table. This table would then link to all other fact tables to provide a single view of company info. I would imagine this would have to be a fuzzy lookup since one company will be in all sources.
2. On subsequent loads (incremental) I would probably have to do a lookup of companies in the dimension via the proprietary code and add if the company wasn't there.
Any advice on tackling this issue would be greatly appreciated especially if SSIS was used in the process.
Locally I develop in SQL server 2005 enterprise. Recently I recreated my db on the server of my hosting company (in sql server 2005 express).I basically recreated the tables and copied the data in it.I now receive the following error when I hit the DB:The 'System.Web.Security.SqlMembershipProvider' requires a database schema compatible with schema version '1'. However, the current database schema is not compatible with this version. You may need to either install a compatible schema with aspnet_regsql.exe (available in the framework installation directory), or upgrade the provider to a newer version.I heard something about running aspnet_regsql.exe, but I dont have that access to the DB. Also I dont know if this command does anything more than creating the membership tables and filling it with some default data...Any other solutions/thought on what this can be?Thanks!
I have a database that allows a client to have multiple addresses, could be a home, business, vacation, etc... I'm trying to deside if I should put the Company Name and Title fileds in the tblClientAddresses table or keep it with the tblClient table. We do mailings to clients and if the mail is going to a home address the company and title should be excluded. If I put them in the tblClients table I really have no way of knowing if we are mailing to a home or a company. If i put the company and title with the address it makes sence that this would solve the issue.
Are there any forseeable problems with only moving one of my company's departments SQL database server to sp2 and leaving the others on sp1? From an admin tools perspective? Our internet groupd would like to move on to sp3 and the others would like to wait?
My Name is Nick Soutter, I am the owner of a small game company, Aepox Games(We're in the middle of a name change from "Lamar Games"),www.lamargames.net.Our first commercial game, Andromeda Online (www.andromedaonline.net) isgoing into beta soon. It runs on an evaluation edition of SQL Server 2000(our intention is, when it launches, we earn the money to buy a copy beforethe evaluation expires).We have been testing Andromeda Online, and found that saves to the databasetake about 10 seconds (we were anticipating less than 1). We felt we needsomebody experienced in optimizing sql databases to help us optimize thedatabase, and get it running in the best method for our particularapplication.Our program accesses the database in Java, and people with understanding inhow to optimize java connections would be a tremendous help.My company is small, and we honestly cant afford much. Everybody on thisproject, from the sound guys to the graphic artist, has worked for 1/10 to1/100 of the value of the job. We're simply a starting company looking fordedicated people who are willing to work more for credit than money.We can offer credit on our website(http://www.andromedaonline.net/credits.html) to anybody who helps us, butlittle more (maybe $100, but we're very over budget, and in desperate needof help). Because of how we intend the game to run (with maybe 100-200concurrent games running online), a 10 second save time is simplyunacceptable.Anybody who would be willing to help us, please send a resume toJoin Bytes!. Experience would be nice, but not a requirement.We're looking for someone who can talk with our programmer about the typesof calls made to our SQL database, and then can log into the DB and optimizeit to run as fast as possible considering our specific needs.Thank you for your time.Nick SoutterAepox (Lamar) Games
I plan to use Visual Web Developer 2005 Express Edition create web application which access SQL database(Express Edition). Is there any license require for this 2 software if I use its in company? Please provide information to me. Thank you!