Split Wide, Denormalized Table Into Normalized Structure
Aug 27, 2002
Thanks for reading.
This is pretty long, hopefully it isn't rambling.
I'm building a system that imports data from several source, Excel files, text files, Access databases, etc. using DTS. The entire process revolved around MS SQL Server, by the way.
I figured I would create denormalized tables that mirror the Excel and flat files, for example, in structure, import data to those, clean up and remove duplicates there, then break those out into my normalized table structure later.
Now I've finished the importing part (though this is going to happen once a week) and I'm onto breaking up the denormalized tables.
I'm hesitating because I'm not sure I've made the best decisions in terms of process, etc.
I've decided to use cursors to loop over the denormalized tables and use batch insert statements to push data out to the appropriate tables.
Any comments? Suggestions? All is welcome.
I'm specifically interested in hearing back on the way I've set up the intermediate, denormalized tables and how I'm breaking them up using cursors (step 2 of the process below). Still, all comments are welcome. As are suggestions for further reading.
Thanks again...
simplified example
(my denormalized tables are 20 - 30 colums wide)
denormalized table:
===================
name, address, city, state, cellphone, homephone
I'm breaking up the denormalized tables like this (*UNTESTED*):
=================================================
DECLARE @vars.... (one for each column in my normalized table structure, matching size and type)
DECLARE myCursor CURSOR
FAST_FORWARD FOR
SELECT name, address, city, state, cellphone, homephone
FROM _DNT_myWideTable
INTO
WHILE @@Fetch_Status = 0
BEGIN
-- grab the next row from the wide table
FETCH NEXT FROM myCursor
INTO @name, @address, @city, @state, @cellphone, @homephone
-- create the person first and get the ID with @@IDENTITY
INSERT INTO tblPerson (name) VALUES (@name)
SET @personID = @@IDENTITY
-- use that ID to coordinate inserts across other tables
INSERT INTO tblAddress (FK_person, address, city, state, addressType)
VALUES(@person, @address, @city, @state, 'HOME')
INSERT INTO tblContact (FK_person, data, contactType)
VALUES(@person, @cellphone, 'CELLPHONE')
INSERT INTO tblContact (FK_person, data, contactType)
VALUES(@person, @homephone, 'HOMEPHONE')
I am pretty new to SSIS, so please excuse me if this is a trivial question.
I have a denormalized database table in an Access database that I need to import into several different tables in a SQL 2005 database. You can think of the Access table as a CustomerOrders table. For example customer related information (i.e. CustomerName, CustomerID, etc...) is repeated with each record in the Access table. When this data gets moved to the SQL 2005 database, I need to insert one record for each distinct CustomerName/Customer ID record into a Customers table. I then need to insert and link every "Order" record into an "Orders" table.
I am sure that this is probably a pretty common task, but I have not found any examples or articles explaining this particular situation. What ways can this be done?
I was thinking I need to loop through each DISTINCT Customer record in the Access (source) table and insert a Customer record into the destination database's Customer table. I would then need to iterate through each row of the Access (source) table and "Lookup" the appropriate CustomerID/Key Field and insert an "Order" record.
The Access table contains over 75,000 rows of data. I am looking for the most appropriate way of doing this with SSIS (so that I don't have to write a custom application to do this!). Any help, input, links, articles, etc. is appreciated!!
There will only be a max of 3 Country entries for each Employee. So I want to select the EmployeeId and get the three CountryIds so it would look like this:
I have a wide fact table that I'm feeding to an SSAS cube. I was advised that splitting the measure group into two will improve performance when querying the cube.
I cannot find any documentation that supports this, in fact I get a blue curved line suggesting that I merge the measure groups since they have the same dimensionality and granularity.
I guess the best practice is what the blue line states, but without knowing the internals of SSAS I can undestand that a smaller measure group may be easier to handle, or create more specific aggregations for.
I have a de-normalized table that I need to export to XML using For XML, but put all of the related rows under the same node.The table is alot more complicated than the example below, but for proof of concept purposes, i'll keep it really simple:
Is there an existing option that deals with this automatically, or do I essentially need to do a group by to output the campaign element, and then union an ungrouped select to output the price element?
Hello,I am currently working on a monthly load process with a datamart. Ioriginally designed the tables in a normalized fashion with the ideathat I would denormalize as needed once I got an idea of what theperformance was like. There were some performance problems, so thedecision was made to denormalize. Now the users are happy with thequery performance, but loading the tables is much more difficult.Specifically...There were two main tables, a header table and a line item table. Thesehave been combined into one table. For my loads I still receive them asseparate files though. The problem is that I might receive a line itemfor a header that began two months ago. When this happens I don't get aheader record in the current month's file - I just get the record inthe line items file. So now I have to join the header and line itemtables in my staging database to get the denormalized rows, but I alsomay have to get header information from my main denormalized table(~150 million rows). For simplicity I will only include the primarykeys and one other column to represent the rest of the row below. Thetables are actually very wide.Staging database:CREATE TABLE dbo.Claims (CLM_ID BIGINT NOT NULL,CLM_DATA VARCHAR(100) NULL )CREATE TABLE dbo.Claim_Lines (CLM_ID BIGINT NOT NULL,LN_NO SMALLINT NOT NULL,CLM_LN_DATA VARCHAR(100) NULL )Target database:CREATE TABLE dbo.Target (CLM_ID BIGINT NOT NULL,LN_NO SMALLINT NOT NULL,CLM_DATA VARCHAR(100) NULL,CLM_LN_DATA VARCHAR(100) NULL )I can either pull back all of the necessary header rows from the targettable to the claims table and then do one insert using a join betweenclaims and claim lines into the target table OR I can do one insertwith a join between claims and claim lines and then a second insertwith a join between claim lines and target for those lines that weren'talready added.Some things that I've tried:INSERT INTO Staging.dbo.Claims (CLM_ID, CLM_DATA)SELECT DISTINCT T.CLM_ID, T.CLM_DATAFROM Staging.dbo.Claim_Lines CLLEFT OUTER JOIN Staging.dbo.Claims C ON C.CLM_ID = CL.CLM_IDINNER JOIN Target.dbo.Target T ON T.CLM_ID = CL.CLM_IDWHERE C.CLM_ID IS NULLINSERT INTO Staging.dbo.Claims (CLM_ID, CLM_DATA)SELECT T.CLM_ID, T.CLM_DATAFROM Staging.dbo.Claim_Lines CLLEFT OUTER JOIN Staging.dbo.Claims C ON C.CLM_ID = CL.CLM_IDINNER JOIN Target.dbo.Target T ON T.CLM_ID = CL.CLM_IDWHERE C.CLM_ID IS NULLGROUP BY T.CLM_ID, T.CLM_DATAINSERT INTO Staging.dbo.Claims (CLM_ID, CLM_DATA)SELECT DISTINCT T.CLM_ID, T.CLM_DATAFROM Target.dbo.Target TINNER JOIN (SELECT CL.CLM_IDFROM Staging.dbo.Claim_Lines CLLEFT OUTER JOIN Staging.dbo.Claims C ON C.CLM_ID =CL.CLM_IDWHERE C.CLM_ID IS NULL) SQ ON SQ.CLM_ID = T.CLM_IDI've also used EXISTS and IN in various queries. No matter which methodI use, the query plans tend to want to do a clustered index scan on thetarget table (actually a partitioned view partitioned by year). Thenumber of headers that were in the target but not the header file thismonth was about 42K out of 1M.So.... any other ideas on how I can set up a query to get the distinctheaders from the denormalized table? Right now I'm considering usingworktables if I can't figure anything else out, but I don't know ifthat will really help me much either.I'm not looking for a definitive answer here, just some ideas that Ican try.Thanks,-Tom.
I have an allergy table which has a patientid and an allergy id. i would like to create a view(or SQL statement) that will give me a crosstab of a patient and there allergies(like below)
I'm working on a couple projects and I've recently been trying to make everything fully normalized so updates are easier and I'm just wondering if there's a standard way to query and update normalized tables. For example: If I have table People with columns ID, FirstName, LastName, Height, Weight, ShoeSize, I can normalize that into two tables. Table People has ID, FirstName and LastName. Table PeopleDetails has PeopleID (FK), Property and Value. That way i can add more properties later right at the presentation layer if I like. Essentially I moved the data from being horizontal to being vertical. But doing a simple search for people means I have to search the details table and return a LOT more records (one each for Height, Weight and ShoeSize) not to mention any more details I might add later. With a lot of details, it seems like your performance would take a big hit and your code would get really complicated as your looping through a vertical dataset to find the properties you want. Or is there some other standard way of doing that? I'm just hoping that someone else has solved these problems and there's a standard set of functions out there for selecting and updating this kind of DB structure. Anyone?
I have a "source" table that is being populated by a DTS bulk importof a text file. I need to scrub the source table after the importstep by running appropriate stored proc(s) to copy the source data to2 normalized tables. The problem is that table "Companies" needs tobe populated first in order to generate the Identity ID and then usethat as the foreign key in the other table.Here is the DDL:CREATE TABLE [dbo].[OriginalList] ([FirstName] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,[LastName] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,[Company] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[Addr1] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[City] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[State] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[Zip] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[Phone] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL) ON [PRIMARY]GOCREATE TABLE [dbo].[Companies] ([ID] [int] IDENTITY (1, 1) NOT NULL ,[Name] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL) ON [PRIMARY]GOCREATE TABLE [dbo].[CompanyLocations] ([ID] [int] IDENTITY (1, 1) NOT NULL ,[CompanyID] [int] NOT NULL ,[Addr1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[City] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[State] [varchar] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[Zip] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,[Phone] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL) ON [PRIMARY]GOThis is the stored proc I have at this time that does NOT work. Ituses the last Company insert for all the CompanyLocations which is notcorrect.CREATE PROCEDURE DataScrubSP ASBegin Transactioninsert Companies (Name) select Company from OriginalListIF @@Error <> 0GOTO ErrorHandlerdeclare @COID intselect @COID=@@identityinsert CompanyLocations (CompanyID, Addr1, City, State, Zip) select@COID, Addr1, City, State, Zip from OriginalListIF @@Error <> 0GOTO ErrorHandlerCOMMIT TRANSACTIONErrorHandler:IF @@TRANCOUNT > 0ROLLBACK TRANSACTIONRETURNGOThanks for any help.Alex.
Requirement: I wanted to create another table based on the column values. For eg: I have to take the employee id and check for what value he has under owns column in the table. I take only 3 values and then these values should go to the newly created columns (owns1, owns2,owns3). if there is no value for any of these columns it should have null values loaded in them. The result of the modification should look like this:->
Note: eventhough employeeid 3 owns more than 3 things we only take 3 of what he owns and populate to above coloumns. In addition to it, the column OWNS will have more than 500 different values in them.
Its kind of urgent and if anyone knows how to , Can you please help me on this. Thanks a lot. -- Ragulan ;)
I normalized the below tables but I am finding it difficult to copy data to the new tables. Â How do I copy data from existing table to the normalized tables? see the table structure below and other supporting information:
SKU_DATA(SKU,SKU_Description,Department,Buyer) Note: this table already has data in it. CREATE TABLE SKU_DATA ( SKU     Integer NOT NULL,
[code].....
The table structure above have two three determinants( SKU,SKU_Description and Buyer).  SKU and SKU_Description are candidate keys. Primary key is SKU.
Normalization : SKU_DATA(SKU,SKU_Description, Buyer) Â BUYER(Buyer,Department)
When rendered, the pages with the small tables on have a lot of white blank space at the right of the table. This is probably caused by the big table on page 3.
This report is distributed by email in Excell format. So on sheet 1 and 2 there are a lot of white cells on the right of the tables. When trying to print, they just want to use the "landscape" option and the "fit to page" option. Because of the empty white cells, the fit to page option reduces the first 2 tables to a very small table which covers only 50 % of the page width. The other 50 % is reserved for the empty cells.
Off course, I know that deleting the empty cells offers a solutions, but it would be a lot more handier if there were no empty cells in the first place.
we can easily load a file into db tables. However, my main concern here is the number of columns in the file. A text file TEXT_1400.txt has 1400 columns. I am unable to load data to my db table using BCP or BULK INSERT commands, as maximum of 1024 columns are allowed per table in SQL Server 2008.Â
We can still go ahead and create ‘Wide Table’ (a special table that holds up to 30,000 columns. The maximum size of a wide table row is 8,019 bytes.). But when operating on wide table, BCP/BULK INSERT commands still fail. After few hours of scratching my head over BCP and BULK INSERT, I observed that while inserting BCP/BULK INSERT commands are unable to identify SPARSE columns and skip these columns, which disturbs column mapping and results in data conversion and trancation errors.  Is there any proper way to load this kind of files into the db table?Â
My ERP system (SAP Business One) does not have a "payment type" field, instead it has a field for the amount of each type of payment. It makes for annoying reporting.
If I have the following data in the Incoming Payments table (ORCT)
SELECTT0.DocEntry, CASE WHEN T0.CashSum <> 0 THEN 'Cash' WHEN T0.CreditSum <> 0 THEN 'Credit' WHEN T0.CheckSum <> 0 THEN 'Check' WHEN T0.TrsfrSum <> 0 THEN 'Transfer' END AS [Type], CASE WHEN T0.CashSum <> 0 THEN T0.CashSum WHEN T0.CreditSum <> 0 THEN T0.CreditSum WHEN T0.CheckSum <> 0 THEN T0.CheckSum WHEN T0.TrsfrSum <> 0 THEN T0.TrsfrSum END AS [Total] FROMORCT T0
I think I've just been staring at this too long and wanted some fresh eyes to think outside of the box on this one.
I am using this to report on our customer's monthly statements so that I can display "12/13/06 Wire Transfer $100". There are many more complications to this query such as currencies and which document number or reference to display, but I have figured out most of those on my own.
I have just inherited a new project consisting of data imported into sql 2005 from a multi-dimensional database. After finding the correct ODBC and importing the data I believed that I was done, but after reviewing the resulting structure I discovered why this was called a €œmulti-dimensional€? database. The resulting imported data is completely de-normalized and resembles an excel spreadsheet more than a relational database. An example of this structure is the persons table. The table has multiple columns, some of which contain the multi-dimensional fields. These fields contain multiple values which are separated with a tilde, €œ~€?. I need to parse out the data and normalize it. In the specific sample of data below I attempting to take the personid, associates, and assocattrib and insert them into a sql table, associates. This table references the persons table where the personid and the associates references the personid in the persons table.
CREATE TABLE [dbo].[associates]( [associd] [int] NOT NULL REFERENCES persons(personid), [namepkey] [int] NOT NULL REFERENCES persons(personid), [assocattribute] [varchar](20) NULL )
The purpose of normalizing this data will be to show the realationship(s) between people as it has been documented in the previous data structure, i.e. person 1 is an associate of person 336 and the attribute is WIT.
My problem lies in attempting to parse out the associates and assocattrib columns and relate them to the appropriate personid. The personid relates to each associate and assocattrib and the tilde, ~, separates the values ordinal position which, in sql, would be separate rows. For example the single row: personid associates assocattrib 58201 252427~252427~252427 VICT/SUSP~WIT~RP Should be: 58201 252427 VICT/SUSP 58201 252427 WIT 58201 252427 RP
The imported data can have no associates: personid associates assocattrib 152683 NULL NULL
or up to 69 associates, I am not even going to try to paste all of that here.
There are over 400,000 rows that I am looking at parsing, so any way of doing this in t-sql or the clr would be GREAT. This data is stored in SQL 2005 standard SP2. The specific machine is our test/reporting server, so I am not necessarily concerned with the best performing solution, I am leaning more towards providing some free time for me.
Any help or suggestions, including better ideas on the table structure, would be greatly appreciated.
From last 1 week or so, i have been facing very strange problem with my sql server 2005s database which is configured and set on the hosting web server. Right now for managing my sql server 2005 database, i am using an web based Control Panel developed by my hosting company.
Problem i am facing is that, whenever i try to modify (i.e. add new columns) tables in the database, it gives me error saying that,
"There is already an object named 'PK_xxx_Temp' in the database. Could not create constraint. See previous errors. Source: .Net SqlClient Data Provider".
where xxx is the table name.
I have done quite a bit research on the problem and have also searched on the net for solution but still the problem persist.
we have a table in our ERP database and we copy data from this table into another "stage"Â table on a nightly basis. is there a way to dynamically alter the schema of the stage table when the source table's structure is changed? in other words, if a new column is added to the source table, i would like to add the column to the stage table during the nightly refresh.
Hi allI have two tables in SqlServer with Exactly Same Structure,I want to Copy all Records fromone of them to another one.I came across to "Insert....select..." statement But i have two problem 1) I don't know any thing about Columns name!!! i just know they have same structure and as far as i know , "Insert...select..." need the Column list to operate correctly, am i right? 2) these two table have One Prinary Key column with IDENTITY feature. Any Help Greatly appriciated.Regards.
Structure Number1 Category: CategoryId(P.K.), UniqueName, CreatedDate, ModifiedDateCategoryNative: CategoryId(F.K.), LanguageId(F.K.), NativeName, Description, Importance, IsVisible, CreatedDate, ModifiedDate Structure Number2 Category: CategoryId(P.K.), UniqueName, CreatedDate, ModifiedDateCategoryNative: CategoryNativeId(P.K.), CategoryId(F.K.), LanguageId(F.K.), NativeName, Description, Importance, IsVisible, CreatedDate, ModifiedDate Can anyone tell me that between above 2 structure what is better and why? I just add CategoryNativeId(P.K.) in Number 2 structure.
How can I setup a table structure for the diagram shown in the attached bitmap?
1) I need to create a product using labour and materials. 2) I need to create hardware using labour and materials. 3) I need to be able to include some hardware in some products. 4) Some materials in the product are made of a culmination of materials. E.G., Concrete is made of sand, stone, and cement.
Any suggestions?
So far I have :
USE NORTHWIND
Create Table tbProducts ( ProductID int NOT NULL, Product varchar(50) )
go
ALTER TABLE tbProducts ADD CONSTRAINT tbProducts_pk PRIMARY KEY (ProductID) GO
CREATE Table tbMaterials ( MaterialID int NOT NULL, Material varchar (50) )
GO
ALTER TABLE tbMaterials ADD CONSTRAINT tbMaterials_pk PRIMARY KEY (MaterialID) GO
CREATE Table tbLabour ( LabourCode char (2) NOT NULL, Labour varchar(50) )
GO ALTER TABLE tbLabour ADD CONSTRAINT tbLabour_pk PRIMARY KEY (LabourCode) GO
CREATE Table tbProductMaterials ( fkProductID int NOT NULL, fkMaterialID int NOT NULL, Quantity Float NOT NULL )
GO ALTER TABLE tbProductMaterials ADD CONSTRAINT tbProductMaterials_pk PRIMARY KEY (fkProductID, fkMaterialID) GO
I'm new in creating tables. and I was asked to create table(s) to capture information using SQL server 2005 (express edition)
please read below and share your thoughts and opinions o what's the efficient and best way to structure the tables.
I need to capture user's input daily and every hour, text can be from 20-100 characters, and my company is expecting to have 50,000+users in the future...Also the hrs's text can be stored for max 120 days, after that data would be deleted from the table(s) An example is put all together, (not sure if it is the right and efficient way)
An example is put all together, (not sure if it is the rigth and effectient way) and if database would support that.. and if it will be fast enougth to search for data
(table structure) - (example with 3 users) userId | timestap | message 1 12/12/2007 10:00 some message 1 12/12/2007 11:00 some message 1 12/12/2007 12:00 some message 1 12/12/2007 13:00 some message 1 12/12/2007 14:00 some message 2 12/12/2007 10:00 some message 2 12/12/2007 11:00 some message 3 12/12/2007 12:00 some message 3 12/12/2007 13:00 some message 3 12/12/2007 14:00 some message ...etc..
Hi,I am new to XML and need a structure of table in XML format, which needsto be created as a table in another server.The Source and destinationservers are SQL Server 2000. Could anyone help me on this.Balaji.S.--Posted via http://dbforums.com
Hi, I have one field "Status" with 15 different types of values. Like 1, 2, 3, 4 .. 15. Now this Status field contains the values like this: Id Status 1 1,2 2 3 3 1,2,3 4 13,15
What is the best way to organize this information in SQLServer table? My idea is Create one table with the following volumns: ID, Status1, Status2, Status3, ... Status14, Status15.
Now the above data look like this: Id Status1 Status2 Status3 ... Status13 Status14 Status15 1 1 1 0 0 0 0 2 0 0 1 0 0 0 3 1 1 1 4 0 0 0 1 0 1
So, In future If they add new column they can easily add column like status16. And status table look like this Status StatusID Description 1 a 2 b .. 15