DB Design :: Normalizing Personal Contact Information
May 22, 2015
I have a large data set with 10s of millions of rows of contact information. The data is in CSV format and contains 48 columns of information (First name, MI, last name, 4 part address, 10+ demographic points, etc.) and I'm struggling with how I should design the database and normalize this data, or if I should normalize this data.
My 2 thoughts for design were either:
1. Break the columns into logical categorical tables (i.e. BasicContactInfo, Demographics, Financials, Interests, etc.)
2. Keep the entire row in one table, and pull out the "Objects" into another table (i.e. ContactInformation, States, ZIPCodes, EmployementStatus, EthnicityCodes, etc.)
The data will be immutable for the most part, and when I get new data, I'll just create a new database and replace the old one.
The reason I like option 1 is because it makes importing easier, since I can just insert the appropriate columns from each row into the appropriate tables. Option number 2 feels like it would be faster to get metrics on the data, like how many contacts live in which states, or what is the total number of unique occupations in the data set. Plus I'll be able to make relationships between the tables, like which state is tied to which zipcode, which city is tied with which county, etc. Importing that data might be more tricky, since I don't think SQL Bulk Copy will allow for inserting into normalized tables like that.
The primary use for this data is to allow our sales force to create custom lists of contact information based on a faceted search page. The sales person would create the filter, and then I will provide them with the resulting data so they can start making business contacts. Search performance needs to be good. Insert, update, and deletes won't happen once the data has been imported.
What should I look for in designing this database? Any good articles on designing tables around wide data sets like my contact information?
THE LAYOUT: I have two tables: "Applicant_T" and "StreetSuffix_T"
The "Applicant_T" table contains fields for the applicant's current address, previous address and employer address. Each address is broken up into parts (i.e., street number, street name, street suffix, etc.). For this discussion, I will focus on the street suffix. For each of the addresses, I have a street suffix field as follows:
[Applicant_T]
CurrSuffix
PrevSuffix
EmpSuffix
The "StreetSuffix_T" table contains the postal service approved street suffix names. There are two fields as follows:
[StreetSuffix_T]
SuffixID <----- this is the primary key
Name
For each of the addresses in the Applicant_T table, I input the SuffixID of the StreetSuffix_T table.
THE PROBLEM: I have never created a view that would require the primary key of one table to be associated with multiple fields of another table (i.e., SuffixID-->CurrSuffix, SuffixID-->PrevSuffix, SuffixID-->EmpSuffix). I want to create a view of the Applicant_T table that will show the suffix name from the StreetSuffix_T table for each of the suffix fields in the Applicant_T table. How is this done?
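One common way to do this is to join StreetSuffix_T once per suffix field, giving each join its own alias. The sketch below uses Python's built-in sqlite3 as a stand-in for SQL Server so that it runs anywhere; the ApplicantID column, the view name ApplicantSuffix_V, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StreetSuffix_T (SuffixID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Applicant_T (
    ApplicantID INTEGER PRIMARY KEY,  -- hypothetical key, not in the original post
    CurrSuffix  INTEGER REFERENCES StreetSuffix_T(SuffixID),
    PrevSuffix  INTEGER REFERENCES StreetSuffix_T(SuffixID),
    EmpSuffix   INTEGER REFERENCES StreetSuffix_T(SuffixID)
);
INSERT INTO StreetSuffix_T VALUES (1, 'Ave'), (2, 'St'), (3, 'Blvd');
INSERT INTO Applicant_T VALUES (1, 1, 2, 3), (2, 2, NULL, 2);

-- One join per suffix field; LEFT JOIN keeps applicants with a missing suffix.
CREATE VIEW ApplicantSuffix_V AS
SELECT a.ApplicantID,
       cur.Name  AS CurrSuffixName,
       prev.Name AS PrevSuffixName,
       emp.Name  AS EmpSuffixName
FROM Applicant_T AS a
LEFT JOIN StreetSuffix_T AS cur  ON cur.SuffixID  = a.CurrSuffix
LEFT JOIN StreetSuffix_T AS prev ON prev.SuffixID = a.PrevSuffix
LEFT JOIN StreetSuffix_T AS emp  ON emp.SuffixID  = a.EmpSuffix;
""")

rows = conn.execute(
    "SELECT * FROM ApplicantSuffix_V ORDER BY ApplicantID"
).fetchall()
for row in rows:
    print(row)
```

The same SELECT with three aliased joins should carry over to a SQL Server view largely unchanged.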
I work in an organisation where we need to find a solution for data consistency in the database. Our partners send details to the organisation, and these are inserted directly into the database. We want to create a new database as a buffer: information from the partners would be inserted there first and then used to update the main database. Is there a better solution than that?
I have created two tables: Delivary, with the columns (DelivaryId, DelivaryNo, QtyRecieved, DelivaryDate, ProductId), and Product, with the columns (ProductId, ProductCode, ProductName, ProductPrice). As you can see, the Product table keeps a record of products while the Delivary table keeps a record of stock supplied. I would like to create another table (an Invoice table) that keeps a record of stock sold, based on the quantity received in the Delivary table. Please help.
So I'm creating an administrative back end for a site that's already been created, and whoever made the tables the site uses didn't know much about database design. So I need to normalize this table of Links so it can be easier to have someone make changes and updates to it, but then I need to put all my normalized tables back together to create a View exactly like the old table which the old site can select from. Basically the stipulation is I can't change the code for the old site so I have to make it think it's still selecting from the same table with the same type of parameters. Is it worth doing all this? Or should I just tough it out with this really ugly table? Here's the table: and here's the site that uses this table: http://waahp.byu.edu/links.asp Thanks! ~Cattrah~
Can someone please point me in the right direction? When I first started with databases I built a very badly designed database consisting of only one huge table. Since learning about normalization I have designed and set up a new database which consists of many more tables instead of just the one. My question is: where do I start in transferring the data from the old single-table database to my new multi-table database?
I have MS SQL Server 2005 Management Studio if that helps, and I want to transfer around 200,000 rows of data into the new database. Both the new and old databases are on the same server.
I am a beginner, so please bear with me. I get very confused about how to normalize my database.
Firstly: The employees in the company I work for are in various departments and can have more than one title and work in more than one department.
Example: John Smith can work in the engineering department as a detailer and an engineer and at the same time work as a project manager for the management department.
How do I setup this table structure?
Employees Table
Login (PK) | First | Last  | Extension.......
---------------------------------------------
jsmith     | John  | Smith | 280

Department Title Breakdown
Department  | Title
--------------------------
Engineering | Detailer
Engineering | Engineer
Management  | ProjectManager
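A many-to-many setup like this is usually handled with a junction table that holds one row per employee, department, and title combination. The sketch below is only one possible layout, run through Python's sqlite3 rather than SQL Server; the table names DepartmentTitles and EmployeeRoles are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Employees (
    Login TEXT PRIMARY KEY, First TEXT, Last TEXT, Extension TEXT
);
-- Valid department/title pairs (the "Department Title Breakdown").
CREATE TABLE DepartmentTitles (
    Department TEXT NOT NULL,
    Title      TEXT NOT NULL,
    PRIMARY KEY (Department, Title)
);
-- Junction table: one row per role an employee holds.
CREATE TABLE EmployeeRoles (
    Login      TEXT NOT NULL REFERENCES Employees(Login),
    Department TEXT NOT NULL,
    Title      TEXT NOT NULL,
    PRIMARY KEY (Login, Department, Title),
    FOREIGN KEY (Department, Title)
        REFERENCES DepartmentTitles(Department, Title)
);
INSERT INTO Employees VALUES ('jsmith', 'John', 'Smith', '280');
INSERT INTO DepartmentTitles VALUES
    ('Engineering', 'Detailer'),
    ('Engineering', 'Engineer'),
    ('Management', 'ProjectManager');
INSERT INTO EmployeeRoles VALUES
    ('jsmith', 'Engineering', 'Detailer'),
    ('jsmith', 'Engineering', 'Engineer'),
    ('jsmith', 'Management', 'ProjectManager');
""")

roles = conn.execute(
    "SELECT Department, Title FROM EmployeeRoles "
    "WHERE Login = ? ORDER BY Department, Title",
    ("jsmith",),
).fetchall()

# A pair that is not in DepartmentTitles is refused by the composite FK.
try:
    conn.execute(
        "INSERT INTO EmployeeRoles VALUES ('jsmith', 'Management', 'Detailer')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(roles, rejected)
```

The composite foreign key is optional; it is there so that an employee can only be given a title that actually exists in that department.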
I have this table...
CREATE TABLE #Test (ID char(1), Seq int, Ch char(1))
INSERT #Test SELECT 'A',1,'A'
INSERT #Test SELECT 'A',2,'B'
INSERT #Test SELECT 'A',3,'C'
INSERT #Test SELECT 'B',1,'D'
INSERT #Test SELECT 'B',2,'E'
INSERT #Test SELECT 'B',3,'F'
INSERT #Test SELECT 'B',4,'G'
....and am searching for this query....
SELECT ID, Pattern=...?? FROM #Test....??
....to give this result, where Pattern is the ordered concatenation of Ch for each ID:
ID Pattern
A ABC
B DEFG
Thanks for any help!
Jim
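One possible approach is a recursive common table expression that starts at Seq = 1 and appends the next character on each step. The sketch below runs it through Python's sqlite3 rather than SQL Server, uses an ordinary table named Test in place of #Test, and assumes that Seq starts at 1 and has no gaps within an ID (true for the sample rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Test (ID TEXT, Seq INTEGER, Ch TEXT);
INSERT INTO Test VALUES
    ('A', 1, 'A'), ('A', 2, 'B'), ('A', 3, 'C'),
    ('B', 1, 'D'), ('B', 2, 'E'), ('B', 3, 'F'), ('B', 4, 'G');
""")

query = """
WITH RECURSIVE build(ID, Seq, Pattern) AS (
    -- Anchor: the first character of each ID.
    SELECT ID, Seq, Ch FROM Test WHERE Seq = 1
    UNION ALL
    -- Step: append the character with the next sequence number.
    SELECT t.ID, t.Seq, b.Pattern || t.Ch
    FROM build AS b
    JOIN Test AS t ON t.ID = b.ID AND t.Seq = b.Seq + 1
)
SELECT b.ID, b.Pattern
FROM build AS b
-- Keep only the fully built string for each ID.
WHERE b.Seq = (SELECT MAX(Seq) FROM Test WHERE ID = b.ID)
ORDER BY b.ID
"""
patterns = conn.execute(query).fetchall()
for row in patterns:
    print(row)
```

On SQL Server the dialect differs slightly (no RECURSIVE keyword, + instead of || for concatenation), so treat this as an outline of the idea rather than a drop-in query.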
I re-designed a predecessor's database so that it is more properly normalized. Now, I must migrate the data from the legacy system into the new one. The problem is that one of the tables is a CROSSTAB TABLE. Yes, the actual table is laid out in a cross-tabular fashion. What is a good approach for moving that data into normalized tables? This is the original table:
CREATE TABLE [dbo].[Sensitivities](
[Lab ID#] [int] NULL,
[Organism name] [nvarchar](60) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Source] [nvarchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[BACITRACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CEPHALOTHIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CHLORAMPHENICOL] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CLINDAMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ERYTHROMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SULFISOXAZOLE] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[NEOMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[OXACILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[PENICILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TETRACYCLINE] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TOBRAMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[VANCOMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TRIMETHOPRIM] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CIPROFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AMIKACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AMPICILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CARBENICILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CEFTAZIDIME] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[GENTAMICIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[OFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[POLYMYXIN B] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[MOXIFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[GATIFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[SENSI NOTE] [nvarchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
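One approach that tends to work for crosstab tables is to unpivot them: emit one (lab ID, antibiotic, result) row for every non-NULL antibiotic cell. The sketch below shows the idea with Python's sqlite3 on a cut-down version of the table (three antibiotic columns only); the target table SensitivityResult and the sample values are invented for illustration. SQL Server 2005 also has an UNPIVOT operator that may do the same job in a single statement.

```python
import sqlite3

# Antibiotic columns to unpivot; a fixed, trusted list (not user input),
# so it is safe to splice the names into the SQL text below.
ANTIBIOTICS = ["BACITRACIN", "CEPHALOTHIN", "POLYMYXIN B"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sensitivities (
    [Lab ID#] INTEGER, [Organism name] TEXT, [Source] TEXT,
    [BACITRACIN] TEXT, [CEPHALOTHIN] TEXT, [POLYMYXIN B] TEXT
);
-- Invented sample rows.
INSERT INTO Sensitivities VALUES
    (101, 'Organism A', 'Eye', 'S', 'R', NULL),
    (102, 'Organism B', 'Ear', NULL, 'S', 'I');

-- Hypothetical normalized target: one row per lab sample and antibiotic.
CREATE TABLE SensitivityResult (
    LabID      INTEGER NOT NULL,
    Antibiotic TEXT    NOT NULL,
    Result     TEXT    NOT NULL,
    PRIMARY KEY (LabID, Antibiotic)
);
""")

# One INSERT ... SELECT per crosstab column, skipping empty cells.
for name in ANTIBIOTICS:
    conn.execute(
        f"INSERT INTO SensitivityResult (LabID, Antibiotic, Result) "
        f"SELECT [Lab ID#], ?, [{name}] FROM Sensitivities "
        f"WHERE [{name}] IS NOT NULL",
        (name,),
    )

results = conn.execute(
    "SELECT LabID, Antibiotic, Result FROM SensitivityResult "
    "ORDER BY LabID, Antibiotic"
).fetchall()
for row in results:
    print(row)
```

In the real migration the antibiotic names would presumably go into their own lookup table, with SensitivityResult holding a foreign key instead of the text name.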
I have created an SSIS package that takes data from a very large table (301 columns) and puts it in a new database in smaller tables. I am using views to control what data goes to the new tables. I also specified that it drop the destination table and recreate it prior to copying the data. The reason for this is so that old data removed from the larger database will get removed from the normalized databases.
I have 2 things I am trying to figure out..
1. I would like to have the package set a specific column in each new table to be the primary key (this will allow us to use relationships when querying the data).
2. I decided I wanted to sort the data as it copies. I am using the BI Visual Studio for my editing. In the Data Flow view I cannot seem to disconnect the output from the Source block so I can connect it to the Sort block and then feed that to the output block. What am I missing here?
I am copying data from one denormalized table to a COUPLE of normalized ones. I am using multicast, following advice from the forum.
The problem I have is that the two destination tables (A and B) share a foreign key relationship. Filling in A is no problem, but when I want to fill in B, I don't know how to populate its foreign key, since the multicast doesn't know the corresponding primary key in table A.
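A common pattern here is to load in two passes: fill A first so that its keys exist, then fill B by joining the source rows back to A on a natural key to look up the generated primary key. The sketch below shows the idea in plain SQL via Python's sqlite3 rather than in an SSIS data flow; the Flat table, all column names, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical denormalized source.
CREATE TABLE Flat (CustomerName TEXT, Item TEXT);
INSERT INTO Flat VALUES ('Acme', 'bolt'), ('Acme', 'nut'), ('Zenith', 'gear');

CREATE TABLE A (
    A_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE B (
    B_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    A_ID INTEGER NOT NULL REFERENCES A(A_ID),
    Item TEXT NOT NULL
);

-- Pass 1: one parent row per distinct natural key.
INSERT INTO A (Name) SELECT DISTINCT CustomerName FROM Flat;

-- Pass 2: look up the generated key by joining on the natural key.
INSERT INTO B (A_ID, Item)
SELECT A.A_ID, f.Item
FROM Flat AS f
JOIN A ON A.Name = f.CustomerName;
""")

loaded = conn.execute(
    "SELECT A.Name, B.Item FROM B JOIN A ON A.A_ID = B.A_ID "
    "ORDER BY A.Name, B.Item"
).fetchall()
print(loaded)
```

In SSIS terms, the second pass would likely correspond to a separate data flow that runs after A is loaded and uses a Lookup against A, instead of loading both tables from one multicast.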
I have a database which has a contact column, e.g. Mr Peter Smith
I am writing a new database which is to have three separate columns: salutation, first name and surname. What would be the best way to split the column up? I was thinking of concentrating on the spaces.
Note: some contacts may not have a salutation included in the contact column, and in this case the salutation column should be blank...
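Splitting on spaces can work if the first word is checked against a list of known salutations before it is treated as one. The sketch below does this in Python; the salutation list and the rule that everything after the first name is the surname are assumptions, and real data will probably need more cases.

```python
# Assumed list of salutations; extend it to match the real data.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr"}


def split_contact(contact):
    """Split 'Mr Peter Smith' into (salutation, first name, surname).

    The salutation is blank when the first word is not a known one.
    Everything after the first name is treated as the surname.
    """
    parts = contact.split()
    salutation = ""
    if parts and parts[0].rstrip(".").lower() in SALUTATIONS:
        salutation = parts.pop(0)
    first_name = parts[0] if parts else ""
    surname = " ".join(parts[1:])
    return salutation, first_name, surname


for example in ["Mr Peter Smith", "Peter Smith", "Dr. Anna van Dyke"]:
    print(split_contact(example))
```

It is probably worth running something like this over the whole column first and reviewing the rows that do not split into exactly two or three words.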
I'm new to SSIS and have run into a problem I'm hoping someone can help me with.
Basically, I have a flat file that looks something like:
ID,Type,Description,Results
1,Test1,This is a test,5
2,Test1,This is also a 1 test,7
3,Test1,This is also a 1 test,13
4,Test2,This is a second test,14
5,Test2,This is also a second test,18
I'm trying to normalize the data by extracting out individual rows that have the same "Type" column value. So what I want is to extract each unique type and description into a separate table. This would give me two new rows, one for a type of Test1, and one for a type of Test2, with the descriptions. Does this make sense? Then I could relate the individual results to these test types. In my scenario, I don't care which description is used; I just want to take the first description that shows up with the associated "Type."
Does anyone have any idea of how I could go about doing this? I could pull out all unique "Types" from the rows with the Aggregate transformation, but I'm trying to figure out how to get the description that goes along with it.
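Outside of SSIS, one way to express "the first description for each Type" is to keep the row with the smallest ID per Type. The sketch below loads the sample file into a staging table with Python's csv and sqlite3 modules and then fills a type table; the table names Staging and TestType are made up, and "first" is assumed to mean lowest ID.

```python
import csv
import io
import sqlite3

flat_file = """ID,Type,Description,Results
1,Test1,This is a test,5
2,Test1,This is also a 1 test,7
3,Test1,This is also a 1 test,13
4,Test2,This is a second test,14
5,Test2,This is also a second test,18
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staging (ID INTEGER, Type TEXT, Description TEXT, Results INTEGER);
CREATE TABLE TestType (Type TEXT PRIMARY KEY, Description TEXT);
""")

# Load the flat file as-is into the staging table.
reader = csv.DictReader(io.StringIO(flat_file))
conn.executemany(
    "INSERT INTO Staging VALUES (?, ?, ?, ?)",
    [(int(r["ID"]), r["Type"], r["Description"], int(r["Results"]))
     for r in reader],
)

# One row per Type, taking the description from the lowest ID of that Type.
conn.execute("""
    INSERT INTO TestType (Type, Description)
    SELECT s.Type, s.Description
    FROM Staging AS s
    WHERE s.ID = (SELECT MIN(s2.ID) FROM Staging AS s2 WHERE s2.Type = s.Type)
""")

test_types = conn.execute(
    "SELECT Type, Description FROM TestType ORDER BY Type"
).fetchall()
print(test_types)
```

The individual results can then stay in a table keyed by ID with Type as the foreign key to TestType.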
I have been learning web dev in C# for five months. The problem I have is that some weeks ago I moved my application to a new folder on my hard drive. All seemed OK, but now my connection strings for inserting data no longer work and throw an exception saying "Object reference not set to an instance of an object". I dug into the new folder and it has created a new SQL Server (I think) in app .data. How do I connect to this new server, and how do I find its connection strings?
I have a contact table and a customer table. The two tables will contain columns like First name Last Name, Date of Birth Post Code, House Number Street Name etc.
I would like to find the different combinations in which I can relate the customer and contact data. For example, it is possible that the first name and last name are the same but the date of birth is different, which would still indicate that the contact and the customer are the same. I do not know these combinations in advance and I would like to have this set generated for me. I get the data from Integration Services (SQL Server 2005) and I would like to know the patterns in which the data will differ. Is there any way of achieving this?
I am very new to Data Mining and would like to have some direction as to how to progress with this.
I downloaded Business Contact Manager. SQL Server is included as part of the download. When I rebooted my machine, SQL Server was not connected.
It's asking for:
Server Service
How is this process completed to operate correctly?
I need to normalise comma separated strings of tags (SQL Server 2008 R2).
E.g. (1, 'abc, DEF, xyzrpt') should become (1, 'abc') (1, 'DEF') (1, 'xyzrpt')
I have written a procedure in T-SQL that can handle this. But it is slow and it would be better if the solution was available as a view, even a slow view would be better.
Most solutions I found go the other way round: from (1, 'abc'), (1, 'DEF') and (1, 'xyzrpt'), generate (1, 'abc, DEF, xyzrpt').
If memory serves, it used "FOR XML PATH". But it's been a while and I may be totally wrong.
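One way to get this as a view is a recursive common table expression that peels one tag off the front of the string per step. The sketch below demonstrates the idea in SQLite via Python's sqlite3, not in SQL Server 2008 R2; the table name Tagged, the view name TagList, and the second sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tagged (id INTEGER, tags TEXT);
INSERT INTO Tagged VALUES (1, 'abc, DEF, xyzrpt'), (2, 'solo');

CREATE VIEW TagList AS
WITH RECURSIVE split(id, tag, rest) AS (
    -- Anchor: no tag yet; append a comma so every tag ends with one.
    SELECT id, '', tags || ',' FROM Tagged
    UNION ALL
    -- Step: take the text before the first comma, keep the remainder.
    SELECT id,
           trim(substr(rest, 1, instr(rest, ',') - 1)),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT id, tag FROM split WHERE tag <> '';
""")

tag_rows = sorted(conn.execute("SELECT id, tag FROM TagList").fetchall())
print(tag_rows)
```

A T-SQL version would need CHARINDEX and SUBSTRING in place of instr and substr, and recursion depth limits may matter for long tag lists, so this is an outline rather than a tested SQL Server view.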
I'm new to SQL with 2 weeks under my belt....lol, so this may be a simple edit:
When I run the following query, I can get a list of all dups in the contact field:
++++++++++++++++++++++++++++++++++
SELECT full_name, COUNT(full_name) AS NumOccurrences
FROM contact
GROUP BY full_name
HAVING ( COUNT(full_name) > 1 )
++++++++++++++++++++++++++++++++++
However: I need to make sure I am de-activating (active = 0) only the contacts that are listed more than once within the same company table (company.company_id), and only where phone is NULL. I can't seem to make it work. Does anyone have any suggestions for an UPDATE I can use?
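One reading of the requirement is: set active = 0 on a contact when its phone is NULL and another contact with the same full_name exists in the same company. The sketch below expresses that with a correlated EXISTS, run through Python's sqlite3. It assumes the contact table carries a company_id column and a unique contact_id, neither of which is shown in the post, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY,  -- assumed unique key
    company_id INTEGER,              -- assumed link to company.company_id
    full_name  TEXT,
    phone      TEXT,
    active     INTEGER DEFAULT 1
);
INSERT INTO contact (contact_id, company_id, full_name, phone) VALUES
    (1, 10, 'Ann Lee', '555-0100'),
    (2, 10, 'Ann Lee', NULL),   -- duplicate in company 10, no phone
    (3, 20, 'Ann Lee', NULL),   -- same name, but alone in company 20
    (4, 10, 'Bob Ray', NULL);   -- no phone, but not a duplicate
""")

conn.execute("""
    UPDATE contact
    SET active = 0
    WHERE phone IS NULL
      AND EXISTS (
          SELECT 1
          FROM contact AS d
          WHERE d.company_id = contact.company_id
            AND d.full_name  = contact.full_name
            AND d.contact_id <> contact.contact_id
      )
""")

status = conn.execute(
    "SELECT contact_id, active FROM contact ORDER BY contact_id"
).fetchall()
print(status)
```

Note that if every copy of a duplicated contact has a NULL phone, this deactivates all of them; running the WHERE clause as a SELECT first is a cheap way to check what would be touched.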
I am trying to figure out the availability of a contact by comparing their available flag to the current day of the week. That is, the contact has 7 BIT fields in the table, 1 for each day of the week, each being T or F depending on whether they are available. I'm trying to figure out how to read the correct field based on the day of the week to see if they're available (T) that day for a notification. The field names in the table are: mon, tue, wed, etc. I can get the current DOW from SQL and trim it to the same length and case as the field names, but I am trying to figure out how to use that to pick the right field and check whether its contents are true or not.
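Since a column name cannot be supplied as a query parameter, one option is a CASE expression that maps the day abbreviation to the matching flag column. The sketch below shows this with Python's sqlite3; the table layout beyond the seven day columns and the sample rows are assumptions, and the day name is computed in Python here rather than in SQL.

```python
import datetime
import sqlite3

# Column names in date.weekday() order (Monday is 0).
DAY_COLUMNS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (
    name TEXT,
    mon INTEGER, tue INTEGER, wed INTEGER, thu INTEGER,
    fri INTEGER, sat INTEGER, sun INTEGER
);
INSERT INTO contact VALUES ('Ann', 1, 1, 1, 1, 1, 0, 0);
INSERT INTO contact VALUES ('Bob', 0, 0, 0, 0, 0, 1, 1);
""")


def available_on(day):
    """Return the names of contacts whose flag for that weekday is set."""
    dow = DAY_COLUMNS[day.weekday()]
    sql = """
        SELECT name FROM contact
        WHERE CASE ?
                  WHEN 'mon' THEN mon WHEN 'tue' THEN tue
                  WHEN 'wed' THEN wed WHEN 'thu' THEN thu
                  WHEN 'fri' THEN fri WHEN 'sat' THEN sat
                  WHEN 'sun' THEN sun
              END = 1
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, (dow,))]


friday = available_on(datetime.date(2015, 5, 22))    # a Friday
saturday = available_on(datetime.date(2015, 5, 23))  # a Saturday
print(friday, saturday)
```

The same CASE shape should work in T-SQL with the day name taken from the server date, which avoids building the statement dynamically.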
I am looking for SQL Server contact management software options. Our company is currently looking at GoldMine Sales & Marketing, but we would be interested in knowing about options that may be a little more intuitive/user-friendly. Please let me know if you have some leads.
I am trying to find the average number of contact hours per student in Reporting Services 2005. The contact hours are in the TotalTime field.
Is this formula correct?
=Sum(Fields!TotalTime.Value)/Avg(Fields!TotalTime.Value) is in the =Fields!StateServices.Value Group
Hi, this is my first post to this forum so thank you all in advance..
I am trying to design a database to store information about the Specification required by each customer. The main problem I am having is how to store 2 instances of ContactID (from the CustomerContacts table) in a separate table called CustomerSpec.
For example each Customer has many contacts,(the Customer data is stored in a table called Customers which has a one to many relationship with the CustomerContacts table) each customer has one and only one Customer spec, and each customer spec needs to have 2 customer contacts, ie. one for Artwork and one production. (it should also be possible to have the same contact for both Artwork and production).
The problem is how to associate these ContactIDs with the customer spec. If there was only one contact per spec I could simply link the CustomerContacts table with the CustomerSpec table and drop ContactID into CustomerSpec as a foreign key. But I am stuck on how to save more than one ID.
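With exactly two roles, one straightforward layout is two foreign key columns in CustomerSpec, one per role, both pointing at CustomerContacts. The sketch below tries this with Python's sqlite3; the column names ArtworkContactID and ProductionContactID and the sample rows are made up, and the composite keys are an optional extra that keeps each contact tied to the spec's own customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerContacts (
    ContactID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
    Name       TEXT,
    UNIQUE (CustomerID, ContactID)  -- target for the composite FKs below
);
-- One spec per customer, with one FK column per contact role.
CREATE TABLE CustomerSpec (
    CustomerID          INTEGER PRIMARY KEY REFERENCES Customers(CustomerID),
    ArtworkContactID    INTEGER NOT NULL,
    ProductionContactID INTEGER NOT NULL,
    FOREIGN KEY (CustomerID, ArtworkContactID)
        REFERENCES CustomerContacts(CustomerID, ContactID),
    FOREIGN KEY (CustomerID, ProductionContactID)
        REFERENCES CustomerContacts(CustomerID, ContactID)
);
INSERT INTO Customers VALUES (1, 'First Co'), (2, 'Second Co');
INSERT INTO CustomerContacts VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Cy');
INSERT INTO CustomerSpec VALUES (1, 1, 2), (2, 3, 3);  -- same contact twice is fine
""")

specs = conn.execute("""
    SELECT s.CustomerID, art.Name, prod.Name
    FROM CustomerSpec AS s
    JOIN CustomerContacts AS art  ON art.ContactID  = s.ArtworkContactID
    JOIN CustomerContacts AS prod ON prod.ContactID = s.ProductionContactID
    ORDER BY s.CustomerID
""").fetchall()
print(specs)
```

If the number of roles is likely to grow, a separate table with (CustomerID, Role, ContactID) rows may age better than adding a column per role.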
And works perfectly, but ... how to make sure every item has an element "nodes" ? The case here is for the child leafs obviously. This, because on the client i have to inject this element "nodes" on a json version of this xml, and just wanted to avoid normalizing the structure on the client.
For the root I am using
FOR XML PATH('root'),TYPE; and for the hierarchy that follows FOR XML RAW ('node'), root('nodes'), ELEMENTS
Split function. I have records of multiple users, the last value of every record is a contact number (10 Digits- Numeric), I want a split function which can take the whole text and split the records on the basis of contact number.
In other words, I want SQL to locate the contact number, move to the next record after that, and so on until the end of the text.
create table tbl_1 (txt varchar (max))
insert into tbl_1 values ('john asfasdf 535 summit ave franklin lks nj 15521 510_644_1079 na na 5,8/12 executive, finance finance and planning far 5537 21133 8.25 126 ronald d hensor jr. 5575621596
[Code] .....
Output john jimenez 535 summit ave franklin lks nj 15521 510_644_1079 na na 5,8/12 executive,finance finance and planning far 5537 21133 8.25 126 ronald d hensor jr. 5575621596 jeffrey galione 57 allen dr wayne nj 15810 562_434_0710 na na 5,8/12 executive, technical sales and support good 8137 91630 8.25 126 eileen oneal 8258364083
I have a bunch of contacts that I've scored how well their names match to other contacts in the same business. I can programmatically figure out how to parse the results, but would like to know how to do this via SQL. My problem is for Business_fk 968976 I have 7 contacts. In the end I should have 4 contacts based on name match. For the business key listed Gerardo Lopez is in the ContactScore table twice for Contact keys 7355719 and 57028145. I then have two rows like so:
Each reference each other, and 2 is a good case, a more difficult case would have key 1 listed 10 times showing a ContactMatch_fk of 2 - 11, and then Contact_fk 2 listed 10 times with a ContactMatch_fk of 1, 3-11. I know 57028145 maps to 7355719 from the first row in the ContactScore table, so when Contact_fk of 7355719 comes up I should be able to skip it and not process that match. Hopefully that makes sense. Anyway here is the test data:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ContactScore]') AND type in (N'U')) DROP TABLE [dbo].[ContactScore]; GO CREATE TABLE [dbo].[ContactScore] ( [ContactScore_pk]INT NOT NULL, [Contact_fk]INT NOT NULL,
I have tried a number of times to install BCM and it just does not want to reinstall; it tells me to look at the log file. I can't understand what I am looking at in the log file. This is a reinstall and there is very little help I can find. It does not even install SQL Express.
I previously installed Office 2007. I tried to install the Business Contact Manager (disk 2) and this error appeared.
Setup failed to install the required component Microsoft SQL Server 2005 Express (MSSMLBIZ). Microsoft Office Professional 2007 CD 2 cannot continue. See C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Summary.txt for more detail.
(The results are below.) I am not extremely technical (I can follow clear directions) but have tried a couple of the options I have seen on this site, and none of them worked. I took off encryption and compression, I uninstalled all SQL components in add/remove hardware, and domain was already in my regedit. PLEASE HELP.
Microsoft SQL Server 2005 9.00.2047.00
==============================
OS Version : Microsoft Windows XP Professional Service Pack 2 (Build 2600)
Time : Wed Feb 27 17:50:58 2008

DAMON : The current system does not meet recommended hardware requirements for this SQL Server release. For detailed hardware requirements, see the readme file or SQL Server Books Online.

Machine : DAMON
Product : Microsoft SQL Server Setup Support Files (English)
Product Version : 9.00.2047.00
Install : Successful
Log File : c:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files\SQLSetup0008_DAMON_SQLSupport_1.log
--------------------------------------------------------------------------------
Machine : DAMON
Product : Microsoft SQL Server Native Client
Product Version : 9.00.2047.00
Install : Successful
Log File : c:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files\SQLSetup0008_DAMON_SQLNCLI_1.log
--------------------------------------------------------------------------------
Machine : DAMON
Product : Microsoft SQL Server VSS Writer
Product Version : 9.00.2047.00
Install : Successful
Log File : c:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files\SQLSetup0008_DAMON_SqlWriter_1.log
--------------------------------------------------------------------------------
Machine : DAMON
Product : MSXML 6.0 Parser (KB933579)
Product Version : 6.10.1200.0
Install : Successful
Log File : c:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files\SQLSetup0008_DAMON_MSXML6_1.log
--------------------------------------------------------------------------------
Machine : DAMON
Product : SQL Server Database Services
Error : The SQL Server service failed to start. For more information, see the SQL Server Books Online topics, "How to: View SQL Server 2005 Setup Log Files" and "Starting SQL Server Manually."
--------------------------------------------------------------------------------
Machine : DAMON
Product : SQL Server Database Services
Error : The SQL Server service failed to start. For more information, see the SQL Server Books Online topics, "How to: View SQL Server 2005 Setup Log Files" and "Starting SQL Server Manually."
--------------------------------------------------------------------------------
Machine : DAMON
Product : Microsoft SQL Server 2005 Express Edition
Product Version : 9.1.2047.00
Install : Failed
Log File : c:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files\SQLSetup0008_DAMON_SQL.log
Last Action : InstallFinalize
Error String : The SQL Server service failed to start. For more information, see the SQL Server Books Online topics, "How to: View SQL Server 2005 Setup Log Files" and "Starting SQL Server Manually." The error is (1067) The process terminated unexpectedly.
Error Number : 29503
--------------------------------------------------------------------------------
SQL Server Setup failed. For more information, review the Setup log file in %ProgramFiles%\Microsoft SQL Server\90\Setup Bootstrap\LOG\Summary.txt.
I'm looking for a little help with a strange problem. I used the Business Contact Manager 2007 Database Tool to create a shared database in SQL Server 2005 on a server on the LAN, placed the database in the default instance (MSSQLSERVER), and then used the database tools to restore a 2003 BCM database to the newly created database. Everything worked as it should: I verified that the database existed, was populated with data, and had all the permissions set correctly for access. Then I loaded Business Contact Manager 2007 on a workstation on the LAN and attempted to use the wizard to connect to the remote database. I keep receiving errors that the database cannot be found. I have used SQLCMD (sqlcmd -S "tcp:ServerName\instanceName,portNumber") to verify access to the server and named instance and can connect with no problem, so that appears to rule out any firewall problem (I turned the firewall off) or permission problem on the SQL Server. This is a connection between a Vista computer and a Server 2003 domain controller. DNS appears to work without a problem, as a ping from the Vista machine by server name yields the correct IP. I have several databases running on the SQL Server and have no problem accessing them. Any help would be greatly appreciated.