Eliminating Redundant Data

Jul 20, 2005

edit: this came out longer than I thought, any comments about anything
here is greatly appreciated. thank you for reading

My system stores millions of records, each with fields like firstname,
lastname, email address, city, state, zip, along with any number of user
defined fields. The application allows users to define message templates
with variables. They can then select a template, and for each variable
in the template, type in a value or select a field.

The system allows you to query for messages you've sent by specifying
criteria for the variables (not the fields).

This requirement has made it difficult to normalize my datamodel at all
for speed. What I have is this:

[fieldindex]
id int PK
name nvarchar
type datatype

[recordindex]
id int PK
....

[recordvalues]
recordid int PK
fieldid int PK
value nvarchar

whenever messages are sent, I store which fields were mapped to what
variables for that deployment. So the query with a variable criteria
looks like this:

select coalesce(vm.value, rv.value)
from sentmessages sm
inner join variablemapping vm on vm.deploymentid=sm.deploymentid
left outer join recordvalues rv on
rv.recordid=sm.recordid and rv.fieldid=vm.fieldid
where coalesce(vm.value, rv.value) ....

this model works pretty well for searching messages with variable
criteria and looking up variable values for a particular message. the
big problem I have is that the recordvalues table is HUGE, 1 million
records with 50 fields each = 50 million recordvalues rows. The value,
two int columns plus the two indexes I have on the table make it into a
beast. Importing data takes forever. Querying the records (with a field
criteria) also takes longer than it should.

makes sense, the performance was largely IO bound.

I decided to try and cut into that IO. looking at a recordvalues table
with over 100 million rows in it, there were only about 3 million unique
values. so I split the recordvalues table into two tables:

[recordvalues]
recordid int PK
fieldid int PK
valueid int

[valueindex]
id int PK
value nvarchar (unique)

now, valueindex holds 3 million unique values and recordvalues
references them by id. to my suprise this shaved only 500mb off a 4gb
database!

importing didn't get any faster either, although it's no longer IO bound
it appears the cpu as the new bottleneck outweighed the IO bottleneck.
this is probably because I haven't optimized the queries for the new
tables (was hoping it wouldn't be so hard w/o the IO problem).

is there a better way to accomplish what I'm trying to do? (eliminate
the redundant data).. does SQL have built-in constructs to do stuff like
this? It seems like maybe I'm trying to duplicate functionality at a
high level that may already exist at a lower level.

IO is becoming a serious bottleneck.
the million record 50 field csv file is only 500mb. I would've thought
that after eliminating all the redundant first name, city, last name,
etc it would be less data and not 8x more!

-
Gordon

Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------
http://www.usenet.com

View 5 Replies

Eliminating Redundant Data

2 Tables = Redundant Data/ DISTINCT Doesnt Work

Eliminating Repeating Data Values

Redundant SQL Server

Redundant Indexes

Dual Redundant Environment

2005 Redundant Functions

DB Design :: Don't Have Enough RAM Slots Or Redundant Power Supplies

Need Advice On Identifying Redundant Rows In A Table

Eliminating Duplicates

Eliminating Duplicate Rows

Eliminating One Database...URGENT

Eliminating Duplicates In Select

Eliminating Duplicate Records

Eliminating A Cartesian Product

Eliminating Records In A Query

T-SQL (SS2K8) :: Eliminating Duplicates While Insert?

T-SQL (SS2K8) :: Eliminating Divide By Zero Error

Eliminating And Replacing Duplicate Records

Help Eliminating Duplcate Rows In My Query

ELIMINATING DUPLICATE ROWS URGENT

Eliminating (inverse) Duplicates In Result

Eliminating Combinatorial Relationship Multiplication

SQL Server 2014 :: Eliminating Distinct SORT Operation

SQL Server 2012 :: Eliminating Quotations From Columns When Running A Query?

Power Pivot :: Create Measure Within Date Dimension Table To Sum Single Entry Per Month Eliminating Duplicates

Sampling Data Set Via Integration Services Data Flow For Data Mining Models Without Saving Training And Test Data Set?

System.Data.SqlClient.SqlException: The Conversion Of A Char Data Type To A Datetime Data Type Resulted In An Out-of-range Datetime Value.

Data Reader Or Data Adapter With Data Set?

Master Data Services :: Master Data Services - Data Push Back To Excel Sheet

Ntext Over 4000 Chars Causes 'Data In Row (n) Was Not Update... String Or Binary Data Would Be Truncated...'

SQL Server Admin 2014 :: Change Data Capture(CDC) For Data Warehouse / Reporting?

How To Convert To Regular Text, Data Stored In Image Data Type Field ????