Hopefully this makes sense; I'm not sure what to even begin researching...
I'm trying to optimize all facets of this process, as it will take over the resources on my server if not done efficiently.
I have CSV files containing INTs that I need to upsert (match to an existing, earlier-imported array, or create a new record set) millions of times a day. To be clear, this data is a small subset of the actual import; the array's contents are not the main data of the process, and the array as a whole is meant to relate to higher-level tables.
The contents of the CSV array are 99.9+% repeating, meaning an array will very often share the exact same contents as a previously imported one. A rough guess: about 20k combinations exist, fewer than 1k new ones appear per month, and each array ranges from 6 cols x 15 rows to 6 cols x 50 rows.
So the current plan is to compute an MD5 hash during the (not SQL related) export process to identify the contents of this CSV file, and export only the MD5 (32 hex digits) as a lookup that identifies the contents. If the SQL import process finds a new (unknown) MD5, it will request the actual contents; otherwise it will simply use the MD5 as a key/id/code for the array contents that are already stored.
There's probably terminology for this type of thing that I'm not familiar with; I've never heard of anything like it. I realize collision is a threat, but I'm unsure how worried I should be with this type of data (similar sizes/contents, but a relatively small number of possibilities). I think even up to a 0.1% collision rate would be acceptable, which is probably far more headroom than I need.
Does this sound like a bad idea to anyone? Are there certain hash functions I should use for this type of thing? Anyone have suggestions of where to look next?
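Roughly what I'm picturing, as a hedged sketch (all table/column names here are hypothetical):

-- Hash-keyed lookup table: one row per distinct array contents.
CREATE TABLE dbo.ArrayHash (
    ArrayId int IDENTITY(1,1) PRIMARY KEY,
    Md5Hex  char(32) NOT NULL UNIQUE  -- MD5 of the CSV array contents
);

-- Import step: check the hash first; only ask the exporter for the
-- full contents when the hash is unknown.
DECLARE @md5 char(32);
SET @md5 = 'd41d8cd98f00b204e9800998ecf8427e';  -- value computed by the exporter

IF NOT EXISTS (SELECT 1 FROM dbo.ArrayHash WHERE Md5Hex = @md5)
BEGIN
    INSERT INTO dbo.ArrayHash (Md5Hex) VALUES (@md5);
    -- ...then request the actual array rows and store them keyed by the new ArrayId
END

SELECT ArrayId FROM dbo.ArrayHash WHERE Md5Hex = @md5;  -- key used by the higher-level tables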
I've got this little problem. I need to insert data from one table into another. The scenario looks as follows: I've got a 'Company' table (no duplicated records there) and a 'Contacts' table (one-to-many relation: for one company there can be more than one contact). The statement I'm using retrieves the data, but it shows me everything, including all contacts, and therefore I get duplicated values, e.g. the company name. Is there any way of changing the query so it works?
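For illustration, a hedged sketch of the kind of thing I'm after, using ROW_NUMBER (SQL 2005; all names hypothetical):

-- Keep only the first contact per company so the company name
-- appears exactly once in the inserted rows.
INSERT INTO dbo.Target (CompanyName, ContactName)
SELECT x.CompanyName, x.ContactName
FROM (
    SELECT co.CompanyName,
           ct.ContactName,
           ROW_NUMBER() OVER (PARTITION BY co.CompanyId
                              ORDER BY ct.ContactId) AS rn
    FROM dbo.Company  AS co
    JOIN dbo.Contacts AS ct ON ct.CompanyId = co.CompanyId
) AS x
WHERE x.rn = 1;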
A user can log in at the front end (web app), and the username and password are stored in HASHED FORMAT. I want to get that password back as original text from the database (SQL Server). Please help me out: is there any way to get the original text?
Pulled all my hair out now, can someone please offer some help before I go totally mad...
I have no page header; this is all in the body. I have a list with account number and job number in it at the top, called list1. Inside list1 I have list2, with the job details, basically a list of people who attended the job. The fields in list2 are textboxes; they're not in a table.
EG:

Acc No: 12345   Job No: 54321

Name
Fred Bloggs
Joe Bloggs
Etc
When my list2 of names goes over 1 page, I want to repeat the account number and job number on each subsequent page. In my real report I actually have 10 fields I want to repeat.
I then want to be able to export this report to a PDF.
I can't put the header in the page header and RepeatWith doesn't work when exporting to PDF, or at all depending on what mood it's in.
I have run into a strange issue that I believe is a SQL Reporting Services issue.
I have a report laid out in landscape setting that has 4 columns of text. Two of the columns are sub-reports (due to the complexity and size, we did not flatten out the data in the stored procedure) and two of the columns are regular fields.
The 2 columns of regular fields are smaller, and normally only grow to about 1/2 the height of the page. One of the two sub-reports contains large amounts of text, and at times grows larger than the height of the page.
When the sub-report grows larger than the current page, it correctly starts up on the next page. But the 2 fields of data from the main dataset (not the sub-report columns) repeat themselves on the next page as well.
What is even more strange is that the 2 fields of data from the main dataset only repeat enough data to grow vertically as far as the sub-report needs to grow. So if there is more data in either of these 2 fields than is needed to match the sub-report's growth on the 2nd page, the data in both of these fields gets cut off.
I have tried placing the information in a group header, turning "Repeat on new page" both True and False, taking away the table header and footer, forcing a page break after each group, and using the "Hide Duplicates" property on the field within the details section, and nothing has fixed the issue.
If anyone has run into this and found a work around, let me know.
Can someone tell me why my stored procedure is repeating the Name in the same column? Here's my stored procedure and output:

select distinct
    libraryrequest.loanrequestID,
    titles.title,
    requestors.fname + ' ' + requestors.lname as [Name],
    Cast(DATEPART(m, libraryrequest.requestDate) as Varchar(5)) + '/' +
    Cast(DATEPART(d, libraryrequest.requestDate) as Varchar(5)) + '/' +
    Cast(DATEPART(yy, libraryrequest.RequestDate) as Varchar(5)) as RequestDate,
    Cast(DATEPART(m, libraryrequest.shipdate) as Varchar(5)) + '/' +
    Cast(DATEPART(d, libraryrequest.shipdate) as Varchar(5)) + '/' +
    Cast(DATEPART(yy, libraryrequest.shipdate) as Varchar(5)) as ShipDate
from LibraryRequest
join requestors on requestors.requestorid = libraryrequest.requestorid
join titles on titles.titleid = requestors.titleid
where shipdate is not null

Output:

ID  Title             Name                       Request Date  Ship Date
29  Heads, You Win    Brenda Smith               1/18/2008     1/18/2008
35  Still More Games  Brenda Smith               1/22/2008     1/22/2008
51  The Key to..      Brenda Smith Brenda Smith  1/29/2008     1/29/2008
52  PASSION...        Brenda Smith Brenda Smith  1/29/2008     1/29/2008
53  LEADERSHIP        Brenda Smith Brenda Smith  1/29/2008     1/29/2008

Going crazy ugh...
I am querying several tables and piping the output to an Excel spreadsheet. Several (not all) columns contain repeating data that I'd prefer not to include on the output. I only want the first row in the set to have that data. Is there a way in the query to do this under SQL 2005?
As an example, my query results are as follows (sorry if it does not show correctly):

OWNER  BARN  ROUTE DESC       VEH      DIST      CASE
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
DAX    DAX   TRACKING #9398   VEH 398  39834.94  2471
DAX    DAX   TRACKING #9398   VEH 398  39834.94  2471
DAX    DAX   TRACKING #9398   VEH 398  39834.94  2471
TAX    TAX   TRACKING #2407            40754.39  1002
TAX    TAX   TRACKING #2407            40754.39  1002
TAX    TAX   TRACKING #2407            40754.39  1002

I only want the output to be:

OWNER  BARN  ROUTE DESC       VEH      DIST      CASE
BAR    BAR   TRACKING #70328  VEH 328  32869.94  1393
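Something like this is what I'm after, if it's even possible; a hedged sketch using SQL 2005's ROW_NUMBER (table/column names hypothetical):

-- Blank out the repeating columns on all but the first row of each group.
SELECT CASE WHEN t.rn = 1 THEN t.OWNER      ELSE '' END AS OWNER,
       CASE WHEN t.rn = 1 THEN t.BARN       ELSE '' END AS BARN,
       CASE WHEN t.rn = 1 THEN t.ROUTE_DESC ELSE '' END AS ROUTE_DESC,
       t.VEH, t.DIST, t.CASE_NO
FROM (
    SELECT OWNER, BARN, ROUTE_DESC, VEH, DIST, CASE_NO,
           ROW_NUMBER() OVER (PARTITION BY OWNER, BARN, ROUTE_DESC
                              ORDER BY CASE_NO) AS rn
    FROM dbo.Routes
) AS t
ORDER BY t.OWNER, t.rn;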
I'm using RS2000 SP2 and am getting an issue when exporting to PDF. If I have a table that spans more than one page and I set RepeatHeaderOnNewPage to True, then on occasion the table header will be displayed on top of the first few rows of data. It does not happen on all the pages or all the time, and I cannot find any information on this issue. Has anyone come across this before and solved it?
Hi All, in SQL 2000 I need to export data in such a way that the data is encrypted. When anyone tries to import it into any database, it should authenticate the user and only then get decrypted. Is that possible?
Hi, I have a .NET application and I added code that encrypts data saved in the database. However, there is already data in the fields that was entered before this change. I now need to check whether the values in those fields are encrypted and, if not, encrypt them. How can I perform such a check and update the relevant data? I use TripleDES in .NET to encrypt/decrypt the data. Thanks
Does SQL Server 2000 provide any data encryption/decryption functionality, so that certain fields (e.g. SSN, Age and Salary) can be encrypted before being written into the table and decrypted when loaded back out of it?
We would like to secure data. Only a few people are authorized to read this information, but today it is readable with a simple query in, for example, Query Analyzer.
I'd like to encrypt the data in one field of a table with a reversible function.
Is there a function able to do this kind of work in SQL Server 7 or 2000?
I want to encrypt certain data like passwords, SSNs, credit card info etc. before saving it in the database. Also, this encrypted data should be queryable using standard SQL statements like:
select * from users where userid=454 and password = 'encrypted data'
The mechanism to encrypt data could be in a .NET application. The code that does encryption/decryption should also be protected so that it doesn't work if it falls into the wrong hands.
Can anyone suggest the best way to accomplish the above?
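For the password-equality part specifically, a hedged sketch of one option (needs SQL Server 2005's HASHBYTES; names hypothetical): store a one-way hash instead of reversible ciphertext, since randomized encryption would break the equality match above.

-- password_hash is a varbinary column holding HASHBYTES of the password;
-- the application hashes the entered password the same way before comparing.
SELECT *
FROM   users
WHERE  userid = 454
  AND  password_hash = HASHBYTES('SHA1', N'cleartext-from-the-app');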
We have migrated a CRM Database from SQLServer 2000 to SQLServer 2005.
The database contains very sensitive data about customers in text format (datatype varchar(20)). How can I encrypt it without any change to the table design?
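For example, would SQL Server 2005's built-in functions work here? A hedged sketch (names hypothetical); note that ENCRYPTBYPASSPHRASE/DECRYPTBYPASSPHRASE deal in varbinary, so the varchar(20) column type may still need to change:

-- Round trip with the SQL 2005 passphrase functions.
SELECT ENCRYPTBYPASSPHRASE('passphrase kept outside the DB', SensitiveText) AS CipherText,
       CONVERT(varchar(20),
               DECRYPTBYPASSPHRASE('passphrase kept outside the DB',
                   ENCRYPTBYPASSPHRASE('passphrase kept outside the DB', SensitiveText))) AS RoundTrip
FROM   dbo.Customer;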
Hi there,
1. I have a database and I want to encrypt my passwords before storing my records. Later on I will need to authenticate my users, so I will have to encrypt the string a user provides to compare it with the encrypted password in the database. My code is below; I don't know how to do it, please help.
2. One thing more: I am storing the IP addresses of my users as a varchar. Is there a better method? If yes, please help me.

try
{
    SqlConnection myConnection = new SqlConnection();
    myConnection.ConnectionString =
        ConfigurationManager.ConnectionStrings["projectConnectionString"].ConnectionString;
    SqlDataAdapter myAdapter = new SqlDataAdapter("SELECT * FROM User_Info", myConnection);
    SqlCommandBuilder builder = new SqlCommandBuilder(myAdapter);
    DataSet myDataset = new DataSet();
    myAdapter.Fill(myDataset, "User_Info");

    // Adding a new row in the User_Info table
    DataRow myRow = myDataset.Tables["User_Info"].NewRow();
    myRow["user_name"] = this.user_name.Text;
    myRow["password"] = this.password.Text; // should be encrypted; not known till now how to do it
    myRow["name"] = this.name.Text;
    myRow["ip_address"] = this.ip_address.Text;
    myDataset.Tables["User_Info"].Rows.Add(myRow);

    myAdapter.Update(myDataset, "User_Info");
    myConnection.Close();
    myConnection.Dispose();
}
catch (Exception ex)
{
    this.error.Text = "Error occurred in Creating User : " + ex.Message;
}
I have a new SQL 2005 (SP2) Reporting Services server to which I've just upgraded and deployed some SSRS 2000 reports.
I have a subreport that contains a matrix with two groups. The report data seems to be inexplicably repeating the data for the first row in the group for all rows in the group. Example:
ID1 ID2 DisplayData
1 1 A
1 2 B
1 3 C
2 1 A
2 2 B
2 3 C
Parent group is on ID1, child group is on ID2, report would show:
1 1 A
2 A
3 A
2 1 A
2 A
3 A
Is this a matrix bug in 2005 SP2, or do I need to do something differently? I can no longer pull a comparison version from an SSRS 2000 server to verify, but I believe it was working as expected before...
We did some "at scale" fuzzy lookup tests today and were rather disappointed with the performance. I'm wanting to know your experience so I can set my performance expectations appropriately.
We were doing a fuzzy lookup against a lookup table with 25 million rows. Each row has 11 columns used in the fuzzy lookup, each between 10-100 chars. We set CopyReferenceTable=0 and MatchIndexOptions=GenerateAndPersistNewIndex and WarmCaches=true. It took about 60 minutes to build that index table, during which, dtexec got up to 4.5GB memory usage. (Is there a way to tell what % of the index table got cached in memory? Memory kept rising as each "Finished building X% of fuzzy index" progress event scrolled by all the way up to 100% progress when it peaked at 4.5GB.) The MaxMemoryUsage setting we left blank so it would use as much as possible on this 64-bit box with 16GB of memory (but only about 4GB was available for SSIS).
After it got done building the index table, it started flowing data through the pipeline. We saw the first buffer of ~9,000 rows get passed from the source to the fuzzy lookup transform. Six hours later it had not finished doing the fuzzy lookup on that first buffer!!! Running Profiler showed us it was firing off lots of singleton SQL queries doing lookups, as expected. So it was making progress, just very, very slowly.
We had set MinSimilarity=0.45 and Exhaustive=False. Those seemed to be reasonable settings for smaller datasets.
Does that performance seem in line with expectations? Any thoughts on improving performance?
I'm working with an existing package that uses the fuzzy lookup transform. The package is currently working; however, I need to add some columns to the lookup columns from the reference table that is being used.
It seems that I am hitting a memory threshold of some sort, as when I add 3 or 4 columns, the package works, but when I add 5 columns, the fuzzy lookup transform fails pre-execute:
Pre-Execute
Taking a snapshot of the reference table
Taking a snapshot of the reference table
Building Fuzzy Match Index
component "Fuzzy Lookup Existing Member" (8351) failed the pre-execute phase and returned error code 0x8007007A.
These errors occur regardless of what columns I am attempting to add to the lookup list.
I have tried setting the MaxMemoryUsage custom property of the transform to 0, and to explicit values that should be much more than enough to hold the fuzzy match index (the reference table is only about 3000 rows, and the entire table is stored in less than 2MB of disk space).
Hi, this may be a basic question, but I can't get my mind straight and need some advice.
I've read there's limitations on using the Lookup Task.
What I want to do is simply check whether the data exists in a table; if it does, I want to update some columns (or delete the row and insert a new one), and if it doesn't exist, I want to insert a row.
That's all; easy as hell in SQL and C#/.NET.
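For reference, the SQL version I mean (all names hypothetical; @key/@col1/@col2 would be procedure parameters):

-- Update the row if the key exists, otherwise insert it.
IF EXISTS (SELECT 1 FROM dbo.Target WHERE BusinessKey = @key)
BEGIN
    UPDATE dbo.Target
    SET    Col1 = @col1, Col2 = @col2
    WHERE  BusinessKey = @key
END
ELSE
BEGIN
    INSERT INTO dbo.Target (BusinessKey, Col1, Col2)
    VALUES (@key, @col1, @col2)
END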
The thing is, I don't know what's the best way of doing it in SSIS.
My limited SSIS knowledge tells me a Script Component should be able to do the work. That is, send it a variable holding the values I want to verify in the database and, depending on whether they exist, do what's necessary.
So my question is: can I connect to the DB via the Script Component and run my custom selects there?
If so, how can I do it?
If not, what's the best way of doing this simple task?
Say I want to look up a value in another dataset, but there is a grouping that requires knowing the values at each level in order to get to the correct detail record. Can you still use the Lookup function with more than one field to compare against? For example:
Department
  \___ SalesPerson
         \___ Measure
I want to be able to add a new row at the Measure level but look up each field from another dataset. To do that I need both the Department AND SalesPerson values for the lookup, but I don't think the Lookup function will let us do that, will it?
I have a lookup transformation that retrieves a key for a certain column of values, in this case, a name. So, I go in to the lookup table with a name and come out with its key. I had it working and then I added new entries to the lookup table for a bunch of new names. Now, for some reason, I am not getting the matches for the new names. But I am still getting the matches for the names that existed before I added the new ones.
I'm wondering if the lookup transformation is using the old set of data and somehow not picking up the new names. Do I have to trigger something in the lookup transformation to let it know that the lookup table data has changed?
Here is the deal. I have a flat file with 2 fields:

ProdNum  FeatureCodes
1        A01, B22, F09
2        C13,C24,E05,G02,G09,J07,J09,M03,M17
3        J07,M01,M17,N02,N11,N13,X15

Then I have an Excel file like:

Code  Description
A01   Handicap features
B22   Smoke-Free
F09   Tinted Windows
C13   Picnic Area
C24   Extra Storage
J07   Tile Flooring
M01   Central Airconditioning

The result in the database needs to be like:

ProdNum  Features
1        Handicap features, Smoke-Free, Tinted Windows
2        Picnic Area, Extra Storage, .....
3        Tile Flooring, Central Airconditioning, .....
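Assuming both files are staged into tables first, a hedged T-SQL (2005) sketch of the split-and-reassemble step (all table/column names hypothetical):

-- Split the comma lists with the XML trick, join to the code
-- descriptions, then reassemble one string per product with FOR XML PATH.
WITH Split AS (
    SELECT p.ProdNum,
           LTRIM(x.c.value('.', 'varchar(10)')) AS Code
    FROM (SELECT ProdNum,
                 CAST('<v>' + REPLACE(FeatureCodes, ',', '</v><v>') + '</v>' AS xml) AS Codes
          FROM dbo.StagedProducts) AS p
    CROSS APPLY p.Codes.nodes('/v') AS x(c)
)
SELECT s.ProdNum,
       STUFF((SELECT ', ' + c.Description
              FROM Split AS s2
              JOIN dbo.StagedCodes AS c ON c.Code = s2.Code
              WHERE s2.ProdNum = s.ProdNum
              FOR XML PATH('')), 1, 2, '') AS Features
FROM Split AS s
GROUP BY s.ProdNum;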
(Just getting started with SSIS) I have a data flow task with source S1 and destination D. The mappings from S1 to D are straightforward, but additionally, I need to look up a value from source S2 one time and map that value to a column in D for every row of S1. Note that this lookup is based on a constant value independent of S1, so no column mappings are involved (the lookup is essentially "standalone").
I would expect I could do the lookup, assign the value to a data flow level variable, then reference that variable in a derived column transform or some such thing, but I'm having trouble figuring out how to do the lookup in the data flow task and assign it to a variable. Am I on the right track here, or...?
Hi, I'm transferring lots of rows (each row is a problem) from an OLTP database to a data warehouse. These problems (rows) are transferred to the fact table. A problem has attributes like status, logdate, solved_date (default NULL), and a sequence number. I also check every day for new problems in the source. Now the thing is: I can write an SQL query to incrementally add new rows by using the sequence number as a parameter. This keeps querying time down (because the problem table has about 2,000,000 rows).

The actual question: I also need to write a query to update the tables in the data warehouse, for example when the status of a problem changes. When a status changes from "unsolved" to "solved", the problem gets a date in the solved_date attribute. Now, how can I write a query that goes through those 2,000,000 rows fast? I can't use the sequence number here because it isn't known when a problem will get solved. Since I run a query every day, maybe I should look only at problems whose solved date was added on that day? I'm just a novice in SQL, so maybe you know ways to do fast lookups across many rows? Or are there Data Flow tasks I can use? I was thinking about the Slowly Changing Dimension transform, but that's for dimensions, right? :) So maybe there are other tasks that would work? I just need to be put back on track because I can't find a good way to do this.
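A hedged sketch of the kind of daily update I mean, assuming the source rows are reachable from one query (e.g. via a staging table; names hypothetical):

-- Only touch problems whose solved_date was set since the last load,
-- instead of scanning all 2,000,000 fact rows.
DECLARE @LastLoadDate datetime;
SET @LastLoadDate = DATEADD(day, -1, GETDATE());  -- or read from an audit table

UPDATE f
SET    f.status      = s.status,
       f.solved_date = s.solved_date
FROM   dw.FactProblem AS f
JOIN   stage.Problem  AS s ON s.sequence_number = f.sequence_number
WHERE  s.solved_date >= @LastLoadDate;  -- an index on solved_date keeps this cheap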
Actually this is in regard to an SCD Type 2 dimension. The scenario: I am moving a fact table from an old source, and the fact contains a DimensionA description value that I want to replace with the appropriate id from the dimension table. That dimension table is SCD Type 2 based on StartDate and EndDate, and the fact table doesn't contain a direct date value, only a TimeId. So to update the value in the fact table I have to join the time dimension and the other dimension table to replace the fact's description with the proper id.
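A hedged sketch of the join I'm describing (names hypothetical): the date from the time dimension has to fall inside the SCD2 row's StartDate/EndDate validity window.

-- Resolve the SCD Type 2 surrogate key for each fact row.
UPDATE f
SET    f.DimensionAId = d.DimensionAId
FROM   dbo.FactTable  AS f
JOIN   dbo.DimTime    AS t ON t.TimeId = f.TimeId
JOIN   dbo.DimensionA AS d ON d.Description = f.DimensionADesc
                          AND t.CalendarDate >= d.StartDate
                          AND (t.CalendarDate < d.EndDate OR d.EndDate IS NULL)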
I am doing a lookup that requires mapping 2 columns in the column mapping section. When I do this, I get the error "Row yielded no match during lookup". The SQL that I captured in SQL Profiler does find the record when I run it in Management Studio. I have already tried trimming everything, to no avail.
Why is this happening?
I tried enabling memory restrictions, but then my package hangs and I get a SQLDUMPER_ERRORLOG.log file with the following logged: