VERY Chalanging Question

May 30, 2006

input: 1.5 million records table consisting users with 4 nvchar
fields:A,B,C,D
the problem: there are many records with dublicates A's or duplicates
B's or duplicates A+B's or duplicates B+C+D's & so on. Mathematicly
there are 16-1 posibilities for each duplication.

aim: find the duplicates & filter them, leave only the unique users
which don't have ANY duplication.

We can do it by a simple select query that logicly checks the
duplication in a OR operator.
But it takes about 16 days in a very fast PC.
The DB is in sql-server, converting it to Oracle might acomplish it to
8 days.

How can i do it in a few hours?
Remeber that filtering first the users with parameter A & than by
parameter B & so on will result an error in the final result because it
will loose the information regarding the filtered users - maybe in
parameter C they are equal to other users in the table...

THANK YOU

View 13 Replies


ADVERTISEMENT

Very Chalanging Question - Explanation

May 31, 2006

Ok, here is a asample table representing the problem more clearlyA | B | C | D-----------------a1 b1 c1 d1a1 b2 c2 d2a3 b3 c1 d3a4 b4 c4 d3a5 b5 c5 d5a6 b6 c6 d3Tha duplications are:row 1+2 in param Arow 1+3 in param Crow 3+4+6 in param Donly row 5 is unique in all parameters.conclusion: row 1+2+3+4+6 are the same usergoal: to find all duplicated rows & to delete them all accept oneinstance to leave.Note:Finding that row 1similar to 2 in A & deleting it will loose databecause we won't know that row 1 is ALSO similar to 3 on C & later onfinding that 3 is similar to 4 & 6 on D & so onThe simple time consuming (about 2 weaks) query to acomplish the taskis:SELECT count(*),A.B,C,DFROM tblGROUP BY A,B,C,DHAVING count(*)>1I THANK YOU ALL

View 3 Replies View Related







Copyrights 2005-15 www.BigResource.com, All rights reserved