I managed to find the 'Deleting Duplicate Records' article from SQLTeam.com (thanks, by the way!) and modified it for one of my tables (one of 14).
-- Add a new column
Alter table dbo.tblMyDocsSize add NewPK int NULL
go
-- populate the new Primary Key
declare @intCounter int
set @intCounter = 0
update dbo.tblMyDocsSize
SET @intCounter = NewPK = @intCounter + 1
-- ID the records to delete and get one primary key value also
-- We'll delete all but this primary key
select strComputer, strATUUser, RecCount = count(*), PKtoKeep = max(NewPK)
into #dupes
from dbo.tblMyDocsSize
group by strComputer, strATUUser
having count(*) > 1
order by count(*) desc, strComputer, strATUUser
-- delete dupes except one Primary key for each dup record
delete a
from dbo.tblMyDocsSize a join #dupes d
on d.strComputer = a.strComputer
and d.strATUUser = a.strATUUser
where a.NewPK not in (select PKtoKeep from #dupes)
-- remove the NewPK column
ALTER TABLE dbo.tblMyDocsSize DROP COLUMN NewPK
go
drop table #dupes
Now that I've got that figured out, I need to write the same thing to fix the other 13 tables (with different column info), and I'll need to run this daily.
Basically, I've put together some VBScript that gathers inventory data and drops it into an MSDE database (sorry, going for 'free' stuff right now). The problem is that it has to run daily so that I'm sure to capture computers that turned on at different times, etc., which keeps growing my database until I bounce off the 2GB limit of MSDE.
So the question is, what would be the best way to do this? Can I put the code into a stored procedure that I can execute each day?
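One way to answer the stored-procedure question is sketched below. It is not the SQLTeam article's code: it assumes you are willing to add a permanent identity column (called RowID here, an invented name) to the table once, after which the daily cleanup becomes a single set-based DELETE that works on MSDE/SQL 2000 and can be scheduled with SQL Server Agent (included with MSDE) or a Windows scheduled task running osql.

-- One-time change: give the table a surrogate key (RowID is a placeholder name)
ALTER TABLE dbo.tblMyDocsSize ADD RowID int IDENTITY(1,1) NOT NULL
GO

CREATE PROCEDURE dbo.usp_DedupeMyDocsSize
AS
BEGIN
    SET NOCOUNT ON

    -- keep only the most recently inserted row per (strComputer, strATUUser)
    DELETE a
    FROM dbo.tblMyDocsSize a
    WHERE a.RowID NOT IN (
        SELECT MAX(RowID)
        FROM dbo.tblMyDocsSize
        GROUP BY strComputer, strATUUser)
END
GO

A similar procedure (with its own grouping columns) would be needed for each of the other 13 tables.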
Apparently, deleting 7,000,000 records from a table of about 20,000,000 is not advisable. We were able to take orders at 8:00AM, but not at 7:59.
So, what's the best way of going about deleting a large number of records? Pretty basic lookup table, no relationships with other tables, 12 or so address-type fields, 4 or 5 simple indexes. I can take it down for a weekend or night, if needed.
DTS the ones to keep to another table, drop the old and rename the new table? Bulk copy out, truncate and bring back in? DTS to text, truncate and import back? Other ways?
Never worked with such a large table and need a little experienced guidance.
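Two approaches come up most often for a purge of this size; which one wins depends on how much downtime you really have. If you can take the table offline, copying the roughly 13 million keepers into a new table, dropping the old one and renaming the new one (essentially what you describe with DTS) is usually fastest. If you must stay online, deleting in small batches keeps locks short and the log manageable. A rough sketch of the batched version, with placeholder names (BigLookup for the table, IsObsolete for whatever rule marks a row for removal):

SET ROWCOUNT 5000                 -- SQL 2000 syntax; on 2005+ use DELETE TOP (5000)
DECLARE @deleted int
SET @deleted = 1
WHILE @deleted > 0
BEGIN
    DELETE FROM BigLookup
    WHERE IsObsolete = 1          -- placeholder for your own criteria

    SET @deleted = @@ROWCOUNT
END
SET ROWCOUNT 0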
I need a sql statement to delete duplicate records.
I have a college table with all colleges in the nation. I noticed that all of the colleges are listed twice. How do I delete all of the duplicate records?
Here is my table:
Colleges
-------------------
schoolID - smallint NOT NULL
schoolName - varchar(60) NULL
Can someone help me out with the SQL statement? I'm running SQL Server 6.5.
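A sketch for SQL Server 6.5, assuming each college really is an exact whole-row duplicate (same schoolID and schoolName twice); Colleges_dedup is just a scratch table name. SELECT INTO needs the 'select into/bulkcopy' option enabled on the database, and backing up the table first is wise.

SELECT DISTINCT schoolID, schoolName
INTO Colleges_dedup
FROM Colleges

TRUNCATE TABLE Colleges

INSERT INTO Colleges (schoolID, schoolName)
SELECT schoolID, schoolName FROM Colleges_dedup

DROP TABLE Colleges_dedup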
Hi all, I have one table named MyTable and this table contains only one column, MyCol. I have 10 records in it and all the records are duplicates, i.e. the value is 7 for all 10 records.
It is something like this,
MyCol
7
7
7
7
7
7
7
7
7
7
Now when I try to delete the 10th record, or any record, it gives me the error "Key column information is insufficient or incorrect. Too many rows were affected by update."
What should I do if I want only 4 records instead of 10 in my table? How do I delete the 6 extra records from the table?
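Because the rows are indistinguishable, one simple option (on SQL Server 2000 and earlier) is to tell the server to stop after a fixed number of deletes:

-- a minimal sketch: remove exactly six of the identical rows
SET ROWCOUNT 6
DELETE FROM MyTable WHERE MyCol = 7
SET ROWCOUNT 0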
Rajarajan writes: "Kindly don't ignore this as a regular case; this one is peculiar. I need to delete one of the duplicate records only if they occur consecutively, e.g.
1. 232
2. 232
3. 345
4. 567
5. 232
Here only the first record has to be deleted. Kindly help me out."
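A possible sketch, assuming the table has a sequence/position column with no gaps so that "consecutive" can be expressed as the sequence number differing by 1. MyTable, SeqNo and Val are invented names for the table, the sequence column and the duplicated value column. It deletes the earlier row of each consecutive equal pair, which matches the example (row 1 goes, row 5 stays).

DELETE t
FROM MyTable t
JOIN MyTable nxt
    ON nxt.SeqNo = t.SeqNo + 1
   AND nxt.Val = t.Val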
I loaded one table via SSIS and found that it contained many duplicate records (from the input source). I can create a SQL task to delete them, but I wonder if SSIS offers a task "out of the box" to delete dups?
Hello all, I have a DTS package set up to import a text file on a daily basis. I need to dump the data in 2 tables after 7 days from the last import. This is the code that I have: Delete From TblTemp date(Day(-7), CurrentStamp). But for some reason it deletes the data right after it imports it, and it doesn't delete anything out of the other table.
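For what it's worth, a guess at what the date filter was probably meant to look like, assuming CurrentStamp is the datetime column stamped at import time: rows older than 7 days are removed, everything newer is kept.

DELETE FROM TblTemp
WHERE CurrentStamp < DATEADD(day, -7, GETDATE())

If the current statement deletes rows immediately after the import, it's worth checking which direction the comparison runs and which column it actually tests.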
I'm new to relational database concepts and designs, but what I've learned so far has been helpful. I now know how to select certain records from multiple tables using joins, etc. Now I need info on how to do complete deletes. I've tried reading articles on cascading deletes, but the people writing them are so verbose that they are confusing for a beginner. I hope someone can help me with this problem.
I have SQL Server 2005 and I use Visual Studio 2005. In the database I've created the following tables (with their column names):
Table 1: Classes --Columns: ClassID, ClassName
Table 2: Roster--Columns: ClassID, StudentID, Student Name
What I can't seem to figure out is how I can delete a class (ClassID) from Classes and, as a result of that one deletion, delete all students in the Roster table associated with that class, all assignments associated with that class, and all scores associated with those assignments, in one DELETE statement.
What I tried to do in SQL Server Management Studio is set ClassID in Classes as a primary key, then set foreign keys in the other three tables. I also set AssignmentID in Table 4 as a foreign key referencing Table 3.
The stored procedure I created was
DELETE FROM Classes WHERE ClassID=@classid
I thought, since I established ClassID as a primary key in Classes, that by deleting it, it would also delete all other rows in the foreign tables that have the same value in their ClassID columns. But I get errors when I run the query. The error said:
The DELETE statement conflicted with the REFERENCE constraint "FK_Roster_Classes1". The conflict occurred in database "database", table "dbo.Roster", column 'ClassID'. The statement has been terminated.
What are reference constraints? What are they talking about? Plus, is the query correct? If not, how would I go about solving my problem? Would I have to do joins while deleting?
I thought I was doing a cascade delete. The articles I read kept insisting that cascade deletes are deletes where if you delete a record from a parent table, then the rows in the child table will also be deleted, but I get the error.
Did I approach this right? If not, please show me how, and please, please explain it like I'm a four year old.
Further, is there something else I need to do besides assigning primary keys and foreign keys?
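Yes, one more thing: besides declaring the primary and foreign keys, the foreign keys themselves have to be created with the cascade option. A plain foreign key only blocks the delete, which is exactly the REFERENCE constraint error shown above. A sketch for the Roster table, reusing the constraint name from the error message (the other child tables would need the same treatment):

ALTER TABLE Roster DROP CONSTRAINT FK_Roster_Classes1

ALTER TABLE Roster
ADD CONSTRAINT FK_Roster_Classes1
    FOREIGN KEY (ClassID) REFERENCES Classes (ClassID)
    ON DELETE CASCADE

With that in place, DELETE FROM Classes WHERE ClassID = @classid removes the matching Roster rows automatically.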
Hi All, I have the following table with a PK defined on an IDENTITY column (INSERT_SEQ):

CREATE TABLE MYDATA (
    MID NUMERIC(19,0) NOT NULL,
    MYVALUE FLOAT NOT NULL,
    TIMEKEY INTEGER NOT NULL,
    TIMEKEY_DTTM DATETIME NULL,
    IID NUMERIC(19,0) NOT NULL,
    EID NUMERIC(19,0) NOT NULL,
    INSERT_SEQ NUMERIC(19,0) IDENTITY(1,1) NOT NULL)
GO
ALTER TABLE MYDATA
ADD CONSTRAINT PK_MYDATA PRIMARY KEY (INSERT_SEQ)
GO

The TIMEKEY_DTTM field is generated, from the value actually inserted into the TIMEKEY field, by the following trigger:

CREATE TRIGGER TIMEKEY1
ON MYDATA
FOR INSERT AS
BEGIN
    DECLARE @M_TIMEKEY_DTTM DATETIME
    SELECT @M_TIMEKEY_DTTM = DATEADD(SECOND, INS.TIMEKEY + EP.GMT_OFFSET * 0, '1970-01-01 00:00:00.000')
    FROM INSERTED INS, LOCATIONINFO EP
    WHERE INS.EID = EP.EID

    UPDATE MYDATA
    SET TIMEKEY_DTTM = @M_TIMEKEY_DTTM
    FROM INSERTED INS, MYDATA MD
    WHERE MD.INSERT_SEQ = INS.INSERT_SEQ
END
GO

There is also a composite, non-unique index defined on the tuple (MID,IID,TIMEKEY,EID):

CREATE INDEX IX_METDATA ON MYDATA (MID,IID,TIMEKEY,EID)
GO

As a consequence of an application design change, I would also change this index to be UNIQUE, but when I try to drop and create it I get an error, because the table stores some duplicated rows...

In order to successfully upgrade the index definition, I wrote some DML statements to look up and remove the duplicated rows, keeping only the first record inserted, i.e. the one with the lowest INSERT_SEQ:

--
-- This table stores the number of duplicated records eventually discovered
-- in the MYDATA table; the initial value for the NUM_DUPLICATES field is
-- 0 (no duplicated records)
--
DROP TABLE DUPLICATES
GO
CREATE TABLE DUPLICATES (
    TABLENAME VARCHAR(17),
    NUM_DUPLICATES NUMERIC(19,0) )
GO
INSERT INTO DUPLICATES VALUES ('MYDATA',0)
GO
INSERT INTO DUPLICATES VALUES ('CATEGORIESDATA',0)
GO

--
-- ///////// CLEAN UP OF MYDATA TABLE
--
DROP TABLE TMP_MYDATA
GO
CREATE TABLE TMP_MYDATA (
    MID NUMERIC(19,0) NOT NULL,
    TIMEKEY INTEGER NOT NULL,
    IID NUMERIC(19,0) NOT NULL,
    EID NUMERIC(19,0) NOT NULL,
    INSERT_SEQ NUMERIC(19,0) )
GO

--
-- Insert into the TMP_MYDATA table all the duplicated records for
-- the tuple (MID,IID,TIMEKEY,EID) and NULL for the INSERT_SEQ field
--
INSERT INTO TMP_MYDATA (MID,IID,TIMEKEY,EID)
SELECT MID,IID,TIMEKEY,EID
FROM MYDATA
GROUP BY MID,IID,TIMEKEY,EID
HAVING COUNT(*) > 1
GO

--
-- Update the INSERT_SEQ field to the lowest value in the group
-- of duplicated records
--
UPDATE TMP_MYDATA
SET TMP_MYDATA.INSERT_SEQ = (SELECT MIN(INSERT_SEQ)
                             FROM MYDATA
                             WHERE TMP_MYDATA.MID = MYDATA.MID AND
                                   TMP_MYDATA.IID = MYDATA.IID AND
                                   TMP_MYDATA.TIMEKEY = MYDATA.TIMEKEY AND
                                   TMP_MYDATA.EID = MYDATA.EID )
GO

--
-- Update the value of NUM_DUPLICATES for the MYDATA table.
--
UPDATE DUPLICATES
SET NUM_DUPLICATES = (SELECT COUNT(*) FROM TMP_MYDATA)
WHERE TABLENAME = 'MYDATA'
GO

--
-- Delete from the MYDATA table all the duplicated records,
-- keeping only the row with the lowest INSERT_SEQ.
-- The delete is performed only if there are duplicated records;
-- this is achieved using a "short circuit" AND on the number of records
-- stored in the NUM_DUPLICATES field of the DUPLICATES table for
-- the MYDATA table...
--
DELETE FROM MYDATA
WHERE ( SELECT NUM_DUPLICATES FROM DUPLICATES WHERE TABLENAME = 'MYDATA') > 0 AND
      EXISTS ( SELECT 1
               FROM TMP_MYDATA
               WHERE MYDATA.MID = TMP_MYDATA.MID AND
                     MYDATA.IID = TMP_MYDATA.IID AND
                     MYDATA.TIMEKEY = TMP_MYDATA.TIMEKEY AND
                     MYDATA.EID = TMP_MYDATA.EID AND
                     MYDATA.INSERT_SEQ > TMP_MYDATA.INSERT_SEQ )
GO

This technique works fine on a normal table (1M recs) but is not very performant on huge tables (>10M records)!

Do you know a better way to achieve the task of removing all the duplicate records, preserving the lowest INSERT_SEQ between the duplicates and also preserving the sequence seed, so that a new record inserted at time t1 > t0 is enumerated with an INSERT_SEQ|t1 > max(INSERT_SEQ)|t0?

Thanks a lot for your help!
Patrizio

PS. Sorry for such a large post!
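One commonly suggested alternative, assuming an upgrade to SQL Server 2005 or later is an option: number the rows inside each (MID, IID, TIMEKEY, EID) group by INSERT_SEQ and delete everything after the first, in a single set-based statement. The identity seed is untouched, so later inserts still get INSERT_SEQ values above the current maximum.

WITH numbered AS (
    SELECT INSERT_SEQ,
           ROW_NUMBER() OVER (PARTITION BY MID, IID, TIMEKEY, EID
                              ORDER BY INSERT_SEQ) AS rn
    FROM MYDATA
)
DELETE FROM numbered
WHERE rn > 1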
I have tried joining several tables and the result displays duplicates of virtually every row. I have tried using DISTINCT but this didn't work. I suspect it could be because several columns from some of the tables are named the same.
I'm trying to delete duplicate records from the output of the query below; if they also meet certain conditions, i.e. a different address type, then I would merge the records instead. Starting from the following query, how do I go about achieving one and/or the other, either from the output or as an extension of the query itself?
I have a database where several thousand records have NULL in a binary field. I want to change all the NULLs to false. I have Visual Studio 5, and the database is a SQL Server 5 database on a remote server. What is the easiest way to do this? Is there a query I can run that will set ReNew to false wherever ReNew is NULL? This is a live database, so I want to get it right; I can't afford to mess it up. Diane
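Assuming ReNew is a bit column, the update itself is a one-liner; TheTable below is a placeholder, since the post doesn't give the table name. Running the SELECT first shows how many rows will change, and wrapping the update in a transaction gives a chance to roll back on a live database.

SELECT COUNT(*) FROM TheTable WHERE ReNew IS NULL   -- how many rows will change

BEGIN TRAN
UPDATE TheTable
SET ReNew = 0            -- 0 is "false" for a bit column
WHERE ReNew IS NULL
-- COMMIT TRAN  (or ROLLBACK TRAN if the row count looks wrong)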
Hi, I have to delete the master table data without deleting the child table records. Is there any solution for this? The parent table has a relation with the child table. Regards, vinod.t.v
What's the best practice for adding / editing a record in a database with lots of fields? I am not talking about the mechanics of it, as there are a lot of trivial examples using ADO.NET, stored procs, etc. Deleting is easy: you just pass in (a few) primary keys to uniquely identify the record. But in the real world you may have, say, a table with 100 fields! Do you code the INSERT sproc by hand, with 100 parameters, then call it with your ADO.NET code? Sounds like a lot of work to me...

What about updating? That's even worse: sometimes you may need to update only 3 or 4 fields, but using sprocs you would have to pass the whole 100 parameters in again and "update" the whole record (when in fact you are only changing 3 or 4 fields). With updates I could write different sprocs targeting only the fields I wish to change, but that sounds like duplicated work versus having one generic update proc. Sometimes I just feel like bypassing sprocs and using inline SQL, as it would be less work... but I know it is untidy and has more potential to be buggy.

So come on guys (and gals), let's hear your thoughts on how you would handle the insert / update scenarios when you have lots of fields. Northwind examples are too trivial :-)
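One common middle ground, sketched here with made-up table and column names: give the update procedure default-NULL parameters and only overwrite columns whose parameter was actually supplied, so a single generic proc serves both the 3-field and the 100-field case.

CREATE PROCEDURE dbo.usp_UpdateCustomer
    @CustomerID int,
    @FirstName  varchar(50) = NULL,
    @LastName   varchar(50) = NULL,
    @City       varchar(50) = NULL
AS
BEGIN
    SET NOCOUNT ON

    UPDATE dbo.Customers
    SET FirstName = COALESCE(@FirstName, FirstName),
        LastName  = COALESCE(@LastName,  LastName),
        City      = COALESCE(@City,      City)
    WHERE CustomerID = @CustomerID
END

The trade-off is that this pattern cannot set a column back to NULL, since NULL is what means "leave it alone".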
I am planning an application where ~1000 companies will be accessing data. Should I use a key to identify the company and place all data in one table, i.e. (WHERE company = 123), or should the application create company-specific tables? In other words, should I have 1000 small tables with 100 records each, or one table with 100,000 records?
I have a situation where deleting old records is blocking updates to the latest records on a highly transactional table, and the application is getting timeout errors.
In detail: I have one table called Tran_table1 in an OLTP database. Tran_table1 is a highly transactional table; it receives inserts/updates continuously.
While archiving records older than 2 years from Tran_table1 into Tran_table1_archive in batches (using the DELETE ... OUTPUT INTO clause), any UPDATEs on Tran_table1 get blocked, and the result is timeout errors in the application.
Are there any SQL Server hints to avoid the blocking?
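More than a hint, what usually helps is keeping each delete transaction tiny so that waiting UPDATEs can get in between batches. A rough sketch, with TranDate standing in for whatever column defines "older than 2 years":

DECLARE @batch int
SET @batch = 5000

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM dbo.Tran_table1
    OUTPUT deleted.* INTO dbo.Tran_table1_archive
    WHERE TranDate < DATEADD(year, -2, GETDATE())

    IF @@ROWCOUNT < @batch BREAK

    WAITFOR DELAY '00:00:01'   -- brief pause so blocked UPDATEs can proceed
END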
I want to delete duplicate rows from a very big table. This table is used by a stored procedure, and it's very important; due to duplicate record entries the procedure is failing. I use the method below for discarding the duplicate records.
Please tell me whether this is the most efficient way to do this job, or whether there is a better way.
1. I drop the primary key.
2. Then I let all the duplicate records come into the table.
3. Then I remove them by using a GROUP BY clause and setting ROWCOUNT to (group count - 1).
4. Put the primary key back and update the statistics.
Code is
If Exists (select * from SYSINDEXES where name='PrimaryKey' and id=Object_id('AdjustmentTransactions'))
    DROP INDEX AdjustmentTransactions.PrimaryKey

Insert into AdjustmentTransactions (UrnABS, UrnBar, .................Description)
Select a.UrnAbs, a.UrnBar, a.TxnUrn..................... a.Description
from TxnProcess a
where a.InsUpFlag = 'I' and a.Processed = 'N' and ASCII(b.TxnType) = 65

Set @row_count = 0

Declare dup_cursor cursor for
    Select UrnBAR, TxnUrnBarPat, count(*) counts
    from AdjustmentTransactions
    group by UrnBAR, TxnUrnBarPat
    having count(*) > 1

Open dup_cursor
Fetch next from dup_cursor into @VUrnBAR, @VTxnUrnBarPat, @count

While (@@Fetch_Status = 0)
Begin
    Select @row_count = @count - 1
    Set rowcount @row_count

    Delete from AdjustmentTransactions
    where UrnBAR = @VUrnBAR and TxnUrnBarPat = @VTxnUrnBarPat

    Fetch next from dup_cursor into @VUrnBAR, @VTxnUrnBarPat, @count
End

Set rowcount 0
Close dup_cursor
Deallocate dup_cursor

If not Exists (select * from SYSINDEXES where name='PrimaryKey' and id=Object_id('AdjustmentTransactions'))
    CREATE UNIQUE INDEX [PrimaryKey] ON [dbo].[AdjustmentTransactions]([UrnBAR], [TxnUrnBarPat]) ON [PRIMARY]

Update Statistics AdjustmentTransactions
I have a csv file that I need to import daily into a SQL Server 2005 table. Much of the table contents could just be overwritten with the new csv file; however, there is a set of rows within the table that need to be appended to, rather than overwritten. There is no primary key in the csv file that can be used. I'm not sure this is the best approach, but what I have been trying to do is append the entire csv file to the existing table and then go back and delete the duplicates. When I run the Delete, it does delete the majority of the records, but leaves a couple hundred behind. The number left behind varies with each run; I can't seem to identify a pattern. Running the Delete a second time does clean up the rows left behind by the first execution and gives the result I want. Any thoughts as to why this needs to be run twice? Or is a better approach available? Here is my code:

SELECT [Pkg ID], [Elm (s)], [Type Name (s)], [End Exec Date], [End Exec Time], dupcount = count(*)
INTO temppkgactions
FROM pkgactions
GROUP BY [Pkg ID], [Elm (s)], [Type Name (s)], [End Exec Date], [End Exec Time]
HAVING count(*) > 1

DELETE TOP (SELECT COUNT(*) - 1 FROM dbo.temppkgactions WHERE dupcount > 1)
FROM dbo.pkgactions

DROP TABLE temppkgactions
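On SQL Server 2005, an alternative that removes all the extra copies in one pass (so nothing should be left for a second run) is to number the rows inside each duplicate group and delete everything past the first. A sketch against the same columns:

WITH d AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY [Pkg ID], [Elm (s)], [Type Name (s)],
                            [End Exec Date], [End Exec Time]
               ORDER BY [Pkg ID]) AS rn
    FROM dbo.pkgactions
)
DELETE FROM d
WHERE rn > 1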
I have a problem deleting duplicate rows. I have an identity column in my table, and if I try to use a correlated subquery with the DELETE command it gives an error.
The other problem I have is that I have a date column in my table and I update that column with the current date and time. If I use a query to fetch the records for a particular day, it does not return any rows:
select * from rates where ch_date >='02/11/99' and ch_date<='02/11/99'
Even if I use CONVERT there are other problems. Is there any way to force the date check to be done excluding the time?
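The usual workaround is to compare against a half-open range, so the time portion stored in ch_date no longer matters:

SELECT *
FROM rates
WHERE ch_date >= '19990211'   -- 11 Feb 1999, midnight
  AND ch_date <  '19990212'   -- strictly before 12 Feb 1999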
This is an imaginary problem while discussing ROWID in ORACLE.
Consider a table without a primary key, unique key, or unique index. A row has been inserted into the table many times. I want to delete all but one of the duplicated rows. With any WHERE clause, all the (duplicated) rows will be deleted. In Oracle I can achieve this using ROWID as follows:
Delete from Table_name where < all column values > and ROWID <> ( Select max(rowid) from Table_name where < all column values > )
How can this be achieved in MS SQL Server 6.5 ?
According to Dr. Codd's golden rules for an RDBMS, one rule is that one should be able to reach each data value in the database by using a table name, a row identification value, and a column name.
Does MS SQL Server 6.5 satisfy this requirement ?
Also, how many of Dr. Codd's 13 golden rules for an RDBMS does MS SQL Server 6.5 satisfy, and which ones does it not?
In my database, I have a table "tbl_c_extract" that consists of 4 columns that look like the following. I'm looking at a daily batch of around 4,000 records, of which 150 are likely to be duplicates.
In the example above, I need to remove 2 of the entries, leaving only the one with the maximum leave date. In this case, those without a leave date have a 2099 entry.
Using a CTE works exactly as I want it to; however, SQL Server Agent doesn't seem to like the use of the CTE...
Code:
WITH CTE (Proprietary_ID, LeaveDate, RN) AS
(
    SELECT Proprietary_ID, LeaveDate,
           ROW_NUMBER() OVER(PARTITION BY Proprietary_ID ORDER BY Proprietary_ID, LeaveDate) AS RN
    FROM tbl_c_extract
)
DELETE FROM CTE
WHERE RN > 1
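A guess at the Agent problem: when this is pasted into a job step after other statements, WITH has to either start the batch or follow a statement terminated with a semicolon, so prefixing the CTE with ';' (or giving it its own batch) often cures the "incorrect syntax" complaint. Also note that with the ascending ORDER BY the row kept is the one with the lowest LeaveDate; if the row with the maximum leave date is the one to keep, order descending, roughly like this:

;WITH CTE (Proprietary_ID, LeaveDate, RN) AS
(
    SELECT Proprietary_ID, LeaveDate,
           ROW_NUMBER() OVER(PARTITION BY Proprietary_ID ORDER BY LeaveDate DESC) AS RN
    FROM tbl_c_extract
)
DELETE FROM CTE
WHERE RN > 1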
I need to delete the duplicate rows from a table. How do I do that in SQL Server 7.0? If possible, please write an example, so that it will be more useful for me.
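A sketch for SQL Server 7.0, assuming the duplicates are exact copies of the whole row and that the table (called YourTable here, a placeholder) has no identity column: copy one of each distinct row aside, empty the table, and put the distinct rows back.

SELECT DISTINCT * INTO #OneOfEach FROM YourTable
TRUNCATE TABLE YourTable
INSERT INTO YourTable SELECT * FROM #OneOfEach
DROP TABLE #OneOfEach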
I have a SQL table [Keys] that has various rows such as:

[ID] [Name] [Path] [Customer]
1    Key1   Key1   InHouse
2    Key2   Key2   External
3    Key1   Key1   InHouse
4    Key1   Key1   InHouse
5    Key1   Key1   InHouse

Obviously IDs 1, 3, 4, 5 are all exactly the same and I would like to be left with only:

[ID] [Name] [Path] [Customer]
1    Key1   Key1   InHouse
2    Key2   Key2   External
I cannot create a new table/database or change the unique identifier (which is currently ID) either. I simply need a SQL script I can run to clean out the duplicates (I know how they got there and the issue has been fixed, but the database is still currently invalid due to all these duplicate entries).
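A sketch of a script that keeps the lowest ID of every identical (Name, Path, Customer) combination and deletes the rest; running the inner SELECT on its own first shows exactly which IDs will survive.

DELETE FROM [Keys]
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM [Keys]
    GROUP BY [Name], [Path], [Customer])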
Hi, I am using SQL Server 2005. I have one table where I need to find records that have the same citycode, hospitalcode and doctorcode, then delete the duplicates keeping only one record of each group. My problem is that the table structure has an identity column, which is unique. That is, my table structure is something like
I have a table that contains more than 10,000 rows of duplicate data. The script below copies the data to a temp table, then deletes from the original table. My problem is that after it runs, I now have 122 rows of triplicate data (but the dups are gone). If I rerun the script, it doesn't see the triplicate data and returns 0 rows. I've used three different versions of delete-dup-row scripts with the same result. There are no triggers or constraints on the table, not even a primary key. What am I missing?

/**********************************************
 Delete Duplicate Data
**********************************************/

--Identify and save dup data into temp table
INSERT INTO #tempduplicatedata
SELECT * FROM crt_concept_score
GROUP BY student_test_uniq, test_uniq, concept_id, test_id, questions_correct,
         questions_count, percentage_correct, concept_response_count
HAVING COUNT(*) > 1

--Confirm number of dup rows
SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'

--Delete dup from original table
DELETE FROM crt_concept_score
FROM crt_concept_score
INNER JOIN #tempduplicatedata
    ON crt_concept_score.student_test_uniq = #tempduplicatedata.student_test_uniq
   AND crt_concept_score.test_uniq = #tempduplicatedata.test_uniq
   AND crt_concept_score.concept_id = #tempduplicatedata.concept_id
   AND crt_concept_score.test_id = #tempduplicatedata.test_id
   AND crt_concept_score.questions_correct = #tempduplicatedata.questions_correct
   AND crt_concept_score.questions_count = #tempduplicatedata.questions_count
   AND crt_concept_score.percentage_correct = #tempduplicatedata.percentage_correct
   AND crt_concept_score.concept_response_count = #tempduplicatedata.concept_response_count

--Insert the deleted data back
INSERT INTO crt_concept_score
SELECT * FROM #tempduplicatedata

--Check for dup data
SELECT * FROM crt_concept_score
GROUP BY student_test_uniq, test_uniq, concept_id, test_id, questions_correct,
         questions_count, percentage_correct, concept_response_count
HAVING COUNT(*) > 1

--Check table
-- SELECT * FROM crt_concept_score

--Drop temp table
DROP TABLE #tempduplicatedata
GO
I was wondering if anyone had a suggestion as to how to delete duplicate rows from a table. I have been doing this:
SELECT * INTO TempUsersNoRepeats FROM TempUsers2 UNION SELECT * FROM TempUsers3
This way I end up with a total of four tables (the fourth table being the original Users table), and I was hoping that there was a way that I could do this all within the original Users table and not have to create the three TempUsers tables.
Hi, new to this database and this forum as I am, I would like to ask for a couple of pointers. My SQL 2000 tables are ready and I need to schedule a daily upload of .txt files. These contain a rolling 7 days of stats. Q1: How best to schedule the automatic uploading of this data to the respective tables in SQL Server (field names are identical)? Q2: How to schedule a daily deletion of those rows which are in the tables already (each day, 6 days must be deleted and 1 kept)?
I must admit I don't know all that much about SQL, which is why I hope someone can show me the light. I have a script almost finished; however, I have no idea how to have it trim database entries that are older than, say, 90 days. Any ideas?
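A sketch of the usual pattern, with placeholder names (LogTable for the table, CreatedDate for whatever datetime column the script can age records by):

DELETE FROM LogTable
WHERE CreatedDate < DATEADD(day, -90, GETDATE())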
I have a table with a load of orphaned records (I know... poor design). I'm trying to get rid of them, but I'm having a brain cramp.
I need to delete all the records from the table "Floor_Stock" that would be returned by this select statement:
SELECT FLOOR_STOCK.PRODUCT, FLOOR_STOCK.SITE
FROM PRODUCT_MASTER
INNER JOIN FLOOR_STOCK
    ON PRODUCT_MASTER.PRODUCT = FLOOR_STOCK.PRODUCT
LEFT OUTER JOIN BOD_HEADER
    ON FLOOR_STOCK.PRODUCT = BOD_HEADER.PRODUCT
   AND FLOOR_STOCK.SITE = BOD_HEADER.SITE
WHERE (BOD_HEADER.BOD_INDEX IS NULL)
  AND (PRODUCT_MASTER.PROD_TYPE IN ('f', 'n', 'k', 'b', 'l', 's'))
I was thinking along the lines of:
DELETE FROM FLOOR_STOCK
INNER JOIN (SELECT FLOOR_STOCK.PRODUCT, FLOOR_STOCK.SITE
            FROM PRODUCT_MASTER
            INNER JOIN FLOOR_STOCK
                ON PRODUCT_MASTER.PRODUCT = FLOOR_STOCK.PRODUCT
            LEFT OUTER JOIN BOD_HEADER
                ON FLOOR_STOCK.PRODUCT = BOD_HEADER.PRODUCT
               AND FLOOR_STOCK.SITE = BOD_HEADER.SITE
            WHERE (BOD_HEADER.BOD_INDEX IS NULL)
              AND (PRODUCT_MASTER.PROD_TYPE IN ('f', 'n', 'k', 'b', 'l', 's'))) F
    ON FLOOR_STOCK.PRODUCT = F.PRODUCT
   AND FLOOR_STOCK.SITE = F.SITE
... but Sql Server just laughs at me: "Incorrect Syntax near the keyword INNER"
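T-SQL doesn't allow a JOIN directly on the DELETE target like that; the joins belong in a second FROM clause. A sketch of that form, which should delete exactly the rows the SELECT above returns:

DELETE FLOOR_STOCK
FROM FLOOR_STOCK
INNER JOIN PRODUCT_MASTER
    ON PRODUCT_MASTER.PRODUCT = FLOOR_STOCK.PRODUCT
LEFT OUTER JOIN BOD_HEADER
    ON FLOOR_STOCK.PRODUCT = BOD_HEADER.PRODUCT
   AND FLOOR_STOCK.SITE = BOD_HEADER.SITE
WHERE BOD_HEADER.BOD_INDEX IS NULL
  AND PRODUCT_MASTER.PROD_TYPE IN ('f', 'n', 'k', 'b', 'l', 's')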