Comparing Rows Within The Same Table (duplicates)?
Jun 15, 2007
How do I only select rows with duplicate dates for each person (id)? (The actual table has approximately 13000 rows with approximately 3000 unique ids)
I have a table with 22 million Business records. I can see that there are duplicates when I group by BusinessName and Address and Phone. I'd like to place only the duplicates into a table, with a ranking, oldest business key gets a ranking of 1.
As a bonus I'd like each group to have a distinct group name (although not necessary, just want to know how to do this)
Later after I run more verifications to make sure these are not referenced elsewhere I'll delete everything with a matchRank > 1 out of the main Business table.
DROP TABLE [dbo].[TestBusiness];
GO
CREATE TABLE [dbo].[TestBusiness](
    [Business_pk] INT IDENTITY(1,1) NOT NULL,
    [BusinessName] VARCHAR (200) NOT NULL,
    [Address] VARCHAR(MAX) NOT NULL,
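One possible approach to the ranking question, sketched with Python's built-in sqlite3 module so it can be run anywhere. The Phone column and the sample rows are assumptions (the posted table definition is cut off), and the names groupSize and groupId are made up for illustration. The window functions used here (COUNT(*) OVER, ROW_NUMBER, DENSE_RANK) also exist in SQL Server 2005 and later, but the T-SQL details may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestBusiness (
    Business_pk  INTEGER PRIMARY KEY,   -- stands in for INT IDENTITY(1,1)
    BusinessName TEXT NOT NULL,
    Address      TEXT NOT NULL,
    Phone        TEXT NOT NULL          -- assumed column
);
INSERT INTO TestBusiness (BusinessName, Address, Phone) VALUES
    ('Acme', '1 Main St', '555-0001'),
    ('Bolt', '2 Oak Ave', '555-0002'),
    ('Acme', '1 Main St', '555-0001'),
    ('Cord', '3 Elm Rd',  '555-0003'),
    ('Bolt', '2 Oak Ave', '555-0002'),
    ('Acme', '1 Main St', '555-0001');
""")

rows = conn.execute("""
WITH ranked AS (
    SELECT Business_pk, BusinessName,
           -- how many rows share the same name/address/phone
           COUNT(*)     OVER (PARTITION BY BusinessName, Address, Phone) AS groupSize,
           -- 1 = oldest key within the group
           ROW_NUMBER() OVER (PARTITION BY BusinessName, Address, Phone
                              ORDER BY Business_pk) AS matchRank,
           -- one number per distinct name/address/phone combination
           DENSE_RANK() OVER (ORDER BY BusinessName, Address, Phone) AS groupId
    FROM TestBusiness
)
SELECT groupId, matchRank, Business_pk, BusinessName
FROM ranked
WHERE groupSize > 1            -- keep only rows that have a duplicate
ORDER BY groupId, matchRank
""").fetchall()

for row in rows:
    print(row)
```

Rows with matchRank > 1 are the candidates for later deletion; the groupId column covers the "distinct group name" bonus, although it is only a number.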
I'm trying to come up with an elegant, simple way to compare two consecutive values from the same table. For instance:
SELECT TOP 2 datavalues FROM myTable ORDER BY timestamp DESC
That gives me the two latest values. I want to test the rate of change of these values. If the top row is a 50% increase over the row below it, I'll execute some special logic. What are my options? The only ways I can think of doing this are pretty ugly. Any help is very much appreciated. Thanks! B.
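A minimal sketch of one option, using Python's sqlite3 module: put each value next to its predecessor with LAG and compute the change on that single row. The sample data and the rate_of_change helper are assumptions for illustration. LAG is available in SQL Server 2012 and later; on older versions a self-join on ROW_NUMBER would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (timestamp TEXT, datavalues REAL);
INSERT INTO myTable VALUES
    ('2007-06-15 10:00', 100.0),
    ('2007-06-15 10:05', 110.0),
    ('2007-06-15 10:10', 170.0);
""")

# LAG pairs every row with the value from the row before it (by timestamp);
# ORDER BY ... DESC LIMIT 1 then keeps only the most recent pair.
latest, previous = conn.execute("""
SELECT datavalues,
       LAG(datavalues) OVER (ORDER BY timestamp) AS previous
FROM myTable
ORDER BY timestamp DESC
LIMIT 1
""").fetchone()


def rate_of_change(latest, previous):
    """Relative change, or None when there is no usable previous value."""
    if previous is None or previous == 0:
        return None
    return (latest - previous) / previous


change = rate_of_change(latest, previous)
is_spike = change is not None and change >= 0.5   # the 50% threshold from the question
print(latest, previous, is_spike)
```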
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile).
MATCHED: (email, firstname, mobile) but not MATCH: (lastname, phone)
MATCHED: (email, firstname, lastname) but not MATCH: (phone, mobile)
;WITH ctePreAgg AS ( select top 500 act_reference "ActivityRef", row_number() over (partition by act_reference order by act_reference) as rowno, t3.s_initials "Initials" from mytablestuff order by act_reference
[code]...
But what I would love to do next is take each of the above rows and return the initials either in one column, with all the nulls and duplicate values removed, separated by commas,
OR the above but using a variable number of columns based on the maximum number of different initials for each row. This is not strictly required, but it may be neater for further work on the view.
As you can see, each room is assigned an id, and once the room changes the id starts from 1 again. I don't want the 2nd row to appear because 9am - 1pm falls within 8am - 5pm. How do I remove this row?
create table #prints(id int IDENTITY(1,1) NOT NULL, Printermarkersupplyid varchar(36),PrinterID varchar(10),Description varchar(10),SupplyLevel int,ModifiedDate datetime) insert into #prints(Printermarkersupplyid,PrinterID,Description,SupplyLevel,ModifiedDate) select newid(),'P1','D1',100,'2013-08-1 03:28:38.203' union all
[code]....
My requirement is to get the difference between adjacent rows, i.e. the difference between the 2nd and 3rd or the 6th and 7th, but not the 6th and 8th. If the difference between the 2nd and 3rd is less than zero and the 3rd row's modified date is later than the 2nd row's, then I should get a count of 1 against the 3rd row.
I have a table that holds pay rate changes with a field for the rate start date and a field for the rate end date. When an employee gets given a new pay rate, the existing rate is given an end date and a new row is added with the rate start date being the day following the end date of the old pay rate.
I need to identify the staff who have had a rate change within the past month, therefore an end date on one row that is within one month of the current month, and a start date on another row that is one day after an end date on a separate row and within one month of the current month.
I've been working with T-SQL in a MSSQL Server Management Studio (2005) for about a week now. I've been trying to convert some horribly written VB code from a MS Access DB over to SQL so it can be automated on a SQL backend.
Most of the learning process and coding has gone surprisingly well. The problem is with comparing some data to determine which one needs to be flagged.
Three tables to note in bold, with notable fields in italics below them:
EmployeeData
    HRID (identity)
ResourceAllocation
    ID (identity)
    [Last Name] (linked to HRID)
    Project
    [Resource Start Date]
    [Resource End Date]
    [Percent Utilization]
tblHCvalues
    RAID (linked to ResourceAllocation.ID)
    a monthyear and quarteryear for every month and quarter from 2012-2014, i.e. january12, february12, 1q12, 2q13, etc...
And yes, there are probably a thousand ways to optimize that tblHCvalues, but I'll ask about that later. Just work with the structure I have
Here's how it works:
    Each employee's data and unique HRID is in the EmployeeData table
    An employee can be on one or multiple projects at any time
    Those projects are stored per project in the ResourceAllocation table with a link to the Employee's HRID, and all the other information listed above
    Even though an employee might be on two projects, they can only count for headcount on one project.
We use rules that compare the percent of work being done on a project, and the start and end dates of the employee (resource) on that project, to determine which project should be counted for Headcount.
    The code uses a cursor to go through each HRID, and then pull up all the ResourceAllocation records associated with it.
    Run the rules to determine which ResourceAllocation record counts toward headcount
    A stored procedure then runs that fills out the tblHCvalues in the way we want for the project we want
All of it works, except for the rules that compare the things, so that's what I want to focus on in this thread. How do I write these rules:
Here are the rules, and they should work for any number of multiple resource allocations for one employee:
1. Choose the ResourceAllocation with the greatest [Percent Utilization]
2. If the top ResourceAllocations have equal [Percent Utilization], choose the ResourceAllocation with the earliest [Resource Start Date]
3. If the [Percent Utilization] and the [Resource Start Date] are equal, choose the latest [Resource End Date]
4. If all three fields are equal, choose the first ResourceAllocation (aka, screw it and pick one at random)
I'm sure I could use a bunch of IF statements to compare it all, but even that is complicated to think about. There has to be an easier way, right?
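One way to express the four rules without IF statements or a cursor is to put them, in priority order, into the ORDER BY of a ROW_NUMBER and keep the first row per employee. The sketch below uses Python's sqlite3 module; the column names (HRID, StartDate, EndDate, PercentUtilization) are simplified stand-ins for the bracketed names in the post, and the sample rows are invented. ROW_NUMBER ... OVER (PARTITION BY ... ORDER BY ...) is also available in SQL Server 2005.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResourceAllocation (
    ID                 INTEGER PRIMARY KEY,
    HRID               INTEGER,   -- assumed direct link to EmployeeData
    Project            TEXT,
    StartDate          TEXT,      -- ISO dates sort correctly as text
    EndDate            TEXT,
    PercentUtilization INTEGER
);
INSERT INTO ResourceAllocation VALUES
    (1, 10, 'Alpha', '2012-01-01', '2012-06-30',  50),
    (2, 10, 'Beta',  '2012-02-01', '2012-12-31',  50),
    (3, 20, 'Gamma', '2012-03-01', '2012-09-30',  40),
    (4, 20, 'Delta', '2012-03-01', '2012-12-31',  40),
    (5, 30, 'Omega', '2012-01-01', '2012-12-31', 100),
    (6, 30, 'Sigma', '2012-01-01', '2012-12-31',  20);
""")

rows = conn.execute("""
WITH ranked AS (
    SELECT ID, HRID, Project,
           ROW_NUMBER() OVER (
               PARTITION BY HRID
               ORDER BY PercentUtilization DESC,  -- rule 1: greatest utilization
                        StartDate ASC,            -- rule 2: earliest start
                        EndDate DESC,             -- rule 3: latest end
                        ID ASC                    -- rule 4: arbitrary but stable
           ) AS pick
    FROM ResourceAllocation
)
SELECT HRID, ID, Project
FROM ranked
WHERE pick = 1
ORDER BY HRID
""").fetchall()

for row in rows:
    print(row)
```

Each later sort key only matters when all earlier keys tie, which is exactly the "if equal, then..." structure of the rules.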
I want to do a login in vb.net, when the user enters username and password, it looks into ms sql to find user then compare password with database then select user in database and closes form to open another form with selected database. please help, i am stranded with my project and i am getting nowhere.
I have a table employee_test having the sample data. The rows with EmployeeID=6 are duplicate rows. I want to delete the duplicates retaining one row for the employeeid=6. Note :- I don't want to use a temporary table. I want to do this using a single query or at the most in a SP query batch. Please advise.
I have a report that is built using the report builder and the report model that was created. With the fields that are displayed in the report, it could be possible for the database to have multiple rows with the same values for all the fields (that are displayed in the report). The PK will be different, but this is not displayed in the report as the PK won't make sense in the report. So what happens is that the report displays just 1 record even though the database has multiple records because all the fields (that are displayed in the report) are identical. Is there a way to make the report display all the rows irrespective of duplicates?
I have an existing stored table with duplicate rows that I want to delete. Using a CTE gives me
WITH CTE AS (
    SELECT rn = ROW_NUMBER() OVER(
               PARTITION BY employeeid, dateofincident, typeid, description
               ORDER BY Id ASC), *
    FROM dbo.TableName
)
DELETE FROM cte WHERE rn > 1
This is what I want to do basically. But this is only deleting in my CTE. Is there any way I can update my existing table "TableName" with this, without using temp tables?
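For comparison, here is a sketch of a delete that targets the base table directly, with no CTE and no temp table: keep the lowest Id in each group and remove everything else. It runs on SQLite through Python's sqlite3 module, and the sample rows are invented. As far as I know, in SQL Server a DELETE against a CTE built on a single table is applied to the underlying table as well, so it is worth re-checking the row counts in dbo.TableName after running the posted statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableName (
    Id             INTEGER PRIMARY KEY,
    employeeid     INTEGER,
    dateofincident TEXT,
    typeid         INTEGER,
    description    TEXT
);
INSERT INTO TableName (employeeid, dateofincident, typeid, description) VALUES
    (6, '2013-01-05', 1, 'late'),
    (6, '2013-01-05', 1, 'late'),
    (7, '2013-01-06', 2, 'absent'),
    (6, '2013-01-05', 1, 'late'),
    (7, '2013-01-07', 2, 'absent');

-- Keep the lowest Id of every duplicate group, delete the rest.
DELETE FROM TableName
WHERE Id NOT IN (
    SELECT MIN(Id)
    FROM TableName
    GROUP BY employeeid, dateofincident, typeid, description
);
""")

remaining = [row[0] for row in conn.execute("SELECT Id FROM TableName ORDER BY Id")]
print(remaining)
```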
I need to compare two consecutive rows (if BEGDA of second row is 1 day greater than ENDDA of first row then I need to pick First row BEGDA and 2nd row ENDDA)
Hello, I currently have Table1 and View1. View1 is a query from 2 or 3 tables that works fine on its own. However, in my current query, if I try to use it... something like...
SELECT a.col1, a.col2, a.col3, b.col1, b.col2, b.col3 FROM View1 a JOIN Table1 b on a.col1 = b.col1 WHERE a.col2 <b.col2 OR a.col3 <b.col3
It throws an error "Server: Msg 446, Level 16, State 9, Line 1 Cannot resolve collation conflict for not equal to operation."
Clearly I need to use collation between Table1 and View1, but I don't know where I need to use "COLLATE SQL_Latin1_General_CP850_CI_AI" and how. This is the collation set on Table1. Thank you! Yas
I have a customer table with postcode and suburb fields, plus customer info that is manually entered by data entry people...
I am trying to compare the entries against a postcode table that holds the correct postcodes, with the fields postcode and suburb. Based on the postcode entered in the customer table, the suburb should be the same as the suburb in the postcode table; if they are not the same, output them to a table for manual checking. How would I go about this?
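A possible shape for this check, sketched with Python's sqlite3 module: select every customer for which no row in the postcode table has the same postcode and suburb, and write the result to a review table. The table and column names (customer_id, customer_check) and the sample data are assumptions. Comparing with TRIM and UPPER is a design choice to ignore stray spaces and capitalization; drop it if those should be flagged too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE postcode (postcode TEXT, suburb TEXT);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    postcode    TEXT,
    suburb      TEXT
);
INSERT INTO postcode VALUES
    ('2000', 'Sydney'), ('3000', 'Melbourne'), ('4000', 'Brisbane');
INSERT INTO customer (name, postcode, suburb) VALUES
    ('Ann', '2000', 'Sydney'),
    ('Bob', '3000', 'Melborne'),     -- misspelled suburb
    ('Cy',  '4000', 'brisbane '),    -- only case/whitespace differs
    ('Di',  '9999', 'Nowhere');      -- unknown postcode

-- Rows with no matching (postcode, suburb) pair go to a table for manual checking.
-- NOT EXISTS also works when one postcode covers several suburbs.
CREATE TABLE customer_check AS
SELECT c.customer_id, c.postcode, c.suburb
FROM customer c
WHERE NOT EXISTS (
    SELECT 1
    FROM postcode p
    WHERE p.postcode = c.postcode
      AND UPPER(TRIM(p.suburb)) = UPPER(TRIM(c.suburb))
);
""")

to_check = conn.execute(
    "SELECT customer_id, postcode, suburb FROM customer_check ORDER BY customer_id"
).fetchall()
print(to_check)
```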
Hi, I have a table which stores the fund name and its data. We get quarterly information from the fund company. If a user wants to add a fund that's not in our database, we let them add a ClientFundId and a FundName. But after some time the fund company may add that fund in a later quarter, so how do I get rid of the duplicated data? The ClientFundId column can hold a 9-character alphanumeric value or a 5-letter value; when the fund company provides those values, the 5-letter values are stored in the Ticker column and the 9-character values are stored in the Cusip column. I wrote this query hoping to retrieve the duplicate values, but it didn't list any, even though I found one. This is my query:
Select FundId, Cusip, Ticker, ClientFundId, FundName, ShortName From Fund Where ClientFundId = Ticker or ClientFundId = Cusip
Any help will be appreciated. Thanks, Karen
Does someone have a query to display the duplicate records in a table?
Table: zipcode dma
My data upload is failing because there is a primary key on zipcode and the source data (42k records) has about 50 duplicate zipcode records in it. It is possible that there is a unique combo of zipcode / dma but I need to identify the duplicate records to determine that.
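A small sketch of how the duplicates could be identified before the upload, using Python's sqlite3 module. The staging table name and the sample rows are assumptions. The first query lists each repeated zipcode with its row count and the number of distinct dma values (more than one distinct dma means the zipcode/dma combination is what is unique); the second pulls back the full rows for those zipcodes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (zipcode TEXT, dma TEXT);   -- no primary key on the source data
INSERT INTO staging VALUES
    ('10001', '501'),
    ('10002', '501'),
    ('10001', '501'),   -- exact duplicate row
    ('30301', '524'),
    ('30301', '525');   -- same zipcode, different dma
""")

# Zipcodes that appear more than once, with how many distinct dma values they carry.
dupes = conn.execute("""
SELECT zipcode, COUNT(*) AS row_count, COUNT(DISTINCT dma) AS dma_count
FROM staging
GROUP BY zipcode
HAVING COUNT(*) > 1
ORDER BY zipcode
""").fetchall()

# All columns of the offending rows, by filtering on the grouped result.
detail = conn.execute("""
SELECT zipcode, dma
FROM staging
WHERE zipcode IN (
    SELECT zipcode FROM staging GROUP BY zipcode HAVING COUNT(*) > 1
)
ORDER BY zipcode, dma
""").fetchall()

print(dupes)
print(detail)
```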
I have this script below, which does what it is supposed to. However, it only outputs the cust_id. I want it to show all the columns in the table. How would I do this?
SELECT cust_id FROM cust_table WHERE cust_name in ('Billy','John') and rownum < 100 GROUP BY cust_id HAVING COUNT(*) > 1;
The database has Name,Email, and skill. Though the name is distinct it is repeated as it has different skills. I would like to remove duplicate names and add the corresponding skill to the only one row.
From the stored procedure, combining 3 tables I got the output as:
Name    email     department   Skill
Arun    emailid   Tech team    Technical
Arun    emailid   Tech team    Leadership
Arun    emailid   Tech team    Decision Making
Binay   emailid   Marketing    Technical
Binay   emailid   Marketing    Decision Making
I would like to remove the duplicate Name fields and combine the Skill in a single row as other fields are same.
So the output should be
Name    email     department   Skill
Arun    emailid   Tech team    Technical, Leadership, Decision Making
Binay   emailid   Marketing    Technical, Decision Making
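The collapse into one row per name can be sketched as a grouped string aggregation. The example below uses SQLite's GROUP_CONCAT through Python's sqlite3 module; the table name skills_output is an assumption standing in for the stored procedure's result. In SQL Server the closest equivalents are STRING_AGG (2017 and later) or the older FOR XML PATH technique. Note that GROUP_CONCAT does not promise any particular order for the concatenated skills.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skills_output (Name TEXT, email TEXT, department TEXT, Skill TEXT);
INSERT INTO skills_output VALUES
    ('Arun',  'emailid', 'Tech team', 'Technical'),
    ('Arun',  'emailid', 'Tech team', 'Leadership'),
    ('Arun',  'emailid', 'Tech team', 'Decision Making'),
    ('Binay', 'emailid', 'Marketing', 'Technical'),
    ('Binay', 'emailid', 'Marketing', 'Decision Making');
""")

# One row per person; the inner DISTINCT guards against a skill listed twice.
rows = conn.execute("""
SELECT Name, email, department, GROUP_CONCAT(Skill, ', ') AS Skills
FROM (SELECT DISTINCT Name, email, department, Skill FROM skills_output)
GROUP BY Name, email, department
ORDER BY Name
""").fetchall()

for row in rows:
    print(row)
```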
How can I check for duplicate entries? For example, if a serial number has already been entered and a user tries to enter the same serial number, how can I get a trigger of some sort to check for duplicates and then report that the number has already been entered?
I'm trying to look for duplicates in a table field. The field name is alphanumericCol and the table is a user-defined table. This is my trigger:
ALTER TRIGGER [dbo].[DUPLICATES] ON [dbo].[AMGR_User_Fields_Tbl]
FOR INSERT, UPDATE AS
DECLARE @Alphanumericcol VARCHAR (750)
-- This trigger has been created to check that duplicate rows are not inserted into table.
-- Check if row exists
SELECT @Alphanumericcol FROM Inserted i, AMGR_User_Fields_Tbl t WHERE t.AlphaNumericCol = i.AlphaNumericCol AND t.Client_Id = i.Client_Id
-- (@Alphanumericcol = 1)
-- Display Error and then Rollback transaction
BEGIN
RAISERROR ('This row already exists in the table', 16, 1)
ROLLBACK TRANSACTION
END
The result I get is that if I input a duplicate number, it fills in a null in the field. So my question is: how do I get it to tell me it's a duplicate and let me enter a new one?
Hi All, I have the table dbo.OperatingHour. It has many duplicates and I want to remove them permanently. The statement below works, but when I open the table there are no changes.
Insert into OperatingHour(Weekdays, Wednesdays, Fridays,Saturdays, [Sundays/Public Holidays]) (SELECT DISTINCT Weekdays, Wednesdays, Fridays,Saturdays, [Sundays/Public Holidays] FROM OperatingHour)
I need help flagging duplicate records in some tables I have. For example, if I have Table1 which contains Field1, Field2 and Field3 like below:
Field1   Field2   Field3   Field4
Paul     18       Null     Null
Paul     18       Null     Null
John     19       Null     Null
How would I:
1. put a 'Y' in Field3 to mark the two records which are duplicates.
2. put a 'Y' in Field4 to mark ONLY ONE of the duplicate records.
Regards, Ciarán
There is a table with a single column with 75 rows - 50 unique / 25 duplicates. How would you pull back a list of the rows that have/are duplicates? This is a question that I got in an interview. I didn't get it, obviously.... Thanks, Tim
I would like to compare some values in two columns which are in the same table. I want to check that there are no differences between the values if the ID is Test1 and Test2
Example table
ID      Value1   Value 2
TEST1   House    Tango
TEST2   House    Tango
with test as ( select * from ExampleTable where ID= 'TEST' ),
I have two name columns in my table, NAME1 & NAME2 that I want to compare to see if they match. Only problem is that the order of the first, last, middle name can be either same or different between the two fields.
For example NAME1 = JAY JOHN SMITH NAME2 = JOHN SMITH JAY or SMITH JOHN JAY
Is there a way to somehow reorder these fields and then compare using SQL?
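One idea is to normalize each name by splitting it into words, sorting the words, and joining them back together, then compare the normalized strings. Plain SQL has no built-in for this, so the sketch below registers a hypothetical helper, normalize_name, as a user-defined SQL function via Python's sqlite3 module. The table name people and the sample rows are assumptions; in SQL Server the same idea would need a user-defined function or a string-split approach.

```python
import sqlite3


def normalize_name(name):
    """Upper-case the name, split on whitespace, sort the parts, rejoin."""
    if name is None:
        return None
    return " ".join(sorted(name.upper().split()))


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL as a one-argument function.
conn.create_function("normalize_name", 1, normalize_name)

conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, NAME1 TEXT, NAME2 TEXT);
INSERT INTO people (NAME1, NAME2) VALUES
    ('JAY JOHN SMITH', 'JOHN SMITH JAY'),
    ('JAY JOHN SMITH', 'SMITH JOHN JAY'),
    ('JAY JOHN SMITH', 'JAY JOHN SMYTH');
""")

# Rows whose two name columns contain the same words in any order.
matches = [row[0] for row in conn.execute("""
SELECT id
FROM people
WHERE normalize_name(NAME1) = normalize_name(NAME2)
ORDER BY id
""")]
print(matches)
```

This treats names as unordered sets of words with repeats preserved, so it will not catch initials, missing middle names, or spelling variants.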