Transact SQL :: Comparing 2 Records / Multiple Instances In The Same Table For A Specific Combination?
Jun 10, 2015
I have a problem where I have 2 compare 2 records from the same table. This part looks easy but the problem is for a User there can be multiple records and I have 2 compare each record with its previous instance based on the timestamp. Not only I have to compare I have to perform some analysis. Below is the Table script and sample output.
Givens: All SQL Server 2008 or 2012 tools at your disposal.
Production database contains the following tables (simplified for example: constraints ignored, etc.) associated with a racing video game’s server.
-- A player of our game
-- Table greater than 10 million rows
CREATE
TABLE [dbo].[User]
(
[UserId]
[bigint] NOT
NULL
,[country]
[int] NULL
-- User’s home country
,[name]
[nvarchar](15)
NULL -- User’s displayable name (‘John’, ‘Bill’)
,[subscriptionTier]
[int] NULL
)
-- 0 == free, 1 == paid, for instance
Assume that rows get written into the event tables at a rate of 1,000 a minute,are never updated once written and currently are only read on a replica/reporting server.
Question Background: Write up a single query that would return the following: List of users and whose “TotalMoneyEarned” value ever grew (between logon events) at a rate of more than 1,000 per minute (we’d consider these suspicious and flag them for later investigation).
For instance, if the sample data were:
-- example of [Events.UserLogon] data -- not the query output we want
Event 1 is okay because there’s nothing to compare it against
Event 2 is okay because the TotalMoneyEarned only grew 500 in a minute
Event 3 should be flagged, as the value grew 1500 in a minute
Event 4 is okay, as it grew 7,000 in 8 minutes (< 1000 per minute)
Query Output (your query should return data in a format like this):
User Flagged Logon Time Rate Since Last Logon (money/minute)
John 2010-10-16 00:21:56 1500
Dave 2010-10-16 00:30:50 3200
Bill 2010-10-16 00:35:23 1000
It is likely that you will need to create sample data for both the User and [Events.Logon] tables. We are looking for a single query that returns data like what is represented in Query Output.
Can we push the data for the above query in a physical table and create index to make the query fast rather than using the same set tables multiple times
I need to compare records between two tables. There is no ID in the tables to do a simple join between them. So, what I'm looking for is: get the first record from table1 and read all record from table2 and give me back the most similar record. The String Distance is a predefined function.
Select a.table1 ,b.table2 from table1 a, table2 b where StringDistance (''a.table1,'b.table2') >90
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile). MATCHED: (email, firstname, mobile) But not MATCH: (lastname, phone) MATCHED: (email, firstname, lastname) But not MATCH: (phone, mobile)
I have a scenario to compare previous records based on each ID columns. For each ID, there would be few records, I have a column called "compare", We have to compare all Compare 1 records with Compare 0 Records. If Dt is lesser or equal to comparing DT, then show 0. Else 1
We always only one Compare 0 records in my table, so all compare 1 columns will compare with only one row per ID
My tables look like
Declare @tab1 table (ID Varchar(3), Dt Date, Compare Int) Insert Into @tab1 values ('101','2015-07-01',0) Insert Into @tab1 values ('101','2015-07-02',1) Insert Into @tab1 values ('101','2015-07-03',1) Insert Into @tab1 values ('101','2015-07-01',1) Insert Into @tab1 values ('101','2015-06-30',1)
Insert Into @tab1 values ('102','2015-07-01',0) Insert Into @tab1 values ('102','2015-07-02',1) Insert Into @tab1 values ('102','2015-07-01',1)
select * from @tab1
1.) In the above scenario for ID = '101', we have 5 records, first record has Compare value 0, which mean all other 4 records need to compare with this record only
2.) If Compare 1 record's Dt is less or equal to Compare 0's DT, then show 0 in next column
3.) If Compare 1 record's Dt is greater than Compare 0's DT, then show 1 in next column
If I just use a simple select statement, I find that I have 8286 records within a specified date range.
If I use the select statement to pull records that were created from 5pm and later and then add it to another select statement with records created before 5pm, I get a different count: 7521 + 756 = 8277
Is there something I am doing incorrectly in the following sql?
DECLARE @startdate date = '03-06-2015' DECLARE @enddate date = '10-31-2015' DECLARE @afterTime time = '17:00' SELECT General_Count = (SELECT COUNT(*) as General FROM Unidata.CrumsTicket ct
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: Legacy system that has (well let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fairly decent amount of garbage data to be inserted in some tables. Pulling one of the [Individuals] table from within this Legacy system and inserting it into a production system, into the Table schema currently in place to track [Individuals] in this Production system.
Problem: Inserting the information is easy, how to deduplicate the records that exist within the staging table that the legacy [Individuals] table has been dumped into in production, prior to insertion. (Wanting to do this programmatically with SQL or SSIS preferably, so that I can alter it later to allow for updating existing/inserting new)
Staging Table Schema:
; CREATE TABLE [dbo].[stage_Individuals]( [SysID] [int] NULL, --Unique, though it's not an index intended to identify the [Individuals] [JJISID] [nvarchar](10) NULL, [NameLast] [nvarchar](30) NULL, [NameFirst] [nvarchar](30) NULL, [NameMiddle] [nvarchar](30) NULL,
[code]....
Scenario: There are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SYSID is just a Clustered Index (I'm assuming) within the Legacy system and will be most likely dropped when inserted into the Production [Inviduals] table. There are records that are missing their JJISID, though this isn't supposed to happen either, but have valid information within SSN/DOB/Name/etc that can be merged into the correct record that has a JJISID assigned. There is really no data conformity, some records have NULLS for everything except JJISID, or some records will have all the [Individuals] information excluding the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have other's that partition by Name/DOB/etc and will adapt whatever I come up with to be used for those as well):
; select j.* from (select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum, stage_Individuals.*, COUNT(*) OVER (partition by jjisid) as cnt from stage_Individuals) as j where cnt > 1 and j.JJISID is not nullNow, with SQL Server 2012 or later I could use LAG and LEAD w/ the RowNum value to do my data manipulation...but that won't work because we are on SQL Server 2008 in this environment.
[URL]
With, the following as a potential solution:
GSquared (3/16/2010)Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine tune that. You'll need to look at various method for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
; WITH CTE AS (SELECT master_id, MIN(ID) AS first_id, MAX(Account_Expiry) AS latest_expiry FROM #People GROUP BY master_id) SELECT P1.master_id,
[code].....
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically what I'm looking for, is to update any column with a NULL value to the corresponding Duplicate [Individuals] record value for that column.
**EDIT - Example, Record 1 has a JJISID with NULL NameFirst & NameLast BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I'm wanting to propogate the NameFirst & NameLast from Record2 into Record1
I am trying to write a query that will retrieve all students of a particular class and also any rows in HomeworkLogLine if they exist (but return null if there is no row). I thought this should be a relatively simple LEFT join but I've tried every possible combination of joins but it's not working.
SELECT Student.StudentSurname + ', ' + Student.StudentForename AS Fullname, HomeworkLogLine.HomeworkLogLineTimestamp, HomeworkLog.HomeworkLogDescription, ROW_NUMBER() OVER (PARTITION BY HomeworkLogLine.HomeworkLogLineStudentID ORDER BY
[Code] ...
It's only returning two rows (the students where they have a row in the HomeworkLogLine table).
I have a table with 35,000 records in it. I want to update a value in column A for only the first 5000 records, leaving the value in Column A for the remaining 30,000 records as it is now. What would be the command I would use to update Column A for the first 5000 records.
I am still learning T-SQL .Lets consider the table below, ID 1-3 shows our purchase transactions from various Vendors and ID 4-6 shows our payments to them
Table 1 - VendorTransactions
ID PARTY AMOUNT VOUCHER --------------------------------------- 1 A 5000 Purchase 2 B 3000 Purchase 3 C 2000 Purchase
4 A 3000 Payment 5 B 1000 Payment 6 C 2000 Payment 7 A 1000 Payment
Now we have a blank table Table 2 - Liabilities
ID PARTY AMOUNT
I want that SQL should look for each individual party from Table 1 and Calculate TOTAL PURCHASE and TOTAL PAYMENTS and then deduct TOTAL PAYMENTS from TOTAL PURCHASE so we get the remaining balance due for each party and then add the DIFFERENCE AMOUNT alongwith PARTY to the TABLE 2 so I can get the desired result like below
ID PARTY AMOUNT ------------------------- 1 A 1000 2 B 2000 3 C 0
I have a table that has for each shop a value that can change over time.For example
BK_POS 1 --> Segment A BK_POS 1 --> Segment /
What I would like to achieve is to get all distinct Shops (BK_POS) from the table above, but if for that specific pos a row exists where the segment = "/" then I do not want to take this BK_POS in my select query.More concrete, the for example above I do not want to select BK_POS 1 because he has one row where the segment = "/".
I have a table called Employees which has lots of columns but I only want to count some specific columns of this table.
i.e. EmployeeID: 001
week1: 40 week2: 24 week3: 24 week4: 39
This employee (001) has two weeks below 32. How do I use the COUNT statement to calculate that within these four weeks(columns), how many weeks(columns) have the value below 32?
I have this data as below. I need to find out the combination from the data and take a count of them
CREATE TABLE A ( nRef INT, nOrd INT, Token INT, nML INT, nNode INT, sSymbol VARCHAR(50), nMessageCode INT ) INSERT INTO A ( nReferenceNumber,nOrderNumber,nTokenNumber,nML,nNode,sSymbol,nMessageCode ) VALUES (1, 101, 1001,0,2,'SILVER',13073),
[code]....
if you can see, the rows with column nRefNo 1 and 3 are same i.e. with same combination of Symbol viz. Silver and Castorseed. How to get this combination together and then take count of them. Please note i will be dealing with more than 5 million rows.
Hi,I have an Access application with linked tables via ODBC to MSSQLserver 2000.Having a weird problem, probably something i've done while not beingaware of (kinda newbie).the last 20 records (and growing)of a specific table are locked - cantchange them - ("another user is editing these records ... ").I know for a fact that no one is editing records and yet no user canedit these last records in the MDB - including the administrator -while able to add new records.Administrator able to edit records in the ADP (mssql server) where thetables are stored.Please help, the application is renedred inert .Thanks for reading,Oren.
I have a CTE returning a recordset which contains a column SRC. SRC is a number which I use later to get counts and sums for the records in a distinct list.
declare@startdate date = '2014-04-01' declare@enddate date = '2014-05-01' ; with SM as ( SELECT --ROW_NUMBER() OVER (PARTITION BY u.SRC ORDER BY u.SRC) As Row, u.SRC,
[Code] ....
-- If Referral start date is between our requested dates
ref.Referral_Start_Date between @startdate and @enddate
OR
-- Include referrals which started before our requested date, but are still active during our date range.
(ref.Referral_Start_Date < @startdate and (ref.Referral_End_Date > @startdate OR ref.Referral_End_Date IS NULL )) ) INNER JOIN c_sdt s on s.Service_Delivery_Type_Id = u.Service_Delivery_Type_Id AND s.Service_Delivery_Unit_Id = 200 ) SELECT count(distinct (case SRC when 91 then client_number else 0 end)) As Eligable_91,
The query repeats the Header row value for all children associated with the header.I need the output of the query in XML format such that..For every Header element in the XML, all its children should come under that header element//I am using -
SELECT Cols FROM Table Names FOR XML PATH ('Header'), root('root') , ELEMENTS XSINIL
This still repeats the header for each detail (in the XML) , but I need all children for a header under it.I basically want my output in this format -
Im Working with stored procedure. How can i compare Columns with specific values. I want to get the greater values of those column and inserted it to other columns. i want something like these CASE WHEN a> b,c,d THEN a WHEN b> a,c,d THEN b WHEN c> a,b,d THEN c WHEN d> a,d,c THEN d
is there any ways to implement this? i got an error.. thanks please help..
I have a table which is updated daily using a MERGE statement. As records are insert, updated and deleted, I am saving the OUTPUT from the MERGE statement into a history table with a timestamp and action$ column appended to the record.
Using this history table, I'd like to rebuild the data based on specific past date. I was able to create a stored procedure that inspects each record in the history table and apply it to the data in a temp table. The stored procedure solution uses multiple queries to rebuild the data at a point in time. I was curious if there was an easier and more efficient solution using a table function.
I'm trying to create an email report which gives a result of multiple results from multiple databases in a table format bt I'm trying to find out if there is a simple format I can use.Here is what I've done so far but I'm having troble getting into html and also with the database column:
DOC_NO // REV_NO // FILE_NAME ABC123 // A // abc123.pdf ABC123 // B // abc123_2.docx ABC124 // A // abc124.xlsx ABC124 // A // - ABC125 // A // abc125.docx ABC125 // C // abc125.jpg ABC125 // C // abc125.docx ABC125 // C // - ABC126 // 0 // - ABC127 // A1 // abc127.xlsx ABC127 // A1 // abc127.pdf
I'm looking to select all rows where the DOC_NO and REV_NO appear only once.(i.e. the combination of the two values together, not any distinct value in a column)
I have written the sub query to filter the correct results;
SELECT DOC_NO, REV_NO FROM [MYTABLE] GROUP BY DOC_NO, REV_NO HAVING COUNT(*) =1
I now need to strip out the records which have no file (represented as "-" in the FILE_NAME field) and select the other fields (same table - for example, lets just say "ADD1", "ADD2" and "ADD3")
I was looking to put together a query like;
SELECT DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3 FROM [MYTABLE] WHERE FILE_NAME NOT LIKE '-' AND DOC_NO IN (SELECT DOC_NO, REV_NO FROM [MYTABLE] GROUP BY DOC_NO, REV_NO HAVING COUNT(*) =1)
But of course, DOC_NO alone being in the subquery select is not sufficient, as (ABC125 /A) is a unique combination, but (ABC125 /C) is not, but these results would be pulled in.
I also cannot simply add an additional "AND" clause on its own to make sure the REV_NO value appears in the subquery, because it is highly repetitive and would have to specifically match the DOC_NO)
What is the easiest way of ensuring that I only pull in the records where both the DOC_NO and REV_NO (combination) are unique, or is there a better way of putting this select together altogether?
I need to delete records from a table (Table1) which has a foreign key column in a related table (Table2).
Table1 columns are: table1Id; Name. Table2 columns include Table2.table1Id which is the foreign key to Table1.
What is the syntax to delete records from Table1 using Table1.Name='some name' and remove any records in Table2 that have Table2.table1Id equal to Table1.table1Id?
I have one table with many records in the table. Each time a record is entered the date the record was entered is also saved in the table. I need a query that will find all the missing records in the table. So if I have in my table:
ID Date Location 1 4/1/2015 bld1 2 4/2/2015 bld1 3 4/4/2015 bld1
I want to run a query like
Select Date, Location FROM [table] WHERE (Date Between '4/1/2015' and '4/4/2015') and (Location = bld1) WHERE Date not in (Select Date, Location FROM [table])