Transact SQL :: Comparing Records Between Two Tables
Sep 29, 2015
I need to compare records between two tables. There is no ID in the tables to do a simple join between them. So, what I'm looking for is: get the first record from table1 and read all record from table2 and give me back the most similar record. The String Distance is a predefined function.
Select a.table1
,b.table2
from table1 a, table2 b
where StringDistance (''a.table1,'b.table2') >90
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile). MATCHED: (email, firstname, mobile) But not MATCH: (lastname, phone) MATCHED: (email, firstname, lastname) But not MATCH: (phone, mobile)
I have a scenario to compare previous records based on each ID columns. For each ID, there would be few records, I have a column called "compare", We have to compare all Compare 1 records with Compare 0 Records. If Dt is lesser or equal to comparing DT, then show 0. Else 1
We always only one Compare 0 records in my table, so all compare 1 columns will compare with only one row per ID
My tables look like
Declare @tab1 table (ID Varchar(3), Dt Date, Compare Int) Insert Into @tab1 values ('101','2015-07-01',0) Insert Into @tab1 values ('101','2015-07-02',1) Insert Into @tab1 values ('101','2015-07-03',1) Insert Into @tab1 values ('101','2015-07-01',1) Insert Into @tab1 values ('101','2015-06-30',1)
Insert Into @tab1 values ('102','2015-07-01',0) Insert Into @tab1 values ('102','2015-07-02',1) Insert Into @tab1 values ('102','2015-07-01',1)
select * from @tab1
1.) In the above scenario for ID = '101', we have 5 records, first record has Compare value 0, which mean all other 4 records need to compare with this record only
2.) If Compare 1 record's Dt is less or equal to Compare 0's DT, then show 0 in next column
3.) If Compare 1 record's Dt is greater than Compare 0's DT, then show 1 in next column
I have a problem where I have 2 compare 2 records from the same table. This part looks easy but the problem is for a User there can be multiple records and I have 2 compare each record with its previous instance based on the timestamp. Not only I have to compare I have to perform some analysis. Below is the Table script and sample output.
Givens: All SQL Server 2008 or 2012 tools at your disposal.
Production database contains the following tables (simplified for example: constraints ignored, etc.) associated with a racing video game’s server.
-- A player of our game
-- Table greater than 10 million rows
CREATE TABLE [dbo].[User] ( [UserId] [bigint] NOT NULL ,[country] [int] NULL -- User’s home country ,[name] [nvarchar](15) NULL -- User’s displayable name (‘John’, ‘Bill’) ,[subscriptionTier] [int] NULL ) -- 0 == free, 1 == paid, for instance
Assume that rows get written into the event tables at a rate of 1,000 a minute,are never updated once written and currently are only read on a replica/reporting server.
Question Background: Write up a single query that would return the following: List of users and whose “TotalMoneyEarned” value ever grew (between logon events) at a rate of more than 1,000 per minute (we’d consider these suspicious and flag them for later investigation).
For instance, if the sample data were:
-- example of [Events.UserLogon] data -- not the query output we want
Event 1 is okay because there’s nothing to compare it against
Event 2 is okay because the TotalMoneyEarned only grew 500 in a minute
Event 3 should be flagged, as the value grew 1500 in a minute
Event 4 is okay, as it grew 7,000 in 8 minutes (< 1000 per minute)
Query Output (your query should return data in a format like this):
User Flagged Logon Time Rate Since Last Logon (money/minute) John 2010-10-16 00:21:56 1500 Dave 2010-10-16 00:30:50 3200 Bill 2010-10-16 00:35:23 1000
It is likely that you will need to create sample data for both the User and [Events.Logon] tables. We are looking for a single query that returns data like what is represented in Query Output.
I have two tables of book information. One that has descriptions of thebook in it, and the isbn, and the other that has the book title,inventory data, prices, the isbn.Because of some techncal constraints I won't get into now, I can'tcombine them both into one table. No problem. Things are going fine aslong as there is a description in the one table to corrispond to theisbn and other data in the other table.However, about half of the products are not yet entered into thedescrition table. I'd like to run a sql query that pulls up all theisbns that don't exist in the other. In other words, I'd like to get aquery that tells me exactly which isbns do not yet have descrition datain them. I know there is some sql that says to search from one filewhere the number does not exist in the other, but it slips my mind. Cansomeone help me on this please?Thank you!Bill*** Sent via Developersdex http://www.developersdex.com ***Don't just participate in USENET...get rewarded for it!
I need to check Table1 by Table2 only on NOT NULL cells and if all of them in the row match do not return that row as the result. In this case it will be:
SELECT a.Server, a.Databases, a.Users, a.Names FROM Table1 EXCEPT SELECT ISNULL(b.Server,c.Server), ISNULL(b.Databases,c.Databases), ISNULL(b.Users,c.Users), ISNULL(b.Names,c.Names) FROM Table2 AS a, Table1 AS c
But for many rows (like 100 000) it takes ages to get results, any better way to work on this?
i have 3 tables names parent, child1, child2 parent has 1 record, child1 has 2 record and child 3 has 3 records the script
select Parent.*,child1.f1,child2.f2 from child1 inner join Parent on parent.id =child1.id inner join child2 on child1.id =child2.id
running above query gives me sixes rows but i want only all rows of childs but not their Cartesian products
Object: Table [dbo].[Parent] Script Date: 06/18/2015 17:33:02 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Parent]( [id] [int] NOT NULL,
Lets say we are executing this query below to retrieve each customer and the amount associated to a table
"INSERT INTO tblReceiptDue (Dealer, Amount) SELECT CustomerName, SUM(CASE WHEN VoucherType = 'Sales' then Outbound ELSE - Inbound END) AS AMOUNT from tblSaleStatementCustomer WHERE CustomerType = 'Dealer' GROUP BY CustomerName"
Table 3: ID RoleID Time 1 1 09:14 2 1 09:15 1 2 09:16
Now I want all the records which belongs to RoleID 1 but if same ID is belongs to RoleID 2 than i don't want that ID.From above tables ID 1 belongs to RoleID 1 and 2 so i don't want ID 1.
Hey Guys, I have a contacts table that contains ID, First Name, Last Name, and Phone Number, Date Entered, Changed. Every time, the data is modified and saved, it will insert a new record in the table. So, Ill create a new record for a contact named Ryan, and then come back a day later and update the last name and phone number. So theSQL table would look like...1 Ryan Scott 818-550-0000 05/08/2008 Null2 Ryan Peters 000-000-0000 05/09/2008 Null How do I write a sql query that will run an update after the insert of the second record to fill in the Changed field with the data that changed?So I want to have record 2, end up looking like this... 2 Ryan Peters 000-000-0000 05/09/2008 LastName,PhoneNumberAny ideas?
Thanks for your help... I have two databases in two different servers, I am running this script which shows customers not in the second server. I am getting an error shown below. any idea of how to solve this issue. Ali
CREATE view v_show_customers_not_in_GP as select customer_id,company_name,contact_fname,contact_lna me,phone,alt_phone,fax,email,street_1,street_2,cit y,c.name,s.code as state,zip_code FROM customer v,country c, state s WHERE v.country_id =c.country_id and v.state_id = s.state_id and convert(char(15),customer_id ) NOT IN (select custnmbr from servername.dbname.dbo.RM00101 )
Server: Msg 18452, Level 14, State 1, Line 1 Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
I have a date comparison situation in which I will have a column with a date and will have another value containing a GETDATE() minus two weeks. I want to be able to compare both dates and get the MIN date between the two. I'm not sure how to use the MIN(Date) in this scenario since the comparison won't be between two different columns, but between one column and a random date generated by the GETDATE() minus two weeks.
I'm trying to compare about 28 million records (270 length) from table A and B using the Lookup task as described in this forum. The process works fine with about two million records or so on my desktop ( p.4 3.39GHz, 1.5 GB Ram), but hangs with the amount of data I'm trying to process. I tried using full and partial caching, but to no avail. I'm thinking this is a hardware resource problem. So, does anyone has any recommendation on the hardware needed for this kind of operation and/or suggestion? Thanks in advance...
Below is the code for two data sets and I can't seem to get my head around the issue. I need to find the number of 'ER' visits and 'IN' visits, separately, in dbo.VisitData for the 'Active' patients in dbo.PatientStatus. So, consider patient 69. He is Active on 5/5/2014 but becomes Inactive on 9/15/2014. I only want to count the number of visits ER or IN that are between those dates. In addition if patient 69 becomes active again after 9/15/2014, I need to capture that data as well. Patients can change there status multiple times.
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: Legacy system that has (well let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fairly decent amount of garbage data to be inserted in some tables. Pulling one of the [Individuals] table from within this Legacy system and inserting it into a production system, into the Table schema currently in place to track [Individuals] in this Production system.
Problem: Inserting the information is easy, how to deduplicate the records that exist within the staging table that the legacy [Individuals] table has been dumped into in production, prior to insertion. (Wanting to do this programmatically with SQL or SSIS preferably, so that I can alter it later to allow for updating existing/inserting new)
Staging Table Schema:
; CREATE TABLE [dbo].[stage_Individuals]( [SysID] [int] NULL, --Unique, though it's not an index intended to identify the [Individuals] [JJISID] [nvarchar](10) NULL, [NameLast] [nvarchar](30) NULL, [NameFirst] [nvarchar](30) NULL, [NameMiddle] [nvarchar](30) NULL,
[code]....
Scenario: There are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SYSID is just a Clustered Index (I'm assuming) within the Legacy system and will be most likely dropped when inserted into the Production [Inviduals] table. There are records that are missing their JJISID, though this isn't supposed to happen either, but have valid information within SSN/DOB/Name/etc that can be merged into the correct record that has a JJISID assigned. There is really no data conformity, some records have NULLS for everything except JJISID, or some records will have all the [Individuals] information excluding the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have other's that partition by Name/DOB/etc and will adapt whatever I come up with to be used for those as well):
; select j.* from (select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum, stage_Individuals.*, COUNT(*) OVER (partition by jjisid) as cnt from stage_Individuals) as j where cnt > 1 and j.JJISID is not nullNow, with SQL Server 2012 or later I could use LAG and LEAD w/ the RowNum value to do my data manipulation...but that won't work because we are on SQL Server 2008 in this environment.
[URL]
With, the following as a potential solution:
GSquared (3/16/2010)Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine tune that. You'll need to look at various method for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
; WITH CTE AS (SELECT master_id, MIN(ID) AS first_id, MAX(Account_Expiry) AS latest_expiry FROM #People GROUP BY master_id) SELECT P1.master_id,
[code].....
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically what I'm looking for, is to update any column with a NULL value to the corresponding Duplicate [Individuals] record value for that column.
**EDIT - Example, Record 1 has a JJISID with NULL NameFirst & NameLast BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I'm wanting to propogate the NameFirst & NameLast from Record2 into Record1
Hi , I have three tables T1 , T2 AND T3. T3 is having fields as a combination of T1 and T2 fields.How can I compare T1 and T2 field values with T3 FIELD VALUES.
I have 2 seperate databases and I need to check for rows that are different from each other in a table.
I used access to link the tables in a database and am using queries to check the tables. However, I am having trouble formulating the SQL. What I want to do is not just check for the ID field to see if it exists, but to make sure the whole row exists. How can I form an SQL statement for this?
I tried something like:
Select * from table1 where Column1 NOT IN (Select Column1 from Table2) AND Column2 NOT IN (Select Column2 from Table2). However, I do not think this is correct. I want to make sure that the rows are compared, not individual values.
I've got two identical tables (except for the names) and I need to run a query that outputs the rows that aren't in the primary table.
To clarify table one has 38,450 records and table two has 30,703. I need to output the records that are in table one but not table two.
Everything I've tried keeps returning the records that are in it. They have a ID as primary key and a userid field which is what I want to be able to list.
How can I select records that have changed or are new when comparing a previous copy of a table with the live version of the table? There is no datetime stamp in these tables. Many thanks in advance.
I need to compare two tables and update the records that have changed. The following code works great for that: SELECT MIN(TableName) as TableName, ID, COL1, COL2, COL3 ... FROM SELECT 'Table A' as TableName, A.ID, A.COL1, A.COL2, A.COL3, ... FROM A UNION ALL SELECT 'Table B' as TableName, B.ID, B.COL1, B.COl2, B.COL3, ... FROM B ) tmp GROUP BY ID, COL1, COL2, COL3 ... HAVING COUNT(*) = 1 ORDER BY ID
However, I also need to compare the two tables and see if there are any new records or deleted records. What would those queries look like? Can they all somehow be combined into one query?
DECLARE @Teams AS TABLE(Team VARCHAR(3)) INSERT INTO @Teams SELECT 'IND' UNION SELECT 'SA' UNION SELECT 'AUS' select Team from @Teams where Team > 'AUS'
[code]....
co-relation between comparison operators in WHERE Clause and the respective output.
I am trying to pull the records which are being affected i.e, comparing the values and if not same then populating the record.
I am using the below condition in where clause however i am getting runt time error as "Conversion failure when converting date and/or time from character string"
Condition in Where clause:
cast(isNull(tab1.Col1,'') as datetime) <> cast(isNull(tab2.col1,'') as datetime)
im currently doin my proj on asp.net, so i need some help on this.
i need to get a special price which muz be constant throughout all the pages which requires to show the product and the special price.
i haf 2 table which are products and promotion. and in products there is a column name price,productid, etc and in promotion, it displays the productid of the product which is having promotion and also a column name specialPrice.
i need to get the special price from the promotion tables.
so how do i go about retrieving wat i need for my databind specialprice.
ive tried using join, but there is cartesian, and i tried to use in, but its not constant throuh all the pages.
I am working with the article that MAK wrote on SecurityLogs http://www.databasejournal.com/features/mssql/article.php/3515886
I have completed this, but I have made some changes to the database (for normalization to 3NF purposes). I now have problems with a query.
I am trying to "Insert a new record in a table if it does not already exist in the table". To try to clarify I perform the following query:
INSERT INTO Tmp_Event SELECT DISTINCT EventID, EventType, EventTypeName from Tmp
Which gives me the Tmp_Event table consisting of EventID's etc. (no duplicates). What I then want to do, is compare the 'Tmp_event' table and an already existing 'Event' table. These two tables are in fact identical. I would like to insert any records from 'Tmp_Event' into 'Event' if they do not already exist in 'Event'.
This query gives me all records that do not exist in 'Event'
SELECT EventID, EventType, EventTYpeName from Tmp_Event WHERE EventID NOT IN (SELECT EventID from Event)
How can I change this query into performing an INSERT INTO Event as well?
I have two tables that share (supposedly) 2 fields (PartID and RaceID) and those two tables should be identical as far as those two fields are concerned. That is, there should be the same number of rows in both tables and if listed in the same sort order in reference to these two fields, they should be identical. The problem is, they are not. There are in excess of 3000 records in each field and I need to write a query that will allow me to compare them row-by-row.
I am using sql server 2000 and I am (kind of)familiar with the SQL Query Analyzer. What I really need to know is how to write the select statement that will allow me to compare the two tables line-by-line to find the discrepancies or, better yet, simply show the discrepancies so I can focus on them.
I would like to compare data across two tables. I have partinformation in a table. I get a new set of information periodically.I would like to compare my new info to my old info. I recognize thatdoing a compare of every attribute of every part will take FOREVER. Isthere some way I can do a "diff" based on the columns that I careabout?Thanks!--gloria
Hi,I'm trying to figure out a way to compare two tables, table one has moreentries than table two, I want SQL to compare table one to table two andspit out and XLS of the enries that exist in table one, but not in tabletwo.So far I can't even get my query right...hehselect * from table1 aleft join tabl2 b on a.column=b.columnwhere a.column exists not b.columnam I missing an "in" in the select portion on my query?thanks alot for any help.