Comparing Data In Two Tables To Find Missing Records
Jul 20, 2005
I have two tables of book information. One has the description of the
book and the ISBN; the other has the book title, inventory data,
prices, and the ISBN.
Because of some technical constraints I won't get into now, I can't
combine them into one table. No problem. Things go fine as long as
there is a description in the one table to correspond to the ISBN and
other data in the other table.
However, about half of the products are not yet entered into the
description table. I'd like to run a SQL query that pulls up all the
ISBNs that exist in one table but not in the other. In other words, I'd
like a query that tells me exactly which ISBNs do not yet have
description data. I know there is some SQL that selects from one table
where the number does not exist in the other, but it slips my mind. Can
someone help me with this, please?
Thank you!
Bill
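A minimal sketch of the "not in the other table" query Bill describes, using NOT EXISTS. It runs on SQLite through Python's sqlite3 module only so that it is self-contained; the table names (inventory, descriptions), the columns and the sample rows are assumptions, not the poster's schema. The SELECT itself is plain SQL and should carry over to SQL Server.

```python
import sqlite3

# In-memory database with two hypothetical tables keyed by ISBN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (isbn TEXT PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE descriptions (isbn TEXT PRIMARY KEY, description TEXT);
    INSERT INTO inventory VALUES
        ('111', 'Book A', 9.99), ('222', 'Book B', 14.50), ('333', 'Book C', 20.00);
    INSERT INTO descriptions VALUES ('111', 'About book A');
""")

# Anti-join: keep inventory rows for which no description row exists.
missing_isbns = [row[0] for row in conn.execute("""
    SELECT i.isbn
    FROM inventory AS i
    WHERE NOT EXISTS (SELECT 1 FROM descriptions AS d WHERE d.isbn = i.isbn)
    ORDER BY i.isbn
""")]
print(missing_isbns)
```

A LEFT JOIN with WHERE d.isbn IS NULL is an equivalent way to write the same anti-join.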
I'm wondering if it is possible to write a procedure that checks two identical tables for any missing records. The table design is exactly the same, but some records (of the 40,000) have not copied over to the second table.
Below is the code for two data sets and I can't seem to get my head around the issue. I need to find the number of 'ER' visits and 'IN' visits, separately, in dbo.VisitData for the 'Active' patients in dbo.PatientStatus. So, consider patient 69. He is Active on 5/5/2014 but becomes Inactive on 9/15/2014. I only want to count the ER or IN visits that fall between those dates. In addition, if patient 69 becomes active again after 9/15/2014, I need to capture that data as well. Patients can change their status multiple times.
I have 2 tables, say table1 and table2, with the same structure. Each record is identified by a field 'SerialNo'. Now there should be a total of 500000 records across both tables, with SerialNo from 1 to 500000. Either a record is in table1 or table2. I want to find records (or SerialNo's) that are in neither table (if deleted by accident etc). What would be the SQL query? I'm using SQL 6.5. Thx
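One way to sketch this: build the list of expected serial numbers, then keep the ones that match neither table. The example below runs on SQLite through Python's sqlite3 module and generates the numbers with a recursive CTE; the sample rows and the small upper bound of 10 are assumptions for illustration. SQL Server 6.5 predates CTEs, so there a pre-filled numbers table would have to stand in for the expected list, with the same two LEFT JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (SerialNo INTEGER PRIMARY KEY);
    CREATE TABLE table2 (SerialNo INTEGER PRIMARY KEY);
    INSERT INTO table1 VALUES (1), (2), (5), (8);
    INSERT INTO table2 VALUES (3), (6), (10);
""")

MAX_SERIAL = 10  # 500000 in the poster's case; kept small here

# Generate 1..MAX_SERIAL, then keep numbers found in neither table.
missing_serials = [row[0] for row in conn.execute("""
    WITH RECURSIVE expected(SerialNo) AS (
        SELECT 1
        UNION ALL
        SELECT SerialNo + 1 FROM expected WHERE SerialNo < ?
    )
    SELECT e.SerialNo
    FROM expected AS e
    LEFT JOIN table1 AS t1 ON t1.SerialNo = e.SerialNo
    LEFT JOIN table2 AS t2 ON t2.SerialNo = e.SerialNo
    WHERE t1.SerialNo IS NULL AND t2.SerialNo IS NULL
    ORDER BY e.SerialNo
""", (MAX_SERIAL,))]
print(missing_serials)
```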
I need to compare records between two tables. There is no ID in the tables to do a simple join between them. So, what I'm looking for is: get the first record from table1, read all records from table2, and give me back the most similar record. StringDistance is a predefined function.
Select a.table1, b.table2 from table1 a, table2 b where StringDistance(a.table1, b.table2) > 90
As you can see, the records are at 30-minute intervals. I need to create a query to find out whether there are missing records in the table. So basically the result should be this:
We are trying to find the differences between the tables in the CUSTOMER database and the CUSTOMER_coded database. The goal is to find out whether any columns are missing from each table of the CUSTOMER_coded database.
We need the list of tables in the CUSTOMER_coded database that are missing some columns compared to their peers in the CUSTOMER database (and the list of missing columns).
I googled, but all I can find is how to list all the columns in the tables of a database.
I need the missing columns for all the tables when we compare these 2 databases (CUSTOMER and CUSTOMER_coded).
I have received some data out of a relational database that is incomplete and I need to find where the holes are. Essentially, I have three tables. One table has a primary key of PID. The other two tables have PID as a foreign key. Each table should have at least one instance of every available PID.
I need to find out which ones are in the second and third table that do not show up in the first one, which ones are in the first and third but not in the second, and which ones are in the first and second but not in the third.
I've come up with quite a few ways of working it but they all involve multiple union statements (or dumping to temp tables) that are joining back to the original tables and then unioning and sorting the results. It just seems like there should be a clean elegant way to do this.
Here is an example:
create table TBL1(PID int, info1 varchar(10) )
Create table TBL2(TID int,PID int)
Create table TBL3(XID int,PID int)
insert into TBL1
select '1','Someone' union all
select '2','Will ' union all
select '4','Have' union all
select '7','An' union all
select '8','Answer' union all
select '9','ForMe'
insert into TBL2
select '1','1' union all
select '2','1' union all
select '3','8' union all
select '4','2' union all
select '5','3' union all
select '6','3' union all
select '7','5' union all
select '8','9'
insert into TBL3
select '1','10' union all
select '2','10' union all
select '3','8' union all
select '4','6' union all
select '5','7' union all
select '6','3' union all
select '7','5' union all
select '8','9'
I need to find the PID and the table it is missing from. So the results should look like:
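A sketch of one way to avoid the pile of per-table unions: tag every PID with the table it came from in a single union, cross join the distinct PIDs with the distinct table names, and subtract the combinations that actually exist. It uses the sample data from the post and runs on SQLite through Python's sqlite3 module. It assumes that "missing" means any PID seen in at least one table and absent from another, so a PID present in only one table is reported twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL1 (PID INT, info1 VARCHAR(10));
    CREATE TABLE TBL2 (TID INT, PID INT);
    CREATE TABLE TBL3 (XID INT, PID INT);
    INSERT INTO TBL1 VALUES (1,'Someone'),(2,'Will'),(4,'Have'),
                            (7,'An'),(8,'Answer'),(9,'ForMe');
    INSERT INTO TBL2 VALUES (1,1),(2,1),(3,8),(4,2),(5,3),(6,3),(7,5),(8,9);
    INSERT INTO TBL3 VALUES (1,10),(2,10),(3,8),(4,6),(5,7),(6,3),(7,5),(8,9);
""")

# present: every (PID, table) pair that exists.
# pids x tbls: every pair that should exist; EXCEPT leaves the gaps.
missing = conn.execute("""
    WITH present AS (
        SELECT PID, 'TBL1' AS tbl FROM TBL1
        UNION SELECT PID, 'TBL2' FROM TBL2
        UNION SELECT PID, 'TBL3' FROM TBL3
    ),
    pids AS (SELECT DISTINCT PID FROM present),
    tbls AS (SELECT DISTINCT tbl FROM present)
    SELECT p.PID, t.tbl FROM pids AS p CROSS JOIN tbls AS t
    EXCEPT
    SELECT PID, tbl FROM present
    ORDER BY 1, 2
""").fetchall()

for pid, tbl in missing:
    print(pid, "is missing from", tbl)
```

PIDs 8 and 9 appear in all three tables, so they do not show up in the output.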
I have a SQL Server 2005 Express table with an automatically incremented primary key field. I use a DetailsView to insert new records, and on the DetailsView ItemInserted event I send out automated notification emails. I then received two automated emails (indicating two records had been inserted), but looking at the database, the records are not there. What's confusing me is that the table's primary key field had even been incremented by two, an indication that the two records should actually be in the table. Recovering these records is not a big deal because I can re-enter them, but I am wondering what the possible cause is. How come the id field was incremented and the records are not there, when I am 100% sure no one deleted them? I am the only one who can delete a record. And how come the new records I insert now are all there in the database, but with the two id numbers for those missing records skipped? It's not crucial data, but for my own learning I'd like to understand why it happened, because next time it might be costly.
Here is an issue that has me stumped for the past few days. I have a table called MerchTran. Among various columns, the relevant columns for this issue are:
FileDate datetime, SourceTable varchar(25)
SQL statement:
SELECT DISTINCT FileDate, SourceTable
FROM MerchTran
ORDER BY FileDate, SourceTable
Data looks like this:
FileDate                 DataSource
-----------------------------------
2005-02-13 00:00:00.000  S1
2005-02-13 00:00:00.000  S2
2005-02-13 00:00:00.000  S3
2005-02-14 00:00:00.000  S1
2005-02-14 00:00:00.000  S2
2005-02-14 00:00:00.000  S3
2005-02-15 00:00:00.000  S2
2005-02-15 00:00:00.000  S3
2005-02-16 00:00:00.000  S1
2005-02-16 00:00:00.000  S2
2005-02-16 00:00:00.000  S3
2005-02-17 00:00:00.000  S1
2005-02-17 00:00:00.000  S2
2005-02-18 00:00:00.000  S1
2005-02-18 00:00:00.000  S2
2005-02-18 00:00:00.000  S3
2005-02-19 00:00:00.000  S1
2005-02-19 00:00:00.000  S3
We run a daily process that inserts data into this table every day for all 3 sources S1, S2, S3.
Notice how some data is missing, indicating the import process for that particular source failed.
Example: Missing record
2005-02-15 00:00:00.000  S1
2005-02-17 00:00:00.000  S3
2005-02-19 00:00:00.000  S2
Can someone please help me with a SQL statement that will return the 3 missing records as above?
Thanks in advance for all your help!
DBA in distress!
Vishal
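A sketch of one way to list the missing date and source pairs: cross join the distinct dates with the distinct sources to get every pair that should exist, then keep the pairs with no matching row. It loads the data from the post into SQLite through Python's sqlite3 module, with dates shortened to plain text as an assumption. It also assumes every date has at least one source loaded; a day on which all three imports failed would need a separate calendar table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MerchTran (FileDate TEXT, SourceTable VARCHAR(25))")

# Sources loaded per day, copied from the post.
loaded = {
    "2005-02-13": ["S1", "S2", "S3"], "2005-02-14": ["S1", "S2", "S3"],
    "2005-02-15": ["S2", "S3"],       "2005-02-16": ["S1", "S2", "S3"],
    "2005-02-17": ["S1", "S2"],       "2005-02-18": ["S1", "S2", "S3"],
    "2005-02-19": ["S1", "S3"],
}
conn.executemany(
    "INSERT INTO MerchTran VALUES (?, ?)",
    [(day, src) for day, sources in loaded.items() for src in sources],
)

# Every (date, source) pair that should exist, minus the pairs that do.
missing_loads = conn.execute("""
    SELECT d.FileDate, s.SourceTable
    FROM (SELECT DISTINCT FileDate FROM MerchTran) AS d
    CROSS JOIN (SELECT DISTINCT SourceTable FROM MerchTran) AS s
    LEFT JOIN MerchTran AS m
           ON m.FileDate = d.FileDate AND m.SourceTable = s.SourceTable
    WHERE m.FileDate IS NULL
    ORDER BY d.FileDate, s.SourceTable
""").fetchall()
print(missing_loads)
```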
I would like to compare data across two tables. I have part information in a table. I get a new set of information periodically. I would like to compare my new info to my old info. I recognize that doing a compare of every attribute of every part will take FOREVER. Is there some way I can do a "diff" based on the columns that I care about?
Thanks!
--gloria
I am looking for an efficient mechanism to compare data between 2 tables/views.
Rationale: I use a proprietary tool to transfer data between 2 databases. The tool itself uses Microsoft SSIS (Integration Services) to transfer data. I want to make sure that the data is transferred properly. Neither the source nor the target database is a live database. The source and target databases are on separate servers.
I have recently converted my DTS packages to SSIS and deployed them to the new server. I have the 2000 and 2005 servers running concurrently; all that is left for me to do is compare the tables generated by the DTS and SSIS packages to see if they are the same.
How do I go about comparing the tables, which are from two different servers using SQL server 2005?
It seems that there should be a solution for my situation, but for the life of me I can't seem to figure it out.
I need to compare two "like" tables containing similar data. Tbl 1 is "BOOKED" (which is a snapshot of inventory) and tbl 2 is "CURRENT" (the live, working inventory table). If I write my query as follows, the result contains "duplicate" data.
SELECT booked.item, booked.bin, booked.quantity, current.bin, current.quantity
FROM BOOKED LEFT JOIN CURRENT ON booked.item = current.item
No matter what type of join I use, there is duplicate data displayed for each table. For example, if there are more bins in the BOOKED table that contain a certain product, then the CURRENT table's data is repeated, and vice versa.
As follows:
Item Bin Quantity Bin Quantity
12345 A01 500 A01 7680
12345 B01 6 A01 7680
12345 C01 20 A01 7680
54321 G10 1032 E15 1163
54321 G10 1032 F20 523
54321 G10 1032 H30 750
98765 Z20 7000 Z20 8500
98765 Y15 2500 Y15 3000
98765 X10 1200 Y15 3000
What I would like to do is display Bin and Quantity only once and the repeating values as NULL or [BLANK]. Or, to display all of the bins from both tables and only the quantities from each table in relation to the bin found in that table, returning a "0" if no quantity exists.
This is what I'm after:
Item Bin Quantity Bin Quantity
12345 A01 500 A01 7680
12345 B01 6 B01 0
12345 C01 20 C01 0
54321 G10 1032 E15 1163
54321 F20 0 F20 523
54321 H30 0 H30 750
98765 Z20 7000 Z20 8500
98765 Y15 2500 Y15 3000
98765 X10 1200 X10 0
Is this possible? If so, how?
I also might add that it is ok for each table to contain multiple entries for any given item. This is basically being requested as an inventory variance report (inventory before the physical count and immediately after the physical count) and will only be run once a year.
----------------------------------------------- Just thinking out loud here: What if I created three subqueries, the first containing only BOOKED information, the second containing only CURRENT information and the third being a UNION of both tables? Something like this:
SELECT q3.bin, q1.item, ISNULL(q1.quantity, 0) as QTY_BEFORE, ISNULL(q2.quantity, 0) as QTY_AFTER
FROM
(select item, bin, quantity from BOOKED) q1 Left Join
(select item, bin, quantity from CURRENT) q2 on q1.item = q2.item Left Join
(select bin, item from BOOKED UNION select bin, item from CURRENT) q3 on q1.item = q3.item
Order By q1.item
I don't know if I wrote the UNION statement correctly, but I will have to try this when I get back to work...
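A sketch along the lines of that UNION idea, with one change: the union of (item, bin) pairs drives the query, and each table is left-joined to it on both item and bin. Joining on item alone is what produces the repeated rows. It runs on SQLite through Python's sqlite3 module with the sample data from the post. The live table is named current_inv here (a hypothetical name) because CURRENT is a reserved word in several SQL dialects, and the sketch assumes at most one row per item and bin in each table; with several rows per bin, the quantities would need a SUM first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booked (item TEXT, bin TEXT, quantity INT);
    CREATE TABLE current_inv (item TEXT, bin TEXT, quantity INT);
    INSERT INTO booked VALUES
        ('12345','A01',500), ('12345','B01',6), ('12345','C01',20),
        ('54321','G10',1032),
        ('98765','Z20',7000), ('98765','Y15',2500), ('98765','X10',1200);
    INSERT INTO current_inv VALUES
        ('12345','A01',7680),
        ('54321','E15',1163), ('54321','F20',523), ('54321','H30',750),
        ('98765','Z20',8500), ('98765','Y15',3000);
""")

# keys: every (item, bin) seen in either table; quantities default to 0.
variance = conn.execute("""
    WITH keys AS (
        SELECT item, bin FROM booked
        UNION
        SELECT item, bin FROM current_inv
    )
    SELECT k.item, k.bin,
           COALESCE(b.quantity, 0) AS qty_before,
           COALESCE(c.quantity, 0) AS qty_after
    FROM keys AS k
    LEFT JOIN booked AS b ON b.item = k.item AND b.bin = k.bin
    LEFT JOIN current_inv AS c ON c.item = k.item AND c.bin = k.bin
    ORDER BY k.item, k.bin
""").fetchall()

for row in variance:
    print(row)
```

Each bin appears once per item, so this differs slightly from the layout requested above: G10 and E15 for item 54321 come out as two rows, each with a zero on one side. On SQL Server, ISNULL can replace COALESCE, and a FULL OUTER JOIN on item and bin is an alternative to the union of keys.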
Hey guys, I have a contacts table that contains ID, First Name, Last Name, Phone Number, Date Entered, and Changed. Every time the data is modified and saved, a new record is inserted in the table. So, I'll create a new record for a contact named Ryan, and then come back a day later and update the last name and phone number. The SQL table would then look like this:
1 Ryan Scott 818-550-0000 05/08/2008 Null
2 Ryan Peters 000-000-0000 05/09/2008 Null
How do I write a SQL query that will run an update after the insert of the second record to fill in the Changed field with the names of the fields that changed? So I want record 2 to end up looking like this:
2 Ryan Peters 000-000-0000 05/09/2008 LastName,PhoneNumber
Any ideas?
Thanks for your help... I have two databases on two different servers. I am running this script, which shows customers that are not in the second server, and I am getting the error shown below. Any idea how to solve this issue?
Ali
CREATE view v_show_customers_not_in_GP as
select customer_id, company_name, contact_fname, contact_lname, phone, alt_phone, fax, email, street_1, street_2, city, c.name, s.code as state, zip_code
FROM customer v, country c, state s
WHERE v.country_id = c.country_id and v.state_id = s.state_id
and convert(char(15), customer_id) NOT IN (select custnmbr from servername.dbname.dbo.RM00101)
Server: Msg 18452, Level 14, State 1, Line 1 Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
I'm trying to compare about 28 million records (270 length) from table A and B using the Lookup task as described in this forum. The process works fine with about two million records or so on my desktop (P4 3.39GHz, 1.5 GB RAM), but hangs with the amount of data I'm trying to process. I tried using full and partial caching, but to no avail. I'm thinking this is a hardware resource problem. So, does anyone have any recommendations on the hardware needed for this kind of operation, and/or other suggestions? Thanks in advance...
I'm currently setting up a Tabular Model to do some research across several fact tables. In this example I have two fact tables (table 1 and table 2) between which I've created a 1 to 1 relationship on phone number. Typically I create a relationship between these tables to find common data between the two. However, in this case I am trying to figure out the best way to model the data so that I can easily surface data from one table that does not exist in the other. I would liken this to a LEFT JOIN or a WHERE NOT EXISTS in SQL.
Table 1 has all of the data and Table 2 Only has a subset of the data from Table 1. What I'm trying to do here is display what attributes in Table 1 may play a part in records not existing in Table 2. What is the best way to model this?
I have two databases that exist on two separate servers. The two databases have the same schema and contain tables and columns of the same name. Some tables have slight differences in terms of data types or data type length.
For example, if a table on ServerA has a column named CustomerSale with Varchar (100, Null) and a table on ServerB has a column named CustomerSale with Varchar (60, Null), how can I find out whether other columns have similar differences, across all tables with the same name and columns on the two servers?
I am using SQL Server 2005, and the two servers are linked servers.
What script can I use to accomplish this task? Thanks
We have an asp.net app with about 200 data entry forms. Customers may enter data into any number of forms. Each form's data is persisted in a corresponding SQL table. When data entry is complete, it needs to be processed. Here's where the questions start.
How can we easily determine in which tables a customer has data, and how best to select that data?
We're not opposed to putting all the data in a single table. This table would wind up having ~15 million records and constantly have CRUD operations performed against it by up to 5000 users simultaneously. With sufficient hardware, is this too much to ask of the db?
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile).
MATCHED: (email, firstname, mobile) but not MATCH: (lastname, phone)
MATCHED: (email, firstname, lastname) but not MATCH: (phone, mobile)
I have a scenario where I need to compare records within each ID. For each ID there are a few records, and I have a column called "Compare". We have to compare all Compare 1 records with the Compare 0 record. If Dt is less than or equal to the Compare 0 record's Dt, then show 0; otherwise show 1.
There is always exactly one Compare 0 record per ID in my table, so all Compare 1 rows are compared with only one row per ID.
My tables look like
Declare @tab1 table (ID Varchar(3), Dt Date, Compare Int)
Insert Into @tab1 values ('101','2015-07-01',0)
Insert Into @tab1 values ('101','2015-07-02',1)
Insert Into @tab1 values ('101','2015-07-03',1)
Insert Into @tab1 values ('101','2015-07-01',1)
Insert Into @tab1 values ('101','2015-06-30',1)
Insert Into @tab1 values ('102','2015-07-01',0)
Insert Into @tab1 values ('102','2015-07-02',1)
Insert Into @tab1 values ('102','2015-07-01',1)
select * from @tab1
1.) In the above scenario for ID = '101', we have 5 records, first record has Compare value 0, which mean all other 4 records need to compare with this record only
2.) If Compare 1 record's Dt is less or equal to Compare 0's DT, then show 0 in next column
3.) If Compare 1 record's Dt is greater than Compare 0's DT, then show 1 in next column
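The three rules above can be sketched as a self-join: each Compare 1 row is joined to the single Compare 0 row with the same ID, and a CASE expression compares the two dates. The example loads the sample rows from the post into SQLite through Python's sqlite3 module; the table name tab1 (in place of the @tab1 table variable) and the output column name Flag are assumptions. Dates are stored as ISO text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (ID VARCHAR(3), Dt TEXT, Compare INT);
    INSERT INTO tab1 VALUES
        ('101','2015-07-01',0), ('101','2015-07-02',1), ('101','2015-07-03',1),
        ('101','2015-07-01',1), ('101','2015-06-30',1),
        ('102','2015-07-01',0), ('102','2015-07-02',1), ('102','2015-07-01',1);
""")

# base is the one Compare = 0 row per ID; t is each Compare = 1 row.
flags = conn.execute("""
    SELECT t.ID, t.Dt,
           CASE WHEN t.Dt <= base.Dt THEN 0 ELSE 1 END AS Flag
    FROM tab1 AS t
    JOIN tab1 AS base ON base.ID = t.ID AND base.Compare = 0
    WHERE t.Compare = 1
    ORDER BY t.ID, t.Dt
""").fetchall()

for row in flags:
    print(row)
```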
I have a db that receives real-time, minute-by-minute data every day, but sometimes some of those dates are not written to the db, and I want to know which dates are missing. How can I do it?
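A sketch of one approach: generate every expected minute between the earliest and latest stored timestamps, then keep the ones with no matching row. It runs on SQLite through Python's sqlite3 module; the table name readings, the column ts and the sample timestamps are assumptions, and timestamps are assumed to be stored as text on exact minute boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (ts TEXT PRIMARY KEY);
    INSERT INTO readings VALUES
        ('2024-01-01 00:00:00'), ('2024-01-01 00:01:00'),
        ('2024-01-01 00:03:00'), ('2024-01-01 00:04:00'),
        ('2024-01-01 00:06:00');
""")

# expected: one row per minute from the first to the last stored timestamp.
missing_minutes = [row[0] for row in conn.execute("""
    WITH RECURSIVE
    bounds AS (SELECT MIN(ts) AS lo, MAX(ts) AS hi FROM readings),
    expected(ts) AS (
        SELECT lo FROM bounds
        UNION ALL
        SELECT datetime(expected.ts, '+1 minute')
        FROM expected, bounds
        WHERE expected.ts < bounds.hi
    )
    SELECT e.ts
    FROM expected AS e
    LEFT JOIN readings AS r ON r.ts = e.ts
    WHERE r.ts IS NULL
    ORDER BY e.ts
""")]
print(missing_minutes)
```

On SQL Server the same shape should work with DATEADD(minute, 1, ts) in the recursive step, though long ranges may need the recursion limit raised or a numbers table instead.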
I want to create a view to get records from multiple tables. I have a UserID in all the tables. When I pass a UserID to the view, it should get the records from multiple tables. I have a table
UserInfo with data such as UserID=1, FName = John, LName=Abraham and Industry = 2. I have an Industry table with data such as ID=1 and Name= Sports, ID =2 and Name= Film.
When I query view where UserID=1 it should return record as
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: Legacy system that has (well let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fairly decent amount of garbage data to be inserted in some tables. Pulling one of the [Individuals] table from within this Legacy system and inserting it into a production system, into the Table schema currently in place to track [Individuals] in this Production system.
Problem: Inserting the information is easy, how to deduplicate the records that exist within the staging table that the legacy [Individuals] table has been dumped into in production, prior to insertion. (Wanting to do this programmatically with SQL or SSIS preferably, so that I can alter it later to allow for updating existing/inserting new)
Staging Table Schema:
; CREATE TABLE [dbo].[stage_Individuals]( [SysID] [int] NULL, --Unique, though it's not an index intended to identify the [Individuals] [JJISID] [nvarchar](10) NULL, [NameLast] [nvarchar](30) NULL, [NameFirst] [nvarchar](30) NULL, [NameMiddle] [nvarchar](30) NULL,
[code]....
Scenario: There are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SYSID is just a Clustered Index (I'm assuming) within the Legacy system and will most likely be dropped when inserted into the Production [Individuals] table. There are records that are missing their JJISID, though this isn't supposed to happen either, but that have valid information within SSN/DOB/Name/etc that can be merged into the correct record that has a JJISID assigned. There is really no data conformity: some records have NULLs for everything except JJISID, and some records have all the [Individuals] information except the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have others that partition by Name/DOB/etc and will adapt whatever I come up with to be used for those as well):
select j.* from (select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum, stage_Individuals.*, COUNT(*) OVER (partition by jjisid) as cnt from stage_Individuals) as j where cnt > 1 and j.JJISID is not null
Now, with SQL Server 2012 or later I could use LAG and LEAD w/ the RowNum value to do my data manipulation...but that won't work because we are on SQL Server 2008 in this environment.
[URL]
With, the following as a potential solution:
GSquared (3/16/2010)
Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine-tune that. You'll need to look at various methods for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
; WITH CTE AS (SELECT master_id, MIN(ID) AS first_id, MAX(Account_Expiry) AS latest_expiry FROM #People GROUP BY master_id) SELECT P1.master_id,
[code].....
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically what I'm looking for, is to update any column with a NULL value to the corresponding Duplicate [Individuals] record value for that column.
**EDIT - Example: Record 1 has a JJISID with NULL NameFirst & NameLast, BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I'm wanting to propagate the NameFirst & NameLast from Record 2 into Record 1.
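A rough sketch of that propagation: for each NULL column, look up a non-NULL value from any other row sharing the same JJISID. It runs on SQLite through Python's sqlite3 module with a cut-down, hypothetical version of the staging table (only SysID, JJISID, NameLast and NameFirst) and invented sample rows. MAX is used only because it ignores NULLs; if the duplicates hold conflicting non-NULL values it picks one arbitrarily, so a real rule for choosing the winner would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage_Individuals (
        SysID INT, JJISID TEXT, NameLast TEXT, NameFirst TEXT);
    INSERT INTO stage_Individuals VALUES
        (1, 'J100', NULL,    NULL),
        (2, 'J100', 'Smith', 'Ann'),
        (3, 'J200', 'Lee',   NULL),
        (4, 'J200', NULL,    'Bob');
""")

# Existing values win (COALESCE); NULLs are filled from rows with the same JJISID.
conn.execute("""
    UPDATE stage_Individuals
    SET NameLast = COALESCE(NameLast,
            (SELECT MAX(d.NameLast) FROM stage_Individuals AS d
             WHERE d.JJISID = stage_Individuals.JJISID)),
        NameFirst = COALESCE(NameFirst,
            (SELECT MAX(d.NameFirst) FROM stage_Individuals AS d
             WHERE d.JJISID = stage_Individuals.JJISID))
    WHERE JJISID IS NOT NULL
""")

filled = conn.execute("""
    SELECT SysID, JJISID, NameLast, NameFirst
    FROM stage_Individuals ORDER BY SysID
""").fetchall()
for row in filled:
    print(row)
```

Once every duplicate carries the same filled-in values, the ROW_NUMBER query above can be used to keep one row per JJISID.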
Hi, I have three tables: T1, T2 and T3. T3 has fields that are a combination of the T1 and T2 fields. How can I compare the T1 and T2 field values with the T3 field values?
I have 2 separate databases and I need to check a table for rows that differ between them.
I used access to link the tables in a database and am using queries to check the tables. However, I am having trouble formulating the SQL. What I want to do is not just check for the ID field to see if it exists, but to make sure the whole row exists. How can I form an SQL statement for this?
I tried something like:
Select * from table1 where Column1 NOT IN (Select Column1 from Table2) AND Column2 NOT IN (Select Column2 from Table2)
However, I do not think this is correct. I want to make sure that the rows are compared, not individual values.
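A sketch of a whole-row comparison: instead of testing each column against the other table separately, a single NOT EXISTS subquery requires all columns to match on the same row. It runs on SQLite through Python's sqlite3 module; the column names follow the post and the sample rows are invented. One caveat: NULL never equals NULL, so rows containing NULLs would always be reported unless the comparison handles them explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Column1 INT, Column2 TEXT);
    CREATE TABLE table2 (Column1 INT, Column2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'x'), (4, 'd');
""")

# Rows of table1 with no row in table2 matching on every column at once.
only_in_table1 = conn.execute("""
    SELECT t1.Column1, t1.Column2
    FROM table1 AS t1
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS t2
        WHERE t2.Column1 = t1.Column1
          AND t2.Column2 = t1.Column2
    )
    ORDER BY t1.Column1
""").fetchall()
print(only_in_table1)
```

Row (2, 'b') is reported even though Column1 = 2 exists in table2, because no single row in table2 matches both columns. Swapping the table names gives the rows that exist only in table2.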