SQL Server 2014 :: Selecting And Merging Records For Singular Complete Record
Jan 24, 2015
I have a database full of different types of leads, some for company A, some for company B, and so on, each doing a different service. However, the leads from B can be used for A and the leads from A can be used for B, so I want to merge the data.
Example:
Phone Number   Name          Home Owner   Credit   Insurance
727-555-1234   Dave Thomas   Yes          B
727-555-1234   Dave Thomas                         Gieco
I would like the end result to be one record:
Phone Number   Name          Home Owner   Credit   Insurance
727-555-1234   Dave Thomas   Yes          B        Gieco
Since these were imported into SQL, they all have a unique ID; here are the current labels:
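For what it's worth, a minimal sketch of one way to collapse such duplicates, assuming the blank cells are NULLs and hypothetical table/column names (dbo.Leads, PhoneNumber, and so on):

Code:
-- Collapse duplicate leads into one row per phone number/name.
-- MAX() ignores NULLs, so the populated value from either row wins.
SELECT PhoneNumber,
       Name,
       MAX(HomeOwner) AS HomeOwner,
       MAX(Credit)    AS Credit,
       MAX(Insurance) AS Insurance
FROM   dbo.Leads
GROUP  BY PhoneNumber, Name;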
So let's say I have a table Orders with columns: Order# and ReceiptDate. Order#'s may be duplicated (Could have same Order# with different ReceiptDate). I want to select Order#'s that go back 6 months from the last ReceiptDate for each Order#.
I can't just do something like: SELECT * FROM Orders WHERE ReceiptDate >= DATEADD(MONTH, -6, GETDATE())
because there could be Order#'s whose last ReceiptDate was earlier than 6 months ago. I want to capture all of the instances of each Order# going back 6 months from each Order#'s own last ReceiptDate.
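A sketch of one way to do this with a windowed MAX, assuming the Order# column is named OrderNo:

Code:
-- Find each order's last receipt date, then keep rows within 6 months of it.
WITH LastReceipts AS (
    SELECT OrderNo,
           ReceiptDate,
           MAX(ReceiptDate) OVER (PARTITION BY OrderNo) AS LastReceiptDate
    FROM   dbo.Orders
)
SELECT OrderNo, ReceiptDate
FROM   LastReceipts
WHERE  ReceiptDate >= DATEADD(MONTH, -6, LastReceiptDate);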
Table2 contains the fields Group, Name, Category, Dimension (Group and Name are not in Table1).
So basically I need to read the records in Table1 using Groupid, and each time there is a Groupid, select records from Table2 where Table2.Category in (select Category from Table1) and Table2.Dimension in (select Dimension from Table1).
In Table1 there might be 10 Groupid records, all of which are different.
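If that is the whole requirement, the loop may be unnecessary: the two IN conditions can be evaluated once over all of Table1. A sketch (Group is bracketed because it is a reserved word):

Code:
SELECT t2.[Group], t2.Name, t2.Category, t2.Dimension
FROM   Table2 AS t2
WHERE  t2.Category  IN (SELECT Category  FROM Table1)
  AND  t2.Dimension IN (SELECT Dimension FROM Table1);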
Hi, I have a stored procedure that has to extract the child records for particular parent records. The issue is that in some cases I do not want to extract all the child records, only a certain number of them.

Firstly I identify all the parent records that have the required number of child records and insert them into the result table:

insert into t_AuditQualifiedNumberExtractDetails
    (BatchNumber, EntryRecordID, LN, AdditionalQualCritPassed)
(select t1.BatchNumber, t1.EntryRecordID, t1.LN, t1.AdditionalQualCritPassed
 from
    (select BatchNumber, RecordType, EntryRecordID, LN, AdditionalQualCritPassed
     from t_AuditQualifiedNumberExtractDetails_Temp) as t1
 inner join
    (select BatchNumber, RecordType, EntryRecordID,
            count(*) as AssignedNumbers,
            max(TotalNumbers) as TotalNumbers
     from t_AuditQualifiedNumberExtractDetails_Temp
     group by BatchNumber, RecordType, EntryRecordID
     having count(*) = max(TotalNumbers)) as t2
 on  t1.BatchNumber = t2.BatchNumber
 and t1.RecordType = t2.RecordType
 and t1.EntryRecordID = t2.EntryRecordID)

Then I insert the remaining records into a temp table, where the number of records required does not equal the total number of child records, and then loop through each record, manipulating ROWCOUNT to only select the number of child records needed:

insert into @t_QualificationMismatchedAllocs
    ([BatchNumber], [RecordType], [EntryRecordID], [AssignedNumbers], [TotalNumbers])
(select BatchNumber, RecordType, EntryRecordID,
        count(*) as AssignedNumbers,
        max(TotalNumbers) as TotalNumbers
 from t_AuditQualifiedNumberExtractDetails_Temp
 group by BatchNumber, RecordType, EntryRecordID
 having count(*) < max(TotalNumbers))

SELECT @QualificationMismatched_RowCnt = 1
SELECT @MaxQualificationMismatched = (select count(*) from @t_QualificationMismatchedAllocs)

while @QualificationMismatched_RowCnt <= @MaxQualificationMismatched
begin
    --## Get Prize Draw to extract numbers for
    select @RecordType = RecordType,
           @EntryRecordID = EntryRecordID,
           @AssignedNumbers = AssignedNumbers,
           @TotalNumbers = TotalNumbers
    from @t_QualificationMismatchedAllocs
    where QualMismatchedAllocsRowNum = @QualificationMismatched_RowCnt

    SET ROWCOUNT @TotalNumbers

    insert into t_AuditQualifiedNumberExtractDetails
        (BatchNumber, EntryRecordID, LN, AdditionalQualCritPassed)
    (select BatchNumber, EntryRecordID, LN, AdditionalQualCritPassed
     from t_AuditQualifiedNumberExtractDetails_Temp
     where RecordType = @RecordType
     and EntryRecordID = @EntryRecordID)

    SET @QualificationMismatched_RowCnt = @QualificationMismatched_RowCnt + 1
    SET ROWCOUNT 0
end

Is there a better methodology for doing this? Is the use of a table variable here incorrect? Should I be using a temporary table or an indexed table if there are a large number of parent records where the child records required do not match the total number of child records?
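Regarding a better methodology: on SQL Server 2005 and later, a single set-based insert using ROW_NUMBER() can usually replace both the SET ROWCOUNT trick and the loop. A sketch, assuming the same tables and columns as above and that LN is a sensible ordering for choosing which child rows to keep:

Code:
-- Number each child row within its parent, then keep only TotalNumbers rows.
insert into t_AuditQualifiedNumberExtractDetails
    (BatchNumber, EntryRecordID, LN, AdditionalQualCritPassed)
select BatchNumber, EntryRecordID, LN, AdditionalQualCritPassed
from (
    select BatchNumber, RecordType, EntryRecordID, LN, AdditionalQualCritPassed,
           TotalNumbers,
           row_number() over (partition by BatchNumber, RecordType, EntryRecordID
                              order by LN) as rn
    from t_AuditQualifiedNumberExtractDetails_Temp
) as ranked
where rn <= TotalNumbers;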
Previously, the same records existed both in the table with the primary key and in the table with the foreign key. We found that 7 records were lost from the primary key table, but the same records still exist in the foreign key table.
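A quick way to list such orphans is an anti-join; ParentTable, ChildTable, and KeyID below are hypothetical names:

Code:
-- Child rows whose parent record is missing.
SELECT c.*
FROM   ChildTable AS c
WHERE  NOT EXISTS (SELECT 1 FROM ParentTable AS p WHERE p.KeyID = c.KeyID);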
We've recently upgraded to SQL Server 2014, and are now using SSIS integrated with Visual Studio. We have an SSIS project which contains about 20 packages which are nested in Sequence Containers and executed concurrently. These packages have been set up as project references.
The problem is that when I press the start button to run the packages, they all light up green reporting completion before the data has finished loading into the SQL database. If I press the stop button without waiting a sufficient length of time, then not all of the data gets loaded. i.e. a certain number of rows will be missing from some of the SQL tables.
If I click through to the individual package items and check the data flow progress while running, some of the data flows appear to hang at a certain number of rows without ever reaching completion. The number of rows indicated in the data flow is incorrect - i.e. it will count up to ~150,000 and stay there indefinitely in the running state, when in actual fact there are ~500,000 rows to load.
To clarify, the main package will show all items green and display the "Finished: Success" message in the log window, however when I drill through to certain packages in the set, they'll be stuck in the yellow running state, with no way of knowing whether they've actually completed or not.
My current workaround is to just wait a certain length of time before pressing the stop button. This bug doesn't seem to inhibit rows being loaded - it just incorrectly identifies the point when the load finishes, causing people to terminate the load prematurely.
This issue only occurs if I run the project from the main package container. If I execute the child packages individually, they correctly report the number of rows being loaded and light up green once complete.
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: a Legacy system that has (well, let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fairly decent amount of garbage data to be inserted in some tables. We're pulling the [Individuals] table from within this Legacy system and inserting it into a production system, into the table schema currently in place to track [Individuals] in this Production system.
Problem: inserting the information is easy; the hard part is deduplicating the records that exist within the staging table (which the legacy [Individuals] table has been dumped into in production) prior to insertion. (I want to do this programmatically with SQL or SSIS preferably, so that I can alter it later to allow for updating existing records / inserting new ones.)
Staging Table Schema:
CREATE TABLE [dbo].[stage_Individuals](
    [SysID] [int] NULL, -- Unique, though it's not an index intended to identify the [Individuals]
    [JJISID] [nvarchar](10) NULL,
    [NameLast] [nvarchar](30) NULL,
    [NameFirst] [nvarchar](30) NULL,
    [NameMiddle] [nvarchar](30) NULL,
[code]....
Scenario: there are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SysID is just a clustered index (I'm assuming) within the Legacy system and will most likely be dropped when inserted into the Production [Individuals] table. There are records that are missing their JJISID (this isn't supposed to happen either) but have valid information within SSN/DOB/Name/etc. that can be merged into the correct record that has a JJISID assigned. There is really no data conformity: some records have NULLs for everything except the JJISID, and some records have all the [Individuals] information excluding the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have others that partition by Name/DOB/etc. and will adapt whatever I come up with to be used for those as well):
select j.*
from (
    select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum,
           stage_Individuals.*,
           COUNT(*) OVER (PARTITION BY JJISID) as cnt
    from stage_Individuals
) as j
where cnt > 1 and j.JJISID is not null

Now, with SQL Server 2012 or later I could use LAG and LEAD with the RowNum value to do my data manipulation... but that won't work because we are on SQL Server 2008 in this environment.
[URL]
With the following as a potential solution:
GSquared (3/16/2010): Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine-tune that. You'll need to look at various methods for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
;WITH CTE AS (
    SELECT master_id,
           MIN(ID) AS first_id,
           MAX(Account_Expiry) AS latest_expiry
    FROM #People
    GROUP BY master_id
)
SELECT P1.master_id,
[code].....
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically, what I'm looking for is to update any column with a NULL value to the corresponding duplicate [Individuals] record's value for that column.
**EDIT - Example: Record 1 has a JJISID with NULL NameFirst & NameLast, BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I'm wanting to propagate the NameFirst & NameLast from Record 2 into Record 1.
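Since LAG/LEAD are out on 2008, one pattern that still works there is updating through a CTE whose window functions pick the non-NULL value per JJISID (MAX ignores NULLs). A sketch for the two name columns; each additional column would get the same treatment:

Code:
;WITH Filled AS (
    SELECT JJISID, NameFirst, NameLast,
           MAX(NameFirst) OVER (PARTITION BY JJISID) AS BestNameFirst,
           MAX(NameLast)  OVER (PARTITION BY JJISID) AS BestNameLast
    FROM   stage_Individuals
    WHERE  JJISID IS NOT NULL
)
UPDATE Filled
SET    NameFirst = BestNameFirst,
       NameLast  = BestNameLast
WHERE  NameFirst IS NULL OR NameLast IS NULL;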
tblCustomers contains a CustomerID that is unique to each customer.
tblMachines contains a list of all machines with a MachineID that is unique to each machine. It also contains the CustomerID number.
tblServiceOrders contains a list of each time each customer was serviced. It contains the ServiceDate, CustomerID, and ServiceOrderNo. But it does not have any information on the machines.
tblMachinesServiced contains a list of each machine that was serviced for each service order. It contains the ServiceOrderNo and the MachineID number.
What I want is to be able to extract a list of machines that were not serviced between 2 dates. What I end up getting is a list of machines that were serviced outside of the date range I provide.
For instance, say machine A was serviced in 2013 and 2015 but not in 2014. And say machine B was serviced in all 3 years. When I try to extract my list of machines not serviced in 2014 I end up with a list that contains machine A-2013, A-2015, B-2013 & B-2015. But what I need is just machine A-2014, since that machine wasn’t serviced in 2014.
I’ve tried several different queries but here is an example:
SELECT tblMachines.MachineID, ServicedMachines.ServiceDate
FROM tblMachines
LEFT JOIN (
    SELECT MachineID, ServiceDate
    FROM tblServiceOrders, tblMachinesServiced
    WHERE tblServiceOrders.ServiceOrderNo = tblMachinesServiced.ServiceOrderNo
) ServicedMachines ON tblMachines.MachineID = ServicedMachines.MachineID
WHERE YEAR(ServiceDate) != '2014'
I understand why it returns the records that it does, but I'm not sure how to get what I want, which is a list of machines not serviced in 2014.
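The usual fix is to phrase it as "no service order exists for this machine in 2014" rather than filtering the joined rows. A sketch using NOT EXISTS with a sargable date range:

Code:
SELECT m.MachineID
FROM   tblMachines AS m
WHERE  NOT EXISTS (
    SELECT 1
    FROM   tblServiceOrders AS so
           INNER JOIN tblMachinesServiced AS ms
               ON so.ServiceOrderNo = ms.ServiceOrderNo
    WHERE  ms.MachineID = m.MachineID
      AND  so.ServiceDate >= '2014-01-01'   -- inclusive start of 2014
      AND  so.ServiceDate <  '2015-01-01'   -- exclusive end, keeps the index usable
);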
Hello there. I am completely new to SQL and this forum, and this problem that I have may appear very basic to you guys, but still... I was wondering if I could get some help with a database I am trying to make in MS Access.
I have used the Access TransferText function to import data from a text file into a table with an ID attached to each line, eg.
ID   Text
1    Hello world
2    This is an example
3    Of my database
I want to merge the data, or copy it into a field in a new table to get:
ID   Text
1    Hello World This is an example Of my database
2    [more imported text from a different table]
and I have been advised that SQL is the best way to do this. Is it possible to have line breaks in a field within Microsoft Access, or would it have to be structured as
ID   Text
1    Hello World
     This is an Example
     Of My Database
2    ...
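Line breaks inside a text field are fine (they're just CR+LF characters). In SQL Server the concatenation itself is often done with FOR XML PATH; Access itself would need VBA instead. A sketch against a hypothetical ImportedLines table:

Code:
-- Concatenate the imported lines into one value, with CR+LF between lines.
SELECT STUFF((
           SELECT CHAR(13) + CHAR(10) + t.[Text]
           FROM   ImportedLines AS t
           ORDER  BY t.ID
           FOR XML PATH(''), TYPE
       ).value('.', 'nvarchar(max)'), 1, 2, '') AS MergedText;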
I am trying to create a dimension table and I am pulling in data from two tables to create it. I need all records from table A, any records from table B that are not in table A, and I need to use the fields from B for those records that do match. What would be the best way to approach this, merge join + derived columns, union all + aggrigation? Any suggestions?
It seems like it's harder to do this in SSIS than just doing it in the database.
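Done in the database, the stated logic maps to a FULL OUTER JOIN with COALESCE preferring table B's fields on matches. A sketch with hypothetical key/attribute names:

Code:
-- All of A, plus B-only rows; B's fields win whenever both sides match.
SELECT COALESCE(b.KeyCol, a.KeyCol) AS KeyCol,
       COALESCE(b.Attr1,  a.Attr1)  AS Attr1,
       COALESCE(b.Attr2,  a.Attr2)  AS Attr2
FROM   TableA AS a
       FULL OUTER JOIN TableB AS b
           ON a.KeyCol = b.KeyCol;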
We have a data warehouse staging database in which we capture change history for hundreds of tables from a source system. In the source system, records are updated in place, but in our data warehouse we capture these changes by "terminating" the existing record and adding a new record reflecting the changes. In the data warehouse we add two columns to every table -- effective_date and expiration_date -- which indicate the dates the record was in effect in the source system. By convention, an expiration_date of 6/6/2079 means the record is currently still active in the source system. Each day we simply compare yesterday's version of the record (in the data warehouse) against today's version (in the source system). If differences are found in any of the columns, we terminate the record and add a new one, setting those dates appropriately.
In this example, the employee_id column is the natural key in the source system. We add the effective_date and expiration_date in the data warehouse, so those three columns together make up the key in the data warehouse. The employee_name, employee_dept, and last_login_date columns all come from the source system as well.
In the select output, you can follow the trail of changes for each of these three employees. Bob moved from dept 7 to 8 at some point; Frank didn't change departments at all; Cheryl moved from dept 6 to 9 and later back to 6. However, the last_login_date was updated frequently for all these employees.
We've tracked hundreds of tables this way for years, some with hundreds of columns. For optimization purposes, I'm now interested in trimming the fat a bit. That is, we track changes in many columns that we don't really need in our data warehouse. Some of these columns are rapidly-changing, causing all sorts of unnecessary terminate/inserts in the data warehouse. My goal is to remove these columns, reclaim the disk space and increase the ETL speed. So in this example, let's get rid of the last_login_date column.
alter table mytbl drop column last_login_date

select * from mytbl order by employee_id, effective_date
Now in the select output, you can see we have many "effective duplicate" records. For example, nothing changed for Bob between 1/1/2014 and 1/31/2014 -- those really should be one record, not three. Here's the challenge: I'm looking for an efficient way to merge these "effective duplicates" together, through set-based sql updates/deletes/inserts (hoping to avoid any RBAR operations). Here's what the table ultimately should look like (cheating to get there):
Note that Bob only has two records (he changed department), Frank only has one record (no changes), and Cheryl has three records (two department changes).
My inclination would be to drop the unwanted columns, then GROUP BY all the remaining columns from the source system, and taking the MIN effective_date and MAX expiration_date. However, this doesn't work for cases like Cheryl's -- she moved to another department, then back again, so that change history needs to be retained.
As I mentioned, we have hundreds of tables, and I'd like to strip out dozens (maybe hundreds) of unused columns, so ultimately there will be millions of these pseudo-duplicates that need to be merged together. These are huge tables, so I really need to find an efficient set-based approach to this.
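One standard set-based approach is the gaps-and-islands trick: two ROW_NUMBER sequences whose difference is constant within each unbroken run of identical values, which preserves Cheryl-style A-to-B-and-back history. A sketch on the example table (shown as a SELECT; in practice you'd likely write the result to a new table and swap it in):

Code:
-- Every remaining source column goes in both the second partition and the GROUP BY.
;WITH Ordered AS (
    SELECT employee_id, employee_name, employee_dept,
           effective_date, expiration_date,
           ROW_NUMBER() OVER (PARTITION BY employee_id
                              ORDER BY effective_date) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY employee_id, employee_name, employee_dept
                              ORDER BY effective_date) AS rn_grp
    FROM   mytbl
)
SELECT employee_id, employee_name, employee_dept,
       MIN(effective_date)  AS effective_date,
       MAX(expiration_date) AS expiration_date
FROM   Ordered
GROUP  BY employee_id, employee_name, employee_dept,
          rn_all - rn_grp;  -- constant within each island of unchanged values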
I am working on a project that will require me to get a flat data file (an Excel spreadsheet) with hundreds of thousands of records. Each record is an owner and, specifically, what they own. There will be a field for OwnerName, and I want to figure out a way to pull the data into a database like this:
Table(Owners) - make sure owner is listed only once
Table(Properties) - joined to owners showing all properties that person owns
Now the tricky part: the owner names might not be exactly the same. Some records might have:

Smith, John
John Smith
Smith, John T
etc.
To make matters worse, this will be a continuous process. I will receive updated Excel spreadsheets from time to time and will need to import the new records, many times overwriting the old data. The good news: there should be an OwnerID that will be unique within the Excel data. So as I am merging similar records into the Owners table, I should have a list of OwnerID's that can forever be used to link to the owner.
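With that stable OwnerID, the recurring import can be an upsert keyed on it; the fuzzy name matching then only matters the first time an owner is seen. A sketch with assumed staging/target table names:

Code:
-- Update existing owners, insert new ones, keyed on the spreadsheet's OwnerID.
MERGE dbo.Owners AS tgt
USING dbo.stage_Owners AS src
    ON tgt.OwnerID = src.OwnerID
WHEN MATCHED THEN
    UPDATE SET tgt.OwnerName = src.OwnerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OwnerID, OwnerName)
    VALUES (src.OwnerID, src.OwnerName);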
Is there a way, using a stored procedure in a local database, to add a record to a database executing in a cloud environment when both entities reside in different domains?
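One route, assuming the cloud side accepts SQL authentication and permits linked-server connections (not all cloud offerings do), is a linked server plus four-part names; every name below is a placeholder:

Code:
-- Register the remote server and a login mapping, then insert through it.
EXEC sp_addlinkedserver
     @server = N'CloudSrv',
     @srvproduct = N'',
     @provider = N'SQLNCLI',
     @datasrc = N'myserver.cloud.example.com';

EXEC sp_addlinkedsrvlogin
     @rmtsrvname = N'CloudSrv',
     @useself = N'FALSE',
     @rmtuser = N'remote_user',
     @rmtpassword = N'remote_password';

INSERT INTO CloudSrv.RemoteDb.dbo.TargetTable (Col1)
VALUES ('value');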
The result I'd like from the query should look something like this:
1 2 5 7 8
So basically I'd like to leave records 3 and 4 out because they fall within 24 hours of record 2, and leave record 6 out because it falls within 24 hours of record 5. I tried working with a CTE: set a DATEADD(d, 1, recorddate), join it on itself, and use a BETWEEN From/To filter on the join, but that didn't work. I don't think NTILE will work with this either?
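LAG-style tricks struggle here because each kept record resets the 24-hour window: the comparison is against the last *kept* row, not the previous row. A plain ordered loop expresses that directly; dbo.Records, id, and recorddate are assumed names:

Code:
DECLARE @kept TABLE (id INT, recorddate DATETIME);
DECLARE @curId INT, @curDate DATETIME, @nextId INT, @nextDate DATETIME;

-- Start with the earliest record; it is always kept.
SELECT TOP (1) @curId = id, @curDate = recorddate
FROM dbo.Records
ORDER BY recorddate;

WHILE @curId IS NOT NULL
BEGIN
    INSERT INTO @kept (id, recorddate) VALUES (@curId, @curDate);

    -- The next kept record is the first one 24+ hours after the current one.
    SELECT @nextId = NULL, @nextDate = NULL;
    SELECT TOP (1) @nextId = id, @nextDate = recorddate
    FROM dbo.Records
    WHERE recorddate >= DATEADD(HOUR, 24, @curDate)
    ORDER BY recorddate;

    SELECT @curId = @nextId, @curDate = @nextDate;
END

SELECT id FROM @kept ORDER BY recorddate;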
I am trying to send a csv file with 15000 records via the database mail in SQL Server 2014. The problem is that when I open my email the csv only contains 209 records. I have tried the same thing in SQL Server 2012 and it works as expected - it sends the 15000 records in the csv.
I have tested this on several SQL servers with the 2014 edition on them, and I have the same issue on all of them. The query breaks off at different points on each server - for example, one of them breaks off at 209 records as I said above, another one at 307. The last record always gets truncated at the same place. The csv attachment size is about 64 KB - which is well below the 4 MB limit I've configured in the database's Maximum File Size (Bytes) parameter.
What I am doing, basically, is creating a job that is meant to execute a stored procedure and send the results in a csv in an email. The stored procedure is something like:
ID - INT
Machine - TINYINT
StartTime - DATETIME
EndTime - DATETIME
What I am trying to do is figure out how much time is used for production per day. The problem is, there are production runs that run over midnight and possibly over multiple days without ending. For example, if I have the following data:
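The usual approach is to split each run at midnight boundaries first, then aggregate per day. A recursive-CTE sketch, assuming a hypothetical dbo.ProductionRuns table with the schema above:

Code:
-- Split every run into per-day pieces, then sum minutes per calendar day.
;WITH Pieces AS (
    SELECT ID, Machine, StartTime,
           CASE WHEN EndTime > DATEADD(DAY, 1, CAST(CAST(StartTime AS DATE) AS DATETIME))
                THEN DATEADD(DAY, 1, CAST(CAST(StartTime AS DATE) AS DATETIME))
                ELSE EndTime END AS PieceEnd,   -- cut at the first midnight, or stop
           EndTime
    FROM   dbo.ProductionRuns
    UNION ALL
    SELECT ID, Machine, PieceEnd,               -- next piece starts at the midnight cut
           CASE WHEN EndTime > DATEADD(DAY, 1, PieceEnd)
                THEN DATEADD(DAY, 1, PieceEnd)
                ELSE EndTime END,
           EndTime
    FROM   Pieces
    WHERE  PieceEnd < EndTime
)
SELECT CAST(StartTime AS DATE) AS ProductionDay,
       SUM(DATEDIFF(MINUTE, StartTime, PieceEnd)) AS MinutesUsed
FROM   Pieces
GROUP  BY CAST(StartTime AS DATE)
OPTION (MAXRECURSION 0);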
I have duplicate records in a table. I need to count the duplicate records based on account number, and the count will be stored in a variable; I then need to check whether the count > 0 in a stored procedure. I have used the query below, but it is not working.
SELECT @_Stat_Count = count(*), L1.AcctNo, L1.ReceivedFileID
from Legacy L1, Legacy L2, ReceivedFiles
where L1.ReceivedFileID = ReceivedFiles.ReceivedFileID
  and L1.AcctNo = L2.AcctNo
group by L1.AcctNo, L1.ReceivedFileID
having Count(*) > 0

IF (@_Stat_Count > 0)
BEGIN
    SELECT @Status = status_cd from status-table where status_id = 10
END
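Part of the trouble: assigning a variable from a grouped, multi-row SELECT leaves @_Stat_Count holding an arbitrary group's count, the self-join on AcctNo inflates the counts, and HAVING COUNT(*) > 0 filters nothing. A sketch of one way to express what the post seems to want (@Status's type is assumed):

Code:
DECLARE @_Stat_Count INT, @Status VARCHAR(10);

SELECT @_Stat_Count = COUNT(*)
FROM (
    SELECT L1.AcctNo
    FROM   Legacy AS L1
           INNER JOIN ReceivedFiles AS rf
               ON L1.ReceivedFileID = rf.ReceivedFileID
    GROUP  BY L1.AcctNo
    HAVING COUNT(*) > 1          -- a duplicate means more than one row per account
) AS dups;

IF @_Stat_Count > 0
BEGIN
    SELECT @Status = status_cd
    FROM   [status-table]        -- bracketed: hyphenated names need delimiters
    WHERE  status_id = 10;
END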
I have a robust query that returns a dataset and the data is good, however some of the records contain the exact same data with the exception of the 'Price' field. I want to combine the records that are identical and SUM the values in the 'Price' field. My query and example return dataset is below.
Query:
--------
select distinct
    'On-Demand' as 'Business Line',
    o.OrderID as 'Order #',
    isnull(d.DisplayCode, 'UNK') as Hub,
    isnull(rz.RouteID, 'UNK') as 'Default Route',
    'On-Demand' as 'Assigned Route',
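Since the query is truncated here, a generic sketch of the fix: swap DISTINCT for GROUP BY over the columns that should be identical and SUM the price. Table and column names below are hypothetical:

Code:
-- Rows identical except for Price collapse into one, with prices summed.
SELECT OrderID, Hub, DefaultRoute,
       SUM(Price) AS Price
FROM   dbo.OrderLines
GROUP  BY OrderID, Hub, DefaultRoute;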
Hi, I was wanting to know how to select a record in a GridView. I have a GridView with first name and last name. I currently have a select command on it but don't want it. I want to be able to select a first name or last name and have it take me to that record in the database.
I want to get some combined data from both tables, so right now I am joining them at the SessionStartTime column, which is a primary key in the first and a foreign key in the second table, something like this:
Code:
SELECT DlIndexTable.SessionStartTime, DlTextDataTable.Channel01data
FROM DlIndexTable
LEFT JOIN DlTextDataTable
    ON DlIndexTable.SessionStartTime = DlTextDataTable.SessionStartTime
WHERE DlIndexTable.SessionStartTime BETWEEN '2006-10-13 16:40:08.790' AND '2012-03-01 17:54:30.930'
ORDER BY DlIndexTable.SessionStartTime, DlTextDataTable.ChTimestamp
The trouble is that this query, exactly as requested, gives me all the entries from the second table matching the first, while I really would like to pick just one row (preferably the first chronologically, by ChTimestamp) so that the first column (SessionStartTime) has distinct entries in the resulting table. What would be the simplest way of doing that? Performance is not a big priority over simplicity, since the first table could have only a few hundred rows (maybe a couple of thousand), while the second will be really tiny.
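One simple way is OUTER APPLY with TOP (1), which keeps exactly one detail row (the earliest by ChTimestamp) per session:

Code:
SELECT i.SessionStartTime, d.Channel01data
FROM   DlIndexTable AS i
OUTER  APPLY (
    SELECT TOP (1) t.Channel01data
    FROM   DlTextDataTable AS t
    WHERE  t.SessionStartTime = i.SessionStartTime
    ORDER  BY t.ChTimestamp          -- chronologically first matching row
) AS d
WHERE  i.SessionStartTime BETWEEN '2006-10-13 16:40:08.790'
                              AND '2012-03-01 17:54:30.930'
ORDER  BY i.SessionStartTime;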