Comparing Millions Of Records From File A And B
Sep 11, 2006
I'm trying to compare about 28 million records (270 bytes each) from tables A and B using the Lookup task as described in this forum. The process works fine with about two million records or so on my desktop (P4 3.39GHz, 1.5 GB RAM), but hangs with the full amount of data I'm trying to process. I tried both full and partial caching, but to no avail. I'm thinking this is a hardware resource problem. So, does anyone have any recommendations on the hardware needed for this kind of operation, and/or other suggestions? Thanks in advance...
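A set-based alternative worth sketching (table and key names here are placeholders, not the poster's schema): instead of a row-by-row SSIS Lookup, a LEFT JOIN anti-join on the server lets the engine hash-join both tables in one pass and return the rows in A with no match in B.
SELECT a.*
FROM dbo.TableA AS a
LEFT JOIN dbo.TableB AS b
    ON b.KeyCol = a.KeyCol   -- hypothetical join key
WHERE b.KeyCol IS NULL;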
View 8 Replies
Sep 3, 2006
Hi all, I was given a task to create householding logic on a table that has millions of records. First let me explain what householding is: say I have 2 records that have the same phone number; that means both records are under the same household, but this can get more complicated. This article explains it: http://www.teradata.com/t/page/115924/index.html. Anyone who has worked with householding knows that you need to scan the table many times to get all the households; I used a DTS package to do it. I tested the DTS on 11 records like the article did and that worked great, but once I went to millions of records each loop took 2 hours or so, and I have no idea how many loops I will have to do. If anyone out there has worked with household queries and used SQL, your input would help me a lot. Thanks.
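A set-based sketch of the iterative grouping (hypothetical dbo.Customers table with CustomerId, Phone, Address and HouseholdId columns; the real link fields may differ). Each record is seeded as its own household, then every pass pulls records down to the smallest household id reachable through a shared phone or address; the loop stops when a full pass moves nothing.
UPDATE dbo.Customers SET HouseholdId = CustomerId

DECLARE @moved INT
SET @moved = 1
WHILE @moved > 0
BEGIN
    -- link by shared phone
    UPDATE c SET c.HouseholdId = m.MinHH
    FROM dbo.Customers AS c
    JOIN (SELECT Phone, MIN(HouseholdId) AS MinHH
          FROM dbo.Customers GROUP BY Phone) AS m ON m.Phone = c.Phone
    WHERE c.HouseholdId > m.MinHH
    SET @moved = @@ROWCOUNT

    -- link by shared address
    UPDATE c SET c.HouseholdId = m.MinHH
    FROM dbo.Customers AS c
    JOIN (SELECT Address, MIN(HouseholdId) AS MinHH
          FROM dbo.Customers GROUP BY Address) AS m ON m.Address = c.Address
    WHERE c.HouseholdId > m.MinHH
    SET @moved = @moved + @@ROWCOUNT
END
Because each pass is one set-based UPDATE per link field instead of row-at-a-time work, this tends to converge in a handful of passes rather than hours per loop.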
View 3 Replies
View Related
Aug 19, 2007
Hi
Please guide me for the below given problem.
While doing a migration using cursors over the sample data given below, the process takes many hours to complete. I'd like to know whether there is a way to do it with a simple query.
ACNo Amount Balance CalType
A001 10 10 +
A001 10 20 -
A001 40 40 +
A001 10 30 -
A002 90 90 +
A002 20 110 +
A002 40 150 +
A003 10 30 +
A003 10 40 +
A003 10 30 -
A004 40 40 +
A004 10 30 -
I have only the Amount value; Balance has to be a calculated value based on CalType, and at the same time the Balance has to reset to 0 whenever ACNo changes.
Please guide me toward a faster approach.
Regards,
Mohanraj
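A set-based sketch of the calculation: on SQL Server 2012 or later a windowed running sum replaces the cursor entirely. The sign comes from CalType, and PARTITION BY ACNo restarts the balance at 0 per account. An explicit ordering column (RowId below) and the table name are assumptions, since the sample rows carry no sequence of their own.
SELECT ACNo, Amount, CalType,
       SUM(CASE CalType WHEN '+' THEN Amount ELSE -Amount END)
           OVER (PARTITION BY ACNo ORDER BY RowId
                 ROWS UNBOUNDED PRECEDING) AS Balance
FROM dbo.Transactions;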
View 3 Replies
View Related
Sep 19, 2007
Hi, there are about 30 million records on my MSSQL server and I want to access 2 million of them at one time. However, when I try to access them with a SQL command I get a timeout error. I want to select the first 100 records, then the next 100, and so on. How can I do this? For example: select * from tbl_Customer where name = @name_ -> timeout error. Someone said that this can be solved with cursors, but I can't find enough articles on it. Thanks...
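A keyset-paging sketch (the key column CustomerId is an assumption): fetch 100 rows at a time, remembering the last key seen, so each batch is a cheap index seek instead of one huge result set that times out.
DECLARE @name_ VARCHAR(50)
DECLARE @lastId INT
SET @name_ = 'Smith'   -- illustrative value
SET @lastId = 0        -- 0 for the first page

SELECT TOP (100) CustomerId, name
FROM dbo.tbl_Customer
WHERE name = @name_
  AND CustomerId > @lastId
ORDER BY CustomerId;
-- set @lastId to the largest CustomerId returned, then re-run for the next page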
View 3 Replies
View Related
Mar 15, 2004
If there are 13 million records in one table and 40 thousand records in another, what is the fastest way of joining these two tables?
This was a question put to me that I couldn't answer properly. Could anybody give the answer, with proper reasons behind it?
Thanx.
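For what it's worth, a sketch of the usual answer (table and column names are assumptions): with the small table indexed on the join key, the optimizer will normally scan the big table once and hash-join it against the 40,000-row table; the hint below only makes that choice explicit.
SELECT b.*, s.Description
FROM dbo.BigTable AS b
JOIN dbo.SmallTable AS s
    ON s.KeyCol = b.KeyCol
OPTION (HASH JOIN);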
View 7 Replies
View Related
Jul 23, 2012
I have a query to delete millions of records. I want to delete in batches of 1,000. My SELECT/JOIN statement returns millions of records, so this takes a lot of time. How do I work through 1,000 records at a time, delete everything that is not in the keep set, and loop without selecting the same records again? Here is what I have:
DECLARE @i INT
WHILE (1=1)
BEGIN
BEGIN TRAN
DELETE TOP(1000) FROM dbo.ABC123
WHERE SUBSTRING(dumbdumb,1,8) NOT IN
[code]....
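A complete batched-delete sketch along the same lines (the keep-list subquery is a placeholder for the elided NOT IN): DELETE TOP removes rows directly from the target, so there is no need to track which 1,000 were selected; each pass simply takes the next 1,000 qualifying rows until none remain.
WHILE 1 = 1
BEGIN
    BEGIN TRAN

    DELETE TOP (1000) FROM dbo.ABC123
    WHERE SUBSTRING(dumbdumb, 1, 8) NOT IN (SELECT KeepKey FROM dbo.KeepList)

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRAN
        BREAK          -- nothing left to delete
    END

    COMMIT TRAN        -- commit each batch so the log can truncate
END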
View 3 Replies
View Related
May 14, 2008
Hi Techies,
I am using SQL Server Standard Edition.
Into one of my tables I am inserting around 25 million records, and that takes more than 3 hours.
The same thing happens while fetching records from that table.
This database contains only a single filegroup, i.e. PRIMARY, and that table has a clustered index as well as nonclustered indexes.
It does not have any triggers.
How do I improve this performance?
Partitioning of the table cannot be used in SQL Server Standard Edition.
Would dropping all nonclustered indexes before the insert operation improve my performance?
Please advise.
Thanks
Rajesh Varma
View 2 Replies
View Related
Feb 27, 2007
The issue: SQL 2K. I have to keep the data from the last 3 months in the database. Every day I have to load 2 million records into the database, so every day I have to export (into another database serving as a historical data container) and delete the 2 million records inserted 3 months plus one day ago. The main problem is that the delete operation takes a while, involving the transaction log. The questions are: 1) How can I improve this operation (export/delete)? 2) If we decide to migrate to SQL 2005, can we use some feature, such as partitioning, to solve the problem? In Oracle I can use the TRUNCATE PARTITION statement, but in SQL 2005, I'm reading, it can't be done. We were thinking of creating partitions over the last three months to split the data, but can the partitioning function be dynamic, containing something like "the last 3 months"? I don't think so. Can you help us? Thank you. Mastino
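On the SQL 2005 question: there is no TRUNCATE PARTITION, but ALTER TABLE ... SWITCH gives the same effect. A sliding-window sketch (all object names and boundary dates here are illustrative): the oldest partition is swapped into an empty staging table as a metadata-only operation, truncated there, and the window boundaries are then moved.
ALTER TABLE dbo.FactData
    SWITCH PARTITION 1 TO dbo.FactData_Archive;  -- staging table: same schema/filegroup

TRUNCATE TABLE dbo.FactData_Archive;             -- instant, minimally logged

-- slide the window: retire the emptied boundary, open a new one
ALTER PARTITION FUNCTION pfMonthly() MERGE RANGE ('20061101');
ALTER PARTITION SCHEME psMonthly NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfMonthly() SPLIT RANGE ('20070301');
The partition function is not dynamic, so a scheduled job has to issue the MERGE/SPLIT each period, but each step is a metadata change rather than a logged delete.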
View 5 Replies
View Related
Sep 4, 2007
Hello all
Quick question: I'm looking for the most efficient way to extract data daily from a table with some 9.5 million records and growing. These are transaction records, and ideally I would like to bring over the last day's transactions and add them to my existing table. I cannot use the transaction date because sometimes we have to operate in an "offline" mode where the records are brought over some time later; this could be days or, unfortunately, a week or more. There are some 30 fields in the transaction table, so is there a more efficient way to do this than simply creating a concatenated key? Would it be more efficient to drop and recreate the table daily? That sounds extreme, so I wanted to get a few ideas.
Thanks in Advance
km
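One idea, sketched as a watermark pattern (it assumes the transaction table has an ever-increasing key such as an IDENTITY column, which the post doesn't confirm): instead of a 30-field concatenated key, record the highest key value already copied and pull only rows above it, whenever the extract actually runs.
DECLARE @lastId BIGINT, @newMax BIGINT
SELECT @lastId = LastExtractedId FROM dbo.ExtractWatermark   -- one-row control table
SELECT @newMax = MAX(TransactionId) FROM dbo.Transactions

INSERT INTO dbo.TransactionsCopy
SELECT *
FROM dbo.Transactions
WHERE TransactionId > @lastId
  AND TransactionId <= @newMax

UPDATE dbo.ExtractWatermark SET LastExtractedId = @newMax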
View 2 Replies
View Related
May 19, 2008
Hi Techies,
I am using SQL Server Standard Edition with a good hardware configuration.
Into one of my tables I am inserting around 25 million records, and that takes more than 3 hours.
The same thing happens while fetching records from that table.
This database contains only a single filegroup, i.e. PRIMARY, and that table has a clustered index as well as nonclustered indexes.
It does not have any triggers.
How do I improve this performance?
Partitioning of the table cannot be used in SQL Server Standard Edition.
Would dropping all nonclustered indexes before the insert operation improve my performance?
Please find the details below.
SERVER CONFIGURATION:
Intel Pentium(R) 4 CPU 2.88 GHz (2.79 GHz reported), 2 GB RAM
Operating System: WINDOWS 2003 R2 STANDARD SERVICE PACK 2
Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86)
Oct 14 2005 00:33:37
Copyright (c) 1988-2005 Microsoft Corporation
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
DATABASE DETAILS:
MDF and LDF located on C: Drive
Available Space on C: DRIVE 2.94 GB
TABLES DETAILS
CREATE TABLE [dbo].[TIX_PAYMENT_SCHEDULE](
[PaymentScheduleId] [bigint] IDENTITY(1,1) NOT NULL,
[OwedAmountId] [int] NULL, --NonClusteredIndex
[ProposalId] [int] NOT NULL, --NonClusteredIndex
[BrandId] [int] NULL, -- NonClusteredIndex
[DueDate] [datetime] NULL, --NonClusteredIndex
[OverdueDate] [datetime] NULL,--NonClusteredIndex
[ExpectedAmount] [decimal](18, 2) NULL,
[TransactionStatusId] [tinyint] NULL,--NonClusteredIndex
[IsLate] [char](1) NULL,
[IsPaymentReceived] [char](1) NULL,
[ScheduleBatchJournalId] [bigint] NULL,--NonClusteredIndex
[IsValidSchedule] [char](1) NULL,
[RuleId] [int] NULL,
[ActionId] [int] NOT NULL,
[ReasonId] [tinyint] NULL,
[Comments] [nvarchar](2000) NULL,
[NoofDays] [int] NULL,
[ActualAmountReceived] [decimal](18, 2) NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDateTime] [datetime] NOT NULL,
[LastUpdatedBy] [uniqueidentifier] NULL,
[LastUpdatedDateTime] [datetime] NOT NULL,
[CaseScheduleId] [bigint] NULL,--NonClusteredIndex
[ActionDate] [datetime] NULL,
[HasExactMatch] [char](1) NULL,
[IsCatchupBalanced] [char](1) NULL,
[HasModified] [char](1) NULL,--NonClusteredIndex
[PendDate] [datetime] NULL,
[IsAutoAccept] [char](1) NULL,
[CatchupBalanceIdentifier] [uniqueidentifier] NULL,--NonClusteredIndex
CONSTRAINT [PK_TIX_PAYMENT_SCHEDULE] PRIMARY KEY CLUSTERED
(
[PaymentScheduleId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
TABLE 2
CREATE TABLE [dbo].[TIX_PAYMENT_CASE_SCHEDULE](
[CaseScheduleId] [bigint] IDENTITY(1,1) NOT NULL,
[ProposalId] [int] NOT NULL,--NonClusteredIndex
[DueDate] [datetime] NOT NULL,
[OverDueDate] [datetime] NOT NULL,
[TotalExpectedAmount] [decimal](18, 2) NOT NULL,
[TotalActualPaymentReceived] [decimal](18, 2) NOT NULL,
[TransactionStatusId] [int] NOT NULL,--NonClusteredIndex
[ActionId] [int] NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDateTime] [datetime] NULL,
[LastUpdatedBy] [uniqueidentifier] NULL,
[LastUpdatedDateTime] [datetime] NULL,
[IsValidSchedule] [char](1) NULL,
[ScheduleBatchJournalId] [bigint] NULL,
[IsCatchupBalanced] [char](1) NULL,
[HasModified] [char](1) NULL,--NonClusteredIndex
[CatchupBalanceIdentifier] [uniqueidentifier] NULL,
CONSTRAINT [PK_TIX_PAYMENT_CASE_SCHEDULE] PRIMARY KEY CLUSTERED
(
[CaseScheduleId] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
STORED PROCEDURE:
CREATE PROC [dbo].[TIX_PRC_GENERATE_PAYMENTSCHEDULE_DATA]
(
@XMLParams XML,
@ToDate datetime,
@HasModified char(1)
)
AS
BEGIN
SET NOCOUNT ON
--Exception Handling Variable Declaration
DECLARE @ErrorMessage NVARCHAR(200),
@ErrorNumber INT,
@ErrorSeverity INT,
@ErrorState INT,
@ErrorProcedure NVARCHAR(50),
@ErrorLine INT,
@ErrorDesc NVARCHAR(100)
DECLARE @XMLPayment INT
BEGIN TRY
IF @XMLParams IS NOT NULL
BEGIN --BEGIN IF
SET @ErrorDesc='Error Occured While Inserting into TIX_PAYMENT_SCHEDULE FROM XML'
INSERT INTO TIX_PAYMENT_SCHEDULE
(
OwedAmountId,
ProposalId,
BrandId,
DueDate,
OverdueDate ,
CreatedDateTime,
LastUpdatedDateTime,
ExpectedAmount,
ActualAmountReceived,
ScheduleBatchJournalId,
RuleId,
TransactionStatusId,
ActionId,
IsLate,
IsPaymentReceived ,
IsValidSchedule,
--Added by DC : 119
IsCatchupBalanced,
CatchupBalanceIdentifier,
HasModified
---------------------------------------------------
)
SELECT
Main.ELEMENT.value('(OwedAmountId)[1]','int') AS OwedAmountId,
Main.ELEMENT.value('(ProposalId)[1]','int') AS ProposalId,
Main.ELEMENT.value('(BrandId)[1]','int') AS BrandId,
convert(datetime,Main.ELEMENT.value('(DueDate)[1]','varchar(100)')) AS DueDate,
convert(datetime,Main.ELEMENT.value('(OverdueDate)[1]','varchar(100)')) AS OverdueDate,
@ToDate AS CreatedDateTime,
@ToDate AS LastUpdatedDateTime,
convert(decimal(18,2),Main.ELEMENT.value('(ExpectedAmount)[1]','varchar(100)')) AS ExpectedAmount,
convert(decimal(18,2),Main.ELEMENT.value('(ActualAmountReceived)[1]','varchar(100)')) AS ActualAmountReceived,
Main.ELEMENT.value('(ScheduleBatchJournalId)[1]','bigint') AS ScheduleBatchJournalId,
Main.ELEMENT.value('(RuleId)[1]','int') AS RuleId,
Main.ELEMENT.value('(TransactionStatusId)[1]','int') AS TransactionStatusId,
Main.ELEMENT.value('(ActionId)[1]','int') AS ActionId,
Main.ELEMENT.value('(IsLate)[1]','char(1)') AS IsLate,
Main.ELEMENT.value('(IsPaymentReceived)[1]','char(1)') AS IsPaymentReceived,
Main.ELEMENT.value('(IsValidSchedule)[1]','char(1)') AS IsValidSchedule
--Added by DC for 119
,Main.ELEMENT.value('(IsCatchupBalanced)[1]','char(1)') AS IsCatchupBalanced
,Main.ELEMENT.value('(CatchupBalanceIdentifier)[1]','nvarchar(1000)') AS CatchupBalanceIdentifier
,@HasModified
---------------------------------------------------------------------
FROM @XMLParams.nodes ('(/ROOT/DATA)') AS Main(ELEMENT)
END--END IF
END TRY--Main END TRY
BEGIN CATCH --Main BEGIN CATCH
SELECT @ErrorMessage = @ErrorDesc+Char(13)+Error_Message(),
@ErrorSeverity = Error_Severity(),
@ErrorState = Error_State(),
@ErrorNumber = Error_Number(),
@ErrorProcedure = Error_Procedure(),
@ErrorLine = Error_Line()
RAISERROR(
@ErrorMessage,
@ErrorSeverity,
@ErrorState,
@ErrorNumber,
@ErrorProcedure,
@ErrorLine
)
END CATCH --Main END CATCH
END --Main END
STOREDPROCEDURE 2
CREATE PROCEDURE [dbo].[TIX_PRC_GET_PAYMENTSCHEDULE_SCHEDULE_FOR_DATE_RANGE]
(
@ToDate datetime,
@IsValidSchedule char(1)
)
AS
BEGIN
SET NOCOUNT ON
--Exception Handling Variable Declaration
DECLARE @ErrorMessage NVARCHAR(200),
@ErrorNumber INT,
@ErrorSeverity INT,
@ErrorState INT,
@ErrorProcedure NVARCHAR(50),
@ErrorLine INT,
@ErrorDesc NVARCHAR(100)
BEGIN TRY --Exception Handling
SET @ErrorDesc='Error Occured while fetching records from TIX_PAYMENT_SCHEDULE'
SELECT
PaymentScheduleId,
OwedAmountId,
ProposalId,
DueDate,
OverdueDate,
ExpectedAmount,
TransactionStatusId,
IsPaymentReceived,
IsLate,
ActionId,
ActualAmountReceived,
IsValidSchedule,
BrandId,
CaseScheduleId,
ReasonId,
Comments,
NoOfDays,
ActionDate,
IsCatchupBalanced,
CatchupBalanceIdentifier,
HasModified
from TIX_PAYMENT_SCHEDULE with (nolock)
WHERE DUEDATE <=@ToDate AND IsValidSchedule=@IsValidSchedule
SELECT DISTINCT OwedAmountId,proposalId,brandId from TIX_PAYMENT_SCHEDULE with (nolock) WHERE DUEDATE <=@ToDate AND IsValidSchedule=@IsValidSchedule Order By OwedAmountId,ProposalId,BrandId asc
SELECT DISTINCT ProposalId from TIX_PAYMENT_SCHEDULE with (nolock) WHERE DUEDATE <=@ToDate AND IsValidSchedule=@IsValidSchedule Order By ProposalId asc
END TRY
BEGIN CATCH
SELECT @ErrorMessage=@ErrorDesc+CHAR(13)+ Error_Message(),
@ErrorNumber=Error_Number(),
@ErrorState=Error_State(),
@ErrorProcedure=Error_Procedure(),
@ErrorLine=Error_Line(),
@ErrorSeverity=Error_Severity()
RAISERROR(
@ErrorMessage,
@ErrorSeverity,
@ErrorState,
@ErrorNumber,
@ErrorProcedure,
@ErrorLine
)
END CATCH
END
Thanks & Regards
Rajesh Varma
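A sketch of the drop-the-nonclustered-indexes idea the post asks about, using ALTER INDEX (available in SQL Server 2005; the index names below are placeholders, since the DDL above doesn't name them): disable them before the big insert, rebuild once afterwards. ALTER INDEX ALL would also disable the clustered index and make the table inaccessible, so each nonclustered index is named individually.
ALTER INDEX IX_PaymentSchedule_ProposalId ON dbo.TIX_PAYMENT_SCHEDULE DISABLE;
ALTER INDEX IX_PaymentSchedule_DueDate ON dbo.TIX_PAYMENT_SCHEDULE DISABLE;
-- ...disable the remaining nonclustered indexes the same way...

-- run the 25-million-row load here, e.g. TIX_PRC_GENERATE_PAYMENTSCHEDULE_DATA

ALTER INDEX IX_PaymentSchedule_ProposalId ON dbo.TIX_PAYMENT_SCHEDULE REBUILD;
ALTER INDEX IX_PaymentSchedule_DueDate ON dbo.TIX_PAYMENT_SCHEDULE REBUILD;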
View 6 Replies
View Related
Jun 21, 2007
hi!
I am using SQL Server 2005 with SP1.
I want to delete 30-40 million rows from a transactional table. What's the fastest way to delete these rows? Just deleting 300,000 rows takes 30 minutes, and I don't want to truncate the table.
any help would be appreciated
Thanks
View 9 Replies
View Related
May 14, 2008
Hi Techies,
I am using SQL Server Standard Edition with a good hardware configuration.
Into one of my tables I am inserting around 25 million records, and that takes more than 3 hours.
The same thing happens while fetching records from that table.
This database contains only a single filegroup, i.e. PRIMARY, and that table has a clustered index as well as nonclustered indexes.
It does not have any triggers.
How do I improve this performance?
Partitioning of the table cannot be used in SQL Server Standard Edition.
Would dropping all nonclustered indexes before the insert operation improve my performance?
Please advise.
Thanks
Rajesh Varma
View 17 Replies
View Related
Apr 25, 2008
Hi,
My full-text search over 2 million records is taking a long time to show results.
I have created the full-text catalog on a RAM drive to make the retrieval process faster,
but it is still taking more than 1 minute to get the matching pattern.
I am using SQL Server 2005. I have 2 columns (id, text) in my table.
This is my unique index script
USE [SAMPLE]
GO
CREATE UNIQUE NONCLUSTERED INDEX [ui_productid] ON [dbo].[Products]
(
[id] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
This is my primary key index script..
USE [SAMPLE]
GO
ALTER TABLE [dbo].[Products] ADD CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
This is my query..
SELECT D.[id], D.productname
FROM dbo.Products AS D
WHERE CONTAINS(productname, 'ford')
What should I do to get the result in 3-4 seconds?
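One thing worth trying, sketched below: CONTAINSTABLE with its top_n_by_rank argument asks full-text for only the best N matches up front, which usually cuts response time sharply on large catalogs (whether a TOP-100 result is acceptable here is an assumption).
SELECT TOP (100) D.[id], D.productname
FROM dbo.Products AS D
JOIN CONTAINSTABLE(dbo.Products, productname, 'ford', 100) AS K
    ON K.[KEY] = D.[id]
ORDER BY K.RANK DESC;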
View 1 Replies
View Related
Apr 25, 2014
Sample Table
USE [Testing]
GO
/****** Object: Table [dbo].[Testing] Script Date: 4/25/2014 11:08:18 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
[Code] ....
It seems to work fine with one million records.
Each primary key is unique, but the begindate is non-unique, and I guess even if I use datetime2 and add nanoseconds, from what I have read there is a chance I could get a duplicate datetime, since the date is imported via XML from multiple sources.
View 7 Replies
View Related
Oct 24, 2007
Hi,
I have a table with 50+ columns.
I have a single record, and I want to find any rows that are the same as this one, but without writing 50 WHERE clauses.
Any suggestions?
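An INTERSECT sketch (the table name, key column Id, and columns c1..c3 standing in for the 50 real ones are all assumptions): INTERSECT compares every listed column at once and, unlike chained = predicates, treats two NULLs as equal, so the column list is written once instead of as 50 WHERE clauses.
DECLARE @id INT
SET @id = 1   -- key of the reference record

SELECT t.Id
FROM dbo.MyTable AS t
WHERE t.Id <> @id                         -- skip the reference row itself
  AND EXISTS (SELECT t.c1, t.c2, t.c3     -- the row being tested
              INTERSECT
              SELECT s.c1, s.c2, s.c3     -- the reference row
              FROM dbo.MyTable AS s
              WHERE s.Id = @id);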
View 7 Replies
View Related
May 9, 2008
Hey guys, I have a contacts table that contains ID, First Name, Last Name, Phone Number, Date Entered, and Changed. Every time the data is modified and saved, a new record is inserted into the table. So I'll create a new record for a contact named Ryan, and then come back a day later and update the last name and phone number. The SQL table would look like:
1 Ryan Scott 818-550-0000 05/08/2008 Null
2 Ryan Peters 000-000-0000 05/09/2008 Null
How do I write a SQL query that will run an update after the insert of the second record to fill in the Changed field with the names of the fields that changed? So I want record 2 to end up looking like this:
2 Ryan Peters 000-000-0000 05/09/2008 LastName,PhoneNumber
Any ideas?
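A sketch of the after-insert update. It assumes a ContactId that ties versions of the same contact together, which the post's per-row ID doesn't provide, so treat every name here as hypothetical: find the newest prior version and concatenate the names of the columns whose values differ.
DECLARE @newRowId INT
SET @newRowId = 2   -- the row just inserted

UPDATE cur
SET Changed = STUFF(
        CASE WHEN cur.FirstName <> prev.FirstName THEN ',FirstName' ELSE '' END
      + CASE WHEN cur.LastName <> prev.LastName THEN ',LastName' ELSE '' END
      + CASE WHEN cur.PhoneNumber <> prev.PhoneNumber THEN ',PhoneNumber' ELSE '' END,
      1, 1, '')    -- strip the leading comma; yields NULL when nothing changed
FROM dbo.Contacts AS cur
JOIN dbo.Contacts AS prev
    ON prev.ContactId = cur.ContactId
   AND prev.DateEntered = (SELECT MAX(p.DateEntered)
                           FROM dbo.Contacts AS p
                           WHERE p.ContactId = cur.ContactId
                             AND p.DateEntered < cur.DateEntered)
WHERE cur.ID = @newRowId;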
View 8 Replies
View Related
Jul 3, 2001
Thanks for your help...
I have two databases on two different servers. I am running this script, which shows customers not in the second server, and I am getting the error shown below. Any idea how to solve this issue?
Ali
CREATE view v_show_customers_not_in_GP as
select customer_id,company_name,contact_fname,contact_lname,phone,alt_phone,fax,email,street_1,street_2,city,c.name,s.code as state,zip_code
FROM customer v,country c, state s
WHERE v.country_id =c.country_id
and v.state_id = s.state_id
and convert(char(15),customer_id ) NOT IN (select custnmbr from servername.dbname.dbo.RM00101 )
Server: Msg 18452, Level 14, State 1, Line 1
Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
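Error 18452 usually means the linked server has no login mapping for the calling login, so the remote server sees an anonymous (NULL) login. A sketch of mapping all local logins to a fixed remote SQL login (server, user, and password below are placeholders):
EXEC sp_addlinkedsrvlogin
     @rmtsrvname = N'servername',
     @useself = 'false',
     @locallogin = NULL,        -- NULL = applies to every local login
     @rmtuser = N'remoteUser',
     @rmtpassword = N'remotePwd';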
View 1 Replies
View Related
Sep 29, 2015
I need to compare records between two tables. There is no ID in the tables to do a simple join between them. So what I'm looking for is: take the first record from table1, read all the records from table2, and give me back the most similar record. StringDistance is a predefined function.
Select a.table1, b.table2
from table1 a, table2 b
where StringDistance(a.table1, b.table2) > 90
View 4 Replies
View Related
Jul 31, 2015
Below is the code for two data sets, and I can't seem to get my head around the issue. I need to find the number of 'ER' visits and 'IN' visits, separately, in dbo.VisitData for the 'Active' patients in dbo.PatientStatus. So, consider patient 69: he is Active on 5/5/2014 but becomes Inactive on 9/15/2014. I only want to count the ER or IN visits that fall between those dates. In addition, if patient 69 becomes Active again after 9/15/2014, I need to capture that data as well. Patients can change their status multiple times.
create table dbo.PatientStatus
(
patient_id varchar(10),
status_type varchar(10),
status_date datetime
[Code] ....
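A sketch of one approach on SQL Server 2012 or later: derive each Active interval with LEAD, treating a still-open status as running to today, then count the ER/IN visits that fall inside it. Column names follow the posted DDL where it exists; the dbo.VisitData layout (visit_type, visit_date) is an assumption.
WITH spans AS (
    SELECT patient_id, status_type, status_date,
           LEAD(status_date, 1, GETDATE())
               OVER (PARTITION BY patient_id ORDER BY status_date) AS next_date
    FROM dbo.PatientStatus
)
SELECT s.patient_id,
       SUM(CASE WHEN v.visit_type = 'ER' THEN 1 ELSE 0 END) AS er_visits,
       SUM(CASE WHEN v.visit_type = 'IN' THEN 1 ELSE 0 END) AS in_visits
FROM spans AS s
JOIN dbo.VisitData AS v
    ON v.patient_id = s.patient_id
   AND v.visit_date >= s.status_date
   AND v.visit_date < s.next_date
WHERE s.status_type = 'Active'
GROUP BY s.patient_id;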
View 2 Replies
View Related
Jul 20, 2005
I have two tables of book information: one that has descriptions of the books in it plus the ISBN, and another that has the book title, inventory data, prices, and the ISBN. Because of some technical constraints I won't get into now, I can't combine them both into one table. No problem; things are going fine as long as there is a description in the one table to correspond to the ISBN and other data in the other table. However, about half of the products are not yet entered into the description table. I'd like to run a SQL query that pulls up all the ISBNs in one table that don't exist in the other. In other words, I'd like a query that tells me exactly which ISBNs do not yet have description data. I know there is some SQL that selects from one table where the value does not exist in the other, but it slips my mind. Can someone help me with this please? Thank you! Bill
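The pattern being reached for is NOT EXISTS; a sketch with assumed table and column names:
SELECT b.isbn, b.title
FROM dbo.BookInventory AS b
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.BookDescriptions AS d
                  WHERE d.isbn = b.isbn);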
View 2 Replies
View Related
Nov 20, 2015
I have this 40,000,000-row 'Contacts' table that I am trying to clean, since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but NOT matched: (lastname, phone, mobile).
MATCHED: (email, firstname, mobile) but NOT matched: (lastname, phone).
MATCHED: (email, firstname, lastname) but NOT matched: (phone, mobile).
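A counting sketch for the first rule (the table name is an assumption): group on the columns that must match and keep groups where a non-matching column actually differs; swapping the column lists covers the other two rules.
SELECT email, firstname, COUNT(*) AS dupes
FROM dbo.Contacts
GROUP BY email, firstname
HAVING COUNT(*) > 1
   AND COUNT(DISTINCT lastname) > 1;   -- same email/firstname, differing lastname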
View 9 Replies
View Related
Aug 21, 2015
I have a scenario where I compare previous records based on each ID column. For each ID there are a few records, and I have a column called "Compare": all Compare=1 records have to be compared with the Compare=0 record. If Dt is less than or equal to the comparing Dt, show 0; else show 1.
There is always exactly one Compare=0 record per ID in my table, so all Compare=1 rows compare against that single row.
My tables look like
Declare @tab1 table (ID Varchar(3), Dt Date, Compare Int)
Insert Into @tab1 values ('101','2015-07-01',0)
Insert Into @tab1 values ('101','2015-07-02',1)
Insert Into @tab1 values ('101','2015-07-03',1)
Insert Into @tab1 values ('101','2015-07-01',1)
Insert Into @tab1 values ('101','2015-06-30',1)
Insert Into @tab1 values ('102','2015-07-01',0)
Insert Into @tab1 values ('102','2015-07-02',1)
Insert Into @tab1 values ('102','2015-07-01',1)
select * from @tab1
1.) In the above scenario, for ID = '101' we have 5 records; the first record has Compare value 0, which means all other 4 records need to compare against this record only.
2.) If a Compare=1 record's Dt is less than or equal to the Compare=0 record's Dt, show 0 in the next column.
3.) If a Compare=1 record's Dt is greater than the Compare=0 record's Dt, show 1 in the next column.
My expected result set should be like ....
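A direct sketch of that rule against the posted sample: join every Compare=1 row to its ID's single Compare=0 row and emit 0 or 1 from a CASE.
SELECT t.ID, t.Dt, t.Compare,
       CASE WHEN t.Dt <= b.Dt THEN 0 ELSE 1 END AS Result
FROM @tab1 AS t
JOIN @tab1 AS b
    ON b.ID = t.ID
   AND b.Compare = 0
WHERE t.Compare = 1;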
View 10 Replies
View Related
Jun 2, 2015
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: a legacy system that has (well, let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fairly decent amount of garbage data to be inserted in some tables. I am pulling the [Individuals] table from within this legacy system and inserting it into a production system, into the table schema currently in place to track [Individuals] in this production system.
Problem: Inserting the information is easy; the question is how to deduplicate the records within the staging table that the legacy [Individuals] table has been dumped into, prior to insertion. (Wanting to do this programmatically, with SQL or SSIS preferably, so that I can alter it later to allow for updating existing/inserting new.)
Staging Table Schema:
;
CREATE TABLE [dbo].[stage_Individuals](
[SysID] [int] NULL, --Unique, though it's not an index intended to identify the [Individuals]
[JJISID] [nvarchar](10) NULL,
[NameLast] [nvarchar](30) NULL,
[NameFirst] [nvarchar](30) NULL,
[NameMiddle] [nvarchar](30) NULL,
[code]....
Scenario: There are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SYSID is just a Clustered Index (I'm assuming) within the Legacy system and will be most likely dropped when inserted into the Production [Inviduals] table. There are records that are missing their JJISID, though this isn't supposed to happen either, but have valid information within SSN/DOB/Name/etc that can be merged into the correct record that has a JJISID assigned. There is really no data conformity, some records have NULLS for everything except JJISID, or some records will have all the [Individuals] information excluding the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have other's that partition by Name/DOB/etc and will adapt whatever I come up with to be used for those as well):
;
select j.*
from (select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum, stage_Individuals.*, COUNT(*) OVER (partition by jjisid) as cnt from stage_Individuals) as j
where cnt > 1 and j.JJISID is not null
Now, with SQL Server 2012 or later I could use LAG and LEAD with the RowNum value to do my data manipulation... but that won't work because we are on SQL Server 2008 in this environment.
[URL]
With, the following as a potential solution:
GSquared (3/16/2010): Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine-tune that. You'll need to look at various methods for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
;
WITH CTE
AS (SELECT master_id,
MIN(ID) AS first_id,
MAX(Account_Expiry) AS latest_expiry
FROM #People
GROUP BY master_id)
SELECT P1.master_id,
[code].....
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically what I'm looking for, is to update any column with a NULL value to the corresponding Duplicate [Individuals] record value for that column.
**EDIT - Example: Record 1 has a JJISID with NULL NameFirst & NameLast, BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I'm wanting to propagate the NameFirst & NameLast from Record 2 into Record 1.
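A 2008-compatible sketch of that propagation: for each JJISID take the MAX of each column across its duplicates (MAX ignores NULLs) and use it to fill only the NULL cells. Where duplicates disagree on a non-NULL value, MAX just picks the larger one, so check that first. Only the two columns from the example are shown; the others follow the same pattern.
UPDATE s
SET s.NameFirst = COALESCE(s.NameFirst, d.NameFirst),
    s.NameLast = COALESCE(s.NameLast, d.NameLast)
FROM dbo.stage_Individuals AS s
JOIN (SELECT JJISID,
             MAX(NameFirst) AS NameFirst,
             MAX(NameLast) AS NameLast
      FROM dbo.stage_Individuals
      WHERE JJISID IS NOT NULL
      GROUP BY JJISID) AS d
    ON d.JJISID = s.JJISID;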
View 6 Replies
View Related
Jun 10, 2015
I have a problem where I have to compare 2 records from the same table. That part looks easy, but the problem is that for a user there can be multiple records, and I have to compare each record with its previous instance based on the timestamp. Not only do I have to compare, I have to perform some analysis. Below is the table script and sample output.
Givens: All SQL Server 2008 or 2012 tools at your disposal.
Production database contains the following tables (simplified for example: constraints ignored, etc.) associated with a racing video game’s server.
-- A player of our game
-- Table greater than 10 million rows
CREATE TABLE [dbo].[User]
(
     [UserId] [bigint] NOT NULL
    ,[country] [int] NULL              -- User's home country
    ,[name] [nvarchar](15) NULL        -- User's displayable name ('John', 'Bill')
    ,[subscriptionTier] [int] NULL     -- 0 == free, 1 == paid, for instance
)
Assume that rows get written into the event tables at a rate of 1,000 a minute, are never updated once written, and currently are only read on a replica/reporting server.
Question Background: Write up a single query that would return the following: a list of users whose "TotalMoneyEarned" value ever grew (between logon events) at a rate of more than 1,000 per minute (we'd consider these suspicious and flag them for later investigation).
For instance, if the sample data were:
-- example of [Events.UserLogon] data -- not the query output we want
EventId UserId TotalMoneyEarned LogonDate
----------- -------------------- ---------------- -----------------------
1 1 1000 2010-10-16 00:19:56.460
2 1 1500 2010-10-16 00:20:56.460
3 1 3000 2010-10-16 00:21:56.460
4 1 10000 2010-10-16 00:29:56.460
Event 1 is okay because there’s nothing to compare it against
Event 2 is okay because the TotalMoneyEarned only grew 500 in a minute
Event 3 should be flagged, as the value grew 1500 in a minute
Event 4 is okay, as it grew 7,000 in 8 minutes (< 1000 per minute)
Query Output (your query should return data in a format like this):
User Flagged Logon Time Rate Since Last Logon (money/minute)
John 2010-10-16 00:21:56 1500
Dave 2010-10-16 00:30:50 3200
Bill 2010-10-16 00:35:23 1000
It is likely that you will need to create sample data for both the User and [Events.Logon] tables. We are looking for a single query that returns data like what is represented in Query Output.
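A 2008-compatible sketch of such a query (no LAG): pair each logon with its previous one via ROW_NUMBER, then keep pairs whose earn rate tops 1,000 per minute. The Events.UserLogon name follows the sample data; joining to [dbo].[User] for the display name is assumed.
WITH ordered AS (
    SELECT UserId, TotalMoneyEarned, LogonDate,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY LogonDate) AS rn
    FROM Events.UserLogon
)
SELECT u.name AS [User],
       cur.LogonDate AS [Flagged Logon Time],
       (cur.TotalMoneyEarned - prev.TotalMoneyEarned)
           / (DATEDIFF(SECOND, prev.LogonDate, cur.LogonDate) / 60.0)
           AS [Rate Since Last Logon (money/minute)]
FROM ordered AS cur
JOIN ordered AS prev
    ON prev.UserId = cur.UserId
   AND prev.rn = cur.rn - 1
JOIN [dbo].[User] AS u
    ON u.UserId = cur.UserId
WHERE DATEDIFF(SECOND, prev.LogonDate, cur.LogonDate) > 0   -- guard against zero intervals
  AND (cur.TotalMoneyEarned - prev.TotalMoneyEarned)
      / (DATEDIFF(SECOND, prev.LogonDate, cur.LogonDate) / 60.0) > 1000;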
View 3 Replies
View Related
Jun 6, 2012
I need to compare the contents of a column.
I have a table with 1 column. Below are the sample contents..
DateInterval
2012-06-01 00:30:00.000
2012-06-01 01:00:00.000
2012-06-01 01:30:00.000
2012-06-01 02:00:00.000
2012-06-01 02:30:00.000
[code]....
As you can see, the records have a 30-minute interval. I need a query that tells me whether there are missing records in the table. So basically the result should be this:
DateInterval
2012-06-01 18:00:00.000
2012-06-01 20:30:00.000
If that is not doable, we can get the nearest record to each missing record, like this:
DateInterval
2012-06-01 17:30:00.000
2012-06-01 20:00:00.000
or this:
DateInterval
2012-06-01 18:30:00.000
2012-06-01 21:00:00.000
or this:
DateInterval
2012-06-01 17:30:00.000
2012-06-01 18:30:00.000
2012-06-01 20:00:00.000
2012-06-01 21:00:00.000
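A gap-finding sketch (the table name dbo.Readings is a placeholder): generate every expected 30-minute slot between the table's min and max with a recursive CTE, then keep the slots that have no matching row; this yields the first, exact form of the result.
WITH slots AS (
    SELECT MIN(DateInterval) AS slot, MAX(DateInterval) AS last_slot
    FROM dbo.Readings
    UNION ALL
    SELECT DATEADD(MINUTE, 30, slot), last_slot
    FROM slots
    WHERE slot < last_slot
)
SELECT s.slot AS DateInterval
FROM slots AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Readings AS r
                  WHERE r.DateInterval = s.slot)
OPTION (MAXRECURSION 0);   -- lift the default 100-level recursion cap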
View 6 Replies
View Related
Aug 4, 2015
I am creating an SSIS package where my source is Oracle. I have transferred the data from Oracle to a flat file as per the client's requirement. I have to create a single package for 2 countries, one US and the other CANADA. The columns are below:
ZONE_ID,
ZONE_NAME
Zone Id has data like 10001, 10002, 10003, 20001, 2002, 2003,
where a Zone_Id starting with 1000 is a US zone and a Zone_Id starting with 2000 is a Canada zone.
For US:
1. Load geography data from DB tables into flat files
2. Load geography data from flat files to Spectrum DB tables
For Canada:
1. Load geography data from DB tables into flat files
2. Load geography data from flat files to Spectrum DB tables
Now, reading from the flat file, if Zone_Id starts with 1000 the row must go to US_DFT, and if Zone_Id starts with 2000 it must go to CANADA_DFT.
View 2 Replies
View Related
Apr 17, 2008
Hello friends,
I have the following (simplified):
1. Flat File Source
2. Conditional Split, Case Good = !ISNULL(KEY) Case Error = ISNULL(KEY)
3. Case Good -> Writes to Good Flat File (with timestamp in the title)
4. Case Error -> Writes to Error Flat File (with timestamp in the title)
Most job runs have no errors but the error file is created as a zero byte file anyway. If there are no error records I don't want the error file created. How might I accomplish this?
Thanks
View 5 Replies
View Related
May 26, 2004
Hi everybody,
I have an application that generates a lot of rows, from 1 million to 2 million.
I want to insert these records into MS SQL Server in a fast way.
I currently loop through the records while they are loaded in a dataset,
building a command text that generates an INSERT query for each row,
and run it against SQL Server,
but it takes a lot of time to finish.
Is there a way to bulk insert this data?
Thanks for your help.
Bolos
View 7 Replies
View Related
Jul 27, 2007
Hi everyone,
I have been trying to store millions of files (layer tiles) in NTFS for a while now with little success, as the read/write speed drops off the chart or NTFS itself gets corrupted. My questions are:
1) Is there any way to store millions of files successfully in NTFS?
2) What does VE use to store its tiles (I am guessing SQL Server)?
3) If VE does not use SQL Server to store tile images, has anyone tried it and what are the pros/cons?
Any help would be greatly appreciated...
Thanks in advance,
Matt
View 2 Replies
View Related
Feb 21, 2008
Hi everyone - I have an ETL package which loads about 10 million rows from SQL 2005 staging tables into new, empty tables (no indexes or constraints) in another SQL 2005 DB, to be SWITCHed into the main partitioned data tables.
Both databases reside on the same SQL Server instance - it is a dev server, so the disks aren't super fast/SAN speeds, but it has plenty of RAM/CPU and SCSI disks.
The insert takes about 45 minutes - can I get this working any faster, or is this typical for 10 million rows? I've messed about with the data flow a few times but I can't seem to get any significant improvements.
Any tips anyone??
I perform several lookups on dimensions - these are not cached.
I query the source table concurrently with different WHERE clauses and run two pipelines processing the data into 2 destination tables.
Would it be better to query the base table once and use a conditional split instead of the two separate queries?
I also multicast from each pipeline and use a UNION ALL to log some of the rows from each pipeline to another destination table.
Hope this makes sense - any ideas or tips on how I can speed up this kind of transform would be appreciated.
I'm using oleDB connections.
Hope this makes some kind of sense! Thanks for any advice!
Sinister Pengiun
View 5 Replies
View Related
Oct 11, 2007
Hello, currently we are in the process of implementing a SQL Server database where a couple of tables will have millions of rows (about 98 million, and growing) and a web site that will retrieve and sort the data (read only). How do the ASP.NET GridView and SqlDataReader behave in a situation like that? Will the response be very slow? Is there any alternative? Is there any example on the net?
Assuming tables are well tuned and well indexed.
Thank you in advance.
View 4 Replies
View Related
Aug 10, 2006
Hi,
I have a DataTable in memory and I want to write C# code to dump the data into a SQL database. Is there a faster way of dumping millions of rows into a SQL table besides running INSERT INTO row by row?
Thank you,
Jina
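Row-by-row INSERT is the slow path here; the usual answers are SqlBulkCopy from C#, or, if the data can be staged to a flat file first, T-SQL BULK INSERT. A minimal sketch of the latter (path, table, and delimiters are assumptions):
BULK INSERT dbo.TargetTable
FROM 'C:\data\rows.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      TABLOCK,               -- bulk-update lock enables minimal logging
      BATCHSIZE = 100000);   -- commit in chunks to keep the log manageable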
View 3 Replies
View Related
Mar 27, 2006
I have a table with first name, last name, SSN (social security number) and other columns. I want to assign a group number according to this business logic:
1. Records with equal SSN and (similar first name or last name) belong to the same group.
John Smith 1234
Smith John 1234
S John 1234
J Smith 1234
John Smith and Smith John fall in the same group number as long as they have the same SSN. This is because I have records with equal SSN but the first name and last name switched, from people who make the error of entering the last name as first name and vice versa. John Smith and Smith John will have equal group numbers if they have equal SSN.
2. There are records with equal SSN but different first name and last name. These belong to different group numbers. An equal SSN doesn't guarantee an equal group number; at least one of the first name or last name should be the same. John Smith and Dan Brown with equal SSN=1234 shouldn't fall in the same group number.
Sample data:
Id Fname Lname SSN grpNum
1 John Smith 1234 1
2 Smith John 1234 1
3 S John 1234 1
4 J Smith 1234 1
5 J S 1234 1
6 Dan Brown 1234 2
7 John Smith 1111 3
I have tried this code for 65,000 rows. It took 20 minutes, and I have to run it for 21 million rows. I know that this is not efficient code.
INSERT into temp_FnLnSSN_grp
SELECT c1.fname, c1.lname, c1.ssn AS ssn, c3.tu_id,
(SELECT 1 + count(*)
FROM distFLS AS c2
WHERE c2.ssn < c1.ssn
or (c2.ssn = c1.ssn and (substring(c2.fname,1,1) = substring(c1.fname,1,1) or substring(c2.lname,1,1) = substring(c1.lname,1,1)
or substring(c2.fname,1,1) = substring(c1.lname,1,1) or substring(c2.lname,1,1) = substring(c1.fname,1,1)))) AS group_number
FROM distFLS AS c1
JOIN tu_people_data AS c3
ON (c1.ssn = c3.ssn and
c1.fname = c3.fname and
c1.lname = c3.lname)
distFLS is the distinct first name, last name, SSN table from the people table.
I have posted part of this question and the schema one week ago. Please refer to this thread: http://groups.google.com/group/comp...6eb380b5f2e6de6
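A faster sketch on SQL Server 2005 or later, under a simplifying assumption: build an order-insensitive pair of initials per row, so swapped first/last names map to the same key, then number the groups with DENSE_RANK instead of a correlated COUNT(*) per row. Note this is stricter than the stated rule (it requires both initials to match in some order, not just one), so treat it as a starting point rather than a drop-in replacement.
SELECT fname, lname, ssn,
       DENSE_RANK() OVER (ORDER BY ssn, initials_key) AS group_number
FROM (SELECT fname, lname, ssn,
             CASE WHEN SUBSTRING(fname,1,1) <= SUBSTRING(lname,1,1)
                  THEN SUBSTRING(fname,1,1) + SUBSTRING(lname,1,1)
                  ELSE SUBSTRING(lname,1,1) + SUBSTRING(fname,1,1)
             END AS initials_key            -- order-insensitive initials pair
      FROM distFLS) AS keyed;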
View 5 Replies
View Related