I'm trying to compare about 28 million records (270 length) from tables A and B using the Lookup task as described in this forum. The process works fine with about two million records on my desktop (P4 3.39 GHz, 1.5 GB RAM), but hangs with the amount of data I'm trying to process. I tried both full and partial caching, to no avail. I think this is a hardware resource problem. Does anyone have a recommendation on the hardware needed for this kind of operation, or any other suggestion? Thanks in advance...
Hi all, I was given a task to create householding logic on a table that has millions of records. First let me explain what householding is: say I have 2 records that have the same phone number; that means both records are under the same household. But this can get more complicated. This article explains it: http://www.teradata.com/t/page/115924/index.html
If anyone has worked with householding, you know that you need to scan the table many times to get all the households. I used a DTS package to do it. I tested the DTS package on 11 records like the article did and that worked great, but once I went to a million records each loop takes 2 hours or so, and I have no idea how many loops I will have to do. If anyone out there has worked with household queries and used SQL, your input would help me a lot.
Thanks.
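One way to avoid repeated full-table passes is to treat householding as a connected-components problem: records that share a linking value (directly or through a chain of other records) end up in one group. Below is a minimal union-find sketch of that idea in Python. The record layout, the choice of `phone` and `address` as linking keys, and the sample rows are all assumptions made for illustration; they are not taken from the post or the linked article.

```python
def find(parent, x):
    # Follow parent links to the root, shortening the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    # Keep the smaller id as the root so household ids are stable.
    if ra < rb:
        parent[rb] = ra
    else:
        parent[ra] = rb


def households(records, keys=("phone", "address")):
    """Map each record id to a household id (the smallest id in its group)."""
    parent = {r["id"]: r["id"] for r in records}
    first_seen = {}  # (key name, value) -> first record id seen with that value
    for r in records:
        for k in keys:
            v = r.get(k)
            if v is None:
                continue  # missing values never link records
            if (k, v) in first_seen:
                union(parent, first_seen[(k, v)], r["id"])
            else:
                first_seen[(k, v)] = r["id"]
    return {rid: find(parent, rid) for rid in parent}


# Hypothetical sample: 1 and 2 share a phone, 2 and 3 share an address.
records = [
    {"id": 1, "phone": "555-0100", "address": "1 Elm St"},
    {"id": 2, "phone": "555-0100", "address": "9 Oak Ave"},
    {"id": 3, "phone": "555-0199", "address": "9 Oak Ave"},
    {"id": 4, "phone": "555-0142", "address": "7 Pine Rd"},
]
result = households(records)
```

This makes a single pass over the records, so the number of "loops" no longer depends on how long the chains of linked records are. For a table with millions of rows the same idea would need the id map to fit in memory or be adapted to a set-based form inside the database.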
While doing a migration using cursors on the sample data given below, the process takes hours to complete. I would therefore like to know whether there is a way to do it in a simple query.
I have only the Amount value; the Balance has to be calculated based on CalType. The Balance also has to be reset to 0 when AcNo changes.
Hi, there are about 30 million records on my MSSQL server and I want to access 2 million of them at one time. However, when I try to access them with a SQL command I get a timeout error. I want to select the first 100 records, then the next 100, and so on. Can I do this? For example:
select * from tbl_Customer where name = @name_ -> timeout error
Someone said that you can solve this problem with cursors, but I can't find enough articles about it. Thanks...
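Reading 100 rows at a time can be done without a cursor by remembering the last key seen and asking for the next rows after it (often called keyset paging). The sketch below runs against an in-memory SQLite database purely so it is runnable; the `id` key column and the sample rows are assumptions. On SQL Server the same shape would be written with `SELECT TOP (100) ... WHERE id > @last_id ORDER BY id`.

```python
import sqlite3


def fetch_pages(conn, name, page_size=100):
    """Yield lists of (id, name) rows, page_size at a time, in key order."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, name FROM tbl_Customer "
            "WHERE name = ? AND id > ? ORDER BY id LIMIT ?",
            (name, last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last key of this page


# Hypothetical data: 250 customers named Smith and 10 named Jones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tbl_Customer (name) VALUES (?)",
    [("Smith",)] * 250 + [("Jones",)] * 10,
)
page_sizes = [len(page) for page in fetch_pages(conn, "Smith")]
```

Each page is a short, index-friendly query, so no single statement has to run long enough to hit the timeout. An index covering the filter column and the key would likely be needed for this to stay fast on 30 million rows.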
I have a query to delete millions of records. I want to delete in batches of 1,000. My SELECT with a join returns millions of records, so it takes a lot of time. How do I select 1,000 records, delete everything that is not in those records, and loop without selecting the same records again? Here is what I have:
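The poster's code is not shown, so the following is only a general sketch of a batched delete: each pass removes at most one batch and commits, and rows that are already gone can never be selected again. The `orders` table, its `created` column and the cutoff are hypothetical, and SQLite stands in for SQL Server so the example runs. On SQL Server 2005 and later the usual equivalent is a loop around `DELETE TOP (1000) ...` that stops when `@@ROWCOUNT` is 0.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with created < cutoff, one batch per transaction."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM orders WHERE id IN ("
            "  SELECT id FROM orders WHERE created < ? ORDER BY id LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # keep each transaction (and its log usage) small
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Hypothetical data: 2,500 old rows and 500 recent ones.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO orders (created) VALUES (?)",
    [("2007-01-01",)] * 2500 + [("2008-06-01",)] * 500,
)
deleted = delete_in_batches(conn, "2008-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```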
The issue (SQL 2K): I have to keep the data from the last 3 months in the database. Every day I have to load 2 million records into the database, so every day I also have to export (to another database used as a historical data container) and delete the 2 million records inserted 3 months + one day ago. The main problem is that the delete operation takes a while because it involves the transaction log.
The questions are:
1) How can I improve this operation (export/delete)?
2) If we decide to migrate to SQL 2005, can we use some feature such as "partitioning" to resolve the problem? In Oracle I can use the "truncate partition" statement, but from what I'm reading, in SQL 2005 it can't be done. We ask because we are thinking of creating a partition on the last three months to split the data. Can the partitioning function be dynamic, or contain a function that says "last 3 months"? I don't think so.
Can you help us?
Thank you,
Mastino
Hello all, quick question. I'm looking for the most efficient way to extract data daily from a table with some 9.5 million records and growing. These are transaction records, and ideally I would like to bring over the last day's transactions and add them to my existing table. I cannot use the transaction date, as sometimes we have to operate in an "offline" mode where the records are brought over some time later. This could be days or, unfortunately, a week or more. There are some 30 fields in the transaction table, so is there a more efficient way to do this than simply creating a concatenated key? Would it be more efficient to drop and recreate the table daily? That sounds extreme, so I wanted to get a few ideas.
) AS BEGIN SET NOCOUNT ON --Exception Handling Variable Declaration DECLARE @ErrorMessage NVARCHAR(200), @ErrorNumber INT, @ErrorSeverity INT, @ErrorState INT, @ErrorProcedure NVARCHAR(50), @ErrorLine INT, @ErrorDesc NVARCHAR(100)
SET @ErrorDesc='Error Occured While Inserting into TIX_PAYMENT_SCHEDULE FROM XML'
INSERT INTO TIX_PAYMENT_SCHEDULE ( OwedAmountId, ProposalId, BrandId, DueDate, OverdueDate , CreatedDateTime, LastUpdatedDateTime, ExpectedAmount, ActualAmountReceived, ScheduleBatchJournalId, RuleId, TransactionStatusId, ActionId, IsLate, IsPaymentReceived , IsValidSchedule, --Added by DC : 119 IsCatchupBalanced, CatchupBalanceIdentifier, HasModified --------------------------------------------------- ) SELECT Main.ELEMENT.value('(OwedAmountId)[1]','int') AS OwedAmountId, Main.ELEMENT.value('(ProposalId)[1]','int') AS ProposalId, Main.ELEMENT.value('(BrandId)[1]','int') AS BrandId, convert(datetime,Main.ELEMENT.value('(DueDate)[1]','varchar(100)')) AS DueDate, convert(datetime,Main.ELEMENT.value('(OverdueDate)[1]','varchar(100)')) AS OverdueDate, @ToDate AS CreatedDateTime, @ToDate AS LastUpdatedDateTime, convert(decimal(18,2),Main.ELEMENT.value('(ExpectedAmount)[1]','varchar(100)')) AS ExpectedAmount, convert(decimal(18,2),Main.ELEMENT.value('(ActualAmountReceived)[1]','varchar(100)')) AS ActualAmountReceived, Main.ELEMENT.value('(ScheduleBatchJournalId)[1]','bigint') AS ScheduleBatchJournalId, Main.ELEMENT.value('(RuleId)[1]','int') AS RuleId, Main.ELEMENT.value('(TransactionStatusId)[1]','int') AS TransactionStatusId, Main.ELEMENT.value('(ActionId)[1]','int') AS ActionId, Main.ELEMENT.value('(IsLate)[1]','char(1)') AS IsLate, Main.ELEMENT.value('(IsPaymentReceived)[1]','char(1)') AS IsPaymentReceived, Main.ELEMENT.value('(IsValidSchedule)[1]','char(1)') AS IsValidSchedule
--Added by DC for 119
,Main.ELEMENT.value('(IsCatchupBalanced)[1]','char(1)') AS IsCatchupBalanced ,Main.ELEMENT.value('(CatchupBalanceIdentifier)[1]','nvarchar(1000)') AS CatchupBalanceIdentifier ,@HasModified ---------------------------------------------------------------------
FROM @XMLParams.nodes ('(/ROOT/DATA)') AS Main(ELEMENT)
SELECT @ErrorMessage = @ErrorDesc+Char(13)+Error_Message(), @ErrorSeverity = Error_Severity(), @ErrorState = Error_State(), @ErrorNumber = Error_Number(), @ErrorProcedure = Error_Procedure(), @ErrorLine = Error_Line() RAISERROR( @ErrorMessage, @ErrorSeverity, @ErrorState, @ErrorNumber, @ErrorProcedure, @ErrorLine ) END CATCH --Main END CATCH END --Main END
BEGIN TRY --Exception Handling SET @ErrorDesc='Error Occured while fetching records from TIX_PAYMENT_SCHEDULE'
SELECT PaymentScheduleId, OwedAmountId, ProposalId, DueDate, OverdueDate, ExpectedAmount, TransactionStatusId, IsPaymentReceived, IsLate, ActionId, ActualAmountReceived, IsValidSchedule, BrandId, CaseScheduleId, ReasonId, Comments, NoOfDays, ActionDate, IsCatchupBalanced, CatchupBalanceIdentifier, HasModified from TIX_PAYMENT_SCHEDULE with (nolock) WHERE DUEDATE <=@ToDate AND IsValidSchedule=@IsValidSchedule
SELECT DISTINCT OwedAmountId,proposalId,brandId from TIX_PAYMENT_SCHEDULE with (nolock) WHERE DUEDATE <=@ToDate AND IsValidSchedule=@IsValidSchedule Order By OwedAmountId,ProposalId,BrandId asc SELECT DISTINCT ProposalId from TIX_PAYMENT_SCHEDULE with (nolock) WHERE DUEDATE <=@ToDate AND IsValidSchedule=@IsValidSchedule Order By ProposalId asc
END TRY BEGIN CATCH SELECT @ErrorMessage=@ErrorDesc+CHAR(13)+ Error_Message(), @ErrorNumber=Error_Number(), @ErrorState=Error_State(), @ErrorProcedure=Error_Procedure(), @ErrorLine=Error_Line(), @ErrorSeverity=Error_Severity()
I want to delete 30-40 million rows from a transactional table. What's the fastest way to delete these rows? Deleting just 300,000 rows takes 30 minutes. Also, I don't want to truncate the table.
Hi, my full-text search on 2 million records is taking a long time to show the result. I have created the full-text catalog on a RAM drive to make the retrieval process faster, but it still takes more than 1 minute to get the matching pattern. I am using SQL Server 2005. I have 2 columns (id, text) in my table.
This is my unique index script
This is my query..
SELECT D.[id], D.productname FROM dbo.Products AS D WHERE CONTAINS(productname, 'ford')
What should I do to show the result in 3-4 seconds?
USE [Testing] GO /****** Object: Table [dbo].[Testing] Script Date: 4/25/2014 11:08:18 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON
[Code] ....
It seems to work fine with one million records.
Each primary key is unique, but the begindate is non-unique, and i guess even if i use datetime2 and add nanoseconds, from what i have read, there is a chance that i could have a duplicate datetime since the date is imported via XML from multiple sources.
Hey guys, I have a contacts table that contains ID, First Name, Last Name, Phone Number, Date Entered, and Changed. Every time the data is modified and saved, a new record is inserted in the table. So I'll create a new record for a contact named Ryan, and then come back a day later and update the last name and phone number. The SQL table would then look like this:
1 Ryan Scott 818-550-0000 05/08/2008 Null
2 Ryan Peters 000-000-0000 05/09/2008 Null
How do I write a SQL query that will run an update after the insert of the second record to fill in the Changed field with the data that changed? I want record 2 to end up looking like this:
2 Ryan Peters 000-000-0000 05/09/2008 LastName,PhoneNumber
Any ideas?
Thanks for your help... I have two databases in two different servers, I am running this script which shows customers not in the second server. I am getting an error shown below. any idea of how to solve this issue. Ali
CREATE view v_show_customers_not_in_GP as
select customer_id, company_name, contact_fname, contact_lname, phone, alt_phone, fax, email, street_1, street_2, city, c.name, s.code as state, zip_code
FROM customer v, country c, state s
WHERE v.country_id = c.country_id and v.state_id = s.state_id
and convert(char(15), customer_id) NOT IN (select custnmbr from servername.dbname.dbo.RM00101)
Server: Msg 18452, Level 14, State 1, Line 1 Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
I need to compare records between two tables. There is no ID in the tables to do a simple join between them. So, what I'm looking for is: take the first record from table1, read all records from table2, and give me back the most similar record. StringDistance is a predefined function.
Select a.table1, b.table2 from table1 a, table2 b where StringDistance(a.table1, b.table2) > 90
Below is the code for two data sets, and I can't seem to get my head around the issue. I need to find the number of 'ER' visits and 'IN' visits, separately, in dbo.VisitData for the 'Active' patients in dbo.PatientStatus. Consider patient 69. He is Active on 5/5/2014 but becomes Inactive on 9/15/2014. I only want to count the ER or IN visits that fall between those dates. In addition, if patient 69 becomes active again after 9/15/2014, I need to capture that data as well. Patients can change their status multiple times.
I have two tables of book information. One has descriptions of the book in it, and the isbn; the other has the book title, inventory data, prices, and the isbn. Because of some technical constraints I won't get into now, I can't combine them both into one table. No problem. Things are going fine as long as there is a description in the one table to correspond to the isbn and other data in the other table. However, about half of the products are not yet entered into the description table. I'd like to run a SQL query that pulls up all the isbns that don't exist in the other. In other words, I'd like a query that tells me exactly which isbns do not yet have description data. I know there is some SQL that searches one table for values that do not exist in the other, but it slips my mind. Can someone help me with this please? Thank you! Bill
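The query being asked for is an anti-join: rows in one table with no matching key in the other. A minimal sketch using `NOT EXISTS` follows; the table names `books` and `descriptions` and the sample rows are assumptions, and SQLite is used only so the example runs. The `SELECT` itself uses nothing engine-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the two book tables in the post.
CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT);
CREATE TABLE descriptions (isbn TEXT PRIMARY KEY, description TEXT);
INSERT INTO books VALUES ('111', 'Book A'), ('222', 'Book B'), ('333', 'Book C');
INSERT INTO descriptions VALUES ('111', 'Has a description');
""")

# ISBNs present in books but with no row in descriptions.
missing = [
    row[0]
    for row in conn.execute("""
        SELECT b.isbn
        FROM books AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM descriptions AS d WHERE d.isbn = b.isbn
        )
        ORDER BY b.isbn
    """)
]
```

A `LEFT JOIN ... WHERE d.isbn IS NULL` gives the same rows. `NOT EXISTS` is used here because, unlike `NOT IN`, it still behaves as expected if the inner column contains NULLs.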
I have this 40,000,000 rows table... I am trying to clean this 'Contacts' table since I know there are a lot of duplicates.
At first, I wanted to get a count of how many there are.
I need to compare records where these fields are matched:
MATCHED: (email, firstname) but not MATCH: (lastname, phone, mobile). MATCHED: (email, firstname, mobile) But not MATCH: (lastname, phone) MATCHED: (email, firstname, lastname) But not MATCH: (phone, mobile)
I have a scenario where I need to compare records within each ID. For each ID there are a few records, and I have a column called "Compare". We have to compare all Compare 1 records with the Compare 0 record. If Dt is less than or equal to the Compare 0 record's Dt, then show 0; otherwise show 1.
There is always exactly one Compare 0 record per ID in my table, so all Compare 1 rows are compared against that single row.
My tables look like
Declare @tab1 table (ID Varchar(3), Dt Date, Compare Int) Insert Into @tab1 values ('101','2015-07-01',0) Insert Into @tab1 values ('101','2015-07-02',1) Insert Into @tab1 values ('101','2015-07-03',1) Insert Into @tab1 values ('101','2015-07-01',1) Insert Into @tab1 values ('101','2015-06-30',1)
Insert Into @tab1 values ('102','2015-07-01',0) Insert Into @tab1 values ('102','2015-07-02',1) Insert Into @tab1 values ('102','2015-07-01',1)
select * from @tab1
1.) In the above scenario for ID = '101', we have 5 records, first record has Compare value 0, which mean all other 4 records need to compare with this record only
2.) If Compare 1 record's Dt is less or equal to Compare 0's DT, then show 0 in next column
3.) If Compare 1 record's Dt is greater than Compare 0's DT, then show 1 in next column
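The three rules above can be sketched as a self-join: each Compare 1 row is joined to the single Compare 0 row with the same ID, and a `CASE` expression produces the flag. The example loads the sample rows from the post into SQLite (a regular table `tab1` stands in for the `@tab1` table variable) so that it runs. The output column name `Flag` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (ID TEXT, Dt TEXT, Compare INTEGER)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [
        ("101", "2015-07-01", 0),
        ("101", "2015-07-02", 1),
        ("101", "2015-07-03", 1),
        ("101", "2015-07-01", 1),
        ("101", "2015-06-30", 1),
        ("102", "2015-07-01", 0),
        ("102", "2015-07-02", 1),
        ("102", "2015-07-01", 1),
    ],
)

# b is the baseline (Compare = 0) row for each ID; t is every Compare = 1 row.
flags = conn.execute("""
    SELECT t.ID, t.Dt,
           CASE WHEN t.Dt <= b.Dt THEN 0 ELSE 1 END AS Flag
    FROM tab1 AS t
    JOIN tab1 AS b ON b.ID = t.ID AND b.Compare = 0
    WHERE t.Compare = 1
    ORDER BY t.ID, t.Dt
""").fetchall()
```

The dates are stored as ISO `YYYY-MM-DD` strings here, which compare correctly as text; with a real `Date` column the comparison is the same.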
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: Legacy system that has (well let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fairly decent amount of garbage data to be inserted in some tables. Pulling one of the [Individuals] table from within this Legacy system and inserting it into a production system, into the Table schema currently in place to track [Individuals] in this Production system.
Problem: Inserting the information is easy, how to deduplicate the records that exist within the staging table that the legacy [Individuals] table has been dumped into in production, prior to insertion. (Wanting to do this programmatically with SQL or SSIS preferably, so that I can alter it later to allow for updating existing/inserting new)
Staging Table Schema:
; CREATE TABLE [dbo].[stage_Individuals]( [SysID] [int] NULL, --Unique, though it's not an index intended to identify the [Individuals] [JJISID] [nvarchar](10) NULL, [NameLast] [nvarchar](30) NULL, [NameFirst] [nvarchar](30) NULL, [NameMiddle] [nvarchar](30) NULL,
Scenario: There are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SysID is just a clustered index (I'm assuming) within the Legacy system and will most likely be dropped when inserted into the Production [Individuals] table. There are records that are missing their JJISID, though this isn't supposed to happen either, but have valid information within SSN/DOB/Name/etc. that can be merged into the correct record that has a JJISID assigned. There is really no data conformity: some records have NULLs for everything except JJISID, and some records have all the [Individuals] information except the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have others that partition by Name/DOB/etc. and will adapt whatever I come up with to be used for those as well):
; select j.* from (select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum, stage_Individuals.*, COUNT(*) OVER (partition by jjisid) as cnt from stage_Individuals) as j where cnt > 1 and j.JJISID is not null
Now, with SQL Server 2012 or later I could use LAG and LEAD w/ the RowNum value to do my data manipulation...but that won't work because we are on SQL Server 2008 in this environment.
With, the following as a potential solution:
GSquared (3/16/2010)Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine tune that. You'll need to look at various method for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
; WITH CTE AS (SELECT master_id, MIN(ID) AS first_id, MAX(Account_Expiry) AS latest_expiry FROM #People GROUP BY master_id) SELECT P1.master_id,
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically what I'm looking for, is to update any column with a NULL value to the corresponding Duplicate [Individuals] record value for that column.
**EDIT - Example: Record 1 has a JJISID with NULL NameFirst & NameLast, BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I want to propagate the NameFirst & NameLast from Record 2 into Record 1.
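One possible way to propagate non-NULL values between duplicates, sketched under assumptions: for each column, a NULL is replaced by the `MAX` of that column across rows with the same JJISID (aggregates ignore NULLs). This uses no `LAG`/`LEAD`, so the same shape should be expressible on SQL Server 2008. Only two of the name columns are shown, the sample rows are made up, and SQLite is used so the sketch runs. Note the caveat that if duplicates hold conflicting non-NULL values, `MAX` simply picks the greatest one, which may not be the right survivor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, hypothetical version of the staging table from the post.
CREATE TABLE stage_Individuals (
    SysID INTEGER, JJISID TEXT, NameLast TEXT, NameFirst TEXT
);
INSERT INTO stage_Individuals VALUES
    (1, 'J100', NULL,    NULL),
    (2, 'J100', 'Smith', 'Ann'),
    (3, 'J200', 'Lee',   NULL),
    (4, NULL,   'Doe',   'Jo');
""")

# Fill each NULL from any duplicate row that shares the JJISID.
conn.execute("""
    UPDATE stage_Individuals
    SET NameLast = COALESCE(NameLast,
            (SELECT MAX(d.NameLast) FROM stage_Individuals AS d
             WHERE d.JJISID = stage_Individuals.JJISID)),
        NameFirst = COALESCE(NameFirst,
            (SELECT MAX(d.NameFirst) FROM stage_Individuals AS d
             WHERE d.JJISID = stage_Individuals.JJISID))
    WHERE JJISID IS NOT NULL
""")
conn.commit()

rows = conn.execute(
    "SELECT SysID, JJISID, NameLast, NameFirst "
    "FROM stage_Individuals ORDER BY SysID"
).fetchall()
```

After the update every duplicate carries the merged values, so keeping one row per JJISID (for example the `RowNum = 1` row from the query in the post) loses nothing. Rows without a JJISID are left untouched and would need a separate matching rule.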
I have a problem where I have to compare 2 records from the same table. This part looks easy, but the problem is that for a user there can be multiple records, and I have to compare each record with its previous instance based on the timestamp. Not only do I have to compare them, I also have to perform some analysis. Below is the table script and sample output.
Givens: All SQL Server 2008 or 2012 tools at your disposal.
Production database contains the following tables (simplified for example: constraints ignored, etc.) associated with a racing video game’s server.
-- A player of our game
-- Table greater than 10 million rows
CREATE TABLE [dbo].[User] ( [UserId] [bigint] NOT NULL ,[country] [int] NULL -- User’s home country ,[name] [nvarchar](15) NULL -- User’s displayable name (‘John’, ‘Bill’) ,[subscriptionTier] [int] NULL ) -- 0 == free, 1 == paid, for instance
Assume that rows get written into the event tables at a rate of 1,000 a minute, are never updated once written, and currently are only read on a replica/reporting server.
Question Background: Write up a single query that would return the following: a list of users whose "TotalMoneyEarned" value ever grew (between logon events) at a rate of more than 1,000 per minute (we'd consider these suspicious and flag them for later investigation).
For instance, if the sample data were:
-- example of [Events.UserLogon] data -- not the query output we want
Event 1 is okay because there’s nothing to compare it against
Event 2 is okay because the TotalMoneyEarned only grew 500 in a minute
Event 3 should be flagged, as the value grew 1500 in a minute
Event 4 is okay, as it grew 7,000 in 8 minutes (< 1000 per minute)
Query Output (your query should return data in a format like this):
User    Flagged Logon Time     Rate Since Last Logon (money/minute)
John    2010-10-16 00:21:56    1500
Dave    2010-10-16 00:30:50    3200
Bill    2010-10-16 00:35:23    1000
It is likely that you will need to create sample data for both the User and [Events.Logon] tables. We are looking for a single query that returns data like what is represented in Query Output.
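A sketch of one way to approach the question, under stated assumptions: pair each logon with the previous logon of the same user using `LAG`, then compute money gained per minute. `LAG` exists in SQL Server 2012 but not 2008, where a self-join on a row number would be needed instead. The event table name `UserLogon`, its columns, and the sample rows are made up; John's four events are chosen to match the four cases described above. SQLite (3.25 or later, for window functions) is used only so the sketch runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, cut-down tables; column names are assumptions.
CREATE TABLE "User" (UserId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE UserLogon (UserId INTEGER, LogonTime TEXT, TotalMoneyEarned INTEGER);
INSERT INTO "User" VALUES (1, 'John'), (2, 'Mary');
INSERT INTO UserLogon VALUES
    (1, '2010-10-16 00:19:56',  1000),   -- nothing to compare against
    (1, '2010-10-16 00:20:56',  1500),   -- +500 in 1 minute
    (1, '2010-10-16 00:21:56',  3000),   -- +1500 in 1 minute: flag
    (1, '2010-10-16 00:29:56', 10000),   -- +7000 in 8 minutes
    (2, '2010-10-16 00:10:00',   200),
    (2, '2010-10-16 00:40:00',   900);
""")

flagged = conn.execute("""
    WITH ordered AS (
        SELECT UserId, LogonTime, TotalMoneyEarned,
               CAST(strftime('%s', LogonTime) AS INTEGER) AS Secs
        FROM UserLogon
    ),
    paired AS (
        SELECT UserId, LogonTime,
               TotalMoneyEarned
                 - LAG(TotalMoneyEarned) OVER (PARTITION BY UserId ORDER BY Secs) AS Gained,
               Secs - LAG(Secs) OVER (PARTITION BY UserId ORDER BY Secs) AS Elapsed
        FROM ordered
    )
    SELECT u.name, p.LogonTime, p.Gained * 60.0 / p.Elapsed AS RatePerMinute
    FROM paired AS p
    JOIN "User" AS u ON u.UserId = p.UserId
    WHERE p.Elapsed > 0                        -- first logon has NULL Elapsed
      AND p.Gained * 60.0 / p.Elapsed > 1000   -- strictly more than 1,000/minute
    ORDER BY p.LogonTime
""").fetchall()
```

The filter here is a strict "more than 1,000", following the wording of the question; the sample output in the post also lists a rate of exactly 1000, so the comparison may need to be `>=` depending on which reading is intended.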
as you can see, the records have a 30minutes time interval. i need to create a query to know if there are missing records in the table. so basically the result should be this:
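The sample data and expected result are not shown in the post, so the following is only a sketch of one common approach: generate every expected 30-minute slot between the first and last timestamp, then keep the slots that have no matching record. The `readings` table, its `ts` column and the sample rows are hypothetical. SQLite is used so the example runs; on SQL Server the slot list could be built with a recursive CTE and `DATEADD(minute, 30, ts)` instead of `datetime(ts, '+30 minutes')`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    [
        ("2015-01-01 00:00:00",),
        ("2015-01-01 00:30:00",),
        ("2015-01-01 01:30:00",),  # 01:00 is missing
        ("2015-01-01 02:00:00",),
        ("2015-01-01 03:30:00",),  # 02:30 and 03:00 are missing
    ],
)

missing = [
    row[0]
    for row in conn.execute("""
        WITH RECURSIVE slots(ts) AS (
            SELECT MIN(ts) FROM readings
            UNION ALL
            SELECT datetime(ts, '+30 minutes') FROM slots
            WHERE ts < (SELECT MAX(ts) FROM readings)
        )
        SELECT s.ts
        FROM slots AS s
        LEFT JOIN readings AS r ON r.ts = s.ts
        WHERE r.ts IS NULL       -- expected slot with no record
        ORDER BY s.ts
    """)
]
```

This assumes the timestamps fall exactly on the 30-minute grid; records that are a few seconds off would need to be rounded before the join.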
I am creating an SSIS package where my source is Oracle. I have transferred the data from Oracle to a flat file as per the client requirement. I have to create a single package for 2 countries: one is the US and the other is Canada. The columns are below:
ZONE_ID, ZONE_NAME. Zone_Id has data like 10001,10002,10003,20001,2002,2003
A Zone_Id that starts with 1000 is a US zone and a Zone_Id that starts with 2000 is a Canada zone.
For US: 1. Load geography data from DB tables into flat files 2. Load geography data from flat files to Spectrum DB tables
For Canada: 1. Load geography data from DB tables into flat files 2. Load geography data from flat files to Spectrum DB tables
Now I want to read from the flat file: if Zone_Id starts with 1000 the row must go to US_DFT, and if Zone_Id starts with 2000 it must go to CANADA_DFT.
1. Flat File Source 2. Conditional Split, Case Good = !ISNULL(KEY) Case Error = ISNULL(KEY) 3. Case Good -> Writes to Good Flat File (with timestamp in the title) 4. Case Error -> Writes to Error Flat File (with timestamp in the title)
Most job runs have no errors but the error file is created as a zero byte file anyway. If there are no error records I don't want the error file created. How might I accomplish this?
I have an application that generates a lot of rows, from 1 million to 2 million, and I want to insert these records into MS SQL Server in a fast way.
Currently I loop through the records while they are loaded in a DataSet, building a command text that generates an INSERT query for each row, and run it against SQL Server.
But it takes a lot of time to finish. Is there a way to bulk insert this data?
I have been trying to store millions of files (layer tiles) in NTFS now for a while with little success as the read/write speed drops off the chart or NTFS itself gets corrupted. My questions are:
1) Is there any way to store millions of files successfully in NTFS? 2) What does VE use to store its tiles (I am guessing SQL Server)? 3) If VE does not use SQL Server to store tile images, has anyone tried it and what are the pros/cons?
Hi everyone - I have an ETL package which loads about 10 million rows from SQL 2005 staging tables to new, empty tables (no indexes or constraints) in another SQL 2005 DB to be SWITCHED into the main partitioned data tables.
Both databases reside on the same SQL Server instance - it is a dev server so the disk aren't super fast/SAN speeds but it has plenty of RAM/CPU & SCSI disks.
The insert takes about 45 minutes - can I get this working any faster, or is this typical for 10 million rows? I've messed about with the data flow a few times but I can't seem to get any significant improvements.
Any tips anyone??
I perform several lookups on dimensions - these are not cached.
I do query the source table concurrently with different WHERE clauses & run two pipelines processing the data into 2 destination tables.
Would it be better to query the base table once & use a conditional split instead of the two separate queries??
I also multicast from each pipeline & use a UNION ALL to log some of the rows from each pipeline to another destination table.
Hope this makes sense?? Any ideas or tips on how I can speed up this kind of transform would be appreciated.
I'm using oleDB connections.
Hope this makes some kind of sense!! Thanks for any advice!!
Hello, we are currently in the process of implementing a SQL Server database where a couple of tables will have millions of rows (about 98 million, and growing), along with a web site that will retrieve and sort the data (read only). How do the ASP.NET GridView and SqlDataReader behave in a situation like that? Will the response be very slow? Is there any alternative? Is there any example on the net? Assume the tables are well tuned and well indexed. Thank you in advance.
I have a DataTable in memory and I want to write a C# code to dump the data into a SQL database. Is there a faster way of dumping millions of rows into a SQL table besides running INSERT INTO row by row?
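The general remedy for row-by-row inserts is to send many rows per call and commit per batch rather than per row. In .NET the usual tool for this is `SqlBulkCopy` with `WriteToServer(DataTable)`. The sketch below only illustrates the batching idea in Python against SQLite so that it is runnable; the `measurements` table and the generated rows are hypothetical, and it is not a substitute for the SQL Server bulk-load path.

```python
import sqlite3


def bulk_insert(conn, rows, batch_size=10_000):
    """Insert rows in batches, one transaction per batch; return rows inserted."""
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back the batch on error
            conn.executemany(
                "INSERT INTO measurements (id, value) VALUES (?, ?)", batch
            )
        inserted += len(batch)
    return inserted


# Hypothetical in-memory data standing in for the DataTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
rows = [(i, i * 0.5) for i in range(1, 25_001)]
count = bulk_insert(conn, rows)
stored = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
```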
I have a table with first name, last name, SSN (social security number) and other columns. I want to assign a group number according to this business logic.
1. Records with equal SSN and (similar first name or last name) belong to the same group.
John Smith 1234
Smith John 1234
S John 1234
J Smith 1234
John Smith and Smith John fall in the same group number as long as they have the same SSN. This is because I have records with equal SSN where the first name and last name are switched, because people make errors entering the last name as the first name and vice versa. John Smith and Smith John will have an equal group number if they have equal SSN.
2. There are records with equal SSN but different first name and last name. These belong to different group numbers. Equal SSN doesn't guarantee an equal group number; at least one of the first name or last name should be the same. John Smith and Dan Brown with equal SSN=1234 shouldn't fall in the same group number.
Sample data:
Id  Fname  lname  SSN   grpNum
1   John   Smith  1234  1
2   Smith  John   1234  1
3   S      John   1234  1
4   J      Smith  1234  1
5   J      S      1234  1
6   Dan    Brown  1234  2
7   John   Smith  1111  3
I have tried this code on 65,000 rows. It took 20 minutes. I have to run it on 21 million rows of data. I know that this is not efficient code.
INSERT into temp_FnLnSSN_grp
SELECT c1.fname, c1.lname, c1.ssn AS ssn, c3.tu_id,
(SELECT 1 + count(*)
FROM distFLS AS c2
WHERE c2.ssn < c1.ssn
or (c2.ssn = c1.ssn and (substring(c2.fname,1,1) = substring(c1.fname,1,1) or substring(c2.lname,1,1) = substring(c1.lname,1,1)
or substring(c2.fname,1,1) = substring(c1.lname,1,1) or substring(c2.lname,1,1) = substring(c1.fname,1,1)))) AS group_number
FROM distFLS AS c1
JOIN tu_people_data AS c3
ON (c1.ssn = c3.ssn and c1.fname = c3.fname and c1.lname = c3.lname)
distFLS is the distinct first name, last name and SSN table from the people table.
I posted part of this question and the schema one week ago. Please refer to this thread: http://groups.google.com/group/comp...6eb380b5f2e6de6