SQL Server 2008 :: Data Fetching 80 Million Records?
Mar 24, 2015
I have the table below:
CREATE TABLE [dbo].[DR_Test](
[source_item_id] [int] NOT NULL,
[source_line_no] [int] NULL,
[buyer_id] [int] NOT NULL,
[seller_member_id] [int] NULL,
[code]...
The table contains more than 80 million records, so when I fetch data by buyer_id and timezone it takes more than an hour. buyer_id is not unique. How can I fetch the data faster, or do I need to change the structure of the table?
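One common starting point for a filter like this is a nonclustered index that leads with the filtered columns. The following is only a sketch: the table definition above is truncated, so the column name `timezone` and the index name are assumptions taken from the wording of the question, not from the actual schema.

```sql
-- Sketch only: "timezone" is an assumed column name (the posted table
-- definition is truncated), and the index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_DR_Test_buyer_id_timezone
    ON dbo.DR_Test (buyer_id, timezone);
```

Whether this helps depends on how selective the two columns are and how many columns the query returns; comparing the execution plan before and after is the usual way to check.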
I have a requirement to delete 1 million records from a table holding 10 million rows, and the table is queried 24/7 (there is no downtime window). How can I achieve that?
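A frequently suggested pattern for this situation is to delete in small batches, so that each statement holds its locks only briefly and the log never has to carry one huge transaction. This is a minimal sketch that assumes SQL Server 2005 or later (for `DELETE TOP`); the table name `dbo.BigTable`, the column `created_at`, the cutoff date, and the batch size are all hypothetical.

```sql
-- Sketch only: dbo.BigTable, created_at and the cutoff date are hypothetical.
DECLARE @Rows int;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    -- Each iteration is its own short auto-committed transaction.
    DELETE TOP (4000)
    FROM   dbo.BigTable
    WHERE  created_at < '20140101';

    SET @Rows = @@ROWCOUNT;

    -- Brief pause so concurrent queries can get through between batches.
    WAITFOR DELAY '00:00:01';
END
```

An index that supports the `WHERE` clause keeps each batch from scanning the whole table.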
I have a pretty simple SSIS package that fast loads a 100 million record table into a SQL Server 2008 table on a daily basis. This normally runs fine and completes in about 1 hour. As this is perhaps one of our largest running SSIS packages, about once every 2-3 weeks this SSIS will fail/drop connection. Once it fails, the large number of records will start rolling back. This rollback process can take 1+ hours so I cannot even restart the failed SSIS package immediately. This is a problem.
I am looking for a solution or option so I do not have to wait on that rollback before restarting this particular long-running SSIS package. Is there an option or setting to leave the partial data set committed and not roll back? Then I could just restart the SSIS package immediately, or set the package to auto-restart once on failure. The first step in the package truncates the destination table.
How well can SQL Server support 300 million records? Is anybody working on a database this big? Can anyone give me some input on this? The database is going to be about 60 GB.
I want to update tableToUpdate in batches of 5000 rows and set lastenecryptionDT to NULL based on the join to tableValues on the column ENCRYPTIONID, and also output the updated rows into another table in case I need to roll back.
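The request above could be sketched with `UPDATE TOP` plus an `OUTPUT ... INTO` clause, assuming SQL Server 2005 or later. The audit table `dbo.tableToUpdate_Audit` and its columns are hypothetical and would need to exist beforehand with matching data types; the `IS NOT NULL` filter is an assumption added so that already-processed rows drop out and the loop terminates.

```sql
-- Sketch only: dbo.tableToUpdate_Audit and its columns are hypothetical.
DECLARE @BatchSize int, @Rows int;
SET @BatchSize = 5000;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) t
    SET    t.lastenecryptionDT = NULL
    -- "deleted" exposes the values as they were before the update.
    OUTPUT deleted.ENCRYPTIONID, deleted.lastenecryptionDT
    INTO   dbo.tableToUpdate_Audit (ENCRYPTIONID, old_lastenecryptionDT)
    FROM   dbo.tableToUpdate AS t
    INNER JOIN dbo.tableValues AS v
            ON v.ENCRYPTIONID = t.ENCRYPTIONID
    WHERE  t.lastenecryptionDT IS NOT NULL;   -- skip rows already set to NULL

    SET @Rows = @@ROWCOUNT;
END
```

The saved `old_lastenecryptionDT` values can later be joined back on ENCRYPTIONID to restore the original dates if the change has to be undone.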
I have a query that needs to look up 5 specific records in a table, so I basically need to hardcode the key values. Below is my query, which didn't work (I guess the syntax is wrong):
select BF_ORGN_CD, BF_BDOB_CD, BF_TM_PERD_CD, data from BF_DATA WHERE (BF_ORGN_CD, BF_BDOB_CD, BF_TM_PERD_CD) in ('A1', 'B1', 'C1') ('A2', 'B2', 'C2') ('A3', 'B3', 'C3') ('A4', 'B4', 'C4') ('A5', 'B5', 'C5')
But if I use the query below, it returns more records than just these 5:
select BF_ORGN_CD, BF_BDOB_CD, BF_TM_PERD_CD, data from BF_DATA WHERE (BF_ORGN_CD) in ('A1', 'A2', 'A3', 'A4', 'A5') and (BF_BDOB_CD) in ('B1', 'B2', 'B3', 'B4', 'B5') and (BF_TM_PERD_CD) in ('C1', 'C2', 'C3', 'C4', 'C5')
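T-SQL does not accept a multi-column tuple on the left side of `IN`, and three separate `IN` lists match every cross combination (for example 'A1' with 'B2'), which is why extra rows come back. Two possible rewrites are sketched below using only the table and column names from the question; the alias `k` and its column names are made up for the example.

```sql
-- Option 1: spell out each wanted combination explicitly.
SELECT BF_ORGN_CD, BF_BDOB_CD, BF_TM_PERD_CD, data
FROM   BF_DATA
WHERE  (BF_ORGN_CD = 'A1' AND BF_BDOB_CD = 'B1' AND BF_TM_PERD_CD = 'C1')
    OR (BF_ORGN_CD = 'A2' AND BF_BDOB_CD = 'B2' AND BF_TM_PERD_CD = 'C2')
    OR (BF_ORGN_CD = 'A3' AND BF_BDOB_CD = 'B3' AND BF_TM_PERD_CD = 'C3')
    OR (BF_ORGN_CD = 'A4' AND BF_BDOB_CD = 'B4' AND BF_TM_PERD_CD = 'C4')
    OR (BF_ORGN_CD = 'A5' AND BF_BDOB_CD = 'B5' AND BF_TM_PERD_CD = 'C5');

-- Option 2: join to a derived table holding the wanted key combinations.
SELECT d.BF_ORGN_CD, d.BF_BDOB_CD, d.BF_TM_PERD_CD, d.data
FROM   BF_DATA AS d
INNER JOIN (
    SELECT 'A1' AS orgn, 'B1' AS bdob, 'C1' AS perd UNION ALL
    SELECT 'A2', 'B2', 'C2' UNION ALL
    SELECT 'A3', 'B3', 'C3' UNION ALL
    SELECT 'A4', 'B4', 'C4' UNION ALL
    SELECT 'A5', 'B5', 'C5'
) AS k
        ON  k.orgn = d.BF_ORGN_CD
        AND k.bdob = d.BF_BDOB_CD
        AND k.perd = d.BF_TM_PERD_CD;
```

The join form tends to be easier to maintain when the list of combinations grows or comes from another table.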
Hi, I have a small problem. Let's say I have 6 rows in a table (two columns, month and MonthCost) for Jan, Feb, Mar, Apr, May and Jun, and all of them have some data. When displaying in the UI, I need to show all the months from January to December, with July to December shown with some default value, say 0. So the values displayed in the GridView should look like this:
Month  MonthCost
Jan    1
Feb    1
Mar    1
. . .
Jul    0
Dec    0
I found that it can be done using a temp table in the database, but think of a scenario where I want to display this many times over (1 to 12 * 50). I don't want to hurt performance, and I am using SQL Server 2000. Any hints or suggestions are welcome.
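One way to fill in the missing months without a temp table is to left join from a fixed list of twelve months. The sketch below avoids CTEs and row constructors so it should also run on SQL Server 2000. The table name `MonthlyCost` is hypothetical, and it is assumed that the `month` column stores the three-letter abbreviations used in the question.

```sql
-- Sketch only: MonthlyCost is a hypothetical table name; [month] is assumed
-- to hold abbreviations such as 'Jan', 'Feb', ...
SELECT m.MonthName            AS [Month],
       ISNULL(c.MonthCost, 0) AS MonthCost   -- months with no row show 0
FROM (
    SELECT 1 AS MonthNo, 'Jan' AS MonthName UNION ALL
    SELECT 2,  'Feb' UNION ALL
    SELECT 3,  'Mar' UNION ALL
    SELECT 4,  'Apr' UNION ALL
    SELECT 5,  'May' UNION ALL
    SELECT 6,  'Jun' UNION ALL
    SELECT 7,  'Jul' UNION ALL
    SELECT 8,  'Aug' UNION ALL
    SELECT 9,  'Sep' UNION ALL
    SELECT 10, 'Oct' UNION ALL
    SELECT 11, 'Nov' UNION ALL
    SELECT 12, 'Dec'
) AS m
LEFT JOIN MonthlyCost AS c
       ON c.[month] = m.MonthName
ORDER BY m.MonthNo;
```

If the same list is needed in many queries, a small permanent months table would serve the same purpose as the derived table.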
Hi, I get 5000 records on executing a query. In the form, I would like to display 50 records at a time in the grid from this result, so I create 100 link labels (created dynamically based on the number of records returned by the query) within a panel.
When link1 is clicked, the first 50 records should be displayed; on clicking link2, the next 50 records (i.e. 51 to 100) should be displayed, and so on. So I wrote the query as:
select top(50)* from tblindividual where id not in (select top(intValue)id from tblindividual )
where intValue would be 50 for link2, 100 for link3, 150 for link4 and so on.
If I want to fetch the last 50 records of the query, intValue would be 4950. Here the "not in" list becomes very big (1, 2, 3, ..., 4949, 4950) and hence the query becomes a bit slow.
Is there any other method (query) to get the same result? I have heard that using the "not in" keyword makes query execution slow.
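A commonly used alternative on SQL Server 2005 and later is to number the rows once with `ROW_NUMBER()` and filter on the number range for the requested page. This sketch assumes `id` gives a stable sort order; the variable names are invented for the example. Note that the original query has no `ORDER BY`, so which rows land on which page is not guaranteed there.

```sql
-- Sketch only: @PageNumber and @PageSize are hypothetical parameters.
DECLARE @PageNumber int, @PageSize int;
SET @PageNumber = 100;   -- link100, i.e. the last page of 5000 rows
SET @PageSize   = 50;

SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, *
    FROM   tblindividual
) AS numbered
WHERE  rn BETWEEN (@PageNumber - 1) * @PageSize + 1
              AND  @PageNumber * @PageSize
ORDER BY rn;
```

With these values the filter selects row numbers 4951 to 5000; page 1 would select 1 to 50.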
ID  DeviceName  Status
1   Sony        Good
2   Toshiba     OK
3   Sony        Bad
4   Tata        OK
I need to return the following records:
ID  DeviceName  Status
2   Toshiba     OK
3   Sony        Bad
4   Tata        OK
If there is more than one record for a device, then the record with the latest ID should be returned. If there is only one record for the device, then that record should be returned.
Can this be achieved through a single query? Any help is appreciated.
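This can be done in one statement by joining the table to a grouped copy of itself that keeps only the highest ID per device. The sketch below loads the sample rows from the question into a table variable so it is self-contained; the name `@Devices` is hypothetical, since the real table name is not given.

```sql
-- Sketch only: @Devices stands in for the real (unnamed) table.
DECLARE @Devices TABLE (ID int, DeviceName varchar(50), [Status] varchar(10));

INSERT INTO @Devices (ID, DeviceName, [Status])
SELECT 1, 'Sony',    'Good' UNION ALL
SELECT 2, 'Toshiba', 'OK'   UNION ALL
SELECT 3, 'Sony',    'Bad'  UNION ALL
SELECT 4, 'Tata',    'OK';

SELECT d.ID, d.DeviceName, d.[Status]
FROM   @Devices AS d
INNER JOIN (
    -- Highest ID per device.
    SELECT DeviceName, MAX(ID) AS MaxID
    FROM   @Devices
    GROUP BY DeviceName
) AS latest
        ON  latest.DeviceName = d.DeviceName
        AND latest.MaxID      = d.ID
ORDER BY d.ID;
-- Returns IDs 2 (Toshiba), 3 (Sony) and 4 (Tata) for the sample data.
```

On SQL Server 2005 and later, `ROW_NUMBER() OVER (PARTITION BY DeviceName ORDER BY ID DESC)` filtered to 1 is an equivalent formulation.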
I am trying to fetch data from IBM DB2 to SQL Server 2005. The problem I am facing is when I create the OLE DB Connection (I am using the "IBM DB2 UDB for iSeries IBMDA400 OLE DB Provider") and see the "Preview", I get "System.Byte[]" in a couple of columns for all the rows, instead of the actual data. The datatype of the original field is "Byte Stream". I have tried all options, but, failed. I believe there is something in the "Force Translate" property of the OLE DB Connection. Right now it is set to "65535". I am not sure if that needs to be changed.
I was earlier using a DTS package, where I used ODBC for connecting to the same database. In ODBC, there is a "Translation" tab where there is a check box labelled: "Convert binary text (CCSID 65535) to text". When I check this box, I am able to see the data correctly.
But, now I have moved to SSIS and I am facing the same problem as I am not using the ODBC connection.
I am fetching a large amount of data from Teradata into SQL Server using a linked server. I am getting the error below:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
I am currently working on a simple page to insert 1.6 million UK postcode records into a SQL Server table. The table has three columns: the postcode, the longitude coordinate and the latitude coordinate. The data is sourced from a pipe (|) delimited txt file and inserted into the database using a FOR loop. The problem I have is that the page hangs after inserting only 10,000 records, and displays either an invalid View State error or a page cannot be found error. I assume the View State error stems from the fact that there is a form on the page, which simply contains a button to execute the script and a few labels to show the progress. But without the form and associated View State, the insert still fails to complete. Any ideas? Would I be better off running this on a thread, or should I just do it in stages and be patient? I have now modified the page to read the database on load and pick up from where it crashed.
I have a table that has 4+ million records. I need to update those records, and I am facing a performance issue. Can someone please advise?
update stage set batch_status = 1 where update_status = 0
Update [transaction] Set aId = s.aId, b = s.b
from stage s Where s.aId = [transaction].aId and s.batch_status = 1
Update stage Set update_status = 1, batch_status = 2
where batch_status = 1
When I run the above query with "set rowcount 1000", it runs in one minute. When I run the query for "set rowcount 10000", it runs in 1 hour 56 minutes. Can someone help me to optimize it?
Hey folks... So I have a table that looks like this:
CREATE TABLE [tblStation] (
[CAMPAIGN] [varchar] (8), [LISTNUM] [varchar] (10), [PHONE] [varchar] (10),
[EVENTTIME] [datetime], [STATION] [int], [OPERATOR] [varchar] (16),
[EVENTCODE] [varchar], [CALLSPAN] [decimal](18, 0), [FDISP] [int],
[RECORDNUM] [varchar], [STC] [varchar], [PROMOC] [varchar],
[EXP_CAMP] [varchar], [PROMO3] [varchar], [MAXATT] [char],
[LISTNAME] [varchar], [SITENAME] [char], [Row_id] [int] IDENTITY
)
It's taking nine seconds to run the following command:
SELECT count([fdisp])
FROM [TrunkFiles_new].[dbo].[tblStation] WITH (NOLOCK)
WHERE fdisp IS NULL
Anyone familiar with a table of this size having performance like this? The [fdisp] column has a nonclustered index on it. Thanks in advance...
I have a directory database with approximately 80 million records. I am feeding the database with bulk_insert. Indexing one of the fields took about 8 hours. After indexing, when I run queries on the indexed field the response time is under 1 second. However, if I run select queries with LIKE on non-indexed fields, it takes more than 2 minutes. So I decided to index 4 other fields in the database, and it looks like the indexing process is going to run for 2 days. I am a novice in SQL database design and I am not sure if this is the best way to index the table. I am just using create index. Any suggestions or advice welcome.
Hello, what is the fastest way to update 20 million records in our database? I have tried a simple update statement like this:
update trail_log with (tablockx, holdlock)
set trail_log.entry_by = users.user_identity
from users
where trail_log.entry_by = users.user_id
but it takes 10+ hours to run, since it cannot commit the transaction until the very end. So I was thinking that I need to commit in batches, for example after every 50K rows, but that is slow as well:
Set rowcount 50000
Declare @rc int
Set @rc=50000
While @rc=50000
Begin
Begin Transaction
update trail_log With (tablockx, holdlock)
set trail_log.entry_by = users.user_identity
from users
where trail_log.entry_by = users.user_id
and trail_log.entry_by not like '%[0-9]%'
Select @rc=@@rowcount
--Commit the transaction
Commit
End
go
I let the above statement run for 1.5 hours and it only updated 450,000 rows. Any ideas? Maybe I'm doing it wrong. Please help!
I have a sql script that updates records in a table with 40 million records.
There is some functionality in the script that could be put away in functions for code reuse/elegance.
Functions would cause execution overhead.
What else could I use besides functions that would allow code reuse without compromising on execution overhead? Is there anything like includes in T-SQL that would allow me to do so?
declare @table table ( ParentID INT, ChildID INT, Value float ) INSERT INTO @table SELECT 1,1,1.2
[code]....
In this case, the (ParentID, ChildID) rows 1,1, 2,2 and 3,3 are parents, whereas a NULL,1 row is a child whose parent is 1; similarly, NULL,2 rows are children whose parent is 2, and so on.
Now my requirement is to display the parent records sorted by Value ascending, with each parent followed immediately by its corresponding child records.
I have a scenario where I need to fetch records for each ID based on 2 conditions. There are 2 product types for each ID: PFE and PRI. I need only IDs where the latest PFE is active and, at the same time, all the latest PRI are closed (meaning only PFE should currently be in Active status).
1.) The latest product type PFE should be in A (active) status for the particular ID
2.) At the same time, ALL the latest PRI should be in C (closed) status for the same ID
I have given an example with 3 scenarios and the desired output:
1.) For ID 101, the latest PFE is active and all latest PRI are closed ----> Should come in the result
2.) For ID 102, the latest PFE is closed and all latest PRI are closed ----> Should NOT come in the result
Hello, I have one application which is written in ASP.NET and plain ASP. I have a button on an ASP page; when I click that button, it executes another ASP page. After the execution of that second ASP page, I redirect to an ASPX page with some values. The ASPX page connects to the database and inserts the values. Sometimes the following error occurs: "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." The main thing is that I am testing it on my local machine, and it still gives me this error. The insert statement is not complicated either: I just insert three values into one table. Is this occurring due to the mix of ASP and ASPX? Please give me a proper solution. I want it to be efficient, because this application will be accessed by a number of users at the same time. Please help me. Thanks, Sandy
I have a new client with an existing system that has just over 2 million business listings in one table. Each business listing is associated with one business category.
* Company Table (around 20 fields):
companyID companyName categoryID state postCode etc.
* Category Table (5 fields)
categoryID categoryName etc.
We are using MSSQL 2005 Express Edition with Advanced Services
A free text search needs to be performed on the companyName and categoryName limited by region (state and or postcode).
1) What kind of response times should I expect for the free text search (I have not used the free text search before)
2) How should I index the companyName and categoryName so they are both used in a joined query? i.e. Do I just configure the free text search index on each field separately and it should work?
I want to compare ONLY 1 Column values from 2 tables having more than 4.9 million records. There is a difference of 4000 rows between the 2 tables.
SELECT ID From TABLE1 where ID not in (SELECT DISTINCT ID From TABLE2)
The query above ran for nearly 4.5 hours before I had to cancel it. Is there a better way to write it? I just want to find the ID column values from TABLE1 that are missing in TABLE2.
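Two rewrites that are often suggested for this kind of comparison are `NOT EXISTS` and `EXCEPT` (the latter needs SQL Server 2005 or later). The sketch uses only the table and column names from the question. One behavioral difference worth knowing: `NOT IN` returns no rows at all if TABLE2.ID contains a NULL, while `NOT EXISTS` does not have that problem.

```sql
-- Option 1: anti-join with NOT EXISTS.
SELECT t1.ID
FROM   TABLE1 AS t1
WHERE  NOT EXISTS (
           SELECT 1
           FROM   TABLE2 AS t2
           WHERE  t2.ID = t1.ID
       );

-- Option 2: set difference; returns each missing ID once.
SELECT ID FROM TABLE1
EXCEPT
SELECT ID FROM TABLE2;
```

Either form generally benefits from an index on ID in both tables.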
I come from a web-based world where loading 1.5 million records into a temp table is suicide. I'm doing more data warehouse work now, and while looking into optimizing a buddy's proc I noticed he was loading 1.5 million records into a temp table. We had a discussion about it because, coming from the web world, I was strongly against it. He, on the other hand, didn't feel it was an issue since the proc gets called once, maybe twice, a day. The tempdb is set to autogrow and it is on a different drive from all the other databases on the box. It has one ldf and one mdf. He creates an index on the table after the load. Why shouldn't we be loading 1.5 million records into a temp table?
Hi, I have 2 tables with more than a million records in each, and I have to perform a full outer join. The problem is that the join clause compares 2 different columns (an int and a string), like this:
Select * From a full outer join b On a.cli = b.cli OR a.reference = b.reference
Because of the OR in the clause and the millions of records, the query never finishes. If I change it to use one rule only, then it works fine.
How can I join these 2 big tables with 2 rules? Thanks, Itay
I am trying to update a large table which consists of 45 million records. It is taking more than 2 days to complete the update. Below is my approach:
1. The table has only one clustered index and no other indexes.
2. I am updating in batches of, say, 20000 records.
3. I changed the recovery model to bulk-logged, the auto-growth size is set to 300MB, and there is enough space on my disk for the transaction log.
I have tried to process > 3 million Fuzzy grouping records on two different servers with no success. 3 mill works but anything above 4 mill doesn't. Some background:
* We are trying to de-dup our customer table on: name (.5 min), address1 (.5 min), city (.5 min), state (exact). .8 overall record min score.
* Output includes additional fields: customerid, sourceid, address2, country, phonenumber
* Without SP1 installed I couldn't even get a few hundred thousand records to process
* Two different servers, same problems. Note that SSIS and SQL Server are running locally on both
* The higher end server has 4GB RAM, the other 2.5 GB RAM. Plenty of free disk space on both
* SQL Server is configured to use 2 GB of RAM max
* The page file is currently at 15GB
After running a number of test on both servers trying different batch sizes etc. the one thing I noticed is that it seems to always error out when SSIS takes over and starts chewing up all the available RAM. This happens after the index is created and SSIS starts "warming caches". On both servers SQL Server uses up about 1.6GB of RAM at this point while SSIS keeps taking over RAM until all physical RAM is used up.
Some questions:
* Has anyone been able to process more than 3 million records, and if so, what is your hardware configuration?
* Should we try running SSIS from a different server so it has access to the full amount of physical RAM (so it doesn't have to fight for RAM with SQL Server)?
* Should we install Win 2003 Enterprise Server so we can add more RAM?
* Any ideas why switching to the page file might be causing errors?
I have 1+ CSV files (using a foreach loop) which I'm doing a lot of transform work on and then inserting into a SQL database table. Each CSV file usually contains about 2 days worth of data (contains date stamps) - somewhere in the region of 60k records per day. The destination table currently contains 3 million+ rows and will get bigger. I need to make sure that before inserting into the destination table, the data doesn't already exist.
I've read the following article: http://www.sqlis.com/311.aspx While the lookup method works, it takes ages and eats up memory as it caches the 3m+ records before running for each CSV. Obviously this will only get worse as the table grows in size.
To make things a little more efficient what I'd like to do, is first derive the dates I'm dealing with in the current file - essentially storing the max(date) and min(date) in variables. Then in the lookup SQL use those vars, to reduce the amount of data that needs to be brought into the transformation to check against before inserting into the destination table. Lookup SQL eg. SELECT * FROM MyTable WHERE Date BETWEEN varMinDate AND varMaxDate.
Ideally I'd use an aggregate transformation and then use the subsequent output from that either in the lookup query or store the output in vars, but I don't think you can do that and I get the feeling I'm approaching this with the wrong mindset.
I have a table that I need to do some computations on all the data but first I need to remove the duplicate records and insert the results into a destination table. Here's the example below. My table has 3.1 million rows. I have tried using the DISTINCT and the GROUP BY but both ways to select the data takes about half a minute to run. I'm wondering if there is a way to increase performance. Users are ok with this time since the process runs overnight but improving it won't hurt. I do have a clustered index on these fields but that doesn't seem to improve any.
I have three instances of the same database on three different servers, and I am fetching data with the select query "select * from table_name".
I would like to know whether the order of rows fetched by this query will differ between the SQL Server instances, or whether the rows will be fetched in the same order.
For me, the output comes back in a different order on each server with the same query. Please let me know: is there a default ORDER BY, or is the order arbitrary?
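SQL Server does not guarantee any row order for a query without an `ORDER BY` clause, so differing output across servers is expected. A minimal sketch of making the order deterministic is shown below; `id` is a hypothetical column standing in for whatever uniquely identifies a row in the real table.

```sql
-- Sketch only: id is a hypothetical unique column.
SELECT *
FROM   table_name
ORDER BY id;
```

Sorting on a unique column (or a unique combination of columns) is what makes the result order repeatable across servers.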
I'm currently developing an application using C# and SQL Server Express, in which one of the features is to allow the user to browse through the existing products, whether by browsing one by one (next, previous, first, last) or by performing a search (by product name, id, etc.). I'm currently using the default DataSource and TableAdapter approach, but I found out that when I execute the TableAdapter.Fill() method, all the data in the table is fetched. With 10,000 records it takes a while to complete, especially on slower connections.
So my question is, what would be the best approach to solve this problem? I'm considering loading the first position, and then when the user presses the next button I'd fetch the second position, and so on, using ROW_NUMBER(). I could even cache every 10 records or something like that. Since it is possible to know any record's row number (as long as the records are ordered), it would even work when the user performs a search and goes directly to row number 5000, for instance.
Would this approach be good practice? That's all I want, to do things the right way. Thank you for your time.
EDIT: I've noticed that when creating a SqlDataReader there is the option to provide a CommandBehavior, but I'm not quite understanding how this could be used. Would some tweaking in this area do the job?