We maintain a 175 million record database table for our customer.
This is an extract of some data collected for them by a third party
vendor, who sends us regular updates to that data (monthly).
The original data for the table came in the form of a single, large
text file, which we imported.
This table contains name and address information on potential
customers.
It is a maintenance nightmare for us, as prior to this the largest
table we maintained was about 10 million records, with less
complicated updates required.
Here is the problem:
* To support the searching we need to do on the table, 8 of its 20 columns are indexed.
* It takes hours and hours to do anything to the table.
* I'd like to cut down as much as possible on the time required to update the table.
Each month we receive one file containing 10 million new records, which can simply be appended to the table (no problem; a simple import into SQL Server).
We also receive a monthly file containing 10 million records that are updates to information already in the table. This is the tricky one. The only way to uniquely pair up a record in the update file with a record in the full database table is by a combination of individual_id, zip, and zip_plus4.
There can be multiple records in the database for any given
individual, because that individual could have a history that includes
multiple addresses.
How would you recommend handling this update? So far I have mostly tried a number of execution plans that involve deleting the matching records from the table so I can then import the text file, but the best of those plans takes well over 6 hours to run.
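One approach worth timing against the delete-and-reimport plans: bulk load the monthly update file into a staging table, index it on the matching key, and update the main table in place. The names below (stage_updates, main_table, col1, col2) are hypothetical stand-ins, and this is a sketch rather than a tested plan; updating in place touches only the 10 million matched rows instead of forcing the 8 indexes to re-absorb deleted and reinserted rows.

CREATE NONCLUSTERED INDEX IX_stage_key
ON stage_updates (individual_id, zip, zip_plus4);

UPDATE t
SET t.col1 = s.col1,   -- repeat for each column the vendor can change
    t.col2 = s.col2
FROM main_table AS t
JOIN stage_updates AS s
  ON t.individual_id = s.individual_id
 AND t.zip = s.zip
 AND t.zip_plus4 = s.zip_plus4;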
My latest thought: Would it help in any way to partition the table
into a number of smaller tables, with a view used to reference them?
We have no performance issues querying the table, but I need some
thoughts on how to better maintain it.
One more thing, we do have 2 copies of the table on the server at all
times so that one can be actively used in production while we run
updates on the other one, so I can certainly try out some suggestions
over the next week.
I don't work much with the back end of software development, so there is a lot about SQL Server I do not know. We are building a database that will have about 10 tables in it. 3 of these tables will probably have a huge amount of data in them; specifically, each of the 3 tables will have about half a million records in it. Each record is about 100 characters max in length (I am counting numbers as characters and summing the individual columns/fields to come up with 100). Will a SQL Server database table with half a million records in it be possible? We have tried to normalize the database to cut down on the size of the tables, but it still comes out to about half a million records per table. Any help is deeply appreciated.

Bill
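For scale, a rough back-of-the-envelope estimate (assuming about one byte per character and ignoring row overhead and indexes):

500,000 rows × 100 bytes/row = 50,000,000 bytes, or roughly 48 MB per table

Half a million rows is a small table by SQL Server standards, so yes, it is entirely possible.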
I have a requirement to delete 1 million records from a table holding 10 million rows, and the table is queried on a 24/7 basis (we don't have a downtime window). How can I achieve that?
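One common approach is to delete in small batches so each transaction stays short; concurrent queries are blocked only briefly and the log can truncate between batches. A minimal sketch, assuming SQL Server 2005 or later for DELETE TOP, with purge_flag as a hypothetical stand-in for whatever criteria identify the 1 million rows:

DECLARE @rows INT;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) FROM dbo.BigTable
    WHERE purge_flag = 1;    -- your selection criteria here
    SET @rows = @@ROWCOUNT;
END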
I have a directory database with approx. 80 million records, which I feed with BULK INSERT. Indexing one of the fields took about 8 hours. After indexing, when I run queries on the indexed field, the response time is under 1 sec. However, if I run SELECT queries with LIKE on non-indexed fields, it takes more than 2 minutes. So I decided to index 4 other fields in the database, and it looks like the indexing process is going to run for 2 days. I am a novice in SQL database design and I am not sure if this is the best way to index the table; I am just using CREATE INDEX. Any suggestions / advice welcome.
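Two things may help here, sketched below with hypothetical table and column names. First, LIKE predicates with a leading wildcard ('%term%') cannot seek on a B-tree index, so indexing those fields may not speed those queries up at all. Second, for the index builds themselves, SORT_IN_TEMPDB moves the sort work to tempdb (useful if it sits on separate disks), and one composite index can sometimes replace several single-column indexes when the fields are usually filtered together. The option syntax below assumes SQL Server 2005 or later:

CREATE NONCLUSTERED INDEX IX_Directory_LastName
ON dbo.Directory (last_name)
WITH (SORT_IN_TEMPDB = ON);

CREATE NONCLUSTERED INDEX IX_Directory_City_LastName
ON dbo.Directory (city, last_name);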
I come from a web-based world where loading 1.5 million records into a temp table is suicide. I'm doing more data warehouse stuff now, and I was looking into optimizing a buddy's proc when I noticed he was loading 1.5 million records into a temp table. We had a discussion about it because, being from a web world, I was drastically against it. He, on the other hand, didn't feel it was an issue since it gets called once, maybe twice, a day. The tempdb is set to autogrow and is on a different drive than all the other databases on the box. It has one ldf and one mdf. He's creating an index on the table after the load. Why shouldn't we be loading 1.5 million recs into a temp table?
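For reference, the pattern under discussion looks roughly like this (names hypothetical). In tempdb, SELECT INTO is minimally logged, and building the index once after the load is cheaper than maintaining it row by row, which is part of why the pattern is common in warehouse work:

SELECT col1, col2, col3
INTO #work
FROM dbo.SourceTable
WHERE batch_id = 42;    -- whatever selects the 1.5 million rows

CREATE CLUSTERED INDEX IX_work ON #work (col1);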
I need to design a database table which will store suppliers' demand information. One supplier will probably have 10,000 records, and there could be 10,000 suppliers. So, in total, the number of records will be 10,000 * 10,000 = 100 million, which is a very large number of records to insert into a table. So, how can I design a table and structure to cater for this scenario? Thanks.
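100 million narrow rows is manageable if the clustered index matches the dominant access path. A minimal sketch with hypothetical columns, assuming queries usually ask for one supplier's demand rows:

CREATE TABLE dbo.SupplierDemand
(
    supplier_id INT NOT NULL,
    demand_date DATETIME NOT NULL,
    item_id INT NOT NULL,
    quantity INT NOT NULL,
    CONSTRAINT PK_SupplierDemand
        PRIMARY KEY CLUSTERED (supplier_id, demand_date, item_id)
);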
I'm looking for some performance assistance on updating a column value in a table that contains approximately 50 million rows. I have a permanent table in another database that has the key column and the value to be set. My query is listed below, but I'm afraid it will run for quite a while. Any suggestions would be appreciated.
update a
set a.column2 = b.column2
from mytable as a
join mytable1 as b on a.column1 = b.column1
There is a one to one relationship between the two tables.
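One way to keep the log small and the blocking short is to run the same update in batches, skipping rows whose value is already correct so the loop converges. A minimal sketch, assuming SQL Server 2005 or later for UPDATE TOP:

DECLARE @rows INT;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) a
    SET a.column2 = b.column2
    FROM mytable AS a
    JOIN mytable1 AS b ON a.column1 = b.column1
    WHERE a.column2 <> b.column2
       OR (a.column2 IS NULL AND b.column2 IS NOT NULL);
    SET @rows = @@ROWCOUNT;
END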
I am trying to update a large table which consists of 45 million records; the update is taking more than 2 days to complete. Below is my approach:

1. The table has only one clustered index and no other indexes.
2. I am updating in batches, say 20,000 records at a time.
3. I changed the recovery model to bulk-logged, the auto-growth size is set to 300 MB, and there is enough space on my disk for the transaction log.
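One thing worth checking: if each batch rescans the table to find not-yet-updated rows, batching gets slower as it progresses. Walking the clustered key in ranges keeps every batch a seek. A minimal sketch, assuming the clustered index is on an integer column named id, with col_to_change / new_value as hypothetical stand-ins:

DECLARE @from INT, @step INT;
SET @step = 20000;
SELECT @from = MIN(id) FROM dbo.BigTable;
WHILE @from IS NOT NULL
BEGIN
    UPDATE dbo.BigTable
    SET col_to_change = new_value
    WHERE id >= @from AND id < @from + @step;

    SELECT @from = MIN(id) FROM dbo.BigTable WHERE id >= @from + @step;
END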
My environment is SQL 2000. I have a table with 500 million rows. The table is consistently getting updated and inserted. I can not take the table offline. My clustered index needs to be rebuilt due to decreased performance. How do I accomplish this?
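SQL Server 2000 has no ONLINE rebuild option, but DBCC INDEXDEFRAG works in a series of short transactions and does not hold long blocking locks, so it can run while the table stays in use. It compacts and reorders the leaf level rather than fully rebuilding the index, so it is a defragment rather than a true rebuild. A minimal sketch (the database and table names are placeholders; index id 1 is the clustered index):

DBCC INDEXDEFRAG (MyDatabase, 'MyBigTable', 1);
UPDATE STATISTICS dbo.MyBigTable;    -- INDEXDEFRAG does not refresh statistics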
I am using DTS and VBScript in DataPump tasks in order to transfer large amounts of data from text files to an SQL database.
As the database uses a normalized schema, there is often the case of inserting multiple records in a destination table from various fields of the same record of the source text file.
For example, if the source record contains information about goods sold like date, customer, item code, item name and total amount, and does so for a maximum of 3 goods per sale (row), therefore has the structure:
I have tried using a DataPump task and VBScript, and I guess it has to do with the DTSTransformStat_**** constants, but none of the ones I tried seems to work.
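Inside DTS, the DTSTransformStat_SkipFetch status is the one intended for writing more than one destination row per source row (the transform returns it to keep the current source row while emitting a destination row). An alternative that avoids per-row script entirely is to land the file in a staging table as-is and normalize it set-based. A minimal sketch with hypothetical names, for the 3-goods-per-row layout described above:

INSERT INTO dbo.SalesLines (sale_date, customer, item_code, item_name, amount)
SELECT sale_date, customer, item1_code, item1_name, item1_amount
FROM dbo.SalesStage WHERE item1_code IS NOT NULL
UNION ALL
SELECT sale_date, customer, item2_code, item2_name, item2_amount
FROM dbo.SalesStage WHERE item2_code IS NOT NULL
UNION ALL
SELECT sale_date, customer, item3_code, item3_name, item3_amount
FROM dbo.SalesStage WHERE item3_code IS NOT NULL;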
DELETING 100 million from a table weekly - SQL SERVER 2000

Hi All,

We have a table in SQL Server 2000 which has about 250 million records, and this will be growing by 100 million every week. At any time the table should contain just 13 weeks of data. When the 14th week's data needs to be loaded, the first week's data has to be deleted. This deletes 100 million rows every week, and since the delete is taking a lot of transaction log space, the job is not successful.

Can you please help with the approaches we can take to fix this problem? Performance and the transaction log are the issues we are facing. We tried deletion in steps too, but that is also taking time. What are the different ways we can address this quickly? Please reply at the earliest.

Thanks,
Harish
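Since the weekly volume is predictable, one design that avoids the big DELETE entirely is a local partitioned view over 13 weekly tables: retiring a week becomes a TRUNCATE (which logs only page deallocations) instead of a 100-million-row DELETE. A minimal sketch with hypothetical names, shortened to 2 weeks; SQL Server 2000 supports partitioned views, and the CHECK constraints let the optimizer skip tables a query cannot touch:

CREATE TABLE dbo.Fact_Week01 (
    week_no INT NOT NULL CHECK (week_no = 1),
    event_dt DATETIME NOT NULL,
    payload VARCHAR(100) NOT NULL
);
CREATE TABLE dbo.Fact_Week02 (
    week_no INT NOT NULL CHECK (week_no = 2),
    event_dt DATETIME NOT NULL,
    payload VARCHAR(100) NOT NULL
);
GO
CREATE VIEW dbo.Fact AS
SELECT week_no, event_dt, payload FROM dbo.Fact_Week01
UNION ALL
SELECT week_no, event_dt, payload FROM dbo.Fact_Week02;

To roll the window, TRUNCATE the oldest week's table, adjust its CHECK constraint to the new week number, and reload it.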
I am doing performance testing for the In-Memory OLTP option in SQL Server 2014. As part of this, I want to insert 500 million rows into a memory-optimized test table I have created.
I need a sample script to insert 500 million records into a table ....
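A minimal sketch of one way to generate test rows without a physical numbers table (the target table name and columns are assumptions, and creating the memory-optimized table itself, with its required index, is assumed done). Cross-joining a catalog view with itself yields millions of rows per pass; run the batch repeatedly with an offset until 500 million is reached, since one giant insert would be unwieldy:

;WITH n AS (
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO dbo.InMemTest (id, payload)
SELECT rn, REPLICATE('x', 50)
FROM n;

-- Next pass: SELECT rn + 1000000, and so on, or wrap the batch in a WHILE loop.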
I have a table that I need to do some computations on across all the data, but first I need to remove the duplicate records and insert the results into a destination table. Here's the example below. My table has 3.1 million rows. I have tried using DISTINCT and GROUP BY, but both ways of selecting the data take about half a minute to run. I'm wondering if there is a way to increase performance. Users are OK with this time since the process runs overnight, but improving it won't hurt. I do have a clustered index on these fields, but that doesn't seem to help.
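For comparison purposes, ROW_NUMBER gives a third shape to test against DISTINCT and GROUP BY; it keeps exactly one row per duplicate group in a single pass, though it is not guaranteed to be faster. A minimal sketch with hypothetical names, assuming SQL Server 2005 or later:

INSERT INTO dbo.DestTable (key1, key2, value1)
SELECT key1, key2, value1
FROM (SELECT key1, key2, value1,
             ROW_NUMBER() OVER (PARTITION BY key1, key2
                                ORDER BY key1) AS rn
      FROM dbo.SourceTable) AS d
WHERE d.rn = 1;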
Can anyone help me with this? When I select data from the table using a SELECT statement, it takes a huge amount of time. The table contains 7 million entries, and when I select with a criteria in the WHERE clause it takes around 45 secs. The system has 4 GB RAM and a dual-processor CPU. The SELECT statement does not contain any grouping or the like.
Will it always take this much time to retrieve the data? The table does include an indexed field, so can anyone suggest the different things I can do to make the retrieval faster?
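If the indexed field is not the one the criteria filter on, the index cannot help; and even when it is, a lookup back to the base table for every matching row can be slow. A covering index keyed on the filter column avoids those lookups. A minimal sketch with hypothetical names (INCLUDE requires SQL Server 2005 or later):

CREATE NONCLUSTERED INDEX IX_Orders_Status
ON dbo.Orders (status)
INCLUDE (order_id, order_date, amount);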
Generating the 4 lines is not the issue; I call 3 functions to do that, together with CROSS APPLY: one function to get all dates between the start and end date (dbo.AllDays, returning a table with only a datevalue column), one function to have these dates evaluated against a work schedule (dbo.HRCapacityHours), and one function to get the absence records (dbo.HRAbsenceHours). What I can't get fixed is having the correct hours per line.
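For readers trying to picture the query, the chained shape described is roughly the following; the post does not give the functions' parameter lists, so the signatures and the driving table here are assumptions, not the actual code:

SELECT a.EmployeeID, d.datevalue,
       cap.hours AS capacity_hours,
       ab.hours AS absence_hours
FROM dbo.Absences AS a                                      -- hypothetical driving table
CROSS APPLY dbo.AllDays(a.StartDate, a.EndDate) AS d        -- assumed signature
CROSS APPLY dbo.HRCapacityHours(a.EmployeeID, d.datevalue) AS cap
CROSS APPLY dbo.HRAbsenceHours(a.EmployeeID, d.datevalue) AS ab;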
I have a table that is being used to log track plays on our website.
Here's the table:
CREATE TABLE [dbo].[Music_BandTrackPlays](
    [ListenDate] [datetime] NOT NULL DEFAULT (getdate()),
    [TrackId] [int] NOT NULL,
    [IPAddress] [varchar](20)
) ON [PRIMARY]
There's a CLUSTERED INDEX on ListenDate ASC and a NON CLUSTERED INDEX on the TrackId.
I have a TRIGGER on the Music_BandTrackPlays table that looks like the following:
CREATE TRIGGER [trig_Increment_Music_BandTrackPlays_PlayCount]
ON [dbo].[Music_BandTrackPlays]
AFTER INSERT
AS
UPDATE Music_BandTracks
SET Music_BandTracks.PlayCount = Music_BandTracks.PlayCount + TP.PlayCount
FROM (SELECT TrackId, COUNT(*) AS PlayCount
      FROM inserted
      GROUP BY TrackId) AS TP
WHERE Music_BandTracks.TrackId = TP.TrackId
When a simple INSERT statement is done on the Music_BandTrackPlays table, it can take quite a long time. When I remove the TRIGGER, the INSERTs are immediate. The execution plan for the TRIGGER shows that an 'Inserted Scan' is taking up most of the resources.
How exactly is the pseudo 'inserted' table formed?
For now, I think the easiest thing to do is update my logging page so it performs 2 queries: one to UPDATE the Music_BandTracks table and increment the counter, and one to perform the INSERT into the Music_BandTrackPlays table separately.
I'm ok with that solution but I would really like to understand why the TRIGGER is taking so long. The 'inserted' pseudo table will be 1 row 99% of the time. Does SQL Server perform a table scan on all 20 million rows in order to determine what's new and put it in the inserted pseudo table?
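On the last question: no, SQL Server does not scan the 20 million rows; the 'inserted' pseudo-table is materialized only from the rows affected by the triggering statement. The two-statement alternative mentioned above would look roughly like this (the parameter names are hypothetical), wrapped in a transaction so the play count and the log row stay consistent:

BEGIN TRAN;
    INSERT INTO dbo.Music_BandTrackPlays (TrackId, IPAddress)
    VALUES (@TrackId, @IPAddress);

    UPDATE dbo.Music_BandTracks
    SET PlayCount = PlayCount + 1
    WHERE TrackId = @TrackId;
COMMIT;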
IF NOT EXISTS (SELECT TOP 1 1 FROM dbo.syscolumns
               WHERE id = OBJECT_ID(N'dbo.Employee') AND name = 'DoNotCall')
BEGIN
    ALTER TABLE [dbo].[Employee]
        ADD [DoNotCall] bit NOT NULL
        CONSTRAINT DoNot_Call_Default DEFAULT 0
    IF ( @@ERROR <> 0 ) GOTO QuitWithRollback
END
It just takes a LOT of time in SQL Server Management Studio. I have to cancel the query, and cancelling takes a whole lot of time as well. I am using SQL Server 2008.
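A likely cause: on SQL Server 2008, adding a NOT NULL column with a default must physically write the value into every row (the metadata-only fast path for this arrived in later versions), so the ALTER rewrites the whole table inside one transaction. A lower-impact sequence, sketched below, adds the column as NULL (metadata-only), backfills in batches, then tightens the constraint:

ALTER TABLE dbo.Employee ADD DoNotCall bit NULL;
GO
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) dbo.Employee
    SET DoNotCall = 0
    WHERE DoNotCall IS NULL;
    SET @rows = @@ROWCOUNT;
END
GO
ALTER TABLE dbo.Employee ALTER COLUMN DoNotCall bit NOT NULL;
ALTER TABLE dbo.Employee
    ADD CONSTRAINT DoNot_Call_Default DEFAULT 0 FOR DoNotCall;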
I need to use a BULK INSERT statement for copying a table with 200 million rows to another table on the same server... the table has no primary key or identity column.... script for BULK INSERT ...
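Note that BULK INSERT reads from a data file, not from another table, so for a same-server table-to-table copy, INSERT ... SELECT is the closer tool. A minimal sketch with hypothetical names; with the TABLOCK hint, a heap target (which matches a table with no primary key), and the simple or bulk-logged recovery model, the insert can be minimally logged:

INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT *
FROM dbo.SourceTable;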
Hi, I have a table with a user column and other columns. The user column is the primary key.
I want to create a copy of the record where user = 'user1' and insert that copy into the same table as a newly created record, but I want the new record to have a value of 'user2' in the user column instead of 'user1', since it's a primary key.
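A minimal sketch, assuming the other columns are col1 and col2 (hypothetical); listing the columns explicitly lets the key value be swapped while everything else is copied:

INSERT INTO dbo.MyTable ([user], col1, col2)
SELECT 'user2', col1, col2
FROM dbo.MyTable
WHERE [user] = 'user1';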
Is it possible to prevent a record from being inserted if the record already exists in the table?
Scenario: the query should behave as follows.
We are bulk-inserting a batch of data, and a row should not be inserted if it already exists in the table; apart from that, the other rows should still be inserted.
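A minimal sketch with hypothetical names: insert only the incoming rows whose key is not already present, so existing rows are skipped while the rest of the batch still loads:

INSERT INTO dbo.Target (id, payload)
SELECT s.id, s.payload
FROM dbo.IncomingBatch AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Target AS t
                  WHERE t.id = s.id);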
Hi All,

I have a table in SQL Server 2000 that contains several million member ids. Some of these member ids are duplicated in the table, and each record is tagged with a 1 or a 2 in [recsrc] to indicate where they came from.

I want to remove all member id records from the table that have a recsrc of 1 where the same member id also exists in the table with a recsrc of 2.

So, if the member id has a recsrc of 1, and no other record exists in the table with the same member id and a recsrc of 2, I want it left untouched.

So, in a theoretical dataset of member id and recsrc:

0001, 1
0002, 2
0001, 2
0003, 1
0004, 2

I am looking to only delete the first record, because it has a recsrc of 1 and there is another record in the table with the same member id and a recsrc of 2.

I'd very much appreciate it if someone could help me achieve this!

Much warmth,
Murray
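A minimal sketch of one way to phrase that delete (the table and column names beyond recsrc are hypothetical): remove recsrc = 1 rows only when a recsrc = 2 row exists for the same member id:

DELETE m1
FROM dbo.Members AS m1
WHERE m1.recsrc = 1
  AND EXISTS (SELECT 1
              FROM dbo.Members AS m2
              WHERE m2.member_id = m1.member_id
                AND m2.recsrc = 2);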
Hello,

I am using SQL 2005 and Cognos' Data Manager, an ETL tool for data warehousing. I have a problem with the time it takes to load new changes, and I am seeking advice on a better way to manage the data.

I have a table that tracks student attendance and it contains about 13 million records. On a daily basis, there are 5,000 - 20,000 inserts and 10,000 - 50,000 updates.

The daily data comes from two different text files from my operational system: current and historical (CLSFIL and CLSHIS). The data is loaded into a staging area from the operational system, where data cleansing is done and other fields are added to the table. The final step is delivering the table to my target database, which is used for reporting.

Here's the situation: I find it takes 45 minutes to do a relational update, where only the records that changed in the last day are loaded. However, if I choose the native API load instead of a relational load, it can load all 13M records in 7 minutes. The table is heavily indexed.

At some point, the API load will take more time than the relational load (the changes and new records will remain constant, but the file will continue to grow).

I'm seeking a solution that is more efficient. I'm considering two tables for history and current, with a view for reporting via a union. Is this a good idea? How can I make the view efficient with the where clause? Looking to bounce around ideas. Other ideas?

Thanks in advance,
Rob

(I maintain the key relationships in the tool, not the tables. I know I have lots to learn and improvements to make.)

CREATE TABLE "dbo"."F_BI_Class_Attendance_Detail"
(
    "CLASS_ATTENDANCE_ID" VARCHAR(50) NULL,
    "CLASSES_OFFERED_ID" VARCHAR(26) NULL,
    "CLASS_CAMPUS_ID" VARCHAR(10) NULL,
    "STUDENT_ID" CHAR(20) NULL,
    "FULL_CLASS_ID" CHAR(15) NOT NULL,
    "SESSION_ID" CHAR(10) NULL,
    "SECTION_ID" VARCHAR(5) NULL,
    "MEET_DT" DATETIME NULL,
    "MEETING" SMALLINT NULL,
    "PRESENT" CHAR(2) NOT NULL,
    "SESSION_SKEY" BIGINT NULL,
    "STUDENT_SKEY" BIGINT NULL,
    "CLASS_CAMPUS_SKEY" BIGINT NULL,
    "CLASSES_OFFERED_SKEY" BIGINT NULL,
    "LOAD_DT" DATETIME NULL,
    "COMPUTED_DT" DATETIME NULL
);
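On the current/history idea: for SQL Server to prune one side of the union based on the WHERE clause, each table needs a CHECK constraint on the column that splits current from history, which makes the view a partitioned view. A minimal sketch with hypothetical names, a trimmed column list, and an arbitrary example cutoff date:

CREATE TABLE dbo.Attendance_Current (
    MEET_DT DATETIME NOT NULL CHECK (MEET_DT >= '20070101'),
    STUDENT_ID CHAR(20) NULL,
    PRESENT CHAR(2) NOT NULL
);
CREATE TABLE dbo.Attendance_History (
    MEET_DT DATETIME NOT NULL CHECK (MEET_DT < '20070101'),
    STUDENT_ID CHAR(20) NULL,
    PRESENT CHAR(2) NOT NULL
);
GO
CREATE VIEW dbo.Attendance AS
SELECT MEET_DT, STUDENT_ID, PRESENT FROM dbo.Attendance_Current
UNION ALL
SELECT MEET_DT, STUDENT_ID, PRESENT FROM dbo.Attendance_History;

A query filtering on MEET_DT >= '20070301' then reads only the current table, thanks to the CHECK constraints.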
I have an application that runs on several sites and has a table with 36 columns, mostly ints and small varchars.
I currently have only one table that stores the data, plus five indexes, and since the table at one location (and others soon) has about 18 million rows, I have been trying to come up with a better solution (but only if needed; I don't think I have to tell you that I am a programmer and not a DBA). The db file size with all the indexes is more than 10 GB, which in itself is not a problem, but is it a bad solution to have it that way?
The questions are:
Are there any big benefits if I split it into several smaller tables, or even smaller databases, and make the SPs that get the data aware that, say, 2006's data is in table A and so on? It's quite important that SELECTs are fast; that need is far more important than decreasing the size of the database file.
How many rows is it okay to have in one table (with 25 columns) before it's too big?
I'm inserting from TempAccrual into VacationAccrual. It works nicely; however, if I run this script again it will insert the same values again into VacationAccrual. How do I block that? If there is a small change in one of the columns in TempAccrual, then the insert should be allowed. Here is my query:
INSERT INTO vacationaccrual (empno, accrued_vacation, accrued_sick_effective_date, accrued_sick, import_date)
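A minimal sketch of a guarded version of that insert, assuming empno identifies a row in both tables (the remaining columns come from the INSERT above): rows already present with identical values are skipped, while new or changed rows are inserted:

INSERT INTO vacationaccrual
    (empno, accrued_vacation, accrued_sick_effective_date, accrued_sick, import_date)
SELECT t.empno, t.accrued_vacation, t.accrued_sick_effective_date,
       t.accrued_sick, t.import_date
FROM TempAccrual AS t
WHERE NOT EXISTS (SELECT 1
                  FROM vacationaccrual AS v
                  WHERE v.empno = t.empno
                    AND v.accrued_vacation = t.accrued_vacation
                    AND v.accrued_sick = t.accrued_sick);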
I'm having a bit of confusion here. Can you please help me?
I have about 5,000 records already in one table. Everything that I query is related to that table one way or the other. Now I have 2,000 - 3,000 more records to store in the database. In relational database terms, I could store the new data in a different table so I can query it. Most of my queries are searches.
So the question is: is it better to store the data in another table, or should I store everything in the old table? Thanks a lot in advance for your help. I really do appreciate it.