I am attempting to do a rather simple purge task on a very large table. This task will need to take place daily and delete records older than 6 months from the database. On the first pass this will delete well over 130 million rows. I thought the best way to handle this is to create a proc and call the proc from a SQL Agent job that runs nightly. Here is an example of the script:
CREATE PROCEDURE usp_Purge_WCFLogger
AS
SET NOCOUNT ON;

EXEC sp_rename 'dbo.logs', 'logs_work';

SELECT *
INTO dbo.Logs_Backup
FROM dbo.Logs_Work
WHERE [TIMESTAMP] < DATEADD(month, -6, GETDATE());
GO
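One gap in the script as posted: after the rename, nothing recreates an empty dbo.logs for the application to keep writing to. A minimal sketch of that step, assuming the original column definitions are acceptable (indexes, constraints, and permissions would still need to be re-created separately):

-- recreate an empty dbo.logs with the same columns as the renamed table
SELECT TOP (0) *
INTO dbo.logs
FROM dbo.logs_work;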
I have a table with about 466 million rows. In this table there is an int column called WeeksToRetain as well as an EventDate column containing the date the row was inserted. I am trying to delete all the rows that should be deleted according to WeeksToRetain. For example, if the EventDate is 5/07/15 with a 1 in the WeeksToRetain column, the row should be removed by 5/14/15. I am not sure what days SQL considers the beginning and end of the week. However, the core issue I am having is the sheer mass of deletions I must do and the resulting log growth.
So I am trying to do the delete in batches. More specifically, I want to load a temporary table with a million rows, then use that temporary table to load a sub temporary table with 100,000 rows and join it to the table I want to delete from, looping through 10 times to get to 1 million. The Logging.EvenLog table, which is the table I'm trying to purge, has a clustered index on EventDate (ASC). I would like to run this in a scheduled job with enough time between executions for log backups to run.
DECLARE @i int
DECLARE @RowCount int
DECLARE @NextBatchDate datetime

CREATE TABLE #BatchProcess
(
    EventDate datetime,
    ApplicationID int,
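As an alternative to staging keys in temp tables, a batched delete driven directly by the retention rule might look like this. This is only a sketch: it assumes the table is Logging.EventLog and that the DATEADD expression below matches the intended retention logic.

DECLARE @BatchSize int = 100000;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    -- delete one batch of rows past their retention window
    DELETE TOP (@BatchSize)
    FROM Logging.EventLog
    WHERE DATEADD(week, WeeksToRetain, EventDate) < GETDATE();

    SET @Rows = @@ROWCOUNT;

    -- pause between batches so scheduled log backups can keep the log in check
    WAITFOR DELAY '00:00:05';
END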
Our system runs a SQL Server 2012 DB; it has a table (table_a) which has over 10M records. Our system has to receive a data file from the previous system daily, which contains approximately 3M updated or new records for table_a. My job is to update table_a with the new data.
The initial solution is:
1 Create a table (table_b) whose structure is the same as table_a
2 Use BCP to import updated records into table_b
3 Remove outdated data in table_a: delete from table_a inner join table_b on table_a.key_fields = table_b.key_fields
4 Append updated or new data into table_a: insert into table_a select * from table_b
Based on test results, this solution is very inefficient; step 3 alone takes several hours, for example. How can I improve it?
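For step 3, the statement as written is not valid T-SQL. A corrected form (a sketch, assuming key_fields is a single column and is indexed on both tables) would be:

DELETE a
FROM table_a AS a
INNER JOIN table_b AS b
    ON a.key_fields = b.key_fields;

Indexing table_b.key_fields (and batching the delete if log growth becomes a problem) is usually what turns this from hours into minutes; alternatively, steps 3 and 4 can be collapsed into an UPDATE of matching rows plus an INSERT of only the genuinely new keys.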
I'm running a resource-intensive stored procedure, which reads a file with about 50,000 lines with a BULK INSERT into a temp table, then goes through it and inserts a record for each line into another table. While this procedure is running, SQL Server stops accepting any other requests coming from the website. Question: Is there a way to make SQL Server "listen", or emulate an "interrupt" to other requests while in the middle of a long intensive process? I really appreciate your replies. Thank you, Oleg.
I have very old versions of duplicate stored procedures on my databases. I know there is no "safe" way to do this using DMVs alone, so I am planning to combine that with a trace. But I would like to get others' opinions about that.
Here's the DMV I am planning to use:
SELECT CASE WHEN database_id = 32767 THEN 'Resource' ELSE DB_NAME(database_id) END AS DBName,
       OBJECT_SCHEMA_NAME(object_id, database_id) AS [SCHEMA_NAME],
       OBJECT_NAME(object_id, database_id) AS [OBJECT_NAME],
       cached_time,
       last_execution_time,
       execution_count
[Code] ....
I will save that to a local table and run it every 5 minutes, maybe? Or at an interval equal to or lower than PLE?
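If it helps, here is a minimal sketch of that capture, assuming the DMV in the snippet is sys.dm_exec_procedure_stats and using a hypothetical history table named dbo.ProcUsageHistory:

IF OBJECT_ID('dbo.ProcUsageHistory') IS NULL
    CREATE TABLE dbo.ProcUsageHistory
    (
        CaptureTime datetime2 NOT NULL DEFAULT SYSDATETIME(),
        DBName sysname NULL,
        SchemaName sysname NULL,
        ObjectName sysname NULL,
        cached_time datetime NULL,
        last_execution_time datetime NULL,
        execution_count bigint NULL
    );

-- run this on whatever schedule is chosen (e.g. a 5-minute Agent job)
INSERT INTO dbo.ProcUsageHistory (DBName, SchemaName, ObjectName, cached_time, last_execution_time, execution_count)
SELECT CASE WHEN database_id = 32767 THEN 'Resource' ELSE DB_NAME(database_id) END,
       OBJECT_SCHEMA_NAME(object_id, database_id),
       OBJECT_NAME(object_id, database_id),
       cached_time,
       last_execution_time,
       execution_count
FROM sys.dm_exec_procedure_stats;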
I need to create a clustered index (CI) on a very large SQL Server 2012 database table. This table has approximately 10 billion rows and is 500 GB in size. The job ran for about 20 hours and then failed with the error: "Out of disk space in tempdb". My tempdb size is 1.8 TB, yet it's still not enough.
Here is my script:
CREATE CLUSTERED INDEX CI_IndexName
ON TableName (Column1, Column2)
WITH (MAXDOP = 4, ONLINE = ON, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE)
ON sh_WeekDT (Day_DT)
GO
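One variation sometimes tried when tempdb runs out during an index build (an assumption on my part, not something from the post) is to let the sort spill into the destination filegroup instead of tempdb:

CREATE CLUSTERED INDEX CI_IndexName
ON TableName (Column1, Column2)
WITH (MAXDOP = 4, ONLINE = ON, SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = PAGE)
ON sh_WeekDT (Day_DT);

That trades tempdb space for space in the target filegroup, so the destination still needs enough free room to hold the sort.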
I have a very large table that I need to partition. Ideally the table will write to three filegroups. I have defined the Partition function and scheme as follows.
CREATE PARTITION FUNCTION vm_Visits_PM (datetime)
AS RANGE RIGHT FOR VALUES ('2012-07-01', '2013-06-01')

CREATE PARTITION SCHEME vm_Visits_PS
AS PARTITION vm_Visits_PM
TO (vm_Visits_Data_Archive2, vm_Visits_Data_Archive, vm_Visits_Data)
This should create three partitions of the vm_Visits table. I am having a few issues; the first has to do with adding a new clustered index primary key to the existing table. The main issue here is that the Closed column is nullable (it is a datetime, by the way). So running the following makes SQL Server upset:
ALTER TABLE dbo.vm_Visits
ADD CONSTRAINT [PK_vm_Visits] PRIMARY KEY CLUSTERED
(
    VisitID ASC,
    Closed
) ON [vm_Visits_PS](Closed)
I need to define a primary key on the VisitID column, but I need to include the Closed column in order to partition on it. Also, how would I move data between partitions on a monthly basis? Would I simply update the partition function, or have to do some sort of merge, split, or switch operation?
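For the monthly movement, here is a sketch of the usual split/switch/merge cycle, using the names from the post; the archive staging table is hypothetical and would have to match the source table's structure, indexes, and filegroup exactly:

-- add a new boundary for the coming month
ALTER PARTITION SCHEME vm_Visits_PS NEXT USED vm_Visits_Data;
ALTER PARTITION FUNCTION vm_Visits_PM() SPLIT RANGE ('2013-07-01');

-- move the oldest partition out to an archive staging table (metadata-only operation)
ALTER TABLE dbo.vm_Visits SWITCH PARTITION 1 TO dbo.vm_Visits_Archive_Staging;

-- remove the now-empty boundary
ALTER PARTITION FUNCTION vm_Visits_PM() MERGE RANGE ('2012-07-01');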
Other than right-clicking on each individual table in SSMS and generating a CREATE script, is there a simple way to generate CREATE TABLE scripts for tables within a given database?
Background: I have a bunch of tables in one database, and I would like to add tables to a second database that have the same names and basic structures of some of the tables from the first database.
I do not need to transfer any data from the tables; this is a separate project that will use a similar data structure. I just want to generate the CREATE TABLE scripts for the 30-ish tables within the first database, and then I'll tweak the scripts as appropriate and run them against the new database.
Hi all, I have a table with about 67 million records that are marked for deletion.
I know that I can
DELETE from table WHERE ToBeDeleted='t'
But this may be too big a task for the server considering the amount of data to delete at once. And if it runs out of resources and errors out, then nothing gets deleted...
Is there a way to segment or loop so I can delete, say, 100k records at a time?
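A common pattern for this (a sketch; the table name and batch size are placeholders) is a loop that deletes a chunk at a time and stops when nothing is left:

WHILE 1 = 1
BEGIN
    DELETE TOP (100000)
    FROM dbo.YourTable
    WHERE ToBeDeleted = 't';

    -- stop once a pass deletes nothing
    IF @@ROWCOUNT = 0 BREAK;
END

Each batch commits on its own, so a failure part-way through keeps what has already been deleted, and log backups (or CHECKPOINT in SIMPLE recovery) between batches keep the log from ballooning.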
Greets all, I've got a table with batches of records. Each group of records has a batch ID as part of the PK in the form BTCXXXX, where XXXX is an auto-incremented number. So let's say I have 100 batches of 20k records per batch in the table, and the distinct batch IDs are BTC0200 (oldest batch) through BTC0300 (newest batch). I only want to keep the 90 most recent batches in the table at any given time. Is it OK to just subtract 90 from the last batch ID and do something like:
DECLARE @batch_num int = 300, @batch_id char(10)
SET @batch_id = 'BTC' + RIGHT('0000' + CAST(@batch_num - 90 AS varchar(4)), 4)
DELETE FROM ITEM_BATCH WHERE BATCH_ID < @batch_id
I want to cover both the case where the table has more than 90 batches and the case where it has fewer than 90 batches. Is this a feasible approach?
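A sketch that covers both cases by deriving the cutoff from the data instead of assuming the IDs are perfectly sequential:

DECLARE @cutoff char(10);

-- oldest batch id among the 90 most recent batches
SELECT @cutoff = MIN(BATCH_ID)
FROM (SELECT DISTINCT TOP (90) BATCH_ID
      FROM ITEM_BATCH
      ORDER BY BATCH_ID DESC) AS recent;

DELETE FROM ITEM_BATCH
WHERE BATCH_ID < @cutoff;  -- deletes nothing when there are 90 or fewer batches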
We have a staging table into which data is dumped from files. The staging table is truncated for every load. In order to retain data from the staging table, we are creating a staging_purge table which holds the staging data. What is the fastest way to copy data from the staging table to the purge table without impacting the load process?
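One low-impact option is a minimally logged copy run after the load completes and before the truncate. A sketch, assuming staging_purge has an identical structure and the database is in SIMPLE or BULK_LOGGED recovery (the table names are stand-ins for the real ones):

-- TABLOCK makes the insert eligible for minimal logging into a heap
INSERT INTO dbo.staging_purge WITH (TABLOCK)
SELECT *
FROM dbo.staging;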
I hope someone can help me with a big problem... I'm using Citrix Resource Management Services with a SQL 2000 database. There are 15 Citrix servers, all reporting to the SQL database.
The database is expanding very quickly and is becoming slower and slower.
My question is: I want to schedule a purge of old records for a Friday afternoon, like this:
I am just trying to find a good article on the process SQL Server goes through when shutting down and starting up; so far I have not found anything definitive on Google. I am assuming a checkpoint is invoked and committed transactions are written to disk, while uncommitted ones are rolled back, but I would like an official textual description of what happens.
In my SQL Server error log, I see the message below. The system has 8 GB of RAM with enough free RAM; is there something I can do to prevent this alert? (Note: I have no MIN/MAX memory set on this instance.)
A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 328 seconds. Working set (KB): 76896, committed (KB): 167628, memory utilization: 45%.
Currently I use a SQL job to check the status column, pick up the XML, and load it into the appropriate tables. Instead of a SQL job, can I use a trigger so that whenever there is an insert it fires an SP to load the data? Would there be any disadvantages?
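For comparison, here is a minimal sketch of the trigger approach with hypothetical names (dbo.StatusQueue, dbo.usp_LoadXmlForNewRows). The main disadvantage is visible in the comment: the load now runs inside every inserting transaction.

CREATE TRIGGER trg_StatusQueue_LoadXml
ON dbo.StatusQueue
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- runs synchronously inside the inserting transaction;
    -- a slow or failing load slows down or rolls back the insert itself
    EXEC dbo.usp_LoadXmlForNewRows;
END;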
We have a large OLAP database, about 2.5 TB spread out over 3 data files on three different drives, and recently someone ran a query that created a table that continued to grow until the data files filled the available disk space (about 3 TB total - 1 TB per drive).
Tonight I plan on running a full backup (the database is in Simple mode) and then running a ShrinkFile on all three files sequentially with TRUNCATEONLY, just so it will remove the space after the last extent. Is there any way to tell ahead of time how much space this will recover?
Granted running a DB Shrink is one of those things you just don't do, but this is a one-time shot and unavoidable to get the file size back under control.
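One way to estimate it up front (a sketch): check free space per data file with FILEPROPERTY. Since TRUNCATEONLY only releases space after the last allocated extent, the free_mb figure below is an upper bound on what each shrink can return.

SELECT name,
       size / 128 AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS used_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';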
I am fetching a large amount of data from Teradata into SQL Server using a linked server, and I am hitting the error below:
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)
A new monthly data set is being loaded, checked, and finally approved after 6 or 7 iterations. Because of this iteration, the monthly data set is added, then deleted, then added, then deleted a few times. Because the table is big, this process takes time. Any thoughts on how to make the delete/insert process faster? Keep in mind I cannot do much, because it is a production table and is being accessed by other users for other analysis.
Deletes are done based on trx_date, which is a year/month combo, like 201508.
The table holds monthly sales aggregated by customer.
The table structure is:
CREATE TABLE [dbo].[Sales](
    [batch_key] [int] NOT NULL,
    [Company_key] [int] NOT NULL,
    [customer_key] [char](22) NOT NULL,
    [Trx_Date] [int] NOT NULL,
    [account] [nvarchar](35) NOT NULL,
Hi, I'm implementing procedures that process mass amounts of information stored in a SQL database. What methods and actions do I need to take for the process to be faster in a case like this (it's more than 1,000,000 rows)? I'm also trying to improve memory usage, since I use a DataTable in C#, and I'm looking for a better way to process the retrieved data. Is there a better class or method that improves speed and prevents memory leaks? Please advise... Thanks for any help, Lior S ;)
Hi, I'm running an application on a server which grabs data from a database table on another server using SqlConnection, SqlDataAdapter and DataSet. The application then updates every row in that DataSet's DataTable, and the updates are saved back using the DataAdapter. The code is pretty much the straightforward code you would find in the MSDN documentation for using DataSets. The table contains a little over a million rows. When I run the application, I get an error saying the server application is not available. Upon looking into the application event log, I see this message: aspnet_wp.exe was recycled because memory consumption exceeded the 306 MB (60 percent of available RAM). How do I get around this? I thought DataSets were supposed to handle large data tables comfortably without memory issues. -Thanks
In a Library Management database we have these tables
1) Document (DocNo, Doc_type, permalink, inDate)
2) Title (id, DocNo, Main_Title, Other_Title)
3) Author (id, Author_Name, Author_Family, Type -- like: main author, translator, ...)
4) Publisher (id, DocNo, Name, Publisedate, address)
5) Subject (id, DocNo, Subject)
6) Description (id, DocNo, ISBN, description) -- one document may have several ISBNs, etc.
In document table I have 500,000 records.
I want to search for a word across these tables. For example, I want to search for 'Computer'; this word may be in the subject, title, description, etc. How can I do this with the best performance?
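One approach is full-text search. A sketch, assuming full-text indexes have been created on the searched columns (each table needs a unique key index, such as the id columns, for that):

DECLARE @term nvarchar(100) = N'Computer';

SELECT DocNo FROM dbo.Title         WHERE CONTAINS((Main_Title, Other_Title), @term)
UNION
SELECT DocNo FROM dbo.Subject       WHERE CONTAINS(Subject, @term)
UNION
SELECT DocNo FROM dbo.[Description] WHERE CONTAINS(description, @term);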
I have a source table in the staging database stg.fact and it needs to be merged into the warehouse table whs.Fact.
stg.fact is not a delta feed it is basically an intra-day refresh.
Both tables have a last updated date, so it's easy to see which have changed.
It will be new (insert) or changed (update) data that I am interested in, there are no deletions.
As this could be in the millions of rows that are inserts or updates then this needs to be efficient.
I expect whs.Fact to go to >150 million rows.
When I have done this before I started with T-SQL Merge statement and that was not performant once I got to this size.
My original option was to do this in SSIS with a lookup task that marks the inserts and updates and deals with them separately. However, when I set up the lookup transformation, the reference data set would need a package variable in the SQL command, and this does not seem possible with the lookup in 2012! I am currently looking at the Merge Join transformation and any clever basic T-SQL that could work, as this will need to be fast, and that's where I think T-SQL may be the better route.
Both tables will have >100,000,000 rows.
Both tables have the last updated date.
The tables are in different databases but on the same SQL instance.
Each table holds 5 integer columns, one varchar, and one datetime.
Last time I used Merge it was on a wider table with lots of columns, so I don't know if this would be an option.
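For the T-SQL route, here is a sketch of the two-statement alternative to MERGE; the column names (BusinessKey, Col1, Col2, LastUpdated) are illustrative, not the real schema:

-- update rows that already exist and have changed
UPDATE f
SET    f.Col1 = s.Col1,
       f.Col2 = s.Col2,
       f.LastUpdated = s.LastUpdated
FROM   whs.Fact AS f
INNER JOIN stg.Fact AS s
       ON s.BusinessKey = f.BusinessKey
WHERE  s.LastUpdated > f.LastUpdated;

-- insert rows that do not exist yet
INSERT INTO whs.Fact (BusinessKey, Col1, Col2, LastUpdated)
SELECT s.BusinessKey, s.Col1, s.Col2, s.LastUpdated
FROM   stg.Fact AS s
WHERE  NOT EXISTS (SELECT 1 FROM whs.Fact AS f WHERE f.BusinessKey = s.BusinessKey);

With indexes on the key and last-updated columns on both sides, this tends to hold up better than a single giant MERGE at the 100M+ row sizes described.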
We have a table with 100M rows, and up until now we were fine with a nonclustered index on a varchar(4000) column because we never went above 900 bytes (yes, it is a bad design). We need to support international character sets now, so the column was changed to nvarchar(4000), and now we have data past the 900-byte index key limit.
The data is long and seems useless, but it is needed by the business, and they need to be able to search "where bigcolumn like 'test%'". With an index, even with a huge amount of data, it was 'fast'. Now, of course, without an index it is unusable. The wildcard is always at the end of the search. I made a full-text index on the column, and basic queries such as: select * from ourtable where contains(bigcolumn, 'AReallyLongStringofTextHere') work fine unless there is a space in the data. We lose thousands of returned rows because of spaces in the data.
I have tried select * from ourtable where contains(bigcolumn, '"AReallyLongStringofTextHere that includes spaces"'), but not all of the data is returned. I get 112 rows with the contains statement. The table-scanning statement select * from ourtable where bigcolumn like 'AReallyLongStringofTextHere that includes spaces%' returns 1939 rows. I understand that the full-text index is breaking the long string up since it contains spaces. Is there a way to retain the entire string as one index entry, or is there a way to fix my query to return all of the rows?
We recently installed SQL server 2005 on a couple of our servers. I use Visual Basic 6.0 at the moment and use ADO to connect to our various SQL servers.
I recently discovered on one of the new servers that every time my program runs (every 4 minutes, 12 hours a day), the SQL process shown in Task Manager grows by 1-10 MB.
The SQL process was at 776,912K when I rebooted this afternoon. It started back up at 106,120K.
I am not doing anything differently than I did when my programs were talking to SQL 2000, and I have never seen this memory leak issue. Is there something extra I need to do in SQL 2005 to finish/clear these SQL queries and not bog down SQL's memory?
An example of how I would connect and do a SQL transaction:
I need to get this data into a SQL table in the following form so I can use it to further manipulate the data and update several other tables. I am thinking that UNPIVOT or CROSS APPLY might be the way to go, but I am not sure how to code it.
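Since the source layout isn't shown, here is a generic sketch of both patterns with hypothetical names (a SourceWide table with Id and Jan/Feb/Mar columns):

-- UNPIVOT: rotate the month columns into rows (drops NULL values)
SELECT Id, MonthName, Amount
FROM dbo.SourceWide
UNPIVOT (Amount FOR MonthName IN ([Jan], [Feb], [Mar])) AS u;

-- CROSS APPLY with VALUES does the same thing but keeps rows where the value is NULL
SELECT s.Id, v.MonthName, v.Amount
FROM dbo.SourceWide AS s
CROSS APPLY (VALUES ('Jan', s.Jan), ('Feb', s.Feb), ('Mar', s.Mar)) AS v (MonthName, Amount);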
So async cursor population is supposed to create the cursor and return the cursor ID quickly, while the server works on asynchronously populating the results. For a keyset-driven cursor, SQL Server stores the keyset in tempdb, which it then uses to fetch data for cursor results. Anyway, this works fine for smaller tables, but I'm finding that for large result sets, the async cursor population is very slow and indeed seems to approximate synchronous time. The wait stat I get while it is running (supposedly asynchronously) is TRANSACTION_MUTEX.
Example:

--enable async cursor
exec dbo.sp_configure 'cursor threshold', 0;
reconfigure;

declare @cursor int, @stmt nvarchar(max), @scrollopt int, @ccopt int, @rowcount int;

--example of giant result set
set @stmt = 'select * from sys.all_objects o1, sys.all_objects o2';
[code]...
Note that using the SQL "select * from sys.all_objects o1" is much faster than "select * from sys.all_objects o1, sys.all_objects o2". However, if cursor population is async, I'd expect the time to return a cursor id to be similar between the two.
I have a stored procedure. In the SP I am using a cursor to load data from a parent table into several child tables.
I have attached the script with this message.
My problem is how to use a direct select and insert (or load) to speed up the process instead of a cursor.
USE [IconicMarketing]
GO

/****** Object: StoredProcedure [dbo].[SP_DMS_INVENTORY] Script Date: 3/6/2015 3:34:03 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
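Without the full proc logic, here is only the general shape of replacing a row-by-row cursor with a single set-based INSERT ... SELECT; the table and column names are hypothetical, not the actual SP_DMS_INVENTORY objects:

-- load rows into a child table in one set-based statement instead of looping
INSERT INTO dbo.ChildTable (ParentID, Attribute1, Attribute2)
SELECT p.ParentID, p.Attribute1, p.Attribute2
FROM   dbo.ParentTable AS p
WHERE  NOT EXISTS (SELECT 1 FROM dbo.ChildTable AS c WHERE c.ParentID = p.ParentID);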
Greetings, I have a SQL Server 2005 database which is populated with test data. I need to copy this database to a new instance and then purge the copied database of its contents for the next round of testing. I know there is a Copy Database Wizard within SQL Server Management Studio; I'm assuming I need to perform this first, then purge the existing data within the copy. The copy operation looks pretty straightforward, but I haven't a clue how to perform the purge. Can someone help? Regards, Loopsludge
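For the purge step, one commonly seen (if undocumented) sketch relies on sp_MSforeachtable; it assumes every table should be emptied and that constraints can be temporarily disabled:

EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
EXEC sp_MSforeachtable 'DELETE FROM ?';
EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';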
* SQL Server 2008 R2
* Database was created by a third-party product. The product writes to the 3 tables that I need to make changes to 24/7, and downtime is not an option. All changes must be done live.
* Database overall size is ~200 GB.
* The 3 tables I must update make up ~190 GB of that space.
* Tables have no primary key or ID columns. Therefore, the data is highly fragmented.
* Of the ~190 GB of space allocated for the tables, there is roughly 70 GB of actual data.
* Rows of the tables are not guaranteed to be unique. In fact, on one of the tables, tests were run with a small sample of data and duplicates were very much evident.
What I'm trying to accomplish here is to get an ID column added to the 3 tables and set that ID field as the primary key. Doing so will force the data to become much less fragmented than it is currently and with purging and new inserts, eventually fragmentation will be nearly non-existent.
Problem: Making table changes on tables this large while data is constantly being added poses many risks and can cause data loss. This was tried on a smaller table than these three, and the entire table was lost in the process. A restore from backup was needed to get back to the most recent log backup point.
Original Solution: My original plan was to create a backup of each table and run the script below to migrate the majority of the data temporarily into the new table. I could then update the original table (which now would contain much less data) and then migrate the data back.
Original Solution Problem: The problem with the solution above is that it calls the DELETE function on the original table using the values from the temporary table. When there are duplicate rows, which have not all been inserted into the backup table yet, they will all be removed from the original table because there is nothing unique to separate them out. In my testing, I had 10,000 rows in the original table and ended up with 9,959 rows in the backup table.
Question 1: Is my approach to making these table changes reasonable?
Question 2a: If so, how can I make sure I don't lose data as part of this temporary migration of the data to my backup tables?
Question 2b: If not, what would be a better approach that isn't going to cause disruption to the application that INSERTs data 24/7 and won't have any risk of data loss?
I have a large table containing about 800 million rows with an average row length of about 1K. The columns in the table are char columns. I need to move the contents of this table into a similar table where the target columns are varchar. The original table column definitions are compatible with the target table but the reverse is not necessarily true. For example, one column is being changed from int to bigint. The table is partitioned.
So, what is the fastest way to migrate the data? I was thinking of unloading each partition into a flat file and loading the target table using multiple load streams. Is this a good way?
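If the data stays inside SQL Server, a partition-by-partition INSERT ... SELECT is another option to weigh against the flat-file unload/reload. A sketch, where pf_Source and PartitionColumn stand in for the real partition function and partitioning column:

DECLARE @p int = 1;
DECLARE @maxp int = (SELECT MAX(partition_number)
                     FROM sys.partitions
                     WHERE object_id = OBJECT_ID('dbo.SourceTable')
                       AND index_id IN (0, 1));

WHILE @p <= @maxp
BEGIN
    -- char-to-varchar and int-to-bigint conversions happen implicitly on insert
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
    SELECT *
    FROM dbo.SourceTable
    WHERE $PARTITION.pf_Source(PartitionColumn) = @p;

    SET @p += 1;
END

Each partition's insert could also be run as its own load stream if the batches don't contend with each other.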
I need to write a process to get the file size in KB and the record count of a file. I was planning on writing a C# console app that takes the file path and name as parameters; however, should I use a CLR instead?
I can't put a script task in the SSIS package that brings the file down, because it has been deemed that we only use SSIS for file consumption.