Database maintenance can be a full -time job when working within a large corporation. Cleanup or deletion of records can especially be time-consuming and error prone when dealing with large data tables with millions of records. Often developers will try to run a basic delete statement without knowing about batching or truncating the activity log. Below is a simple method for performing a batch delete.
Define you DELETE statement
The first step is to define your DELETE statement with the appropriate WHERE clause. If for example you want to delete all records before a set date then your query would look similar to:
DELETE FROM my_table WHERE action_ts < '06/01/2009'
Implementing the basic batch loop
To implement a basic loop we need to define some variables to keep track of a few variables. First we define @err to obtain the error id if an error occours, and @nCount to hold the total number of rows deleted.
DECLARE @err int, @nCount int SELECT @nCount=0
Next we implement or while statement to execute or delete statement while more records are available to be deleted. In this example we will test if any rows exist matching or deletion statement.
WHILE EXISTS (SELECT * FROM my_table WHERE action_ts < '06/01/2009') BEGIN ... END
In our while loop we will delete our records in batches by use of the TOP command that limits how many records are effected. We will be using a batch size of 10000.
DELETE TOP 10000 FROM my_table WHERE action_ts < '06/01/2009'
After our delete query we want to check if any errors occurred and if so break out of our batch loop. This is done by obtaining the current error status and if an error occurred then breaking out of the loop by the break command.
SELECT @err=@@error, @nCount=@nCount+@@rowcount IF @err=0 begin ... end else begin raiserror 20001 'Failed to delete date from table my_table' break end
Dumping the activity log and output statements
To be notified on how many rows are being deleted we can output the @nCount variable that is udpated after every deletion with the number of rows affected by the last delete statement. We can use the print statement to output the number.
print '%l! rows deleted', @nCount
Also when dealing with a large amount of rows it is possible for the activity log to get full for the database. To clean up the log use the dump tran command with the truncate_only option.
DUMP TRAN my_db WITH TRUNCATE_ONLY
The Full Batch Delete Script
Putting the entire script together we end up with:
DECLARE @err int, @nCount int SELECT @nCount=0 WHILE EXISTS (SELECT * FROM my_table WHERE action_ts < '06/01/2009') BEGIN DELETE TOP 10000 FROM my_table WHERE action_ts < '06/01/2009' SELECT @err=@@error, @nCount=@nCount+@@rowcount IF @err=0 BEGIN DUMP TRAN my_db WITH TRUNCATE_ONLY print '%l! rows deleted', @nCount END ELSE BEGIN raiserror 20001 'Failed to delete date from table my_table' break END END
For those that worry about the extra time needed for doing the exists select clause in the while loop you could optionally use a separate variable to hold the latest row count and test for if the row count is greater than 0.
Comments & Questions
I believe you need to have the T-SQL changed in your article for the DELETE TOP to
DELETE TOP (10000)
I needed to use the parentheses to get the DELETE TOP to work properly in SQL 2005.
James, you are correct in that Microsoft’s version of T-SQL does require the parenthesis. I currently do most of my work in Sybase T-SQL which does not require their use. Thanks for the input, I’m sure it will be helpful to others.
Add Your Comment