I have a large SQL Server database with a table at about 45 million records. I am archiving this table, and need to remove all entries greater than two years ago. I have the inserting into my archive table working fine, but I'm having issues with efficiency when deleting.
My problem lies within the indexes currently on the table. I would like to delete (and archival insert) in 1000 record chunks. To do this, I need to determine the "top" 1000 records fulfilling the requirement (greater than two years old). The DateTime stamp on the row is a clustered index, so this is great for grabbing the rows. However SQL 2000 does not allow DELETE TOP 1000.... so I need to do something like:
DELETE FROM <table> WHERE [UniqueID] IN
(SELECT TOP 1000 [UniqueID] FROM <table> WHERE [DateTime] < @TwoYearsAgo)
This would work great, if UniqueID was indexed. Since it is not, this takes a very long time (it is scanning the table for each of the 1000 records to be deleted). There are no other indexes on the table that uniquely identify the records. I am told it would be too costly to compute an index on UniqueID, as this is a live DB. Can anyone point out a way to optimize this query?
How about rewriting the query?
SET ROWCOUNT 1000
DELETE FROM <table> WHERE [DateTime] < @TwoYearsAgo
See documentation on SET ROWCOUNT (Transact-SQL).
Also note that per the documentation for DELETE, it supports the TOP
clause, but that is apparently new for SQL Server 2005 and up. I'm saying this since it sounds like it isn't supported on your database server, but have you actually tried using it? I don't have access to SQL Server 2000 documentation so I'm unsure if it is supported on that version. It very well might not be.
DELETE TOP (1000) FROM <table> WHERE [DateTime] < @TwoYearsAgo
Note the difference from the way TOP on select can be written, without the parenthesis. For UPDATE, DELETE and INSERT, the expression must be parenthesized, even if it's only a constant number like above.