I need to perform a daily update of a very large (300M records), wide table TABLE1. The source data for the updates is located in another table, UTABLE, which has 10%-25% of the rows of TABLE1 but is narrow. Both tables have record_id as a primary key.
Presently, I am recreating TABLE1
using the following approach:
<!-- language: sql -->
    -- 1) Copy the merged result into a new table
    SELECT (required columns)
    INTO TMP_TABLE1
    FROM TABLE1 T
    JOIN UTABLE U ON T.record_id = U.record_id;

    -- 2) Drop the old table
    DROP TABLE TABLE1;

    -- 3) Rename the new table into place
    EXEC sp_rename 'TMP_TABLE1', 'TABLE1';
However, this takes nearly 40 minutes on my server (60 GB of RAM for SQL Server). I want to achieve a 50% performance gain - what other options can I try?
MERGE and UPDATE - something like the code below works faster, but only for a very small UTABLE; at full size, everything just hangs:
<!-- language: SQL -->
    MERGE TABLE1 AS target
    USING UTABLE AS source
    ON target.record_id = source.record_id
    WHEN MATCHED THEN
        UPDATE SET target.columns = source.columns;
I have heard that I can perform a batched MERGE by using ROWCOUNT - but I don't think that can be fast enough for a 300M-row table.
Are there any SQL query hints that can be helpful?
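For reference, batching can also be written without SET ROWCOUNT by updating in TOP (N) chunks inside a loop. This is only a minimal sketch, not the approach I benchmarked: col1 is a hypothetical placeholder for the real columns, and the WHERE clause assumes col1 is non-nullable so already-updated rows are skipped and the loop makes progress:

<!-- language: sql -->

    -- Batched update sketch: process TABLE1 in 100k-row chunks.
    DECLARE @batch INT = 100000;

    WHILE 1 = 1
    BEGIN
        UPDATE TOP (@batch) T
        SET    T.col1 = U.col1           -- col1 stands in for the real columns
        FROM   TABLE1 T
        JOIN   UTABLE U ON T.record_id = U.record_id
        WHERE  T.col1 <> U.col1;         -- skip rows already brought up to date

        IF @@ROWCOUNT < @batch BREAK;    -- last chunk processed
    END

Each small transaction keeps the log and locks manageable, at the cost of scanning for remaining work on every iteration.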
Actually, I've found some general recommendations for such queries: the idea of using SQL MERGE or UPDATE is a very clever one, but it fails when we need to update many records (i.e. 75M) in a big and wide table (i.e. 240M rows). Looking at the query plan of the query below, we can say that the TABLE SCAN of TABLE1 and the final MERGE are taking 90% of the time.
<!-- language: sql -->

    MERGE TABLE1 AS target
    USING UTABLE AS source
    ON target.record_id = source.record_id
    WHEN MATCHED AND (condition) THEN
        UPDATE SET target.columns = source.columns;
So in order to use MERGE we need to:

- Make UTABLE smaller, or specify an additional condition that narrows the part to be merged. Making the to-be-merged part of TABLE1 twice smaller reduced my real query time from 11 hours to 40 minutes.
- As Mark mentioned, you can use the UPDATE syntax with a WHERE clause to narrow the part to be merged - this will give the same results.
- Also, please avoid indexing TABLE1, as this will cause additional work to rebuild the index during the MERGE.
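The UPDATE alternative mentioned above can be sketched as follows - a minimal illustration where (condition) and the column names are placeholders, just as in the MERGE example:

<!-- language: sql -->

    -- UPDATE ... FROM with a narrowing WHERE clause instead of MERGE
    UPDATE T
    SET    T.columns = U.columns
    FROM   TABLE1 T
    JOIN   UTABLE U ON T.record_id = U.record_id
    WHERE  (condition);   -- same narrowing predicate as WHEN MATCHED AND

Narrowing the predicate here has the same effect as the extra condition in the MERGE: fewer rows touched means less scanning, less logging, and a faster run.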