I'm building a system for updating large amounts of data through various CSV feeds. Normally I would just loop through each row in the feed, do a select query to check if the item already exists, and insert or update the item depending on whether it exists or not.
I feel this method isn't very scalable and could hammer the server on larger feeds. My solution is to loop through the items as normal but store them in memory. Then, for every 100 or so items, do a single select on those 100 items and get a list of the ones that already exist in the database. Then concatenate the insert/update statements together and run them against the database in one batch. This would essentially cut down on the trips to the database.
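To illustrate what I mean, the batch I'd send for each chunk of ~100 items would look something like this (table and column names are just made up for the example):

    -- rows that already exist in the target table become UPDATEs
    UPDATE Products SET Name = 'Widget', Price = 9.99 WHERE ProductID = 101;
    UPDATE Products SET Name = 'Gadget', Price = 4.50 WHERE ProductID = 102;
    -- rows that don't exist yet become INSERTs
    INSERT INTO Products (ProductID, Name, Price) VALUES (103, 'Gizmo', 2.25);
    INSERT INTO Products (ProductID, Name, Price) VALUES (104, 'Doohickey', 7.00);
    -- ...and so on for the rest of the chunk, all sent in a single round trip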
Is this a scalable enough solution, and are there any example tutorials on importing large feeds into a production environment?
Thanks
Seeing that you're using SQL Server 2008, I would recommend this approach:
Check out the MSDN docs and a great blog post on how to use the MERGE command.
Basically, you create a link between your actual data table and the staging table (the table your CSV feed gets loaded into) on a common criterion (e.g. a common primary key), and then you can define what to do when the rows match (the row exists in both tables) and what to do when they don't (the row exists only in the staging table).
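Getting the CSV feed into that staging table is up to you, but as a rough sketch (the file path, the ProductStaging table, and the assumption of a comma-delimited feed with a header row are all just for illustration), something like BULK INSERT can load it in one shot:

    BULK INSERT dbo.ProductStaging
    FROM 'C:\feeds\products.csv'
    WITH (
        FIELDTERMINATOR = ',',   -- column separator in the feed
        ROWTERMINATOR = '\n',    -- line separator in the feed
        FIRSTROW = 2             -- skip the header row
    );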
You would have a MERGE statement something like this:
MERGE TargetTable AS t
USING SourceTable AS src
    ON t.PrimaryKey = src.PrimaryKey
WHEN NOT MATCHED THEN
    INSERT (list of fields)
    VALUES (list of values)
WHEN MATCHED THEN
    UPDATE
        SET (list of SET statements)
;
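Filled in with made-up table and column names (ProductStaging as the source, Products as the target, matched on ProductID), the whole thing could look like this:

MERGE dbo.Products AS t
USING dbo.ProductStaging AS src
    ON t.ProductID = src.ProductID
WHEN NOT MATCHED THEN
    -- the row exists only in the staging table: insert it
    INSERT (ProductID, Name, Price)
    VALUES (src.ProductID, src.Name, src.Price)
WHEN MATCHED THEN
    -- the row already exists in the target table: update it
    UPDATE
        SET Name  = src.Name,
            Price = src.Price
;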
Of course, the ON clause can be much more involved if needed. And of course, your WHEN statements can also be more complex, e.g.

WHEN MATCHED AND (some other condition) THEN ...

and so forth.
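For instance (again with made-up column names), you could swap the WHEN MATCHED branch in the example above for one that skips rows where nothing actually changed:

WHEN MATCHED AND (t.Name <> src.Name OR t.Price <> src.Price) THEN
    -- only touch rows whose values differ from the feed
    UPDATE
        SET Name  = src.Name,
            Price = src.Price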
MERGE is a very powerful and very useful new command in SQL Server 2008 - use it, if you can!