We need to update several tables with parent/child relationships based on an Identity primary key in the parent table, which one or more child tables refer to as a foreign key.
Our prototype in F# shows a lot of promise, with a 34x performance increase, but this code forces known Identity values in the parent table. When the values are not forced, the Identity column does get correctly generated in the database when SqlBulkCopy inserts the rows, but the Identity values do NOT get updated in the in-memory DataTable. Further, even if they were, it is not clear whether the DataSet would correctly fix up the parent/child relationships, so that the child tables could subsequently be written with correct foreign-key values.
Can anyone explain how to have SqlBulkCopy update Identity values, and how to configure a DataSet so that it retains and updates parent/child relationships, if this is not done automatically when a DataAdapter is called to FillSchema on the individual DataTables?
Answers that I'm not looking for:
Similar to the following unanswered question:
First of all: SqlBulkCopy cannot do what you want. As the name suggests, it's just a "one-way street". It moves data into SQL Server as quickly as possible. It's the .NET version of the old bulk copy utility (`bcp`), which imports raw text files into tables. So there is no way to get the identity values back if you are using SqlBulkCopy.
I have done a lot of bulk data processing and have faced this problem several times. The solution depends on your architecture and data distribution. Here are some ideas:
Create one set of target tables for each thread, import into these tables, and join them at the end. Most of this can be implemented in a fairly generic way, where you automatically generate tables called TABLENAME_THREAD_ID from tables called TABLENAME.
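To make the per-thread staging-table idea concrete, here is a minimal sketch (in Python purely for illustration; the original prototype is F#, and the table and column names here are hypothetical) of deriving the TABLENAME_THREAD_ID names and building the final statement that merges the staging tables back into the real target table:

```python
# Sketch of the per-thread staging-table approach. Each thread bulk-inserts
# into its own TABLENAME_THREAD_ID table; one final INSERT ... SELECT
# consolidates them. Table/column names are illustrative only.

def staging_table_name(table_name: str, thread_id: int) -> str:
    """Derive TABLENAME_THREAD_ID from TABLENAME, as suggested above."""
    return f"{table_name}_{thread_id}"

def merge_sql(table_name: str, thread_count: int, columns: list) -> str:
    """Build the INSERT ... SELECT that joins all staging tables back into
    the real target table after every thread has finished its bulk copy."""
    col_list = ", ".join(columns)
    selects = " UNION ALL ".join(
        f"SELECT {col_list} FROM {staging_table_name(table_name, t)}"
        for t in range(thread_count)
    )
    return f"INSERT INTO {table_name} ({col_list}) {selects}"

print(merge_sql("Orders", 2, ["CustomerId", "Amount"]))
```

Because the identity column is left out of the merge's column list, SQL Server assigns the final identities in a single pass at consolidation time.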
Move ID generation completely out of the database. For example, implement a central web service that generates the IDs. In that case, you should not generate one ID per call but rather generate ID ranges; otherwise network overhead usually becomes the bottleneck.
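The range-allocation idea can be sketched as follows (Python for illustration only; in a real system the allocator would sit behind the central web service, and the class name and block size here are hypothetical):

```python
import threading

class IdRangeAllocator:
    """Hands out contiguous ID ranges, so each client pays the network/lock
    cost once per block of IDs instead of once per row."""

    def __init__(self, start: int = 1, block_size: int = 1000):
        self._next = start
        self._block_size = block_size
        self._lock = threading.Lock()  # one service instance, many clients

    def allocate_range(self) -> range:
        """Reserve the next contiguous block of IDs for one caller."""
        with self._lock:
            lo = self._next
            self._next += self._block_size
        return range(lo, lo + self._block_size)

allocator = IdRangeAllocator(start=1, block_size=1000)
r1 = allocator.allocate_range()  # IDs 1..1000 for the first bulk-insert thread
r2 = allocator.allocate_range()  # IDs 1001..2000 for the next thread
```

Since every thread knows its parent-row IDs before inserting, the child rows can be filled with correct foreign keys up front, and SqlBulkCopy never needs to report identities back.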
Try to generate IDs from your data. If that is possible, your problem goes away entirely. Don't say "it's not possible" too fast. Perhaps you can use string IDs, which can be cleaned up in a post-processing step?
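One way to derive such IDs, sketched here in Python for illustration (the function name and the choice of SHA-1 over a natural key are assumptions, not part of the original answer):

```python
import hashlib

def natural_key_id(*fields: str) -> str:
    """Derive a deterministic string ID from the row's own data (its
    'natural key'), so no round trip to the database is needed. A later
    post-processing step can map these strings to compact integer IDs."""
    joined = "\x1f".join(fields)  # unit separator keeps fields unambiguous
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

# Parent row and child rows compute the same ID independently, so the
# foreign keys line up without any feedback from the database:
parent_id = natural_key_id("ACME Corp", "2009-06-01")
child_fk = natural_key_id("ACME Corp", "2009-06-01")
```

The key property is determinism: any thread holding the same source fields computes the same ID, so parent and child tables can be bulk-copied independently.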
And one more remark: an increase by a factor of 34 when using bulk copy sounds too small in my opinion. If you want to insert data fast, make sure that your database is configured correctly.