How do I split flat file data and load into parent-child tables in database?

Brent Arias picture Brent Arias · Jul 12, 2011 · Viewed 14.8k times · Source

I have denormalized data (coming from a file) that needs to be imported into parent-child tables. The source data is something like this:

Account#    Name        Membership    Email
101         J Burns     Gold          [email protected]
101         J Burns     Gold          [email protected]
101         J Burns     Gold          [email protected]
227         H Gordon    Silver        [email protected]
350         B Clyde     Silver        [email protected]
350         B Clyde     Silver        [email protected]

What are the pieces, parts, or tactics of SSIS I should use to read the first three columns into a parent table, and the 4th column (Email) into a child table? I have several options for the parent key which I am permitted to take:

  • Directly use the Account# as the primary key
  • Use a surrogate key generated by SSIS during the import process
  • Configure an identity primary key

I'm sure I've listed my primary key options in increasing order of difficulty. I'd be interested in knowing how to do the first and the last option - I'll infer how to achieve the middle option. To emphasize again, I'm interested in a decidedly SSIS solution; I'm looking for an answer that uses the language of SSIS, rather than a procedural, technology neutral answer.

My question is somewhat similar to another SO question, having an answer of vague viability. I'm hoping more detailed guidance could be given. I already know how to solve this problem by creating a "staging" middle-step, where the parent-child separation is actually handled with straight SQL. However, I'm curious about how this can be done without that kind of middle-step.

It seems to me this kind of import would be so common, that there would be a well-published formulaic way to handle it - a technique that SSIS excels at. As yet, I've not quite seen any straight up answer to this.

Update #1: Based on comments, I've adjusted the sample data to be more obviously denormalized. I also removed "flat" from "flat file," so that semantics don't interfere with the question.

Update #2: I've amplified my interest in a solution spoken in the language of SSIS.

Answer

user756519 picture user756519 · Jul 12, 2011

Here is one possible option that you can consider in loading parent-child data. This option consists of two steps. In the first step, read the source file and write data to parent table. In the second step, read the source file again and use lookup transformation to fetch the parent info in order to write data to the child table. Following example uses the data provided in the question. This example was created using SSIS 2008 R2 and SQL Server 2008 database.

Step-by-Step process:

  1. Create a sample flat file named Source.txt as shown in screenshot #1.

  2. In the SQL database, create two tables named dbo.Parent and dbo.Child using the scripts given under SQL Scripts section. Both the tables have an auto generated identity column.

  3. On the package, place an OLE DB connection to connect to the SQL Server and Flat File connection to read the source file as shown in screenshot #2. Configure the flat file connection as shown in screenshots #3 - #9.

  4. On the Control Flow tab, place two Data Flow Tasks as shown in screenshot #10.

  5. Inside the data flow task named Parent, place a Flat File source, Sort transformation and an OLE DB destination as shown in screenshot #11.

  6. Configure the flat file source as shown in screenshots #12 and #13. We need to read the flat file source.

  7. Configure the sort transformation as shown in screenshot #14. We need to eliminate the duplicate values so that only the unique records are inserted into the parent table dbo.Parent.

  8. Configure the ole db destination as shown in screenshots #15 and #16. We need to insert the data into the parent table dbo.Parent.

  9. Inside the data flow task named Child, place a Flat File source, Lookup transformation and an OLE DB destination as shown in screenshot #17.

  10. Configure the flat file source as shown in screenshots #12 and #13. This configuration is same as the flat file source in the previous data flow task.

  11. Configure the lookup transformation as shown in screenshots #18 and #20. We need to find the parent id from the table dbo.Parent using the other key columns present in the file. The key columns here are the Account, Name and Email. If the file happened to have a unique column, you could just use that column alone to fetch the parent id.

  12. Configure the ole db destination as shown in screenshots #21 and #22. We need to insert the Email column along with the Parent id into the table dbo.Child.

  13. Screenshot #23 shows data in the tables before the package execution.

  14. Screenshots #24 and #25 show sample package execution.

  15. Screenshot #26 shows data in the tables after the package execution.

Hope that helps.

SQL Scripts:

CREATE TABLE [dbo].[Child](
    [ChildId] [int] IDENTITY(1,1) NOT NULL,
    [ParentId] [int] NULL,
    [Email] [varchar](21) NULL,
CONSTRAINT [PK_Child] PRIMARY KEY CLUSTERED ([ChildId] ASC)) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Parent](
    [ParentId] [int] IDENTITY(1,1) NOT NULL,
    [Account] [varchar](12) NULL,
    [Name] [varchar](12) NULL,
    [Membership] [varchar](14) NULL,
CONSTRAINT [PK_Parent] PRIMARY KEY CLUSTERED ([ParentId] ASC)) ON [PRIMARY]
GO

Screenshot #1:

1

Screenshot #2:

2

Screenshot #3:

3

Screenshot #4:

4

Screenshot #5:

5

Screenshot #6:

6

Screenshot #7:

7

Screenshot #8:

8

Screenshot #9:

9

Screenshot #10:

10

Screenshot #11:

11

Screenshot #12:

12

Screenshot #13:

13

Screenshot #14:

14

Screenshot #15:

15

Screenshot #16:

16

Screenshot #17:

17

Screenshot #18:

18

Screenshot #19:

19

Screenshot #20:

20

Screenshot #21:

21

Screenshot #22:

22

Screenshot #23:

23

Screenshot #24:

24

Screenshot #25:

25

Screenshot #26:

26