SSIS Control Flow vs Data Flow

Paul picture Paul · Dec 15, 2010 · Viewed 71.3k times · Source

I don't entirely understand the purpose of control flow in an SSIS package. In all of the packages I've created, I simply add a data flow component to control flow and then the rest of the logic is located within the data flow.

I've seen examples of more complicated control flows (EX: foreach loop container that iterates over lines in an Excel file.), but am looking for an example where it could not also be implemented in the data flow. I could just as easily create a connection to the excel file within the data flow.

I'm trying to get a better understanding of when I would need to (or should) implement logic in control flow vs using the data flow to do it all.

What prompted me to start looking into control flow and it's purpose is that I'd like to refactor SSIS data flows as well as break packages down into smaller packages in order to make it easier to support concurrent development.

I'm trying to wrap my mind around how I might use control flow for these purposes.

Answer

James Wiseman picture James Wiseman · Dec 15, 2010

A data flow defines a flow of data from a source to a destination. You do not start on one data flow task and move to the next. Data flows between your selected entities (sources, transformations, destinations).

Moreover within a data flow task, you cannot perform tasks such as iteration, component execution, etc.

A control flow defines a workflow of tasks to be executed, often a particular order (assuming your included precedence constraints). The looping example is a good example of a control-flow requirement, but you can also execute standalone SQL Scripts, call into COM interfaces, execute .NET components, or send an email. The control flow task itself may not actually have anything whatsoever to do with a database or a file.

A control flow task is doing nothing in itself TO the data. It is executing some that itself may (or may not) act upon data somewhere. The data flow task IS doing something with data. It defines its movement and transformation.

It should be obvious when to execute control flow logic and data flow logic, as it will be the only way to do it. In your example, you cite the foreach container, and state that you could connect to the spreadsheet in the data flow. Sure, for one spreadsheet, but how would you do it for multiple ones in a folder? In the data flow logic, you simply can't!

Hope this helps.