Running Oozie actions in parallel

Marek Grzenkowicz · Sep 23, 2014 · Viewed 10.4k times

I am using the workflow editor in Hue to develop an Oozie workflow. There are a few actions that should be executed in parallel.

Is it possible to execute two or more actions concurrently?
How can I set it up in Hue?

Answer

Marek Grzenkowicz · Sep 23, 2014

Yes, it is possible. Among the Oozie workflow nodes there are two control nodes for this, fork and join:

A fork node splits one path of execution into multiple concurrent paths of execution.

A join node waits until every concurrent execution path of a previous fork node arrives to it.

The fork and join nodes must be used in pairs. The join node assumes concurrent execution paths are children of the same fork node.
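For reference, here is a minimal sketch of how a fork/join pair might look in a hand-written workflow.xml. The node names (fork-steps, join-steps, fail), the fs actions used as placeholder work, the schema version, and the ${nameNode} property are assumptions made for illustration; the XML that Hue generates for a real workflow will differ in its details.

```xml
<workflow-app name="parallel-example-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="fork-steps"/>

    <!-- fork: starts step_A and step_B as concurrent execution paths -->
    <fork name="fork-steps">
        <path start="step_A"/>
        <path start="step_B"/>
    </fork>

    <!-- placeholder action; ${nameNode} is assumed to be set in job.properties -->
    <action name="step_A">
        <fs>
            <mkdir path="${nameNode}/tmp/step_A"/>
        </fs>
        <ok to="join-steps"/>
        <error to="fail"/>
    </action>

    <!-- second placeholder action, runs in parallel with step_A -->
    <action name="step_B">
        <fs>
            <mkdir path="${nameNode}/tmp/step_B"/>
        </fs>
        <ok to="join-steps"/>
        <error to="fail"/>
    </action>

    <!-- join: waits for both paths of fork-steps before continuing -->
    <join name="join-steps" to="end"/>

    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

Both actions send their ok transition to the same join node, which is what pairs the join with the fork that started them.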

Hue does support this, although it is not very intuitive: you can drag and drop actions onto the workflow, but you cannot do the same with the control nodes.

To split one path of execution into two concurrent ones, drag one action onto another (e.g. step_B onto step_A in the example below):

[Image: Oozie - make actions concurrent]

This will add a fork node automatically and place the appropriate action underneath:

[Image: Oozie - fork control node]