Airflow - run task regardless of upstream success/fail

J. Doe · Jun 5, 2017 · Viewed 12.4k times

I have a DAG which fans out to multiple independent units in parallel. This runs in AWS, so we have tasks which scale our AutoScalingGroup up to the maximum number of workers when the DAG starts, and to the minimum when the DAG completes. The simplified version looks like this:

           | - - taskA - - |
           |               |
scaleOut - | - - taskB - - | - scaleIn
           |               |
           | - - taskC - - |

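In code, the simplified wiring looks roughly like this (the operator types, task ids, and scaling callables are illustrative placeholders, not our real implementation):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def scale_out_asg():
    # Placeholder: raise the AutoScalingGroup's desired capacity to the maximum.
    pass

def scale_in_asg():
    # Placeholder: drop the AutoScalingGroup's desired capacity back to the minimum.
    pass

dag = DAG("parallel_workload", start_date=datetime(2017, 6, 1), schedule_interval=None)

scaleOut = PythonOperator(task_id="scaleOut", python_callable=scale_out_asg, dag=dag)
scaleIn = PythonOperator(task_id="scaleIn", python_callable=scale_in_asg, dag=dag)

for name in ("taskA", "taskB", "taskC"):
    work = PythonOperator(task_id=name, python_callable=lambda: None, dag=dag)
    scaleOut.set_downstream(work)   # scaleOut fans out to each parallel task
    work.set_downstream(scaleIn)    # each parallel task feeds into scaleIn
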
However, some of the tasks in the parallel set occasionally fail, and I can't get the scaleIn task to run when any of the A-C tasks fail.

What's the best way to have a task execute at the end of the DAG, once all of the other tasks have completed (whether they succeeded or failed)? The depends_on_upstream setting sounded like what we needed, but had no effect in our testing.

Answer

Nick · Jun 8, 2017

All operators have a trigger_rule argument, which can be set to 'all_done'; that triggers the task regardless of whether the previous task(s) succeeded or failed.

You could set the trigger rule for the task you want to run to 'all_done' instead of the default 'all_success'.

A simple BashOperator task with that argument would look like this:

from airflow.operators.bash_operator import BashOperator

task = BashOperator(
    task_id="hello_world",
    bash_command="echo Hello World!",
    trigger_rule="all_done",
    dag=dag,
)
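
Applied to the DAG in the question, the same argument on the scaleIn task would let it run even when one of taskA-taskC fails (the BashOperator and its bash command are placeholders here, not the asker's actual scaling code):

scaleIn = BashOperator(
    task_id="scaleIn",
    bash_command="python scale_asg.py --to-minimum",  # placeholder scale-in command
    trigger_rule="all_done",  # run once taskA-taskC are done, whether they succeeded or failed
    dag=dag,
)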