Is it possible for Airflow scheduler to first finish the previous day's cycle before starting the next?

user3542930 picture user3542930 · Dec 7, 2016 · Viewed 9.1k times · Source

Right now, nodes in my DAG proceeds to the next day's task before the rest of the nodes of that DAG finishes. Is there a way for it to wait for the rest of the DAG to finish before moving unto the next day's DAG cycle?

(I do have depends_on_past as true, but that does not work in this case)

My DAG looks like this:

               O
               l
               V
O -> O -> O -> O -> O

Also, tree view pic of the dag]

tree view pic of the dag

Answer

Oleg Yamin picture Oleg Yamin · Mar 22, 2017

Might be a bit late for this answer, but I ran into the same issue and the way I resolved it is I added two extra tasks in each dag. "Previous" at the start and "Complete" at the end. Previous task is external task sensor which monitors previous job. Complete is just a dummy operator. Lets say it runs every 30 minutes so the dag would look like this:

dag = DAG(dag_id='TEST_DAG', default_args=default_args, schedule_interval=timedelta(minutes=30))

PREVIOUS = ExternalTaskSensor(
    task_id='Previous_Run',
    external_dag_id='TEST_DAG',
    external_task_id='All_Tasks_Completed',
    allowed_states=['success'],
    execution_delta=timedelta(minutes=30),
    dag=DAG
)

T1 = BashOperator(
    task_id='TASK_01',
    bash_command='echo "Hello World from Task 1"',
    dag=dag
)

COMPLETE = DummyOperator(
    task_id='All_Tasks_Completed',
    dag=DAG
)

PREVIOUS >> T1 >> COMPLETE

So the next dag, even tho it will come into the queue, it will not let tasks run until PREVIOUS is completed.