Unable to run Airflow Tasks due to execution date and start date

Branko · Sep 13, 2017

Whenever I try to run a DAG, it goes into the running state but its tasks never run. I have set my start date to datetime.today() and my schedule interval to "* * * * *". Manually triggering a run starts the DAG, but the task does not run because of:

The execution date is 2017-09-13T00:00:00 but this is before the task's start date 2017-09-13T16:20:30.363268.

I have tried various schedule intervals (such as a specific time each day), and I have tried both waiting for the DAG to be triggered and triggering it manually. Nothing seems to work.

Answer

mustafagok · May 10, 2019

First of all, start_date is a task attribute, but it is usually set in default_args, so it is applied to every task and effectively behaves like a DAG attribute.

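The message is clear: if a task's execution_date is before the task's start_date, the task cannot be scheduled. With start_date set to datetime.today(), the start date is re-evaluated every time the DAG file is parsed, so it is always later than the execution date. Set start_date to a fixed, earlier value: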

import datetime

default_args = {
    'start_date': datetime.datetime(2019, 1, 1)  # hard coded date
}

or

from airflow.utils.dates import days_ago  # import the helper explicitly

default_args = {
    'start_date': days_ago(7)  # midnight, 7 days ago
}
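
To see why the values in the question produce this message, the comparison can be sketched in plain Python. This is only a simplified stand-in for the rule stated in the error, not Airflow's actual scheduling code; the helper name can_schedule and the fixed date 2017-09-01 are hypothetical and chosen for illustration.

```python
from datetime import datetime


def can_schedule(execution_date, start_date):
    """Simplified rule from the error message (hypothetical helper):
    a task may run only if its execution date is not before its start date."""
    return execution_date >= start_date


# Both timestamps below are copied from the error message in the question.
execution_date = datetime(2017, 9, 13, 0, 0, 0)
dynamic_start = datetime(2017, 9, 13, 16, 20, 30, 363268)  # what datetime.today() evaluated to

# Assumption: any fixed date safely in the past, picked only for illustration.
fixed_start = datetime(2017, 9, 1)

print(can_schedule(execution_date, dynamic_start))  # False
print(can_schedule(execution_date, fixed_start))    # True
```

Because datetime.today() keeps moving forward, the first comparison fails no matter when the run is triggered, while a fixed date in the past passes.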

From the Airflow documentation:

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

Let’s Repeat That: The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.

So when the DAG runs on a schedule, every dag_run's execution_date is earlier than the time the run actually starts. For a daily schedule, the difference is 24 hours.

In other words: start time = execution_date + schedule_interval
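
The relation above can be worked through with plain datetime arithmetic. This is a minimal sketch, assuming a fixed-length interval expressed as a timedelta and ignoring scheduler latency; a cron-style schedule_interval would need a cron parser instead. The helper name earliest_trigger_time is hypothetical, and the dates follow the 2016-01-01 example from the documentation quote.

```python
from datetime import datetime, timedelta


def earliest_trigger_time(execution_date, schedule_interval):
    """Hypothetical helper: a run is triggered once the period it covers
    has ended, i.e. at execution_date + schedule_interval."""
    return execution_date + schedule_interval


start_date = datetime(2016, 1, 1)
daily = timedelta(days=1)  # assumption: a one-day schedule_interval

# Execution dates of the first three scheduled runs.
runs = [start_date + i * daily for i in range(3)]

for execution_date in runs:
    trigger = earliest_trigger_time(execution_date, daily)
    print(execution_date.date(), "->", trigger)
```

The run stamped 2016-01-01 is therefore triggered no earlier than 2016-01-02 00:00:00, one full interval after its execution date.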