Whenever I try to run a DAG, it goes into the running state but its tasks never run. I have set my start date to datetime.today() and my schedule interval to "* * * * *". Manually triggering a run starts the DAG, but the task does not run because of:
The execution date is 2017-09-13T00:00:00 but this is before the task's start date 2017-09-13T16:20:30.363268.
I have tried various combinations of schedule intervals (such as a specific time each day), and I have tried both waiting for the DAG to be triggered and triggering it manually. Nothing seems to work.
First of all, start_date is a task attribute, but in general it is set in default_args and used like a DAG attribute.
The message is very clear: if a task's execution_date is before the task's start_date, it cannot be scheduled. You can set start_date to an earlier value:
import datetime

default_args = {
    'start_date': datetime.datetime(2019, 1, 1),  # hard-coded date
}
or
import airflow.utils.dates

default_args = {
    'start_date': airflow.utils.dates.days_ago(7),  # 7 days ago
}
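To see why datetime.today() as a start_date runs into the error above, here is a minimal sketch in plain Python. It is only a simplified model of the comparison described in the error message, not Airflow's actual code; the helper name is_schedulable and the fixed date 2017-09-01 are assumptions made for illustration. The other two timestamps are copied from the error message in the question.

```python
from datetime import datetime


def is_schedulable(execution_date: datetime, start_date: datetime) -> bool:
    """Simplified model (hypothetical helper): a task instance is eligible
    only if its execution_date is not before the task's start_date."""
    return execution_date >= start_date


# Values taken from the error message in the question.
execution_date = datetime(2017, 9, 13, 0, 0, 0)
dynamic_start = datetime(2017, 9, 13, 16, 20, 30, 363268)  # what datetime.today() evaluated to

# A fixed start_date in the past (illustrative value, an assumption).
fixed_start = datetime(2017, 9, 1)

print(is_schedulable(execution_date, dynamic_start))  # False
print(is_schedulable(execution_date, fixed_start))    # True
```

Because datetime.today() is evaluated again whenever the DAG file is parsed, a start_date defined that way keeps moving forward, which is why a fixed date in the past is the safer choice.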
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
Let's repeat that: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
So, when you schedule your DAG, any dag_run's execution_date will be earlier than the time the run actually starts. For a daily schedule, the difference is 24 hours.
We can say: start time = execution_date + schedule_interval
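The relationship can be checked with plain datetime arithmetic. This is a rough sketch rather than anything Airflow-specific; the function name earliest_run_time is hypothetical, and the dates reuse the 2016-01-01 daily example from the note above.

```python
from datetime import datetime, timedelta


def earliest_run_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Earliest moment a run can start: the end of the period it covers."""
    return execution_date + schedule_interval


execution_date = datetime(2016, 1, 1)  # the run stamped 2016-01-01
daily = timedelta(days=1)              # schedule_interval of one day

print(earliest_run_time(execution_date, daily))  # 2016-01-02 00:00:00
```

With a "* * * * *" schedule the interval is one minute, so the same calculation gives a run that starts one minute after its execution_date.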