How to use Airflow to run a folder of python files?

tensor · Sep 21, 2016 · Viewed 21.8k times

I have a series of Python tasks inside a folder of python files: file1.py, file2.py, ...

I read the Airflow docs, but I don't see how to specify the folder and filename of the python files in the DAG.

I would like to execute those python files (not the Python function through Python Operator).

Task1: Execute file1.py (with some import package)

Task2: Execute file2.py (with some other import package)

Any help would be appreciated. Thanks!

Answer

Roman · May 5, 2017

To execute the Python file as a whole, use the BashOperator (as in liferacer's answer):

from airflow.operators.bash_operator import BashOperator

bash_task = BashOperator(
    task_id='bash_task',
    # Note: the command runs from a temporary working directory,
    # so an absolute path to file1.py is safer than a bare filename.
    bash_command='python file1.py',
    dag=dag
)
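Since the BashOperator executes its command from a temporary working directory, a relative path like file1.py may not resolve. A minimal sketch of building an absolute path instead, assuming file1.py sits in the same folder as the DAG definition file (DAGS_DIR is a name chosen here, not an Airflow setting):

```python
import os

# Sketch: absolute path to file1.py, assuming it lives next to this
# DAG definition file; adjust DAGS_DIR to match your deployment.
DAGS_DIR = os.path.dirname(os.path.abspath(__file__))
bash_command = 'python {}'.format(os.path.join(DAGS_DIR, 'file1.py'))
```

You would then pass this string as the bash_command argument instead of the hard-coded 'python file1.py'.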

Then, to do it using the PythonOperator, call your main function. You should already have a __main__ block, so move what happens there into a main function, so that your file1.py looks like this:

def main():
    """This gets executed if `python file1` gets called."""
    # my code

if __name__ == '__main__':
    main()

Then your dag definition:

from airflow.operators.python_operator import PythonOperator

import file1

python_task = PythonOperator(
    task_id='python_task',
    python_callable=file1.main,
    dag=dag
)
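Since the question asks about a whole folder of files, you can also discover the scripts with a glob and create one BashOperator per file. A sketch of the discovery step (build_bash_commands is a hypothetical helper, not part of Airflow):

```python
from pathlib import Path

def build_bash_commands(tasks_dir):
    """Return (task_id, bash_command) pairs, one per .py file in
    tasks_dir, sorted so the task list is deterministic."""
    return [
        (path.stem, 'python {}'.format(path))
        for path in sorted(Path(tasks_dir).glob('*.py'))
    ]
```

Each pair can then be fed to a BashOperator in a loop inside your DAG file, giving you one task per script without listing them by hand.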