Hi everyone,
I need to schedule my Python files (which contain data extraction from SQL and some joins) using Airflow. I have successfully installed Airflow on my Linux server and the Airflow webserver is up. But even after going through the documentation, I am not clear where exactly I need to write the script for scheduling, and how that script will become available in the Airflow webserver so I can see its status.
As far as the configuration is concerned, I know where the DAG folder is located in my home directory and also where the example DAGs are located.
Note: Please don't mark this as a duplicate of How to run bash script file in Airflow, as I need to run Python files lying in a different location.
You should probably use the PythonOperator to call your function. If you want to define the function somewhere else, you can simply import it from a module, as long as it's accessible in your PYTHONPATH.
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from my_script import my_python_function

# minimal default_args (see the tutorial link below for more options)
default_args = {'owner': 'airflow', 'start_date': datetime(2017, 1, 1)}

dag = DAG('tutorial', default_args=default_args)

PythonOperator(dag=dag,
               task_id='my_task_powered_by_python',
               provide_context=False,
               python_callable=my_python_function,
               op_args=['arguments_passed_to_callable'],
               op_kwargs={'keyword_argument': 'which will be passed to function'})
If your function my_python_function is in a script file /path/to/my/scripts/dir/my_script.py, then before starting Airflow you could add the path to your scripts to the PYTHONPATH like so:
export PYTHONPATH=/path/to/my/scripts/dir/:$PYTHONPATH
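For reference, my_script.py only needs to define the callable with a signature matching op_args/op_kwargs above. A minimal sketch (the body is a placeholder; your SQL extraction and joins would go inside):

# /path/to/my/scripts/dir/my_script.py
def my_python_function(positional_arg, keyword_argument=None):
    # placeholder: run your SQL extraction and joins here
    print(positional_arg, keyword_argument)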
More information here: https://airflow.incubator.apache.org/code.html#airflow.operators.PythonOperator
Default args and other considerations are covered in the tutorial: https://airflow.incubator.apache.org/tutorial.html
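As for where the DAG definition itself lives: save the snippet above as a .py file inside the dags folder you already located. The scheduler and webserver pick up any file there that defines a DAG object at module level, which is how the DAG and its task status become visible in the web UI. Following the tutorial, you can also sanity-check the file for import errors by running it directly, e.g. python ~/airflow/dags/my_dag_file.py (the path and filename here are just examples).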