How to obtain and process mysql records using Airflow?

gpk27 · Sep 22, 2017 · Viewed 13k times

I need to:

1. Run a SELECT query on a MySQL database and fetch the records.
2. Process the records with a Python script.

I am unsure how to proceed. Is XCom the way to go here? Also, MySqlOperator only executes the query; it doesn't fetch the records. Is there a built-in transfer operator I can use? How can I use a MySQL hook here?

"You may want to use a PythonOperator that uses the hook to get the data, apply the transformation, and ship the (now scored) rows back to some other place."

Can someone explain how to proceed with this approach?

Reference: http://markmail.org/message/x6nfeo6zhjfeakfe

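# Import paths below are the Airflow 1.x ones.
from airflow.hooks.mysql_hook import MySqlHook
from airflow.operators.python_operator import PythonOperator

def do_work():
    # The schema (database) is set on the hook; get_records() does not accept it.
    mysqlserver = MySqlHook(mysql_conn_id='mysql_default', schema='testdb')
    sql = "SELECT * FROM table WHERE col > 100"
    rows = mysqlserver.get_records(sql)  # list of row tuples
    print(rows[0][0])  # first column of the first row

callMYSQLHook = PythonOperator(
    task_id='fetch_from_testdb',
    python_callable=do_work,  # the function defined above
    dag=dag
)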

Is this the correct way to proceed? Also, how do we use XComs to store the records for the following MySqlOperator?

t = MySqlOperator(
    mysql_conn_id='mysql_default',
    task_id='basic_mysql',
    sql="SELECT count(*) FROM table1 WHERE id > 10",
    dag=dag
)

Answer

Breathe · Nov 6, 2017

Sure, just create a hook or operator and call the get_records() method: https://airflow.readthedocs.io/en/stable/_modules/airflow/hooks/dbapi_hook.html
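
As a rough sketch of what that could look like (not a definitive implementation): the helper names `process_rows`, `fetch_and_process` and `FakeHook` below are hypothetical, the import path is the Airflow 1.x one, and the connection id `mysql_default`, schema `testdb` and table `table1` are taken from the question. The processing step is only a placeholder for whatever your script does with the records.

```python
def process_rows(rows):
    """Placeholder processing step (hypothetical): keep the first column of each row."""
    return [row[0] for row in rows]


def fetch_and_process(hook, sql):
    """Fetch rows through any hook exposing get_records() and process them."""
    # DbApiHook.get_records() returns the result set as a list of row tuples.
    rows = hook.get_records(sql)
    return process_rows(rows)


def do_work():
    """Callable to pass as python_callable to a PythonOperator."""
    # Imported here so this module still loads where Airflow is not installed.
    # Airflow 1.x import path (assumption; newer releases moved the hook).
    from airflow.hooks.mysql_hook import MySqlHook

    hook = MySqlHook(mysql_conn_id="mysql_default", schema="testdb")
    # The value returned by a python_callable is pushed to XCom.
    return fetch_and_process(hook, "SELECT * FROM table1 WHERE id > 10")


class FakeHook:
    """Stand-in for MySqlHook so the logic can be tried without a database."""

    def get_records(self, sql, parameters=None):
        return [(101, "a"), (250, "b")]


result = fetch_and_process(FakeHook(), "SELECT * FROM table1 WHERE id > 10")
print(result)
```

Because `do_work` returns its result, a downstream task should be able to read it with `ti.xcom_pull(task_ids='fetch_from_testdb')`. XCom values are stored in the Airflow metadata database, so this is only reasonable for small result sets; for larger ones it is probably better to do the processing inside the same task or write the rows to external storage.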