What are the differences between Airflow and Kubeflow Pipelines?

Kevin Su · Nov 26, 2019 · Viewed 10k times

"Machine learning platform" is one of the buzzwords in business right now, promoted as a way to speed up the development of ML and deep learning models.

A common component of these platforms is a workflow orchestrator (or workflow scheduler) that helps users build DAGs, schedule them, and track experiments, jobs, and runs.
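To make the orchestrator's core job concrete, here is a minimal sketch in Python of what "build a DAG, run it in dependency order, and track the run" means. The task names, the `PIPELINE` dict, and the `run_pipeline` helper are hypothetical. The sketch uses only the standard library's `graphlib` (Python 3.9+) and is not how Airflow, Kubeflow Pipelines, or Flyte are implemented. Those tools add scheduling, retries, distributed execution, and a UI on top of this idea.

```python
from datetime import datetime, timezone
from graphlib import TopologicalSorter

# Hypothetical ML pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "featurize": {"extract"},
    "train": {"featurize"},
    "evaluate": {"train"},
}


def run_pipeline(dag, actions):
    """Run every task after its upstream tasks and return a per-task run log.

    dag maps task name -> set of upstream task names;
    actions maps task name -> zero-argument callable doing the work.
    """
    log = []
    # static_order() yields each task only after all of its dependencies.
    for task in TopologicalSorter(dag).static_order():
        started = datetime.now(timezone.utc).isoformat()
        try:
            actions[task]()
            status = "success"
        except Exception:
            status = "failed"
        log.append({"task": task, "started": started, "status": status})
        if status == "failed":
            # Simplification: abort the whole run on the first failure.
            break
    return log


if __name__ == "__main__":
    steps = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    for entry in run_pipeline(PIPELINE, steps):
        print(entry["task"], entry["status"])
```

Because this pipeline is a simple chain, the run order is always extract, featurize, train, evaluate. A real orchestrator would also persist the log so that past runs can be inspected and compared.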

Many machine learning platforms ship with a workflow orchestrator, for example Kubeflow Pipelines, FBLearner Flow, and Flyte.

My question is: what are the main differences between Airflow and Kubeflow Pipelines, or the workflow orchestrators of other ML platforms?

Also, Airflow supports APIs in different languages and has a large community. Can we use Airflow to build our ML workflows?

Answer

arimbr · Nov 26, 2019

You can definitely use Airflow to orchestrate machine learning tasks, but you probably want to execute the ML tasks themselves remotely, using operators, rather than on the Airflow workers.

For example, Dailymotion uses the KubernetesPodOperator to scale Airflow for ML tasks.
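The KubernetesPodOperator launches each task as its own Kubernetes pod, so the training code runs on the cluster rather than on the Airflow worker. As an illustration only, the sketch below uses the standard library alone to build roughly the kind of Pod manifest such a task ends up submitting. It is not the operator's own code. The helper name `training_pod_manifest`, the image `registry.example.com/train:latest`, and the arguments are hypothetical. The GPU limit assumes the cluster exposes `nvidia.com/gpu` through the NVIDIA device plugin.

```python
import json


def training_pod_manifest(name, image, args, gpus=0, namespace="default"):
    """Build a Kubernetes Pod manifest (as a dict) for one ML training step.

    Hypothetical helper for illustration; field names follow the core v1 Pod API.
    """
    container = {
        "name": "train",
        "image": image,
        "args": list(args),
    }
    if gpus:
        # Assumes the cluster advertises GPUs as the nvidia.com/gpu resource.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [container],
            # A training step is a one-off job, so the pod should not restart.
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    manifest = training_pod_manifest(
        name="train-model-20191126",
        image="registry.example.com/train:latest",  # hypothetical image
        args=["--epochs", "10"],
        gpus=1,
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be submitted by hand with `kubectl create -f`. That is essentially the step the operator automates from inside a DAG, together with waiting for the pod to finish.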

If you don't have the resources to set up a Kubernetes cluster yourself, you can use an ML platform like Valohai that provides an Airflow operator.

When doing ML in production, ideally you also want to version-control your models, keeping track of the data, code, parameters, and metrics of each execution.
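One way to act on that advice is sketched below under stated assumptions. The sketch fingerprints the exact training data with a SHA-256 content hash and stores it next to the code version, parameters, and metrics of the run. The helper names, the commit string, and the metric value are hypothetical placeholders, not results. A dedicated experiment tracker or model registry would normally handle this bookkeeping.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Content hash identifying one exact version of a dataset."""
    return hashlib.sha256(data).hexdigest()


def make_run_record(data: bytes, code_version: str, params: dict, metrics: dict) -> dict:
    """Bundle what is needed to trace a model back to how it was produced.

    Hypothetical helper; code_version is assumed to be supplied by the caller,
    for example a git commit hash.
    """
    return {
        "data_sha256": fingerprint(data),
        "code_version": code_version,
        "params": dict(params),
        "metrics": dict(metrics),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = make_run_record(
        data=b"feature_1,label\n0.5,1\n",
        code_version="3f2a9c1",  # hypothetical commit hash
        params={"learning_rate": 0.01, "epochs": 10},
        metrics={"accuracy": 0.91},  # placeholder value, not a real result
    )
    print(json.dumps(record, indent=2, sort_keys=True))
```

If even one byte of the training data changes, the hash changes too. Two runs with identical code and parameters but different data can therefore still be told apart.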

You can find more details in the article "Scaling Apache Airflow for Machine Learning Workflows".