Orchestration engines and frameworks?

Eugene Marin · Apr 23, 2018 · Viewed 9k times

I'm looking for an orchestration framework/engine/toolkit to replace or upgrade an existing piece of software, mainly because of scalability limitations. By orchestration I mean asynchronous and distributed execution of generic tasks and workflows.

More specifically the requirements are pretty much these:

  • Wrapping and execution of generic tasks, in Java if language dependent
  • API for tasks and workflows on-demand triggering
  • Scheduling would be nice as well
  • Support for distributed architecture & scalability (mainly for big numbers of small tasks)
  • Persistency and resilience
  • Advanced workflow configuration capabilities (do this, then these 3 tasks in parallel, then this, having priorities, dependencies...)
  • Monitoring and administration UI (or at least API)

The existing system is an old-fashioned monolithic service (in Java) that has most of that, including the execution logic itself, which should remain as untouched as possible.

Does anyone have experience with a similar problem? It seems to me it should be pretty common, and it would be strange if I had to implement it entirely myself. I found some questions here (like this and this) discussing the theory of orchestration and choreography systems, but no real examples of tools implementing it. Also, I think we're not exactly talking about microservices: the tasks are not prolonged and heavy, there are just many of them, running in the background and executing short jobs of many types. I wouldn't create a service for every job type.

I'm also not looking for cloud and container services at this point - to my understanding the deployment is a different issue.

The closest I got is the Netflix Conductor engine, which answers most of the requirements by running an orchestration server that manages tasks implemented in servlets (or any web services in any language - a plus). However it seems like it's built mainly for arranging heavy tasks in a workflow rather than running a huge number of small tasks, which makes me wonder what would be the overhead of invoking many small tasks in servlets for example.

Does anyone have experience or any input on the Conductor or other tools I could use? Or even my entire approach to the problem?

EDIT: I realize it's kind of a "research advice needed" so let's put it simply in 3 questions:

  1. Am I right to look for an orchestration solution for the requirements above?
  2. Does anyone have experience with the Netflix Conductor? Any feedback on it?
  3. Does it have good competitors?

Answer

demorphica · May 29, 2018

Perhaps you are looking for something like Airflow https://airflow.apache.org/ ?

Wrapping and execution of generic tasks, in Java if language dependent

https://github.com/apache/incubator-airflow/tree/master/airflow/hooks https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/operators

API for tasks and workflows on-demand triggering

https://airflow.apache.org/api.html (experimental)
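
For a rough idea of what on-demand triggering looks like, here is a minimal sketch that builds (but does not send) a request to the experimental REST API's `dag_runs` endpoint. The host, port and DAG id are assumptions for illustration, and the payload is left empty because its accepted shape has varied between Airflow versions. Newer Airflow releases replaced the experimental API with a different, stable REST API, so check the docs for the version you run.

```python
import json
from urllib.request import Request

# Assumptions: an Airflow webserver at this address and a DAG with this id.
BASE_URL = "http://localhost:8080"
DAG_ID = "example_dag"


def build_trigger_request(base_url, dag_id):
    """Build (but do not send) a POST that asks for a new run of dag_id."""
    url = f"{base_url}/api/experimental/dags/{dag_id}/dag_runs"
    body = json.dumps({}).encode("utf-8")  # empty payload: no run parameters
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_trigger_request(BASE_URL, DAG_ID)
print(request.get_method(), request.full_url)
```

Against a running server you would pass `request` to `urllib.request.urlopen`; a Java client would issue the same POST with any HTTP library.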

Scheduling would be nice as well

think of cron on steroids - https://airflow.apache.org/scheduler.html

Support for distributed architecture & scalability (mainly for big numbers of small tasks)

Scale out with Celery or Dask worker nodes; see the question "Airflow + celery or dask. For what, when?"

Persistency and resilience

Uses a Postgres DB and RabbitMQ. If your deployment architecture is stateless (e.g. repeatable containers and volumes with Docker), you should be in good shape with WAL replication. If you use Kubernetes or Consul, there are other ways to add resilience to the other components.

Advanced workflow configuration capabilities (do this, then these 3 tasks in parallel, then this, having priorities, dependencies...)

Airflow uses DAGs (directed acyclic graphs of tasks). The capabilities can be called fairly advanced. You also have parameter sharing between tasks using XComs, if you really need that.
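
To make the "do this, then these 3 tasks in parallel, then this" shape concrete, here is a minimal standard-library sketch of the idea behind a DAG of tasks. It is not Airflow's API: the task names and the `run_dag` helper are made up for illustration. A real Airflow DAG would declare the same dependencies between operators and leave the execution to the scheduler and its workers.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "prepare": set(),
    "job_a": {"prepare"},
    "job_b": {"prepare"},
    "job_c": {"prepare"},
    "finish": {"job_a", "job_b", "job_c"},
}


def run_dag(dependencies, run_task, max_workers=3):
    """Run tasks in dependency order; independent tasks run in parallel.

    Returns the list of "waves": groups of tasks that became ready together.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            # Run every ready task concurrently and wait for the whole wave.
            list(pool.map(run_task, ready))
            sorter.done(*ready)
            waves.append(ready)
    return waves


def demo_task(name):
    # Placeholder for real work (e.g. calling into the existing Java logic).
    return name


waves = run_dag(DEPENDENCIES, demo_task)
print(waves)
```

Here `prepare` runs first, the three `job_*` tasks run together once it is done, and `finish` runs last. Persistence, retries and distribution across machines are exactly the parts this sketch leaves out and an orchestration engine adds.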

Monitoring and administration UI (or at least API)

Has one. It shows tasks and schedules and has a Gantt view. You can also easily see logs and run details, and manually trigger tasks directly from the UI.

Also look at Oozie and Azkaban.

Did this help?