What is Apache Beam?

Viswa picture Viswa · Feb 8, 2016 · Viewed 21.8k times · Source

I was going through the Apache posts and found a new term called Beam. Can anybody explain what exactly Apache Beam is? I tried to google out but unable to get a clear answer.

Answer

Frances picture Frances · Feb 11, 2016

Apache Beam is an open source, unified model for defining and executing both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.

History: The model behind Beam evolved from a number of internal Google data processing projects, including MapReduce, FlumeJava, and Millwheel. This model was originally known as the “Dataflow Model” and first implemented as Google Cloud Dataflow -- including a Java SDK on GitHub for writing pipelines and fully managed service for executing them on Google Cloud Platform. Others in the community began writing extensions, including a Spark Runner, Flink Runner, and Scala SDK. In January 2016, Google and a number of partners submitted the Dataflow Programming Model and SDKs portion as an Apache Incubator Proposal, under the name Apache Beam (unified Batch + strEAM processing). Apache Beam graduated from incubation in December 2016.

Additional resources for learning the Beam Model: