What is Apache Beam?

apache-beam

Viswa · Feb 8, 2016 · Viewed 21.8k times · Source

I was going through the Apache posts and found a new term called Beam. Can anybody explain what exactly Apache Beam is? I tried googling it but could not find a clear answer.

Answer

Apache Beam is an open source, unified model for defining and executing both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runtime-specific Runners for executing them.
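To make the "define once, run on different Runners" idea concrete, here is a toy sketch in plain Python. It does not use the Beam SDK or its API; the names `PIPELINE`, `run_batch`, and `run_streaming` are hypothetical and exist only for illustration. The point is that one pipeline definition can be handed to different execution engines, which is roughly the relationship between a Beam pipeline and its Runners.

```python
from typing import Callable, Iterable, Iterator, List

# Toy illustration only: this is NOT the Apache Beam API.
# A "transform" takes an iterable of elements and yields new elements.
Transform = Callable[[Iterable], Iterator]


def split_words(lines: Iterable[str]) -> Iterator[str]:
    """Split each line into whitespace-separated words."""
    for line in lines:
        yield from line.split()


def to_upper(words: Iterable[str]) -> Iterator[str]:
    """Upper-case each word."""
    for word in words:
        yield word.upper()


# The pipeline is defined once, independently of how it is executed.
PIPELINE: List[Transform] = [split_words, to_upper]


def run_batch(pipeline: List[Transform], source: List[str]) -> List[str]:
    """Hypothetical 'batch runner': pass the whole bounded input through each step."""
    data: Iterable = source
    for transform in pipeline:
        data = transform(data)
    return list(data)


def run_streaming(pipeline: List[Transform], source: Iterable[str]) -> Iterator[str]:
    """Hypothetical 'streaming runner': push elements through one at a time."""
    for element in source:
        data: Iterable = [element]
        for transform in pipeline:
            data = transform(data)
        yield from data
```

Calling `run_batch(PIPELINE, lines)` and `list(run_streaming(PIPELINE, lines))` on the same input should give the same words in the same order; only the execution strategy differs. Real Beam pipelines add concepts this sketch leaves out, such as windowing, triggers, and distributed execution.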

History: The model behind Beam evolved from a number of internal Google data processing projects, including MapReduce, FlumeJava, and MillWheel. This model was originally known as the “Dataflow Model” and was first implemented as Google Cloud Dataflow, which included a Java SDK on GitHub for writing pipelines and a fully managed service for executing them on Google Cloud Platform. Others in the community began writing extensions, including a Spark Runner, a Flink Runner, and a Scala SDK. In January 2016, Google and a number of partners submitted the Dataflow Programming Model and SDKs portion as an Apache Incubator Proposal, under the name Apache Beam (unified Batch + strEAM processing). Apache Beam graduated from incubation in December 2016.

Additional resources for learning the Beam Model:

The Apache Beam website
The VLDB 2015 paper (using the original naming Dataflow model)
Streaming 101 and Streaming 102 posts on O’Reilly’s Radar site
A Beam podcast on Software Engineering Radio
