Apache Spark vs Akka

apache-spark parallel-processing akka distributed-computing

user4658980 · Mar 17, 2015 · Viewed 39.6k times · Source

Could you please tell me the difference between Apache Spark and AKKA, I know that both frameworks meant to programme distributed and parallel computations, yet i don't see the link or the difference between them.

Moreover, I would like to get the use cases suitable for each of them.

Answer

Apache Spark is actually built on Akka.

Akka is a general purpose framework to create reactive, distributed, parallel and resilient concurrent applications in Scala or Java. Akka uses the Actor model to hide all the thread-related code and gives you really simple and helpful interfaces to implement a scalable and fault-tolerant system easily. A good example for Akka is a real-time application that consumes and process data coming from mobile phones and sends them to some kind of storage.

Apache Spark (not Spark Streaming) is a framework to process batch data using a generalized version of the map-reduce algorithm. A good example for Apache Spark is a calculation of some metrics of stored data to get a better insight of your data. The data gets loaded and processed on demand.

Apache Spark Streaming is able to perform similar actions and functions on near real-time small batches of data the same way you would do it if the data would be already stored.

UPDATE APRIL 2016

From Apache Spark 1.6.0, Apache Spark is no longer relying on Akka for communication between nodes. Thanks to @EugeneMi for the comment.

Apache Spark vs Akka

Answer

Related questions