Apache Spark vs Apache Spark 2

YoungHobbit · Oct 21, 2016

What improvements does Apache Spark 2 bring compared to Apache Spark 1.x?

  1. From an architecture perspective
  2. From an application point of view
  3. Or anything else

Answer

bob · Oct 21, 2016

Although the Apache Spark 2.0.0 APIs have stayed largely similar to 1.x, Spark 2.0.0 does have API-breaking changes.

Apache Spark 2.0.0 is the first release on the 2.x line. The major updates are API usability, SQL:2003 support, performance improvements, Structured Streaming, R UDF support, and operational improvements.

New in Spark 2:

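  • The biggest change that I can see is that the Dataset and DataFrame APIs have been merged: in Scala, DataFrame is now just a type alias for Dataset[Row], and Java code uses Dataset<Row> directly.
  • Spark 2.0 is a whole lot more efficient than its predecessors; for example, whole-stage code generation speeds up many common SQL and DataFrame operators, and Parquet scans are faster thanks to vectorized decoding. Spark 2.0 also focuses on combining Parquet and caching to achieve even better throughput.
  • Structured Streaming is another big thing! It lets you express a streaming computation with the same DataFrame/Dataset API used for batch queries (it is marked experimental in 2.0).
  • Spark 2.0 is the first version to focus on ETL; successive versions are expected to add more operators and libraries for ETL.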

You can go through the Spark 2.0.0 release notes, where updates in the following areas are explained:

  • API Stability
  • Core and Spark SQL
  • MLlib
  • SparkR
  • Streaming
  • Dependency, Packaging, and Operations
  • Removals, Behavior Changes and Deprecations
  • Known Issues