Will Spark SQL completely replace Apache Impala or Apache Hive?

sql hadoop apache-spark hive impala

Tim Koo · Oct 25, 2016 · Viewed 7.7k times · Source

I need to deploy a Big Data cluster on our servers, but I only know Apache Spark. Now I need to know whether Spark SQL can completely replace Apache Impala or Apache Hive.

I need your help. Thanks.

Answer

I would like to explain this with real-world scenarios.

In real production projects:

Hive is mostly used for storing data/tables and running ad-hoc queries. If an organisation's data is growing day by day and it currently queries that data in an RDBMS, it can use Hive.

Impala is used for business intelligence projects where the reporting is done through a front-end tool such as Tableau or Pentaho.

Spark is mostly used for analytics, where the developers are more inclined towards statistics; they can also use the R language with Spark to build their initial data frames.

So the answer to your question is no: Spark will not replace Hive or Impala, because all three have their own use cases and benefits. Also, how easy these query engines are to implement depends on your Hadoop cluster setup.

Here are some links which will help you understand more clearly:

http://db-engines.com/en/system/Hive%3BImpala%3BSpark+SQL

http://www.infoworld.com/article/3131058/analytics/big-data-face-off-spark-vs-impala-vs-hive-vs-presto.html

https://www.dezyre.com/article/impala-vs-hive-difference-between-sql-on-hadoop-components/180
