What is Hive? Is it a database?

Brainchild · Nov 17, 2013 · Viewed 40.5k times

I just started exploring Hive. It has structures similar to an RDBMS, such as tables, joins, and partitions. As I understand it, Hive still uses HDFS for storage and is an SQL abstraction over HDFS. From this I am not sure whether Hive itself is a database solution like HBase or Cassandra, or simply a query system on top of HDFS. I don't think it is simply a query language, because it has tables, joins, and partitions.

Answer

Sandeep Singh · Nov 17, 2013

Hive is a data warehousing package/infrastructure built on top of Hadoop. It provides an SQL dialect called Hive Query Language (HQL, also written HiveQL) for querying data stored in a Hadoop cluster. Like all SQL dialects in widespread use, HQL doesn't fully conform to any particular revision of the ANSI SQL standard. It is perhaps closest to MySQL's dialect, but with significant differences.

As of this writing, Hive offers no support for row-level inserts, updates, and deletes, and it doesn't support transactions (limited ACID support for specially declared transactional tables was added later, in Hive 0.14). So it can't be compared directly with an RDBMS. Hive adds extensions to provide better performance in the context of Hadoop and to integrate with custom extensions and even external programs.

It is well suited to batch processing of data, for example: log processing, text mining, document indexing, customer-facing business intelligence, predictive modeling, and hypothesis testing.

Hive is not designed for online transaction processing and does not offer real-time queries.
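
To make the "query layer on top of files" idea more concrete, here is a rough toy sketch in plain Python. It is not Hive and uses no Hive API: a local temp directory stands in for an HDFS directory, and the `TABLE` dict plus the `write_sample_logs`, `scan`, and `count_by_level` helpers are hypothetical names made up for illustration. It only mimics the general pattern described above: the table is a schema laid over existing files, and a query is a batch scan over all of them.

```python
import csv
import os
import tempfile
from collections import Counter

# Hypothetical "table definition": column names plus a field delimiter.
# In Hive, metadata like this is kept separately from the data files;
# this dict is only a stand-in for that idea.
TABLE = {
    "columns": ["ts", "level", "message"],
    "delimiter": "\t",
}


def write_sample_logs(directory):
    """Write two tab-delimited files, standing in for files in an HDFS directory."""
    files = {
        "part-0000.tsv": [
            ("2013-11-17 10:00:01", "INFO", "job started"),
            ("2013-11-17 10:00:05", "ERROR", "disk full"),
        ],
        "part-0001.tsv": [
            ("2013-11-17 10:01:00", "INFO", "job finished"),
            ("2013-11-17 10:02:00", "ERROR", "timeout"),
            ("2013-11-17 10:03:00", "WARN", "slow task"),
        ],
    }
    for name, rows in files.items():
        with open(os.path.join(directory, name), "w", newline="") as f:
            csv.writer(f, delimiter="\t").writerows(rows)


def scan(directory, table):
    """Apply the schema at read time: yield one dict per line of every file."""
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), newline="") as f:
            for row in csv.reader(f, delimiter=table["delimiter"]):
                yield dict(zip(table["columns"], row))


def count_by_level(directory, table):
    """Roughly: SELECT level, COUNT(*) FROM logs GROUP BY level."""
    return Counter(rec["level"] for rec in scan(directory, table))


with tempfile.TemporaryDirectory() as d:
    write_sample_logs(d)
    counts = count_by_level(d, TABLE)
    print(dict(counts))
```

Note that the "query" reads every file from start to finish and nothing updates a row in place. That is the toy version of why this model fits batch analysis and not OLTP-style workloads.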