Difference between Hive internal tables and external tables?

DrewRose picture DrewRose · Jun 11, 2013 · Viewed 189k times · Source

Can anyone tell me the difference between Hive's external table and internal tables. I know the difference comes when dropping the table. I don't understand what you mean by the data and metadata is deleted in internal and only metadata is deleted in external tables. Can anyone explain me in terms of nodes please.

Answer

prestomation picture prestomation · Jun 11, 2013

Hive has a relational database on the master node it uses to keep track of state. For instance, when you CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/';, this table schema is stored in the database.

If you have a partitioned table, the partitions are stored in the database(this allows hive to use lists of partitions without going to the file-system and finding them, etc). These sorts of things are the 'metadata'.

When you drop an internal table, it drops the data, and it also drops the metadata.

When you drop an external table, it only drops the meta data. That means hive is ignorant of that data now. It does not touch the data itself.