I am new to Hadoop and Hive, and I would like to know: what is the difference between an index and a partition in Hive? When should I use an index, and when a partition?
Thank you!
Indexes are new and still evolving (features are being added). Currently they are limited to single tables and cannot be used with external tables. Creating an index creates a separate table that holds the index data. An index can be partitioned to match the partitions of the base table. Indexes are used to speed up searches for data within a table.
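To make the "separate table" idea concrete, here is a rough Python sketch of the concept, not Hive's actual implementation. It assumes an index that maps each value of the indexed column to the files and offsets where matching rows live. The table contents, file paths, and the function names `build_index` and `lookup` are all made up for illustration.

```python
# Hypothetical base table: file path -> list of (offset, row) pairs.
# This is an in-memory stand-in for data files, not real HDFS access.
base_table = {
    "/warehouse/sales/file_0": [
        (0, {"id": 1, "customer": "alice"}),
        (40, {"id": 2, "customer": "bob"}),
    ],
    "/warehouse/sales/file_1": [
        (0, {"id": 3, "customer": "alice"}),
        (40, {"id": 4, "customer": "carol"}),
    ],
}


def build_index(table, column):
    """Build a separate structure: column value -> {file path: [offsets]}."""
    index = {}
    for file_path, records in table.items():
        for offset, row in records:
            index.setdefault(row[column], {}).setdefault(file_path, []).append(offset)
    return index


def lookup(table, index, value):
    """Use the index to jump to matching rows instead of scanning every row."""
    hits = []
    for file_path, offsets in index.get(value, {}).items():
        rows_by_offset = dict(table[file_path])
        hits.extend(rows_by_offset[offset] for offset in offsets)
    return hits


customer_index = build_index(base_table, "customer")
alice_rows = lookup(base_table, customer_index, "alice")
```

The point is only that the index lives apart from the base data and tells the reader where to look, so a lookup can skip rows that cannot match.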
Partitions segregate the data at the HDFS level by creating a sub-directory for each partition. Partitioning limits the number of files read and the amount of data scanned by a query. For this to happen, however, the partition columns must be specified in the WHERE clause of the query.
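As a small illustration of why the WHERE clause matters, here is a Python sketch that mimics the directory layout and the pruning effect. It is a simulation in memory, not Hive or HDFS code. The table name `sales`, the partition column `country`, the `/warehouse` path, and the functions `write_partitioned` and `scan` are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical rows for a table partitioned by `country`.
rows = [
    {"id": 1, "country": "US", "amount": 10},
    {"id": 2, "country": "FR", "amount": 20},
    {"id": 3, "country": "US", "amount": 30},
    {"id": 4, "country": "DE", "amount": 40},
]


def partition_path(table, partition_col, value):
    """Sub-directory named '<column>=<value>' under the table directory."""
    return f"/warehouse/{table}/{partition_col}={value}"


def write_partitioned(rows, table, partition_col):
    """Group rows into one sub-directory per partition value."""
    layout = defaultdict(list)
    for row in rows:
        # The partition value is carried by the directory name, not the data.
        data = {k: v for k, v in row.items() if k != partition_col}
        layout[partition_path(table, partition_col, row[partition_col])].append(data)
    return dict(layout)


def scan(layout, table, partition_col, value=None):
    """Return (rows, directories_read); filtering on the partition column prunes."""
    if value is None:
        paths = sorted(layout)  # no partition filter: every directory is read
    else:
        paths = [partition_path(table, partition_col, value)]
    result, dirs_read = [], 0
    for path in paths:
        if path in layout:
            dirs_read += 1
            result.extend(layout[path])
    return result, dirs_read


layout = write_partitioned(rows, "sales", "country")
us_rows, us_dirs = scan(layout, "sales", "country", "US")
all_rows, all_dirs = scan(layout, "sales", "country")
```

With the filter on `country`, only one sub-directory is touched. Without it, all of them are read, which is the case the answer warns about.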
While building your data model, you can decide how best to use indexes, partitions, or both, based on the size of your data and your expected query patterns.