Any way to compute statistics on a hive table for all partitions with a single analyze command?

WestCoastProjects · Aug 29, 2013

The syntax I see for computing statistics in hive seems to indicate the answer to the title question would be 'no':

ANALYZE TABLE [TABLENAME] PARTITION(partcol1=…, partcol2=…) COMPUTE STATISTICS

However, I wanted to throw it out here, since it would be surprising if one always had to write a script to iterate over the partitions and generate the per-partition statements. We have about a thousand partitions on this small table right now, and it will be growing by orders of magnitude.

BTW I tried the following without specifying the partition:

hive> analyze table metrics compute statistics;
FAILED: SemanticException [Error 10115]: Table is partitioned and partition specification is needed

Answer

msciwoj · Nov 12, 2014

Yes, you can.

This works at least from Hive v0.13, which is the version I'm on. Just use the partition spec syntax with the partition column names only, without specific values (no =… bits).
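As a rough sketch of what that looks like for the metrics table from the question: the partition column names partcol1 and partcol2 below are placeholders taken from the syntax template above, not the table's real columns, so substitute your own.

```sql
-- List the partition columns by name only, with no "= value" part.
-- partcol1 and partcol2 are hypothetical names; use the table's actual partition columns.
ANALYZE TABLE metrics PARTITION(partcol1, partcol2) COMPUTE STATISTICS;
```

With the values left out, Hive should gather statistics for every partition of the table in that one statement, rather than needing one statement per partition.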

If you're using FOR COLUMNS, though, this won't work because of this bug: https://issues.apache.org/jira/browse/HIVE-4861