Pig: how to count the number of rows in an alias

kee · Mar 28, 2012 · Viewed 108k times

I did something like this to count the number of rows in an alias in Pig:

logs = LOAD 'log';
logs_w_one = foreach logs generate 1 as one;
logs_group = group logs_w_one all;
logs_count = foreach logs_group generate SUM(logs_w_one.one);
dump logs_count;

This seems inefficient. Please enlighten me if there is a better way!

Answer

Arnon Rotem-Gal-Oz · Mar 28, 2012

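COUNT is a built-in Pig function; see the manual. Group the alias with ALL so that every row lands in a single bag, then count that bag: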

LOGS = LOAD 'log';
-- GROUP ... ALL collects every tuple of LOGS into one bag
LOGS_GROUP = GROUP LOGS ALL;
LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(LOGS);
DUMP LOG_COUNT;
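
One caveat worth sketching: per the Pig documentation, COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple. The snippet below is a minimal illustration, not part of the original answer. It assumes the same 'log' input path as the question, and the output field names non_null_rows and all_rows are made up for the example.

```pig
-- Assumes the same input path 'log' used in the question.
logs = LOAD 'log';

-- Put every tuple into a single bag so the aggregates run over the whole alias.
logs_group = GROUP logs ALL;

-- COUNT ignores tuples whose first field is null; COUNT_STAR includes them.
-- The field names non_null_rows and all_rows are illustrative only.
logs_count = FOREACH logs_group GENERATE
    COUNT(logs) AS non_null_rows,
    COUNT_STAR(logs) AS all_rows;

DUMP logs_count;
```

If the two numbers differ, some rows have a null first field, and COUNT_STAR is probably the one that matches "number of rows" in the everyday sense.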