select count distinct using pig latin

jdamae picture jdamae · Feb 12, 2012 · Viewed 29.3k times · Source

I need help with this pig script. I am just getting a single record. I am selecting 2 columns and doing a count(distinct) on another while also using a where like clause to find a particular description (desc).

Here's my sql with pig I am trying to code.

 /*
    For example in sql:
    select domain, count(distinct(segment)) as segment_cnt
    from table
    where desc='ABC123'
    group by domain
    order by segment_count desc;
    */

    A = LOAD 'myoutputfile' USING PigStorage('\u0005')
            AS (
                domain:chararray,
                segment:chararray,
                desc:chararray
                );
B = filter A by (desc=='ABC123');
C = foreach B generate domain, segment;
D = DISTINCT C;
E = group D all;
F = foreach E generate group, COUNT(D) as segment_cnt;
G = order F by segment_cnt DESC;

Answer

Romain picture Romain · Feb 12, 2012

You could GROUP on each domain and then count the number of distinct elements in each group with a nested FOREACH syntax:

D = group C by domain;
E = foreach D { 
    unique_segments = DISTINCT C.segment;
    generate group, COUNT(unique_segments) as segment_cnt;
};