Calculate Average using PIG

user1792899 · Mar 5, 2013

I am new to Pig and want to calculate the average of a single column of data that looks like this:

0
10.1
20.1
30
40
50
60
70
80.1

I wrote this Pig script:

dividends = load 'myfile.txt' as (A);
dump dividends
grouped   = group dividends by A;
avg       = foreach grouped generate AVG(grouped.A);
dump avg

It parses the data as:

(0)
(10.1)
(20.1)
(30)
(40)
(50)
(60)
(70)
(80.1)

but gives this error for the average:

2013-03-04 15:10:58,289 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<file try.pig, line 4, column 41> Invalid scalar projection: grouped
Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362438645642.log

Any ideas?

Answer

cyang · Mar 5, 2013

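The built-in AVG function takes a bag as input. Your GROUP statement groups the rows by the value of A, which produces one bag per distinct value. Inside the FOREACH, that bag is named after the original relation (dividends), not after grouped, which is why grouped.A is rejected as an invalid scalar projection. What you really want is to collect all the rows into a single bag.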

Pig's GROUP ALL is what you want to use:

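dividends = load 'myfile.txt' as (A);
dump dividends;
-- group all collects every row into a single bag
grouped   = group dividends all;
-- the bag is named after the original relation, so project dividends.A
avg       = foreach grouped generate AVG(dividends.A);
dump avg;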
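
If the difference between the two groupings is still fuzzy, here is a rough plain-Python analogy. It is not Pig, and the helper names group_by_value and group_all are made up for illustration. It only mimics what each GROUP form hands to AVG, using the nine values from the question.

```python
from collections import defaultdict
from statistics import mean

# The nine values from the question's myfile.txt
values = [0, 10.1, 20.1, 30, 40, 50, 60, 70, 80.1]


def group_by_value(rows):
    """Rough analogue of `group dividends by A`: one bag per distinct value."""
    bags = defaultdict(list)
    for v in rows:
        bags[v].append(v)
    return dict(bags)


def group_all(rows):
    """Rough analogue of `group dividends all`: a single bag holding every row."""
    return {"all": list(rows)}


# Averaging per distinct value just gives each value back.
per_value_avgs = {key: mean(bag) for key, bag in group_by_value(values).items()}

# Averaging the single bag gives the overall average.
overall_avg = mean(group_all(values)["all"])

print(len(per_value_avgs))
print(round(overall_avg, 4))
```

Grouping by value yields nine one-element bags here, so each per-group average is just the value itself. Grouping everything into one bag gives the single overall average, about 40.03, which is what the question was after.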