I am new to Pig and want to calculate the average of a single column of data that looks like this:
0
10.1
20.1
30
40
50
60
70
80.1
I wrote this Pig script:
dividends = load 'myfile.txt' as (A);
dump dividends
grouped = group dividends by A;
avg = foreach grouped generate AVG(grouped.A);
dump avg
It parses the data as:
(0)
(10.1)
(20.1)
(30)
(40)
(50)
(60)
(70)
(80.1)
but gives this error for the average:
2013-03-04 15:10:58,289 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file try.pig, line 4, column 41> Invalid scalar projection: grouped
Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362438645642.log
Any idea what is wrong?
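The AVG built-in function takes a bag as input. In your group statement, you are currently grouping elements by the value of A, but what you really want is to group all the elements into one bag. Note also that after grouping, the bag in each row is named after the original relation (dividends), not after the grouped alias, which is why AVG(grouped.A) fails with "Invalid scalar projection". Pig's GROUP ALL is what you want to use: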
dividends = load 'myfile.txt' as (A);
dump dividends;
grouped = group dividends all;
avg = foreach grouped generate AVG(dividends.A);
dump avg;
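If it helps to see why the bag must be referenced as dividends.A, here is a minimal sketch of the same script with two additions that are my own assumptions rather than part of the original: a declared type of double for the column, and a DESCRIBE step to print the schema of the grouped relation. The alias avg_a and the field name mean_a are illustrative names only.

```pig
-- Assumption: the column is declared as double; the original loads it untyped.
dividends = LOAD 'myfile.txt' AS (A:double);

-- GROUP ALL puts every row into a single bag.
grouped = GROUP dividends ALL;

-- DESCRIBE prints the schema of grouped. It should show a field called
-- "group" plus a bag named "dividends", which is the name AVG has to use.
DESCRIBE grouped;

-- Project column A out of the dividends bag and average it.
avg_a = FOREACH grouped GENERATE AVG(dividends.A) AS mean_a;
DUMP avg_a;
```

For the nine values in the question the sum is 360.3, so the result should come out at roughly 40.03.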