Exception when casting chararray to double in Pig

sudheer · Sep 26, 2013

I have sample input of tab-separated key/value pairs, as follows:

B_1001@2012-06-15   [email protected]
B_1001@2012-06-18   [email protected]
B_1002@2012-09-26   [email protected]
B_1002@2012-09-28   [email protected]

I load this file into Pig and do the following:

a = load '/home/HadoopUser/Desktop/a.txt' as (key:chararray, value:chararray);

describe a;
a: {key: chararray,value: chararray}

b = foreach a generate key, flatten(STRSPLIT(value,'@',2)) as (v1:double,v2:float);
describe b;
b: {key: chararray,v1: double,v2: float}

c = group b by key;
describe c;
c: {group: chararray,b: {key: chararray,v1: double,v2: float}}

This works up to here, but when I perform arithmetic on b.v1 I get a ClassCastException: java.lang.String cannot be cast to java.lang.Double.

describe, however, reports no error:

d = foreach c generate group,SUM(b.v1);
describe d;
d: {group: chararray,double}

When I dump d; it throws the exception.

I also tried casting the result of STRSPLIT directly when defining 'b':

b = foreach a generate key, (tuple (double,double))STRSPLIT(value,'@',2); 

Now when I describe b; I get the error: Cannot cast tuple with schema tuple to tuple with schema tuple({double,double})

Please help me understand why this happens even though describe shows the correct schema.

Answer

mr2ert · Sep 26, 2013

I have experienced this issue before as well. I can't find the bug tracker link for it right now, but when you set the type/'cast' with a statement like B = FOREACH A GENERATE key AS key: chararray it will not actually cast the type (but it will change the output of DESCRIBE). You are right that you'll have to do an explicit cast, and the docs say that you can cast a chararray to a double. Try something like:

b1 = FOREACH b GENERATE key, (double)v1, (float)v2 ;
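Put together with the script from the question, the workaround might look like the sketch below. It assumes the split pieces are first named as chararray (rather than declared as double/float in the AS clause) so that the later casts are real conversions. The relation name b1 follows the line above; the rest is carried over from the question, and this has not been checked against a particular Pig version.

```pig
-- Load the key/value pairs as plain strings (path taken from the question).
a = LOAD '/home/HadoopUser/Desktop/a.txt' AS (key:chararray, value:chararray);

-- STRSPLIT yields strings at runtime, so label the pieces as chararray here
-- instead of declaring them double/float in the AS clause (assumption: this
-- keeps the declared schema in line with the actual runtime types).
b = FOREACH a GENERATE key, FLATTEN(STRSPLIT(value, '@', 2)) AS (v1:chararray, v2:chararray);

-- Convert explicitly; these casts are applied when the job runs.
b1 = FOREACH b GENERATE key, (double)v1 AS v1, (float)v2 AS v2;

-- Group and aggregate on the converted relation.
c = GROUP b1 BY key;
d = FOREACH c GENERATE group, SUM(b1.v1);
DUMP d;
```

Running describe b1; before the GROUP should then show v1 as double and v2 as float, this time backed by actual conversions rather than by a schema label alone.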

Update: Here is the link to the bug: https://issues.apache.org/jira/browse/PIG-2315