I have a Pig job in which I need to filter the data by finding a word in it.
Here is the snippet:
A = LOAD '/home/user/filename' USING PigStorage(',');
B = FOREACH A GENERATE $27,$38;
C = FILTER B BY ( $1 == '*Word*');
STORE C INTO '/home/user/out1' USING PigStorage();
The error is in the 3rd line, where C is computed. I have also tried using
C = FILTER B BY $1 MATCHES '*WORD*'
Also
C = FILTER B BY $1 MATCHES '\\w+WORD\\w+'
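MATCHES uses regular expressions. You should do ... MATCHES '.*WORD.*' instead. In a regular expression a bare * is not a wildcard: it is a quantifier that needs something before it, and the pattern has to match the whole field, so the word needs .* on both sides.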
There is an example here of finding the word 'apache'.
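
Applied to the script in the question, the fix might look like the sketch below. The paths and column positions are copied from the question. The explicit (chararray) cast is an addition of mine, on the assumption that the fields are untyped because no schema is given in the LOAD; it may not be needed in your setup.

```pig
-- Sketch only: same input path and column positions as in the question.
A = LOAD '/home/user/filename' USING PigStorage(',');
B = FOREACH A GENERATE $27, $38;

-- MATCHES has to match the entire field, so the word is wrapped in .* on
-- both sides. The cast is an assumption: without a schema the field is a
-- bytearray, and the cast makes the comparison type explicit.
C = FILTER B BY ((chararray)$1 MATCHES '.*Word.*');

STORE C INTO '/home/user/out1' USING PigStorage();
```

The match is case-sensitive, so '.*Word.*' and '.*WORD.*' select different rows. The '\\w+WORD\\w+' attempt in the question only keeps fields made up entirely of word characters, with at least one on each side of WORD, so a field containing spaces would not pass it.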