How can I use the map datatype in Apache Pig?

1frustratedpiggy · Nov 1, 2010

I'd like to use Apache Pig to build a large key -> value mapping, look things up in the map, and iterate over the keys. However, there does not even seem to be syntax for doing these things; I've checked the manual, wiki, sample code, Elephant book, Google, and even tried parsing the parser source. Every single example loads map literals from a file... and then never uses them. How can you use Pig's maps?

First, there doesn't seem to be a way to load a 2-column CSV file into a map directly. If I have a simple map.csv:

1,2
3,4
5,6

And I try to load it as a map:

m = load 'map.csv' using PigStorage(',') as (M: []);
dump m;

I get three empty tuples:

()
()
()

So I try to load tuples and then generate the map:

m = load 'map.csv' using PigStorage(',') as (key:chararray, val:chararray);
b = foreach m generate [key#val];
ERROR 1000: Error during parsing. Encountered " "[" "[ "" at line 1, column 24.
...

Many variations on the syntax also fail (e.g., generate [$0#$1]).

OK, so I munge my map into Pig's map literal format as map.pig:

[1#2]
[3#4]
[5#6]

And load it up:

m = load 'map.pig' as (M: []);

Now let's load up some keys and try lookups:

k = load 'keys.csv' as (key);
dump k;
3
5
1

c = foreach k generate m#key;  /* Or m[key], or... what? */
ERROR 1000: Error during parsing.  Invalid alias: m in {M: map[ ]}

Hrm, OK, maybe since there are two relations involved, we need a join:

c = join k by key, m by /* ...um, what? */ $0;
dump c;
ERROR 1068: Using Map as key not supported.
c = join k by key, m by m#key;
dump c;
Error 1000: Error during parsing. Invalid alias: m in {M: map[ ]}

Fail. How do I refer to the key (or value) of a map? The map schema syntax doesn't seem to let you even name the key and value (the mailing list says there's no way to assign types).

Finally, I'd just like to be able to find all the keys in my map:

d = foreach m generate ...oh, forget it.

Is Pig's map type half-baked? What am I missing?

Answer

jayadev · Nov 23, 2011

Currently, Pig maps need the key to be a chararray (string) constant that you supply, not a variable that contains a string. So in map#key, the key has to be a constant string that you write out (e.g., map#'keyvalue').
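
As a minimal sketch of that rule, assuming the map.pig file from the question (one map literal per line), a lookup with a quoted constant key might look like the following. The alias names v and val are only illustrative.

```pig
-- Load each line of map.pig as a single map-typed field named M.
m = load 'map.pig' as (M: []);
-- Dereference the field M (not the relation m) with a constant, quoted key.
v = foreach m generate M#'3' as val;
dump v;
```

Rows whose map has no entry for '3' yield null for val. Note that the lookup is written against the field name M; the relation alias m cannot be dereferenced this way.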

The typical use case is to load a complex data structure in which one of the elements is a set of key-value pairs; later, in a foreach statement, you can refer to a particular value by the key you are interested in.
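
A sketch of that use case, under stated assumptions: users.txt is a hypothetical tab-separated file whose second column is a map literal such as [city#Oslo,age#31], and the field names name, props, and city are made up for illustration.

```pig
-- Hypothetical input: one record per line, a name followed by a map of properties.
users = load 'users.txt' as (name:chararray, props:map[]);
-- Pull out one value per record by naming the key as a constant string.
cities = foreach users generate name, props#'city' as city;
dump cities;
```

Because the map schema here declares no value type, the looked-up value is a bytearray unless it is cast.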

http://pig.apache.org/docs/r0.9.1/basic.html#map-schema