What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?

yedapoda · Jun 5, 2014 · Viewed 56.5k times

I would like to know the relation between the mapreduce.map.memory.mb and mapred.map.child.java.opts parameters.

Is mapreduce.map.memory.mb > mapred.map.child.java.opts?

Answer

user1234883 · Sep 20, 2014

mapreduce.map.memory.mb is the upper physical memory limit, in megabytes, that Hadoop allows for a single map task's container. The default is 512. If the mapper exceeds this limit, YARN kills the container with an error like this:

Container[pid=container_1406552545451_0009_01_000002,containerID=container_234132_0001_01_000001] is running beyond physical memory limits. Current usage: 569.1 MB of 512 MB physical memory used; 970.1 MB of 1.0 GB virtual memory used. Killing container.
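For example, the container limit can be raised per job from the driver code. Here is a minimal sketch, assuming the Hadoop 2 MapReduce API; the 2048 MB value and the class name are purely illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MapMemoryExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Ask YARN for a 2048 MB container per map task (illustrative value).
            conf.setInt("mapreduce.map.memory.mb", 2048);
            Job job = Job.getInstance(conf, "map-memory-demo");
            // ... set the mapper, input/output paths, etc., then submit the job.
        }
    }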

Each Hadoop mapper runs as a Java process, and every Java process has a maximum heap size, set through the -Xmx flag in mapred.map.child.java.opts (or mapreduce.map.java.opts in Hadoop 2+). If the mapper runs out of heap memory, it throws a Java out-of-memory exception:

Error: java.lang.RuntimeException: java.lang.OutOfMemoryError
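The heap side is set by passing an -Xmx flag through the java.opts property. Continuing the same illustrative numbers, here is a sketch that keeps the heap below the 2048 MB container limit (again, the values and class name are only examples):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MapHeapExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("mapreduce.map.memory.mb", 2048);      // YARN container limit
            conf.set("mapreduce.map.java.opts", "-Xmx1638m");  // JVM heap, roughly 80% of the container
            Job job = Job.getInstance(conf, "map-heap-demo");
            // ... configure mapper/reducer and submit as usual.
        }
    }

If the driver implements Tool and is run through ToolRunner, the same two properties can also be overridden on the command line with -D flags.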

Thus, the two settings are related: the Hadoop setting is an enforcement limit (YARN kills any container that exceeds it), while the Java setting configures how much heap the JVM inside that container is actually allowed to use.

The Java heap setting should be smaller than the Hadoop container memory limit, because the JVM needs memory beyond the heap (for example, class metadata, thread stacks, and native buffers). A common recommendation is to reserve about 20% of the container memory for this overhead. If the two settings are sized correctly, a Java-based Hadoop task should never be killed by YARN, so you should never see the "Killing container" error above.
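Put as arithmetic, the rule of thumb is heap ≈ 0.8 × container size. Below is a small sketch of that calculation; the helper name and the exact 0.8 factor are my own, not anything shipped with Hadoop:

    public class HeapSizing {
        // Derive an -Xmx value that leaves ~20% of the container for non-heap overhead.
        static String xmxFor(int containerMb) {
            int heapMb = (int) (containerMb * 0.8);
            return "-Xmx" + heapMb + "m";
        }

        public static void main(String[] args) {
            // e.g. a 2048 MB container limit -> -Xmx1638m
            System.out.println(xmxFor(2048));
        }
    }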

If you experience Java out-of-memory errors, increase both settings together, keeping the -Xmx heap below the container limit.