What is the difference between yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb?
I see both of these in yarn-site.xml and I see the explanations here.
yarn.scheduler.maximum-allocation-mb is given the following definition: "The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this will throw a InvalidResourceRequestException." Does this mean memory requests ONLY on the resourcemanager are limited by this value?
And yarn.nodemanager.resource.memory-mb is given the definition: "Amount of physical memory, in MB, that can be allocated for containers." Does this mean the total amount for all containers across the entire cluster, summed together?
However, I still cannot distinguish between these. Those explanations make me think that they are the same.
Even more confusing, their default values are exactly the same: 8192 MB. How do I tell the difference between these? Thank you.
Consider a scenario where you are setting up a cluster in which each machine has 48 GB of RAM. Some of this RAM should be reserved for the operating system and other installed applications.
yarn.nodemanager.resource.memory-mb:
Amount of physical memory, in MB, that can be allocated for containers. It means the amount of memory YARN can utilize on this node and therefore this property should be lower than the total memory of that machine.
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value> <!-- 40 GB -->
The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container.
In yarn-site.xml
<name>yarn.scheduler.minimum-allocation-mb</name> <!-- RAM-per-container -->
<value>2048</value>
yarn.scheduler.maximum-allocation-mb:
It defines the maximum memory allocation available for a single container, in MB. This means the RM can only allocate memory to containers in increments of "yarn.scheduler.minimum-allocation-mb", never exceeding "yarn.scheduler.maximum-allocation-mb", and this maximum should not be more than the total memory allocated to YARN on the node.
In yarn-site.xml
<name>yarn.scheduler.maximum-allocation-mb</name> <!-- Max RAM-per-container -->
<value>8192</value>
For MapReduce applications, YARN runs each map or reduce task in a container, and a single machine can host a number of containers.
We want to allow for a maximum of 20 containers on each node, and thus need (40 GB total RAM) / (20 containers) = 2 GB minimum per container, controlled by the property yarn.scheduler.minimum-allocation-mb.
We also want to restrict the maximum memory a single container can use, which is controlled by the property "yarn.scheduler.maximum-allocation-mb".
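The arithmetic above can be checked with a minimal sketch. The function name containers_per_node is hypothetical and only illustrates the division; the three constants are the example values from this answer, not YARN defaults.

```python
def containers_per_node(node_memory_mb: int, container_mb: int) -> int:
    """Number of containers of a given size that fit in the memory YARN may use on one node."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    return node_memory_mb // container_mb


# Example values from this answer (assumed, not defaults).
NODE_MEMORY_MB = 40960     # yarn.nodemanager.resource.memory-mb
MIN_ALLOCATION_MB = 2048   # yarn.scheduler.minimum-allocation-mb
MAX_ALLOCATION_MB = 8192   # yarn.scheduler.maximum-allocation-mb

# Upper bound: every container is the smallest allowed size.
print(containers_per_node(NODE_MEMORY_MB, MIN_ALLOCATION_MB))  # 20
# Lower bound: every container is the largest allowed size.
print(containers_per_node(NODE_MEMORY_MB, MAX_ALLOCATION_MB))  # 5
```

So with these settings a node runs between 5 and 20 containers at once, depending on how much memory each one requests.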
For example, if a job asks for 2049 MB of memory per map container (mapreduce.map.memory.mb=2049 set in mapred-site.xml), the RM will give it one 4096 MB (2*yarn.scheduler.minimum-allocation-mb) container.
If you have a huge MR job which asks for a 9999 MB map container, the job will be killed with an error message, because the request exceeds yarn.scheduler.maximum-allocation-mb (8192 MB).
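The two cases above can be sketched as a small function. This is an illustration of the rule as described in this answer (round up to a multiple of the minimum, reject anything above the maximum), not YARN's actual code; granted_container_mb and InvalidResourceRequest are hypothetical names, and some schedulers may use a separate increment setting instead of the minimum.

```python
class InvalidResourceRequest(Exception):
    """Hypothetical stand-in for YARN's InvalidResourceRequestException."""


def granted_container_mb(requested_mb: int, min_mb: int = 2048, max_mb: int = 8192) -> int:
    """Container size granted for a request, under the rounding rule described above."""
    if requested_mb > max_mb:
        # Requests above the per-container maximum are rejected outright.
        raise InvalidResourceRequest(
            f"requested {requested_mb} MB exceeds maximum allocation of {max_mb} MB"
        )
    # Round up to the next multiple of the minimum allocation (at least one unit).
    units = max(1, (requested_mb + min_mb - 1) // min_mb)
    return units * min_mb


print(granted_container_mb(2048))  # 2048
print(granted_container_mb(2049))  # 4096

try:
    granted_container_mb(9999)
except InvalidResourceRequest as err:
    print("rejected:", err)
```

Note that neither check involves yarn.nodemanager.resource.memory-mb: that property only limits how many such containers fit on one node.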