I am new in hadoop and i am not yet familiar to its configuration.
I just want to ask the maximum container per node.
I am using a single node cluster (6GB ram laptop)
and below is my mapred and yarn configuration:
**mapred-site.xml**
map-mb : 4096 opts:-Xmx3072m
reduce-mb : 8192 opts:-Xmx6144m
**yarn-site.xml**
resource memory-mb : 40GB
min allocation-mb : 1GB
The above setup can only run 4 to 5 jobs. and max of 8 container.
Maximum containers that run on a single NodeManager (hadoop worker) depends on lot of factors like how much memory is assigned for the NodeManager to use and also depends on application specific requirements.
The defaults for yarn.scheduler.*-allocation-*
are: 1GB (minimum allocation), 8GB (maximum allocation), 1 core and 32 cores. So, minimum and maximum allocation, affects number of containers per node.
So, if you have 6GB RAM and 4 virtual cores, here is how the YARN configuration should look like:
yarn.scheduler.minimum-allocation-mb: 128
yarn.scheduler.maximum-allocation-mb: 2048
yarn.scheduler.minimum-allocation-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 2
yarn.nodemanager.resource.memory-mb: 4096
yarn.nodemanager.resource.cpu-vcores: 4
The above configuration tells hadoop to use atmost 4GB and 4 virtual cores and that each container can have between 128 MB and 2 GB of memory and between 1 and 2 virtual cores, with these settings you could run upto 2 containers with maximum resources at a time.
Now, for MapReduce specific configuration:
yarn.app.mapreduce.am.resource.mb: 1024
yarn.app.mapreduce.am.command-opts: -Xmx768m
mapreduce.[map|reduce].cpu.vcores: 1
mapreduce.[map|reduce].memory.mb: 1024
mapreduce.[map|reduce].java.opts: -Xmx768m
With this configuration, you could theoretically have up to 4 mappers/reducers running simultaneously in 4 1GB containers. In practice, the MapReduce application master will use a 1GB container so the actual number of concurrent mappers and reducers will be limited to 3. You can play around with the memory limits but it might require some experimentation to find the best ones.
As a rule of thumb, you should limit the heap-size to about 75% of the total memory available to ensure things run more smoothly.
You could also set memory per container using yarn.scheduler.minimum-allocation-mb
property.
For more detail configuration for production systems use this document from hortonworks as a reference.