No matter how much I tinker with the settings in yarn-site.xml
i.e using all of the below options
yarn.scheduler.minimum-allocation-vcores
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.maximum-allocation-vcores
i just still cannot get my application i.e Spark to utilize all the cores on the cluster. The spark executors seem to be correctly taking up all the available memory, but each executor just keeps taking a single core and no more.
Here are the options configured in spark-defaults.conf
spark.executor.cores 3
spark.executor.memory 5100m
spark.yarn.executor.memoryOverhead 800
spark.driver.memory 2g
spark.yarn.driver.memoryOverhead 400
spark.executor.instances 28
spark.reducer.maxMbInFlight 120
spark.shuffle.file.buffer.kb 200
Notice that spark.executor.cores
is set to 3, but it doesn't work.
How do i fix this?
The problem lies not with yarn-site.xml
or spark-defaults.conf
but actually with the resource calculator that assigns the cores to the executors or in the case of MapReduce jobs, to the Mappers/Reducers.
The default resource calculator i.e org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
uses only memory information for allocating containers and CPU scheduling is not enabled by default. To use both memory as well as the CPU, the resource calculator needs to be changed to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
in the capacity-scheduler.xml
file.
Here's what needs to change.
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>