I have a Hadoop cluster with 5 nodes, each of which has 12 cores with 32GB memory. I use YARN as MapReduce framework, so I have the following settings with YARN:
Then the cluster metrics shown on my YARN cluster page (http://myhost:8088/cluster/apps) displayed that VCores Total is 40. This is pretty fine!
Then I installed Spark on top of it and use spark-shell in yarn-client mode.
I ran one Spark job with the following configuration:
I set --executor-cores as 10, --num-executors as 4, so logically, there should be totally 40 Vcores Used. However, when I check the same YARN cluster page after the Spark job started running, there are only 4 Vcores Used, and 4 Vcores Total
I also found that there is a parameter in capacity-scheduler.xml
- called yarn.scheduler.capacity.resource-calculator
:
"The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc."
I then changed that value to DominantResourceCalculator
.
But then when I restarted YARN and run the same Spark application, I still got the same result, say the cluster metrics still told that VCores used is 4! I also checked the CPU and memory usage on each node with htop command, I found that none of the nodes had all 10 CPU cores fully occupied. What can be the reason?
I tried also to run the same Spark job in fine-grained way, say with --num executors 40 --executor-cores 1
, in this ways I checked again the CPU status on each worker node, and all CPU cores are fully occupied.
I was wondering the same but changing the resource-calculator worked for me.
This is how I set the property:
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
Check in the YARN UI in the application how many containers and vcores are assigned, with the change the number of containers should be executors+1 and the vcores should be: (executor-cores*num-executors) +1.