I have setup a 10 node HDP platform on AWS. Below is my configuration 2 Servers - Name Node and Standby Name node 7 Data Nodes and each node has 40 vCPU and 160 GB of memory.
I am trying to calculate the number of executors while submitting spark applications and after going through different blogs I am confused on what this parameter actually means.
Looking at the below blog it seems the num executors are the total number of executors across all nodes http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
But looking at the below blog it seems that the num executors is per node or server https://blogs.aws.amazon.com/bigdata/post/Tx578UTQUV7LRP/Submitting-User-Applications-with-spark-submit
Can anyone please clarify and review the below :-
Is the num-executors value is per node or the total number of executors across all the data nodes.
I am using the below calculation to come up with the core count, executor count and memory per executor
Number of cores <= 5 (assuming 5) Num executors = (40-1)/5 = 7 Memory = (160-1)/7 = 22 GB
With the above calculation which would be the correct way
--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 7 --executor-cores 5
OR
--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 49 --executor-cores 5
Thanks, Jayadeep
Can anyone please clarify and review the below :-
- Is the num-executors value is per node or the total number of executors across all the data nodes.
You need to first understand that the executors run on the NodeManagers (You can think of this like workers in Spark standalone). A number of Containers (includes vCPU, memory, network, disk, etc.) equal to number of executors specified will be allocated for your Spark application on YARN. Now these executor containers will be run on multiple NodeManagers and that depends on the CapacityScheduler (default scheduler in HDP).
So to sum up, total number of executors is the number of resource containers you specify for your application to run.
Refer this blog to understand better.
- I am using the below calculation to come up with the core count, executor count and memory per executor
Number of cores <= 5 (assuming 5) Num executors = (40-1)/5 = 7 Memory = (160-1)/7 = 22 GB
There is no rigid formula for calculating the number of executors. Instead you can try enabling Dynamic Allocation in YARN for your application.