Spark resources not fully allocated on Amazon EMR

Michel Lemay · Jun 8, 2015 · Viewed 13.6k times

I'm trying to maximize cluster usage for a simple task.

The cluster is 1 + 2 x m3.xlarge, running Spark 1.3.1, Hadoop 2.4, Amazon AMI 3.7.

The task reads all lines of a text file and parses them as CSV.
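
For context, the whole job is essentially the following (a minimal sketch of my code; the object name and the naive CSV split are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the job: read every line of the input and split it as CSV.
object Sample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Sample"))
    val lines = sc.textFile(args(0))           // e.g. s3://mybucket/data/
    val parsed = lines.map(_.split(",", -1))   // naive split, no quoting/escaping support
    println(s"Parsed ${parsed.count()} rows")
    sc.stop()
  }
}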

When I spark-submit the task in yarn-cluster mode, I get one of the following results:

  • 0 executors: the job waits indefinitely until I manually kill it
  • 1 executor: the job underutilizes resources, with only 1 machine working
  • OOM when I do not assign enough memory to the driver

What I would have expected:

  • The Spark driver runs on the cluster master with all available memory, plus 2 executors with 9404 MB each (as defined by the install-spark script).

Sometimes, when I get a "successful" execution with 1 executor, cloning and restarting the step ends up with 0 executors.

I created my cluster using this command:

aws emr --region us-east-1 create-cluster --name "Spark Test" \
--ec2-attributes KeyName=mykey \
--ami-version 3.7.0 \
--use-default-roles \
--instance-type m3.xlarge \
--instance-count 3 \
--log-uri s3://mybucket/logs/ \
--bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args=["-x"] \
--steps Name=Sample,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--master,yarn,--deploy-mode,cluster,--class,my.sample.spark.Sample,s3://mybucket/test/sample_2.10-1.0.0-SNAPSHOT-shaded.jar,s3://mybucket/data/],ActionOnFailure=CONTINUE

With some step variations including:

--driver-memory 8G --driver-cores 4 --num-executors 2
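
For example, one of these variations turns the step into (a sketch; the exact placement of the flags inside Args is my reconstruction):

--steps Name=Sample,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--master,yarn,--deploy-mode,cluster,--driver-memory,8G,--driver-cores,4,--num-executors,2,--class,my.sample.spark.Sample,s3://mybucket/test/sample_2.10-1.0.0-SNAPSHOT-shaded.jar,s3://mybucket/data/],ActionOnFailure=CONTINUE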


The install-spark script with -x produces the following spark-defaults.conf:

$ cat spark-defaults.conf
spark.eventLog.enabled  false
spark.executor.extraJavaOptions         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70
spark.driver.extraJavaOptions         -Dspark.driver.log.level=INFO
spark.executor.instances        2
spark.executor.cores    4
spark.executor.memory   9404M
spark.default.parallelism       8

Update 1

I get the same behavior with a generic JavaWordCount example:

/home/hadoop/spark/bin/spark-submit --verbose --master yarn --deploy-mode cluster --driver-memory 8G --class org.apache.spark.examples.JavaWordCount /home/hadoop/spark/lib/spark-examples-1.3.1-hadoop2.4.0.jar s3://mybucket/data/

However, if I remove the '--driver-memory 8G', the task gets assigned 2 executors and finishes correctly.

So what is it about driver-memory that prevents my task from getting executors?

Should the driver run on the cluster's master node alongside the YARN master container, as explained here?

How do I give more memory to my Spark job's driver? (This is where collect and some other useful operations happen.)

Answer

Michel Lemay · Jun 9, 2015

The solution to maximize cluster usage is to forget about the '-x' parameter when installing Spark on EMR and to adjust executor memory and cores by hand.

This post gives a pretty good explanation of how resource allocation is done when running Spark on YARN.

One important thing to remember is that all executors must have the same resources allocated! As of this writing, Spark does not support heterogeneous executors. (Some work is being done to support GPUs, but that's another topic.)

So in order to get maximum memory allocated to the driver while also maximizing memory for the executors, I should split my nodes like this (this slideshare has good diagrams on page 25):

  • Node 0 - Master (YARN ResourceManager)
  • Node 1 - NodeManager (Container(Driver) + Container(Executor))
  • Node 2 - NodeManager (Container(Executor) + Container(Executor))

NOTE: Another option would be to spark-submit with --master yarn --deploy-mode client from master node 0. Are there any counterexamples showing this is a bad idea?
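
For instance, from master node 0 (a sketch; the 8G of driver memory assumes the master has that much free outside of YARN):

/home/hadoop/spark/bin/spark-submit --master yarn --deploy-mode client --driver-memory 8G --num-executors 3 --executor-cores 2 --executor-memory 4736M --class my.sample.spark.Sample s3://mybucket/test/sample_2.10-1.0.0-SNAPSHOT-shaded.jar s3://mybucket/data/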

In my example, I can have at most 3 executors with 2 vcores and 4736 MB each, plus a driver with the same specs.
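
Which, for the layout above, translates into a spark-submit along these lines (a sketch reusing my sample jar; the flag values follow the calculation below):

/home/hadoop/spark/bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 4736M --driver-cores 2 --num-executors 3 --executor-cores 2 --executor-memory 4736M --class my.sample.spark.Sample s3://mybucket/test/sample_2.10-1.0.0-SNAPSHOT-shaded.jar s3://mybucket/data/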

The 4736 MB figure is derived from the value of yarn.nodemanager.resource.memory-mb defined in /home/hadoop/conf/yarn-site.xml. On an m3.xlarge it is set to 11520 MB (see here for the values associated with each instance type).
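
The relevant entry looks roughly like this (value as set on my m3.xlarge nodes):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>11520</value>
</property>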

Then, we get:

(11520 - 1024) / 2 (executors per node) = 5248 => 5120 (rounded down to a 256 MB increment, as defined by yarn.scheduler.minimum-allocation-mb)

7% * 5120 = 358, which is below the 384 MB minimum, so the memory overhead is 384 (this 7% overhead fraction becomes 10% in Spark 1.4)

5120 - 384 = 4736
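
The same arithmetic, written out as a small helper (my own sketch, not part of Spark or EMR; the 384 MB floor and 7% factor are Spark 1.3 defaults):

// Sketch: per-executor memory for a node, following the calculation above.
object ExecutorMemory {
  def executorMemoryMb(nodeMemoryMb: Int,               // yarn.nodemanager.resource.memory-mb
                       executorsPerNode: Int,
                       reservedMb: Int = 1024,          // headroom subtracted above
                       incrementMb: Int = 256,          // yarn.scheduler.minimum-allocation-mb
                       overheadFraction: Double = 0.07, // becomes 0.10 in Spark 1.4
                       overheadMinMb: Int = 384): Int = {
    val perExecutor = (nodeMemoryMb - reservedMb) / executorsPerNode           // 5248
    val rounded = (perExecutor / incrementMb) * incrementMb                    // 5120
    val overhead = math.max((rounded * overheadFraction).toInt, overheadMinMb) // 384
    rounded - overhead                                                         // 4736
  }

  def main(args: Array[String]): Unit =
    println(executorMemoryMb(nodeMemoryMb = 11520, executorsPerNode = 2))      // prints 4736
}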

Other interesting links: