Boosting spark.yarn.executor.memoryOverhead

masta-g3 · Jun 29, 2016 · Viewed 21k times

I'm trying to run a (py)Spark job on EMR that will process a large amount of data. Currently my job is failing with the following error message:

Reason: Container killed by YARN for exceeding memory limits.
5.5 GB of 5.5 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.
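
From what I understand, the limit in that message is spark.executor.memory plus spark.yarn.executor.memoryOverhead, where the overhead defaults to max(384 MB, 10% of the executor memory). For example (values assumed purely for illustration, not my real job):

# Assumed values for illustration -- not the actual job above.
# YARN's physical limit = spark.executor.memory + spark.yarn.executor.memoryOverhead;
# the default overhead is max(384 MB, 10% of executor memory).
spark-submit \
  --executor-memory 5g \
  my_script.py
# 5 GB + the default ~512 MB overhead gives roughly the 5.5 GB cap above.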

So I googled how to do this and found that I should pass the spark.yarn.executor.memoryOverhead parameter with the --conf flag. I'm doing it this way:

aws emr add-steps\
--cluster-id %s\
--profile EMR\
--region us-west-2\
--steps Name=Spark,Jar=command-runner.jar,\
Args=[\
/usr/lib/spark/bin/spark-submit,\
--deploy-mode,client,\
/home/hadoop/%s,\
--executor-memory,100g,\
--num-executors,3,\
--total-executor-cores,1,\
--conf,'spark.python.worker.memory=1200m',\
--conf,'spark.yarn.executor.memoryOverhead=15300',\
],ActionOnFailure=CONTINUE" % (cluster_id,script_name)\

But when I rerun the job it keeps giving me the same error message, with the same 5.5 GB of 5.5 GB physical memory used, which implies that the memory did not increase. Any hints on what I am doing wrong?

EDIT

Here are details on how I initially create the cluster:

aws emr create-cluster\
--name "Spark"\
--release-label emr-4.7.0\
--applications Name=Spark\
--bootstrap-action Path=s3://emr-code-matgreen/bootstraps/install_python_modules.sh\
--ec2-attributes KeyName=EMR2,InstanceProfile=EMR_EC2_DefaultRole\
--log-uri s3://emr-logs-zerex\
--instance-type r3.xlarge\
--instance-count 4\
--profile EMR\
--service-role EMR_DefaultRole\
--region us-west-2

Thanks.

Answer

masta-g3 · Jun 29, 2016

After a couple of hours I found the solution to this problem. When creating the cluster, I needed to pass the following flag as a parameter:

--configurations file://./sparkConfig.json\

With the JSON file containing:

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "10G"
    }
  }
]

This allows me to increase the memoryOverhead in the next step by using the parameter I initially posted.
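
For completeness, here is a sketch of how the two pieces fit together (file name and values are the ones from this post; some of the original create-cluster flags are omitted for brevity):

# Write the spark-defaults configuration next to where the CLI is run.
cat > sparkConfig.json <<'EOF'
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "10G"
    }
  }
]
EOF

# 1. Create the cluster with the configuration applied:
aws emr create-cluster \
  --name "Spark" \
  --release-label emr-4.7.0 \
  --applications Name=Spark \
  --configurations file://./sparkConfig.json \
  --instance-type r3.xlarge \
  --instance-count 4 \
  --region us-west-2

# 2. The add-steps call can then raise the overhead via --conf as before:
#    --conf spark.yarn.executor.memoryOverhead=15300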