How to change the YARN scheduler configuration on AWS EMR?

Kumar Vaibhav · Apr 14, 2017

Unlike Hortonworks or Cloudera, AWS EMR does not seem to provide a GUI for changing the XML configurations of the various Hadoop ecosystem frameworks.

After logging into my EMR master node (the NameNode) and running a quick

find / -iname yarn-site.xml

I found yarn-site.xml at /etc/hadoop/conf.empty/yarn-site.xml and capacity-scheduler.xml at /etc/hadoop/conf.empty/capacity-scheduler.xml.

Note, however, that these are under conf.empty, so I suspect they might not be the yarn-site and capacity-scheduler XML files that are actually in use.
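A quick way to check which directory is really in use is to resolve the conventional /etc/hadoop/conf path. The sketch below rests on an assumption not stated in the question: that /etc/hadoop/conf is a symlink (possibly via /etc/alternatives) to the active configuration directory. The function name active_hadoop_conf_dir is hypothetical; os.path.realpath simply follows every link in the chain.

```python
import os


def active_hadoop_conf_dir(conf_link: str = "/etc/hadoop/conf") -> str:
    """Return the real directory behind a conventional Hadoop conf path.

    Assumption: conf_link is a symlink (possibly through /etc/alternatives)
    to the active configuration directory. os.path.realpath follows every
    link in the chain; if the path is not a link it is returned unchanged
    (normalized), so the result is only meaningful on a real Hadoop node.
    """
    return os.path.realpath(conf_link)


if __name__ == "__main__":
    # On an EMR master node, this prints the directory actually in use.
    print(active_hadoop_conf_dir())
```

If the printed path ends in conf.empty, the files found with find are the ones the daemons read.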

I understand that I can set these configurations when creating a cluster, but I need to know how to change them without tearing down the cluster.

I just want to experiment with scheduling properties and try out different schedulers to identify what works well with my Spark applications.

Thanks in advance!

Answer

jc mannem · Apr 17, 2017

Well, yarn-site.xml and capacity-scheduler.xml are indeed in the correct location (/etc/hadoop/conf.empty/). On a running cluster, editing them on the master node and restarting the YARN ResourceManager daemon will change the scheduler.
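For the manual-edit route, here is a minimal sketch of changing one property in a Hadoop-style XML file, assuming the standard layout of configuration > property > name/value. The helper name set_hadoop_property is hypothetical and not part of EMR or Hadoop. Note that Python's ElementTree drops XML comments and processing instructions when it rewrites the file, so keep a backup of the original.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(path: str, name: str, value: str) -> None:
    """Set (or add) a <property> in a Hadoop-style *-site.xml file in place.

    Expected shape:
      <configuration>
        <property><name>...</name><value>...</value></property>
      </configuration>
    The default parser discards comments and processing instructions,
    so they will not survive the rewrite.
    """
    tree = ET.parse(path)
    root = tree.getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Existing property: overwrite its value (create <value> if missing).
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, calling set_hadoop_property("/etc/hadoop/conf.empty/yarn-site.xml", "yarn.resourcemanager.scheduler.class", "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler") as root would switch the ResourceManager to the Fair Scheduler once it is restarted. The restart command depends on the EMR release. It is typically sudo systemctl restart hadoop-yarn-resourcemanager on systemd-based releases and sudo stop / sudo start hadoop-yarn-resourcemanager on older upstart-based ones, so check the documentation for your release.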

When spinning up a new cluster, you can use the EMR Configurations API to change the appropriate values: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

For example, specify the appropriate values in the capacity-scheduler and yarn-site classifications of your configuration, and EMR will write those values into the corresponding XML files.
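A sketch of such a configuration, built in Python. Each entry pairs a Classification name with a flat Properties map. The helper scheduler_configurations is hypothetical. The DominantResourceCalculator setting is just one illustrative capacity-scheduler property, not something the answer prescribes.

```python
import json


def scheduler_configurations(yarn_site: dict, capacity_scheduler: dict) -> list:
    """Build an EMR Configurations list for the yarn-site and
    capacity-scheduler classifications, skipping empty ones.

    EMR writes each Properties map into the matching XML file
    (yarn-site.xml / capacity-scheduler.xml) on the cluster nodes.
    """
    configs = []
    if yarn_site:
        configs.append({"Classification": "yarn-site", "Properties": dict(yarn_site)})
    if capacity_scheduler:
        configs.append(
            {"Classification": "capacity-scheduler", "Properties": dict(capacity_scheduler)}
        )
    return configs


if __name__ == "__main__":
    configs = scheduler_configurations(
        yarn_site={},
        capacity_scheduler={
            # Illustrative only: make the Capacity Scheduler account for CPU as well as memory.
            "yarn.scheduler.capacity.resource-calculator":
                "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
        },
    )
    # Save this output as configurations.json for use at cluster creation time.
    print(json.dumps(configs, indent=2))
```

The resulting JSON can be passed to aws emr create-cluster through its --configurations file://configurations.json option, or used as the Configurations argument of boto3's run_job_flow.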

Edit (Sep 4, 2019): With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You can do this through the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.

Please see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html
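A minimal sketch of the SDK route with boto3, assuming EMR 5.21.0 or later. It also assumes you target the master instance group, since that is where the ResourceManager runs. The helper names reconfigure_request and apply_reconfiguration are hypothetical, and so are the IDs. Look up the real instance group ID with aws emr list-instance-groups --cluster-id <your-cluster-id>.

```python
def reconfigure_request(cluster_id: str, instance_group_id: str, configurations: list) -> dict:
    """Build keyword arguments for EMR's ModifyInstanceGroups call.

    The Configurations list uses the same Classification/Properties shape
    as at cluster creation and is applied to one running instance group.
    """
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                "Configurations": configurations,
            }
        ],
    }


def apply_reconfiguration(request: dict) -> None:
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so building the request does not need boto3

    emr = boto3.client("emr")
    emr.modify_instance_groups(**request)
```

For example, apply_reconfiguration(reconfigure_request("j-XXXXXXXXXXXXX", "ig-XXXXXXXXXXXX", [{"Classification": "yarn-site", "Properties": {"yarn.resourcemanager.scheduler.class": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"}}])) would submit the change, with placeholder IDs. The same payload shape works with aws emr modify-instance-groups. Check the linked page for which instance groups and classifications your release supports.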