I need to set a custom environment variable in EMR so that it is available when running a Spark application.
I have tried adding this:
...
--configurations '[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": { "SOME-ENV-VAR": "qa1" }
      }
    ],
    "Properties": {}
  }
]'
...
I also tried replacing "spark-env" with "hadoop-env", but nothing seems to work.
There is an answer on the AWS forums, but I can't figure out how to apply it.
I'm running EMR 5.3.1 and launch the cluster with a preconfigured step from the CLI: aws emr create-cluster...
Add the custom configuration, like the JSON below, to a file, say custom_config.json:
[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
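If you generate this file from a script, building it with a JSON library avoids hand-written mistakes such as unquoted values or trailing commas. The sketch below is a hypothetical helper (build_spark_env_config and write_config are illustrative names, not part of any AWS tool) that produces the same spark-env/export structure. As an assumption worth checking: if EMR writes each export property as an `export NAME=value` line in spark-env.sh, then a name with hyphens, such as the question's SOME-ENV-VAR, is not a valid shell variable name, so the helper rejects such names up front.

```python
import json
import re

# Valid shell variable names: letters, digits, underscores, no leading digit.
# ASSUMPTION: EMR renders each "export" property as `export NAME=value` in
# spark-env.sh, so names must be valid shell identifiers (no hyphens).
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_spark_env_config(env_vars):
    """Return an EMR configuration list that exports env_vars via spark-env."""
    for name in env_vars:
        if not _VALID_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
    return [
        {
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    # Coerce values to strings so they are emitted as quoted JSON.
                    "Properties": {name: str(value) for name, value in env_vars.items()},
                }
            ],
        }
    ]


def write_config(path, env_vars):
    """Write the configuration list to path as indented JSON."""
    with open(path, "w") as f:
        json.dump(build_spark_env_config(env_vars), f, indent=2)
```

For example, write_config("custom_config.json", {"SOME_ENV_VAR": "qa1"}) writes a file that can be passed to --configurations as shown next.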
Then, when creating the EMR cluster, pass the file reference to the --configurations option:
aws emr create-cluster --configurations file://custom_config.json --other-options...
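To confirm the variable actually reaches your application, it can help to fail loudly when it is missing instead of silently getting None. The sketch below assumes a PySpark driver script (the question does not say which language the application uses), and get_required_env is a hypothetical helper name. It only checks the environment of the process it runs in; whether executors also see the variable depends on how Spark launches them, which this sketch does not address.

```python
import os


def get_required_env(name):
    """Return the value of environment variable `name`, raising if it is unset."""
    value = os.environ.get(name)
    if value is None:
        # Most likely the spark-env export did not reach this process.
        raise RuntimeError(
            f"{name} is not set; check that the spark-env export reached this process"
        )
    return value
```

Calling get_required_env("VARIABLE_NAME") near the top of the driver script surfaces a misconfigured cluster immediately rather than deep inside the job.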