Set hadoop configuration values on spark-submit command line

StephenBoesch · Mar 14, 2017

We want to set the AWS parameters that, in code, would be set through the SparkContext:

sc.hadoopConfiguration.set("fs.s3a.access.key", vault.user)
sc.hadoopConfiguration.set("fs.s3a.secret.key", vault.key)

However, we have a custom Spark launcher framework that requires all custom Spark configuration to be passed as --conf parameters on the spark-submit command line.

Is there a way to tell the SparkContext to apply --conf values to the hadoopConfiguration rather than to its general SparkConf? We are looking for something along the lines of

spark-submit --conf hadoop.fs.s3a.access.key $vault.user --conf hadoop.fs.s3a.secret.key $vault.key

or

spark-submit --conf hadoopConfiguration.fs.s3a.access.key $vault.user --conf hadoopConfiguration.fs.s3a.secret.key $vault.key

Answer

vanza · Mar 14, 2017

Prefix the Hadoop configuration keys with spark.hadoop. on the command line (or in the SparkConf object). Spark strips the prefix and copies those entries into the Hadoop configuration. For example:

spark.hadoop.fs.s3a.access.key=value
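
Applied to both keys from the question, the full command might look roughly like the sketch below. The VAULT_USER and VAULT_KEY variables, their placeholder defaults, the hadoop_conf helper, and my-app.jar are all assumptions made for illustration; only the spark.hadoop. prefix and the --conf key=value form come from Spark itself. The script prints the command rather than running it.

```shell
# Hypothetical placeholders standing in for the vault lookup in the question.
VAULT_USER="${VAULT_USER:-EXAMPLE_ACCESS_KEY}"
VAULT_KEY="${VAULT_KEY:-EXAMPLE_SECRET_KEY}"

# Hypothetical helper: emit "--conf spark.hadoop.<key>=<value>"
# for one Hadoop property name and its value.
hadoop_conf() {
  printf '%s spark.hadoop.%s=%s' '--conf' "$1" "$2"
}

# Assemble the command as a string and print it (dry run only).
# my-app.jar is a made-up application name.
SUBMIT_CMD="spark-submit $(hadoop_conf fs.s3a.access.key "$VAULT_USER") $(hadoop_conf fs.s3a.secret.key "$VAULT_KEY") my-app.jar"
echo "$SUBMIT_CMD"
```

Inside the application, sc.hadoopConfiguration should then see the values under fs.s3a.access.key and fs.s3a.secret.key, without the spark.hadoop. prefix.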