I am configuring an Apache Spark cluster.
When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:
Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)
I want to increase the used memory for the workers but I could not find the right config for this. I have changed spark-env.sh as below:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"
But the used memory is still the same. What should I do to change used memory?
When using 1.0.0+ and using spark-shell or spark-submit, use the --executor-memory
option. E.g.
spark-shell --executor-memory 8G ...
0.9.0 and under:
When you start a job or start the shell change the memory. We had to modify the spark-shell script so that it would carry command line arguments through as arguments for the underlying java application. In particular:
OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
Then we can run our spark shell as follows:
spark-shell -Dspark.executor.memory=6g
When configuring it for a standalone jar, I set the system property programmatically before creating the spark context and pass the value in as a command line argument (I can make it shorter than the long winded system props then).
System.setProperty("spark.executor.memory", valueFromCommandLine)
As for changing the default cluster wide, sorry, not entirely sure how to do it properly.
One final point - I'm a little worried by the fact you have 2 nodes with 2GB and one with 6GB. The memory you can use will be limited to the smallest node - so here 2GB.