I'm using Spark 1.4.0-rc2 so I can use Python 3 with Spark. If I add export PYSPARK_PYTHON=python3
to my .bashrc file, I can run Spark interactively with Python 3. However, if I run a standalone program in local mode, I get an error:
Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions
How can I specify the version of Python for the driver? Setting export PYSPARK_DRIVER_PYTHON=python3
didn't work.
Setting both PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to python3 works for me. I did this using export in my .bashrc. In the end, these are the variables I create:
export SPARK_HOME="$HOME/Downloads/spark-1.4.0-bin-hadoop2.4"
export IPYTHON=1
export PYSPARK_PYTHON=/usr/bin/python3
I also followed this tutorial to make it work from within an IPython3 notebook: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/
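
If editing .bashrc is not an option, the same idea can probably be applied from inside the standalone script. This is only a minimal sketch, and it assumes that PySpark reads PYSPARK_PYTHON from the environment when the SparkContext is created, so the variable has to be set before that point. The helper names pin_worker_python_to_driver and driver_minor_version are made up for illustration and are not part of Spark.

```python
import os
import sys


def driver_minor_version():
    """Return the driver interpreter's version as 'major.minor', e.g. '3.4'."""
    return "%d.%d" % (sys.version_info[0], sys.version_info[1])


def pin_worker_python_to_driver():
    """Point PYSPARK_PYTHON at the interpreter that is running this script.

    Hypothetical helper: it only sets an environment variable, so the
    workers are asked to use the same executable as the driver.
    """
    # sys.executable is the path of the current (driver) interpreter.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    return os.environ["PYSPARK_PYTHON"]
```

Call pin_worker_python_to_driver() at the top of the script, before importing pyspark and creating the SparkContext. Then start the script with the interpreter you want, for example python3. In local mode the worker should then report the same minor version as driver_minor_version().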