I have followed some tutorials online, but they do not work with Spark 1.5.1 on OS X El Capitan (10.11).
Basically, I have run these commands to download apache-spark:
brew update
brew install scala
brew install apache-spark
Then I updated .bash_profile:
# For IPython notebook and PySpark integration
if which pyspark > /dev/null; then
export SPARK_HOME="/usr/local/Cellar/apache-spark/1.5.1/libexec/"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
fi
Then I ran:
ipython profile create pyspark
and created a startup file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py, configured as follows:
# Configure the necessary Spark environment
import os
import sys
# Spark home
spark_home = os.environ.get("SPARK_HOME")
# If Spark V1.4.x is detected, then add ' pyspark-shell' to
# the end of the 'PYSPARK_SUBMIT_ARGS' environment variable
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.4" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
# Add the spark python sub-directory to the path
sys.path.insert(0, spark_home + "/python")
# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.2.1-src.zip"))
# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, "python/pyspark/shell.py"))
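The comment about changing the py4j version number can be sidestepped by globbing for whichever py4j zip ships in $SPARK_HOME/python/lib. A minimal sketch, assuming the same python/ and python/lib/py4j-*-src.zip layout the startup file already relies on; add_spark_to_path is a hypothetical helper name.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Insert $SPARK_HOME/python and the bundled py4j zip at the front of sys.path.

    Returns the py4j zip that was added, so the caller can log or check it.
    """
    python_dir = os.path.join(spark_home, "python")
    zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    if not zips:
        raise IOError("no py4j-*-src.zip found under " + python_dir)
    py4j_zip = zips[-1]  # a Spark distribution normally ships exactly one
    # Same order as the original file: python dir first, then py4j ends up in front of it.
    for entry in (python_dir, py4j_zip):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return py4j_zip
```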
I then ran ipython notebook --profile=pyspark, and the notebook starts fine, but sc (the SparkContext) is not recognised.
Has anyone managed to do this with Spark 1.5.1?
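A possible explanation, offered as an assumption rather than a confirmed diagnosis: the startup file only appends ' pyspark-shell' when the RELEASE file mentions "Spark 1.4", so with 1.5.1 that branch is skipped, although Spark 1.4 and later expect PYSPARK_SUBMIT_ARGS to end with pyspark-shell. A minimal sketch of a version-agnostic check; spark_version and with_pyspark_shell are hypothetical names, and it assumes RELEASE starts with a line such as "Spark 1.5.1 built for Hadoop 2.6.0".

```python
import os
import re


def spark_version(spark_home):
    """Read (major, minor) from $SPARK_HOME/RELEASE, or None if it cannot be determined."""
    release_file = os.path.join(spark_home, "RELEASE")
    if not os.path.exists(release_file):
        return None
    with open(release_file) as f:
        match = re.search(r"Spark (\d+)\.(\d+)", f.read())
    return (int(match.group(1)), int(match.group(2))) if match else None


def with_pyspark_shell(submit_args, version):
    """Append 'pyspark-shell' for Spark >= 1.4 unless it is already present."""
    if version is not None and version >= (1, 4) and "pyspark-shell" not in submit_args:
        submit_args = (submit_args + " pyspark-shell").strip()
    return submit_args
```

In 00-pyspark-setup.py the "Spark 1.4" test would then become os.environ["PYSPARK_SUBMIT_ARGS"] = with_pyspark_shell(os.environ.get("PYSPARK_SUBMIT_ARGS", ""), spark_version(spark_home)).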
EDIT: you can follow this guide to get it working.
I have Jupyter installed, and indeed it is simpler than you think:
Install Jupyter by typing the following line in your terminal:
ilovejobs@mymac:~$ conda install jupyter
Update Jupyter, just in case:
ilovejobs@mymac:~$ conda update jupyter
Download Apache Spark and compile it, or download and uncompress the prebuilt Apache Spark 1.5.1 + Hadoop 2.6 package:
ilovejobs@mymac:~$ cd Downloads
ilovejobs@mymac:~/Downloads$ wget https://archive.apache.org/dist/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz
ilovejobs@mymac:~/Downloads$ tar -xzf spark-1.5.1-bin-hadoop2.6.tgz
Create an Apps folder in your home directory, for example:
ilovejobs@mymac:~/Downloads$ mkdir ~/Apps
Move the uncompressed folder (spark-1.5.1-bin-hadoop2.6) to the ~/Apps directory, renaming it to spark-1.5.1:
ilovejobs@mymac:~/Downloads$ mv spark-1.5.1-bin-hadoop2.6 ~/Apps/spark-1.5.1
Move to the ~/Apps directory and verify that Spark is there:
ilovejobs@mymac:~/Downloads$ cd ~/Apps
ilovejobs@mymac:~/Apps$ ls -l
drwxr-xr-x ?? ilovejobs ilovejobs 4096 ?? ?? ??:?? spark-1.5.1
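Beyond seeing the folder in ls, it can help to confirm that it has the pieces the later steps rely on. A small sketch; the list of expected entries (bin/pyspark, python/pyspark, python/lib) is an assumption about the prebuilt 1.5.1 package, and missing_spark_parts is a hypothetical name.

```python
import os

# Entries the later steps depend on: the pyspark launcher and the Python package.
EXPECTED = ("bin/pyspark", "python/pyspark", "python/lib")


def missing_spark_parts(spark_dir):
    """Return the expected entries that do not exist under spark_dir."""
    return [rel for rel in EXPECTED
            if not os.path.exists(os.path.join(spark_dir, *rel.split("/")))]


if __name__ == "__main__":
    target = os.path.expanduser("~/Apps/spark-1.5.1")
    missing = missing_spark_parts(target)
    print("OK" if not missing else "missing: " + ", ".join(missing))
```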
Here is the first tricky part. Add the Spark binaries to your $PATH:
ilovejobs@mymac:~/Apps$ cd
ilovejobs@mymac:~$ echo 'export PATH="$HOME/Apps/spark-1.5.1/bin:$PATH"' >> .profile
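Adding a directory to $PATH just puts it at the front of the colon-separated search list, so that its pyspark is the one the shell finds first. The Python sketch below models that; prepend_to_path is a hypothetical helper and, unlike a plain export line, it also drops an existing copy of the directory so repeated sourcing does not grow PATH.

```python
import os


def prepend_to_path(directory, path=None):
    """Return a PATH string with directory first, dropping any later duplicate of it."""
    if path is None:
        path = os.environ.get("PATH", "")
    # ':' is the PATH separator on OS X and other POSIX systems; empty entries are dropped.
    rest = [p for p in path.split(":") if p and p != directory]
    return ":".join([directory] + rest)


if __name__ == "__main__":
    spark_bin = os.path.expanduser("~/Apps/spark-1.5.1/bin")
    print(prepend_to_path(spark_bin))
```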
Here is the second tricky part. Also add these environment variables:
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON=ipython" >> .profile
ilovejobs@mymac:~$ echo "export PYSPARK_DRIVER_PYTHON_OPTS='notebook'" >> .profile
Source the profile to make these variables available in this terminal:
ilovejobs@mymac:~$ source .profile
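If you would rather not edit .profile, the same driver settings can be passed to a single pyspark launch from Python. A sketch, assuming the ~/Apps/spark-1.5.1 location used above; pyspark_notebook_env and launch are hypothetical helpers, and the variables are set only for the child process.

```python
import os
import subprocess

SPARK_DIR = os.path.expanduser("~/Apps/spark-1.5.1")  # assumed install location


def pyspark_notebook_env(base=None):
    """Return a copy of the environment with the pyspark driver set to 'ipython notebook'."""
    env = dict(os.environ if base is None else base)
    env["PYSPARK_DRIVER_PYTHON"] = "ipython"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


def launch(notebook_dir="~/notebooks"):
    """Run bin/pyspark (blocking) with the notebook driver, starting in notebook_dir."""
    return subprocess.call([os.path.join(SPARK_DIR, "bin", "pyspark")],
                           env=pyspark_notebook_env(),
                           cwd=os.path.expanduser(notebook_dir))
```

Copying the environment rather than mutating os.environ keeps the calling Python process unaffected.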
Create a ~/notebooks directory:
ilovejobs@mymac:~$ mkdir notebooks
Move to ~/notebooks and run pyspark:
ilovejobs@mymac:~$ cd notebooks
ilovejobs@mymac:~/notebooks$ pyspark
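Note that you can also add those variables to the .bashrc in your home directory. On OS X, however, Terminal opens login shells, which read ~/.bash_profile (and skip ~/.profile entirely when ~/.bash_profile exists), so ~/.bash_profile is usually the file to edit there.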
Now be happy: you should be able to run Jupyter with PySpark (the kernel will show as Python 2, but it will use Spark).
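To check that the notebook really talks to Spark, a first cell can run a tiny job through the predefined sc. A sketch; spark_smoke_test is a hypothetical name, and it uses only SparkContext.parallelize, RDD.map and RDD.sum.

```python
def spark_smoke_test(sc, n=100):
    """Sum the squares of 0..n-1 as a Spark job and compare with n(n-1)(2n-1)/6."""
    result = sc.parallelize(range(n)).map(lambda x: x * x).sum()
    expected = n * (n - 1) * (2 * n - 1) // 6
    return result == expected, result
```

In a notebook cell, spark_smoke_test(sc) should return (True, 328350); a NameError on sc means the notebook was not started through pyspark.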