I'm unable to run the following findspark call in a Jupyter notebook:
findspark.init('home/ubuntu/spark-3.0.0-bin-hadoop3.2')
I'm getting the following error:
---------------------------------------------------------------------------
~/.local/lib/python3.6/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
144 except IndexError:
145 raise Exception(
--> 146 "Unable to find py4j, your SPARK_HOME may not be configured correctly"
147 )
148 sys.path[:0] = [spark_python, py4j]
Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly
I do have py4j installed, and I also tried adding the following lines to ~/.bashrc:
export SPARK_HOME=/home/ubuntu/spark-3.0.0-bin-hadoop3.2
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
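For reference, here is a minimal sketch of how the same variables can be checked and set directly inside the notebook (paths are taken from the exports above and assumed correct); a Jupyter kernel does not always source ~/.bashrc, so exports there may never reach it:

import os
import findspark

# If this prints None, the ~/.bashrc exports were not picked up by the kernel.
print(os.environ.get("SPARK_HOME"))

# Setting the variable explicitly sidesteps the shell config entirely.
os.environ["SPARK_HOME"] = "/home/ubuntu/spark-3.0.0-bin-hadoop3.2"
findspark.init()  # with no argument, findspark falls back to SPARK_HOME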
Check that the Spark version you installed is the same one you declare in SPARK_HOME.
For example, in Google Colab I installed:
!wget -q https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
and then declared:
os.environ["SPARK_HOME"] = "/content/spark-3.0.1-bin-hadoop3.2"
Note that spark-3.0.1-bin-hadoop3.2 must be identical in both places.
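As a rough end-to-end sketch of that Colab setup (it assumes the downloaded archive has already been extracted, e.g. with !tar xf spark-3.0.1-bin-hadoop3.2.tgz, and that a compatible Java runtime is installed):

import os
import findspark

# The directory name must match the archive that was actually downloaded.
os.environ["SPARK_HOME"] = "/content/spark-3.0.1-bin-hadoop3.2"
findspark.init(os.environ["SPARK_HOME"])

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.version)  # should print 3.0.1 if SPARK_HOME points at the right build

If the directory name and SPARK_HOME disagree, findspark cannot locate the bundled py4j and raises exactly the "Unable to find py4j" exception shown in the question.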