I'm working on CentOS. I've set up $SPARK_HOME and added Spark's bin directory to $PATH, so I can run pyspark from anywhere. But when I create a Python file that uses this statement:
from pyspark import SparkConf, SparkContext
it throws the following error:
python pysparktask.py
Traceback (most recent call last):
File "pysparktask.py", line 1, in <module>
from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
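For context, the pyspark launcher script configures PYTHONPATH itself, so the shell can work even though a plain python interpreter cannot find the package. A quick check (standard library only, nothing Spark-specific assumed) shows what the plain interpreter actually sees:
import os
import sys

print(os.environ.get("SPARK_HOME"))  # Spark install dir, if exported in this shell
print("\n".join(sys.path))           # pyspark's python/ dir must appear here for the import to work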
I also tried to install it with pip:
pip install pyspark
but that gives an error too:
Could not find a version that satisfies the requirement pyspark (from versions: ) No matching distribution found for pyspark
EDIT
Based on the answer, I updated the code. The error now is:
Traceback (most recent call last):
File "pysparktask.py", line 6, in <module>
from pyspark import SparkConf, SparkContext
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
from pyspark.context import SparkContext
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
from pyspark.java_gateway import launch_gateway
File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'
Set the SPARK_HOME environment variable and append Spark's python directory to sys.path; adding the bundled py4j zip as well avoids the No module named 'py4j' error from the edit above:
import glob
import os
import sys

os.environ['SPARK_HOME'] = "/usr/lib/spark/"  # adjust to your install, e.g. /opt/mapr/spark/spark-2.0.1
sys.path.append(os.path.join(os.environ['SPARK_HOME'], "python"))  # bundled pyspark package
sys.path.extend(glob.glob(os.path.join(os.environ['SPARK_HOME'], "python", "lib", "py4j-*.zip")))  # bundled py4j

from pyspark import SparkConf, SparkContext
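Continuing from the snippet above, creating a context then works as usual. A minimal sketch (the app name, master URL, and sample data are placeholders, not from the original post):
conf = SparkConf().setAppName("pysparktask").setMaster("local[*]")
sc = SparkContext(conf=conf)
print(sc.parallelize([1, 2, 3, 4]).sum())  # quick sanity check that the Java gateway started
sc.stop()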