Unable to import SparkContext

Mubin · Mar 30, 2017 · Viewed 8k times

I'm working on CentOS. I've set up $SPARK_HOME and also added its bin directory to $PATH.

I can run pyspark from anywhere.

But when I create a Python file that uses this statement:

from pyspark import SparkConf, SparkContext

it throws the following error:

python pysparktask.py
Traceback (most recent call last):
  File "pysparktask.py", line 1, in <module>
    from pyspark import SparkConf, SparkContext
ModuleNotFoundError: No module named 'pyspark'
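Presumably the pyspark shell works only because its launcher script puts Spark's Python directories on the path before starting the interpreter, while a plain python process gets no such setup. A minimal diagnostic sketch (standard library only) to see what this interpreter can actually find:

import sys

# Which interpreter is running, and is anything Spark-related on its search path?
print(sys.executable)
print([p for p in sys.path if "spark" in p.lower()])  # likely empty here

try:
    import pyspark
    print("pyspark found at", pyspark.__file__)
except ImportError as exc:
    print("pyspark not importable:", exc)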

I also tried to install it using pip:

pip install pyspark

but it gives this error too:

Could not find a version that satisfies the requirement pyspark (from versions: ) No matching distribution found for pyspark

EDIT

Based on the answer, I updated the code.

The error now is:

Traceback (most recent call last):
  File "pysparktask.py", line 6, in <module>
    from pyspark import SparkConf, SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/opt/mapr/spark/spark-2.0.1/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
ModuleNotFoundError: No module named 'py4j'

Answer

Afaq · Mar 30, 2017

Add the following environment variable and also append Spark's Python package path to sys.path:

import os
import sys

# Point SPARK_HOME at the Spark installation and put Spark's Python
# package directory on the module search path.
os.environ['SPARK_HOME'] = "/usr/lib/spark/"
sys.path.append("/usr/lib/spark/python/")

# And then try the import again.
from pyspark import SparkConf, SparkContext
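For the py4j error in the EDIT, the same approach should also cover py4j, which Spark bundles as a source zip under $SPARK_HOME/python/lib. A sketch along the same lines, using the MapR path from the traceback and locating the zip with a glob since its exact file name depends on the Spark build:

import glob
import os
import sys

# Paths taken from the traceback above; adjust for your installation.
spark_home = "/opt/mapr/spark/spark-2.0.1"
os.environ['SPARK_HOME'] = spark_home
sys.path.append(os.path.join(spark_home, "python"))

# Spark ships py4j as a zip whose name varies by version (py4j-*-src.zip),
# so find it rather than hard-coding the file name.
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
if py4j_zips:
    sys.path.append(py4j_zips[0])

from pyspark import SparkConf, SparkContext

Alternatively, exporting PYTHONPATH with both of these paths in the shell profile achieves the same thing without editing every script.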