I want to use the matplotlib.path or shapely.geometry libraries in PySpark.
When I try to import any of them I get the below error:
>>> from shapely.geometry import polygon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named shapely.geometry
I know the module isn't installed, but how can I make these packages available to PySpark?
Call addPyFile on your active SparkContext (here sc):
sc.addPyFile("module.py")  # also accepts a .zip
Quoting from the docs:
Add a .py or .zip dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
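As a rough sketch of how this could be used for a whole package rather than a single file: the helpers below zip a pure-Python package directory and pass the archive to addPyFile. The names zip_package, ship_package, and mypkg are hypothetical and used only for illustration; sc is assumed to be an already running SparkContext. Note that this approach suits pure-Python code. Packages that rely on compiled libraries (shapely, for example, depends on the GEOS C library) generally need to be installed on every worker node instead, for instance with pip.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a pure-Python package so the package folder sits at the archive root."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to the parent directory so that
                    # "mypkg/..." is importable once the zip is on sys.path.
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path


def ship_package(sc, package_dir, zip_path):
    """Zip the package and register it with an existing SparkContext `sc`.

    Assumption: `sc` exposes addPyFile(path), as pyspark.SparkContext does.
    """
    sc.addPyFile(zip_package(package_dir, zip_path))
```

With a running context, calling ship_package(sc, "mypkg", "mypkg.zip") should make import mypkg work inside functions that run on the executors. The same archive can also be supplied at launch time through the --py-files option of spark-submit.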