Reading CSV files in Zeppelin using spark-csv

fabsta · Oct 6, 2015

I want to read CSV files in Zeppelin and would like to use Databricks' spark-csv package: https://github.com/databricks/spark-csv

In the spark-shell, I can use spark-csv with

spark-shell --packages com.databricks:spark-csv_2.11:1.2.0

But how do I tell Zeppelin to use that package?

Thanks in advance!

Answer

Simon Elliston Ball · Jan 8, 2016

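You need to add the Spark Packages repository to Zeppelin before you can use %dep to load Spark packages. Run the %dep paragraph before any Spark paragraph; if the Spark interpreter is already running, restart it first. Also make sure the Scala suffix of the artifact (_2.10 or _2.11) matches the Scala version your Spark was built with.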

%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.10:1.2.0")
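Once the package is loaded, a later Spark paragraph should be able to read a file through the spark-csv data source. This is a minimal sketch, assuming Spark 1.4 or later (for sqlContext.read) and the sqlContext that Zeppelin provides; the file path is a placeholder, and the header and inferSchema options are only there for illustration.

```scala
// Run this in a separate Spark paragraph, after the %dep paragraph has finished.
// `sqlContext` is assumed to be the one Zeppelin creates for the Spark interpreter.
val df = sqlContext.read
  .format("com.databricks.spark.csv")   // data source name registered by spark-csv
  .option("header", "true")             // assumption: first line holds column names
  .option("inferSchema", "true")        // assumption: let spark-csv guess column types
  .load("/path/to/file.csv")            // hypothetical path, replace with your own

// Print the inferred schema to check that the file was parsed as expected.
df.printSchema()
```

If the format cannot be found, the package was most likely not loaded before the Spark context started.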

Alternatively, if you want the package available in all your notebooks, add the --packages option to the spark-submit command setting in Zeppelin's interpreter configuration, then restart the interpreter. This should start a context with the package already loaded, just as with the spark-shell method.