I'm trying to create a SparkContext in an IntelliJ IDEA 14 Scala Worksheet.
Here are my dependencies:
name := "LearnSpark"
version := "1.0"
scalaVersion := "2.11.7"
// for working with Spark API
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"
Here is the code I run in the worksheet:
import org.apache.spark.{SparkContext, SparkConf}
val conf = new SparkConf().setMaster("local").setAppName("spark-play")
val sc = new SparkContext(conf)
The error:
15/08/24 14:01:59 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: rg.apache.spark.rpc.akka.AkkaRpcEnvFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
When I run Spark as a standalone app, it works fine. For example:
import org.apache.spark.{SparkContext, SparkConf}
// stops verbose logs
import org.apache.log4j.{Level, Logger}
object TestMain {

  Logger.getLogger("org").setLevel(Level.OFF)

  def main(args: Array[String]): Unit = {
    // Create SparkContext
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("mySparkApp")
      .set("spark.executor.memory", "1g")
      .set("spark.rdd.compress", "true")
      .set("spark.storage.memoryFraction", "1")
    val sc = new SparkContext(conf)

    val data = sc.parallelize(1 to 10000000).collect().filter(_ < 1000)
    data.foreach(println)
  }
}
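(As an aside, collecting ten million integers to the driver only to filter them locally isn't necessary; a sketch of the same job that filters on the executors first and collects only the small result, using the same RDD API:)

// Filter first so only the matching values are shipped to the driver
val filtered = sc.parallelize(1 to 10000000).filter(_ < 1000).collect()
filtered.foreach(println)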
Can someone provide some guidance on where I should look to resolve this exception?
Thanks.
Since there are still quite a few doubts about whether it is possible at all to run an IntelliJ IDEA Scala Worksheet with Spark, and this question is the most direct one, I wanted to share my screenshot and a cookbook-style recipe for getting Spark code evaluated in the Worksheet.
I am using Spark 2.1.0 with Scala Worksheet in IntelliJ IDEA (CE 2016.3.4).
The first step is to have a build.sbt file in place when importing the dependencies into IntelliJ. I have used the same simple.sbt from the Spark Quick Start:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
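For reference, the %% operator makes sbt append the Scala binary version to the artifact name, so with scalaVersion := "2.11.7" the line above resolves to spark-core_2.11. A sketch of the explicit equivalent (not required, just to show what %% expands to):

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"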
The second step is to uncheck the 'Run worksheet in the compiler process' checkbox in Settings -> Languages and Frameworks -> Scala -> Worksheet. I have also tested the other Worksheet settings; they had no effect on the warning about duplicate Spark context creation.
Here is the version of the code from the SimpleApp.scala example in the same guide, modified to work in the Worksheet. The master and appName parameters have to be set in the same Worksheet:
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("Simple Application")
val sc = new SparkContext(conf)
val logFile = "/opt/spark-latest/README.md"
val logData = sc.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, Lines with b: $numBs")
Here is a screenshot of the functioning Scala Worksheet with Spark:
UPDATE for IntelliJ CE 2017.1 (Worksheet in REPL mode)
In 2017.1, IntelliJ introduced a REPL mode for the Worksheet. I have tested the same code with the 'Use REPL' option checked. For this mode to work, you need to leave the 'Run worksheet in the compiler process' checkbox in the Worksheet settings described above checked (it is checked by default).
The code runs fine in Worksheet REPL mode.