How to work efficiently with SBT, Spark and "provided" dependencies?

Alexis Seigneurin · Apr 5, 2016

I'm building an Apache Spark application in Scala and I'm using SBT to build it. Here is the thing:

  1. when I'm developing under IntelliJ IDEA, I want Spark dependencies to be included in the classpath (I'm launching a regular application with a main class)
  2. when I package the application (thanks to the sbt-assembly plugin), I do not want Spark dependencies to be included in my fat JAR
  3. when I run unit tests through sbt test, I want Spark dependencies to be included in the classpath (same as #1, but from SBT)

To match constraint #2, I'm declaring Spark dependencies as provided:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  ...
)
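
(sparkVersion here is just an ordinary value defined earlier in build.sbt, along the lines of:)

val sparkVersion = "1.6.1"  // illustrative version number, not from the original build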

Then, sbt-assembly's documentation suggests adding the following line to include the dependencies for unit tests (constraint #3):

run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))
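
(The <<= operator is sbt 0.13 syntax; on sbt 1.x the same setting from the sbt-assembly documentation is written roughly as follows — a sketch to double-check against the docs for your sbt version:)

Compile / run := Defaults.runTask(
  Compile / fullClasspath,
  Compile / run / mainClass,
  Compile / run / runner
).evaluated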

That leaves me with constraint #1 not being fulfilled, i.e. I cannot run the application in IntelliJ IDEA as the Spark dependencies are not being picked up.

With Maven, I was using a specific profile to build the uber JAR. That way, I was declaring Spark dependencies as regular dependencies for the main profile (IDE and unit tests) while declaring them as provided for the fat JAR packaging. See https://github.com/aseigneurin/kafka-sandbox/blob/master/pom.xml
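
For comparison, a rough sbt translation of that profile split would be to keep the main project's Spark dependencies as provided and add a small helper subproject that re-adds them in Compile scope for local runs (all names below are illustrative, and sparkVersion is assumed to be defined as above):

lazy val sparkDeps = Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)

// main project: Spark stays "provided", so it is excluded from the fat JAR
lazy val core = (project in file("."))
  .settings(libraryDependencies ++= sparkDeps.map(_ % "provided"))

// helper project used only for launching locally (IDE or sbt localRunner/run):
// Spark is a regular Compile dependency here
lazy val localRunner = (project in file("local-runner"))
  .dependsOn(core)
  .settings(libraryDependencies ++= sparkDeps)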

What is the best way to achieve this with SBT?

Answer

Martin Tapp · Nov 7, 2018

Use the new 'Include dependencies with "Provided" scope' checkbox in your IntelliJ run configuration. This lets you keep the Spark dependencies marked as provided in build.sbt (so they stay out of the fat JAR) while IntelliJ still puts them on the classpath when launching the application from the IDE.

[Screenshot: IntelliJ run configuration with the 'Include dependencies with "Provided" scope' checkbox]