I'm using Spark (via the Java API) and need a single jar that can be pushed to the cluster; the jar itself, however, should not include Spark. The app that deploys the jobs should, of course, include Spark.
I would like:
I have 1. and 3. working. Any ideas on how I can get 2. working? What code would I need to add to my build.sbt file?
The question is not specific to Spark; it applies to any other dependency I may wish to exclude as well.
The first option for excluding a jar from the fat jar is to use the "provided" configuration on the library dependency. "provided" comes from Maven's provided scope, which is defined as follows:
This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.
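Applied here, that would look something like the following in build.sbt (a sketch; the artifact name and version are taken from the spark-core jar mentioned further down, so adjust them to match your build):

```scala
// build.sbt — a sketch: mark Spark as "provided" so sbt compiles
// against it, but sbt-assembly leaves it out of the fat jar
libraryDependencies += "org.apache.spark" % "spark-core_2.9.3" % "0.8.0-incubating" % "provided"
```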
Since you're deploying your code to a container (in this case Spark), contrary to your comment you'd probably need the Scala standard library and other library jars (e.g. Dispatch, if you used it). This won't affect run or test.
If you want just your source code, with no Scala standard library or other library dependencies, that would be packageBin, built into sbt. This packaged jar can be combined with the dependency-only jar you can make using sbt-assembly's assemblyPackageDependency.
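For example, from the command line (a sketch, assuming the sbt-assembly plugin is already set up in your build):

```shell
sbt package                    # source-only jar (runs packageBin for the main artifact)
sbt assemblyPackageDependency  # dependency-only jar built by sbt-assembly
```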
The final option is to use excludedJars in assembly:
excludedJars in assembly := {
val cp = (fullClasspath in assembly).value
cp filter {_.data.getName == "spark-core_2.9.3-0.8.0-incubating.jar"}
}
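Since the question also covers other dependencies, the same idea can be generalized by matching on a file-name prefix instead of the full versioned jar name (a sketch; spark-core is just the example artifact here):

```scala
// build.sbt — a sketch: exclude every classpath jar whose file name
// starts with "spark-core", regardless of the Scala/Spark version suffix
excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { _.data.getName.startsWith("spark-core") }
}
```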