I am trying to save thousands of models produced by ML Pipeline. As indicated in the answer here, the models can be saved as follows:
import java.io._
def saveModel(name: String, model: PipelineModel) = {
val oos = new ObjectOutputStream(new FileOutputStream(s"/some/path/$name"))
oos.writeObject(model)
oos.close
}
schools.zip(bySchoolArrayModels).foreach{
case (name, model) => saveModel(name, Model)
}
I have tried using s3://some/path/$name
and /user/hadoop/some/path/$name
as I would like the models to be saved to amazon s3 eventually but they both fail with messages indicating the path cannot be found.
How to save models to Amazon S3?
One way to save a model to HDFS is as following:
// persist model to HDFS
sc.parallelize(Seq(model), 1).saveAsObjectFile("hdfs:///user/root/linReg.model")
Saved model can then be loaded as:
val linRegModel = sc.objectFile[LinearRegressionModel]("linReg.model").first()
For more details see (ref)