What's the difference between the Spark ML and MLlib packages?

vyakhir · Aug 8, 2016

I noticed there are two LinearRegressionModel classes in Spark, one in the ml package and another in the mllib package.

These two are implemented quite differently: for example, the one from mllib implements Serializable, while the other one does not.

By the way, the same is true of RandomForestModel.

Why are there two classes? Which is the "right" one? And is there a way to convert one into the other?

Answer

zero323 · Aug 8, 2016

o.a.s.mllib (org.apache.spark.mllib) contains the old RDD-based API, while o.a.s.ml (org.apache.spark.ml) contains the new API built around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0, and mllib is slowly being deprecated (this has already happened in the case of linear regression) and will most likely be removed in the next major release.

So unless your goal is backward compatibility, the "right choice" is o.a.s.ml.