Top "apache-spark-mllib" questions

MLlib is a machine learning library for Apache Spark.

Matrix Multiplication in Apache Spark

I am trying to perform matrix multiplication using Apache Spark and Java. I have 2 main questions: How to create RDD …

java scala apache-spark rdd apache-spark-mllib
How to cross validate RandomForest model?

I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to …

apache-spark random-forest cross-validation apache-spark-ml apache-spark-mllib
How to convert a map to Spark's RDD

I have a data set which is in the form of some nested maps, and its Scala type is: Map[…

scala apache-spark libsvm apache-spark-mllib
Apache Spark: How to create a matrix from a DataFrame?

I have a DataFrame in Apache Spark with an array of integers, the source is a set of images. I …

python matrix apache-spark pyspark apache-spark-mllib
PCA Analysis in PySpark

Looking at http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html. The examples seem to only contain Java and Scala. Does …

python apache-spark apache-spark-mllib pca apache-spark-ml
Spark train test split

I am curious if there is something similar to sklearn's http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.…

apache-spark apache-spark-mllib train-test-split
What's the difference between the Spark ML and MLlib packages?

I noticed there are two LinearRegressionModel classes in Spark, one in the ML package and another in the MLlib package. These two …

apache-spark apache-spark-mllib apache-spark-ml
What is the right way to save/load models in Spark/PySpark?

I'm working with Spark 1.3.0 using PySpark and MLlib and I need to save and load my models. I use code …

python apache-spark pyspark apache-spark-mllib
Spark job execution time

This might be a very simple question. But is there any simple way to measure the execution time of a …

apache-spark apache-spark-mllib apache-spark-1.5
How to extract best parameters from a CrossValidatorModel

I want to find the parameters of ParamGridBuilder that make the best model in CrossValidator in Spark 1.4.x. In Pipeline …

scala apache-spark pipeline cross-validation apache-spark-mllib