Top "Apache-spark-mllib" questions

MLlib is a machine learning library for Apache Spark

Feature normalization algorithm in Spark

Trying to understand Spark's normalization algorithm. My small test set contains 5 vectors: {0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0}, {-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 70000.0}, {-0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 70000.0}, I …

apache-spark apache-spark-mllib apache-spark-ml
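
The excerpt doesn't say which normalizer is being tested. For reference, here is a minimal PySpark sketch (made-up data, assuming an existing SparkSession named `spark`) contrasting Normalizer, which rescales each row to unit norm, with StandardScaler, which standardizes each column:

```python
from pyspark.ml.feature import Normalizer, StandardScaler
from pyspark.ml.linalg import Vectors

# Hypothetical toy data; `spark` is assumed to be an existing SparkSession.
df = spark.createDataFrame(
    [(Vectors.dense([0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0]),),
     (Vectors.dense([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0]),)],
    ["features"])

# Normalizer rescales each *row* to unit p-norm (here p=2).
row_norm = Normalizer(inputCol="features", outputCol="rowNormalized", p=2.0)
df = row_norm.transform(df)

# StandardScaler standardizes each *column* by its std (and optionally its mean).
scaler = StandardScaler(inputCol="features", outputCol="scaled",
                        withMean=True, withStd=True)
df = scaler.fit(df).transform(df)
df.show(truncate=False)
```
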
How to get word details from TF Vector RDD in Spark ML Lib?

I have created term frequency vectors using HashingTF in Spark. I got the term frequencies using tf.transform for each …

apache-spark apache-spark-mllib tf-idf apache-spark-ml
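
HashingTF maps terms to vector indices with a hash function and keeps no reverse mapping, so the word behind an index cannot be recovered from the TF vector alone. A sketch of the usual workaround, CountVectorizer, which keeps a fitted vocabulary (toy documents, assuming an existing SparkSession `spark`):

```python
from pyspark.ml.feature import CountVectorizer

# Hypothetical documents; `spark` is assumed to be an existing SparkSession.
docs = spark.createDataFrame(
    [(0, ["spark", "mllib", "spark"]), (1, ["hashing", "tf", "idf"])],
    ["id", "words"])

# CountVectorizer keeps the fitted vocabulary, so each vector index
# can be mapped back to the word it counts.
cv = CountVectorizer(inputCol="words", outputCol="tf")
model = cv.fit(docs)
print(model.vocabulary)  # index i in the vector corresponds to vocabulary[i]
model.transform(docs).show(truncate=False)
```
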
How do I convert an RDD with a SparseVector column to a DataFrame with a Vector column

I have an RDD with a tuple of values (String, SparseVector) and I want to create a DataFrame using the …

apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml
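
For the (String, SparseVector) case, createDataFrame can usually infer the vector column because SparseVector carries its own user-defined type. A small sketch under that assumption (hypothetical data, existing SparkSession `spark`):

```python
from pyspark.ml.linalg import SparseVector

# Hypothetical RDD of (String, SparseVector) pairs.
rdd = spark.sparkContext.parallelize([
    ("doc-1", SparseVector(4, {0: 1.0, 3: 2.0})),
    ("doc-2", SparseVector(4, {1: 5.0})),
])

# SparseVector has its own UDT, so only the column names need to be supplied.
df = spark.createDataFrame(rdd, ["id", "features"])
df.printSchema()
df.show(truncate=False)
```
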
How to serve a Spark MLlib model?

I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib, but I have some …

apache-spark machine-learning apache-spark-mllib
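
Serving choices are largely architectural, but a common baseline is to persist the fitted PipelineModel and reload it in whatever process does the scoring. A minimal sketch with hypothetical data and paths, assuming an existing SparkSession `spark`:

```python
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Hypothetical training data; `spark` is assumed to be an existing SparkSession.
train = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (0.0, 3.0, 1.0), (2.0, 1.0, 0.0)],
    ["f1", "f2", "label"])

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

# Persist the fitted pipeline; any other Spark application (e.g. a scoring
# job behind a REST service) can reload it without retraining.
pipeline.fit(train).write().overwrite().save("/tmp/lr-pipeline-model")
reloaded = PipelineModel.load("/tmp/lr-pipeline-model")
reloaded.transform(train).select("prediction").show()
```
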
MLlib dependency error

I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when …

scala apache-spark apache-spark-mllib
Spark DataFrame handling empty String in OneHotEncoder

I am importing a CSV file (using spark-csv) into a DataFrame which has empty String values. When the OneHotEncoder is applied, …

scala apache-spark apache-spark-mllib apache-spark-ml spark-csv
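
A common workaround is to map empty strings to an explicit placeholder before indexing, optionally combined with StringIndexer's handleInvalid option. The question is Scala-tagged; this is a PySpark sketch of the same idea, assuming the Spark 3.x OneHotEncoder API and an existing SparkSession `spark`:

```python
from pyspark.sql import functions as F
from pyspark.ml.feature import StringIndexer, OneHotEncoder

# Hypothetical data with an empty-string category.
df = spark.createDataFrame([("a",), ("b",), ("",), ("a",)], ["category"])

# Replace empty strings with an explicit placeholder so the indexer
# treats "missing" as a regular level instead of failing on it.
df = df.withColumn(
    "category",
    F.when(F.col("category") == "", "__EMPTY__").otherwise(F.col("category")))

indexer = StringIndexer(inputCol="category", outputCol="categoryIndex",
                        handleInvalid="keep")
encoder = OneHotEncoder(inputCols=["categoryIndex"], outputCols=["categoryVec"])

indexed = indexer.fit(df).transform(df)
encoded = encoder.fit(indexed).transform(indexed)
encoded.show(truncate=False)
```
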
How to transform a categorical variable in Spark into a set of columns coded as {0,1}?

I'm trying to perform a logistic regression (LogisticRegressionWithLBFGS) with Spark MLlib (with Scala) on a dataset which contains categorical variables. …

scala apache-spark bigdata apache-spark-mllib categorical-data
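
The idiomatic spark.ml route is StringIndexer followed by OneHotEncoder (as in the previous sketch), which yields a single dummy vector. If literal {0,1} columns are wanted instead, one hedged approach (PySpark rather than the Scala of the question, hypothetical data, existing SparkSession `spark`) is to build one integer column per distinct category:

```python
from pyspark.sql import functions as F

# Hypothetical dataset with a categorical column.
df = spark.createDataFrame(
    [("red", 1.0), ("blue", 0.0), ("green", 1.0), ("red", 0.0)],
    ["color", "label"])

# One literal way to get explicit {0,1} dummy columns: one integer
# column per distinct category value.
categories = [r[0] for r in df.select("color").distinct().collect()]
for c in categories:
    df = df.withColumn(f"color_{c}", (F.col("color") == c).cast("int"))

df.show()
```
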
PySpark & MLLib: Class Probabilities of Random Forest Predictions

I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do …

apache-spark pyspark random-forest apache-spark-mllib
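
The RDD-based mllib RandomForestModel does not expose per-class probabilities; the DataFrame-based RandomForestClassifier does, through its probability column. A minimal sketch with made-up data, assuming an existing SparkSession `spark`:

```python
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.linalg import Vectors

# Hypothetical labelled data; `spark` is assumed to be an existing SparkSession.
train = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.0])),
     (1.0, Vectors.dense([1.0, 0.0])),
     (1.0, Vectors.dense([1.0, 1.0])),
     (0.0, Vectors.dense([0.0, 0.0]))],
    ["label", "features"])

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=10)
model = rf.fit(train)

# The "probability" column holds a vector of per-class probabilities,
# averaged over the votes of the individual trees.
model.transform(train).select("features", "probability", "prediction").show(truncate=False)
```
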
How to convert RDD of dense vector into DataFrame in pyspark?

I have a DenseVector RDD like this >>> frequencyDenseVectors.collect() [DenseVector([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]), DenseVector([1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), DenseVector([1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]), DenseVector([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0])] I want to convert …

apache-spark pyspark apache-spark-mllib apache-spark-ml apache-spark-2.0
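
An RDD of bare DenseVectors has no field names, so the usual fix is to wrap each vector in a one-element tuple (or Row) before building the DataFrame. A sketch with a stand-in for frequencyDenseVectors, assuming an existing SparkSession `spark`:

```python
from pyspark.ml.linalg import DenseVector

# Hypothetical stand-in for frequencyDenseVectors.
frequencyDenseVectors = spark.sparkContext.parallelize([
    DenseVector([1.0, 0.0, 1.0]),
    DenseVector([0.0, 1.0, 1.0]),
])

# Wrap each vector in a one-element tuple so Spark can infer a named column.
df = spark.createDataFrame(frequencyDenseVectors.map(lambda v: (v,)), ["features"])
df.printSchema()
df.show(truncate=False)
```
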
Understanding Spark RandomForest featureImportances results

I'm using RandomForest.featureImportances but I don't understand the output result. I have 12 features, and this is the output I …

apache-spark classification random-forest apache-spark-mllib
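
featureImportances is a vector with one entry per feature, normalized to sum to 1; a zero entry means the feature was never selected for a split. A sketch that pairs the importances with hypothetical feature names (made-up data, assuming an existing SparkSession `spark`):

```python
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.linalg import Vectors

# Hypothetical training data with named features.
feature_names = ["f0", "f1", "f2"]
train = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.0, 3.0])),
     (1.0, Vectors.dense([1.0, 0.0, 2.0])),
     (1.0, Vectors.dense([1.0, 1.0, 0.0])),
     (0.0, Vectors.dense([0.0, 0.0, 1.0]))],
    ["label", "features"])

model = RandomForestClassifier(numTrees=10).fit(train)

# featureImportances has length numFeatures; entries sum to 1, and a 0
# means the feature was never used in any tree split.
for name, imp in zip(feature_names, model.featureImportances.toArray()):
    print(name, imp)
```
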