Top "Apache-spark-mllib" questions

MLlib is a machine learning library for Apache Spark.

Create labeledPoints from Spark DataFrame in Python

Which .map() function in Python do I use to create a set of LabeledPoints from a Spark DataFrame? What is …

python pandas apache-spark apache-spark-mllib apache-spark-ml
What is the difference between HashingTF and CountVectorizer in Spark?

Trying to do doc classification in Spark. I am not sure what the hashing does in HashingTF; does it sacrifice …

apache-spark apache-spark-mllib apache-spark-ml
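
The contrast this question asks about can be sketched without Spark at all. The snippet below is a plain-Python illustration, not Spark's implementation: `hashing_tf` and `count_vectorize` are hypothetical helper names, `zlib.crc32` stands in for the MurmurHash3 function that Spark's HashingTF uses, and the alphabetical vocabulary order is chosen only to make the output deterministic. It shows that hashing needs no vocabulary pass but can merge distinct terms into one bucket, while an explicit vocabulary keeps every term separate and reversible.

```python
import zlib
from collections import Counter


def hashing_tf(tokens, num_features):
    """Count tokens into a fixed number of buckets chosen by hashing.

    No vocabulary is stored, so bucket indices cannot be mapped back to terms.
    """
    vec = [0] * num_features
    for tok in tokens:
        # Assumption: crc32 is only a stand-in for Spark's MurmurHash3.
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1
    return vec


def count_vectorize(docs):
    """Build an explicit term -> index vocabulary, then count against it."""
    # Alphabetical order is an illustrative choice for determinism.
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            vec[vocab[tok]] = n
        vectors.append(vec)
    return vocab, vectors


docs = [["spark", "mllib", "spark"], ["hashing", "trick", "mllib"]]

# Vocabulary approach: one extra pass over the data, no collisions.
vocab, counted = count_vectorize(docs)

# Hashing approach: single pass, fixed width, collisions possible.
hashed = [hashing_tf(d, num_features=4) for d in docs]

# With fewer buckets than distinct terms, a collision is unavoidable.
squeezed = hashing_tf(["a", "b", "c"], num_features=2)
```

In this sketch `squeezed` always has a bucket holding at least two of the three distinct terms, which is the information loss the question is worried about; choosing a much larger number of buckets makes such collisions rarer at the cost of wider vectors.
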
Converting RDD[org.apache.spark.sql.Row] to RDD[org.apache.spark.mllib.linalg.Vector]

I am relatively new to Spark and Scala. I am starting with the following dataframe (single column made out of …

scala apache-spark rdd spark-dataframe apache-spark-mllib
How can I create a TF-IDF for Text Classification using Spark?

I have a CSV file with the following format: product_id1,product_title1 product_id2,product_title2 product_id3,product_…

scala apache-spark apache-spark-mllib tf-idf
How to extract a value from a Vector in a column of a Spark Dataframe

When using SparkML to predict labels the result Dataframe is:
scala> result.show
+-----------+--------------+
|probability|predictedLabel|
+-----------+--------------+
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.1,0.9]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.0,1.0]|           0.0|
|  [0.1,0.9]|           0.0|
|  [0.6,0.4]|           1.0|
|  [0.6,0.4]|           1.0|
|  [1.0,0.0]|           1.0|
|  [0.9,0.1]|           1.0|
|  [0.9,0.1]|           1.0|
|  [1.0,0.0]|           1.0|
|  [1.0,0.0]|           1.0|
+…

scala apache-spark dataframe apache-spark-sql apache-spark-mllib
How do I run the Spark decision tree with a categorical feature set using Scala?

I have a feature set with a corresponding categoricalFeaturesInfo: Map[Int,Int]. However, for the life of me I cannot …

scala apache-spark tree apache-spark-mllib categorical-data
Spark Multiclass Classification Example

Do you guys know where I can find examples of multiclass classification in Spark? I spent a lot of time …

scala apache-spark apache-spark-mllib random-forest apache-spark-ml
Creating Spark dataframe from numpy matrix

It is my first time with PySpark (Spark 2), and I'm trying to create a toy dataframe for a Logit model. …

numpy apache-spark pyspark apache-spark-sql apache-spark-mllib
Calling Java/Scala function from a task

Background: My original question here was "Why using DecisionTreeModel.predict inside map function raises an exception?" and is related to …

python scala apache-spark pyspark apache-spark-mllib
How to convert spark DataFrame to RDD mllib LabeledPoints?

I tried to apply PCA to my data and then apply RandomForest to the transformed data. However, PCA.transform(data) …

scala apache-spark rdd pca apache-spark-mllib