Top "Apache-spark-ml" questions

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

SparkException: Values to assemble cannot be null

I want to use StandardScaler to normalize the features. Here is my code: val Array(trainingData, testData) = dataset.randomSplit(Array(0.7,0.3)) val …

apache-spark apache-spark-sql apache-spark-ml
Create labeledPoints from Spark DataFrame in Python

What .map() function in Python do I use to create a set of labeledPoints from a Spark DataFrame? What is …

python pandas apache-spark apache-spark-mllib apache-spark-ml
What is the difference between HashingTF and CountVectorizer in Spark?

Trying to do doc classification in Spark. I am not sure what the hashing does in HashingTF; does it sacrifice …

apache-spark apache-spark-mllib apache-spark-ml
Spark Scala: How to convert DataFrame[vector] to DataFrame[f1: Double, ..., fn: Double]

I just used StandardScaler to normalize my features for an ML application. After selecting the scaled features, I want …

scala apache-spark apache-spark-sql apache-spark-ml
Spark Multiclass Classification Example

Do you know where I can find examples of multiclass classification in Spark? I spent a lot of time …

scala apache-spark apache-spark-mllib random-forest apache-spark-ml
How to map features from the output of a VectorAssembler back to the column names in Spark ML?

I'm trying to run a linear regression in PySpark and I want to create a table containing summary statistics such …

python apache-spark machine-learning pyspark apache-spark-ml
Apply OneHotEncoder to several categorical columns in Spark MLlib

I have several categorical features and would like to transform them all using OneHotEncoder. However, when I tried to apply …

python apache-spark pyspark apache-spark-mllib apache-spark-ml
How to define a custom aggregation function to sum a column of Vectors?

I have a DataFrame of two columns, ID of type Int and Vec of type Vector (org.apache.spark.mllib.…

scala apache-spark apache-spark-sql aggregate-functions apache-spark-ml
Column name with a dot in Spark

I am trying to take columns from a DataFrame and convert them to an RDD[Vector]. The problem is that …

scala apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml