Top "Apache-spark-mllib" questions

MLlib is a machine learning library for Apache Spark

Apply OneHotEncoder to several categorical columns in Spark MLlib

I have several categorical features and would like to transform them all using OneHotEncoder. However, when I tried to apply …

python apache-spark pyspark apache-spark-mllib apache-spark-ml
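
The question is tagged pyspark, but the usual approach is the same in Scala (the language used in most entries on this page): build one StringIndexer/OneHotEncoder pair per column and chain them in a single Pipeline. A minimal sketch, assuming a DataFrame `df` and hypothetical column names:

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

// Hypothetical categorical columns; substitute your own.
val categoricalCols = Seq("color", "size", "brand")

// One indexer + encoder pair per categorical column.
val indexers = categoricalCols.map { c =>
  new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx")
}
val encoders = categoricalCols.map { c =>
  new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec")
}

val stages: Array[PipelineStage] = (indexers ++ encoders).toArray
val pipeline = new Pipeline().setStages(stages)

// `df` is assumed to be the input DataFrame containing the categorical columns.
val encoded = pipeline.fit(df).transform(df)
```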
From DataFrame to RDD[LabeledPoint]

I am trying to implement a document classifier using Apache Spark MLlib and I am having some problems representing the …

scala apache-spark apache-spark-mllib
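
A minimal sketch of the usual conversion, assuming the first column of `df` holds a numeric label and the remaining columns hold numeric features (adjust the indices to your schema):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Map each Row to a LabeledPoint: column 0 is the label, the rest are features.
val labeled = df.rdd.map { row =>
  val label    = row.getDouble(0)
  val features = Vectors.dense((1 until row.length).map(row.getDouble).toArray)
  LabeledPoint(label, features)
}
```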
Column name with a dot in Spark

I am trying to take columns from a DataFrame and convert them to an RDD[Vector]. The problem is that …

scala apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml
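
The usual trick is to backtick-escape column names that contain a dot, since Spark otherwise parses the dot as a struct-field access. A sketch, with hypothetical column names and assuming the selected columns are Doubles:

```scala
import org.apache.spark.mllib.linalg.Vectors

// Backticks keep "a.b" from being read as field "b" of struct "a".
val cols = Seq("`a.b`", "`c.d`")              // hypothetical column names
val selected = df.select(cols.map(df.col): _*)

// Assumes every selected column is numeric (Double).
val vectors = selected.rdd.map { row =>
  Vectors.dense((0 until row.length).map(row.getDouble).toArray)
}
```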
How to add an incremental column ID for a table in Spark SQL

I'm working on a Spark MLlib algorithm. The dataset I have is in this form: Company":"XXXX","CurrentTitle":"XYZ","Edu_…

apache-spark apache-spark-sql spark-dataframe apache-spark-mllib
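
A minimal sketch of the two common approaches, assuming an input DataFrame `df`:

```scala
import org.apache.spark.sql.functions.monotonically_increasing_id

// Unique, increasing ids (not guaranteed to be consecutive).
val withId = df.withColumn("id", monotonically_increasing_id())

// If the ids must be consecutive (0, 1, 2, ...), zipWithIndex on the underlying
// RDD is a common alternative, at the cost of rebuilding the DataFrame with an
// extended schema:
// val indexed = df.rdd.zipWithIndex.map { case (row, idx) => (row, idx) }
```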
DBSCAN on Spark: which implementation?

I would like to do some DBSCAN on Spark. I have currently found 2 implementations: https://github.com/irvingc/dbscan-on-spark https://…

scala apache-spark cluster-analysis apache-spark-mllib dbscan
How to convert a Row into a Vector to feed to KMeans

When I try to feed df2 to KMeans I get the following error: clusters = KMeans.train(df2, 10, maxIterations=30, runs=10, initializationMode="…

apache-spark pyspark k-means apache-spark-mllib pyspark-sql
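
The question is in PySpark, but the underlying issue is the same everywhere: the RDD-based KMeans.train expects an RDD[Vector], not a DataFrame of Rows. A Scala sketch, assuming `df2` contains only numeric columns:

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Map each Row to a dense mllib Vector before training.
val vectors = df2.rdd.map { row =>
  Vectors.dense((0 until row.length).map(row.getDouble).toArray)
}.cache()

val clusters = KMeans.train(vectors, 10, 30)   // k = 10, maxIterations = 30
```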
Addition of two RDD[mllib.linalg.Vector]'s

I need addition of two matrices that are stored in two files. The content of latest1.txt and latest2.txt …

scala apache-spark apache-spark-mllib
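
One way to sketch an element-wise sum, assuming the two RDDs (`v1` and `v2` here are placeholders) have the same length, the same partitioning, and equal-sized vectors, which is what RDD.zip requires:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Zip the two RDDs row by row and add the corresponding vector entries.
def addRowWise(v1: RDD[Vector], v2: RDD[Vector]): RDD[Vector] =
  v1.zip(v2).map { case (a, b) =>
    Vectors.dense(a.toArray.zip(b.toArray).map { case (x, y) => x + y })
  }
```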
(Spark) object {name} is not a member of package org.apache.spark.ml

I'm trying to run a self-contained application using Scala on Apache Spark, based on the example here: http://spark.apache.org/docs/…

scala apache-spark sbt apache-spark-mllib
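
This error often means the build only pulls in spark-core, while org.apache.spark.ml lives in spark-mllib. A minimal build.sbt sketch; the Spark and Scala versions shown are assumptions, so match them to your cluster:

```scala
// build.sbt -- sketch only; adjust versions to your environment.
name := "simple-ml-app"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0"   // provides org.apache.spark.ml
)
```

If you package the application with spark-submit, these dependencies are typically marked "provided" so they are not bundled into the assembly jar.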
How to save models from ML Pipeline to S3 or HDFS?

I am trying to save thousands of models produced by ML Pipeline. As indicated in the answer here, the models …

java scala apache-spark apache-spark-mllib apache-spark-ml
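
In more recent Spark versions (1.6+), a fitted PipelineModel can be persisted directly, and the path may point at HDFS or S3. A sketch with placeholder paths and a hypothetical `model`:

```scala
import org.apache.spark.ml.PipelineModel

// Persist the fitted pipeline; the path can be an HDFS or s3a:// URI.
model.write.overwrite().save("hdfs:///models/my-pipeline-model")
// or: model.save("s3a://my-bucket/models/my-pipeline-model")

// Load it back later:
val reloaded = PipelineModel.load("hdfs:///models/my-pipeline-model")
```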
How to convert org.apache.spark.rdd.RDD[Array[Double]] to Array[Double] which is required by Spark MLlib

I am trying to implement KMeans using Apache Spark. val data = sc.textFile(irisDatasetString) val parsedData = data.map(_.split(',…

apache-spark apache-spark-mllib
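
A sketch of the usual parsing path for this kind of CSV input, assuming the last field of each line is a non-numeric label (as in the iris dataset) and `irisDatasetString` is the path from the question:

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val data = sc.textFile(irisDatasetString)

// Drop the trailing label column, parse the rest as Doubles,
// and wrap each Array[Double] in an mllib Vector for KMeans.
val parsedData = data.map(_.split(',').dropRight(1).map(_.toDouble))
val vectors    = parsedData.map(a => Vectors.dense(a)).cache()

val model = KMeans.train(vectors, 3, 20)   // k = 3, maxIterations = 20
```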