Top "Apache-spark-ml" questions

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

pyspark : NameError: name 'spark' is not defined

I am copying the pyspark.ml example from the official documentation website: http://spark.apache.org/docs/latest/api/python/…

apache-spark machine-learning pyspark distributed-computing apache-spark-ml
How to handle categorical features with spark-ml?

How do I handle categorical data with spark-ml and not spark-mllib? Though the documentation is not very clear, it seems …

apache-spark categorical-data apache-spark-ml apache-spark-mllib
Should we parallelize a DataFrame like we parallelize a Seq before training

Consider the code given here, https://spark.apache.org/docs/1.2.0/ml-guide.html import org.apache.spark.ml.classification.LogisticRegression val …

scala apache-spark pyspark apache-spark-sql apache-spark-ml
How to split Vector into columns - using PySpark

Context: I have a DataFrame with 2 columns: word and vector, where the column type of "vector" is VectorUDT. An example: …

python apache-spark pyspark apache-spark-sql apache-spark-ml
How do I convert an array (i.e. list) column to Vector

Short version of the question! Consider the following snippet (assuming spark is already set to some SparkSession): from pyspark.sql …

python apache-spark pyspark apache-spark-sql apache-spark-ml
KMeans clustering in PySpark

I have a spark dataframe 'mydataframe' with many columns. I am trying to run kmeans on only two columns: lat …

machine-learning pyspark k-means apache-spark-mllib apache-spark-ml
Encode and assemble multiple features in PySpark

I have a Python class that I'm using to load and process some data in Spark. Among various things I …

python apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml
How to extract model hyper-parameters from spark.ml in PySpark?

I'm tinkering with some cross-validation code from the PySpark documentation, and trying to get PySpark to tell me what model …

pyspark modeling cross-validation apache-spark-mllib apache-spark-ml
Spark, ML, StringIndexer: handling unseen labels

My goal is to build a multiclass classifier. I have built a pipeline for feature extraction and it includes as …

apache-spark apache-spark-ml
How to prepare data into a LibSVM format from DataFrame?

I want to produce LibSVM format, so I shaped a DataFrame into the desired layout, but I do not know how …

apache-spark apache-spark-sql apache-spark-mllib libsvm apache-spark-ml