Top "Apache-spark-ml" questions

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

How to get word details from TF Vector RDD in Spark ML Lib?

I have computed term frequencies using HashingTF in Spark. I got the term frequencies using tf.transform for each …

apache-spark apache-spark-mllib tf-idf apache-spark-ml
How do I convert an RDD with a SparseVector Column to a DataFrame with a column as Vector

I have an RDD with a tuple of values (String, SparseVector) and I want to create a DataFrame using the …

apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml
Pyspark ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:50532)

Hello, I was working with PySpark, implementing a sentiment analysis project using the ML package. The first time, the code worked well …

pyspark apache-spark-ml py4j
Tuning parameters for implicit pyspark.ml ALS matrix factorization model through pyspark.ml CrossValidator

I'm trying to tune the parameters of an ALS matrix factorization model that uses implicit data. For this, I'm trying …

python apache-spark pyspark apache-spark-ml
Spark DataFrame handling empty String in OneHotEncoder

I am importing a CSV file (using spark-csv) into a DataFrame which has empty String values. When I applied the OneHotEncoder, …

scala apache-spark apache-spark-mllib apache-spark-ml spark-csv
Serialize a custom transformer using python to be used within a Pyspark ML pipeline

I found the same discussion in the comments section of "Create a custom Transformer in PySpark ML", but there is no …

pyspark apache-spark-ml
How to flatten columns of type array of structs (as returned by Spark ML API)?

Maybe it's just because I'm relatively new to the API, but I feel like Spark ML methods often return DFs …

apache-spark apache-spark-sql apache-spark-ml
Normalize column with Spark

I have a data file with three columns, and I want to normalize the last column to apply ALS with …

scala apache-spark spark-dataframe apache-spark-ml normalize
How to convert RDD of dense vector into DataFrame in pyspark?

I have a DenseVector RDD like this >>> frequencyDenseVectors.collect() [DenseVector([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]), DenseVector([1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), DenseVector([1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]), DenseVector([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0])] I want to convert …

apache-spark pyspark apache-spark-mllib apache-spark-ml apache-spark-2.0
How to get classification probabilities from PySpark MultilayerPerceptronClassifier?

I'm using Spark 2.0.1 in Python. My dataset is in a DataFrame, so I'm using the ML (not MLlib) library for machine …

apache-spark machine-learning neural-network pyspark apache-spark-ml