Top "Apache-spark-mllib" questions

MLlib is a machine learning library for Apache Spark.

AttributeError: 'DataFrame' object has no attribute 'map'

I wanted to convert the spark data frame to add using the code below: from pyspark.mllib.clustering import KMeans …

python apache-spark pyspark spark-dataframe apache-spark-mllib
The value of "spark.yarn.executor.memoryOverhead" setting?

The value of spark.yarn.executor.memoryOverhead in a Spark job with YARN should be allocated to App or just …

apache-spark apache-spark-sql spark-streaming apache-spark-mllib
How to handle categorical features with spark-ml?

How do I handle categorical data with spark-ml and not spark-mllib? Though the documentation is not very clear, it seems …

apache-spark categorical-data apache-spark-ml apache-spark-mllib
Sparse Vector vs Dense Vector

How to create SparseVector and dense Vector representations if the DenseVector is: denseV = np.array([0., 3., 0., 4.]) What will be the Sparse …

apache-spark apache-spark-mllib
What is the difference between Apache Mahout and Apache Spark's MLlib?

Consider a MySQL products database with 10 million products for an e-commerce website. I'm trying to set up a classification module …

apache-spark mahout apache-spark-mllib
How to create correct data frame for classification in Spark ML

I am trying to run random forest classification using the Spark ML API, but I am having issues with creating …

scala apache-spark apache-spark-sql apache-spark-mllib
Extracting NumPy array from PySpark DataFrame

I have a dataframe gi_man_df where group can be n: +------------------+-----------------+--------+--------------+ | group | number|rand_int| …

numpy apache-spark pyspark spark-dataframe apache-spark-mllib
KMeans clustering in PySpark

I have a Spark DataFrame 'mydataframe' with many columns. I am trying to run k-means on only two columns: lat …

machine-learning pyspark k-means apache-spark-mllib apache-spark-ml
PySpark & MLLib: Random Forest Feature Importances

I'm trying to extract the feature importances of a random forest object I have trained using PySpark. However, I do …

apache-spark pyspark random-forest apache-spark-mllib
How to assign unique contiguous numbers to elements in a Spark RDD

I have a dataset of (user, product, review) and want to feed it into MLlib's ALS algorithm. The algorithm needs …

apache-spark apache-spark-mllib