Top "Apache-spark-ml" questions

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

Access element of a vector in a Spark DataFrame (Logistic Regression probability vector)

I trained a LogisticRegression model in PySpark (ML package) and the result of the prediction is a PySpark DataFrame (cv_…

python apache-spark pyspark spark-dataframe apache-spark-ml
Save ML model for future usage

I was applying some Machine Learning algorithms like Linear Regression, Logistic Regression, and Naive Bayes to some data, but I …

apache-spark pyspark apache-spark-mllib apache-spark-ml
Dropping a nested column from Spark DataFrame

I have a DataFrame with the schema root |-- label: string (nullable = true) |-- features: struct (nullable = true) | |-- feat1: …

scala apache-spark dataframe apache-spark-sql apache-spark-ml
How to use XGBoost in a PySpark Pipeline

I want to update my PySpark code. In PySpark, the base model must be put in a pipeline, …

apache-spark pyspark apache-spark-mllib xgboost apache-spark-ml
How to cross validate RandomForest model?

I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to …

apache-spark random-forest cross-validation apache-spark-ml apache-spark-mllib
Create a custom Transformer in PySpark ML

I am new to Spark SQL DataFrames and ML on them (PySpark). How can I create a custom tokenizer, which …

python apache-spark nltk pyspark apache-spark-ml
Create feature vector programmatically in Spark ML / pyspark

I'm wondering if there is a concise way to run ML (e.g. KMeans) on a DataFrame in PySpark if …

python apache-spark pyspark apache-spark-ml
PCA Analysis in PySpark

Looking at http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html. The examples seem to only contain Java and Scala. Does …

python apache-spark apache-spark-mllib pca apache-spark-ml
What's the difference between the Spark ML and MLlib packages?

I noticed there are two LinearRegressionModel classes in Spark ML, one in the ML package and another in the MLlib package. These two …

apache-spark apache-spark-mllib apache-spark-ml
Apache Spark throws NullPointerException when encountering missing feature

I have a bizarre issue with PySpark when indexing a column of strings in features. Here is my tmp.csv file: …

python apache-spark apache-spark-sql pyspark apache-spark-ml