Top "Apache-spark-ml" questions

Spark ML is a high-level API for building machine learning pipelines in Apache Spark.

Save and load two ML models in pyspark

First I create two ML algorithms and save them to two separate files. Note that both models are based on …

python apache-spark pyspark apache-spark-ml
What do the columns ‘rawPrediction’ and ‘probability’ of a DataFrame mean in Spark MLlib?

After I trained a LogisticRegressionModel, I transformed the test data DF with it and got the prediction DF. And then …

apache-spark-sql logistic-regression apache-spark-ml
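A minimal pure-Python sketch of the arithmetic behind these columns, assuming a binary LogisticRegressionModel. In that case rawPrediction is the vector [-m, m], where m = w·x + b is the linear margin. probability is [1 - σ(m), σ(m)], with σ the logistic function, and prediction is 1.0 when the class-1 probability exceeds the threshold (0.5 by default). The helper names, weights, intercept and feature values below are hypothetical, and the code does not call Spark.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_lr_outputs(weights, intercept, features, threshold=0.5):
    """Mimic the rawPrediction/probability/prediction columns of a binary
    logistic regression model for one feature vector (pure Python sketch)."""
    # Linear margin m = w . x + b
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # rawPrediction holds the margin for class 1 and its negation for class 0
    raw_prediction = [-margin, margin]
    # probability applies the logistic function to the class-1 margin
    p1 = sigmoid(margin)
    probability = [1.0 - p1, p1]
    # prediction is 1.0 only when the class-1 probability exceeds the threshold
    prediction = 1.0 if p1 > threshold else 0.0
    return raw_prediction, probability, prediction

# Hypothetical model parameters and a single test row
raw, prob, pred = binary_lr_outputs([0.5, -1.0], 0.25, [2.0, 0.5])
print(raw, prob, pred)
```

Here the margin is 0.75, so rawPrediction is [-0.75, 0.75] and prediction is 1.0. For multinomial models both columns have one entry per class and probability comes from a softmax over the margins, which this sketch does not cover.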
How to merge multiple feature vectors in DataFrame?

Using Spark ML transformers I arrived at a DataFrame where each row looks like this: Row(object_id, text_features_…

apache-spark machine-learning apache-spark-sql apache-spark-ml
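In Spark ML this is usually done with VectorAssembler, which concatenates the columns listed in inputCols, in that order, into a single output vector column. The plain-Python sketch below only illustrates what concatenating sparse vectors means: each vector's non-zero indices are shifted by the total size of the vectors before it. The (size, indices, values) tuple representation, the function name and the example vectors are assumptions for illustration, not Spark's API.

```python
from typing import List, Tuple

# Hypothetical sparse-vector representation: (size, indices, values)
Sparse = Tuple[int, List[int], List[float]]

def concat_sparse(vectors: List[Sparse]) -> Sparse:
    """Concatenate sparse vectors end to end, shifting indices by an offset."""
    offset = 0
    indices: List[int] = []
    values: List[float] = []
    for size, idx, vals in vectors:
        # Non-zero positions move right by the size of everything before them
        indices.extend(i + offset for i in idx)
        values.extend(vals)
        offset += size
    return offset, indices, values

text_features = (5, [1, 3], [0.5, 2.0])  # hypothetical 5-dimensional vector
other_features = (3, [0], [7.0])          # hypothetical 3-dimensional vector
merged = concat_sparse([text_features, other_features])
print(merged)  # (8, [1, 3, 5], [0.5, 2.0, 7.0])
```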
How to save models from ML Pipeline to S3 or HDFS?

I am trying to save thousands of models produced by ML Pipeline. As indicated in the answer here, the models …

java scala apache-spark apache-spark-mllib apache-spark-ml
How to access element of a VectorUDT column in a Spark DataFrame?

I have a dataframe df with a VectorUDT column named features. How do I get an element of the column, …

apache-spark dataframe pyspark apache-spark-sql apache-spark-ml
Spark, Scala, DataFrame: create feature vectors

I have a DataFrame that looks like the following:
userID, category, frequency
1,cat1,1
1,cat2,3
1,cat9,5
2,cat4,6
2,cat9,2
2,cat10,1
3,cat1,5
3,cat7,16
3,cat8,2
…

scala apache-spark apache-spark-sql apache-spark-ml
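One way to picture the target is one sparse vector per user: give each distinct category an index, then place that user's frequency at the category's index. The plain-Python sketch below does this for the rows visible in the question; it does not use Spark. Assigning indices in sorted (lexicographic) category order is an assumption made here for determinism. Spark's StringIndexer, by default, orders labels by descending frequency instead, so its indices would differ.

```python
from collections import defaultdict

# The rows visible in the question: (userID, category, frequency)
rows = [
    (1, "cat1", 1), (1, "cat2", 3), (1, "cat9", 5),
    (2, "cat4", 6), (2, "cat9", 2), (2, "cat10", 1),
    (3, "cat1", 5), (3, "cat7", 16), (3, "cat8", 2),
]

# Assumption: index categories in sorted (lexicographic) order
categories = sorted({cat for _, cat, _ in rows})
cat_index = {cat: i for i, cat in enumerate(categories)}

# Map each user to {category index: frequency}
per_user = defaultdict(dict)
for user, cat, freq in rows:
    per_user[user][cat_index[cat]] = float(freq)

# Sparse vector per user as (size, sorted indices, values)
vectors = {
    user: (len(categories), sorted(entries), [entries[i] for i in sorted(entries)])
    for user, entries in per_user.items()
}
for user in sorted(vectors):
    print(user, vectors[user])
```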
Field "features" does not exist. SparkML

I am trying to build a model in Spark ML with Zeppelin. I am new to this area and would …

scala apache-zeppelin apache-spark-ml
Feature normalization algorithm in Spark

Trying to understand Spark's normalization algorithm. My small test set contains 5 vectors: {0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0}, {-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 70000.0}, {-0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 70000.0}, I …

apache-spark apache-spark-mllib apache-spark-ml
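Spark ML has two transformers that are often both described as "normalization": Normalizer rescales each row vector to unit p-norm (p = 2 by default), while StandardScaler rescales each column. The excerpt does not show which one was used. The pure-Python sketch below applies unit L2-norm row scaling, the same rule Normalizer uses with its default p, to the five vectors from the question. It makes visible why the constant 70000.0 component dominates every normalized row. The function name is illustrative only.

```python
import math

# The five test vectors from the question
vectors = [
    [0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0],
    [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 70000.0],
    [-0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 70000.0],
]

def l2_normalize(v):
    """Divide a vector by its Euclidean (L2) norm; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]

normalized = [l2_normalize(v) for v in vectors]
for row in normalized:
    # The last component stays close to 1.0 because 70000.0 dominates each norm
    print([round(x, 6) for x in row])
```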
Using Spark ML's OneHotEncoder on multiple columns

I've been able to create a pipeline that will allow me to index multiple string columns at once, but I …

scala apache-spark apache-spark-ml
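Since Spark 3.0, OneHotEncoder itself accepts several columns through inputCols and outputCols (in Spark 2.3 and 2.4 the multi-column version was called OneHotEncoderEstimator). Its dropLast option is true by default, so a column with n categories becomes a vector of size n - 1, and the last category maps to all zeros. The plain-Python sketch below shows that encoding rule applied independently to each indexed column of one row. The column names, category counts and helper functions are hypothetical, and the code does not call Spark.

```python
from typing import Dict, List

def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """Encode a category index as a 0/1 vector. With drop_last, the last
    category maps to the all-zeros vector of size num_categories - 1."""
    if not 0 <= index < num_categories:
        raise ValueError("category index out of range")
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec

def one_hot_columns(row: Dict[str, int], num_categories: Dict[str, int]) -> Dict[str, List[float]]:
    """Apply one_hot independently to each indexed column of a row."""
    return {col + "_vec": one_hot(row[col], num_categories[col]) for col in row}

# Hypothetical row produced by indexing two string columns
row = {"color_idx": 1, "size_idx": 2}
encoded = one_hot_columns(row, {"color_idx": 3, "size_idx": 3})
print(encoded)  # {'color_idx_vec': [0.0, 1.0], 'size_idx_vec': [0.0, 0.0]}
```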
pyspark extract ROC curve?

Is there a way to get the points on an ROC curve from Spark ML in pyspark? In the documentation …

pyspark apache-spark-ml
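For a binary LogisticRegressionModel, PySpark exposes the ROC curve of the training data as model.summary.roc, a DataFrame with FPR and TPR columns, with (0.0, 0.0) prepended and (1.0, 1.0) appended. model.evaluate(df) returns a summary with the same field for other data. To show what those points are, the pure-Python sketch below computes (FPR, TPR) pairs by lowering a threshold through the distinct scores. The scores, labels and function name are hypothetical. The tie handling, where all rows sharing a score cross the threshold together, is this sketch's own choice.

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, from (0, 0) to (1, 1), obtained by lowering
    the decision threshold through each distinct score."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    tp = fp = 0
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Every example scoring at this threshold is now predicted positive
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical class-1 probabilities and true labels
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 0, 1, 0, 0]
print(roc_points(scores, labels))
```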