Spark ML is a high-level API for building machine learning pipelines in Apache Spark.
First I create two ML algorithms and save them to two separate files. Note that both models are based on …
python apache-spark pyspark apache-spark-ml

After I trained a LogisticRegressionModel, I transformed the test data DF with it and got the prediction DF. And then …
apache-spark-sql logistic-regression apache-spark-ml

Using Spark ML transformers, I arrived at a DataFrame where each row looks like this: Row(object_id, text_features_…
apache-spark machine-learning apache-spark-sql apache-spark-ml

I am trying to save thousands of models produced by ML Pipeline. As indicated in the answer here, the models …
java scala apache-spark apache-spark-mllib apache-spark-ml

I have a dataframe df with a VectorUDT column named features. How do I get an element of the column, …
apache-spark dataframe pyspark apache-spark-sql apache-spark-ml

I have a DataFrame that looks as follows:
userID, category, frequency
1,cat1,1
1,cat2,3
1,cat9,5
2,cat4,6
2,cat9,2
2,cat10,1
3,cat1,5
3,cat7,16
3,cat8,2
…
scala apache-spark apache-spark-sql apache-spark-ml

I am trying to build a model in Spark ML with Zeppelin. I am new to this area and would …
scala apache-zeppelin apache-spark-ml

Trying to understand Spark's normalization algorithm. My small test set contains 5 vectors:
{0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0},
{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0},
{-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 70000.0},
{-0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0},
{0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 70000.0},
I …
apache-spark apache-spark-mllib apache-spark-ml

I've been able to create a pipeline that will allow me to index multiple string columns at once, but I …
scala apache-spark apache-spark-ml

Is there a way to get the points on an ROC curve from Spark ML in pyspark? In the documentation …
pyspark apache-spark-ml