Top "Apache-spark-ml" questions

Spark ML (the spark.ml package) is the DataFrame-based, high-level API for building machine learning pipelines in Apache Spark.

Pyspark - Get all parameters of models created with ParamGridBuilder

I'm using PySpark 2.0 for a Kaggle competition. I'd like to know the behavior of a model (RandomForest) depending on different …

python machine-learning pyspark apache-spark-ml hyperparameters
Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector

Given my pyspark Row object: >>> row Row(clicked=0, features=SparseVector(7, {0: 1.0, 3: 1.0, 6: 0.752})) >>> row.clicked 0 >>…

apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml
spark.ml StringIndexer throws 'Unseen label' on fit()

I'm preparing a toy spark.ml example. Spark version 1.6.0, running on top of Oracle JDK version 1.8.0_65, pyspark, ipython notebook. First, …

apache-spark dataframe pyspark apache-spark-sql apache-spark-ml
How to create a custom Estimator in PySpark

I am trying to build a simple custom Estimator in PySpark MLlib. I have read here that it is possible to …

python apache-spark pyspark apache-spark-mllib apache-spark-ml