Spark ML is a high-level API for building machine learning pipelines in Apache Spark.
I want to use StandardScaler to normalize the features. Here is my code: val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3)) val …
apache-spark apache-spark-sql apache-spark-ml

Which .map() function in Python do I use to create a set of LabeledPoints from a Spark DataFrame? What is …
python pandas apache-spark apache-spark-mllib apache-spark-ml

I am trying to do document classification in Spark. I am not sure what the hashing in HashingTF does; does it sacrifice …
apache-spark apache-spark-mllib apache-spark-ml

I just used StandardScaler to normalize my features for an ML application. After selecting the scaled features, I want …
scala apache-spark apache-spark-sql apache-spark-ml

Do you know where I can find examples of multiclass classification in Spark? I spent a lot of time …
scala apache-spark apache-spark-mllib random-forest apache-spark-ml

I am reducing the dimensionality of a Spark DataFrame with a PCA model in PySpark (using the Spark ML library) as …
apache-spark apache-spark-sql pyspark pca apache-spark-ml

I'm trying to run a linear regression in PySpark and I want to create a table containing summary statistics such …
python apache-spark machine-learning pyspark apache-spark-ml

I have several categorical features and would like to transform them all using OneHotEncoder. However, when I tried to apply …
python apache-spark pyspark apache-spark-mllib apache-spark-ml

I have a DataFrame with two columns: ID of type Int and Vec of type Vector (org.apache.spark.mllib.…
scala apache-spark apache-spark-sql aggregate-functions apache-spark-ml

I am trying to take columns from a DataFrame and convert them to an RDD[Vector]. The problem is that …
scala apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml