MLlib is a machine learning library for Apache Spark.
Trying to understand Spark's normalization algorithm. My small test set contains 5 vectors: {0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 70000.0}, {-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 70000.0}, {-0.95, 0.018, 0.0, 24.0, 24.0, 14.4, 70000.0}, {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 70000.0}, I …
apache-spark apache-spark-mllib apache-spark-ml
I have created Term Frequency using HashingTF in Spark. I have got the term frequencies using tf.transform for each …
apache-spark apache-spark-mllib tf-idf apache-spark-ml
I have an RDD with a tuple of values (String, SparseVector) and I want to create a DataFrame using the …
apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml
I'm evaluating tools for production ML-based applications and one of our options is Spark MLlib, but I have some …
apache-spark machine-learning apache-spark-mllib
I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when …
scala apache-spark apache-spark-mllib
I am importing a CSV file (using spark-csv) that has empty String values into a DataFrame. When I apply the OneHotEncoder, …
scala apache-spark apache-spark-mllib apache-spark-ml spark-csv
I'm trying to perform a logistic regression (LogisticRegressionWithLBFGS) with Spark MLlib (with Scala) on a dataset which contains categorical variables. …
scala apache-spark bigdata apache-spark-mllib categorical-data
I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do …
apache-spark pyspark random-forest apache-spark-mllib
I have a DenseVector RDD like this:
>>> frequencyDenseVectors.collect()
[DenseVector([1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]),
 DenseVector([1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
 DenseVector([1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
 DenseVector([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0])]
I want to convert …
apache-spark pyspark apache-spark-mllib apache-spark-ml apache-spark-2.0
I'm using RandomForest.featureImportances but I don't understand the output result. I have 12 features, and this is the output I …
apache-spark classification random-forest apache-spark-mllib