Top "rdd" questions

Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce.

What is glom()? How is it different from mapPartitions()?

I've come across the glom() method on RDD. As per the documentation Return an RDD created by coalescing all elements …

apache-spark rdd
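One way to see the difference: `glom()` coalesces each partition into a single list, while `mapPartitions(f)` hands `f` an iterator over each partition and lets it yield any number of elements. A minimal sketch in plain Python (no Spark required), modeling an RDD as a list of partitions:

```python
# Conceptual sketch: model an RDD as a list of partitions, each partition
# being a list of elements. This illustrates semantics only, not Spark itself.
partitions = [[1, 2], [3, 4, 5], [6]]

# glom(): each partition is coalesced into a single list element, so the
# collected result is one list per partition.
glom_result = [list(part) for part in partitions]

# mapPartitions(f): f receives an iterator over one partition and may yield
# any number of output elements; here it yields one sum per partition.
def sum_partition(it):
    yield sum(it)

map_result = [x for part in partitions for x in sum_partition(iter(part))]

print(glom_result)  # [[1, 2], [3, 4, 5], [6]]
print(map_result)   # [3, 12, 6]
```

In Spark itself, `rdd.glom()` behaves like `rdd.mapPartitions(lambda it: [list(it)])`.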
How does Spark's RDD.randomSplit actually split the RDD?

So assume I've got an RDD with 3000 rows. The first 2000 rows are of class 1 and the last 1000 rows are of …

apache-spark rdd
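The key point is that `randomSplit` assigns each element to a split by an independent random draw against the normalized weights, so the resulting sizes are only approximately proportional, not exact. A plain-Python sketch of these semantics (the function name and seed here are illustrative, not Spark's implementation):

```python
import random

def random_split(rows, weights, seed=42):
    """Sketch of randomSplit-style semantics: each element is assigned to a
    split by one independent draw against the cumulative normalized weights,
    so split sizes are only approximately proportional to the weights."""
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        x = rng.random()
        for i, bound in enumerate(bounds):
            if x <= bound:
                splits[i].append(row)
                break
    return splits

train, test = random_split(list(range(3000)), [0.8, 0.2])
print(len(train), len(test))  # roughly 2400 / 600, not exactly
```

This is why a 2000/1000 class layout does not land cleanly on one side of the split: assignment is per element, not per contiguous block.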
Is Spark RDD cached on worker node or driver node (or both)?

Can anyone please correct my understanding of persisting in Spark? If we have performed a cache() on an RDD …

apache-spark apache-spark-sql rdd
Why does a Spark RDD partition have a 2GB limit for HDFS?

I get an error when using MLlib RandomForest to train data. As my dataset is huge and the default …

scala apache-spark rdd
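The limit comes from partition/shuffle blocks historically being backed by byte buffers indexed by a Java `int`, so no single partition could exceed `Integer.MAX_VALUE` bytes (~2GB). The usual workaround is to repartition so each partition stays well under that bound. A small sketch of the sizing arithmetic (the target size of 256 MB is an illustrative choice, not a Spark default):

```python
# Sketch: choose a partition count that keeps each partition well under the
# ~2GB (Integer.MAX_VALUE bytes) block limit. target_bytes is illustrative.
def partitions_needed(total_bytes, target_bytes=256 * 1024**2):
    """Ceiling division: how many partitions keep each under target_bytes."""
    return max(1, -(-total_bytes // target_bytes))

# e.g. a 10 GB dataset at ~256 MB per partition:
print(partitions_needed(10 * 1024**3))  # 40
# In Spark one would then call rdd.repartition(partitions_needed(estimated_size)).
```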
Random numbers generation in PySpark

Let's start with a simple function which always returns a random integer: import numpy as np def f(x): return …

python random apache-spark pyspark rdd
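The classic pitfall here is that identically seeded worker processes produce identical "random" sequences across partitions. A common fix is to derive a distinct seed from the partition index, as one would inside `mapPartitionsWithIndex`. A plain-Python sketch of that pattern (function name and base seed are illustrative):

```python
import random

# If every partition's worker seeds its RNG identically, all partitions
# draw the same sequence. Deriving a per-partition seed avoids this; in
# Spark this per-partition logic would live inside mapPartitionsWithIndex.
def random_ints_for_partition(index, n, base_seed=0):
    rng = random.Random(base_seed + index)  # distinct seed per partition
    return [rng.randint(0, 10**9) for _ in range(n)]

part0 = random_ints_for_partition(0, 5)
part1 = random_ints_for_partition(1, 5)
print(part0 != part1)  # different seeds, different sequences
```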
How to solve type mismatch when compiler finds Serializable instead of the match type?

I have the following parser to parse arithmetic expressions containing Float and RDD: import scalaz._ import Scalaz._ def term2: …

scala parsing rdd type-mismatch scalaz7
Apache Spark: What is the equivalent implementation of RDD.groupByKey() using RDD.aggregateByKey()?

The Apache Spark pyspark.RDD API docs mention that groupByKey() is inefficient. Instead, it is recommended to use reduceByKey(), aggregateByKey(), …

apache-spark rdd pyspark
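To express `groupByKey()` via `aggregateByKey(zeroValue, seqOp, combOp)`, the zero value is an empty list, `seqOp` appends a value within a partition, and `combOp` concatenates partial lists across partitions. A plain-Python sketch of those semantics (the helper function is illustrative, not Spark's implementation):

```python
from collections import defaultdict

# Sketch of aggregateByKey semantics over a list-of-partitions model:
# seq_op folds values into a per-key accumulator within each partition,
# comb_op merges per-partition accumulators across partitions.
def aggregate_by_key(partitions, zero, seq_op, comb_op):
    partials = []
    for part in partitions:                          # per-partition pass (seqOp)
        acc = defaultdict(lambda: list(zero))
        for k, v in part:
            acc[k] = seq_op(acc[k], v)
        partials.append(acc)
    merged = {}                                      # cross-partition merge (combOp)
    for acc in partials:
        for k, v in acc.items():
            merged[k] = comb_op(merged[k], v) if k in merged else v
    return merged

parts = [[("a", 1), ("b", 2)], [("a", 3)]]
grouped = aggregate_by_key(parts, [], lambda xs, v: xs + [v], lambda a, b: a + b)
print(grouped)  # {'a': [1, 3], 'b': [2]}
```

Note that grouping into full lists this way still shuffles every value, which is exactly why the docs steer you toward `reduceByKey`-style aggregation when a full list per key is not actually needed.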
Programmatically generate the schema AND the data for a dataframe in Apache Spark

I would like to dynamically generate a dataframe containing a header record for a report, so creating a dataframe from …

apache-spark dataframe spark-dataframe rdd spark-csv
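The shape of the usual answer: build the column names (and types) and the matching rows programmatically, then hand both to `spark.createDataFrame(rows, schema)`. A plain-Python sketch of generating that pair (all names here are illustrative):

```python
# Sketch: generate a schema (name, type) list and matching rows in the
# shape spark.createDataFrame(rows, schema) expects. Purely illustrative.
n_cols, n_rows = 3, 2
schema = [(f"col_{i}", "int") for i in range(n_cols)]   # (name, type) pairs
header = tuple(name for name, _ in schema)              # a header record for the report
data = [tuple(r * n_cols + c for c in range(n_cols)) for r in range(n_rows)]

print(schema)  # [('col_0', 'int'), ('col_1', 'int'), ('col_2', 'int')]
print(header)  # ('col_0', 'col_1', 'col_2')
print(data)    # [(0, 1, 2), (3, 4, 5)]
```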
Apache Spark RDD filter into two RDDs

I need to split an RDD into 2 parts: one part that satisfies a condition and another part that does not. I can …

apache-spark rdd
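Spark has no single-pass primitive that returns two RDDs from one predicate, so the standard answer is two `filter()` passes over a cached RDD (one with the predicate, one with its negation). Sketched in plain Python:

```python
# Sketch: the two-filter approach. In Spark one would cache() the source RDD
# first so the data is not recomputed for the second pass.
data = list(range(10))
pred = lambda x: x % 2 == 0

evens = [x for x in data if pred(x)]        # rdd.filter(pred)
odds = [x for x in data if not pred(x)]     # rdd.filter(lambda x: not pred(x))

print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```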
Spark RDD - is partition(s) always in RAM?

We all know Spark does its computation in memory. I am just curious about the following. If I create 10 RDDs in …

hadoop apache-spark pyspark hdfs rdd