Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce.
In PySpark, I can create an RDD from a list and decide how many partitions to have: sc = SparkContext() sc.…
Tags: performance, apache-spark, pyspark, rdd
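A minimal PySpark sketch of this, assuming a local context (the app name and data here are illustrative); the numSlices argument to parallelize sets the partition count:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "partition-demo")
    # numSlices controls how many partitions the list is split into
    rdd = sc.parallelize(range(100), numSlices=4)
    print(rdd.getNumPartitions())  # 4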
I'm just wondering what the difference is between an RDD and a DataFrame (in Spark 2.0.0, DataFrame is a mere type alias for …
Tags: apache-spark, apache-spark-sql, rdd, apache-spark-dataset
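In broad terms, a DataFrame is data plus a schema that Spark's optimizer can reason about, while an RDD is a lower-level collection of opaque objects. A small sketch of moving between the two (column names are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-vs-df").getOrCreate()
    rdd = spark.sparkContext.parallelize([(1, "a"), (2, "b")])
    # attaching a schema turns the RDD into a DataFrame
    df = spark.createDataFrame(rdd, ["id", "letter"])
    # going back yields an RDD of Row objects, not the original tuples
    rows = df.rdd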
I want to share this particular Apache Spark with Python solution because the documentation for it is quite poor. I wanted …
Tags: python, apache-spark, aggregate, average, rdd
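The question is truncated, but judging by the tags a common version of this problem is computing an average with aggregate(); a hedged sketch, assuming numeric input:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "avg-demo")
    nums = sc.parallelize([1.0, 2.0, 3.0, 4.0])
    # carry a (sum, count) pair: seqOp folds one value into a partial result,
    # combOp merges partial results from different partitions
    total, count = nums.aggregate(
        (0.0, 0),
        lambda acc, v: (acc[0] + v, acc[1] + 1),
        lambda a, b: (a[0] + b[0], a[1] + b[1]),
    )
    print(total / count)  # 2.5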
I have a very big pyspark.sql.dataframe.DataFrame named df. I need some way of enumerating records, thus being …
Tags: python, apache-spark, bigdata, pyspark, rdd
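One route that is often suggested, sketched under the assumption that "enumerating" means attaching a stable row index, is zipWithIndex on the underlying RDD:

    from pyspark.sql import SparkSession, Row

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a",), ("b",), ("c",)], ["value"])
    # zipWithIndex pairs each Row with a 0-based index
    indexed = df.rdd.zipWithIndex().map(
        lambda pair: Row(idx=pair[1], **pair[0].asDict())
    )
    spark.createDataFrame(indexed).show()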
Why does the rdd.sample() function on a Spark RDD return a different number of elements even though the fraction parameter …
Tags: apache-spark, sample, rdd
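The short answer is that fraction is a per-element sampling probability, not an exact share, so the returned count varies from run to run; a sketch that makes this visible:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "sample-demo")
    rdd = sc.parallelize(range(1000))
    # Bernoulli sampling: each element is kept with probability 0.1,
    # so the count differs across seeds
    for seed in (1, 2, 3):
        print(rdd.sample(False, 0.1, seed).count())
    # takeSample returns an exact number of elements when that is required
    exact = rdd.takeSample(False, 100)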
I have the following Spark job, trying to keep everything in memory: val myOutRDD = myInRDD.flatMap { fp => val tuple2…
Tags: apache-spark, shuffle, rdd, persist
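The original job is in Scala; the same caching idea sketched in PySpark, assuming the goal is to avoid recomputing a shuffled RDD that more than one action depends on:

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext("local[*]", "persist-demo")
    out = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))
    # persist before the first action so later actions reuse the partitions
    out.persist(StorageLevel.MEMORY_ONLY)
    print(out.reduceByKey(lambda a, b: a + b).count())
    print(out.count())  # served from cache, not recomputed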
I have an RDD declared as JavaPairRDD&lt;String, List&lt;String&gt;&gt; existingRDD; Now I need to initialize this …
Tags: java, apache-spark, rdd
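The question is in Java and truncated, but if the goal is starting from an empty pair RDD and filling it later, the analogous move in PySpark is emptyRDD() plus union (the key and values here are illustrative):

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "empty-rdd-demo")
    # start from an empty RDD (conceptually RDD[(str, list[str])])
    existing = sc.emptyRDD()
    existing = existing.union(sc.parallelize([("key", ["v1", "v2"])]))
    print(existing.collect())  # [('key', ['v1', 'v2'])]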
I used to think that rdd.take(1) and rdd.first() are exactly the same. However, I began to wonder if …
Tags: apache-spark, pyspark, rdd
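They differ at least on an empty RDD, which a quick check makes visible; a sketch:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "take-vs-first")
    empty = sc.parallelize([])
    print(empty.take(1))  # [] -- take returns a (possibly empty) list
    try:
        empty.first()     # first raises on an empty RDD
    except ValueError as e:
        print(e)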
I've been playing around with converting RDDs to DataFrames and back again. First, I had an RDD of type (Int, …
Tags: scala, apache-spark, dataframe, rdd
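The question works in Scala; the round trip sketched in PySpark, noting that coming back from a DataFrame yields Row objects rather than the original tuples (column names are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    rdd = spark.sparkContext.parallelize([(1, "a"), (2, "b")])
    df = spark.createDataFrame(rdd, ["num", "letter"])
    # df.rdd is an RDD of Rows; map back to plain tuples explicitly
    tuples = df.rdd.map(tuple)
    print(tuples.collect())  # [(1, 'a'), (2, 'b')]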
Let us say I have the following two RDDs, with the following key-value pairs. rdd1 = [ (key1, [value1, value2]), (key2, [value3, …
Tags: python, scala, apache-spark, rdd
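Assuming the goal is to combine the two RDDs by key, join pairs up the values of matching keys; a sketch reusing the names from the question, with made-up values where the text is truncated:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "join-demo")
    rdd1 = sc.parallelize([("key1", ["value1", "value2"]), ("key2", ["value3"])])
    rdd2 = sc.parallelize([("key1", ["value7"])])
    # join keeps keys present in both RDDs and pairs their values
    print(rdd1.join(rdd2).collect())
    # [('key1', (['value1', 'value2'], ['value7']))]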