Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce.
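As a concrete illustration of that abstraction, a minimal sketch in a spark-shell session (where sc, the SparkContext, is predefined): an RDD is defined lazily by transformations, cached in memory, and only materialized when an action runs.

    // Build a distributed dataset from a local range, split into 8 partitions
    val nums = sc.parallelize(1 to 1000000, 8)
    // Transformations are lazy: this only records the lineage
    val squares = nums.map(n => n.toLong * n)
    // Keep the computed partitions in memory for reuse
    squares.cache()
    // An action triggers the computation; lost partitions are rebuilt from lineage
    val total = squares.reduce(_ + _)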
Reading the Spark method sortByKey: sortByKey([ascending], [numTasks]). When called on a dataset of (K, V) pairs where K implements Ordered, …
Tags: scala, apache-spark, rdd
I'd like to select a range of elements in a Spark RDD. For example, I have an RDD with a …
Tags: apache-spark, rdd
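There is no direct slice operator on an RDD; one common approach, sketched below under the assumption that element order is meaningful, is zipWithIndex plus a filter:

    val data = sc.parallelize('a' to 'z')
    // Pair every element with a 0-based index, keep indices 10 to 19, drop the index
    val slice = data.zipWithIndex()
      .filter { case (_, i) => i >= 10 && i < 20 }
      .map(_._1)
    slice.collect()   // Array(k, l, ..., t)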
I am looking for a better explanation of the aggregate functionality that is available via Spark in Python. The example …
Tags: python, apache-spark, lambda, aggregate, rdd
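The same operation exists in the Scala API as aggregate(zeroValue)(seqOp, combOp); a minimal sketch computing a mean as a (sum, count) pair:

    val nums = sc.parallelize(1 to 10)
    // seqOp folds one element into a per-partition accumulator;
    // combOp merges accumulators coming from different partitions
    val (sum, count) = nums.aggregate((0, 0))(
      (acc, n) => (acc._1 + n, acc._2 + 1),
      (a, b)   => (a._1 + b._1, a._2 + b._2)
    )
    val mean = sum.toDouble / count   // 5.5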
I need to generate a full list of row_numbers for a data table with many columns. In SQL, this …
Tags: sql, apache-spark, row-number, rdd
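On the RDD side, one way to get row numbers without a SQL window function is zipWithIndex, sketched here on a hypothetical three-row table:

    // Hypothetical rows standing in for the real table
    val rows = sc.parallelize(Seq(("a", 1.0), ("b", 2.0), ("c", 3.0)))
    // zipWithIndex assigns 0-based indices following partition order
    val numbered = rows.zipWithIndex().map { case ((k, v), i) => (i + 1, k, v) }
    numbered.collect()   // Array((1,a,1.0), (2,b,2.0), (3,c,3.0))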
I read up on the documentation of HashPartitioner. Unfortunately nothing much was explained except for the API calls. I am …
Tags: scala, apache-spark, rdd, partitioning
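A small experiment that makes the behavior visible, assuming a spark-shell session: HashPartitioner sends a key to partition key.hashCode modulo numPartitions (taken non-negative).

    import org.apache.spark.HashPartitioner

    val pairs = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"), (11, "d")))
    val parted = pairs.partitionBy(new HashPartitioner(4))
    // For Int keys, hashCode is the value itself, so keys 3 and 11 both
    // land in partition 3 (3 % 4 == 11 % 4)
    parted.mapPartitionsWithIndex((idx, it) => it.map(kv => (idx, kv)))
          .collect()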
Assume df1 and df2 are two DataFrames in Apache Spark, computed using two different mechanisms, e.g., Spark SQL vs. …
Tags: scala, apache-spark, dataframe, apache-spark-sql, rdd
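One sketch of an order-insensitive comparison, assuming a spark-shell session where spark (the SparkSession) is predefined; note that except is set-based, so exact duplicate handling would need per-row counts:

    import spark.implicits._

    val df1 = Seq((1, "a"), (2, "b")).toDF("id", "v")
    val df2 = Seq((2, "b"), (1, "a")).toDF("id", "v")
    // Same distinct rows in both directions and same size; row order is ignored
    val same =
      df1.count() == df2.count() &&
      df1.except(df2).count() == 0 &&
      df2.except(df1).count() == 0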
The definition says: an RDD is an immutable distributed collection of objects. I don't quite understand what that means. Is it like …
Tags: scala, hadoop, apache-spark, rdd
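Immutability here means a transformation never modifies an existing RDD; it defines a new one. A minimal sketch:

    val base = sc.parallelize(Seq(1, 2, 3))
    val doubled = base.map(_ * 2)   // a brand-new RDD; base is untouched
    base.collect()      // Array(1, 2, 3)
    doubled.collect()   // Array(2, 4, 6)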
I have an RDD with structure RDD[(String, String)] and I want to create 2 Lists (one for each dimension of the …
Tags: scala, list, apache-spark, rdd
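Two ways this is commonly done, sketched on a toy pair RDD; both pull data to the driver, which is only safe for small results:

    val rdd = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))
    // Option 1: project each dimension on the cluster, collect separately
    val firsts  = rdd.keys.collect().toList
    val seconds = rdd.values.collect().toList
    // Option 2: one collect, then unzip locally
    val (lefts, rights) = rdd.collect().toList.unzip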
Given that the HashPartitioner docs say: [HashPartitioner] implements hash-based partitioning using Java's Object.hashCode. Say I want to partition DeviceData …
Tags: scala, apache-spark, rdd
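Since HashPartitioner calls hashCode on the whole key, partitioning by one field of a hypothetical DeviceData class usually means writing a custom Partitioner, sketched here:

    import org.apache.spark.Partitioner

    // Hypothetical key type, mirroring the question
    case class DeviceData(deviceId: String, payload: String)

    // Routes every record with the same deviceId to the same partition
    class DeviceIdPartitioner(override val numPartitions: Int) extends Partitioner {
      def getPartition(key: Any): Int = key match {
        case d: DeviceData =>
          val h = d.deviceId.hashCode % numPartitions
          if (h < 0) h + numPartitions else h
        case _ => 0
      }
    }

    val data = sc.parallelize(Seq(
      (DeviceData("dev-1", "x"), 1),
      (DeviceData("dev-1", "y"), 2),
      (DeviceData("dev-2", "z"), 3)
    ))
    val byDevice = data.partitionBy(new DeviceIdPartitioner(4))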
scala apache-spark rddI have a data frame with following type col1|col2|col3|col4 xxxx|yyyy|zzzz|[1111],[2222] I want my output to …
python apache-spark pyspark rdd
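A sketch in the Scala DataFrame API (the PySpark equivalent is one-to-one), assuming spark is the predefined SparkSession: strip the brackets, split on the comma, and explode into one row per value.

    import org.apache.spark.sql.functions.{col, explode, regexp_replace, split}
    import spark.implicits._

    val df = Seq(("xxxx", "yyyy", "zzzz", "[1111],[2222]"))
      .toDF("col1", "col2", "col3", "col4")
    // "[1111],[2222]" -> Seq("1111", "2222") -> one output row per element
    val exploded = df.withColumn("col4",
      explode(split(regexp_replace(col("col4"), "[\\[\\]]", ""), ",")))
    exploded.show()
    // col4 now holds 1111 in one row and 2222 in another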