Top "RDD" questions

Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that allows programmers to perform in-memory computations on large clusters while retaining the fault tolerance of data flow models like MapReduce.
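The fault-tolerance idea in this description can be sketched without a cluster. The block below is a toy model in plain Python, not Spark's API: `ToyRDD`, `evict`, and `compute` are hypothetical names made up for illustration. It assumes the usual lineage approach, in which a dataset remembers how it was derived from its parent so that lost in-memory partitions can be recomputed rather than restored from a replica. One simplification to keep in mind: this toy keeps every computed result in memory, whereas Spark keeps an RDD in memory only when asked to (for example with `cache()`).

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitions plus the lineage needed to rebuild them."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self._cache = partitions  # materialized partitions, or None if not in memory
        self._parent = parent     # upstream ToyRDD (None for a source dataset)
        self._fn = fn             # per-partition function applied to the parent's data

    def map(self, f):
        # Lazy: record the transformation; nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def compute(self):
        # Serve from memory when possible; otherwise replay the lineage.
        if self._cache is None:
            self._cache = [self._fn(p) for p in self._parent.compute()]
        return self._cache

    def collect(self):
        # Flatten all partitions into one list, in partition order.
        return [x for part in self.compute() for x in part]

    def evict(self):
        # Simulate losing the in-memory copy (e.g. a failed node).
        # A source dataset has no lineage to replay, so it is left untouched.
        if self._parent is not None:
            self._cache = None


source = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = source.map(lambda x: x * 2)

first = doubled.collect()
doubled.evict()              # drop the computed partitions
second = doubled.collect()   # rebuilt from `source` via the recorded map
```

After `evict()`, the second `collect()` yields the same values as the first because the recorded `map` is replayed against the parent's partitions.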

Is there an "Explain RDD" in Spark?

In particular, if I say rdd3 = rdd1.join(rdd2), then when I call rdd3.collect, depending on the Partitioner used, …

apache-spark rdd