I know the method rdd.first(), which gives me the first element in an RDD.
Also, there is the method rdd.take(num), which gives me the first "num" elements.
But isn't there a way to get an element by index?
Thanks.
This should be possible by first indexing the RDD. The transformation zipWithIndex
provides a stable indexing, numbering each element in its original order.
Given: rdd = (a,b,c)
val withIndex = rdd.zipWithIndex // ((a,0),(b,1),(c,2))
To look up an element by index, this form is not useful. First we need to use the index as the key:
val indexKey = withIndex.map{case (k,v) => (v,k)} //((0,a),(1,b),(2,c))
Now it's possible to use the lookup action in PairRDD to find an element by key:
val b = indexKey.lookup(1) // Array(b)
If you expect to use lookup often on the same RDD, I'd recommend caching the indexKey RDD to improve performance.
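Putting the steps together, here is a minimal sketch of the whole approach, assuming a local SparkContext. The object name IndexLookupSketch, the helper indexByPosition, and the app name are hypothetical and only serve the illustration; zipWithIndex, map, cache and lookup are the calls described above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object IndexLookupSketch {

  // Hypothetical helper: pair every element with its position and
  // swap the tuple so that the position becomes the key.
  def indexByPosition[T](rdd: RDD[T]): RDD[(Long, T)] =
    rdd.zipWithIndex.map { case (value, index) => (index, value) }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying this out.
    val conf = new SparkConf()
      .setAppName("index-lookup-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))

      // Cache the keyed RDD because it is queried more than once below.
      val indexKey = indexByPosition(rdd).cache()

      // lookup returns a sequence of all values stored under the key.
      // Indices are unique here, so headOption yields at most one element.
      val second = indexKey.lookup(1L).headOption
      val missing = indexKey.lookup(10L).headOption

      println(s"index 1 -> $second, index 10 -> $missing")
    } finally {
      sc.stop()
    }
  }
}
```

lookup returns an empty sequence when the key is absent, so taking headOption avoids an exception for an out-of-range index. Note that the indices produced by zipWithIndex are of type Long, and that each lookup runs a job over the RDD, which is why caching matters for repeated queries.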
How to do this using the Java API is an exercise left for the reader.