Convert an RDD to iterable: PySpark?

pg2455 · Sep 25, 2015 · Viewed 16.9k times

I have an RDD which I am creating by loading a text file and preprocessing it. I don't want to collect the entire dataset and hold it on disk or in memory; instead, I want to pass it to some other function in Python that consumes the data one element at a time, in the form of an iterable.

How is this possible?

data = sc.textFile('file.txt').map(lambda x: some_func(x))

an_iterable = data.  ## what should I do here to make it give me one element at a time?

def model1(an_iterable):
    for i in an_iterable:
        do_that(i)

model1(an_iterable)

Answer

danf1024 · Sep 25, 2015

I believe what you want is toLocalIterator():
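A minimal sketch, reusing sc, some_func, do_that, and model1 from the question (those are the asker's own names, not shown here). RDD.toLocalIterator() returns an iterator over the RDD's elements, pulling data to the driver one partition at a time, so the driver only ever needs enough memory for the largest partition rather than the whole dataset:

from pyspark import SparkContext

sc = SparkContext(appName='iterate_rdd')

# Load and preprocess exactly as in the question.
data = sc.textFile('file.txt').map(lambda x: some_func(x))

def model1(an_iterable):
    for i in an_iterable:
        do_that(i)

# toLocalIterator() streams the RDD back partition by partition,
# yielding one element at a time instead of collecting everything at once.
model1(data.toLocalIterator())

Note that iterating this way triggers the Spark job lazily as partitions are requested, so it can be slower than collect() for small data, but it avoids materializing the entire RDD on the driver.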