How to print the contents of an RDD?

blue-sky · Apr 19, 2014 · Viewed 246.1k times

I'm attempting to print the contents of a collection to the Spark console.

I have a type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]

And I use the command:

scala> linesWithSessionId.map(line => println(line))

But this is printed:

res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at <console>:19

How can I write the RDD to console or save it to disk so I can view its contents?

Answer

Oussama · Apr 24, 2014

If you want to view the contents of an RDD, one way is to use collect(), which returns all of its elements to the driver as an array:

myRDD.collect().foreach(println)

That's not a good idea, though, when the RDD has billions of lines, because collect() loads every element into the driver's memory. Use take() to fetch just the first n elements and print those:

myRDD.take(n).foreach(println)
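
For context on why the command in the question printed nothing: map is a lazy transformation, so the shell only reports the new RDD[Unit] and never runs println until an action is called. The sketch below puts the pieces together as a small standalone program. It assumes Spark is on the classpath and runs in local mode; the object name PrintRddSketch, the helper preview, the sample data, and the temporary output directory are all hypothetical and chosen only for illustration. In spark-shell you would skip the context setup and use the existing sc.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PrintRddSketch {

  // Hypothetical helper: fetch at most n elements to the driver.
  def preview(rdd: RDD[String], n: Int): Array[String] = rdd.take(n)

  def main(args: Array[String]): Unit = {
    // Local-mode context for illustration only (assumption).
    val conf = new SparkConf().setAppName("print-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for linesWithSessionId from the question.
      val lines: RDD[String] =
        sc.parallelize(Seq("session-1 a", "session-2 b", "session-3 c"))

      // Transformation only: builds an RDD[Unit], nothing is printed here.
      val lazyPrints: RDD[Unit] = lines.map(line => println(line))

      // Action: brings the first two elements to the driver, then prints them.
      preview(lines, 2).foreach(println)

      // Action: brings every element to the driver. Only safe for small RDDs.
      lines.collect().foreach(println)

      // Writing to disk: creates a directory of part files, one per partition.
      // The target directory must not exist yet.
      val outDir = Files.createTempDirectory("rdd-sketch").resolve("out").toString
      lines.saveAsTextFile(outDir)
      println(s"Wrote part files under $outDir")
    } finally {
      sc.stop()
    }
  }
}
```

Note that calling myRDD.foreach(println) directly also counts as an action, but on a cluster the output goes to each executor's stdout rather than to the driver console, which is why the take() and collect() forms are the ones to use for inspecting data from the shell.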