I'm attempting to print the contents of a collection to the Spark console.
I have a variable of type:
linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]
And I use the command:
scala> linesWithSessionId.map(line => println(line))
But this is what gets printed:
res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at <console>:19
How can I write the RDD to console or save it to disk so I can view its contents?
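Your map call printed nothing because map is a lazy transformation: it only defines a new RDD[Unit], and no work runs until an action is called. If you want to view the contents of an RDD, one way is to use the collect() action: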
myRDD.collect().foreach(println)
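A minimal self-contained sketch of the difference, assuming spark-core (Spark 3.x, Scala 2.12 or 2.13) is on the classpath and Spark runs in local mode. The object name, app name and sample lines are hypothetical stand-ins for linesWithSessionId.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectDemo {
  // collect() is an action: it runs the job and returns every element to the driver
  def collectAll(rdd: RDD[String]): Array[String] = rdd.collect()

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; "collect-demo" is a hypothetical app name
    val sc = new SparkContext(new SparkConf().setAppName("collect-demo").setMaster("local[*]"))
    try {
      // Hypothetical sample data standing in for linesWithSessionId
      val lines = sc.parallelize(Seq("sessionId=1 GET /", "sessionId=2 GET /cart"))

      // map is a lazy transformation: this only defines an RDD[Unit], no job runs
      val unused = lines.map(line => println(line))

      // Here println runs on the driver, so the lines appear in this console
      collectAll(lines).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the println in the last step runs on the driver, the output shows up in your console even on a cluster, where a println inside map or foreach would instead go to each executor's stdout.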
That's not a good idea, though, when the RDD has billions of lines, because collect() pulls every element into the driver's memory. Use take() to fetch just the first few and print those:
myRDD.take(n).foreach(println)
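A sketch covering both halves of the question (viewing a sample and saving to disk), again assuming spark-core on the classpath and local mode. preview and saveLines are hypothetical helper names; take(n) and saveAsTextFile(path) are the RDD methods doing the work. The output path is a made-up location inside a fresh temp directory, since saveAsTextFile fails if the target directory already exists.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TakeAndSaveDemo {
  // Returns at most n elements to the driver, so it is safe on very large RDDs
  def preview(rdd: RDD[String], n: Int): Array[String] = rdd.take(n)

  // Writes the RDD as a directory of text files (one part-NNNNN file per partition).
  // Spark will not overwrite: dir must not exist yet.
  def saveLines(rdd: RDD[String], dir: String): Unit = rdd.saveAsTextFile(dir)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("take-demo").setMaster("local[*]"))
    try {
      // Hypothetical data: 1000 fake session lines
      val lines = sc.parallelize((1 to 1000).map(i => s"sessionId=$i"))

      // Print only a handful of lines instead of the whole RDD
      preview(lines, 5).foreach(println)

      // Hypothetical output location: a not-yet-existing "lines" dir under a new temp dir
      val out = Files.createTempDirectory("rdd-out").resolve("lines").toString
      saveLines(lines, out)
      println(s"Saved to $out")
    } finally {
      sc.stop()
    }
  }
}
```

The saved part files are plain text, one element per line, so you can inspect them with any text tool or read them back with sc.textFile(out).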