I can't understand reduceByKey(_ + _) in the first example of spark with scala
object WordCount {
def main(args: Array[String]): Unit = {
val inputPath = args(0)
val outputPath = args(1)
val sc = new SparkContext()
val lines = sc.textFile(inputPath)
val wordCounts = lines.flatMap {line => line.split(" ")}
.map(word => (word, 1))
.reduceByKey(_ + _) **I cant't understand this line**
wordCounts.saveAsTextFile(outputPath)
}
}
Reduce takes two elements and produce a third after applying a function to the two parameters.
The code you shown is equivalent to the the following
reduceByKey((x,y)=> x + y)
Instead of defining dummy variables and write a lambda, Scala is smart enough to figure out that what you trying achieve is applying a func
(sum in this case) on any two parameters it receives and hence the syntax
reduceByKey(_ + _)