I have a huge file in HDFS containing time series data points (Yahoo stock prices).
I want to find the moving average of the time series. How do I go about writing an Apache Spark job to do that?
You can use the sliding function from MLlib, which probably does the same thing as Daniel's answer. You will have to sort the data by time before using the sliding function.
import org.apache.spark.mllib.rdd.RDDFunctions._

sc.parallelize(1 to 100, 10)
  .sliding(3)                                              // windows of 3 consecutive elements
  .map(curSlice => curSlice.sum.toDouble / curSlice.size)  // mean of each window
  .collect()
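
For the actual stock data from HDFS, a minimal sketch might look like the following. It assumes each input line is "timestamp,closePrice" (a hypothetical layout), uses a placeholder HDFS path, and picks an arbitrary 5-point window; adjust these to your data.

import org.apache.spark.mllib.rdd.RDDFunctions._

// Placeholder path and assumed "timestamp,closePrice" line format
val prices = sc.textFile("hdfs:///path/to/yahoo_prices.csv")
  .map(_.split(","))
  .map(cols => (cols(0).toLong, cols(1).toDouble))  // (timestamp, close)
  .sortByKey()                                      // sliding requires time-ordered data

val movingAvg = prices
  .map(_._2)              // keep only the price
  .sliding(5)             // 5-point moving window (arbitrary choice)
  .map(window => window.sum / window.size)

movingAvg.take(10).foreach(println)

Note that sliding pulls in elements across partition boundaries for you, so you don't have to handle the window edges between partitions yourself.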