What does "Stage Skipped" mean in Apache Spark web UI?

Aravind Yarram · Jan 3, 2016 · Viewed 22k times

This is from my Spark UI. What does "skipped" mean?


Answer

zero323 · Jan 3, 2016

Typically it means that the data was fetched from cache, so there was no need to re-execute the given stage. This is consistent with your DAG, which shows that the next stage requires a shuffle (reduceByKey). Whenever a shuffle is involved, Spark automatically caches the generated data as shuffle files:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don’t need to be re-created if the lineage is re-computed.
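
To make the reuse concrete, here is a toy model in plain Python of why a stage can be reported as skipped. It is only a sketch and not Spark's scheduler: the names `shuffle_files`, `stage_log`, `map_stage`, `reduce_stage` and `run_job` are all hypothetical, and an in-memory dict stands in for the shuffle files on disk. In real Spark the same pattern typically appears when two actions are run on the same RDD produced by reduceByKey.

```python
# Toy model only: NOT Spark's scheduler. All names here are hypothetical.
from collections import defaultdict
from functools import reduce

shuffle_files = {}  # shuffle id -> map-side output kept around between jobs
stage_log = []      # (job number, stage name, "ran" or "skipped")


def map_stage(job, shuffle_id, records):
    """Group records by key, unless this shuffle's output already exists."""
    if shuffle_id in shuffle_files:
        # Output is still available, so the stage does not need to run again.
        stage_log.append((job, "map", "skipped"))
        return shuffle_files[shuffle_id]
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    shuffle_files[shuffle_id] = dict(grouped)
    stage_log.append((job, "map", "ran"))
    return shuffle_files[shuffle_id]


def reduce_stage(job, shuffled, func):
    """Combine the values of each key, like the stage after reduceByKey."""
    stage_log.append((job, "reduce", "ran"))
    return {key: reduce(func, values) for key, values in shuffled.items()}


def run_job(job, records):
    """Run one job made of a map stage and a reduce stage."""
    shuffled = map_stage(job, shuffle_id=0, records=records)
    return reduce_stage(job, shuffled, lambda a, b: a + b)


records = [("a", 1), ("b", 2), ("a", 3)]
first = run_job(1, records)   # both stages run
second = run_job(2, records)  # map stage is skipped, its output is reused

for entry in stage_log:
    print(entry)
```

The second job reports its map stage as skipped because the output for that shuffle id is still present, which mirrors what the UI shows when Spark finds the shuffle files of an earlier job.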