What is yarn-client mode in Spark?

zxz picture zxz · Dec 27, 2013 · Viewed 61.9k times · Source

Apache Spark has recently updated the version to 0.8.1, in which yarn-client mode is available. My question is, what does yarn-client mode really mean? In the documentation it says:

With yarn-client mode, the application will be launched locally. Just like running application or spark-shell on Local / Mesos / Standalone mode. The launch method is also the similar with them, just make sure that when you need to specify a master url, use “yarn-client” instead

What does it mean "launched locally"? Locally where? On the Spark cluster?
What is the specific difference from the yarn-standalone mode?

Answer

ben jarman picture ben jarman · May 12, 2015

So in spark you have two different components. There is the driver and the workers. In yarn-cluster mode the driver is running remotely on a data node and the workers are running on separate data nodes. In yarn-client mode the driver is on the machine that started the job and the workers are on the data nodes. In local mode the driver and workers are on the machine that started the job.

When you run .collect() the data from the worker nodes get pulled into the driver. It's basically where the final bit of processing happens.

For my self i have found yarn-cluster mode to be better when i'm at home on the vpn, but yarn-client mode is better when i'm running code from within the data center.

Yarn-client mode also means you tie up one less worker node for the driver.