Where are logs in Spark on YARN?

DeepNightTwo · Apr 14, 2014 · Viewed 79.8k times

I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but there are no logs after execution.

The following command is used to run a Spark example, but the logs cannot be found in the history server as they would be for a normal MapReduce job.

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.2.1.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client --jar ./spark-example-1.0.0.jar \
--class SimpleApp --args yarn-standalone  --num-workers 3 --master-memory 1g \
--worker-memory 1g --worker-cores 1

Where can I find the logs/stderr/stdout?

Is there somewhere to set the configuration? I did find output on the console saying:

14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

In this line, notice 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

Where can LOG_DIR be set?

Answer

MARK · Sep 28, 2014

You can access the logs through the command:

yarn logs -applicationId <application ID> [OPTIONS]

General options are:

  • -appOwner <Application Owner> - AppOwner (assumed to be the current user if not specified)
  • -containerId <Container ID> - ContainerId (must be specified if a node address is specified)
  • -nodeAddress <Node Address> - NodeAddress in the format nodename:port (must be specified if a container ID is specified)
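For instance, to fetch the log of a single container, pass both of those flags together. The container ID and node address below are placeholders; substitute the values from your own application:

yarn logs -applicationId application_1414530900704_0003 \
    -containerId container_1414530900704_0003_01_000001 \
    -nodeAddress nodename:port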

Examples:

yarn logs -applicationId application_1414530900704_0003

# when the application was submitted by a different user (the user ids differ)
yarn logs -applicationId application_1414530900704_0003 -appOwner myuserid
yarn logs -applicationId <appid> -appOwner <userid>
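Note that yarn logs typically works only after the application has finished, and only if log aggregation is enabled on the cluster; otherwise the per-container stdout/stderr files remain on the individual NodeManager hosts. <LOG_DIR> itself is not set per job: the NodeManager expands it to the container's log directory under yarn.nodemanager.log-dirs. A minimal yarn-site.xml sketch (the property names are standard YARN settings; the directory values are just examples):

<!-- enable log aggregation so "yarn logs" can fetch container logs -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<!-- local log directory on each NodeManager; <LOG_DIR> expands to a per-container subdirectory of this -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/var/log/hadoop-yarn/containers</value>
</property>

<!-- HDFS directory where aggregated logs are stored after the application finishes -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>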