I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but there are no logs after execution.
The following command is used to run a Spark example, but the logs are not found in the history server as they would be for a normal MapReduce job.
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.2.1.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client --jar ./spark-example-1.0.0.jar \
--class SimpleApp --args yarn-standalone --num-workers 3 --master-memory 1g \
--worker-memory 1g --worker-cores 1
Where can I find the logs/stderr/stdout?
Is there somewhere to set the configuration? I did find this output on the console:
14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
In this line, notice 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
Where can LOG_DIR be set?
You can access the logs with the following command:
yarn logs -applicationId <application ID> [OPTIONS]
General options are:

-appOwner <Application Owner>    AppOwner (assumed to be current user if not specified)
-containerId <Container ID>      ContainerId (must be specified if node address is specified)
-nodeAddress <Node Address>      NodeAddress in the format nodename:port (must be specified if container id is specified)

Examples:

yarn logs -applicationId application_1414530900704_0003

// when the application owner is not the current user
yarn logs -applicationId application_1414530900704_0003 -appOwner myuserid
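Note that yarn logs only retrieves logs after the application has finished, and only if log aggregation is turned on; otherwise the stdout/stderr files stay on the individual NodeManager machines. The <LOG_DIR> placeholder in the launch command is substituted by YARN with the per-container log directory, which is derived from yarn.nodemanager.log-dirs, so it is configured in yarn-site.xml rather than in Spark. A minimal sketch, assuming standard YARN property names (the directory paths are example values to adapt to your cluster):

<!-- yarn-site.xml -->
<!-- enable log aggregation so "yarn logs" can fetch container logs
     after the application finishes -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<!-- local NodeManager directory for container logs; <LOG_DIR> expands to a
     per-application/per-container subdirectory of this (example path) -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/var/log/hadoop-yarn/containers</value>
</property>
<!-- HDFS directory the aggregated logs are moved to (example path) -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>

If you don't know the application ID, you can list applications with yarn application -list, or copy it from the ResourceManager web UI.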