I have had a Spark job failing with a trace like this one:
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container id: container_1455622885057_0016_01_000008
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Exit code: 52
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr:Stack trace: ExitCodeException exitCode=52:
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at org.apache.hadoop.util.Shell.run(Shell.java:456)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at java.util.concurrent.FutureTask.run(FutureTask.java:262)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr- at java.lang.Thread.run(Thread.java:745)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container exited with a non-zero exit code 52
It took me a while to figure out what "exit code 52" means, so I'm putting this up here for the benefit of others who might be searching for it. The exit code 52 comes from org.apache.spark.util.SparkExitCode, where it is defined as val OOM = 52, i.e. an OutOfMemoryError.
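For what it's worth, the way the OutOfMemoryError turns into exit code 52 is that the executor's uncaught exception handler maps it to SparkExitCode.OOM and calls System.exit, and YARN then reports that value as the container exit code. Here is a rough sketch of that logic, paraphrased from my reading of the Spark 1.6 sources (so anything beyond OOM = 52 is from memory, not quoted verbatim):

// Sketch only, not a verbatim copy of the Spark sources.
// The constants live in org.apache.spark.util.SparkExitCode:
object SparkExitCode {
  val UNCAUGHT_EXCEPTION = 50        // any uncaught exception
  val UNCAUGHT_EXCEPTION_TWICE = 51  // a second exception while handling the first
  val OOM = 52                       // the uncaught exception was an OutOfMemoryError
}

// org.apache.spark.util.SparkUncaughtExceptionHandler is installed on executor threads;
// when an OutOfMemoryError reaches it, the JVM exits with SparkExitCode.OOM (52),
// which is what shows up in the YARN container status above.
object SparkUncaughtExceptionHandler extends Thread.UncaughtExceptionHandler {
  override def uncaughtException(thread: Thread, exception: Throwable): Unit = {
    exception match {
      case _: OutOfMemoryError => System.exit(SparkExitCode.OOM)
      case _                   => System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)
    }
  }
}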
That makes sense, since I also find this in the container logs:
16/02/16 17:09:59 ERROR executor.Executor: Managed memory leak detected; size = 4823704883 bytes, TID = 3226
16/02/16 17:09:59 ERROR executor.Executor: Exception in task 26.0 in stage 2.0 (TID 3226)
java.lang.OutOfMemoryError: Unable to acquire 1248 bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
(Note that I'm not really sure at this point whether the problem is in my code or due to the Tungsten memory leaks, but that's a different issue.)