Why does a Spark job fail with "Exit code: 52"?

Virgil · Feb 17, 2016

I have had a Spark job fail with a trace like this one:

./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container id: container_1455622885057_0016_01_000008
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Exit code: 52
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr:Stack trace: ExitCodeException exitCode=52: 
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell.run(Shell.java:456)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.lang.Thread.run(Thread.java:745)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container exited with a non-zero exit code 52

It took me a while to figure out what "exit code 52" means, so I'm putting this up here for the benefit of others who might be searching for it.

Answer

Virgil · Feb 17, 2016

Exit code 52 comes from org.apache.spark.util.SparkExitCode, where it is defined as val OOM = 52, i.e. an OutOfMemoryError (the relevant source snippet is shown after the log excerpt below). That makes sense, since I also found this in the container logs:

16/02/16 17:09:59 ERROR executor.Executor: Managed memory leak detected; size = 4823704883 bytes, TID = 3226
16/02/16 17:09:59 ERROR executor.Executor: Exception in task 26.0 in stage 2.0 (TID 3226)
java.lang.OutOfMemoryError: Unable to acquire 1248 bytes of memory, got 0
        at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
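
For reference, the relevant constants in org.apache.spark.util.SparkExitCode look roughly like this in the Spark 1.6-era source (comments lightly abridged):

// org.apache.spark.util.SparkExitCode (Spark source, abridged)
private[spark] object SparkExitCode {
  /** The default uncaught exception handler was reached. */
  val UNCAUGHT_EXCEPTION = 50

  /** The default uncaught exception handler was called, and an exception was
   *  encountered while logging the original exception. */
  val UNCAUGHT_EXCEPTION_TWICE = 51

  /** The default uncaught exception handler was reached, and the uncaught
   *  exception was an OutOfMemoryError. */
  val OOM = 52
}

So when an executor JVM dies with an OutOfMemoryError, the uncaught exception handler exits the process with status 52, which YARN then reports as the container exit code.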

(Note that at this point I'm not sure whether the problem is in my own code or in a Tungsten memory leak, but that's a separate issue.)
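
If you're hitting the same OutOfMemoryError, the usual first levers are the executor memory settings. A minimal sketch with illustrative values only (Spark 1.x property names; tune for your own job):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // More heap per executor (same effect as --executor-memory on spark-submit).
  .set("spark.executor.memory", "8g")
  // Extra off-heap headroom for the YARN container, in MB (Spark 1.x name).
  .set("spark.yarn.executor.memoryOverhead", "1024")

Running fewer concurrent tasks per executor (lower spark.executor.cores) or using more, smaller partitions can also relieve per-task memory pressure.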