Remote Post-mortem coredump analysis without having exact debug symbols for shared system libraries

user389238 picture user389238 · Dec 1, 2010 · Viewed 7.1k times · Source

How do you usually get around this problem? Imagine that a thread crashes inside libc code (which is a system shared library) on Computer1 and then generates a coredump. But the Computer2 on which this coredump will be analysed might have a different version of libc.

So:

  1. How important it is to have the same shared library on the remote computer? Will the gdb correctly reconstruct stacktrace without having exact same version of libc on Conputer2?

  2. How important it is to have correct debug symbols for libc? Will the gdb correctly reconstruct stacktrace without having exact same debug symbols on the Computer2?

  3. And what is the "correct" way to avoid this debug symbol mismatch problem for shared system libraries? For me it seems that there is no single solution that solves this problem in an elegant way? Maybe anyone can share his experience?

Answer

Employed Russian picture Employed Russian · Dec 3, 2010
  1. It depends. On some processors, such as x86_64, correct unwind descriptors are required for GDB to properly unwind the stack. On such machine, analyzing coredump with non-matching libc will likely produce complete garbage.

  2. You don't need debug symbols for libc to get the stack trace. You wouldn't get file and line numbers without debug symbols, but you should get correct function names (except when inlining has taken place).

  3. The premise of your question is wrong -- debug symbols have nothing to do with this. The "correct" way to analyze coredump on C2, when that coredump was produced on C1, is to have a copy of C1's libraries (in e.g. /tmp/C1/lib/...) and direct GDB to use that copy instead of the C2's installed libc with

    (gdb) set solib-absolute-prefix /tmp/C1

command.

Note: above setting must be in effect before you load the core into GDB. This:

gdb exe core
(gdb) set solib-absolute-prefix /tmp/C1

will not work (core is read before the setting is in effect).

Here is the right way:

gdb exe
(gdb) set solib-absolute-prefix /tmp/C1
(gdb) core core

(I've tried to find a reference to this on the web, but didn't).

What are unwind descriptors?

Unwind descriptors are required when code is compiled without frame pointers (default for x86_64 in optimized mode). Such code does not save %rbp register, and so GDB needs to be told how to "step back" from current frame to the caller frame (this process is also known as stack unwinding).

Why isn't C1's libc.so included in the core?

The core file usually contains only contents of writable segments of the program address space. The read-only segments (where executable code and unwind descriptors reside) is not usually necessary -- you could just read them directly from libc.so on disk.

Except this doesn't work when you analyze C1's core on C2!

Some (but not all) operating systems allow one to configure "full coredumps", where the OS will dump read-only mappings as well, precisely so you can analyze core on any machine.