GDB corrupted stack frame - How to debug?

Sangeeth Saravanaraj · Mar 21, 2012 · Viewed 94.3k times

I have the following stack trace. Is it possible to make out anything useful from this for debugging?

Program received signal SIGSEGV, Segmentation fault.
0x00000002 in ?? ()
(gdb) bt
#0  0x00000002 in ?? ()
#1  0x00000001 in ?? ()
#2  0xbffff284 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) 

Where should I start looking in the code when I get a segmentation fault and the stack trace is not useful?

NOTE: If I post the code, then the SO experts will give me the answer. I want to take the guidance from SO and find the answer myself, so I'm not posting the code here. Apologies.

Answer

Chris Dodd · Mar 21, 2012

Those bogus addresses (0x00000002 and the like) are actually PC values, not SP values. When you get this kind of SEGV, with a bogus (very small) PC address, 99% of the time it's due to calling through a bogus function pointer. Note that virtual calls in C++ are implemented via function pointers, so any problem with a virtual call can manifest in the same way.

An indirect call instruction just pushes the PC after the call onto the stack and then sets the PC to the target value (bogus in this case), so if this is what happened, you can easily undo it by manually popping the PC off the stack. In 32-bit x86 code you just do:

(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4

With 64-bit x86 code you need:

(gdb) set $pc = *(void **)$rsp
(gdb) set $rsp = $rsp + 8

Then, you should be able to do a bt and figure out where the code really is.

The other 1% of the time, the error will be due to overwriting the stack, usually by overflowing an array stored on the stack. In this case, you might be able to get more clarity on the situation by using a tool like Valgrind.