Debug core file with no symbols

Morinar picture Morinar · Jun 26, 2009 · Viewed 30.4k times · Source

I have a C application we have deployed to a customers site. It was compiled and runs on HP-UX. The user has reported a crash and we have obtained a core dump. So far, I've been unable to duplicate the crash in house.

As you would suspect, the core file/deployed executable is completely devoid of any sort of symbols. When I load it up in gdb and do a bt, the best I get is this:

(gdb) bt
#0  0xc0199470 in ?? ()

I can do a 'strings core' on the file, but my understanding is that all I get there is all the strings in the executable, so it seems semi-impossible to track down anything there.

I do have a debug version (compiled with -g) of the executable, which is unfortunately a couple of months newer than the released version. If I try to start gdb with that hub, I see this:

warning: exec file is newer than core file.
Core was generated by `program_name'.
Program terminated with signal 11, Segmentation fault.
__dld_list is not valid according to __dld_flags.

#0  0xc0199470 in ?? ()
(gdb) bt
#0  0xc0199470 in ?? ()

While it would be feasible to compile a debug version and deploy it at the customer's site and then wait for another crash, it would be relatively difficult and undesirable for a number of reasons.

I am quite familiar with the code and have a relatively good idea of where in code it is crashing based on the customer's bug report.

Is there ANY way I can glean any more information from this core dump? Via strings or another debugger or anything? Thanks.

Answer

Sufian picture Sufian · Jun 26, 2009

This type of response from gdb:

(gdb) bt
#0  0xc0199470 in ?? ()

can also happen in the case that the stack was smashed by a buffer overrun, where the return address was overwritten in memory, so the program counter gets set to a seemingly random area.

This is one of the ways that even a build with a corresponding symbol database can cause a symbol lookup error (or strange looking backtraces). If you still get this after you have the symbol table, your problem is likely that your customer's data is causing some issues with your code.