Valgrind can't find anything useful. I'm confused.
Symptoms:
PS: the code does NOT segfault.
So far I've made some progress by replacing all my malloc() calls with mmap() + mprotect().
You might be overwriting the stack, or you might be overwriting the heap.
You can try adding the flag -fstack-protector-all to your GCC command line options, which has GCC build stack-smashing checks into every function of the program. This might cause it to fail sooner, closer to the actual overflow.
Another possibility is to look at the address reported in the dmesg output and see if you can't track down the function/memory that is being smashed:
[68303.941351] broken[13301]: segfault at 7f0061616161 ip 000000000040053d sp 00007fffd4ad3980 error 4 in broken[400000+1000]
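If you end up collecting many of these reports, the fields can be pulled out of the line programmatically. The sketch below is a minimal, hypothetical parser (segv_report, parse_segv and describe_error are illustrative names, not an existing API) that assumes the exact x86-64 message format shown above. It also decodes the "error" field using the low three bits of the x86 page-fault error code: bit 0 set means a protection violation (clear means the page was not present), bit 1 means a write, bit 2 means user mode. So "error 4" is a user-mode read of an unmapped address.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields of one kernel "segfault at" line. */
struct segv_report {
    char comm[64];              /* process name, e.g. "broken" */
    int pid;
    unsigned long long fault;   /* address that was accessed */
    unsigned long long ip;      /* instruction pointer at the fault */
    unsigned long long sp;      /* stack pointer at the fault */
    int error;                  /* x86 page-fault error code */
};

/* Returns 1 if line matches the format shown above, 0 otherwise.
   The leading "[timestamp]" is read and discarded by %*f. */
int parse_segv(const char *line, struct segv_report *r)
{
    assert(line != NULL && r != NULL);
    int n = sscanf(line,
                   "[%*f] %63[^[][%d]: segfault at %llx ip %llx sp %llx error %d",
                   r->comm, &r->pid, &r->fault, &r->ip, &r->sp, &r->error);
    return n == 6;
}

/* Decode the low three bits of the x86 page-fault error code into text. */
void describe_error(int error, char *buf, size_t len)
{
    snprintf(buf, len, "%s of a %s page in %s mode",
             (error & 2) ? "write" : "read",
             (error & 1) ? "present (protection violation)" : "non-present",
             (error & 4) ? "user" : "kernel");
}
```

For the line above, parse_segv gives ip 0x40053d and fault address 0x7f0061616161, and describe_error(4, ...) produces "read of a non-present page in user mode".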
readelf -s will dump the symbol table, so we can look for the function that is triggering the problem:
$ readelf -s broken | grep 4005
40: 00000000004005e0 0 FUNC LOCAL DEFAULT 13 __do_global_ctors_aux
47: 0000000000400540 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
57: 0000000000400550 137 FUNC GLOBAL DEFAULT 13 __libc_csu_init
63: 0000000000400515 42 FUNC GLOBAL DEFAULT 13 main
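The lookup is just a range check: a function owns every address from its Value up to Value + Size (readelf prints Size in decimal). A minimal sketch, with the rows above copied by hand into a hypothetical table broken_syms and a hypothetical helper sym_for_ip; note the grep only shows the subset of symbols whose line contains "4005".

```c
#include <assert.h>
#include <stddef.h>

/* One FUNC row from readelf -s: the Value, Size and Name columns. */
struct sym {
    unsigned long addr;
    unsigned long size;
    const char *name;
};

/* The rows shown above, copied by hand (only the grep'd subset). */
static const struct sym broken_syms[] = {
    { 0x4005e0, 0,   "__do_global_ctors_aux" },
    { 0x400540, 2,   "__libc_csu_fini" },
    { 0x400550, 137, "__libc_csu_init" },
    { 0x400515, 42,  "main" },
};

/* Return the symbol whose [addr, addr + size) range contains ip, or NULL.
   Zero-sized symbols never match. */
const struct sym *sym_for_ip(const struct sym *tab, size_t n, unsigned long ip)
{
    assert(n == 0 || tab != NULL);
    for (size_t i = 0; i < n; i++)
        if (ip >= tab[i].addr && ip - tab[i].addr < tab[i].size)
            return &tab[i];
    return NULL;
}
```

With the ip from dmesg, sym_for_ip(broken_syms, 4, 0x40053d) returns the main row, since main spans 0x400515 up to (but not including) 0x40053f.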
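The main routine is the one executing when the bad pointer is used: the ip 000000000040053d falls inside main, which starts at 0x400515 and is 42 bytes long. Here is the program: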
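#include <string.h>

/* Deliberately buggy: copies s into a 4-byte stack buffer with no bounds check. */
void f(const char *s) {
    char buf[4];
    strcpy(buf, s);   /* overflows buf whenever strlen(s) >= 4 */
    return;
}

int main(int argc, char* argv[]) {
    f("aaaa");                  /* 5 bytes with the NUL: already one byte too many */
    f("aaaaaaaaaaaaaaaaaaaa");  /* 21 bytes: scribbles over f's stack frame */
    return 0;
}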
When main tries to return to the C library to quit, it uses a bad pointer stored in the stack frame. The faulting address 7f0061616161 is a hint: 0x61 is ASCII 'a'. So look at the functions called by main, and (it's pretty easy in this trivial case) f is obviously the bugger that scribbled all over the stack frame.
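Checking a bad pointer for string data can be done mechanically: lay out its bytes in memory order and see whether they are printable characters. A minimal sketch, assuming a little-endian machine such as x86-64; addr_as_text is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the 8 bytes of addr into out[] in little-endian
   memory order, replacing non-printable bytes with '.', and NUL-terminate.
   Returns how many of the bytes were printable ASCII. */
size_t addr_as_text(uint64_t addr, char out[9])
{
    assert(out != NULL);
    size_t printable = 0;
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(addr >> (8 * i));
        if (isprint(b)) {
            out[i] = (char)b;
            printable++;
        } else {
            out[i] = '.';
        }
    }
    out[8] = '\0';
    return printable;
}
```

For 0x7f0061616161 this yields "aaaa...." with 4 printable bytes, which points straight at the strcpy of the 'a' strings in f.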
If you're overwriting the heap, then perhaps you could try Electric Fence. The downsides are pretty steep (vast memory use, since every allocation takes at least two virtual memory pages), but it might be just what you need to find the problem.
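Electric Fence works by placing an inaccessible page right after each allocation, so an overrun faults on the offending instruction instead of silently corrupting the heap; that is also what the mmap() + mprotect() approach in the question does. A minimal sketch of the idea for Linux, with hypothetical helpers guarded_alloc and guarded_free. One assumption to note: sizes are rounded up to a hypothetical GUARD_ALIGN of 16 bytes to keep returned pointers aligned, so an overrun that stays within that padding is not caught.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define GUARD_ALIGN 16  /* assumption: 16-byte alignment suits the caller */

/* Round n up to GUARD_ALIGN and count the data pages needed to hold it. */
static size_t guard_layout(size_t n, size_t page, size_t *rounded)
{
    *rounded = (n + GUARD_ALIGN - 1) & ~(size_t)(GUARD_ALIGN - 1);
    size_t pages = (*rounded + page - 1) / page;
    return pages ? pages : 1;   /* n == 0 still gets one data page */
}

/* Hypothetical helper: return n usable bytes whose (rounded) end touches a
   PROT_NONE guard page, so a write just past the block faults at once. */
void *guarded_alloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE), rounded;
    size_t data_pages = guard_layout(n, page, &rounded);
    size_t total = (data_pages + 1) * page;

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {   /* arm the guard page */
        munmap(base, total);
        return NULL;
    }
    return guard - rounded;
}

/* Release a block from guarded_alloc; n must be the size originally requested. */
void guarded_free(void *p, size_t n)
{
    assert(p != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE), rounded;
    size_t data_pages = guard_layout(n, page, &rounded);
    unsigned char *guard = (unsigned char *)p + rounded;
    munmap(guard - data_pages * page, (data_pages + 1) * page);
}
```

Every allocation costs at least two pages here as well, which is exactly why this technique is so memory-hungry.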