I am having a problem with wrong symbol resolution. My main program loads a shared library with dlopen and looks up symbols in it with dlsym. Both the program and the library are written in C. The library code is:
    int a(int b)
    {
        return b + 1;
    }

    int c(int d)
    {
        return a(d) + 1;
    }
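The question's own reasoning is that a() is both called internally (from c()) and exported. One library-side variant, not part of the original post and assuming you can change the library source, is to have c() call a static helper instead of the exported a(). A static function is not placed in the dynamic symbol table, so that internal call cannot be bound to some other object's global "a". The helper name a_impl is hypothetical.

```c
/* Hypothetical variant of the library source: the internal call from
 * c() goes through a static helper, so it is a direct call that the
 * dynamic linker cannot redirect to another object's symbol "a". */

/* Internal implementation; static linkage keeps it out of the
 * dynamic symbol table. */
static int a_impl(int b)
{
    return b + 1;
}

/* Exported API function, kept for callers that use dlsym(lib, "a"). */
int a(int b)
{
    return a_impl(b);
}

/* Calls the static helper directly instead of the exported a(). */
int c(int d)
{
    return a_impl(d) + 1;
}
```

The exported a() still exists with the same behaviour; only the internal path used by c() changes.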
To make it work on a 64-bit machine, the library is compiled with -fPIC.
The main program is:
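    #include <dlfcn.h>
    #include <stdio.h>

    int (*a)(int b);
    int (*c)(int d);

    int main(void)
    {
        void *lib = dlopen("./libtest.so", RTLD_LAZY);
        if (lib == NULL) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }
        /* POSIX idiom for storing dlsym's void * in a function pointer */
        *(void **)(&a) = dlsym(lib, "a");
        *(void **)(&c) = dlsym(lib, "c");
        int d = c(6);
        int b = a(5);
        printf("b is %d d is %d\n", b, d);
        dlclose(lib);
        return 0;
    }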
Everything runs fine if the program is NOT compiled with -fPIC, but it crashes with a segmentation fault when the program is compiled with -fPIC. Investigation showed that the crash is caused by the wrong resolution of the symbol a: it occurs whenever a is called, whether from the library or from the main program (I tested the latter by commenting out the line that calls c() in the main program).
No problems occur when calling c() itself, probably because c() is not called internally by the library, while a() is both used internally by the library (from c()) and exported as part of its API.
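To confirm which loaded object an address actually belongs to, the glibc dladdr() extension can serve as a diagnostic. The sketch below uses a hypothetical helper, report_owner, that prints the file name of the object containing an address and, where available, the name of the nearest symbol below it.

```c
#define _GNU_SOURCE /* dladdr() and Dl_info are GNU extensions */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical diagnostic helper: print which loaded object (and which
 * nearby symbol, if known) contains addr. Returns 1 if addr lies inside
 * a loaded object, 0 otherwise. */
static int report_owner(const char *label, void *addr)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0) {
        printf("%s: %p is not inside any loaded object\n", label, addr);
        return 0;
    }
    printf("%s: %p is in %s (nearest symbol %s)\n", label, addr,
           info.dli_fname ? info.dli_fname : "?",
           info.dli_sname ? info.dli_sname : "?");
    return 1;
}
```

For example, calling report_owner("a", (void *)a) right after the dlsym() call in the program above should report libtest.so if the lookup went to the library. Converting a function pointer to void * like this is guaranteed by POSIX, not by ISO C.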
A simple workaround is not to use -fPIC when compiling the program, but this is not always possible, for example when the code of the main program has to live in a shared library itself. Another workaround is to rename the function pointer a to something else. But I cannot find a real solution.
Replacing RTLD_LAZY with RTLD_NOW does not help.
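I suspect that there is a clash between two global symbols: the function pointer a in the main program and the function a in the library have the same name. One solution is to declare a in the main program as static, so that it has internal linkage and is no longer a global symbol. Alternatively, the Linux dlopen(3) man page documents the RTLD_DEEPBIND flag, a glibc extension (define _GNU_SOURCE before including <dlfcn.h>), which you can pass to dlopen and which causes the library to prefer its own symbols over global symbols with the same name.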
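Both suggestions can be sketched in the main program as follows. This is an illustration, not the original poster's code: load_api is a hypothetical helper, and either change on its own should be enough according to the answer. The #ifdef falls back to plain RTLD_NOW on systems whose <dlfcn.h> does not define RTLD_DEEPBIND.

```c
#define _GNU_SOURCE /* RTLD_DEEPBIND is a glibc extension */
#include <dlfcn.h>
#include <stdio.h>

/* static: internal linkage, so these pointers are not global symbols
 * that the library's reference to "a" could be bound to. */
static int (*a)(int b);
static int (*c)(int d);

/* Hypothetical helper: open the library and look up a and c.
 * Returns the dlopen handle, or NULL on failure. */
static void *load_api(const char *path)
{
    int flags = RTLD_NOW;
#ifdef RTLD_DEEPBIND
    /* glibc only: search the library's own symbols before the global scope. */
    flags |= RTLD_DEEPBIND;
#endif
    void *lib = dlopen(path, flags);
    if (lib == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    /* POSIX idiom for storing dlsym's void * in a function pointer. */
    *(void **)(&a) = dlsym(lib, "a");
    *(void **)(&c) = dlsym(lib, "c");
    if (a == NULL || c == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol not found");
        dlclose(lib);
        return NULL;
    }
    return lib;
}
```

In main, this would be used as `void *lib = load_api("./libtest.so");`, followed by the calls to c(6) and a(5) and a final `dlclose(lib);`.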