Why does ld need -rpath-link when linking an executable against a so that needs another so?

Troels Folke picture Troels Folke · Jul 6, 2014 · Viewed 35.7k times · Source

I'm just curious here. I have created a shared object:

gcc -o liba.so -fPIC -shared liba.c

And one more shared object, that links against the former one:

gcc -o libb.so -fPIC -shared libb.c liba.so

Now, when creating an executable that links against libb.so, I will have to specify -rpath-link to ld so it can find liba.so when discovering that libb.so depends on it:

gcc -o test -Wl,-rpath-link,./ test.c libb.so

otherwise ld will complain.

Why is it, that ld MUST be able to locate liba.so when linking test? Because to me it doesn't seem like ld is doing much else than confirming liba.so's existence. For instance, running readelf --dynamic ./test only lists libb.so as needed, so I guess the dynamic linker must discover the libb.so -> liba.so dependency on its own, and make it's own search for liba.so.

I'm on an x86-64 GNU/Linux platform, and the main()-routine in test calls a function in libb.so that in turn calls a function in liba.so.

Answer

anton_rh picture anton_rh · Mar 2, 2016

Why is it, that ld MUST be able to locate liba.so when linking test? Because to me it doesn't seem like ld is doing much else than confirming liba.so's existence. For instance, running readelf --dynamic ./test only lists libb.so as needed, so I guess the dynamic linker must discover the libb.so -> liba.so dependency on its own, and make it's own search for liba.so.

Well if I understand linking process correctly, ld actually does not need to locate even libb.so. It could just ignore all unresolved references in test hoping that dynamic linker would resolve them when loading libb.so at runtime. But if ld were doing in this way, many "undefined reference" errors would not be detected at link time, instead they would be found when trying to load test in runtime. So ld just does additional checking that all symbols not found in test itself can be really found in shared libraries that test depend on. So if test program has "undefined reference" error (some variable or function not found in test itself and neither in libb.so), this becomes obvious at link time, not just at runtime. Thus such behavior is just an additional sanity check.

But ld goes even further. When you link test, ld also checks that all unresolved references in libb.so are found in the shared libraries that libb.so depends on (in our case libb.so depends on liba.so, so it requires liba.so to be located at link time). Well, actually ld has already done this checking, when it was linking libb.so. Why does it do this checking second time... Maybe developers of ld found this double checking useful to detect broken dependencies when you try to link your program against outdated library that could be loaded in the times when it was linked, but now it can't be loaded because the libraries it depends on are updated (for example, liba.so was later reworked and some of the function was removed from it).

UPD

Just did few experiments. It seems my assumption "actually ld has already done this checking, when it was linking libb.so" is wrong.

Let us suppose the liba.c has the following content:

int liba_func(int i)
{
    return i + 1;
}

and libb.c has the next:

int liba_func(int i);
int liba_nonexistent_func(int i);

int libb_func(int i)
{
    return liba_func(i + 1) + liba_nonexistent_func(i + 2);
}

and test.c

#include <stdio.h>

int libb_func(int i);

int main(int argc, char *argv[])
{
    fprintf(stdout, "%d\n", libb_func(argc));
    return 0;
}

When linking libb.so:

gcc -o libb.so -fPIC -shared libb.c liba.so

linker doesn't generate any error messages that liba_nonexistent_func cannot be resolved, instead it just silently generate broken shared library libb.so. The behavior is the same as you would make a static library (libb.a) with ar which doesn't resolve symbols of the generated library too.

But when you try to link test:

gcc -o test -Wl,-rpath-link=./ test.c libb.so

you get the error:

libb.so: undefined reference to `liba_nonexistent_func'
collect2: ld returned 1 exit status

Detecting such error would not be possible if ld didn't scan recursively all the shared libraries. So it seems that the answer to the question is the same as I told above: ld needs -rpath-link in order to make sure that the linked executable can be loaded later by dynamic loaded. Just a sanity check.

UPD2

It would make sense to check for unresolved references as early as possible (when linking libb.so), but ld for some reasons doesn't do this. It's probably for allowing to make cyclic dependencies for shared libraries.

liba.c can have the following implementation:

int libb_func(int i);

int liba_func(int i)
{
    int (*func_ptr)(int) = libb_func;
    return i + (int)func_ptr;
}

So liba.so uses libb.so and libb.so uses liba.so (better never do such a thing). This successfully compiles and works:

$ gcc -o liba.so -fPIC -shared liba.c
$ gcc -o libb.so -fPIC -shared libb.c liba.so
$ gcc -o test test.c -Wl,-rpath=./ libb.so
$ ./test
-1217026998

Though readelf says that liba.so doesn't need libb.so:

$ readelf -d liba.so | grep NEEDED
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
$ readelf -d libb.so | grep NEEDED
 0x00000001 (NEEDED)                     Shared library: [liba.so]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]

If ld checked for unresolved symbols during the linking of a shared library, the linking of liba.so would not be possible.

Note that I used -rpath key instead of -rpath-link. The difference is that -rpath-link is used at linking time only for checking that all symbols in the final executable can be resolved, whereas -rpath actually embeds the path you specify as parameter into the ELF:

$ readelf -d test | grep RPATH
 0x0000000f (RPATH)                      Library rpath: [./]

So it's now possible to run test if the shared libraries (liba.so and libb.so) are located at your current working directory (./). If you just used -rpath-link there would be no such entry in test ELF, and you would have to add the path to the shared libraries to the /etc/ld.so.conf file or to the LD_LIBRARY_PATH environment variable.

UPD3

It is actually possible to check for unresolved symbols during linking shared library, --no-undefined option must be used for doing that:

$ gcc -Wl,--no-undefined -o libb.so -fPIC -shared libb.c liba.so
/tmp/cc1D6uiS.o: In function `libb_func':
libb.c:(.text+0x2d): undefined reference to `liba_nonexistent_func'
collect2: ld returned 1 exit status

Also I found a good article that clarifies many aspects of linking shared libraries that depend on other shared libraries: Better understanding Linux secondary dependencies solving with examples.