Dynamic loading and weak symbol resolution

MvG picture MvG · Dec 18, 2013 · Viewed 7.9k times · Source

Analyzing this question I found out some things about behavior of weak symbol resolution in the context of dynamic loading (dlopen) on Linux. Now I'm looking for the specifications governing this.

Let's take an example. Suppose there is a program a which dynamically loads libraries b.so and c.so, in that order. If c.so depends on two other libraries foo.so (actually libgcc.so in that example) and bar.so (actually libpthread.so), then usually symbols exported by bar.so can be used to satisfy weak symbol linkages in foo.so. But if b.so also depends on foo.so but not on bar.so, then these weak symbols will apparently not be linked against bar.so. It seems as if foo.so inkages only look for symbols from a and b.so and all their dependencies.

This makes sense, to some degree, since otherwise loading c.so might change the behavior of foo.so at some point where b.so has already been using the library. On the other hand, in the question that got me started this caused quite a bit of trouble, so I wonder whether there is a way around this problem. And in order to find ways around, I first need a good understanding about the very exact details how symbol resolution in these cases is specified.

What is the specification or other technical document to define correct behavior in these scenarios?

Answer

Praxeolitic picture Praxeolitic · Sep 25, 2014

Unfortunately, the authoritative documentation is the source code. Most distributions of Linux use glibc or its fork, eglibc. In the source code for both, the file that should document dlopen() reads as follows:

manual/libdl.texi

@c FIXME these are undocumented:
@c dladdr
@c dladdr1
@c dlclose
@c dlerror
@c dlinfo
@c dlmopen
@c dlopen
@c dlsym
@c dlvsym

What technical specification there is can be drawn from the ELF specification and the POSIX standard. The ELF specification is what makes a weak symbol meaningful. POSIX is the actual specification for dlopen() itself.

This is what I find to be the most relevant portion of the ELF specification.

When the link editor searches archive libraries, it extracts archive members that contain definitions of undefined global symbols. The member’s definition may be either a global or a weak symbol.

The ELF specification makes no reference to dynamic loading so the rest of this paragraph is my own interpretation. The reason I find the above relevant is that resolving symbols occurs at a single "when". In the example you give, when program a dynamically loads b.so, the dynamic loader attempts to resolve undefined symbols. It may end up doing so with either global or weak symbols. When the program then dynamically loads c.so, the dynamic loader again attempts to resolve undefined symbols. In the scenario you describe, symbols in b.so were resolved with weak symbols. Once resolved, those symbols are no longer undefined. It doesn't matter if global or weak symbols were used to defined them. They're already no longer undefined by the time c.so is loaded.

The ELF specification gives no precise definition of what a link editor is or when the link editor must combine object files. Presumably it's a non-issue because the document has dynamic-linking in mind.

POSIX describes some of the dlopen() functionality but leaves much up to the implementation, including the substance of your question. POSIX makes no reference to the ELF format or weak symbols in general. For systems implementing dlopen() there need not even be any notion of weak symbols.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html

POSIX compliance is part of another standard, the Linux Standard Base. Linux distributions may or may not choose to follow these standards and may or may not go to the trouble of being certified. For example, I understand that a formal Unix certification by Open Group is quite expensive -- hence the abundance of "Unix-like" systems.

An interesting point about the standards compliance of dlopen() is made on the Wikipedia article for dynamic loading. dlopen(), as mandated by POSIX, returns a void*, but C, as mandated by ISO, says that a void* is a pointer to an object and such a pointer is not necessarily compatible with a function pointer.

The fact remains that any conversion between function and object pointers has to be regarded as an (inherently non-portable) implementation extension, and that no "correct" way for a direct conversion exists, since in this regard the POSIX and ISO standards contradict each other.

The standards that do exist contradict and what standards documents there are may not be especially meaningful anyway. Here's Ulrich Drepper writing about his disdain for Open Group and their "specifications".

http://udrepper.livejournal.com/8511.html

Similar sentiment is expressed in the post linked by rodrigo.

The reason I've made this change is not really to be more conformant (it's nice but no reason since nobody complained about the old behaviour).

After looking into it, I believe the proper answer to the question as you've asked it is that there is no right or wrong behavior for dlopen() in this regard. Arguably, once a search has resolved a symbol it is no longer undefined and in subsequent searches the dynamic loader will not attempt to resolve the already defined symbol.

Finally, as you state in the comments, what you describe in the original post is not correct. Dynamically loaded shared libraries can be used to resolve undefined symbols in previously dynamically loaded shared libraries. In fact, this isn't limited to undefined symbols in dynamically loaded code. Here is an example in which the executable itself has an undefined symbol that is resolved through dynamic loading.

main.c

#include <dlfcn.h>

void say_hi(void);

int main(void) {
    void* symbols_b = dlopen("./dyload.so", RTLD_NOW | RTLD_GLOBAL);
    /* uh-oh, forgot to define this function */
    /* better remember to define it in dyload.so */
    say_hi();
    return 0;
}

dyload.c

#include <stdio.h>
void say_hi(void) {
    puts("dyload.so: hi");
}

Compile and run.

gcc-4.8 main -fpic -ldl -Wl,--unresolved-symbols=ignore-all -o main
gcc-4.8 dyload.c -shared -fpic -o dyload.so
$ ./main
dyload.so: hi

Note that the main executable itself was compiled as PIC.