Relation between object file and shared object file

ASHOK · Jul 31, 2009

What is the relation between a shared object (.so) file and an object (.o) file?

Can you please explain with an example?

Answer

quark · Jul 31, 2009

Let's say you have the following C source file, call it name.c

#include <stdio.h>
#include <stdlib.h>

void print_name(const char * name)
{
    printf("My name is %s\n", name);
}

When you compile it with cc -c name.c, you generate name.o. (Without -c, cc would also try to link an executable and fail, because there is no main.) The .o contains the compiled code and data for all functions and variables defined in name.c, as well as an index associating their names with the actual code. If you look at that index, say with the nm tool (available on Linux and many other Unixes), you'll notice two entries:

00000000 T print_name
         U printf

What this means: there are two symbols (names of functions or variables, but not names of classes, structs, or any types) stored in the .o. The first, marked with T, is actually defined in name.o. The other, marked with U, is merely a reference. The code for print_name can be found here, but the code for printf cannot. Before your program can run, every symbol that is only a reference has to be matched with its definition in some other object file or library; that matching is what linking object files into a complete program or complete library does. An object file is therefore the definitions found in the source file, converted to binary form, and available for placing into a full program.
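
To make the T and U markers a bit more concrete, here is a slightly extended variant of name.c. Treat it as a sketch, not the file used above: the helper or_unknown and the counter names_printed are hypothetical additions for illustration. The nm letters in the comments are what GNU nm typically reports for an unoptimized build; other toolchains or optimization levels may show something different.

```c
#include <stdio.h>

/* Hypothetical counter with external linkage. It is defined in this file,
   so nm lists it as a data symbol (typically reported as B or C). */
int names_printed;

/* `static` gives the helper internal linkage: nm shows it with a lowercase t
   (if the compiler keeps it at all), and no other object file can call it. */
static const char *or_unknown(const char *name)
{
    return name ? name : "unknown";
}

/* Defined here with external linkage: nm marks it T. */
void print_name(const char *name)
{
    /* printf is only referenced: nm marks it U, and the linker must find
       its definition elsewhere (normally in the C library). */
    printf("My name is %s\n", or_unknown(name));
    names_printed++;
}
```

Uppercase letters mark symbols other object files can see, lowercase letters mark file-local ones, and U marks names that still need a definition from somewhere else.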

You can link together .o files one by one, but you usually don't: there are generally a lot of them, and they are an implementation detail. You'd really prefer to have them all collected into bundles of related objects, with well-recognized names. These bundles are called libraries and they come in two forms: static and dynamic.

A static library (in Unix) is almost always suffixed with .a (examples include libc.a, which is the C core library, and libm.a, which is the C math library). Continuing the example, you'd build your static library with ar rc libname.a name.o. If you run nm on libname.a you'll see this:

name.o:
00000000 T print_name
         U printf

As you can see, it is primarily a big table of object files with an index for finding all the names in it. Just like an object file, it contains both the symbols defined in every .o and the symbols referred to by them. If you were to add another .o to the archive (e.g. date.o, defining print_date), you'd see another entry like the one above.
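
For illustration, a second archive member could look like the following date.c. This is only a sketch: the text above names date.o and print_date but does not show them, so the body of print_date and the helper format_date are my assumptions. After cc -c date.c and ar rc libname.a name.o date.o, I'd expect nm to print a second section headed date.o:, with T entries for the functions defined here and U entries for the C library calls.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: writes t as YYYY-MM-DD (UTC) into buf.
   Returns the number of characters written, or 0 on failure. */
size_t format_date(char *buf, size_t size, time_t t)
{
    const struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%d", utc);
}

/* The function named in the text: defined here, so date.o exports it,
   while gmtime, strftime, time and printf remain undefined references. */
void print_date(void)
{
    char buf[16];
    if (format_date(buf, sizeof buf, time(NULL)) > 0)
        printf("Today is %s\n", buf);
}
```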

If you link a static library into an executable, the library's code is copied into the executable. This is just like linking in the individual .o files, except that the linker normally pulls in only those .o files from the archive that define a symbol the program actually references. As you can imagine, this can make your program very large, especially if you are using (as most modern applications are) a lot of libraries.

A dynamic or shared library is suffixed with .so. Like its static analogue, it is built from object files and carries an index of all the code compiled into it, but the linker merges the .o files into a single object instead of storing them as separate members. You'd build it with cc -shared -o libname.so name.o (on many platforms name.o must first be compiled with -fPIC). Looking at it with nm is quite a bit different from the static library, though. On my system it contains about two dozen symbols, only two of which are print_name and printf:

00001498 a _DYNAMIC
00001574 a _GLOBAL_OFFSET_TABLE_
         w _Jv_RegisterClasses
00001488 d __CTOR_END__
00001484 d __CTOR_LIST__
00001490 d __DTOR_END__
0000148c d __DTOR_LIST__
00000480 r __FRAME_END__
00001494 d __JCR_END__
00001494 d __JCR_LIST__
00001590 A __bss_start
         w __cxa_finalize@@GLIBC_2.1.3
00000420 t __do_global_ctors_aux
00000360 t __do_global_dtors_aux
00001588 d __dso_handle
         w __gmon_start__
000003f7 t __i686.get_pc_thunk.bx
00001590 A _edata
00001594 A _end
00000454 T _fini
000002f8 T _init
00001590 b completed.5843
000003c0 t frame_dummy
0000158c d p.5841
000003fc T print_name
         U printf@@GLIBC_2.0

A shared library differs from a static library in one very important way: it does not embed itself in your final executable. Instead the executable contains a reference to that shared library that is resolved, not at link time, but at run-time. This has a number of advantages:

  • Your executable is much smaller. It only contains the code you explicitly linked via the object files. The external libraries are references and their code does not go into the binary.
  • You can share (hence the name) one library's bits among multiple executables.
  • You can, if you are careful about binary compatibility, update the code in the library between runs of the program, and the program will pick up the new library without you needing to change it.
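
Normally the dynamic loader does this run-time resolution for you when the program starts. The POSIX dlopen/dlsym interface exposes the same kind of lookup explicitly, which makes it easy to watch. The sketch below is an illustration under assumptions, not part of the original example: call_from_library is a hypothetical helper, and it assumes the symbol you ask for has the signature double f(double).

```c
#include <dlfcn.h>
#include <stddef.h>

/* Loads `library`, looks up `symbol`, and calls it as double f(double).
   Returns 0 on success and stores the result in *out; returns -1 if the
   library or the symbol cannot be found. */
int call_from_library(const char *library, const char *symbol,
                      double arg, double *out)
{
    /* Map the shared object into the process at run-time. */
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return -1;

    /* Turn a symbol name into an address, much as the loader does for
       the U entries of an executable. The cast assumes the signature. */
    double (*fn)(double) = (double (*)(double))dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = fn(arg);
    dlclose(handle);
    return 0;
}
```

On a glibc-based Linux system, call_from_library("libm.so.6", "cos", 0.0, &result) should find cos in the math library; the file name libm.so.6 is specific to that platform, and older glibc versions need -ldl on the link line.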

There are some disadvantages:

  • It takes time to link a program together. With shared libraries some of this time is deferred to every time the executable runs.
  • The process is more complex. All the additional symbols in the shared library are part of the infrastructure needed to make the library link up at run-time.
  • You run the risk of subtle incompatibilities between differing versions of the library. On Windows this is called "DLL hell".

(If you think about it many of these are the reasons programs use or do not use references and pointers instead of directly embedding objects of a class into other objects. The analogy is pretty direct.)
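
Here is that analogy spelled out in C, as a toy model only. All the names (library, static_program, dynamic_program and the two accessor functions) are hypothetical, and nothing here does any actual linking.

```c
/* Stand-in for a library: a version number plus a large block of "code". */
struct library {
    int version;
    char code[4096];
};

/* Like static linking: every program embeds its own full copy. */
struct static_program {
    struct library lib;
};

/* Like dynamic linking: every program only holds a reference. */
struct dynamic_program {
    const struct library *lib;
};

/* The embedded copy is frozen at the moment it was made. */
int static_version(const struct static_program *p)
{
    return p->lib.version;
}

/* The reference is followed on every use, so updates to the library
   are visible without rebuilding the program. */
int dynamic_version(const struct dynamic_program *p)
{
    return p->lib->version;
}
```

A static_program is larger than a dynamic_program by roughly the size of the library, and if the one shared library object is updated, every dynamic_program pointing at it sees the new version while each static_program keeps its old copy.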

OK, that's a lot of detail, and I've skipped a lot, such as how the linking process actually works. I hope you can follow it. If not, ask for clarification.