Linkers, Loaders and Shared Libraries

C++
Author

dev::author

Published

May 30, 2026

Terminology

Let’s start by fixing some terminology.

  • Shared Library. We use the cross-platform term shared library to refer to what is otherwise known as a shared object, dynamic object, Dynamic Shared Object (DSO) or, on Windows, Dynamic-Link Library (DLL).

  • Binary. An executable or a shared library.

  • Symbol. A function or a global variable.

  • Linux. I will loosely use the term Linux for all UNIX descendants.

Nano-introduction to Linking

We start with source files. In the first phase, the compiler translates them into object files. An object file is essentially a container for sections. Two canonical examples of sections are .text, which contains machine code, and .data, which contains program data.

In the second phase, the linker does two things. First, it pulls together the identically named sections from all object files and concatenates them into one larger section. Second, it reorders the sections so that sections with similar required run-time permissions are adjacent on disk.

In the final stage, the loader takes these adjacent chunks called segments and maps them to memory on page-aligned boundaries. After this mapping, the loader adjusts the page permissions accordingly.
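The two linker steps above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the Section struct, the Perm bits and the merge_and_group function are hypothetical names, and real object formats carry far more metadata (alignment, symbols, relocations).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Permission bits a section needs at run time (illustrative encoding).
enum Perm : unsigned { R = 1, W = 2, X = 4 };

// Hypothetical, stripped-down section: a name, permissions and raw bytes.
struct Section {
    std::string name;
    unsigned perms;
    std::vector<std::uint8_t> bytes;
};

// Returns the merged sections grouped by permission bits. Each group is a
// candidate for one loadable segment.
std::map<unsigned, std::vector<Section>>
merge_and_group(const std::vector<Section>& inputs) {
    // Step 1: concatenate identically named sections from all object files.
    std::map<std::string, Section> merged;
    for (const Section& in : inputs) {
        Section& out = merged[in.name];  // value-initialised on first sight
        if (out.name.empty()) {
            out.name = in.name;
            out.perms = in.perms;
        }
        assert(out.perms == in.perms);   // this model assumes same name, same permissions
        out.bytes.insert(out.bytes.end(), in.bytes.begin(), in.bytes.end());
    }
    // Step 2: place sections with equal permissions next to each other.
    std::map<unsigned, std::vector<Section>> groups;
    for (const auto& kv : merged) {
        groups[kv.second.perms].push_back(kv.second);
    }
    return groups;
}
```

Feeding it two .text sections, a .data and a .bss yields two groups: one read-execute group holding the single merged .text, and one read-write group holding .data and .bss side by side.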

Now, code typically makes lots of function calls. The simplest case is a call from the binary into itself. But that is not the general case. In general, the process can map shared libraries into its address space and call functions which are implemented in a shared library. The shared library itself can call functions implemented in yet other shared libraries. Arguably, the most important job of both the linker and the loader is to wire these calls properly. What might such wiring look like?

Say we have a .code section that calls the function foo(), which is implemented in another binary. The .code section itself does not contain the string foo; it contains a call 0x0000 instruction into an address that is still unknown at link time. So, the .code section contains a placeholder - a string of 0s. In addition, the linker generates another section called .reloc, for relocation. On Windows, these are sometimes called fixups. An entry in the .reloc section is essentially a small TODO item for the loader. The linker says, “Dear loader, please find the function foo. When you do, overwrite this placeholder with the address you found.” This form of relocation does happen in practice, but it is generally frowned upon, for two reasons. First, modifying the .code section makes it unshareable between processes. Second, this form of relocation needs to be done once per call site, not once per function, which can amount to a large difference if you have tens of thousands of calls into the same function, e.g. a game engine calling the math function sqrt().

The more typical scheme is to route all calls to the same function through an indirect call via a single placeholder. In this scheme, when the loader reads the .reloc section, it knows it needs to overwrite just a single slot with the address of foo(), not every call site. This design trades a little run-time performance for a lot of load-time savings. There is an entire section of such placeholders. On Win64, it is called the IAT (Import Address Table). On Linux, this description is a rough approximation of a section known as the GOT (Global Offset Table). Note already that this design means that cross-binary calls are indirect. They carry an overhead similar to that of virtual function calls. And this is not the full truth; it is just the first step in our journey towards it.
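The single-slot scheme can be imitated in plain C++ with a function pointer. This is a sketch, not how any real loader is written: g_sqrt_slot, patch_import_slot and real_sqrt are made-up names, and real_sqrt stands in for a function that would live in another binary.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for a function implemented in some other binary (hypothetical).
double real_sqrt(double x) { return std::sqrt(x); }

// One writable slot per imported function, playing the role of an IAT/GOT entry.
using SqrtFn = double (*)(double);
static SqrtFn g_sqrt_slot = nullptr;

// Stand-in for the loader: resolve the symbol once and patch the one slot.
void patch_import_slot() { g_sqrt_slot = &real_sqrt; }

// Every call site performs an indirect call through the same slot, so the
// code itself never has to be modified, however many call sites exist.
double hypotenuse(double a, double b) {
    assert(g_sqrt_slot != nullptr);  // the slot must be patched before first use
    return g_sqrt_slot(a * a + b * b);
}

double magnitude(double x) {
    assert(g_sqrt_slot != nullptr);
    return g_sqrt_slot(x * x);
}
```

Both hypotenuse and magnitude become usable after one call to patch_import_slot(). With per-call-site relocation, each of them would have needed its own patch.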

Windows Binary

Let us take a closer look at Windows first. A Windows binary describes all the imports it consumes in a section called .idata (the i stands for import). The most important data structure in the .idata section is the directory table. The directory table is a table of entries, one per shared library. Each entry contains the DLL name, an offset to an import lookup table (ILT), which lists the names of the symbols the loader must locate within this library, and an offset to an import address table (IAT), which tells the loader where to write the addresses of these symbols once they are found.

Schematically, what a Windows binary tells the loader is, “Dear loader, please load the library lib1, and from it, please locate the symbols f1, f2 and f3. Then, load the library lib2, and from it, please locate the symbols g1, g2 and g3.”

Windows Binary (.exe / .dll)
+------------------------------------------------------------------------------+
|  .idata section                                                              |
|  +--------------------------+                                                |
|  |     Directory table      |                                                |
|  |                          |                                                |
|  |  +--------------------+  |   ILT (symbol names)    IAT (addresses)        |
|  |  |  Entry: lib1.dll   |--+--> +--------------+    +--------------+        |
|  |  |  name / ILT / IAT  |  |    |  f1          |    |  <- addr(f1) |        |
|  |  +--------------------+  | +->|  f2          |    |  <- addr(f2) |        |
|  |           |              | |  |  f3          |    |  <- addr(f3) |        |
|  |           +--------------+-+  +--------------+    +--------------+        |
|  |                          |              ^                   ^             |
|  |  +--------------------+  |   ILT        |        IAT        |             |
|  |  |  Entry: lib2.dll   |--+--> +--------------+    +--------------+        |
|  |  |  name / ILT / IAT  |  |    |  g1          |    |  <- addr(g1) |        |
|  |  +--------------------+  | +->|  g2          |    |  <- addr(g2) |        |
|  |           |              | |  |  g3          |    |  <- addr(g3) |        |
|  |           +--------------+-+  +--------------+    +--------------+        |
|  +--------------------------+              |                   |             |
|                                            +---------+---------+             |
+------------------------------------------------------------------------------+
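The per-library binding that the directory table implies can be sketched as follows. The ImportEntry struct, the Exports map and bind_imports are hypothetical simplifications; the real PE structures use offsets and ordinals rather than strings and vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one directory-table entry.
struct ImportEntry {
    std::string dll_name;
    std::vector<std::string> lookup_table;      // ILT: names to locate
    std::vector<std::uintptr_t> address_table;  // IAT: slots the loader fills
};

// Stand-in for the exports of each loaded DLL: dll -> (symbol -> address).
using Exports = std::map<std::string, std::map<std::string, std::uintptr_t>>;

// Each symbol is sought only in the DLL named by its own entry.
// Returns false if a DLL or a symbol is missing.
bool bind_imports(std::vector<ImportEntry>& directory, const Exports& loaded) {
    for (ImportEntry& entry : directory) {
        auto dll = loaded.find(entry.dll_name);
        if (dll == loaded.end()) return false;
        entry.address_table.assign(entry.lookup_table.size(), 0);
        assert(entry.address_table.size() == entry.lookup_table.size());
        for (std::size_t i = 0; i < entry.lookup_table.size(); ++i) {
            auto sym = dll->second.find(entry.lookup_table[i]);
            if (sym == dll->second.end()) return false;
            entry.address_table[i] = sym->second;
        }
    }
    return true;
}
```

Note that in this model two DLLs may both export a symbol called f1 without any ambiguity, because every import is tied to the library it is expected to come from.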

Linux import sections

As we shall see, this is not what happens on Linux. A Linux binary contains not one but two sections that together encode almost the same information as .idata does on Windows. The .dynamic section contains a raw list of library names, and the .dynsym section contains the more famous symbol table. The symbol table contains not just the symbols to be imported, but all the dynamic symbols of this binary, including the ones it defines and exports. The ones to be imported are marked as undefined (UND).

.dynamic section

readelf -d $(which ls) | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libcap.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

.dynsym section

readelf -W --syms $(which ls)

Symbol table .dynsym contains 132 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __ctype_toupper_loc@GLIBC_2.3 (2)
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getenv@GLIBC_2.2.5 (3)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND cap_to_text
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sigprocmask@GLIBC_2.2.5 (3)
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __snprintf_chk@GLIBC_2.3.4 (4)
     6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND raise@GLIBC_2.2.5 (3)
     7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND free@GLIBC_2.2.5 (3)
     8: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __vfprintf_chk@GLIBC_2.3.4 (4)
     9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.34 (5)
    10: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __mempcpy_chk@GLIBC_2.3.4 (4)
    11: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND abort@GLIBC_2.2.5 (3)
    12: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __errno_location@GLIBC_2.2.5 (3)
    13: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strncmp@GLIBC_2.2.5 (3)
    14: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTable
    15: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND localtime_r@GLIBC_2.2.5 (3)
    16: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _exit@GLIBC_2.2.5 (3)
    17: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __fpending@GLIBC_2.2.5 (3)
    18: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND flistxattr@GLIBC_2.3 (2)
    19: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND isatty@GLIBC_2.2.5 (3)
    20: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sigaction@GLIBC_2.2.5 (3)
    21: 0000000000000000     0 FUNC    GLOBAL DEFAU

To summarize, .dynamic and .dynsym are separate buckets of library names and symbol names.

Schematically, a Linux binary speaks to the loader in very different terms. It tells the loader: here are the libraries I want you to map into the process, and here is the bucket of symbols I ask you to locate anywhere within these libraries.

Linux Binary (ELF)
+------------------------------------------------------------------------------+
|                                                                              |
|  .code section                                                               |
|  +------------------------------------------------------------------------+  |
|  |  .text   (executable instructions)                                     |  |
|  |  .rodata (read-only data, string literals, ...)                        |  |
|  |  PLT     (per-symbol call stubs -> resolved via GOT at runtime)        |  |
|  +------------------------------------------------------------------------+  |
|                                                                              |
|  .data section                                                               |
|  +------------------------------------------------------------------------+  |
|  |  .data   (initialised global/static variables)                         |  |
|  |  .bss    (uninitialised global/static variables)                       |  |
|  |  GOT     (per-symbol address slots, patched by loader at runtime)      |  |
|  +------------------------------------------------------------------------+  |
|                                                                              |
|  .dynamic section                                                            |
|  +------------------------------------------------------------------------+  |
|  |  NEEDED  libcap.so.2                                                   |  |
|  |  NEEDED  libc.so.6                                                     |  |
|  |  (library names only -- no symbol information)                         |  |
|  +------------------------------------------------------------------------+  |
|                                                                              |
|  .dynsym section                                                             |
|  +------------------------------------------------------------------------+  |
|  |  Ndx  Name                         Bind    Source                      |  |
|  |  ---  ---------------------------  ------  --------------------------  |  |
|  |  UND  __ctype_toupper_loc          GLOBAL  GLIBC_2.3                   |  |
|  |  UND  getenv                       GLOBAL  GLIBC_2.2.5                 |  |
|  |  UND  cap_to_text                  GLOBAL  libcap.so.2                 |  |
|  |  UND  sigprocmask                  GLOBAL  GLIBC_2.2.5                 |  |
|  |  UND  free                         GLOBAL  GLIBC_2.2.5                 |  |
|  |  UND  abort                        GLOBAL  GLIBC_2.2.5                 |  |
|  |   :   ...                                                              |  |
|  |   9   main                         GLOBAL  (defined in this binary)    |  |
|  |  10   some_exported_fn             GLOBAL  (defined in this binary)    |  |
|  |   :   ...                                                              |  |
|  +------------------------------------------------------------------------+  |
|                                                                              |
+------------------------------------------------------------------------------+

These seemingly benign differences in the architecture have far-reaching consequences. A symbol on Linux can be resolved from any of these libraries: the one you intended, or another. On the plus side, this might be intentional. You might have imported some third-party library and want to override its implementation of some function with your own. This design enables it. This is just the tip of the iceberg of much deeper design decisions.

Interposition

Interposition is the ability to override a symbol in one binary from another. It is a cornerstone of the Linux execution model and a fundamental pillar of its ABI design.

You might come across alleged motivations for this. Some claims online say that the ELF pioneers designed it this way so that dynamic shared libraries would mimic the behavior of earlier static libraries. Supposedly, with static libraries, if several libraries implement the same symbol, then the first definition encountered is used and it shadows the later ones. However, that is not entirely true. It might be the case, but you might also get a linker error complaining about multiple definitions (ODR violations, in C++ terms). The exact behavior depends upon which members are extracted from the static archives.

A different conjectured motivation is that the canonical model in the minds of the ELF architects was the library libc. libc is beyond ubiquitous, it is universal. It is loaded by practically all applications and it is huge. Therefore, it is reasonable to anticipate that users might want to override some of its implementations, and so this capability was baked into the architecture. But this is guesswork. We have no inside knowledge about the motivation.

So, we have discussed one facility that enables interposition: the separation of the library names from the list of symbols to consume. Let’s discuss another: library search order.

Library Search Order

The library search order on Linux is breadth-first. Suppose we have an executable which loads the dynamic libraries lib1 and lib2, and these in turn load the libraries lib3, lib4 and lib5. Furthermore, say that lib5 wants to use the symbol foo. The order in which the loader searches the binaries for the symbol foo is: first the exe (the executable), then lib1 and lib2, then lib3 and lib4, and only finally lib5, even if lib5 itself implements foo. All the other binaries get an opportunity to interpose foo before it is sought in lib5 itself.

                    ┌───────────────┐
                    │      exe      │
                    └───────┬───────┘
            ┌───────────────┴───────────────┐
            │                               │
    ┌───────┴───────┐               ┌───────┴───────┐
    │     lib1      │               │     lib2      │
    └───────┬───────┘               └───────┬───────┘
    ┌───────┴───────┐                       │
    │               │                       │
┌───┴───────┐ ┌─────┴─────┐         ┌───────┴───────┐
│   lib3    │ │   lib4    │         │     lib5      │
└───────────┘ └───────────┘         │  defines foo  │
                                    └───────────────┘
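The search order in the diagram can be reproduced with a small breadth-first traversal. This is a model of the lookup order only, not of a real dynamic loader; DepGraph, search_order and first_definer are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// binary -> the libraries it directly depends on, in the order it lists them
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first walk starting at the executable; each binary appears once.
std::vector<std::string> search_order(const DepGraph& deps, const std::string& root) {
    assert(!root.empty());
    std::vector<std::string> order;
    std::set<std::string> seen{root};
    std::queue<std::string> pending;
    pending.push(root);
    while (!pending.empty()) {
        std::string current = pending.front();
        pending.pop();
        order.push_back(current);
        auto it = deps.find(current);
        if (it == deps.end()) continue;  // leaf: no further dependencies
        for (const std::string& dep : it->second) {
            if (seen.insert(dep).second) pending.push(dep);
        }
    }
    return order;
}

// First binary in the search order that defines the symbol, or "" if none does.
std::string first_definer(const std::vector<std::string>& order,
                          const std::map<std::string, std::set<std::string>>& defines,
                          const std::string& symbol) {
    for (const std::string& binary : order) {
        auto it = defines.find(binary);
        if (it != defines.end() && it->second.count(symbol) != 0) return binary;
    }
    return "";
}
```

For the tree above, search_order yields exe, lib1, lib2, lib3, lib4, lib5. If only lib5 defines foo, first_definer returns lib5; as soon as lib1 also defines foo, lib1 wins, which is interposition in miniature.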

In particular, the executable is consulted before the current library. This is the default behavior, and it can be adjusted. The most direct way is to link lib5 with the linker switch -Bsymbolic. -Bsymbolic tells the linker that references from within lib5 to global symbols should be bound to lib5’s own definitions first, ahead of the usual breadth-first order.

LD_PRELOAD

LD_PRELOAD is an environment variable. If it is set and contains library names, these libraries are loaded after the executable, but before any of its dependent libraries.

                    ┌───────────────┐
                    │      exe      │ 
                    └───────┬───────┘
                    ┌───────┴───────┐
                    │   LD_PRELOAD  │  
                    └───────┬───────┘
            ┌───────────────┴───────────────┐
            │                               │
    ┌───────┴───────┐               ┌───────┴───────┐
    │     lib1      │               │     lib2      │
    └───────┬───────┘               └───────┬───────┘
    ┌───────┴───────┐                       │
    │               │                       │
┌───┴───────┐ ┌─────┴─────┐         ┌───────┴───────┐
│   lib3    │ │   lib4    │         │     lib5      │
└───────────┘ └───────────┘         │  defines foo  │
                                    └───────────────┘

Can a shared-library symbol be overridden from an executable?

Can a shared-library symbol be overridden from an executable? On Windows, the answer is no. On Linux, the answer is yes; this is exactly what interposition does. On macOS, the answer is yes, but not by default.

C++ new operator

Here is a quote from the C++ standard.

  • operator new(std::size_t)
  • operator new(std::size_t, std::align_val_t)

The program’s definitions are used instead of the default versions supplied by the C++ standard library.

That is what happens on Linux, thanks to interposition. That is not what happens on Windows.

                    ┌─────────────────────┐
                    │          exe        │
                    | operator new(size_t)|
                    └───────────┬─────────┘
            ┌───────────────────┴────────────────────┐
            │                                        │
    ┌───────┴───────┐                   ┌────────────┴─────────┐
    │     glibc     │                   │       libc++         │
    |               |                   | operator new(size_t) |
    +---------------+                   +----------------------+

In fact, Windows can’t do that. Strictly speaking, Windows does not conform to this clause of the standard.
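A program-supplied replacement of the kind the standard describes might look like the sketch below. The counter g_new_calls and the helper new_calls_after_one_allocation are illustrative names added here; the malloc-based body is just one simple way to satisfy the allocation requirements, not a recommended allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts how often the program-supplied allocator runs (illustrative).
static std::size_t g_new_calls = 0;

// Program-provided replacement for the default global operator new.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) size = 1;  // a zero-size request must still yield a valid pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// Matching replacements, so memory obtained from malloc is returned with free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Calls the allocation function directly and reports how many times the
// replacement ran during that single request.
std::size_t new_calls_after_one_allocation() {
    const std::size_t before = g_new_calls;
    void* p = ::operator new(64);
    assert(p != nullptr);
    ::operator delete(p);
    return g_new_calls - before;
}
```

Within a single binary this works on every platform. The difference discussed here shows up across binaries: with interposition, allocations made inside shared libraries are routed to the executable’s definition as well.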

Symbol Resolution Time

Let’s discuss another mechanism that supports interposition. The default behavior is governed by the linker switch --allow-shlib-undefined. Consider the tree of dependencies below, and suppose lib5 wishes to import the symbol foo and the exe wishes to import the symbol bar. Resolution of the exe’s symbols is checked at link time: the linker refuses to link the executable if it cannot resolve the symbol bar.

That is not true of the libraries. The linker would very happily agree to link lib5, even if it cannot see foo. As far as the linker is concerned, the implementation of foo might lie in the executable, where it has no chance of discovering it.

                    ┌───────────────┐
                    │      exe      │---(bar    Resolution checked 
                    └───────┬───────┘           at link time
                    ┌───────┴───────┐
                    │   LD_PRELOAD  │  
                    └───────┬───────┘
            ┌───────────────┴───────────────┐
            │                               │
    ┌───────┴───────┐               ┌───────┴───────┐
    │     lib1      │               │     lib2      │
    └───────┬───────┘               └───────┬───────┘
    ┌───────┴───────┐                       │
    │               │                       │
┌───┴───────┐ ┌─────┴─────┐         ┌───────┴───────┐
│   lib3    │ │   lib4    │         │     lib5      │---(foo  Resolution NOT
└───────────┘ └───────────┘         └───────────────┘        checked at link time

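This behavior can be controlled with linker switches. The easiest is to link the executable with --no-allow-shlib-undefined, which makes the linker report undefined symbols in the shared libraries it links against. When linking lib5 itself, -z defs (also spelled --no-undefined) makes the linker reject unresolved symbols.

C++ Implication: How to form a process-wide singleton?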

Let’s discuss a developer-facing implication of this. Can we have a process-wide singleton? By singleton, I mean the Meyers singleton design pattern: a class with a single instance which is usable by all the code in the process, from all binaries.
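For reference, a minimal Meyers singleton looks like the sketch below. The class name Registry, its next_id member and the take_two_ids helper are placeholders chosen for illustration.

```cpp
#include <cassert>

class Registry {
public:
    // The function-local static is constructed on first use; since C++11
    // its initialisation is thread-safe.
    static Registry& instance() {
        static Registry the_instance;
        return the_instance;
    }

    Registry(const Registry&) = delete;
    Registry& operator=(const Registry&) = delete;

    int next_id() { return ++counter_; }

private:
    Registry() = default;
    int counter_ = 0;
};

// Two lookups must observe the same object and therefore the same counter.
int take_two_ids() {
    Registry& a = Registry::instance();
    Registry& b = Registry::instance();
    assert(&a == &b);
    a.next_id();
    return b.next_id();
}
```

Inside one binary there is clearly a single the_instance. The interesting question is what happens when this class sits in a header that several binaries include, because each binary then compiles its own copy of instance() and of the static behind it.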

On Windows, the usual singleton design pattern would create a per-binary singleton.