Terminology
Let’s start by fixing some terminology.
Shared Library. We use the cross-platform term shared library for what is otherwise known as a shared object, dynamic object, Dynamic Shared Object (DSO) or, on Windows, a Dynamic-Link Library (DLL).
Binary. An executable or a shared library.
Symbol. A function or a global variable.
Linux. I will loosely use the term Linux for all UNIX descendants.
Nano-introduction to Linking
We start with source files. In the first phase, the compiler translates them into object files. An object file is nothing but a container for sections. Two canonical examples of sections are .text, which contains machine code, and .data, which contains program data.
In the second phase, the linker does two things. First, it pulls together the identically named sections from all object files and concatenates them into one larger section. Second, it reorders the sections so that sections with similar required run-time permissions are adjacent on disk.
In the final stage, the loader takes these adjacent chunks, called segments, and maps them into memory on page-aligned boundaries. After this mapping, the loader adjusts the page permissions accordingly.
Now, code typically makes lots of function calls. The simplest case is a call from the binary into itself. But that is not the general case. In general, the process can map shared libraries into its address space and call functions that are implemented in those shared libraries. A shared library can itself call functions implemented in yet other shared libraries. Arguably, the most important job of both the linker and the loader is to wire these calls properly. What might such a wiring look like?
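The two linker steps described above can be sketched in a few lines of C. This is a toy model and not how any real linker is implemented: the toy_section struct, the TOY_R/TOY_W/TOY_X permission bits and the toy_link function are all hypothetical names made up for illustration, and a section is reduced to a name, a permission mask and a size.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical permission bits for this toy model. */
enum { TOY_R = 1, TOY_W = 2, TOY_X = 4 };

/* A section reduced to the three properties the sketch needs. */
struct toy_section {
    const char *name;
    unsigned perms;
    size_t size;
};

/* qsort comparator: order sections by their permission mask. */
static int toy_by_perms(const void *a, const void *b) {
    const struct toy_section *x = a, *y = b;
    return (x->perms > y->perms) - (x->perms < y->perms);
}

/* Step 1: merge identically named input sections by summing their sizes.
 * Step 2: sort the merged sections so equal permissions end up adjacent.
 * out[] must have room for n entries. Returns the number of merged sections. */
static size_t toy_link(const struct toy_section *in, size_t n,
                       struct toy_section *out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < count && strcmp(out[j].name, in[i].name) != 0)
            j++;
        if (j == count)
            out[count++] = in[i];        /* first section with this name */
        else
            out[j].size += in[i].size;   /* concatenate onto existing one */
    }
    qsort(out, count, sizeof out[0], toy_by_perms);
    return count;
}
```

Feeding it two .text sections, two .data sections and one read-only section yields three merged sections, grouped by permission mask.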
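To make the map-then-adjust-permissions step concrete, here is a minimal sketch using the POSIX calls mmap and mprotect. It assumes a Linux-like system and is not how a real loader works internally (a real loader maps the file itself rather than copying bytes). The helper names toy_page_align and toy_map_readonly are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a multiple of page (page must be a power of two). */
static size_t toy_page_align(size_t size, size_t page) {
    return (size + page - 1) & ~(page - 1);
}

/* Copy len bytes into freshly mapped pages, then drop write access,
 * mimicking "map first, fix permissions afterwards".
 * Returns the mapping (release it with munmap) or NULL on failure. */
static void *toy_map_readonly(const void *bytes, size_t len,
                              size_t *mapped_len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = toy_page_align(len, page);

    /* mmap always returns a page-aligned address. */
    void *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, bytes, len);

    /* Adjust the page permissions once the contents are in place. */
    if (mprotect(mem, total, PROT_READ) != 0) {
        munmap(mem, total);
        return NULL;
    }
    *mapped_len = total;
    return mem;
}
```

After toy_map_readonly returns, reading the pages works, while a write to them would fault.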
Say we have a .code section that calls the function foo(), which is implemented in another binary. The .code section itself does not contain the string foo; it contains a call 0x0000 instruction to an address that is still unknown at link time. So the .code section contains a placeholder: a string of 0s. In addition, the linker generates another section called .reloc, for relocation. On Windows, relocations are sometimes called fixups. An entry in the .reloc section is essentially a small TODO item for the loader. The linker says, “Dear loader, please find the function foo. When you do, overwrite this placeholder with the address you found.” This kind of relocation does happen in practice, but it is generally frowned upon, for two reasons. First, modifying the .code section makes it unshareable between processes. Second, this form of relocation needs to be done once per call site, not once per function, which can amount to a large difference if you have tens of thousands of calls into the same function, e.g. a game engine with calls to the math function sqrt().
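A toy version of this per-call-site patching might look like the sketch below. It is an illustration under simplifying assumptions, not a real relocation format: the toy_reloc record and toy_apply_relocs function are hypothetical, the "address" is a 32-bit value written as-is, and real instruction encodings (which often store a relative displacement) are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation record: the offset of one placeholder
 * inside the section. One record exists per call site. */
struct toy_reloc {
    size_t offset;
};

/* Overwrite every placeholder listed in relocs[] with target.
 * Records that would write past the end of the section are skipped.
 * Returns the number of call sites actually patched. */
static size_t toy_apply_relocs(uint8_t *section, size_t section_size,
                               const struct toy_reloc *relocs, size_t n,
                               uint32_t target) {
    size_t patched = 0;
    for (size_t i = 0; i < n; i++) {
        if (section_size < sizeof target ||
            relocs[i].offset > section_size - sizeof target)
            continue;
        memcpy(section + relocs[i].offset, &target, sizeof target);
        patched++;
    }
    return patched;
}
```

The loop makes the cost visible: the work grows with the number of call sites, and every write dirties a page of the code section.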
The more typical scheme is to route all calls to the same function through an indirect call via a single placeholder. In this scheme, when the loader reads the .reloc section, it knows it needs to overwrite just a single slot with the address of foo(), and not all the call sites. This design trades a little bit of run-time performance for a lot of load-time savings. There is an entire section of such placeholders. On win64, it is called the IAT (Import Address Table). On Linux, this is a rough approximation of a section known as the GOT (Global Offset Table). Note already that this design means that cross-binary calls are indirect. They carry overhead comparable to that of virtual function calls. This is not yet the full truth; it is just the first step in our journey towards it.
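The single-slot scheme can be imitated in plain C with a table of function pointers. This is only a sketch of the idea, not the actual layout of an IAT or GOT: toy_import_table, toy_bind_imports, foo_impl and the two call sites are hypothetical names, and foo_impl stands in for a function that would really live in another binary.

```c
#include <stddef.h>

/* Stand-in for foo() implemented in another binary (hypothetical). */
static int foo_impl(int x) { return x + 1; }

/* One slot per imported function. It starts out as an unresolved
 * placeholder, just like the zeroed slot described above. */
typedef int (*foo_fn)(int);
enum { TOY_SLOT_FOO = 0 };
static foo_fn toy_import_table[1] = { NULL };

/* Toy loader: resolves the import once, however many call sites exist. */
static void toy_bind_imports(void) {
    toy_import_table[TOY_SLOT_FOO] = foo_impl;
}

/* Two call sites. Neither is patched; both call indirectly via the slot. */
static int call_site_a(int x) { return toy_import_table[TOY_SLOT_FOO](x); }
static int call_site_b(int x) { return toy_import_table[TOY_SLOT_FOO](x) * 2; }
```

Once toy_bind_imports() has run, both call sites reach foo_impl through the same slot. The code itself is never modified, at the price of one extra memory load per call.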
Let us take a closer look at Windows first. A win32/win64 binary