
How a C/C++ Linker Works

Ah, the linker!

That mysterious entity that lurks in the build process, waiting for the perfect moment to throw a hundred cryptic error messages your way.

If you’ve ever stared at an “undefined reference” error and questioned your life choices, congratulations—you’ve met the linker.

But why does the linker exist?

What arcane magic does it perform? And why does it seem to enjoy tormenting developers?


Why Was the Linker Created?

Back in the early days of programming, people wrote simple programs, compiled them, and ran them. Life was good.

Then, programs got bigger.

Instead of writing everything in one giant file, developers started splitting code into multiple files. This made things more manageable, but it introduced a new problem:

how do you combine these separate files into a working program?

Enter the linker.

The linker was created to take compiled object files (.o or .obj files), find all the function and variable references, and stitch everything together into a final executable.

Without it, you’d have to manually copy-paste all your code into one massive file, which would make debugging even more of a nightmare than it already was.

Evolution of Linkers

  • 1970s-1980s: Early linkers were simple. They just took object files and smooshed them together. No fancy optimizations, no dynamic linking—just good old brute force.

  • 1990s: With the rise of shared libraries (.so, .dll), linkers became more sophisticated. They had to deal with dynamic linking, symbol resolution, and version compatibility.

  • 2000s-Present: Modern linkers like GNU ld, LLVM lld, and MSVC’s linker have gotten ridiculously complex. They handle static and dynamic libraries, link-time optimizations, position-independent code, and a million other things that ensure your builds will break in new and exciting ways.



The Compilation Process

The process of converting human-readable source code into an executable binary happens in multiple stages.

These stages are broadly categorized into:

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

Step 1: Preprocessing (.c / .cpp → .i)

The preprocessor handles preprocessor directives (like #include, #define, and #ifdef) before the actual compilation begins. This step:

  • Expands macros
  • Includes header files
  • Evaluates #if conditions

Example:

#include <stdio.h>
#define PI 3.14
int main() {
    printf("PI = %f", PI);
    return 0;
}

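After preprocessing (gcc -E file.c -o file.i), PI is replaced by its value and the full contents of stdio.h are pasted in. Leaving out those header declarations, the result is: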

int main() {
    printf("PI = %f", 3.14);
    return 0;
}

Step 2: Compilation (.i → .s)

The compiler translates the preprocessed source into assembly code specific to your CPU architecture. This step involves:

  • Syntax checking
  • Type checking
  • Converting C/C++ code into lower-level intermediate representation (IR)
  • Optimizing code
  • Generating assembly output

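Example on x86-64 Linux (gcc -S -masm=intel file.i -o file.s, simplified; real output varies with compiler version and flags):

main:
    push    rbp
    mov     rbp, rsp
    lea     rdi, .LC0[rip]              # address of the format string
    movsd   xmm0, QWORD PTR .LC1[rip]   # 3.14 is loaded from memory, not encoded as an immediate
    mov     eax, 1                      # number of vector registers used by the variadic call
    call    printf
    mov     eax, 0                      # return value of main
    pop     rbp
    ret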

Step 3: Assembly (.s → .o)

The assembler takes the assembly output and converts it into machine code, producing an object file (.o or .obj).

Command: gcc -c file.s -o file.o

Step 4: Linking (.o → Executable)

The linker combines object files and libraries into a final executable. This step involves:

  • Resolving symbols
  • Address allocation
  • Merging multiple .o files

The Linking Process

Linking can be static (everything included in the binary) or dynamic (uses shared libraries like .so or .dll).

Symbol Resolution

Each .o file contains symbols (functions, variables). The linker matches undefined symbols with their definitions.

Example:

main.o has:

extern printf
call printf

libc.a has:

printf:
    ...

The linker resolves printf by linking it to the actual function.


Differences Between C and C++ Compilation

While C and C++ follow the same basic compilation steps, C++ adds complexity:

  1. Name Mangling: C++ allows function overloading, requiring name transformations to differentiate functions.
  2. Object Code Differences: C++ includes classes, virtual tables, and exceptions that C does not have.
  3. Stronger Type Checking: C++ enforces stricter type checking than C.

Name Mangling Example

Consider:

// C++
void func(int);
void func(double);

With GCC or Clang, which follow the Itanium C++ ABI, the compiler mangles them into:

_Z4funci   ; func(int)
_Z4funcd   ; func(double)

Without mangling, both functions would collide in the symbol table!

How to Disable Name Mangling

Use extern "C":

extern "C" void myFunction();

This tells the compiler to use C-style naming, avoiding mangling.


Key Ideas

Concept         | Summary
Preprocessing   | Expands macros, includes headers
Compilation     | Converts code to assembly
Assembly        | Translates assembly to machine code
Linking         | Resolves symbols, merges .o files
Name Mangling   | C++ changes function names for uniqueness
Static Linking  | Includes everything in the final binary
Dynamic Linking | Uses external .so / .dll files

Basically…

Understanding how all this works is critical for debugging and optimization.

While C and C++ share similarities, C++’s name mangling and object-oriented features make its compilation more complex.

It also makes it more likely that a newbie ends up staring at 98 pages of linker errors, wondering how to fix them :)


Tricky Linker Errors and How to Survive Them

Linker errors are like riddles from an evil wizard. Here are some of the most common ones and why they happen.

1. Undefined Reference to someFunction()

Why It Happens

  • You declared a function but forgot to define it.
  • You forgot to link against the correct library.
  • You’re calling C code from a C++ file but forgot to declare the C functions as extern "C", so the linker looks for a mangled name that doesn’t exist.

How to Fix It

  • Make sure the function is actually defined somewhere.
  • Check that you’re linking against the correct .lib, .a, or .so file.
  • If mixing C and C++, wrap C function declarations in extern "C" {}.

2. Multiple Definition of someFunction()

Why It Happens

  • You accidentally defined the same function in multiple .cpp files.
  • A header file contains a function definition instead of just a declaration.

How to Fix It

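  • Move function definitions out of header files and into .cpp files, leaving only declarations in the header.
  • If a definition has to stay in a header, mark it inline (or make it a template). Include guards and #pragma once don’t help here: they only stop a header from being included twice in the same .cpp file, not once in each of several .cpp files.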

3. Cannot Open File someLibrary.lib

Why It Happens

  • You’re trying to link against a static library that doesn’t exist.
  • The library path isn’t set correctly.

How to Fix It

  • Double-check that the library file exists in the correct directory.
  • Update your linker settings to include the correct library path.

Bad Things That Can Happen

As compilers and operating systems have evolved, so have the ways in which we compile and link code. This has made linkers more complex, leading to some truly bizarre build failures.

1. Debug vs. Release Builds

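Release builds are optimized, debug builds aren’t. They also tend to disagree on more than optimization: with MSVC, for example, they use different runtime libraries (/MD vs. /MDd) and different iterator debug levels. If you mix them up, like linking a debug .lib into a release executable, you’re going to have a bad time, typically in the form of LNK2038 mismatch errors or crashes at run time.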

2. DLL Hell

If you’ve ever had to troubleshoot Windows DLL linking issues, you know why it’s called “DLL Hell.”

Version mismatches, missing symbols, and incorrect calling conventions can make your life miserable.

3. Static vs. Dynamic Linking Gone Wrong

Accidentally linking against the static version of a library when you meant to use the dynamic one?

Or vice versa?

Congratulations, you’ve just won a ticket to undefined behavior land.

4. The One Setting That Breaks Everything

One wrong setting in your build system can trigger hundreds of linker errors.

Wrong architecture?

Wrong calling convention?

Wrong runtime library?

Fun Times!


Key Ideas

Concept                      | Explanation
What does a linker do?       | It combines compiled object files into a final executable.
Why do linker errors happen? | Missing definitions, incorrect linking settings, and library mismatches.
Common linker errors         | Undefined references, multiple definitions, missing libraries.
Debug vs. Release builds     | Mixing them can lead to linker nightmares.
Static vs. Dynamic linking   | Getting this wrong can break everything.
