Linker Basics

CS 301 Lecture, Dr. Lawlor

So you've got some source code.  You want to make an executable.  How does this happen?

1: Build object files

Step 1 is to compile the source code.  If you've got lots of different source files, you don't want to build each one into an entire program; instead, you build each into a little piece called an "object file".  An object file consists of compiled machine code, plus special hooks (symbol names and relocation entries) that allow it to be combined with other object files into a single executable.

For C++:
On Linux, MacOS X, or other UNIX-like systems, object files normally have extension ".o", and you make them with the "-c" flag.  So this will create "foo.o" from "foo.cpp":
	g++ foo.cpp -c
On Windows, object files have extension ".obj", and you make them with the "/c" flag.  You compile C++ with the "cl" program, and the "/TP /GR /EHsc" flags (that's C++, with dynamic_cast, and throw respectively). So this command creates "foo.obj" from "foo.cpp":
	cl /TP /GR /EHsc foo.cpp /c
(also see the Compiler Flags listed below)

For C:
On UNIX, use gcc to compile plain C (not C++).
	gcc foo.c -c
On Windows, use the "/TC" flag to compile C.
	cl /TC foo.c /c

For YASM assembly:
On UNIX, use "elf32" format (or elf64 on a 64-bit OS)
	yasm -f elf32 foo.S -o foo.o 
On Windows, use "win32" format (or win64 on a 64-bit OS)
	yasm -f win32 foo.S -o foo.obj

2: Link object files into libraries

Step 2 is optional.  If you've got a zillion object files that are related, you can put them together into a "library".  For now, we'll only look at statically linked libraries, not dynamically-linked libraries (DLLs).

On Linux, static library files have extension ".a", and you make them with the "ar" tool and its "cr" options.  So this will create "foo.a" from "foo.o" and "bar.o":
	ar cr foo.a foo.o bar.o

Some machines, like MacOS X, require you to run "ranlib foo.a" after this.  Also, "foo.a" remembers several strange things like the order you added the files, and "ar cr" won't ever *remove* .o files from your .a; so it's a good idea to remove your .a's before running "ar cr"...
On Windows, static library files have extension ".lib", and you make them with the "link /lib" tool.  So this creates "foo.lib" from "foo.obj":
	link /lib /out:foo.lib foo.obj

3: Build executable

Step 3 is to combine all your object files and libraries into a single executable.  This step is called "linking".

On Linux, executables have no filename extension.  You link with the compiler driver (g++ here), give the executable name with the "-o" flag, and MUST list all the needed object files and libraries on the command line.  You can also build executables yourself with "ld", but it's trickier, especially for C++.
	g++ -o bar bar.o foo.a
On Windows, executables are named ".exe".    You can either list the libraries you need on the command line, or else
	cl /o bar.exe bar.obj foo.lib

Often, you don't call these programs yourself.  Instead, you let the IDE (e.g., MS Visual C++) call them for you.  Or you write a "Makefile" and let "make" call the programs needed.

Compiler/Linker Flags

There are a bunch of "flags" that you can pass to the compiler and linker to make various stuff happen.  Most of these are useful only once in a while, but when needed, they're really useful!

Name: -c
For: Compiler
Does: Compile only, don't link.  Makes an object file (.o or .obj) from a source code file.
Needed when: Compiling big programs with lots of pieces, because you can leave most code compiled as object files.  Also useful prior to building libraries.

Name: -Dmacro=value
Examples: -DUserID=17, -DMAXCRAP=99
For: Compiler
Does: Sets a macro (just like #define) from the compiler command line.
Needed when: Setting up configuration values, paths, etc.  Another alternative is to write a "config.h" file somewhere that sets the same macros; "config.h" can make the compiler command lines a lot more intelligible!

Name: -Ipath
Examples: -Ilibfoo/include, -I.
For: Compiler
Does: Adds a new directory to the "include path": the list of places the compiler looks for #included files.
Needed when: Compiling code that uses header files in some other directory.  (Subtle: #include "foo.h" works automatically if foo.h is in the same directory as the file that includes it; but #include <foo.h> only works if you specify -I. to add that directory to the include path.  Also consider using something like #include "libfoo/include/foo.h")

Name: -Lpath
Examples: -Llibfoo/lib, -L.
For: Linker
Does: Adds a new directory to the "library path": the list of directories the linker searches for libraries (.a or .lib files).
Needed when: Linking with almost any library other than the builtin system libraries.

Name: -lname
Examples: -lfoo
For: Linker
Does: Looks for a file named "libname.a" in all the known library directories (so -lfoo finds "libfoo.a").  UNIX-only.
Needed when: Linking almost any library on UNIX.

Stupid bugs in the linker

If you accidentally define the same subroutine name in two object files, the linker will complain about "multiply defined symbols".  This is good, because it lets you catch and fix your error.
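
You can provoke this error on purpose.  This is a sketch assuming gcc on Linux; the files a.c, b.c, and main.c are made up, and the exact wording of the linker's complaint varies from linker to linker, so the script just shows whatever the linker printed.

```shell
cd "$(mktemp -d)"             # scratch directory

# Two hypothetical object files that both define doit().
echo 'int doit(void) { return 1; }' > a.c
echo 'int doit(void) { return 2; }' > b.c
echo 'int doit(void); int main(void) { return doit(); }' > main.c
gcc -c a.c b.c main.c         # each file compiles fine on its own

# Linking all three should be rejected, because doit is defined twice.
if gcc -o prog main.o a.o b.o 2> link.log; then
    echo "linked (unexpected)"
else
    echo "link failed, as expected:"
    cat link.log
fi
```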

If you accidentally define the same subroutine name in two library files, the linker takes the definition from the file listed first on the command line!  Any subsequent definitions of that subroutine are ignored; any subsequent uses of that subroutine find the first subroutine.  This is horrible, because it's unlikely that two subroutines named "doit" are interchangeable just because the names are the same!

If an object file calls a subroutine, the linker will find it in any other object file on the command line, or in any library listed later.  But if a library file calls a subroutine, the linker only searches that library and subsequent libraries on the command line!  For example, "gcc my.o foo.a bar.a" errors out if bar.a requires anything from foo.a beyond what my.o uses.  This is stupid, because the linker is perfectly capable of searching foo.a again, it just doesn't want to.  If two libraries each depend on routines in the other, you may have to list them several times on the command line, as in "gcc my.o foo.a bar.a foo.a".  That second foo.a picks up the things in foo that bar needs.

(These bugs are present in both the UNIX and Windows linkers.  Some code actually depends on these bugs in order to operate!)

The problem here is that writing a library name on the link line is just shorthand for a whole set of object files.  As it walks the list of libraries, the linker uses a simple pruning algorithm to decide which object files it can ignore--if nobody seen so far still needs a subroutine (or other symbol) listed in the object file, the object file is permanently ignored. 

Generally speaking, you've got to be very careful to manage dependencies between libraries, and careful with the order things are listed on the link line.

Guts of object/executable files:

There are lots of different things inside an object file or executable (see page 543 of the textbook for a complete list).
You can look at these things inside an object or executable file using the GNU/Linux tool objdump:
    objdump -hdrC foo.o
or on Windows using the Microsoft tool dumpbin:
    dumpbin /disasm foo.obj
(both objdump and dumpbin have zillions of additional parameters and options.)

When you're writing C, the compiler is smart enough to put everything into the right places.  But when you're writing assembly (especially when writing a standalone .S assembly source file), you often have to explicitly say:
section ".text"
before writing assembler instructions, or
section ".rodata"
before defining read-only strings or tables.