Dynamically Linked Libraries: DLLs

CS 301 Lecture, Dr. Lawlor

So pretty much every program ever written in C or C++ calls a standard library function like printf or std::cout.  But with ordinary "static" linking, the machine code for every library function called by a program has to be included right inside the program's executable file.  So if you've got 10,000 programs on your hard drive, with static linking you'd have 10,000 copies of printf and std::cout! 

That's silly, and it's something dynamic linking can prevent--with dynamic linking, there's exactly one copy of the standard library, and all your programs just *point* to it instead of including the machine code directly.

Seeing Dynamic Libraries In Use

In UNIX, the program "ldd" will show you all the dynamically linked libraries that a program uses:
olawlor@dellawlor:~/class/cs301/lecture/dll/simple$ ldd ./calls
  linux-gate.so.1 => (0xffffe000)
  foo.so => ./foo.so (0xb7ee8000)
  libstdc++.so.6 => /usr/local/lib/libstdc++.so.6 (0xb7de4000)
  libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7dc2000)
  libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1 (0xb7db7000)
  libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7c88000)
  /lib/ld-linux.so.2 (0xb7eed000)
Line by line:
  • linux-gate.so.1 is not really a library, but part of the new faster syscall interface (the kernel's "vDSO", which it maps into every process).
  • foo.so is a local library in the current directory.
  • libstdc++.so.6 holds the C++ "iostream" and other library functions.
  • libm.so.6 is the C math library, where functions like sin() live.
  • libgcc_s.so.1 is a gcc-specific library, for stuff like "long long" divide.
  • libc.so.6 is the standard C library, for stuff like malloc() and printf().
  • /lib/ld-linux.so.2 is the dynamic linker itself.
There's a Windows program called Depends that does the same thing for Windows executables.
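
ldd looks at a program from the outside.  A running Linux program can list its own loaded libraries with glibc's dl_iterate_phdr (declared in <link.h>), which calls a callback once per loaded object.  The sketch below assumes Linux with glibc; the helper names loaded_objects and print_loaded_objects are made up for illustration.

```cpp
// Sketch: list the shared objects loaded into *this* process, roughly
// what ldd reports.  Assumes Linux/glibc (dl_iterate_phdr is a GNU extension).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <link.h>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One entry per loaded object: its file name and the address it was loaded at.
struct LoadedObject {
    std::string name;
    unsigned long base;
};

// dl_iterate_phdr calls this once for each loaded object.
static int collect(struct dl_phdr_info *info, std::size_t /*size*/, void *data) {
    auto *out = static_cast<std::vector<LoadedObject> *>(data);
    // glibc reports the main program first, with an empty name.
    std::string name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name
                           : "(main program)";
    out->push_back({name, static_cast<unsigned long>(info->dlpi_addr)});
    return 0;  // returning 0 means "keep iterating"
}

// Hypothetical helper: gather every loaded object into a vector.
std::vector<LoadedObject> loaded_objects() {
    std::vector<LoadedObject> objs;
    dl_iterate_phdr(collect, &objs);
    return objs;
}

// Hypothetical helper: print them in an ldd-like format.
void print_loaded_objects() {
    for (const auto &o : loaded_objects())
        std::printf("%s (0x%lx)\n", o.name.c_str(), o.base);
}
```

Calling print_loaded_objects() from main should print the main program first, followed by entries much like ldd's list (libc.so.6 and friends), each with its load address.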

Building a Dynamically Linked Library

A dynamically linked library is built from a set of object files (.o or .obj), just like an ordinary statically linked library.  You've got to compile the object files with a few extra flags, though:
On Linux or other UNIX systems, build the .o file with:
  g++ foo.cpp -c -fPIC

The "-fPIC" is for "Position Independent Code", and it sets up the "Global Offset Table" stuff described below in "The Horror of UNIX Shared Libraries".
On Windows, build the .obj files with:
  cl /TP /GR /EHsc /MD /c foo.cpp

The "/MD" links your code with the multithreaded DLL version of the C runtime library, so your DLL and the programs using it share one copy of the runtime; if you forget this, you'll get a bizarre inexplicable crash inside "fseek" in your library.  You'll also have to add DLL export and import statements to your header as described below in "The Horror of Windows DLLs".

Once you've built a bunch of object files, you can link them into a dynamically linked library like so:
On Linux, build the .so file with:
  g++ -shared foo.o -o foo.so
On Mac OS X, build the shared library with (the usual Mac extension is .dylib, but .so works too):
  g++ -dynamiclib foo.o -o foo.so
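I highly recommend also using the hideous flag "-Wl,-rpath=." when you link the *executable* that uses your library, to make Linux look in the current directory for shared libraries.  (The rpath belongs on the program, not on foo.so, since it's the program that needs to find foo.so.)  Beware that "." means whatever directory the user runs the program from; "-Wl,-rpath,'$ORIGIN'" means the directory containing the executable.  Otherwise users will need to set their LD_LIBRARY_PATH environment variable before they can run a program using your library.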
On Windows, build the .dll and .lib files with:
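  cl /LD /MD foo.obj /Fefoo.dll
"/LD" means make a shared library (a DLL).  Anything after "/link" is handed to the linker rather than the compiler, so compiler flags must come before it.  Along with foo.dll, the linker writes foo.lib, the "import library" you link programs against.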

Windows is a lot smarter about searching for shared libraries; it'll look for libraries in:
  • C:\Windows\System32 (and other windows dirs)
  • The exe's directory (install directory)
  • The directory you're running the program in
  • The directories listed in the PATH environment variable
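
To make the build commands above concrete, here is what a minimal foo.cpp might look like.  The function and variable names are hypothetical.  The extern "C" is a choice, not a requirement: it keeps the exported names unmangled, so tools (and dlsym) can find them by their plain names.  On Windows each exported function would additionally need the __declspec(dllexport) marking discussed in the declspec section.

```cpp
// foo.cpp: a tiny example library (hypothetical contents).
// Linux build:
//   g++ foo.cpp -c -fPIC
//   g++ -shared foo.o -o foo.so

// extern "C" turns off C++ name mangling, so the symbol in foo.so is
// literally named "foo_add" rather than a mangled name like _Z7foo_addii.
extern "C" int foo_add(int a, int b) {
    return a + b;
}

// A global variable that lives inside the library.
int foo_counter = 0;

// Bumps the library's global and returns the new value.
extern "C" int foo_bump(void) {
    return ++foo_counter;
}
```

After building foo.so, running "nm -D foo.so" should list foo_add and foo_bump under exactly those names.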

Compile-Time Linking with a Dynamically Linked Library

Once you've built a .so or .dll file, you link to it by just listing it on the final command line:
On UNIX-like systems:
  g++ main.cpp foo.so -o main
The resulting executable will look for "foo.so" when run.  "foo.so" must either be copied to one of the system /lib directories, or you need to "export LD_LIBRARY_PATH=.", or you need to add an rpath such as "-Wl,-rpath=." to this final link of main.  UNIX is pretty silly about refusing to admit that foo.so is sitting right next to the executable that needs it, unless you ask for that explicitly with "-Wl,-rpath,'$ORIGIN'".
On Windows:
  cl /TP /GR /EHsc /MD main.cpp foo.lib /Femain.exe
Note that you link with the ".lib" file, but at runtime main.exe will require "foo.dll".  The DLL can be in the same directory as main, or in C:\Windows\System32, and it will just work.

You can also put a line like this into main.cpp (or foo.h!), and then you don't even need to list foo.lib on the command line or in the IDE:
#pragma comment (lib,"foo.lib")
I *really* recommend using #pragma comment whenever you depend on a Windows library. It saves many people a lot of trouble!

Note that we don't have to do anything special to access the routines inside foo.cpp; main.cpp can just call the routines, and the linker finds the appropriate library and figures out how to ask for them.  If the library is missing at runtime, the dynamic linker won't let your program start ("The system cannot find the file foo.dll, which the program main.exe requires to run.").

Explicitly Loading a Dynamically Linked Library

Sometimes, as when writing a web browser, you want to be able to load up a new piece of code (a browser plugin that reads Adobe PDF files, for example), but you want life to go on even if that piece of code doesn't exist.  That is, you want to explicitly ask for the library to be loaded, and get the stuff you need from it yourself.

This is actually pretty easy.  On UNIX, the functions are dlopen, dlsym, and dlclose (declared in <dlfcn.h>; link with -ldl).  On Windows, the corresponding functions are LoadLibrary, GetProcAddress, and FreeLibrary (in <windows.h> and kernel32.lib), but beware of the Horror of Windows declspec (below).
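
Here's a minimal sketch of explicit loading on UNIX, assuming a glibc-based Linux system where the math library is named "libm.so.6" (older glibc versions also need -ldl on the link line).  It looks up cos by name and reports failure, instead of crashing, when the library or symbol is missing.  The helper call_from_library is a hypothetical name.  One catch: dlsym looks up the symbol's exact name, so a function you want to load from your own C++ library should be declared extern "C" to avoid C++ name mangling.

```cpp
// Sketch of explicit run-time loading with dlopen/dlsym/dlclose.
// Assumes glibc Linux; older glibc versions need -ldl when linking.
#include <dlfcn.h>
#include <cstdio>

// The type of function we expect to find, e.g. double cos(double).
typedef double (*unary_fn)(double);

// Hypothetical helper: loads `libname`, looks up `symbol`, calls it on x,
// and stores the result in *out.  Returns false if anything is missing,
// so the caller can carry on without the plugin.
bool call_from_library(const char *libname, const char *symbol,
                       double x, double *out) {
    void *lib = dlopen(libname, RTLD_NOW);  // NULL if the file can't be loaded
    if (!lib) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return false;
    }
    void *sym = dlsym(lib, symbol);  // NULL if the symbol isn't there
    if (!sym) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(lib);
        return false;
    }
    unary_fn fn = reinterpret_cast<unary_fn>(sym);
    *out = fn(x);
    dlclose(lib);  // done with the library
    return true;
}
```

For example, call_from_library("libm.so.6", "cos", 0.0, &r) should return true and set r to 1.0.  The Windows version has the same shape, with LoadLibrary, GetProcAddress, and FreeLibrary in place of dlopen, dlsym, and dlclose.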

The Horror of UNIX Shared Libraries: GOT

So at runtime, when loading up a new dynamic library, the dynamic linker has to set up pointers from inside the shared library to outside functions and globals.  UNIX uses a very curious trick to organize these pointers, called the "Global Offset Table", or "GOT".

Check out the disassembly if you compile this code with shared library support (-fPIC, or the "Shared" checkbox in NetRun):
    int thingy;

    int foo(void) {
        return thingy+5;
    }

The disassembly, with my comments on the lines starting with ";":
    00000000 <foo>:
    ; Standard compiler-generated "function prologue" to set up the stack frame:
    0: 55 push ebp
    1: 89 e5 mov ebp,esp
    ; 32-bit x86 can't address data relative to the program counter, so this weirdness just loads up the program counter (address of this code) into ecx:
    3: e8 00 00 00 00 call 8 <foo+0x8>
    8: 59 pop ecx
    ; The GOT is attached to this code at link time, so it's at a fixed offset from here; the linker patches that offset into this add, leaving ecx pointing at the GOT:
    9: 81 c1 03 00 00 00 add ecx,0x3
    ; The current address of each global is stored in the GOT, where each function and global has a slot.  The R_386_GOT32 relocation fills in the offset of thingy's slot:
    f: 8b 81 00 00 00 00 mov eax,DWORD PTR [ecx]
    11: R_386_GOT32 thingy
    ; Finally, we've loaded up eax with the address of "thingy", so we can start using it:
    15: 8b 00 mov eax,DWORD PTR [eax]
    17: 83 c0 05 add eax,0x5
    1a: 5d pop ebp
    1b: c3 ret
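The GOT is hence just a table of pointers, with one slot for each global and function the library uses, filled in by the dynamic linker at runtime.  Because every access goes through the table, changing a slot redirects the code to a different variable.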
Both "Weaves" and Charm++ actually change the GOT in a user-level threads package to give each thread its *own* set of globals!
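
The GOT trick can be modeled in ordinary C++: if code only reaches a global through a table of pointers, changing one table entry silently redirects every access.  This is a toy model written by hand, not what the compiler or dynamic linker actually generates; got, THINGY_SLOT, other_thingy, and retarget_thingy are hypothetical names.

```cpp
// A toy *model* of a Global Offset Table (hand-written, not compiler output):
// code reaches the global only through a table of pointers.
#include <cstddef>

int thingy = 10;         // the "real" global
int other_thingy = 100;  // e.g. a second thread's private copy

// The table: one pointer per global the code uses (here, just one).
// In a real GOT, the dynamic linker fills these slots in.
int *got[1] = { &thingy };
const std::size_t THINGY_SLOT = 0;  // hypothetical slot index

// Mirrors foo() from the disassembly: load the global's address from the
// table, load the value at that address, then add 5.
int foo(void) {
    int *addr = got[THINGY_SLOT];
    return *addr + 5;
}

// Point the slot somewhere else: the same idea Weaves and Charm++ use to
// give each user-level thread its own copy of the globals.
void retarget_thingy(int *new_home) {
    got[THINGY_SLOT] = new_home;
}
```

Here foo() returns 15 at first; after retarget_thingy(&other_thingy) it returns 105, without foo() itself changing at all.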

The Horror of Windows DLLs: declspec

The horror of UNIX .so files is hidden inside the machine code.  Windows has never liked to hide horrible things, and Windows DLLs are no exception to this.

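Everything inside a DLL you want to make accessible from outside must be declared with:
    __declspec(dllexport)

From outside the DLL, you should mark everything that will come from a DLL with the following (it's mandatory for global variables; functions work without it, just slightly less efficiently):
    __declspec(dllimport)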

These declarations are really annoying, because usually you want to just write your function prototypes normally, like:
    void foo(void);

This works with UNIX shared libraries, which export everything by default.  It doesn't work with Windows DLLs: anything not marked __declspec(dllexport) simply isn't exported, so programs that try to use it get an "unresolved external symbol" link error.

It wouldn't be that hard to change all your declarations of everything to include a little macro, like this:
    SILLY_DLL_CRAP void foo(void);

But SILLY_DLL_CRAP has to expand to __declspec(dllexport) from inside the DLL, and __declspec(dllimport) from outside the DLL.  So it's really common to see something like this in the headers of DLLs that work on Windows:

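    #ifdef FOO_EXPORTS  /* hypothetical name: defined only when building the DLL */
    #  define SILLY_DLL_CRAP __declspec(dllexport)
    #else
    #  define SILLY_DLL_CRAP __declspec(dllimport)
    #endif

You've then got to be careful to define FOO_EXPORTS (for example with "cl /DFOO_EXPORTS ...") when compiling the DLL itself, and to leave it undefined when compiling programs that use the DLL.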
This is by far the most annoying aspect of working with DLLs on Windows.  Of course, usually your macro is called something like MY_DLL_ENTRY instead of SILLY_DLL_CRAP.
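
Putting the pieces together, here is a sketch of the header macro extended so the same source also builds on UNIX, where no declspec is needed.  It assumes the MY_DLL_ENTRY name from the text; FOO_EXPORTS and foo_answer are hypothetical.  With GCC the macro expands to the visibility attribute, which only matters if you compile with -fvisibility=hidden; otherwise everything is exported anyway.

```cpp
// This single file plays the role of the DLL's own source, so it defines
// the (hypothetical) FOO_EXPORTS macro before the header logic runs.
#define FOO_EXPORTS 1

// --- what would normally live in the header, e.g. foo.h ---
#if defined(_WIN32)
#  ifdef FOO_EXPORTS
#    define MY_DLL_ENTRY __declspec(dllexport)  // building the DLL
#  else
#    define MY_DLL_ENTRY __declspec(dllimport)  // using the DLL
#  endif
#elif defined(__GNUC__)
   // UNIX exports everything by default; this just keeps the symbol
   // visible even under -fvisibility=hidden.
#  define MY_DLL_ENTRY __attribute__((visibility("default")))
#else
#  define MY_DLL_ENTRY
#endif

MY_DLL_ENTRY int foo_answer(void);  // the declaration, as in the header

// --- what would normally live in the DLL's foo.cpp ---
MY_DLL_ENTRY int foo_answer(void) { return 42; }
```

In a real project the "#define FOO_EXPORTS" line would not be in any file; you'd pass /DFOO_EXPORTS only on the command line that compiles the DLL.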