Dynamically Linked Libraries: DLLs
CS 301 Lecture, Dr. Lawlor
So pretty much every program ever written in C or C++ calls a standard
library function like printf or std::cout. But with ordinary
"static"
linking, the machine code for every library function called by a
program has to be included right inside the program's executable
file. So if you've got 10,000 programs on
your hard drive, with static linking you'd have 10,000 copies of printf
and std::cout!
That's silly, and it's something dynamic linking
can prevent--with dynamic linking, there's exactly one copy of the
standard library, and all your programs just *point* to it instead of
including the machine code directly.
Seeing Dynamic Libraries In Use
In UNIX, the program "ldd" will show you all the dynamically linked libraries that a program uses:
olawlor@dellawlor:~/class/cs301/lecture/dll/simple$ ldd ./calls
This is not really a library, but part of the new faster syscall interface:
linux-gate.so.1 => (0xffffe000)
This is a local library in the current directory:
foo.so => ./foo.so (0xb7ee8000)
These are the C++ "iostream" and other library functions:
libstdc++.so.6 => /usr/local/lib/libstdc++.so.6 (0xb7de4000)
This is the C Math library, where functions like sin() live:
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7dc2000)
This is a gcc-specific library, for stuff like "long long" divide:
libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1 (0xb7db7000)
This is the standard C library, for stuff like malloc() and printf()
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7c88000)
This is the dynamic linker library
/lib/ld-linux.so.2 (0xb7eed000)
There's a Windows program called Depends that does the same thing for Windows executables.
Building a Dynamically Linked Library
A dynamically linked library is a set of object files (.o or .obj),
just like an ordinary statically linked library. You've got to
compile the object files with a few extra flags, though:
On Linux or other UNIX systems, build the .o file with:
g++ foo.cpp -c -fPIC
The "-fPIC" is for "Position Independent Code", and it sets up the
"Global Offset Table" stuff described below in "The Horror of UNIX
Shared Libraries".
On Windows, build the .obj files with:
cl /TP /GR /EHsc /MD /c foo.cpp
The "/MD" links your code with the multithreaded DLL version of the C
runtime library; if you forget this, you'll get a bizarre inexplicable
crash inside "fseek" in your library. You'll also have to add DLL
export and import statements to your header as described below in
"The Horror of Windows DLLs".
Once you've built a bunch of object files, you can link them into a dynamically linked library like so:
On Linux, build the .so file with:
g++ -shared foo.o -o foo.so
On Mac OS X, build the .so file with:
g++ -dynamiclib foo.o -o foo.so
I highly recommend also using the hideous flag "-Wl,-rpath=." to make
Linux look in the current directory for shared libraries.
Otherwise users will need to set their LD_LIBRARY_PATH environment variable before they can run a program using your library.
On Windows, build the .dll and .lib files with:
cl /LD /TP /GR /EHsc /MD foo.obj -ofoo.dll
"/LD" means make a shared library.
Windows is a lot smarter about searching for shared libraries--it'll look for libraries in:
- C:\Windows\System32 (and other windows dirs)
- The exe's directory (install directory)
- The directory you're running the program in
Compile-Time Linking with a Dynamically Linked Library
Once you've built a .so or .dll file, you link to it by just listing it on the final command line:
On UNIX-like systems:
g++ main.cpp foo.so -o main
The resulting executable will look for "foo.so" when run. "foo.so"
must either be copied to one of the system /lib directories, or you
need to "export LD_LIBRARY_PATH=.", or you need to link the
executable with the rpath flag ("-Wl,-rpath=."). UNIX is pretty silly
about refusing to admit that foo.so is sitting right next to the
executable that needs it.
On Windows:
cl /TP /GR /EHsc /MD main.cpp foo.lib -omain.exe
Note that you link with the ".lib" file, but at runtime main.exe will
require "foo.dll". The DLL can be in the same directory as main,
or in C:\Windows\System32, and it will just work.
You can also put a line like this into main.cpp (or foo.h!), and then
you don't even need to list foo.lib on the command line or in the IDE:
#pragma comment (lib,"foo.lib")
I *really* recommend using #pragma comment whenever you depend on a Windows library. It saves many people a lot of trouble!
Note that we don't have to do anything special to access the routines
inside foo.cpp; main.cpp can just call the routines, and the linker
finds the appropriate library and figures out how to ask for
them. If the library is missing, the dynamic linker won't let
your program start ("The system cannot find the file foo.dll, which the
program main.exe requires to run.").
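As a minimal sketch of what the files above might contain, here is one listing with comments marking where each file would begin. The function name foo_add and what it does are hypothetical; the lecture does not show the real contents of foo.cpp.

```cpp
// ---- foo.h (hypothetical header shared by the library and its users) ----
int foo_add(int a, int b);   // defined in foo.cpp, which is built into foo.so

// ---- foo.cpp (on Linux: compiled with -fPIC, linked with -shared) ----
int foo_add(int a, int b) {
    return a + b;
}

// ---- main.cpp (would #include "foo.h" and be linked against foo.so) ----
// An ordinary call: nothing in the source says the routine lives in a
// dynamic library. The dynamic linker resolves it when the program starts.
int call_into_library() {
    return foo_add(2, 3);
}
```

In a real program the call would sit inside main(). On Windows the prototype in foo.h would also need the export and import markings described in "The Horror of Windows DLLs".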
Explicitly Loading a Dynamically Linked Library
Sometimes, as when writing a web browser, you want to be able to load
up a new piece of code (a browser plugin that reads Adobe PDF files,
for example), but you want life to go on even if the piece of code
doesn't exist. That is, you want to explicitly ask for the
library to be loaded, and get the stuff you need from it
yourself. This is actually pretty easy. On UNIX, the functions are dlopen, dlsym, and dlclose (declared in <dlfcn.h>; link with -ldl). On Windows, the corresponding functions are LoadLibrary, GetProcAddress, and FreeLibrary (declared in <windows.h>; link with kernel32.lib), but beware of the Horror of Windows declspec (below).
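A minimal UNIX sketch of explicit loading might look like the following. The helper name call_plugin and the assumed plugin signature (a function taking no arguments and returning int) are illustrative choices, not part of any real plugin API.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose, dlerror
#include <cstdio>

// Signature we assume the plugin function has (an assumption for this sketch).
typedef int (*plugin_fn)(void);

// dlerror() may return a null pointer, so substitute a placeholder message.
static const char *last_dl_error() {
    const char *message = dlerror();
    return message ? message : "unknown error";
}

// Try to load library_path and call the function named symbol_name.
// On success, stores the function's return value in *result and returns true.
// On failure, returns false and leaves *result alone, so life goes on.
bool call_plugin(const char *library_path, const char *symbol_name, int *result) {
    void *handle = dlopen(library_path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", last_dl_error());
        return false;
    }
    // dlsym hands back a void *, which we cast to the function type we expect.
    plugin_fn fn = reinterpret_cast<plugin_fn>(dlsym(handle, symbol_name));
    if (fn == nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", last_dl_error());
        dlclose(handle);
        return false;
    }
    *result = fn();
    dlclose(handle);
    return true;
}
```

A caller might write call_plugin("./foo.so", "foo", &value) and simply carry on if it returns false. Since dlsym looks symbols up by their exact name, a function written in C++ generally has to be declared extern "C" in the library, or its mangled name will not match.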
The Horror of UNIX Shared Libraries: GOT
So at runtime, when loading up a new dynamic library the dynamic linker
has to set up pointers from inside the shared library to outside
functions and globals. UNIX uses a very curious trick to organize
these pointers, called the "Global Offset Table", or "GOT".
Check out the disassembly if you compile this code with shared library support (-fPIC, or the "Shared" checkbox in NetRun):
int thingy;
int foo(void) {
return thingy+5;
}
The disassembly, with my comments interleaved:
00000000 <foo>:
Standard compiler-generated "function prologue" to set up the stack frame:
0: 55 push ebp
1: 89 e5 mov ebp,esp
This weirdness just loads up the program counter (address of this code) into ecx:
3: e8 00 00 00 00 call 8 <foo+0x8>
8: 59 pop ecx
The GOT is attached at link time to this code; so it's at a fixed offset from here:
9: 81 c1 03 00 00 00 add ecx,0x3
b: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
The current address of each global is stored in the GOT--each function and global has a slot:
f: 8b 81 00 00 00 00 mov eax,DWORD PTR [ecx]
11: R_386_GOT32 thingy
Finally, we've loaded up eax with the address of "thingy", so we can start using it:
15: 8b 00 mov eax,DWORD PTR [eax]
17: 83 c0 05 add eax,0x5
1a: 5d pop ebp
1b: c3 ret
The GOT is hence:
- Just a big table of addresses of global variables and functions.
- The way "-fPIC" (shared library) code accesses globals and functions.
- Easy to set up by the dynamic linker, since it's just a single table.
Both "Weaves" and Charm++ actually change the GOT in a user-level threads package to give each thread its *own* set of globals!
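The idea can be modeled in plain C++. This is only a conceptual sketch: the real GOT is laid out by the linker and filled in by the dynamic linker, and the FakeGot struct and function names below are made up for illustration.

```cpp
// A conceptual model only; real GOT entries are never written by hand.
int thingy = 10;              // the "real" global

// One slot per global: the slot holds the global's current address.
struct FakeGot {
    int *thingy_slot;
};

// What -fPIC code does in spirit: find the table, load the address from
// its slot, then load the value through that address.
int foo_via_got(const FakeGot *got) {
    int *address = got->thingy_slot;  // like: mov eax,DWORD PTR [ecx]
    return *address + 5;              // like: mov eax,DWORD PTR [eax]; add eax,0x5
}

// Swapping the table makes the same code see a different "thingy",
// which is the idea behind giving each thread its own set of globals.
int demo_per_thread_globals() {
    int thread_private_thingy = 100;
    FakeGot shared_got = { &thingy };
    FakeGot private_got = { &thread_private_thingy };
    return foo_via_got(&shared_got) + foo_via_got(&private_got);
}
```

Because every access goes through the table, changing one pointer in the table redirects all code that uses that global, without touching the machine code itself.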
The Horror of Windows DLLs: declspec
The horror of UNIX .so files is hidden inside the machine code.
Windows has never liked to hide horrible things, and Windows DLLs are
no exception to this.
Everything inside a DLL you want to make accessible from outside must be declared with:
__declspec(dllexport)
From outside the DLL, you must mark everything that will come from a DLL with:
__declspec(dllimport)
These declarations are really annoying, because usually you want to just write your function prototypes normally, like:
void foo(void);
This works with UNIX DLLs. It doesn't work with Windows
DLLs--anything without the appropriate declspec results in a link error,
since the symbol never gets exported from the DLL.
It wouldn't be that hard to change all your declarations of everything to include a little macro, like this:
SILLY_DLL_CRAP void foo(void);
But SILLY_DLL_CRAP has to expand to __declspec(dllexport) from inside
the DLL, and __declspec(dllimport) from outside the DLL. So it's
really common to see something like this in the headers of DLLs that
work on Windows:
#ifdef INSIDE_MY_DLL
# define SILLY_DLL_CRAP __declspec(dllexport)
#else
# define SILLY_DLL_CRAP __declspec(dllimport)
#endif
You've then got to be careful to:
- Always #define INSIDE_MY_DLL inside your DLL code (foo.cpp above), before including the header.
- Always use SILLY_DLL_CRAP in front of the prototypes of everything in your DLL (all functions, globals, and even classes)
- SILLY_DLL_CRAP void someFunctionInsideAdll(void);
- class SILLY_DLL_CRAP someClassInsideAdll { ... };
- SILLY_DLL_CRAP extern int someGlobalInsideAdll;
This is by far the most annoying aspect of working with DLLs on
Windows. Of course, usually your macro is called something like
MY_DLL_ENTRY instead of SILLY_DLL_CRAP.
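Putting the pieces together, a header that builds on both Windows and UNIX might look roughly like this. It is a sketch under assumptions: the _WIN32 check is one common way to make the macro expand to nothing elsewhere, and the function bodies are hypothetical. It is shown as one listing with comments marking the file boundaries.

```cpp
// ---- top of foo.cpp (the DLL's own source): say we are inside the DLL ----
#define INSIDE_MY_DLL

// ---- foo.h, as it would be #included by foo.cpp and by users ----
#if defined(_WIN32)
#  ifdef INSIDE_MY_DLL
#    define MY_DLL_ENTRY __declspec(dllexport)  // building the DLL itself
#  else
#    define MY_DLL_ENTRY __declspec(dllimport)  // using the DLL from outside
#  endif
#else
#  define MY_DLL_ENTRY   // UNIX shared libraries need no marking
#endif

MY_DLL_ENTRY int someFunctionInsideAdll(int x);

class MY_DLL_ENTRY someClassInsideAdll {
public:
    int twice(int x);
};

// ---- rest of foo.cpp: the definitions that end up in the DLL ----
int someFunctionInsideAdll(int x) { return x + 1; }
int someClassInsideAdll::twice(int x) { return 2 * x; }
```

Code outside the DLL includes the same foo.h but never defines INSIDE_MY_DLL, so on Windows it sees the dllimport version of every declaration.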