Memory Allocation Errors & History
CS 321 2007 Lecture, Dr. Lawlor
If you write bad code that writes beyond the end of an allocated buffer, you will
eventually crash. Unfortunately, the crash is usually far from
where the error actually occurred, which makes it extremely frustrating
to track down. For example, this program writes beyond the end of
a buffer. But the write doesn't crash, and freeing the written
buffer doesn't crash--it only crashes when you try to free a totally
unrelated buffer:
int *arr=(int *)malloc(1023*sizeof(int));
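int *bystander=(int *)malloc(3*sizeof(int));
std::cout<<"About to access past end of buffer "<<arr<<"\n";
arr[1023]=0; /* out-of-bounds write (valid indices are 0..1022); the exact statement is not shown in these notes, so this index is an assumption */
std::cout<<"Well, that worked. Deleting buffer "<<arr<<"\n";
free(arr);
std::cout<<"Also OK. Deleting totally unrelated buffer "<<bystander<<"\n";
free(bystander); /* the crash happens here */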
This prints the following:
About to access past end of buffer 0x804c008
Well, that worked. Deleting buffer 0x804c008
Also OK. Deleting totally unrelated buffer 0x804d008
Caught signal signal SIGSEGV-- segmentation violation (bad memory access) (#11)
Buffer overflow errors are complicated because new and malloc are
complicated--they've got an intricate data structure surrounding your
allocated memory regions, and it's easy to mess up part of that data
structure by writing out-of-bounds.
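To make the failure above concrete, here is a toy model of a heap, written as a minimal sketch rather than a picture of any real allocator. It assumes each block is stored as a 4-byte size header followed by the payload; the names ToyHeap, allocate, headerOf, and storeInt are made up for this illustration. The point is only that a write one element past the end of one block lands on the header of the next block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy heap: each block is [4-byte size header][payload].
// This layout is an assumption for illustration; real allocators keep more state.
struct ToyHeap {
    std::vector<unsigned char> bytes; // backing storage, stands in for the real heap
    std::size_t used = 0;             // offset of the next unused byte

    explicit ToyHeap(std::size_t capacity) : bytes(capacity, 0) {}

    // Reserve a block and return the offset of its payload.
    std::size_t allocate(std::uint32_t payloadSize) {
        assert(used + sizeof payloadSize + payloadSize <= bytes.size());
        std::memcpy(&bytes[used], &payloadSize, sizeof payloadSize);
        std::size_t payload = used + sizeof payloadSize;
        used = payload + payloadSize;
        return payload;
    }

    // Read back the size header stored just before a payload.
    std::uint32_t headerOf(std::size_t payload) const {
        std::uint32_t size;
        std::memcpy(&size, &bytes[payload - sizeof size], sizeof size);
        return size;
    }

    // Store one int at index i of a payload with no bounds check, like arr[i]=v.
    void storeInt(std::size_t payload, std::size_t i, std::int32_t v) {
        std::memcpy(&bytes[payload + i * sizeof v], &v, sizeof v);
    }
};
```

With a 1023-int block followed by a 3-int block, storeInt(arr, 1023, 0) does not touch arr's own data at all: it zeroes the size header of the second block, so the damage only shows up when something later trusts that header.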
This is one big reason why people like Java--the runtime system checks every array access, so you can never touch out-of-bounds memory.
But you can actually get a lot of useful bounds-checking in C and C++ by using
a memory-checking library, like Electric Fence. Electric Fence
uses mmap to put an inaccessible page after every new/malloc buffer,
which causes a program to instantly segfault if it reads or writes past
the end of a buffer. Sadly, Electric Fence was abandoned by its original
author, Bruce Perens
("A new version will be here shortly" has been on that site since
2003!). The smart German guy Hayati Aygün fixed Bruce's
version up, but he's since been asked to change the name to DUMA. NetRun uses Hayati's version of Electric Fence--just link with "-lefence" to see it in action.
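The guard-page trick can be sketched in a few lines. This is not Electric Fence's actual code; guardedAlloc and guardedFree are hypothetical helpers, and the sketch assumes a POSIX system with mmap, mprotect, and MAP_ANONYMOUS. It also ignores alignment: the returned pointer is only as aligned as the requested size happens to make it.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: return `size` bytes whose last byte sits immediately
// before an inaccessible page, so the first access past the end faults.
void *guardedAlloc(std::size_t size) {
    std::size_t page = (std::size_t)sysconf(_SC_PAGESIZE);
    std::size_t dataPages = (size + page - 1) / page; // round up to whole pages
    std::size_t total = (dataPages + 1) * page;       // plus one guard page

    void *base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;

    char *guard = (char *)base + dataPages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {      // any access here now faults
        munmap(base, total);
        return nullptr;
    }
    return guard - size; // buffer ends exactly where the guard page begins
}

// Hypothetical counterpart: unmap the data pages and the guard page together.
void guardedFree(void *p, std::size_t size) {
    std::size_t page = (std::size_t)sysconf(_SC_PAGESIZE);
    std::size_t dataPages = (size + page - 1) / page;
    char *guard = (char *)p + size;
    int rc = munmap(guard - dataPages * page, (dataPages + 1) * page);
    assert(rc == 0);
    (void)rc;
}
```

With a buffer from guardedAlloc(1023*sizeof(int)), the element just past the end lies in the protected page, so the bad write itself faults instead of some later, unrelated free. The cost is at least two pages per allocation, which is why this kind of tool is for debugging, not production.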
On Linux, valgrind is another cool
program that checks all memory accesses. It's actually an x86
binary translator that inserts bounds-checking before all memory
accesses. Once it's installed, just run "valgrind ./my_program" to
run your program with memory checking. The only major problem
I've had with valgrind is running it on OpenGL applications using direct
rendering, which seems to lock up my machine hard, forcing me to reboot.
Malloc via Mmap or brk
Deep down, new and malloc are usually implemented in one of two ways:
- Big allocations directly call mmap, getting a page-granularity
allocation. This is nice because an mmap'd buffer can immediately
be munmap'd when you're done using it.
- Little allocations (less than 128K on Linux) look through the
free list and do a sub-page allocation themselves using the little
8-byte housekeeping header we've looked at. This is nice because
it means a 4-byte allocation won't waste an entire 4-kilobyte page.
On UNIX, little allocations usually request memory from the OS using an older, simpler interface called sbrk,
or "set break", which back in the 1970's set the dividing line between
program memory and OS memory. Nowadays it's essentially just another
interface to the same virtual memory machinery as mmap, but it is nice and simple--it just takes the number
of bytes you want to allocate, and returns a pointer to those
bytes. It's not nearly as flexible as mmap, but it is something
to keep in mind.
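The big/little split can be sketched as a toy allocator. This is a minimal illustration under stated assumptions, not how any real malloc is written: toyAlloc, toyFree, and kMmapThreshold are hypothetical names, the threshold is just the 128K figure quoted above, and the small path simply grows the heap with sbrk instead of keeping a free list or headers. It assumes a UNIX-like system where sbrk and MAP_ANONYMOUS are available.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cutoff between "little" and "big" requests (the 128K figure above).
const std::size_t kMmapThreshold = 128 * 1024;

// Hypothetical toy allocator: big requests get their own mapping,
// small requests are carved off the end of the heap with sbrk.
void *toyAlloc(std::size_t size) {
    if (size >= kMmapThreshold) {
        void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? nullptr : p;
    }
    // sbrk(n) moves the break up by n bytes and returns the old break,
    // which is the start of the newly available bytes.
    void *p = sbrk((std::intptr_t)size);
    return p == (void *)-1 ? nullptr : p;
}

// Big blocks go straight back to the OS; small blocks are simply leaked here.
// A real allocator would put small blocks on its free list for reuse.
void toyFree(void *p, std::size_t size) {
    if (size >= kMmapThreshold) {
        int rc = munmap(p, size);
        assert(rc == 0);
        (void)rc;
    }
}
```

Note that toyFree needs the size passed back in. That is exactly the job the housekeeping header does in a real allocator: it records the block size next to the data so free can find it.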