Memory Allocation Errors & History

CS 321 2007 Lecture, Dr. Lawlor

If you write bad code that writes past the end of an allocated buffer, you'll usually crash eventually. Unfortunately, the crash tends to happen far from where the error actually occurred, which makes it extremely frustrating to track down. For example, this program writes past the end of a buffer. The write itself doesn't crash, and freeing the overwritten buffer doesn't crash either. It only crashes when you try to free a totally innocent bystander!

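#include <cstdlib>
#include <iostream>

int main() {
    int *arr=(int *)malloc(1023*sizeof(int));    // valid indices: arr[0] .. arr[1022]
    int *bystander=(int *)malloc(3*sizeof(int));

    std::cout<<"About to access past end of buffer "<<arr<<"\n";
    arr[1023]=5;  //! one element past the end: undefined behavior

    std::cout<<"Well, that worked.  Deleting buffer "<<arr<<"\n";
    free(arr);

    std::cout<<"Also OK.  Deleting totally unrelated buffer "<<bystander<<"\n";
    free(bystander);  // in the run below, this is where it finally crashed
    return 0;
}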

This prints the following:

About to access past end of buffer 0x804c008
Well, that worked.  Deleting buffer 0x804c008
Also OK.  Deleting totally unrelated buffer 0x804d008
-------------------
Caught signal signal SIGSEGV-- segmentation violation (bad memory access) (#11)

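Buffer overflow errors are hard to diagnose because new and malloc are complicated: they keep an intricate data structure around your allocated memory regions, and it's easy to corrupt part of that data structure by writing out of bounds. In the run above, arr[1023] lives at 0x804c008 + 1023*4 = 0x804d004, which is inside the 8-byte housekeeping header malloc keeps just in front of bystander's data at 0x804d008. The stray 5 overwrote bystander's bookkeeping rather than its data, so nothing went wrong until free tried to use that header.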
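
The same address arithmetic can be checked on any machine without actually corrupting the heap. The sketch below only compares pointer values and never performs the bad write. The helper names bytes_before_next and show_layout_on_this_machine are hypothetical, and it assumes 4-byte ints as in the 2007 run. Heap layout depends entirely on your allocator, so bystander may not land right after arr at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>

// Signed distance, in bytes, from the address of arr[index] (an int array
// starting at address `arr`) to the start of the next block `next`.
// A small positive value means the store would land just in front of `next`,
// i.e. in the housekeeping header the allocator keeps before that block.
long long bytes_before_next(std::uintptr_t arr, std::size_t index, std::uintptr_t next) {
    std::uintptr_t target = arr + index * sizeof(int);
    return static_cast<long long>(next) - static_cast<long long>(target);
}

// Repeats the two mallocs from the example but only compares pointer values;
// nothing is written out of bounds. The layout depends on your allocator, so
// the printed distance may be large or even negative.
void show_layout_on_this_machine() {
    int *arr = static_cast<int *>(std::malloc(1023 * sizeof(int)));
    int *bystander = static_cast<int *>(std::malloc(3 * sizeof(int)));
    assert(arr != nullptr && bystander != nullptr);
    std::cout << "arr[1023] would sit "
              << bytes_before_next(reinterpret_cast<std::uintptr_t>(arr), 1023,
                                   reinterpret_cast<std::uintptr_t>(bystander))
              << " bytes before bystander\n";
    std::free(bystander);
    std::free(arr);
}
```

Called with the 2007 addresses, bytes_before_next(0x804c008, 1023, 0x804d008) returns 4.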

This is one big reason why people like Java: the runtime checks every array index, so an out-of-bounds access throws an exception instead of silently corrupting memory.

But you can get a lot of useful bounds-checking in C and C++ too, by using a memory-checking library like Electric Fence. Electric Fence uses mmap to put an inaccessible page right after every new/malloc buffer, so a program segfaults immediately, at the offending instruction, if it reads or writes past the end of a buffer. Sadly, Electric Fence was abandoned by its original author, Bruce Perens ("A new version will be here shortly" has been on his site since 2003!). The smart German programmer Hayati Aygün fixed up Bruce's version, but he has since been asked to change its name to DUMA. NetRun uses Hayati's version of Electric Fence: just link with "-lefence" to see it in action.
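
The guard-page trick can be sketched in a few dozen lines. This is a minimal illustration assuming Linux/POSIX mmap, mprotect and sysconf. It is not Electric Fence's or DUMA's actual source. The names fence_malloc and fence_free are hypothetical, and unlike the real library, this version makes the caller pass the size back when freeing.

```cpp
#include <sys/mman.h>  // mmap, mprotect, munmap
#include <unistd.h>    // sysconf
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical Electric-Fence-style allocator (a sketch, not EF's code).
// Layout: [ unused slack | user buffer (n bytes) ][ guard page, PROT_NONE ]
// The buffer ends exactly where the guard page begins, so touching the
// first byte past the end faults immediately.
static std::size_t page_size() { return static_cast<std::size_t>(sysconf(_SC_PAGESIZE)); }

void *fence_malloc(std::size_t n) {
    const std::size_t page = page_size();
    const std::size_t data_pages = (n + page - 1) / page;   // round n up to whole pages
    const std::size_t total = (data_pages + 1) * page;      // plus one guard page
    void *base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char *guard = static_cast<char *>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {            // make the last page inaccessible
        munmap(base, total);
        return nullptr;
    }
    return guard - n;                                       // buffer ends right at the guard
}

// The caller must pass the same n used for fence_malloc (a real library records it).
void fence_free(void *p, std::size_t n) {
    assert(p != nullptr);
    const std::size_t page = page_size();
    const std::size_t data_pages = (n + page - 1) / page;
    char *guard = static_cast<char *>(p) + n;
    char *base = guard - data_pages * page;
    munmap(base, (data_pages + 1) * page);                  // release data and guard pages
}
```

After char *p = (char *)fence_malloc(100);, accesses to p[0] through p[99] work, but p[100] falls on the PROT_NONE page and the program gets SIGSEGV at that exact instruction. That is why a debugger backtrace points at the real bug. A real allocator must also return suitably aligned pointers, which means rounding the size up and leaving a few unprotected bytes before the guard. This sketch skips that.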

On Linux, valgrind is another handy program that checks all memory accesses. It's actually a dynamic binary translator: it recompiles your program's x86 machine code on the fly, inserting a check before every memory access. Once it's installed, just run "valgrind ./my_program" to run your program with memory checking. The only major problem I've had with valgrind is running it on OpenGL applications that use direct rendering, which seems to lock up my machine hard and force a reboot.

Malloc via Mmap or brk

Deep down, new and malloc are usually implemented in one of two ways, depending on the size of the request:

Big allocations call mmap directly, getting a page-granularity allocation. This is nice because an mmap'd buffer can be munmap'd, handing its pages straight back to the OS, as soon as you're done using it.
Little allocations (under 128K by default in Linux's glibc) search the free list and carve out a sub-page allocation themselves, using the little 8-byte housekeeping header we've looked at. This is nice because it means a 4-byte allocation won't waste an entire 4-kilobyte page.
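
The two paths can be illustrated with a toy allocator. This is a hypothetical sketch, not glibc's real code. The names toy_malloc and toy_free are made up, and the 128K cutoff is borrowed from the text. Small blocks come from a fixed static arena instead of memory obtained from the OS, and each block carries a one-size_t header (8 bytes on a 64-bit machine) holding its size. Bit 0 of the header marks blocks that came from mmap.

```cpp
#include <sys/mman.h>  // mmap, munmap
#include <cassert>
#include <cstddef>

// Hypothetical toy allocator showing the two paths; NOT glibc's real code.
static const std::size_t kMmapThreshold = 128 * 1024; // big/small cutoff from the text
static const std::size_t kMmappedBit = 1;             // header bit 0: block came from mmap

// Small blocks are carved from this fixed arena (a real malloc grows its heap via the OS).
alignas(16) static unsigned char arena[1 << 20];
static unsigned char *arena_top = arena;

struct FreeNode { FreeNode *next; };   // lives in the payload of a freed small block
static FreeNode *free_list = nullptr;

// Every block is laid out as [size_t header][payload].
static std::size_t *header_of(void *payload) {
    return static_cast<std::size_t *>(payload) - 1;
}

void *toy_malloc(std::size_t n) {
    n = (n + 15) & ~static_cast<std::size_t>(15);  // multiple of 16, so bit 0 stays free
    if (n == 0) n = 16;
    if (n >= kMmapThreshold) {                     // big: page-granularity mmap
        std::size_t total = n + sizeof(std::size_t);
        void *p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return nullptr;
        std::size_t *hdr = static_cast<std::size_t *>(p);
        *hdr = n | kMmappedBit;
        return hdr + 1;
    }
    // small: first-fit search of the free list
    for (FreeNode **link = &free_list; *link != nullptr; link = &(*link)->next) {
        if (*header_of(*link) >= n) {
            FreeNode *found = *link;
            *link = found->next;                   // unlink and reuse as-is
            return found;
        }
    }
    // nothing reusable: carve a fresh block off the arena
    std::size_t need = sizeof(std::size_t) + n;
    if (static_cast<std::size_t>(arena + sizeof(arena) - arena_top) < need) return nullptr;
    std::size_t *hdr = reinterpret_cast<std::size_t *>(arena_top);
    *hdr = n;
    arena_top += need;
    return hdr + 1;
}

void toy_free(void *p) {
    if (p == nullptr) return;
    std::size_t *hdr = header_of(p);
    if (*hdr & kMmappedBit) {                      // big block: pages go straight back to the OS
        munmap(hdr, (*hdr & ~kMmappedBit) + sizeof(std::size_t));
        return;
    }
    FreeNode *node = static_cast<FreeNode *>(p);   // small block: push onto the free list
    node->next = free_list;
    free_list = node;
}
```

Requesting 24 bytes, freeing them, and then requesting 20 bytes hands back the same block from the free list, since both round to 32. A 200 KB request gets its own mapping, which toy_free returns to the OS immediately. Real allocators also split and coalesce free blocks and keep size-segregated lists. This sketch does neither.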

On UNIX, little allocations usually get their memory from the OS through an older, simpler interface called sbrk, or "set break", which back in the 1970s set the dividing line between program memory and OS memory. Nowadays the kernel implements it with much the same machinery as mmap, but it is nice and simple: it just takes the number of bytes you want to allocate, and returns a pointer to those bytes. It's not nearly as flexible as mmap, but it is something to keep in mind.
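
That description of sbrk (pass a byte count, get back a pointer to that many fresh bytes) is enough to build the crudest possible allocator. This sketch assumes Linux/glibc, where sbrk is declared in <unistd.h>. There, sbrk returns the old break, which is the start of the new bytes, or (void *)-1 on failure. The name sbrk_alloc is hypothetical.

```cpp
#include <unistd.h>   // sbrk
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator built directly on sbrk: each call just moves
// the program break forward. Nothing is ever freed.
void *sbrk_alloc(std::size_t n) {
    char *cur = static_cast<char *>(sbrk(0));                 // sbrk(0) reports the current break
    if (cur == reinterpret_cast<char *>(-1)) return nullptr;
    std::size_t pad = (16 - reinterpret_cast<std::uintptr_t>(cur) % 16) % 16; // align result to 16
    void *old = sbrk(static_cast<std::intptr_t>(pad + n));    // grow the heap by pad + n bytes
    if (old == reinterpret_cast<void *>(-1)) return nullptr;   // the OS refused
    return static_cast<char *>(old) + pad;                     // old break = start of the new bytes
}
```

Because it can never give memory back, this is only an illustration. Real mallocs layer headers and a free list on top of memory obtained this way. Mixing it with malloc in one program is also fragile, since malloc may be moving the same break.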