Memory Problems: Use-after-delete, etc

CS 301 Lecture, Dr. Lawlor

Unfortunately, it's easy to write bad code that manipulates memory incorrectly, and then crashes.  Worse, the crash can occur *hours* after the bad code has screwed up memory, and in the meantime the code might appear to work properly!

Memory Leak

In assembly, you've seen how every single stack allocation (such as a push) must be matched by a deallocation (such as a pop).  Similarly, every call to malloc or new must be matched by a call to free or delete (respectively!).

In a short-running program, you can actually survive if you malloc some space that you never free.  This is called a "memory leak".  When your program exits, the OS will clean up all your memory, including leaked memory, and life goes on.  But in a long-running program, or a program that repeatedly allocates memory, failing to free space will eventually cause malloc to run out of space.  When malloc is out of space to allocate, it returns NULL, and your program usually crashes horribly (from the segfault when accessing NULL).  In C++, when out of memory space "new" throws an exception, and your program usually crashes horribly (from the unexpected exception).

A memory leak is sort of like eating too much.  It's bad, but if you only do it on Thanksgiving, you'll be fine.  It gets dangerous if you eat too much day after day, because you'll soon run out of space (e.g., for your organs).

Some languages, like Java and C#, use "garbage collection" to avoid memory leaks.  The nice part about garbage collection is you never need to explicitly call "free".  The not-so-nice part is your program periodically has to stop, look through all of its memory, and get rid of allocated blocks it's not using (the pause for garbage collection).   You can get a garbage collector for C++ too.

Use After Delete

One common problem is using a pointer after you no longer have the right to use the pointer:
int *A=new int[10];
A[7]=11;
delete[] A;

return A[7]; // aieeeee!

(executable NetRun link)

In a perfect world this would segfault, but in practice it usually returns 11.  The problem is that internally, "delete" just marks the buffer as free for reuse, but doesn't actually change the values (except maybe the first few elements).

So the next allocation to come along might actually get the exact same addresses, and overwrite the deleted array with its own stuff:

int *A=new int[10];
A[7]=11;
delete[] A;

int *B=new int[10];
B[7]=2222222;

std::cout<<"Array A: "<<A<<" and B: "<<B<<"\n";
return A[7];

(executable NetRun link)

Now the program returns 2222222, the B[7] value, because A == B!

The solution: don't call delete until you're really sure everybody's done with that memory!  Use-after-delete bugs are one big argument in favor of garbage collection.

Use-after-delete can happen on the stack, too, where it's called "return reference to temporary".  The problem is that as soon as your function returns, your local variables are released back to the stack, and the next function to be called will overwrite them.  So if you return a pointer to a local variable, you'll get weird inconsistent behavior.  The solution in this case is to return a buffer object, or an array allocated with new or malloc, and have your caller delete the thing when it's done.

Allocation/Deallocation Mismatch

It's really easy to forget what a pointer is pointing to, and call the wrong deallocation routine on it. 
If you call free on a stack-allocated array, the best you can hope for is a quick crash:
int A[10];
A[7]=11;
free(A); // aieeee!
return 0;

(executable NetRun link)

(This crashes immediately at runtime with "free: invalid pointer".  Nice!)

If you call free on a new'd array, your code is wrong, but might work anyway.
int *A=new int[10];
A[7]=11;
free(A); // aieeee!
return 0;

(executable NetRun link)

(This is WRONG, but on Linux, it appears to run fine... for now!  It'll crash eventually, possibly way later.)

The really confusing one is new/delete and new[]/delete[].  If you write a class that prints out its construction and destruction, like this, then you can actually watch the construction and destruction happen. 
class ctortest {
public:
int value;
ctortest() { std::cout<<"You created an object at "<<this<<"\n";}
~ctortest() { std::cout<<"You deleted an object at "<<this<<"\n";}
};
This correct code calls three constructors, and three destructors:
ctortest *arr=new ctortest[3];
delete[] arr;

(executable NetRun link)

This incorrect code calls only ONE of the three destructors, because delete (no brackets) expects a pointer to a single object, not to an array:
ctortest *arr=new ctortest[3];
delete arr;

(executable NetRun link)

This incorrect code allocates one object but calls SEVENTEEN destructors, because delete[] only works with pointers from new[]:
ctortest *arr=new ctortest;
delete[] arr;

(executable NetRun link)

This correct code calls one constructor and one destructor.
ctortest *arr=new ctortest;
delete arr;

(executable NetRun link)

If you use new, use delete.
If you use new[], use delete[].
It's that simple (in principle!).

Memory Corruption

All of the above are examples of memory corruption.  This is when the values in memory get "corrupted", or messed up, which usually results in a segfault (eventually!) when some code accesses the messed-up data structure.

One of the most common ways to corrupt memory is to allocate an n-element array, and then access element [n] or beyond: a "write past the end of the buffer".  In a perfect world, the first such beyond-the-end access would segfault, and you'd immediately find and fix the problem.  Sadly, this almost *never* happens, unless you access way past the end of the buffer.

"malloc" (and new) store their housekeeping information past the end of your arrays, so past-the-end accesses usually mess up the heap data structures.  Again, you might then get a crash when you try to free (or delete) the offending array.  Sadly, even that often works fine too.

In fact, heap corruption usually shows up several allocations later, way past the original source of the problem!
int *arr=(int *)malloc(1023*sizeof(int));
int *bystander=(int *)malloc(3*sizeof(int));

std::cout<<"About to write past end of my buffer "<<arr<<"\n";
arr[1023]=5; //aieeeee!

std::cout<<"Well, that worked. Deleting my buffer "<<arr<<"\n";
free(arr);

std::cout<<"Also OK. Deleting totally unrelated buffer "<<bystander<<"\n";
free(bystander);

return 0;

(executable NetRun link)

This delayed action makes heap corruption very tricky to find.  In particular, maybe the bad manipulation of "arr" is happening inside "dumbguy.cpp", and the problem only shows up inside "bystander" inside "mycode.cpp".  So dumbguy's screwup causes my perfectly good code to break!

The solution: NEVER access past the end of an array.  Be careful with array indexing, and add some index checking code if you're at all in doubt!

On Linux, there's an awesome program called "valgrind" that checks every memory access you make, and immediately prints out an error if you access memory you shouldn't.  It detects past-the-end and before-the-start accesses, use-after-delete, and allocation/deallocation mismatches, and it also finds memory leaks.  You use it like "valgrind ./mycode.exe".  There's a similar commercial tool for Windows called Purify.

Protected Memory

The one saving grace regarding memory problems is that they are confined to a single execution of a single program.  The OS constructs your process's entire memory image from scratch when your program starts, and destroys it completely after your program exits.  So errors in your memory can only be caused by errors in your program, and errors in your program cannot cause errors in other processes' memory.  In the bad old days of DOS, Windows 3.1, and MacOS classic, before "protected memory", any program could trash any other program's memory, so during debugging, if your code freaked out, you might have to reboot your whole machine!  On my old 1992 Mac, I learned to get really paranoid about array indexing by suffering through 5-minute reboots every time I screwed up...