Stack Frames, and Call and Return

CS 301 Lecture, Dr. Lawlor

As we've seen, the x86 calling conventions (page 37) say that eax holds the return value from a function. You can use ecx and edx for anything you like, but you have to put esp back where you found it.

ebp, ebx, edi, and esi are called "saved" or "preserved" registers, since you *can* use them, but you *must* put them back to where they were before you return. The standard place to save registers is on the stack, for example by pushing their old value at the start of your function, then popping their old value back at the end of your function.

Stack Frames: ebp

There's one fairly handy saved register called ebp, which means "extended base pointer". Here's the standard use of ebp: to stash the value of the stack pointer at the start of the function. This is sometimes a little easier than indexing from esp directly, since esp changes every time you push or pop--ebp, by contrast, can stay the same through your entire function.

push ebp ; stash old value of ebp on the stack
mov ebp,esp ; ebp == stack pointer at start of function

sub esp,1000 ; make some room on the stack
mov DWORD[ebp-4],7 ; local variables are at negative offsets from the base pointer
mov eax,DWORD[ebp-4] ; same local variable

mov esp,ebp ; restore stack pointer (easier than figuring the correct "add"!)
pop ebp ; restore ebp
ret
(Try this in NetRun now!)
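
If you'd like to watch esp and ebp move without running any assembly, here's a minimal C++ sketch that models the listing above on a toy stack. Everything in it (ToyStack, at, toy_function, the 4096-byte stack size) is made up for illustration; it is a model of the bookkeeping, not how a real CPU or compiler is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a 32-bit stack that grows toward lower addresses.
struct ToyStack {
    std::vector<std::uint32_t> mem;  // one entry per 4-byte word
    std::uint32_t esp;               // byte address of the top of the stack
    std::uint32_t ebp = 0;           // base pointer register

    explicit ToyStack(std::uint32_t bytes) : mem(bytes / 4), esp(bytes) {}

    // Models DWORD[addr]
    std::uint32_t &at(std::uint32_t addr) {
        assert(addr % 4 == 0 && addr / 4 < mem.size());
        return mem[addr / 4];
    }
    void push(std::uint32_t v) { esp -= 4; at(esp) = v; }
    std::uint32_t pop() { std::uint32_t v = at(esp); esp += 4; return v; }
};

// Mirrors the assembly listing line by line; returns what would be left in eax.
std::uint32_t toy_function(ToyStack &s) {
    s.push(s.ebp);                        // push ebp
    s.ebp = s.esp;                        // mov ebp,esp
    s.esp -= 1000;                        // sub esp,1000
    s.at(s.ebp - 4) = 7;                  // mov DWORD[ebp-4],7
    std::uint32_t eax = s.at(s.ebp - 4);  // mov eax,DWORD[ebp-4]
    s.esp = s.ebp;                        // mov esp,ebp
    s.ebp = s.pop();                      // pop ebp
    return eax;                           // ret
}
```

Calling toy_function on a ToyStack(4096) hands back 7 and leaves both esp and ebp exactly where they started, which is the whole point of the prologue and epilogue.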

Call and Return

OK, so far we've seen that the stack gets used in assembly language for:

Temporary storage, like small arrays in the program. You just "sub esp,N" to allocate N bytes starting at esp, as long as you are sure to "add esp,N" to give those bytes back before your function returns. One nice part about the stack is that once you move the stack pointer over an area, those bytes are YOURS until you give them back, unlike scratch registers, which almost always get overwritten when you call another function.
Passing function arguments. A function's first argument is the thing sitting on top of the stack just before the "call" executes; once you're inside the function it sits at [esp+4], right above the return address. You can "push X" to get X onto the top of the stack, but you do have to use "pop reg" or "add esp,4" to clean up the stack after the function returns.

There's one more place the stack gets used, and that's to keep track of where "ret" should go when you return from a function. This is very simple--"ret" jumps back to the address on the top of the stack. "call" pushes this return address before jumping into the new function.
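
To see how arguments and the return address share the stack, here's a small C++ sketch of one call. It is only a model: Stack, peek, callee, caller, and the address 0x1000 are all hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stack: back() is the top of the stack, i.e. what esp points at.
using Stack = std::vector<std::uint32_t>;

// Models DWORD[esp + 4*k]: the word k slots away from the top.
std::uint32_t peek(const Stack &s, std::size_t k) {
    assert(k < s.size());
    return s[s.size() - 1 - k];
}

// Hypothetical address of the instruction right after the "call".
const std::uint32_t kReturnAddress = 0x1000;

// Callee: computes (first argument + 1), then does a simulated "ret".
std::uint32_t callee(Stack &s, std::uint32_t &next_instruction) {
    std::uint32_t arg = peek(s, 1);  // [esp+4]; [esp] holds the return address
    std::uint32_t eax = arg + 1;
    next_instruction = s.back();     // "ret" pops the return address...
    s.pop_back();                    // ...and execution resumes there
    return eax;
}

std::uint32_t caller(Stack &s, std::uint32_t x) {
    s.push_back(x);                  // push x
    s.push_back(kReturnAddress);     // "call" pushes the return address
    std::uint32_t resumed_at = 0;
    std::uint32_t eax = callee(s, resumed_at);
    assert(resumed_at == kReturnAddress);
    s.pop_back();                    // add esp,4: caller cleans up the argument
    return eax;
}
```

Starting from an empty Stack, caller(s, 41) returns 42 and leaves the stack empty again: the callee removed the return address, and the caller removed the argument it pushed.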

For example, there's one subtle difference between these two pieces of code: in the first case, we go and come back; in the second case, we leave forever.

Each case below is shown first in assembly, then in C/C++.

call make_beef
mov eax,0xC0FFEE
ret

make_beef:
	mov eax,0xBEEF
	ret

(Try this in NetRun now!)

Returns 0xC0FFEE, because we come back from "make_beef".

int make_beef(void);
int foo(void) {
	make_beef();
	return 0xC0FFEE;
}

int make_beef(void) {
	return 0xBEEF;
}

(Try this in NetRun now!)
Also returns 0xC0FFEE, for the same reason.

jmp make_beef
mov eax,0xC0FFEE
ret

make_beef:
	mov eax,0xBEEF
	ret

(Try this in NetRun now!)
Returns 0xBEEF, because we never come back from "make_beef".

int foo(void) {
	goto make_beef;
	return 0xC0FFEE;
make_beef:
	return 0xBEEF;
}

(Try this in NetRun now!)

Again, "make_beef" never comes back, so we get 0xBEEF.

It's easy to manually add code to jump back from "make_beef", like this:

jmp make_beef
come_back:
mov eax,0xC0FFEE
ret

make_beef:
	mov eax,0xBEEF
	jmp come_back

(Try this in NetRun now!)

But the "call" instruction allows "ret" to jump back to the right place automatically, by pushing the return address on the stack. "ret" then pops the return address and goes there:

push come_back ; - simulated "call" -
jmp make_beef  ; -   continued  -
come_back:     ; - end of simulated "call" -

mov eax,0xC0FFEE
ret

make_beef:
	mov eax,0xBEEF
	pop ecx   ; - simulated "ret" -
	jmp ecx   ; - end of simulated "ret" -

(Try this in NetRun now!)

Why you care #1: Stack Space Usage

Every time you call a nested function, the stack has to hold the address to return to. This takes up a few bytes of stack space per call, so a deeply recursive function can run out of space pretty quickly. For example, this code runs out of stack space and exits (rather than crashing or printing the return value) for an input value as low as 10 million:

int silly_recursive(int i) {
	if (i==0) return 0;
	else return i+silly_recursive(i-1);
}

int foo(void) {
	std::cout<<"Returns: "<<silly_recursive(read_input());
	return 2;
}

(Try this in NetRun now!)
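
One way to see where the space goes is to keep the pending work in your own heap-allocated stack instead of the machine stack. This is just a sketch: silly_explicit_stack is a made-up name, I've assumed a non-negative input, and I've swapped in "long long" for the sum so integer overflow doesn't get in the way.

```cpp
#include <cassert>
#include <vector>

// Same sum as silly_recursive, but each pending value of i lives in a
// std::vector on the heap rather than in a stack frame.
long long silly_explicit_stack(int n) {
    assert(n >= 0);  // assumption: the recursive version never finishes for n < 0
    std::vector<int> pending;
    for (int i = n; i > 0; --i) {
        pending.push_back(i);        // one entry per simulated "call"
    }
    long long sum = 0;               // the base case returns 0
    while (!pending.empty()) {       // unwind the simulated "returns"
        sum += pending.back();
        pending.pop_back();
    }
    return sum;
}
```

The vector still needs one entry per level of recursion, so the memory cost hasn't gone away; it has just moved to the heap, which is normally far bigger than the stack.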

The same computation works fine (aside from integer overflow) when written as an iteration, not a recursion, because the loop reuses one stack frame instead of pushing a new return address for every step:

int silly_iterative(int n) {
	int sum=0;
	for (int i=0;i<=n;i++) sum+=i;
	return sum;
}

int foo(void) {
	std::cout<<"Returns: "<<silly_iterative(read_input());
	return 2;
}

(Try this in NetRun now!)
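
About that integer overflow: here's a hedged sketch that redoes the sum in 64 bits and checks it against the closed form n(n+1)/2. The function names are mine, and the overflow comparison assumes "int" is 32 bits, as it is on the usual x86 compilers.

```cpp
#include <cassert>
#include <cstdint>

// Same loop as silly_iterative, but with a 64-bit accumulator.
std::int64_t sum_to_64(std::int64_t n) {
    assert(n >= 0);
    std::int64_t sum = 0;
    for (std::int64_t i = 0; i <= n; i++) sum += i;
    return sum;
}

// Closed form for 0 + 1 + ... + n.
std::int64_t sum_closed_form(std::int64_t n) {
    assert(n >= 0);
    return n * (n + 1) / 2;
}

// Would this value survive being stored in a 32-bit int?
bool fits_in_int32(std::int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

For n of 10 million both functions give 50000005000000, which is far too big for a 32-bit int. In fact the sum stops fitting as soon as n reaches 65536.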

Why you care #2: Buffer Overflow Attack

Another place where understanding call and return comes in handy is in writing secure code. Here's some insecure code:

int happy_innocent_code(void) {
	char str[8];
	cin>>str;
	cout<<"I just read a string: "<<str<<"!  I'm a big boy!\n";
	return 0;
}

void evil_bad_code(void) {
	cout<<"Mwa ha ha ha...\n";
	cout<<"...er, I can't return.  Crashing.\n";
}

int foo(void) {
	//void *p=(void *)evil_bad_code; /* address of the bad code */
	//printf("evil code is at: '%4s'\n",(char *)&p);
	happy_innocent_code();
	cout<<"How nice!\n";
	return 0;

}

(Try this in NetRun now!)

The "cin>>str" line in happy_innocent_code ("happy" for short) can overwrite happy's stack space with whatever's in the read-in string, if the read-in string is longer than 7 bytes (the eighth byte of the array is needed for the terminating zero). So you can get a horrific crash if you just enter any long string, because the correct return address is overwritten with string data.

But it gets worse. Note that we never explicitly call "evil_bad_code", but the commented-out code helped me craft the attack string "1234xxxxyyyyDŠ", where the last four bytes of that attack string get written into the part of the stack that should be storing happy's return address. If we overwrite this with the address of evil code, happy will return directly to evil bad code, which then can do anything it likes. Kaboom!

Be sure to use "std::string", not raw arrays of char, for all your input data!
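
Here's a minimal sketch of what the safer versions might look like. The helper names read_word and read_line_bounded are made up for illustration, and the streams are passed in as parameters so you can try them on a std::istringstream as easily as on cin.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying these out
#include <string>

// std::string grows to fit whatever arrives, so there is no fixed-size
// buffer sitting next to the return address to overflow.
std::string read_word(std::istream &in) {
    std::string str;
    in >> str;
    return str;
}

// If a fixed-size buffer is unavoidable, tell the stream how big it is.
// getline stores at most 7 characters plus the terminating zero here.
// Returns false if the line was too long to fit (or nothing could be read).
bool read_line_bounded(std::istream &in, char (&buf)[8]) {
    in.getline(buf, sizeof buf);
    assert(std::strlen(buf) < sizeof buf);  // always terminated inside buf
    return !in.fail();
}
```

Fed the input "1234xxxxyyyy", read_word returns the whole string, while read_line_bounded keeps only "1234xxx" and reports failure instead of scribbling past the end of the array.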

There's a pretty informative writeup on this by the hacker Aleph One called "Smashing the Stack for Fun and Profit". Luckily, most network-facing code nowadays (including NetRun itself) uses strings properly, and isn't vulnerable to buffer overflow exploits like this.