The Stack

CS 301 Lecture, Dr. Lawlor

Push and Pop

"The stack" is a frequently-used area of memory that functions as temporary storage--say, as space for local variables when a function runs out of room, or to store values while calling a function.  The easiest and most common way to use the stack is with the dedicated "push" and "pop" instructions.
For example, this loads 23 into rax, and then 17 into rcx:
push 17
push 23
pop rax
pop rcx
ret

(Try this in NetRun now!)

You commonly use push and pop to save registers at the start and end of your function.  You have the right to use rcx, rdx, rsi, rdi, and r8-r11 for anything you like, no questions asked.  However, you MUST put rsp back where you found it, as well as rbp, rbx, and r12-r15.  For example, "rbp" is a preserved register, so you need to save its value before you can use it:
push rbp ; save old copy of this register

mov rbp,23
mov rax,rbp

pop rbp ; restore main's copy from the stack
ret

(Try this in NetRun now!)

If you have multiple registers to save and restore, be sure to pop them in the *opposite* order they were pushed:
push rbp ; save old copy of this register
push r15

mov rbp,23
mov rax,rbp

pop r15 ; restore main's copy from the stack
pop rbp
ret

(Try this in NetRun now!)

One big advantage to saved registers: you can call other functions, and know that they won't change their values.  All the scratch registers, by contrast, are likely to get overwritten.
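
For example, here's a minimal sketch of that guarantee in action, assuming NetRun's print_int is also callable from 64-bit code (where its integer argument travels in rdi)--the value parked in rbx survives the call, while anything left in the scratch registers might not:
extern print_int
push rbx ; rbx is preserved, so save the caller's copy before touching it
mov rbx,7 ; our value, guaranteed to survive the call
mov rdi,100 ; print_int's argument goes in rdi in 64-bit mode
call print_int ; free to trash rax, rcx, rdx, rsi, rdi, and r8-r11
mov rax,rbx ; still 7, because rbx is a preserved register
pop rbx ; hand the caller's rbx back
ret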

The Stack Pointer

"push" and "pop" are implemented using the "stack pointer" to point to the most recently pushed value.  On x86, the stack pointer is stored in the register called "rsp" (Register: Stack Pointer). 

Conceptually, the stack is divided into two areas: higher addresses are all in use and reserved (you can't change these values!), while lower addresses are unused (free or scratch space).  The stack pointer points to the last in-use byte of the stack.  The standard convention is that when your function starts up, you can claim some of the stack by moving the stack pointer down--this indicates to any functions you might call that you're using those bytes of the stack.  You can then use that memory for anything you want, as long as you move the stack pointer back before your function returns. 

Address          Contents
0x000...000      "low memory"
                 ...
                 unused stack area
                 (you can claim this space)
rsp->            end of reserved data ("top of the stack")
                 reserved stack data
                 (main's variables)
0xfff...fff      "high memory"

It's very annoying that the stack starts at high addresses and grows toward lower addresses: everything else on the machine (arrays, malloc space, strings, even integers) starts at low addresses and grows toward higher addresses.  The reason is historical: on ancient machines with only a little memory space to work with, they'd put their data at one end of memory (near address zero), and the stack as far away as it could get, near high memory.  Then the program's data or stack space could grow as far as possible without overwriting the other.  Of course, on a 64-bit machine you've got billions of gigabytes of address space, so you're unlikely to run out no matter which way the stack grows, but we're stuck with the convention that "the stack grows toward lower memory".  Confusingly, the last reserved value (at the lowest address rsp) is called the "top" of the stack.

"push" and "pop" are implemented via the stack pointer:
Sadly, if you screw up the stack, such as by forgetting to pop or move the stack pointer back, or overwriting part of the stack that isn't yours, then the function that called you (such as main) will normally crash horribly.  So be careful with the stack!

Here's how we allocate some space on the stack, then read and write it:
sub rsp,16 ; I claim the next sixteen bytes in the name of... me!

mov QWORD [rsp],1492 ; store a long integer into our stack space
mov rax,QWORD [rsp] ; read our long from where we stored it

add rsp,16 ; Hand back the stack space
ret

(Try this in NetRun now!)

Here's how we'd allocate one hundred long integers on the stack, then use just one of them:
sub rsp,800 ; I claim the next eight hundred bytes

mov rdi,rsp ; points to the start of our 100-integer array
add rdi,320 ; jump down to integer 40 in the array
mov QWORD [rdi],1492 ; store an integer into our stack space
mov rax,QWORD [rdi] ; read our integer from where we stored it

add rsp,800 ; Hand back the stack space
ret

(Try this in NetRun now!)

Push and pop are handy if you've only got one value to stick on or pull off the stack.  In 32-bit mode they're really useful for function arguments, which by convention are stored on top of the stack when you call the function:
push 19
extern print_int
call print_int
pop eax ; MUST clean up the stack
ret

(Try this in NetRun now!) (32-bit mode)

This prints the "19" that's stored on top of the stack.  In 32-bit mode, all function arguments are stored on the stack (unlike registers for 64-bit code).  This means the stack is a rather funny mix of function arguments, local and temporary variables, totally unused space for alignment, etc.

Stack Frames: rbp

There's one fairly handy saved register called rbp, the "base pointer" (its 32-bit ancestor ebp is the "extended base pointer").  Here's the standard use of rbp: to stash the value of the stack pointer at the start of the function.  This is sometimes a little easier than indexing from rsp directly, since rsp changes every time you push or pop--rbp, by contrast, can stay the same through your entire function.
push rbp; stash old value of rbp on the stack
mov rbp,rsp; rbp == stack pointer at start of function

sub rsp,1000 ; make some room on the stack
mov QWORD[rbp-8],7 ; local variables live at negative offsets from the base pointer
mov rax,QWORD[rbp-8]; read the same local variable back

mov rsp,rbp; restore stack pointer (easier than figuring the correct "add"!)
pop rbp; restore rbp
ret
(Try this in NetRun now!)
rbp isn't used very often in 64-bit mode, but in 32-bit mode it's almost standard.   The piece of the stack around the base pointer is often called the function's "stack frame": negative offsets get to the function's local variables, positive offsets get to the caller's parameters, and directly at rbp is the saved copy of the old rbp.  This effectively makes a chain of rbp pointers (assuming every function uses the frame pointer correctly); on some machines you can "unwind the stack" or print a "stack trace" by following this chain of pointers.
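
Here's a minimal sketch of what sits where after that standard prologue.  The positive offsets in the comments assume the caller passed an argument on the stack (always the case in 32-bit mode; only true for stack-passed arguments in 64-bit mode), and the code only ever touches the local:
push rbp ; [rbp] will hold the caller's rbp--the chain link
mov rbp,rsp
sub rsp,16 ; room for locals below the base pointer

; [rbp-8]  our local variable (negative offsets: locals)
; [rbp]    saved copy of the caller's rbp
; [rbp+8]  our return address
; [rbp+16] caller's first stack-passed argument (positive offsets: parameters)

mov QWORD [rbp-8],42 ; write a local
mov rax,QWORD [rbp-8] ; read it back

mov rsp,rbp
pop rbp
ret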

Call and Return

OK, so far we've seen that the stack gets used in assembly language for saving preserved registers, for local variables and scratch arrays, and (in 32-bit mode) for passing function arguments.
There's one more place the stack gets used, and that's to keep track of where "ret" should go when you return from a function.  This is very simple--"ret" jumps back to the address on the top of the stack.  "call" pushes this return address before jumping into the new function.

Instruction      Equivalent Instruction Sequence
call bar         push next_instruction
                 jmp bar
                 next_instruction:
ret              pop rdx   ; rdx or some other scratch register; ret doesn't modify any registers
                 jmp rdx


For example, there's one subtle difference between these two pieces of code: in the first case, we go and come back; in the second case, we leave forever.


Call, in assembly:

call make_beef
mov eax,0xC0FFEE
ret

make_beef:
mov eax,0xBEEF
ret

(Try this in NetRun now!)

Returns 0xC0FFEE, because we come back from "make_beef".

Call, in C/C++:

int make_beef(void);
int foo(void) {
make_beef();
return 0xC0FFEE;
}

int make_beef(void) {
return 0xBEEF;
}

(Try this in NetRun now!)

Also returns 0xC0FFEE, for the same reason.

Jump, in assembly:

jmp make_beef
mov eax,0xC0FFEE
ret

make_beef:
mov eax,0xBEEF
ret

(Try this in NetRun now!)

Returns 0xBEEF, because we never come back from "make_beef".

Jump, in C/C++:

int foo(void) {
goto make_beef;
return 0xC0FFEE;
make_beef:
return 0xBEEF;
}

(Try this in NetRun now!)

Again, "make_beef" never comes back, so we get 0xBEEF.


It's easy to manually add code to jump back from "make_beef", like this:
jmp make_beef
come_back:
mov eax,0xC0FFEE
ret

make_beef:
mov eax,0xBEEF
jmp come_back

(Try this in NetRun now!)

But the "call" instruction allows "ret" to jump back to the right place automatically, by pushing the return address on the stack.  "ret" then pops the return address and goes there:
push come_back ; - simulated "call" -
jmp make_beef ; - continued -
come_back: ; - end of simulated "call" -

mov eax,0xC0FFEE
ret

make_beef:
mov eax,0xBEEF
pop rcx ; - simulated "ret" -
jmp rcx ; - end of simulated "ret" -

(Try this in NetRun now!)

There's a very weird hacky way to figure out what address your code is running from: call the next instruction, and then pop the return address that "call" pushed!

call nextline
nextline:
pop rax ; rax will store the location in memory of nextline
ret

(Try this in NetRun now!)

This is only useful if your code doesn't know where in memory it will get loaded.  This is true for some shared libraries, where you see exactly the above instruction sequence!
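
For instance, here's a minimal position-independent sketch of how that popped address gets used--the distance between two labels is a constant the assembler knows, so once rax holds nextline's real address we can reach nearby data no matter where the code got loaded:
call nextline
nextline:
pop rax ; rax = the address of nextline, wherever we got loaded
add rax, mydata - nextline ; label-to-label distance is an assemble-time constant
mov rax, QWORD [rax] ; fetch the value stored at mydata
ret

mydata:
dq 1234 ; data living alongside the code (never executed)

(Modern 64-bit code usually uses rip-relative addressing for this instead, but 32-bit shared libraries really do pull the call-and-pop trick.)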

Optimization: Tail Recursion

Because calling functions takes some overhead (push return address, call function, do work, pop return address, return there), recursion is usually somewhat slower than iteration.  For example:

C++ Plain Recursion:

int sum(int i) {
if (i==0) return 0;
else return sum(i-1)+i;
}
int foo(void)
{
return sum(10000000);
}

(Try this in NetRun now!)

Assembly Plain Recursion:

mov rdi,10000000
call sum
ret

; sum function: computes sum of numbers from 0...i
; one parameter: i, in rdi: number of recursions left
sum:
cmp rdi,0 ; check if we're done
jle base_case
push rdi
sub rdi,0x1; i-1
call sum ; recursion step
pop rdi
add rax,rdi ; partial + i
ret

base_case: ; no more iterations--return zero
mov rax,0
ret

(Try this in NetRun now!)

Folks who love recursion have found an interesting optimization called "tail recursion", where you arrange for there to be *nothing* for the function to do after recursing, so there's no point in your children returning to you--you just "jmp" to them, not "call", because you don't want them to ever come back to you.  The base case is the only "ret".  Here's an example:

C++ Tail Recursion:

int sum(int i,int partial) {
if (i==0) return partial;
else return sum(i-1,partial+i);
}
int foo(void)
{
return sum(10000000,0);
}

(Try this in NetRun now!)

Assembly Tail Recursion:

mov edi,10000000 ; sum first argument: i
mov esi,0 ; partial result
call sum
ret

sum:
mov eax,esi
cmp edi,0
je base_case
lea esi,[rax+rdi*1] ; partial+i
sub edi,0x1 ; i-1
jmp sum ; tail recursion step!

base_case:
ret

(Try this in NetRun now!)

Tail recursion eliminates both the memory used on the stack for the return addresses, and the time taken for the call and return.  It can make recursion exactly as fast as iteration.  (Curiously, you can always transform an iteration into a recursion, and vice versa, although the code may get nasty.)
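
In fact, the tail-recursive assembly above is already just a loop in disguise; here's a minimal sketch of the same sum written as an explicit iteration, for comparison:
mov edi,10000000 ; n
mov eax,0 ; running total
sum_loop:
cmp edi,0
je done
add eax,edi ; total += i
sub edi,1 ; next i
jmp sum_loop
done:
ret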

Sadly, my version of the gcc compiler doesn't seem to do the tail recursion optimization anymore on 64-bit machines, although it used to do them on 32-bit machines.

Why you care #1: Stack Space Usage

Every time you call a nested function, the stack has to hold the address to return to.  This actually takes up a few bytes of stack space per call, so a deeply-recursive function can run out of space pretty quickly.  For example, this code runs out of stack space and exits (rather than crashing or printing the return value) for an input value as low as 10 million:
int silly_recursive(int i) {
if (i==0) return 0;
else return i+silly_recursive(i-1);
}

int foo(void) {
std::cout<<"Returns: "<<silly_recursive(read_input());
return 2;
}

(Try this in NetRun now!)

The same computation works fine (aside from integer overflow) when written as an iteration, not a recursion, because iteration doesn't touch the stack:
int silly_iterative(int n) {
int sum=0;
for (int i=0;i<=n;i++) sum+=i;
return sum;
}

int foo(void) {
std::cout<<"Returns: "<<silly_iterative(read_input());
return 2;
}

(Try this in NetRun now!)


Why you care #2: Buffer Overflow Attack

Another place where understanding call and return comes in handy is writing secure code.  Here's some insecure code:
int happy_innocent_code(void) {
char str[8];
cin>>str;
cout<<"I just read a string: "<<str<<"! I'm a big boy!\n";
return 0;
}

void evil_bad_code(void) {
cout<<"Mwa ha ha ha...\n";
cout<<"...er, I can't return. Crashing.\n";
}

int foo(void) {
//void *p=(void *)evil_bad_code; /* address of the bad code */
//printf("evil code is at: '%4s'\n",(char *)&p);
happy_innocent_code();
cout<<"How nice!\n";
return 0;

}

(Try this in NetRun now!)

The "cin>>str" line in happy can overwrite happy's stack space with whatever's in the read-in string, if the read-in string is longer than 7 bytes.  So you can get a horrific crash if you just enter any long string, because the correct return address is overwritten with string data.

But it gets worse.  Note that we never explicitly call "evil_bad_code", but the commented-out code helped me craft the attack string "aaaabbbbccccDŠeeeeffff", where the funky binary bytes of that attack string get written into the part of the stack that should be storing happy's return address.  If we overwrite this with the address of evil code, happy will return directly to evil bad code, which then can do anything it likes.  Kaboom!
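
You can see the underlying mechanism directly in assembly.  This minimal sketch plants a fake return address on the stack, and "ret" cheerfully jumps there--exactly what the attack string does to happy's real return address:
push somewhere_else ; plant a fake "return address" on top of the stack
ret ; "returns" to somewhere_else, even though nobody called it

somewhere_else:
mov eax,0xBEEF ; the real return address is back on top, so a normal return works
ret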

Be sure to use "std::string", not raw arrays of char, in all your input data!

There's a pretty informative writeup on this by the hacker Aleph One called "Smashing the Stack for Fun and Profit".  Luckily, most network-facing code nowadays (including NetRun itself) uses strings properly, and isn't vulnerable to buffer overflow exploits like this.  Modern compilers, like gcc 4, automatically include protection against stack smashing (try the above on my 64-bit machine, which has this new compiler!).