x86 Assembly: Memory and Stack Frames
CS 301 Lecture, Dr. Lawlor, 2005/09/30
Accessing Memory
On the x86 with GNU assembly, you can use a register as a pointer
by wrapping it in parentheses. So while "%eax" is the value in
%eax, "(%eax)" is the value in memory pointed to by %eax. You can
also add a byte offset, in decimal or hex, to the beginning of the
expression--so "4(%eax)" is the value in memory at address %eax plus
four bytes.
Note that assembly offsets are always counted in *bytes*, not in integers
(unlike C array indices). On a 32-bit machine, where integers and pointers
are 4 bytes each, you're almost always using offsets of 4, 8, 12 (0xC),
16 (0x10), etc.
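To see the bytes-versus-integers distinction from the C side, here is a minimal sketch. The helper name load_at_byte_offset is made up for illustration; it reads a 32-bit value at a byte offset from a base pointer, roughly what "offset(%eax)" does with the base address in %eax.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the lecture): read the 32-bit value that
   lives byte_offset bytes past base, like "byte_offset(%eax)" in assembly. */
static int32_t load_at_byte_offset(const void *base, size_t byte_offset)
{
    int32_t value;
    /* Step forward in bytes, not in ints, then copy out 4 bytes. */
    memcpy(&value, (const unsigned char *)base + byte_offset, sizeof value);
    return value;
}
```

With "int32_t arr[4] = {10, 20, 30, 40};", byte offset 4 lands on arr[1] and byte offset 0xC lands on arr[3], because each int32_t occupies 4 bytes.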
Accessing Arguments from Memory
I can access all my arguments by just using the correct offset from the
stack pointer. If I'm just starting a routine, the top of the
stack contains the caller's return address, which takes 4 bytes.
So my first argument begins 4 bytes up into the stack, at
"4(%esp)". So a routine that just returns its first argument
would look like this:
my_sub:
    # Stack contains:
    #   my argument     <- 4(%esp)
    #   return address  <- (%esp)
    mov 4(%esp),%eax    # Copy first argument, at 4 bytes into the stack, into eax (the return register)
    ret
The Frame Pointer
Note that if I now push something else (another 4 bytes) onto the stack, my
first argument is now at "8(%esp)". In general, if I'm pushing and
popping all the time, it's really a pain to keep track of stuff relative
to %esp, because %esp keeps changing.
The standard solution to this is to use *another* register, the "frame
pointer" (%ebp on x86), that within our subroutine never changes.
Then once we set up %ebp at the start of our routine, the location of
our arguments and local variables is fixed relative to %ebp, so we can
then push and pop willy-nilly.
The standard code to set up the frame pointer (the "subroutine prologue") looks like this:
push %ebp # Save the old frame pointer on the stack
mov %esp,%ebp # Set our frame pointer to be our stack pointer's initial value
If you disassemble some compiler-generated code, these are almost always the first two instructions.
The standard code to undo our setup (the "subroutine epilogue") looks like this:
mov %ebp,%esp # Restore the stack pointer (pops off anything we've pushed)
pop %ebp # Restore the old frame pointer
These two instructions can also be replaced with a single "leave" instruction, which does exactly the same thing.
So a more idiomatic subroutine would be this:
my_sub:
    push %ebp           # Save the old frame pointer on the stack
    mov %esp,%ebp       # Set our frame pointer to be our stack pointer's initial value
    # Stack contains:
    #   my argument     <- 8(%ebp)
    #   return address  <- 4(%ebp)
    #   saved ebp       <- 0(%ebp)
    mov 8(%ebp),%eax    # Copy first argument, at 8 bytes off ebp, into eax (the return register)
    mov %ebp,%esp       # Restore the stack pointer (pops off anything we've pushed)
    pop %ebp            # Restore the old frame pointer
    ret
Of course, setting up and tearing down the stack frame takes time, so
it's really up to you to decide whether to use a frame pointer or
not. If you've got a stack frame set up, you can call the
NetRun support routine "print_stack" to display the stack between %esp
and %ebp (plus a bit).
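To make the difference between %esp-relative and %ebp-relative offsets concrete, here is a rough C model of the framed my_sub. It is only a toy: the ToyCpu type, its helpers, and the placeholder return address are all invented for illustration, and esp and ebp hold byte offsets into a small array rather than real addresses.

```c
#include <stdint.h>

enum { MEM_BYTES = 128 };          /* size of the toy stack, in bytes */

/* Toy 32-bit machine (illustrative only): a small stack plus %esp and %ebp. */
typedef struct {
    uint32_t mem[MEM_BYTES / 4];
    uint32_t esp, ebp;             /* byte offsets into mem */
} ToyCpu;

static void cpu_push(ToyCpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
static uint32_t cpu_pop(ToyCpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }
static uint32_t cpu_load(const ToyCpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* Model a framed my_sub that pushes extra_pushes scratch words before
   reading its argument. Returns the argument as read through 8(%ebp);
   *esp_offset receives the offset from %esp where that same argument
   sits at that moment. */
static uint32_t framed_my_sub(uint32_t argument, unsigned extra_pushes,
                              uint32_t *esp_offset)
{
    ToyCpu c = {0};
    c.esp = MEM_BYTES;             /* empty stack: %esp starts at the high end */
    if (extra_pushes > 16)
        extra_pushes = 16;         /* stay inside the toy stack */

    cpu_push(&c, argument);        /* caller pushes the argument */
    cpu_push(&c, 0x08048251);      /* "call" pushes a return address (placeholder value) */

    cpu_push(&c, c.ebp);           /* push %ebp */
    c.ebp = c.esp;                 /* mov %esp,%ebp */

    for (unsigned i = 0; i < extra_pushes; i++)
        cpu_push(&c, 0);           /* scratch pushes move %esp but not %ebp */

    *esp_offset = (c.ebp + 8) - c.esp;
    uint32_t result = cpu_load(&c, c.ebp + 8);   /* mov 8(%ebp),%eax */

    c.esp = c.ebp;                 /* mov %ebp,%esp */
    c.ebp = cpu_pop(&c);           /* pop %ebp */
    return result;
}
```

With no scratch pushes the argument sits at 8(%esp) as well as 8(%ebp). After three scratch pushes it is still at 8(%ebp), but has moved to 20(%esp).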
Stack Unwinding
Note that the standard function prologue results in the frame pointer
%ebp pointing to the top of the stack at the start of the
routine. But the top of the stack at this point contains
the old frame pointer, so the new %ebp actually points to a location in
memory that contains the *old* %ebp.
push %ebp # Save the old frame pointer on the stack
mov %esp,%ebp # Set our frame pointer to be our stack pointer's initial value
Here's what a real stack looks like after I've called "my_sub" from another little subroutine (see NetRun code):
Address     Data
0xbfffef2c  16(bp)=0x08048660  (old old return address, from call)
0xbfffef28  12(bp)=0xbfffef38  (old old base pointer, from push)
0xbfffef24   8(bp)=0x00001234  (function argument)
0xbfffef20   4(bp)=0x08048251  (old return address, from call)
0xbfffef1c   0(bp)=0xbfffef28  (old base pointer, from push)
Note that the frame pointer %ebp (also called the "base pointer") has value
0xbfffef1c. In memory at this address is the old base pointer, with value
0xbfffef28. In memory at *that* address is a still older base pointer, with
value 0xbfffef38, and so on up the chain.
So *if* the code has been regularly using frame pointers, you can
actually follow this chain of frame pointers higher and higher, to
figure out exactly which routines called which routines in order to get
you here. Eventually, you'll reach the "main" routine, and from there the library code that calls main. See the stack unwind example C++ program.
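As a rough sketch of that chain-following, the C code below walks the five-word stack dump shown above instead of a live stack. The StackWord type and the read_word and unwind helpers are invented for this illustration; a real unwinder would read process memory directly and needs some way to know when to stop.

```c
#include <stddef.h>
#include <stdint.h>

/* One word of the stack dump: an address and the 32-bit value stored there. */
typedef struct { uint32_t address; uint32_t data; } StackWord;

static const StackWord dump[] = {
    {0xbfffef2c, 0x08048660},   /* old old return address */
    {0xbfffef28, 0xbfffef38},   /* old old base pointer   */
    {0xbfffef24, 0x00001234},   /* function argument      */
    {0xbfffef20, 0x08048251},   /* old return address     */
    {0xbfffef1c, 0xbfffef28},   /* old base pointer       */
};

/* Look up one word; returns 1 on success, 0 if the address is not in the dump. */
static int read_word(uint32_t address, uint32_t *out)
{
    for (size_t i = 0; i < sizeof dump / sizeof dump[0]; i++) {
        if (dump[i].address == address) {
            *out = dump[i].data;
            return 1;
        }
    }
    return 0;
}

/* Follow the saved frame pointers starting from ebp. For each frame, record
   the return address found at 4(%ebp), then hop to the saved %ebp at 0(%ebp).
   Stops after max frames or when a needed word is outside the dump.
   Returns the number of frames recorded. */
static size_t unwind(uint32_t ebp, uint32_t *return_addresses, size_t max)
{
    size_t n = 0;
    uint32_t saved_ebp, ret;
    while (n < max && read_word(ebp, &saved_ebp) && read_word(ebp + 4, &ret)) {
        return_addresses[n++] = ret;
        ebp = saved_ebp;
    }
    return n;
}
```

Starting from %ebp = 0xbfffef1c, this records return addresses 0x08048251 and then 0x08048660, and stops at 0xbfffef38 only because the dump ends there.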
Local Variables
As part of my subroutine's stack setup, I can make as much room on the
stack as I want, by just moving the stack pointer, like
"sub $100,%esp". I can then use that space for anything I want--