x86 Assembly: Memory and Stack Frames

CS 301 Lecture, Dr. Lawlor, 2005/09/30

Accessing Memory

On the x86 with GNU assembly, you can use a register as a pointer by wrapping it in parentheses.  So while "%eax" is the value in %eax, "(%eax)" is the value in memory pointed to by %eax.  You can also add a byte offset, in decimal or hex, to the beginning of the expression--so "4(%eax)" is the value in memory at address %eax plus four bytes.

Note that assembly offsets are always in *bytes*, not integers.  On a 32-bit machine, integers and pointers are 4 bytes each, so the offsets you write are almost always multiples of four: 4, 8, 12 (0xC), 16 (0x10), etc.
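
To make the byte-offset rule concrete, here is a small Python sketch (not part of the original lecture) that models memory as a flat byte string.  The helper "load32" and the sample values are made up for illustration; it only imitates what a load like "mov 4(%eax),%ebx" does on a little-endian 32-bit machine.

```python
import struct

# Four 32-bit little-endian integers laid out back to back, as on x86.
values = [10, 20, 30, 40]
memory = struct.pack("<4i", *values)

def load32(mem, base, offset):
    """Hypothetical model of `mov offset(base),%reg`: read 4 bytes at base+offset."""
    return struct.unpack_from("<i", mem, base + offset)[0]

base = 0  # plays the role of the address held in %eax
for index in range(len(values)):
    byte_offset = 4 * index  # offsets count bytes, so integer N lives at 4*N
    print(f"{byte_offset}(%eax) -> {load32(memory, base, byte_offset)}")
```

Asking for offset 1 instead of 4 would not fetch "integer number one"; it would read four bytes straddling the first two integers.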

Accessing Arguments from Memory

I can access all my arguments by just using the correct offset from the stack pointer.  If I'm just starting a routine, the top of the stack contains the caller's return address, which takes 4 bytes.  So my first argument begins 4 bytes up into the stack, at "4(%esp)".  So a routine that just returns its first argument would look like this:

my_sub:
    # Stack contains:
    #      my argument      <- 4(%esp)
    #      return address   <-  (%esp)
    mov 4(%esp),%eax  # Copy first argument, at 4 bytes into the stack, into eax, the return register.
    ret

The Frame Pointer

Note that if I now push something else onto the stack, my first argument is now at "8(%esp)".  In general, if I'm pushing and popping all the time, it's really a pain to keep track of stuff relative to %esp, because %esp keeps changing.
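
The shifting offsets can be watched in a toy model.  The Python sketch below is an illustration only: the "Stack" class, the starting address 0x1000, and the pushed values are all assumptions, chosen to imitate a 32-bit stack that grows downward four bytes per push.

```python
class Stack:
    """Toy model of a 32-bit x86 stack: grows downward, 4 bytes per slot."""

    def __init__(self, top=0x1000):
        self.mem = {}    # address -> 32-bit value
        self.esp = top   # made-up starting stack pointer

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def at(self, offset):
        """Model of the operand offset(%esp)."""
        return self.mem[self.esp + offset]

s = Stack()
s.push(0x1234)       # caller pushes the argument
s.push(0x08048251)   # `call` pushes the return address
print(hex(s.at(4)))  # on entry, the argument is at 4(%esp)
s.push(99)           # the routine pushes a temporary
print(hex(s.at(8)))  # the same argument is now at 8(%esp)
```

Every push moves %esp down by four, so every %esp-relative offset in the routine has to be adjusted to match.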

The standard solution to this is to use *another* register, the "frame pointer" (%ebp on x86), that within our subroutine never changes.  Then once we set up %ebp at the start of our routine, the location of our arguments and local variables is fixed relative to %ebp, so we can then push and pop willy-nilly.

The standard code to set up the frame pointer (the "subroutine prologue") looks like this:
    push %ebp   # Save the old frame pointer on the stack
    mov %esp,%ebp   # Set our frame pointer to be our stack pointer's initial value 
If you disassemble some compiler-generated code, these are almost always the first two instructions.

The standard code to undo our setup (the "subroutine epilogue") looks like this:
    mov %ebp,%esp   # Restore the stack pointer (pops off anything we've pushed)
    pop %ebp    # Restore the old frame pointer
These two instructions can also be replaced with a single "leave" instruction, which does exactly the same thing.

So a more idiomatic subroutine would be this:
my_sub:
    push %ebp   # Save the old frame pointer on the stack
    mov %esp,%ebp   # Set our frame pointer to be our stack pointer's initial value

    # Stack contains:
    #      my argument      <- 8(%ebp)
    #      return address   <- 4(%ebp)
    #      saved ebp        <- 0(%ebp)
    mov 8(%ebp),%eax  # Copy first argument, at 8 bytes off ebp, into eax (the return register).
 
    mov %ebp,%esp   # Restore the stack pointer (pops off anything we've pushed)
    pop %ebp    # Restore the old frame pointer
    ret

Of course, setting up and tearing down the stack frame takes time, so it's really up to you to decide whether to use a frame pointer or not.   If you've got a stack frame set up, you can call the NetRun support routine "print_stack" to display the stack between %esp and %ebp (plus a bit).
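
As a rough check on the prologue and epilogue, here is a Python sketch of the same routine running on a toy machine.  The "Machine" class, the addresses, and the caller's %ebp value 0xAAAA are invented for illustration; only the sequence of pushes and moves follows the code above.

```python
class Machine:
    """Toy 32-bit machine with just a stack, %esp, and %ebp."""

    def __init__(self, top=0x1000):
        self.mem = {}
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

def my_sub(m, temporaries):
    # Prologue: push %ebp ; mov %esp,%ebp
    m.push(m.ebp)
    m.ebp = m.esp
    for t in temporaries:
        m.push(t)              # %esp moves, %ebp does not
    result = m.mem[m.ebp + 8]  # mov 8(%ebp),%eax
    # Epilogue: mov %ebp,%esp ; pop %ebp  (the same effect as `leave`)
    m.esp = m.ebp
    m.ebp = m.pop()
    return result

m = Machine()
m.ebp = 0xAAAA          # caller's frame pointer (made-up value)
m.push(0x1234)          # argument
m.push(0x08048251)      # return address pushed by `call`
esp_before = m.esp
print(hex(my_sub(m, [1, 2, 3])))         # argument found at 8(%ebp)
print(m.esp == esp_before, hex(m.ebp))   # both registers restored
```

However many temporaries the routine pushes, the argument stays at 8(%ebp), and the epilogue discards the temporaries without popping them one by one.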

Stack Unwinding

Note that the standard function prologue results in the frame pointer %ebp pointing to the top of the stack at the start of the routine.   But the top of the stack at this point contains the old frame pointer, so the new %ebp actually points to a location in memory that contains the *old* %ebp.
    push %ebp   # Save the old frame pointer on the stack
    mov %esp,%ebp   # Set our frame pointer to be our stack pointer's initial value

Here's what a real stack looks like after I've called "my_sub" from another little subroutine (see NetRun code):
Address      Data
0xbfffef2c   16(bp)=0xbfffef2c-relative slot: see below
Note that the frame pointer %ebp has value 0xbfffef1c.  In memory at this address is the old base pointer, with value 0xbfffef28.  In memory at *this* address is a yet older base pointer, with value 0xbfffef38, and so on up the chain.

So *if* the code has been regularly using frame pointers, you can actually follow this chain of frame pointers higher and higher, to figure out exactly which routines called which routines in order to get you here.  Eventually, you'll reach the "main" routine, and from there the library code that calls main.  See the stack unwind example C++ program.
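
Following the chain is just a loop.  The Python sketch below is not the C++ unwind example mentioned above; it walks the five stack words from the dump, stored in a dictionary.  The function name "unwind" and the stopping rule (quit when the next frame pointer falls outside the dumped words) are assumptions for illustration; a real unwinder needs some other way to know where the chain ends.

```python
# The five stack words shown in the dump above, keyed by address.
memory = {
    0xbfffef2c: 0x08048660,  # old old return address, from call
    0xbfffef28: 0xbfffef38,  # old old base pointer, from push
    0xbfffef24: 0x00001234,  # function argument
    0xbfffef20: 0x08048251,  # old return address, from call
    0xbfffef1c: 0xbfffef28,  # old base pointer, from push
}

def unwind(mem, ebp):
    """Follow saved frame pointers, collecting (frame pointer, return address)."""
    frames = []
    while ebp in mem and ebp + 4 in mem:
        frames.append((ebp, mem[ebp + 4]))  # return address sits at 4(%ebp)
        ebp = mem[ebp]                      # the word at (%ebp) is the caller's %ebp
    return frames

for frame, return_address in unwind(memory, 0xbfffef1c):
    print(f"frame at {frame:#x}, returns to {return_address:#x}")
```

Starting from %ebp = 0xbfffef1c, this finds two frames and then stops, because the next saved frame pointer, 0xbfffef38, lies beyond the part of the stack that was dumped.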

Local Variables

As part of my subroutine's stack setup, I can make as much room on the stack as I want, by just moving the stack pointer like "sub $100,%esp".  I can then use that space for anything I want--