Review & Survey of Memory Access

CS 301 Lecture, Dr. Lawlor

We've seen a wide variety of idioms and technologies for memory access over the last two weeks, so here's a summary of what you need to know about memory.

Other places to go for help:

My x86 cheat sheet.
Paul Carter's PC Assembly Language, a free-to-download PDF book introducing you to NASM-syntax assembly.
The NASM assembler manual, especially chapter 3, which describes the syntax NASM accepts.

Working with Addresses

A memory address (a pointer) is just a count of the bytes from zero to the location of the byte you want. You can store a memory address in a register, then add and subtract from the address to change what you're pointing at ("pointer arithmetic").
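To see the "count of bytes" idea from the C side, here's a minimal sketch. The helper names byte_distance and bytes_to_element are made up for illustration; the only assumption is an ordinary C compiler.

```c
#include <stddef.h>

/* Count the bytes between two int pointers by viewing both as byte (char) pointers.
   Both pointers must point into the same array. */
ptrdiff_t byte_distance(const int *from, const int *to) {
    return (const char *)to - (const char *)from;
}

/* Distance in bytes from element 0 to element i of a small int array (i in 0..7). */
ptrdiff_t bytes_to_element(int i) {
    static int arr[8];
    return byte_distance(&arr[0], &arr[i]);
}
```

Calling bytes_to_element(3) should give 3*sizeof(int): moving an int pointer by three elements moves the underlying address by that many bytes.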

Here's how many bytes are used by various objects (in C/C++, you can use "sizeof" to determine this):

Number of Bytes	C/C++ Types	Assembly Type
1	char	BYTE
2	short	WORD
4	int, float	DWORD
8	double	QWORD

"long" and pointer types are either 4 or 8 bytes, depending on whether you're running 32-bit or 64-bit code (one exception: "long" stays 4 bytes on 64-bit Windows). NetRun runs your code in 32-bit mode by default.
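If you'd rather check the table than trust it, a short C sketch like this one should do. The function names are hypothetical, and I'm assuming a typical desktop compiler where the fixed sizes in the table hold.

```c
#include <stdio.h>

/* Returns 1 if this compiler's sizes match the table above, 0 otherwise. */
int sizes_match_table(void) {
    return sizeof(char) == 1 && sizeof(short) == 2 && sizeof(int) == 4
        && sizeof(float) == 4 && sizeof(double) == 8;
}

/* Prints the two sizes that depend on 32-bit versus 64-bit mode. */
void print_sizes(void) {
    printf("long: %zu bytes, pointer: %zu bytes\n", sizeof(long), sizeof(void *));
}
```

Call print_sizes() from your own main to see which mode your compiler is targeting.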

Loading an Integer from Memory

You can access an individual integer from memory with the syntax DWORD[address]. This tells the CPU to load up a DWORD (a 32-bit value) from the given memory address. You can write the address as a label, a register, or even a combination of registers.

	mov reg,DWORD[address]

Accessing integers in memory is very common--your local variables, arrays, and so on are usually integers.

Here's an example where we load a statically allocated integer from memory.

mov eax,DWORD[myInt] ; read this int into eax
ret

myInt:
	dd 0xa3a2a1a0 ; "data DWORD" containing this value

(Try this in NetRun now!)
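For comparison, here's roughly the same load written in C. This is only a sketch of the idea, not what NetRun does internally; load_my_int is a made-up name.

```c
#include <stdint.h>

/* Like "myInt: dd 0xa3a2a1a0": four bytes of statically allocated data. */
static const uint32_t myInt = 0xa3a2a1a0u;

/* Like "mov eax,DWORD[myInt]": read 32 bits starting at the address of myInt. */
uint32_t load_my_int(void) {
    const uint32_t *address = &myInt;
    return *address;
}
```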

Bytes in an Integer

You can access an individual byte from memory with the syntax BYTE[address]. Most instructions want DWORDs, not BYTEs, so you need to use a BYTE-friendly instruction like "movzx" (move with zero-extend):

	movzx reg,BYTE[address]

Accessing data as bytes is useful for string processing, or to understand what really shows up in memory.

Here's an example where we load a byte from the middle of an integer. Note that this returns 0xa2: x86 is little-endian, so byte 0 is the lowest ("little") byte 0xa0, byte 1 is 0xa1, byte 2 is 0xa2, and byte 3 is 0xa3.

movzx eax,BYTE[myInt + 2] ; read this byte into eax
ret

myInt:
	dd 0xa3a2a1a0 ; "data DWORD" containing this value

(Try this in NetRun now!)
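You can poke at the same bytes from C by viewing the integer through an unsigned char pointer. A minimal sketch, with load_byte_of_my_int as a hypothetical helper name:

```c
#include <stdint.h>

static const uint32_t myInt = 0xa3a2a1a0u;

/* Like "movzx eax,BYTE[myInt + offset]": load one byte and zero-extend it
   to 32 bits. offset must be 0, 1, 2, or 3. */
uint32_t load_byte_of_my_int(int offset) {
    const unsigned char *bytes = (const unsigned char *)&myInt;
    return bytes[offset];
}
```

On a little-endian machine like x86, load_byte_of_my_int(2) gives 0xa2, matching the assembly above. A big-endian machine would store the bytes in the opposite order.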

Array Indexing

Each integer is four bytes, so if you have a pointer to one integer, the next integer starts 4 bytes higher in memory. So to access integer i of an array, you need to go 4*i bytes higher in memory:

	mov reg,DWORD[array + 4*index]

You can also compute this manually, using "imul" and "add".

Here's an example:

mov eax,DWORD[myArr + 4*2] ; eax = myArr[2]
ret

myArr:
	dd 10 ; that's myArr[0]
	dd 11 ; and myArr[1]
	dd 12 ; and myArr[2]
	dd 13 ; and myArr[3]

(Try this in NetRun now!)
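The "compute this manually" version is worth seeing once. Here's a C sketch of the same arithmetic: multiply the index by 4 (the "imul" step), add it to the start address (the "add" step), then load 4 bytes from there. The name load_element_by_hand is made up, and I'm assuming 4-byte integers as in the table above.

```c
#include <stdint.h>
#include <string.h>

static const int32_t myArr[4] = {10, 11, 12, 13};

/* Compute the element's address by hand, then load a DWORD from it.
   index must be 0..3. */
int32_t load_element_by_hand(int index) {
    const unsigned char *start = (const unsigned char *)myArr;
    const unsigned char *address = start + 4 * index;  /* start + 4*index bytes */
    int32_t value;
    memcpy(&value, address, sizeof value);             /* load 4 bytes */
    return value;
}
```

load_element_by_hand(2) returns 12, the same as myArr[2]. The compiler does this multiply-and-add for you every time you write arr[i].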

Static Allocation

You can tell the assembler to keep some constants right next to your machine code. The handy way to get the address of your new constant is with a label, and the way to specify the constant's value is with "dd" (data dword, 32-bit constant) and "db" (data byte, 8-bit constant).

someConstant:
    dd constant0
    dd constant1

Here's an example where we index into a little 4-integer array:

mov eax,DWORD[myArr + 4*2] ; eax = myArr[2]
ret

myArr:
	dd 10 ; that's myArr[0]
	dd 11 ; and myArr[1]
	dd 12 ; and myArr[2]
	dd 13 ; and myArr[3]

(Try this in NetRun now!)
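In C terms, "dd" and "db" behave a lot like static arrays of 32-bit and 8-bit values. Here's a small sketch; the array contents and function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

static const uint32_t someDwords[2] = {7, 8};      /* like "dd 7" then "dd 8" */
static const uint8_t someBytes[3] = {'H', 'i', 0}; /* like "db 'H','i',0" */

size_t dwords_size(void) { return sizeof someDwords; } /* 2 values * 4 bytes */
size_t bytes_size(void) { return sizeof someBytes; }   /* 3 values * 1 byte */

uint32_t second_dword(void) { return someDwords[1]; }
uint8_t first_byte(void) { return someBytes[0]; }
```

The two dwords take 8 bytes and the three bytes take 3, which is why the index is scaled by 4 for "dd" data and left unscaled for "db" data.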

Stack: Allocate Temporary Storage

You can claim a portion of stack space for your own use by just moving the stack pointer down:

sub esp,N
... use N bytes of space starting at esp here ...
add esp,N

This is very common for allocating temporary arrays, strings, local variables, and so on. You'll get a horrible crash if you forget to put the stack pointer back, though!

Here's an example where we allocate 32 bytes of space, store something into the last bytes, and read it back:

sub esp,32 ; Claim this many bytes of stack space
mov DWORD[esp+28],13  ; Store a constant into that space
mov eax,DWORD[esp+28] ; Load it back 
add esp,32 ; Clean up the stack
ret

(Try this in NetRun now!)
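A C local array is the closest everyday analogue: the compiler moves the stack pointer for you on entry and puts it back on return. This is a rough sketch of the same 32-byte example, with store_and_reload as a made-up name. Exactly where the compiler places the array is up to the compiler.

```c
#include <stdint.h>
#include <string.h>

int32_t store_and_reload(void) {
    unsigned char space[32];      /* like "sub esp,32" */
    int32_t in = 13, out;
    memcpy(space + 28, &in, 4);   /* like "mov DWORD[esp+28],13" */
    memcpy(&out, space + 28, 4);  /* like "mov eax,DWORD[esp+28]" */
    return out;                   /* space is released on return, like "add esp,32" */
}
```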

Here's another example where I allocate a 100-integer array, and loop over the array:

sub esp,4*100 ; claim 100 integers of space

mov eax,0 ; i
mov ecx,esp ; ecx points to start of our array
loopstart:
	mov DWORD[ecx+4*eax],eax ; arr[i]=i;
	add eax,1 ; i++
	cmp eax,100 ; if (i<100) ...
	jl loopstart ; ... then goto loopstart

mov eax,DWORD[ecx+4*36] ; return array element 36
add esp,4*100 ; clean up array
ret

(Try this in NetRun now!)
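Here's the C that this loop corresponds to, as a minimal sketch. The function name and the "which" parameter are my own additions so you can read back any element, not just 36.

```c
/* Fill a 100-integer stack array with arr[i]=i, then return one element.
   which must be 0..99. */
int fill_and_read(int which) {
    int arr[100];                 /* like "sub esp,4*100" */
    int i;
    for (i = 0; i < 100; i++) {   /* the loopstart ... jl loopstart loop */
        arr[i] = i;               /* like "mov DWORD[ecx+4*eax],eax" */
    }
    return arr[which];            /* like "mov eax,DWORD[ecx+4*36]" when which is 36 */
}
```

fill_and_read(36) returns 36, just like the assembly version.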

Stack: Save & Restore Registers

One standard use of the stack is as a place to store the old value of a "preserved" register, so you can use the register for your own purposes:

push reg
... use reg here (change its value) ...
pop reg
... now reg is back to its old value ...

Here's an example where I save and restore ebp. "esp", "ebp", "ebx", "edi", and "esi" are all preserved registers, which means functions you call won't change them, but you must save and restore them before you can use them too!

push ebp
mov ebp,17
mov eax,ebp
pop ebp
ret

(Try this in NetRun now!)
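C has no way to name registers directly, so here's a toy model instead: a struct with a few "registers" and a tiny stack of 32-bit slots. Everything in it (ToyCpu, toy_push, toy_pop, run_save_restore) is invented for illustration and only mimics what push and pop do.

```c
#include <stdint.h>

typedef struct {
    uint32_t mem[16];  /* toy stack, one 32-bit slot per entry */
    int esp;           /* slot index of the top of the stack; 16 means empty */
    uint32_t ebp, eax; /* two pretend registers */
} ToyCpu;

/* push: move the stack pointer down one slot, then store the value there. */
static void toy_push(ToyCpu *c, uint32_t v) { c->esp -= 1; c->mem[c->esp] = v; }

/* pop: read the top slot, then move the stack pointer back up. */
static uint32_t toy_pop(ToyCpu *c) { uint32_t v = c->mem[c->esp]; c->esp += 1; return v; }

/* Runs the save/restore example and returns the final machine state. */
ToyCpu run_save_restore(uint32_t callers_ebp) {
    ToyCpu c = { {0}, 16, callers_ebp, 0 };
    toy_push(&c, c.ebp);   /* push ebp */
    c.ebp = 17;            /* mov ebp,17 */
    c.eax = c.ebp;         /* mov eax,ebp */
    c.ebp = toy_pop(&c);   /* pop ebp */
    return c;
}
```

After run_save_restore(0x1234), eax holds 17, while ebp is back to 0x1234 and esp is back where it started, which is the whole point of the push/pop pair.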

Stack Frames: ebp

One standard use of register ebp is as a "stack frame pointer", a saved copy of the top of the stack at the start of your function. The big advantage to doing this is that ebp doesn't move when you allocate new space, or push or pop.

push ebp; stash old value of ebp on the stack
mov ebp,esp; ebp == stack pointer at start of function

... do the work of the function here, including pushing values and allocating stack space ...

mov esp,ebp; restore stack pointer (easier than figuring the correct "add"!)
pop ebp; restore ebp

A stack frame is useful in long functions, since it provides a fixed reference point for doing addressing, and makes it easier to clean off the stack at the end of the function.

Here's an example where we use a stack frame:

push ebp; stash old value of ebp on the stack
mov ebp,esp; ebp == stack pointer at start of function

sub esp,1000 ; make some room on the stack
mov DWORD[ebp-4],7 ; local variables are at negative offsets from the base pointer
mov eax,DWORD[ebp-4]; same local variable

mov esp,ebp; restore stack pointer (easier than figuring the correct "add"!)
pop ebp; restore ebp
ret

(Try this in NetRun now!)
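To watch esp and ebp move through that example, here's a toy model in C. It's a sketch with invented names (ToyCpu, toy_store, toy_load, run_frame_demo) and a pretend 2048-byte stack; it only traces the arithmetic and is not how real hardware is built.

```c
#include <stdint.h>

enum { STACK_BYTES = 2048 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* toy stack memory, addressed in bytes */
    uint32_t esp, ebp, eax;        /* esp and ebp are byte offsets into mem */
} ToyCpu;

static void toy_store(ToyCpu *c, uint32_t addr, uint32_t v) { c->mem[addr / 4] = v; }
static uint32_t toy_load(const ToyCpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* Runs the stack frame example and returns the final machine state. */
ToyCpu run_frame_demo(uint32_t callers_ebp) {
    ToyCpu c = { {0}, STACK_BYTES, callers_ebp, 0 };
    c.esp -= 4; toy_store(&c, c.esp, c.ebp);  /* push ebp */
    c.ebp = c.esp;                            /* mov ebp,esp */
    c.esp -= 1000;                            /* sub esp,1000 */
    toy_store(&c, c.ebp - 4, 7);              /* mov DWORD[ebp-4],7 */
    c.eax = toy_load(&c, c.ebp - 4);          /* mov eax,DWORD[ebp-4] */
    c.esp = c.ebp;                            /* mov esp,ebp */
    c.ebp = toy_load(&c, c.esp); c.esp += 4;  /* pop ebp */
    return c;
}
```

However much space gets claimed in the middle, "mov esp,ebp" snaps the stack pointer straight back, so the final esp and ebp match their starting values and eax holds 7.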

Stack: Pass Parameters to a Function

In 32-bit x86 code, functions expect their parameters to be sitting on top of the stack. So before you call a function, you either push or mov its parameters onto the top of the stack, then clean them off again after the call:

push firstArg
call someFunc
pop reg

If a function takes two or more arguments, the first (leftmost) argument should be sitting on top of the stack:

push secondArg
push firstArg
call someFunc ; same as someFunc(firstArg,secondArg);
pop reg1
pop reg2
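The push order is easy to get backwards, so here's a toy model of it in C. All the names (ToyStack, toy_push, toy_subtract, call_subtract) are made up. One detail the notes above don't spell out: "call" itself pushes a return address, so inside the function the first argument sits at [esp+4] and the second at [esp+8].

```c
#include <stdint.h>

typedef struct {
    uint32_t mem[16];  /* toy stack, one 32-bit slot per entry */
    int esp;           /* slot index of the top of the stack; 16 means empty */
} ToyStack;

static void toy_push(ToyStack *s, uint32_t v) { s->esp -= 1; s->mem[s->esp] = v; }

/* The callee's view right after "call": slot esp holds the return address,
   slot esp+1 (that is, [esp+4]) the first argument, slot esp+2 ([esp+8]) the second. */
static uint32_t toy_subtract(const ToyStack *s) {
    uint32_t firstArg = s->mem[s->esp + 1];
    uint32_t secondArg = s->mem[s->esp + 2];
    return firstArg - secondArg;
}

/* The caller's side of toy_subtract(firstArg, secondArg). */
uint32_t call_subtract(uint32_t firstArg, uint32_t secondArg) {
    ToyStack s = { {0}, 16 };
    uint32_t result;
    toy_push(&s, secondArg);   /* push secondArg */
    toy_push(&s, firstArg);    /* push firstArg */
    toy_push(&s, 0);           /* "call" pushes a return address (placeholder 0 here) */
    result = toy_subtract(&s);
    s.esp += 1;                /* "ret" pops the return address */
    s.esp += 2;                /* caller cleans up both arguments, like "add esp,8" */
    return result;
}
```

call_subtract(10, 3) returns 7. I picked subtraction because it isn't symmetric, so swapping the push order would give a visibly wrong answer.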

Here's an example of calling a one-argument function. I don't need the argument back after the call, so instead of popping it into a register I just use "add" to move the stack pointer past it:

push 17
extern print_int
call print_int
add esp,4
ret

(Try this in NetRun now!)