Memory Access in Assembly Language

CS 301 Lecture, Dr. Lawlor

You can tell the assembler to keep some constants right next to your machine code, known as "static" data after the C++ keyword.  The handy way to get the address of your new constant is with a label, and the way to specify the constant's value is with one of the pseudo-instructions "dq" (data quadword), "dd" (data dword), "dw" (data word) or "db" (data byte).  These are directions to the assembler rather than CPU instructions: each one reserves the corresponding amount of space and initializes that space to the value you give.

The syntax for accessing these constants from normal assembly looks like:
mov ... DWORD[somePtr] ...

section .data
somePtr:
    dd constant0
Here are all the constant-generating "instructions", and the size of the data they make:
Instruction   C++     Access           Register   Bits   Bytes
dq 0x3        long    QWORD[somePtr]   r _ _      64     8
dd 0x3        int     DWORD[somePtr]   e _ _      32     4
dw 0x3        short   WORD[somePtr]    _ _        16     2
db 0x3        char    BYTE[somePtr]    _ l        8      1
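
As a rough C++ cross-check of the sizes in this table, here is a minimal sketch. The function name bytes_for is made up for illustration, and it uses fixed-width types such as std::int64_t instead of "long", because "long" is only 64 bits on some platforms (64-bit Linux, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Map the last letter of each directive (dq, dd, dw, db) to the size in
// bytes of the matching fixed-width C++ type.
// bytes_for is a hypothetical helper, not part of any library.
std::size_t bytes_for(char directive) {
    switch (directive) {
        case 'q': return sizeof(std::int64_t); // dq: quadword
        case 'd': return sizeof(std::int32_t); // dd: dword
        case 'w': return sizeof(std::int16_t); // dw: word
        case 'b': return sizeof(std::int8_t);  // db: byte
    }
    assert(false && "unknown directive letter");
    return 0;
}
```

Calling bytes_for('d') gives the number of bytes that one "dd" reserves.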

Static Integers

Here's an example where we load a statically allocated integer from memory.
mov eax,DWORD[myInt] ; copy this int into eax
ret

section .data
myInt:
dd 0xa3a2a1a0 ; "data DWORD" containing this value

(Try this in NetRun now!)

You can copy a pointer value into a register, too.  Here we're dereferencing a pointer stored in a register:
mov rdx, someIntPtr ; copy the address someIntPtr into rdx (like C++: p=someIntPtr;)
mov eax, DWORD [rdx] ; read memory rdx points to (like C++: return *p;)
ret

section .data
someIntPtr: ; A place in memory, where we're storing an integer.
dd 123 ; "data DWORD", our integer

(Try this in NetRun now!)

A pointer to an array initially looks just like a pointer to anything else:
mov rcx, myArray ; rcx points to myArray  (like C++: p=arr;)
mov eax, DWORD [rcx] ; read memory pointed to by rcx (like C++: return *p;)
ret

section .data
myArray: ; A place in memory, where we're storing some integers.
dd 100 ; "data DWORD", here our array element [0]
dd 101 ; [1]
dd 102 ; [2]
dd 103 ; [3]

(Try this in NetRun now!)

Here's an example where we index into our little 4-integer array:
mov eax, DWORD [myArray+4*2] ; read myArray[2]
ret

section .data
myArray: ; A place in memory, where we're storing some integers.
dd 100 ; "data DWORD", here our array element [0]
dd 101 ; [1]
dd 102 ; [2]
dd 103 ; [3]

(Try this in NetRun now!)
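
The "+4*2" in that address is a byte offset: each DWORD is 4 bytes, so element [2] starts 8 bytes past the label. A C++ sketch of the same arithmetic follows, assuming a 4-byte int; read_at_byte_offset is a name invented here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static const int myArray[4] = {100, 101, 102, 103};

// Read the int that starts byteOffset bytes past the start of myArray,
// mirroring DWORD[myArray + byteOffset] in the assembly.
int read_at_byte_offset(std::size_t byteOffset) {
    assert(byteOffset + sizeof(int) <= sizeof(myArray)); // stay inside the array
    int value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char*>(myArray) + byteOffset,
                sizeof value);
    return value;
}
```

In ordinary C++ you would just write myArray[2], and the compiler multiplies the index by sizeof(int) for you. In assembly you do that multiplication yourself.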

Static Bytes & Strings

You can access an individual byte from memory with the syntax BYTE[address].  A plain "mov" needs both operands to be the same size, so to load a single byte into a DWORD register you need a BYTE-friendly instruction like "movzx" (move with zero-extend):
	movzx reg,BYTE[address]
Accessing data as bytes is useful for string processing, or to understand what really shows up in memory.

For example, here I'm defining a short 3-byte string, and reading one byte out:
movzx eax,BYTE[myString + 2] ; read this byte into eax
ret

section .data
myString:
db 'w','o','a'

(Try this in NetRun now!)

These are all equivalent ways to get the same 3-byte string:

db 0x77
db 0x6f
db 0x61
db 'w'
db 'o'
db 'a'
db 'w','o','a'
db 'woa'
db "woa"
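
To check that the hex and character spellings really produce the same bytes, here is a small C++ sketch. It assumes an ASCII character set, and the names asHex, asChars and same_bytes are illustrative only.

```cpp
#include <cassert>
#include <cstring>

static const unsigned char asHex[3]   = {0x77, 0x6f, 0x61}; // like: db 0x77 / db 0x6f / db 0x61
static const unsigned char asChars[3] = {'w', 'o', 'a'};    // like: db 'w','o','a'

// True when both 3-byte sequences hold identical bytes.
bool same_bytes() {
    return std::memcmp(asHex, asChars, sizeof asHex) == 0;
}
```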

There are several standard functions that take a "C string": a pointer to a bunch of ASCII bytes, followed by a zero byte.  "puts" is one such function, and it prints the string you pass it plus a newline. We can call puts to print out our string like this:

mov rdi,myString  ; points to string constant below
extern puts
call puts
ret

section .data
myString:
db 'woa',0 ; need the trailing zero to mark the end of the string...

(Try this in NetRun now!)
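
The trailing zero is the only thing that tells functions like puts where the string stops. A C++ sketch of the same layout follows, where c_string_length is a hypothetical wrapper around the standard strlen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Four bytes in memory, like: db 'woa',0
static const char myString[] = {'w', 'o', 'a', 0};

// Count the bytes before the terminating zero, the way C string functions do.
std::size_t c_string_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}
```

The array occupies 4 bytes but the string length is 3, because the zero byte marks the end and is not counted.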

Here's an example where we load a byte from the middle of an integer.  Note that this returns 0xa2: on our little-endian x86 machines the "little" byte comes first, so byte 0 is 0xa0, byte 1 is 0xa1, and byte 2 is 0xa2.

movzx eax,BYTE[myInt + 2] ; read this byte into eax
ret

section .data
myInt:
dd 0xa3a2a1a0 ; "data DWORD" containing this value

(Try this in NetRun now!)
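
The same experiment can be sketched in C++ by copying the integer's bytes into an array and indexing it. The helper names byte_at and is_little_endian are made up for this example, and the 0xa2 result assumes a little-endian machine such as x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return the byte stored `index` bytes past the start of `value` in memory,
// like movzx eax,BYTE[myInt + index].
unsigned int byte_at(std::uint32_t value, std::size_t index) {
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value); // copy out the in-memory layout
    return bytes[index];                      // widened with zeros, as movzx does
}

// True when the low-order byte is stored first (little-endian, as on x86).
bool is_little_endian() {
    return byte_at(1u, 0) == 1u;
}
```

On a little-endian machine, byte_at(0xa3a2a1a0, 2) gives 0xa2 and byte_at(0xa3a2a1a0, 0) gives 0xa0.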

Modifiable Static Data

By default, stuff in "section .data" is readable and writeable.  So this works fine:
mov DWORD[myInt],7 ; overwrite our int
mov eax,DWORD[myInt] ; copy the modified int into eax
ret

section .data
myInt:
dd 2 ; "data DWORD" containing this value

(Try this in NetRun now!)

But if you leave off the "section .data", the constant is stored next to the program's machine code in "section .text" (a weird ancient name; machine code is not human-readable text!).  This code section is readable but not writeable, so this segfaults:
mov DWORD[myInt],7 ; overwrite our int
mov eax,DWORD[myInt] ; copy the modified int into eax
ret

myInt:
dd 2 ; "data DWORD" containing this value

(Try this in NetRun now!)

You can even store *code* in the modifiable "section .data". 
call myFunction
ret

section .data
myFunction:
mov eax,2
ret

(Try this in NetRun now!)

The "mov" and "ret" instructions just emit bytes of machine code, identical to:
call myFunction
ret

section .data
myFunction:
db 0xb8,0x02,0x00,0x00,0x00,0xc3; code for my function

(Try this in NetRun now!)

However, when code is in modifiable memory, you can modify the machine code!  For example, if I know what bytes the assembler will output for "myFunction", I can actually figure out where to go in and modify the "myFunction" machine code, to change what the function returns!  In this case, I just want to skip in past the 0xb8 (mov opcode) and overwrite the constant being loaded:
mov DWORD[myFunction+1],7 ; overwrite constant loaded by first, 0xb8 instruction
call myFunction
ret

section .data
myFunction:
mov eax,2 ; <- modified at runtime!
ret

(Try this in NetRun now!)

This returns 7, because the bytes of "myFunction" are modified before execution.
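
To see exactly which bytes that store changes, here is a C++ sketch that patches a copy of the machine code in an ordinary byte array. It never executes the bytes, and original_code and patch_immediate are illustrative names, not anything NetRun provides.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Machine code bytes for "mov eax,2" then "ret", as listed in the text.
std::array<unsigned char, 6> original_code() {
    std::array<unsigned char, 6> code = {{0xb8, 0x02, 0x00, 0x00, 0x00, 0xc3}};
    return code;
}

// Overwrite the 4-byte constant that follows the 0xb8 opcode,
// like mov DWORD[myFunction+1],newValue. Only the copy is changed.
std::array<unsigned char, 6> patch_immediate(std::array<unsigned char, 6> code,
                                             std::uint32_t newValue) {
    assert(code[0] == 0xb8); // expect the "mov eax, constant" opcode
    std::memcpy(code.data() + 1, &newValue, sizeof newValue);
    return code;
}
```

On a little-endian machine, patch_immediate(original_code(), 7) yields the bytes b8 07 00 00 00 c3: the opcode and the trailing ret (0xc3) are untouched, and only the constant changes.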

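There's also a "section .bss" that contains zero-initialized storage.  In summary: "section .text" holds the machine code and is readable but not writeable, "section .data" holds initialized data and is readable and writeable, and "section .bss" holds zero-initialized storage.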

Static Pointers

One common trick is to use pointers in the static data.  For example, I can build a static linked list like this:
mov rcx,myFirstData ; cur=head
keep_printing:
mov edi,DWORD[rcx+8] ; print_int(cur->value)
extern print_int
push rcx
call print_int
pop rcx
mov rcx,QWORD[rcx] ; cur=cur->next
cmp rcx,0
jne keep_printing
ret

section .data
myFirstData:
dq mySecondData
dd 3

mySecondData:
dq myThirdData
dd 7

myThirdData:
dq 0 ; END of list
dd 0

(Try this in NetRun now!)
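
For comparison, here is a C++ sketch of the same static list. The struct name Node and the function collect_values are assumptions made for this example. The function gathers the values into a vector instead of calling print_int, so the result is easy to inspect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same layout as each list entry in the assembly: a pointer to the next
// entry (dq) followed by an int value (dd), which sits at offset 8 on a
// 64-bit machine. That offset is why the assembly reads DWORD[rcx+8].
struct Node {
    const Node* next;
    int value;
};

static const Node myThirdData  = {nullptr, 0};        // END of list
static const Node mySecondData = {&myThirdData, 7};
static const Node myFirstData  = {&mySecondData, 3};

// Visit every node the way the keep_printing loop does: use the current
// value first, then follow next, and stop once next is zero.
std::vector<int> collect_values(const Node* head) {
    assert(head != nullptr); // the assembly loop also reads the first node unconditionally
    std::vector<int> values;
    const Node* cur = head;
    do {
        values.push_back(cur->value);
        cur = cur->next;
    } while (cur != nullptr);
    return values;
}
```

Calling collect_values(&myFirstData) gives 3, 7, 0. The final 0 is the value stored in myThirdData, which the assembly loop prints as well, because it only checks for the end of the list after printing.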