Memory Allocation in Assembly

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

sizeof(dog_in_window)

Any time you're doing low level memory manipulation, you need to keep in mind how many bytes are used by an object. For example, to allocate memory, you need the size in bytes. To do pointer manipulation, like skipping over an object to find the next object, you need the size in bytes to know how far to skip.

Array type	First element	Move to next	Access i'th element
Array of char (string)	BYTE [ ptr ]	ptr+1	BYTE [ ptr + i ]
Array of integer	DWORD [ ptr ]	ptr+4	DWORD [ ptr + 4*i ]
Array of longs	QWORD [ ptr ]	ptr+8	QWORD [ ptr + 8*i ]
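
The address arithmetic in this table can be checked from C++. This is only a sketch: the helper names element_address and matches_compiler are made up for illustration, and the 4-byte int and 8-byte long strides assume a typical 64-bit Linux compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Byte address of element i, computed the way you would in assembly:
// start address + i * (size of one element in bytes).
template <typename T>
std::uintptr_t element_address(const T *ptr, std::size_t i) {
    return reinterpret_cast<std::uintptr_t>(ptr) + i * sizeof(T);
}

// True if the hand-computed address matches the address the compiler
// itself uses for arr[i].
template <typename T>
bool matches_compiler(const T *arr, std::size_t i) {
    return element_address(arr, i) == reinterpret_cast<std::uintptr_t>(&arr[i]);
}
```

For an int array, matches_compiler(arr, 2) compares ptr + 2*sizeof(int) against &arr[2], which is the same calculation as DWORD [ ptr + 4*i ] when int is 4 bytes.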

There's a handy keyword "sizeof" built into C and C++, that you can apply to any variable or type name, to find out how many bytes of storage it requires. For example, if you want to know if you're running on a 64-bit machine, where "long" is 64 bits or 8 bytes, you just print the size:

return sizeof(long);

(Try this in NetRun now!)

This returns 8 on my machines, indicating "long" takes 8 bytes.

This provides an interesting way to see inside objects, by measuring their size. For example, this secretive class is still just 8 bytes, indicating in memory it's just one "long" integer. You can even typecast a class pointer to a long int pointer, dereference the pointer, and there's the value. No real mystery!

class arcane_mystery {
private: long dark_secrets;
};

return sizeof(arcane_mystery);

(Try this in NetRun now!)
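
To make the "no real mystery" point concrete, here's a hedged sketch of reading the private field through a typecast pointer. The constructor and the peek function are additions for illustration (the class above has neither); they just give the hidden field a known value to read back.

```cpp
// Hypothetical variant of arcane_mystery with a constructor, so the
// hidden field holds a known value.
class arcane_mystery {
public:
    explicit arcane_mystery(long secret) : dark_secrets(secret) {}
private:
    long dark_secrets;
};

// Treat the object's address as a pointer to long and read through it.
// "private" only restricts access by name; the bytes are still there.
long peek(const arcane_mystery &m) {
    return *reinterpret_cast<const long *>(&m);
}
```

Calling peek on an object built as arcane_mystery(42) gives back 42, because the object's only contents are that one long.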

There isn't any equivalent to "sizeof" in assembly language--you just need to remember the sizes of everything yourself!

Converting Between Pointer Types

In assembly you can easily convert between pointer and numeric types without doing anything special--for example, you can read an 8-byte value from memory using QWORD[rcx], add some bytes to the rcx pointer, and write a 1-byte value to memory using BYTE[rcx]. In fact, in assembly a common bug is accidentally changing a pointer instead of the value, or messing up your array indexing so you wind up pointing halfway inside an array element.

In C or C++, you need to typecast pointers to convert data between types. An example of a C-style cast is:

return *(long *)&foo;

(Try this in NetRun now!)

This reinterprets the first 8 bytes of the foo function's machine code (!) as a long integer, and returns it (0xC3FFFFFFF9058B48 on my machine). In C++, you can use the same simple C syntax, or a fancier-looking C++ cast that produces exactly the same results:

return *reinterpret_cast<long *>(&foo);

(Try this in NetRun now!)

This works with *any* C++ object: functions, variables, classes, even pointers themselves. You can take their address, typecast the pointer, and read their raw in-memory values exactly like in assembly language.
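
Here's one more sketch of looking at raw in-memory values, this time for a float. The function name float_bits is made up for illustration. It uses memcpy instead of a pointer cast, since the C++ standard formally leaves reading a float through an unrelated pointer type undefined, even though the cast usually works in practice; it also assumes the usual 4-byte IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>

// Return the raw 32-bit pattern stored in a float.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 4-byte float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // copy the bytes unchanged
    return bits;
}
```

With IEEE 754 floats, float_bits(1.0f) is 0x3F800000: the same four bytes you would see by loading DWORD [ptr] from the float's address in assembly.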

Making Memory Writeable in Assembly: section .data

By default, a string defined with "db" is treated as part of the program's executable code, so the string's bytes can't be modified--if you write to BYTE[rdi], the program will crash with a write to unwriteable memory. But you can tell the assembler with "section .data" to put the string into modifiable memory, which you can read or write:

mov rdi, daString ; pointer to string
mov BYTE [rdi+0], 'Y'; change the string's bytes
extern puts
call puts ; print the string
ret

section .data ; switch storage mode to modifiable data
daString:
	db `No.`,0    ; sets bytes of string in memory

(Try this in NetRun now!)

Again, there are a number of "sections" you can access. By default everything's in the code section ".text". The section directive applies to everything listed in the code until you hit another section directive.

Name	Use	Discussion
section .data	r/w data	This data is initialized, but can be modified.
section .rodata	r/o data	This data can't be modified, which lets it be shared across copies of the program.
section .bss	r/w space	This is automatically initialized to zero, meaning the contents don't need to be stored explicitly.
section .text	r/o code	This is the program's executable machine code (it's binary data, not plain text!).

In C or C++, global or static variables get stored in section .data if you give them an initial value. If they don't have an initial value, they're put in section .bss to get zero-initialized on load. If they're "const", they go in .rodata.
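
As a rough sketch of that rule, here are three globals with the section each one would typically land in. The variable and function names are made up for illustration, and the exact placement is up to the compiler and linker, so treat the section comments as the usual case rather than a guarantee.

```cpp
int initialized_counter = 5;    // has an initial value: usually lands in .data
int zeroed_buffer[4];           // no initial value: usually lands in .bss, zero at startup
const char greeting[] = "No.";  // const: usually lands in .rodata

// Add up the never-initialized buffer; every element starts as zero.
int sum_zeroed_buffer() {
    int sum = 0;
    for (int i = 0; i < 4; i++) sum += zeroed_buffer[i];
    return sum;
}

// Writable data can be changed at run time; returns the new value.
int bump_counter() {
    initialized_counter += 1;
    return initialized_counter;
}

// Read-only data can still be read like any other memory.
char first_greeting_char() {
    return greeting[0];
}
```

On Linux, running a tool like "nm" or "objdump -t" on the compiled program should show which section each symbol actually ended up in.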

Stack Allocation

We've been using the stack all semester, first in the form of the "call" and "ret" instructions, and later in the form of "push" and "pop" instructions. Internally, the stack is represented via a pointer stored in the register rsp. The general rules for use of this stack pointer register are:

Any memory address >= rsp is already in use: like a preserved register, you're not allowed to change this memory. If you do, the program is likely to crash, probably when you try to return.
Any memory address < rsp is free: like a scratch register, anybody can use this memory.
Before you return, you must restore rsp to the value it had when your function started.

This means you can actually manually allocate space on the stack, by:

Marking n bytes as in-use by subtracting* n from rsp: sub rsp,n
Releasing n bytes by adding to rsp: add rsp,n

Just as "call" corresponds to pushing the return address and jumping to the called function, and "ret" corresponds to popping the return address and jumping there, you can fake "push thingy" by allocating 8 bytes of stack space and then doing mov QWORD [rsp], thingy. Likewise, you can fake "pop thingy" by doing mov thingy, QWORD [rsp] and then releasing 8 bytes of stack space.

More generally, if you need to allocate dynamic storage for use during a single function call, the stack is an easy and efficient place to do it.

The only downside of stack allocation is you MUST release your stack space before you can return from your function. This means long-lived data structures, like a user info class used for the rest of the program, need to be allocated somewhere else.
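
The same tradeoff shows up in C++, where local variables live on the stack. Here's a minimal sketch, with made-up function names: scratch space that is only needed during one call can be a local array, while anything that has to outlive the call is allocated with malloc instead.

```cpp
#include <cstdlib>

// Scratch storage needed only during this call: the local array lives on
// the stack and is released automatically when the function returns.
long sum_first_squares() {
    long squares[4];
    for (int i = 0; i < 4; i++) squares[i] = (long)i * i;
    long sum = 0;
    for (int i = 0; i < 4; i++) sum += squares[i];
    return sum;
}

// Storage that must outlive the call: allocate it on the heap instead.
// The caller owns the result and must eventually call std::free on it.
long *make_long_lived(long value) {
    long *slot = static_cast<long *>(std::malloc(sizeof(long)));
    if (slot != nullptr) *slot = value;
    return slot;
}
```

Returning a pointer to the squares array would be a bug, because that stack space is released the moment the function returns; the pointer from make_long_lived stays valid until it is freed.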

* It's quite weird that unlike everything else on the machine, the stack "grows down", allocating by subtracting from the pointer value. This is despite the most recent allocation being called the "top of the stack", when it's really the smallest address--this kinda makes sense if you think of writing out all the memory values at increasing addresses from top to bottom, but that's not usually how memory is shown! There are a few advantages to making the stack grow down. First, if you allocate an array of a few longs, the stack pointer also points to the start of the newly created array: the first one is stored in memory at QWORD [rsp], and the next one is at QWORD [rsp+8] (but if the stack grew upward, like everything else, you'd be pointing at the *end* of the array after allocating it by moving the pointer, requiring more error-prone pointer shuffling). Second, since the heap grows upward, toward larger pointer values, the program's entire memory is usually organized with the heap down low at one end of memory, and the stack up high near the other end of memory, and the unused memory in the middle can be traded off between stack and heap, whichever needs it.

Dynamic allocation with malloc

malloc is the standard C way to allocate memory from "the heap", the area of memory where most of a program's stuff is stored. Unlike the C++ "new" operator, malloc doesn't explicitly know which data type it's allocating, since its only parameter is the number of bytes to allocate. This means you need a pointer typecast to add a type to the bare pointer returned by malloc.

Plain C, also works in C++	C++
int *arr=(int *)malloc(100*sizeof(int));	int *arr=new int[100];
myClass *c=(myClass *)malloc(sizeof(myClass));	myClass *c=new myClass;

To free memory, call the function free, like "free(ptr);". You need to eventually call free exactly once for every malloc. As usual, you'll get a horrible crash, sometimes delayed and sometimes instant, if you free the same block more than once, free memory that didn't come from malloc, free memory and then somebody keeps using it, or many other memory misdeeds. If you forget to call free, you won't get an immediate crash, but in a long-running program like a network server, these un-freed objects will build up and eventually consume all the server's memory, causing it to slow down and eventually crash.

int len=3;
int *arr=(int *)malloc(len*sizeof(int));
int i;
for (i=0;i<len;i++)
	arr[i]=7+i;
iarray_print(arr,len);
free(arr);

(Try this in NetRun now!)

Unlike C++ "new", it's easy to call malloc from assembly. In fact, you just "call malloc", with the number of bytes to allocate in rdi, and you get back a pointer in rax. Here's a simple example where we allocate space for one long, write into the pointer returned by malloc, and immediately free it.

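sub rsp,8 ; keep the stack 16-byte aligned for the calls below
mov rdi,8  ; number of bytes to allocate
extern malloc
call malloc
; rax is a pointer to our 8 bytes

mov QWORD[rax],3 ; yay writeable memory!

mov rdi,rax ; pointer to deallocate
extern free
call free

add rsp,8 ; release the alignment padding
ret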

(Try this in NetRun now!)

Here's a complete example, including looping and printing the array, equivalent to the above C code:

push rbx ; save register
mov rbx,3 ; number of integers (4-byte DWORDs) to allocate

mov rdi,rbx ; integer count
imul rdi,4  ; now a byte count
extern malloc
call malloc
; rax is start of our array

mov rcx,0 ; loop index
jmp loopTest
loopStart:
	mov rdx,7
	add rdx,rcx ; ==7+i
	mov DWORD[rax+4*rcx],edx ; arr[i]=7+i  (int version)

	add rcx,1
	loopTest:
	cmp rcx,rbx ; i<len
	jl loopStart

mov rdi,rax ; pointer to start of array
mov rsi,rbx ; number of integers
extern iarray_print
push rax ; save pointer
push 3 ; align stack
call iarray_print
pop rcx ; clean up stack
pop rax ; restore pointer

mov rdi,rax; pointer to memory to free
extern free
call free

pop rbx ; restore register
ret

(Try this in NetRun now!)

Summary of memory allocation areas

Storage Area	Allocate	Deallocate
Static read-only constant	readHere: dq 3	n/a
Static read-write data	section .data readWriteHere: dq 3	n/a
The Stack	sub rsp, nBytes ; ptr = rsp	add rsp,nBytes
The Heap	mov rdi, nBytes call malloc ; ptr = rax	mov rdi, ptr call free

No matter which way you allocate the memory, you access the memory using the same [ptr] syntax, and the program will still return garbage or crash if you move outside the memory area you correctly allocated.
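
The same point can be sketched in C++: one helper that stores and loads through a pointer works unchanged on static, stack, and heap memory. The names here are made up for illustration.

```cpp
#include <cstdlib>

static long static_slot = 3;   // static read-write data

// Store through a pointer and read the value back; the code is identical
// whichever area the pointer refers to.
long store_and_load(long *ptr, long value) {
    *ptr = value;
    return *ptr;
}

// Run the same helper on static, stack, and heap memory.
bool same_syntax_everywhere() {
    long stack_slot = 0;                                    // the stack
    long *heap_slot = static_cast<long *>(std::malloc(sizeof(long))); // the heap
    if (heap_slot == nullptr) return false;
    bool ok = store_and_load(&static_slot, 10) == 10
           && store_and_load(&stack_slot, 20) == 20
           && store_and_load(heap_slot, 30) == 30;
    std::free(heap_slot);
    return ok;
}
```

Only the allocation and deallocation steps differ between the three areas; the access in store_and_load is the C++ equivalent of QWORD [ptr].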