Data Structures in Assembly

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Plain C doesn't have a "class" data type, but you can declare a "struct" that groups data fields much the same way as a C++ "class", except that everything is public and there are no member functions. Also unlike C++, you need to keep saying "struct foo" instead of just "foo" (unless you add a strange-looking "typedef struct foo foo;").

struct student {
	long ID; // student ID number
	long grade; // percent grade
	long awesome; // awesomeness counter
};
struct student s;
s.ID=3123456;
s.grade=98;
s.awesome=2;
return s.grade;

(Try this in NetRun now!)

In memory, this struct is identical to a three-element array of longs (8 bytes each on 64-bit Linux). If rdi points to the struct s, then s.ID is at QWORD[rdi], s.grade is at QWORD[rdi+8], and s.awesome is at QWORD[rdi+16]. Each field's distance in bytes from the start of the struct is called the field offset. In C, you can use the offsetof(type,field) macro from <stddef.h> to measure this, but in assembly you just need to remember it, so I normally add a comment with the offset.

mov rdi,studentPtr  ; rdi points to struct student
mov rax,QWORD[rdi+8] ; extract student's grade
ret

studentPtr:
	dq 3123456 ; offset 0: student ID
	dq 98      ; offset 8: student grade
	dq 2       ; offset 16: student awesome counter

(Try this in NetRun now!)
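If you'd rather not trust the offsets in my comments, you can ask the C compiler for them. This is only a sketch: the three helper function names are made up for illustration, and the 0, 8, 16 values assume a machine where long is 8 bytes (such as 64-bit Linux).

```c
#include <stddef.h> /* offsetof, size_t */

struct student {
	long ID;      /* student ID number */
	long grade;   /* percent grade */
	long awesome; /* awesomeness counter */
};

/* Hypothetical helpers: each returns one field's offset in bytes
   from the start of the struct, as the compiler laid it out. */
size_t offset_of_ID(void)      { return offsetof(struct student, ID); }
size_t offset_of_grade(void)   { return offsetof(struct student, grade); }
size_t offset_of_awesome(void) { return offsetof(struct student, awesome); }
```

Returning offset_of_grade() from a test function should give the same number used in QWORD[rdi+8], as long as long is 8 bytes on your machine.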

Accessing fields in a struct

In C, you need to access a struct differently depending on whether you have a value or a pointer to the struct. You get at the parts of a pointed-to struct using the strange "->" syntax, which is designed to look like an arrow pointing at the field.

Declaration	Access
struct student s;	return s.ID;
struct student *sp=&s;	return sp->ID;

As usual, C++ complicates the situation with references, which use . just like values; and namespaces, which use :: for some reason. Since only one of ., ->, or :: will ever compile, I think they should have just allowed "." for everything like most other languages (Python, Java, JavaScript, D, etc).

At least in assembly things are consistent--the only way to access parts of a struct is via a pointer to the start of the struct, plus the offset in bytes of the field you want.
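To see that "->" is really just "start of struct plus offset", here is a small sketch that reads the same field both ways. The function names grade_by_arrow and grade_by_offset are invented for this example; the struct is the student struct from earlier.

```c
#include <stddef.h> /* offsetof */

struct student {
	long ID;      /* student ID number */
	long grade;   /* percent grade */
	long awesome; /* awesomeness counter */
};

/* The usual C way: the compiler works out the field's address. */
long grade_by_arrow(const struct student *sp) {
	return sp->grade;
}

/* What the assembly does by hand: take the struct's start address,
   add the field's byte offset, and load a long from there. */
long grade_by_offset(const struct student *sp) {
	const char *base = (const char *)sp; /* byte-addressed start of struct */
	return *(const long *)(base + offsetof(struct student, grade));
}
```

Both functions should return the same value for any student; the second one is the C spelling of mov rax,QWORD[rdi+8].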

Class std::string

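All that's inside a std::string, at least in the older reference-counted libstdc++ that g++ used by default before GCC 5, is one pointer--you can verify this with sizeof(std::string), which is 8 bytes there, the size of a pointer. (Newer standard libraries also store a length and a small inline buffer, so sizeof(std::string) is typically 24 or 32 bytes; check it on your own machine.) This pointer points directly to the string's character data: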

(Figure: std::string layout in memory)

(I'm not showing it here, but in this reference-counted layout the QWORD at [rax-8], just before the character data that rax points to, stores a reference count, used to make quick read-only copies of strings.)

If your assembly language function is passed a pointer to a std::string, you can print the string's character data by just extracting the character pointer from the string:

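; rdi: our one argument, a pointer to a std::string
extern puts
mov rdi, QWORD[rdi] ; extract pointer to char, the first field of the std::string
sub rsp,8 ; keep the stack 16-byte aligned for the call
call puts
add rsp,8 ; restore the stack before returning
ret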

(Try this in NetRun now!)

Iterative structures: Linked Lists

One common struct in C is a linked list, which is a struct that includes a pointer to another struct of the same type, hence linking the structs together into a chain or list. This is commonly used in C to create growable data structures, like a list of students in a course. Here's an example in C:

struct linked_list {
	long id; // the data in this link: one student ID
	struct linked_list *next; // the next node, or NULL if none
};
struct linked_list tail={7,NULL};
struct linked_list mid={4,&tail};
struct linked_list start={2,&mid};

struct linked_list *cur;
for (cur=&start; cur!=NULL; cur=cur->next) {
	printf("Node %p has id %ld\n",
	       (void *)cur,  cur->id); // %p expects a void pointer
}

(Try this in NetRun now!)

The loop looks strange because there is no counter: it starts at the first link in the chain, prints it, and then follows each next pointer down the chain until it hits the NULL at the end.
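The example above builds its three nodes by hand, so it can't actually grow. Here's a rough sketch of how a list might grow at runtime using malloc. The helper names push_front, list_length, and free_list are made up for this example, and adding at the front is only one of several reasonable designs.

```c
#include <stdlib.h> /* malloc, free, NULL */

struct linked_list {
	long id; /* the data in this link: one student ID */
	struct linked_list *next; /* the next node, or NULL if none */
};

/* Hypothetical helper: allocate a new node holding id and link it in
   front of head. Returns the new head, or NULL if malloc fails
   (in which case the old list is left untouched). */
struct linked_list *push_front(struct linked_list *head, long id) {
	struct linked_list *node = malloc(sizeof *node);
	if (node == NULL) return NULL;
	node->id = id;
	node->next = head;
	return node;
}

/* Count the nodes using the same loop shape as the example above. */
long list_length(const struct linked_list *head) {
	long n = 0;
	for (const struct linked_list *cur = head; cur != NULL; cur = cur->next) {
		n++;
	}
	return n;
}

/* Release every node; save next before freeing the current node. */
void free_list(struct linked_list *head) {
	while (head != NULL) {
		struct linked_list *next = head->next;
		free(head);
		head = next;
	}
}
```

Pushing 7, then 4, then 2 onto an empty (NULL) list gives the same 2, 4, 7 chain as the hand-built version, since each push goes in front of the previous head.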

In assembly language, the trick is remembering where each field of the struct lives:

(Figure: linked list layout in memory)

Here's the code to walk down a linked list in assembly:

push rbx
mov rbx,listStart

loopAgain:
	mov rdi,QWORD[rbx] ; load student ID
	extern print_int
	call print_int
	mov rbx,QWORD[rbx+8] ; move to next student
	cmp rbx,0 ; check for NULL
	jne loopAgain

pop rbx
ret

listStart:
	dq 2       ; offset 0: student ID
	dq listMid ; offset 8: next link

listMid:
	dq 4       ; offset 0: student ID
	dq listEnd ; offset 8: next link

listEnd:
	dq 7       ; offset 0: student ID
	dq 0       ; offset 8: next link (0 indicates the end of the list)

(Try this in NetRun now!)

Definitely try this! Pointers inside structs are how we build complicated data structures in assembly!
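One thing worth noticing about the assembly loop above is that it loads the first node before it ever checks for NULL, so it behaves like a C do/while rather than the for loop in the C version. The sketch below shows both shapes side by side. It adds up the IDs instead of printing them so the result is easy to check, and the function names are invented for this example.

```c
#include <stddef.h> /* NULL */

struct linked_list {
	long id; /* offset 0: student ID */
	struct linked_list *next; /* offset 8: next link */
};

/* Same shape as the assembly loop: do the work first, test for NULL
   at the bottom. The caller must pass a non-NULL pointer. */
long sum_ids_nonempty(const struct linked_list *cur) {
	long total = 0;
	do {
		total += cur->id;  /* like loading QWORD[rbx] */
		cur = cur->next;   /* like mov rbx,QWORD[rbx+8] */
	} while (cur != NULL); /* like cmp rbx,0 then jne loopAgain */
	return total;
}

/* Tests for NULL before the first load, so an empty list is fine. */
long sum_ids(const struct linked_list *cur) {
	long total = 0;
	while (cur != NULL) {
		total += cur->id;
		cur = cur->next;
	}
	return total;
}
```

For the 2, 4, 7 list both versions agree; only the second is safe to call with NULL. In assembly, the equivalent fix would be to compare against 0 before entering the loop.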