Pointer Arithmetic in C and Assembly

Pointers in C++ are considered fairly tricky. Part of the problem is data types: everybody understands an "int", but what is an "int *" (pointer to int) really?

Pointers in assembly language have much simpler syntax: BYTE [rax] means go out to memory and grab one byte at the address stored in register rax. That address is always measured in bytes, and is called a "pointer", but it's just a number in rax.

THE key to understanding pointers is to realize that everything in memory is stored as a flat one-dimensional sequence of bytes. The first byte (top of the screen) is 0x00000000, and the last byte (bottom of the screen) is 0xFFFF...FFF. To get to the next byte, add one to the pointer. To get to the next 32-bit int, add four (bytes) to the pointer. To get to the next 64-bit long, add eight bytes to the pointer.
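Those byte counts are what the hardware sees. In C++, adding 1 to a typed pointer already moves it forward by one whole element, so the compiler does the multiply-by-size for you. Here is a minimal sketch that measures how far "p+1" really moves; the helper name bytes_per_step is made up for illustration, and the fixed-width types from <cstdint> stand in for "32-bit int" and "64-bit long".

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many bytes does "p+1" move a pointer of type T*?
template <typename T>
std::ptrdiff_t bytes_per_step() {
    T two[2] = {};      // two adjacent elements in memory
    T *p = &two[0];     // points at the first element
    T *next = p + 1;    // C++ pointer arithmetic: one element forward
    // Cast to char* so the subtraction is measured in bytes, as in assembly
    return (char *)next - (char *)p;
}
```

Calling bytes_per_step<char>() should give 1, bytes_per_step<std::int32_t>() should give 4, and bytes_per_step<std::int64_t>() should give 8, matching the byte counts above.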

Human	C++	Assembly
Declare a long integer.	long y;	rdx (nothing to declare, just use a register)
Copy one long integer to another.	y=x;	mov rdx,rax
Declare a pointer to a long.	long *p;	rax (nothing to declare, use any 64-bit register)
Dereference (look up) the long.	y=*p;	mov rdx,QWORD [rax]
Find the address of a long.	p=&y;	mov rax,place_you_stored_Y
Access an array (easy way)	y=p[2];	(sorry, no easy way exists!)
Access an array (hard way)	p=p+2; y=*p;	add rax,2*8 (move forward by two 8-byte longs); mov rdx,QWORD [rax] (grab that long)
Access an array (too clever)	y=*(p+2);	mov rdx,QWORD [rax+2*8] ; (yes, that actually works!)
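The three array rows in the table all fetch the same long. As a rough sketch (the function names are invented for this example), here they are side by side in C++, with the last one doing the byte arithmetic by hand the way the assembly does:

```cpp
// Three ways to fetch element 2 of an array of longs.

// Easy way: array indexing
long third_easy(const long *p) { return p[2]; }

// Too-clever way: pointer arithmetic, then dereference
long third_clever(const long *p) { return *(p + 2); }

// Assembly-style way: count bytes ourselves
long third_by_bytes(const long *p) {
    const char *bytes = (const char *)p;                // address in bytes
    return *(const long *)(bytes + 2 * sizeof(long));   // like [rax+2*8]
}
```

Given long arr[4]={10,20,30,40}; all three functions return 30. Forgetting the "* sizeof(long)" in the last one is exactly the "loading from the wrong place" mistake.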

Loading from the wrong place, or loading the wrong amount of data, is an INCREDIBLY COMMON problem when using pointers, in any language. You WILL make this mistake at some point over the course of the semester, so be careful!

Pointer Typecasting

Assembly doesn't have types, so you can access BYTE [rax] and then QWORD [rax] without any other comment. This makes it easy to accidentally access the wrong amount of data, usually leading to a weird crash.

In C or C++, pointers have types, which are designed to keep you from accidentally accessing the wrong data type. When doing the low-level weirdness common in this class, we often want to switch pointer types, accessing memory that normally stores one type as a different type.

The syntax is just a typecast, (new_type)old_pointer:

int v1=0x44332211; // an int in memory
int *p1=&v1; // p1 points to v1
short *p2; 
p2=(short *)p1; // typecast!
short v2=*p2; // dereference pointer
return v2;

(Try this in NetRun now!)

This returns 0x2211, which is the first 2 bytes of v1: x86 is little-endian, so the low-order bytes 0x11 and 0x22 are stored first in memory.

You can fold all of this together into one confusing line:

int v1=0x44332211; // an int in memory
return *(short *)&v1; // dereference short-casted pointer to v1

(Try this in NetRun now!)

This line is a little more like assembly, where we would just say "mov ax, WORD [v1]".
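One caution: both casts above read an int through a short pointer, which the C and C++ standards do not promise will work (the "strict aliasing" rule), even though it usually behaves as described on x86. A more portable sketch of the same idea copies the bytes with memcpy instead. The function names here are illustrative, and the fixed-width unsigned types are a substitution for int and short so the sizes are unambiguous.

```cpp
#include <cstdint>
#include <cstring>

// Read the two lowest-addressed bytes of a 32-bit value as a 16-bit value.
std::uint16_t first_two_bytes(std::uint32_t v1) {
    std::uint16_t v2;
    std::memcpy(&v2, &v1, sizeof(v2)); // copy 2 bytes starting at &v1
    return v2;
}

// True on little-endian machines (such as x86), where the low byte comes first.
bool is_little_endian() {
    std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);    // look at the lowest-addressed byte
    return first == 1;
}
```

On a little-endian machine, first_two_bytes(0x44332211) gives 0x2211; a big-endian machine would give 0x4433 instead, because there the high-order bytes are stored first.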

You can typecast any pointer into any other pointer. In particular, it's often useful to typecast a pointer to a complex object into a "char *", and access the object's sizeof(object) bytes one byte at a time. This is commonly used for network communication, disk I/O, and stranger things like checkpointing objects.
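As a small sketch of the byte-at-a-time idea, the code below formats any object's bytes as hex text. The Point struct and the hex_bytes name are invented for this example, and it uses "unsigned char *" rather than plain "char *" so bytes of 0x80 and above do not come out negative.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical example object
struct Point { int x; int y; };

// Format any object's bytes as hex digits, lowest address first.
template <typename T>
std::string hex_bytes(const T &obj) {
    const unsigned char *bytes = (const unsigned char *)&obj; // typecast!
    std::string out;
    char buf[3]; // two hex digits plus the terminating zero
    for (std::size_t i = 0; i < sizeof(T); i++) {
        std::snprintf(buf, sizeof(buf), "%02x", (unsigned)bytes[i]);
        out += buf;
    }
    return out;
}
```

For example, with unsigned char arr[3]={0x01,0xab,0xff}; hex_bytes(arr) returns "01abff". For a Point, the output is 2*sizeof(Point) hex digits long, and the digit order within each int depends on the machine's byte order.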

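It's also possible to typecast bytes in memory into a function pointer, and call those bytes as machine code. For example, on x86 the byte 0xc3 is the single instruction "ret". You can call it as if it were a function like this (assuming the system lets you execute code stored in data memory, which many modern systems block by default):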

const static unsigned char code=0xc3; // one byte of machine code: "ret"
typedef long (*funptr)(void); // function pointer typedef
funptr f=(funptr)&code; // convert to callable code
return f(); // call the code

(Try this in NetRun now!)

Here's the equivalent in assembly language.

call code
ret

code:
	db 0xc3

(Try this in NetRun now!)

CS 301 Lecture Note, 2014, Dr. Orion Lawlor, UAF Computer Science Department.