Pointers in C and Assembly

CS 301 Lecture, Dr. Lawlor

Here's some code to access an array in C++:
    int *arr=new int[10];
    for (int i=0;i<10;i++)
        arr[i]=(i*10);
    return arr[7];
(executable NetRun link)

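There are a bunch of weird symbols in C and C++ used to work with pointers: "int *p" declares p as a pointer to an int, "&x" takes the address of x, "*p" dereferences p (accesses the int it points to), and "p[i]" accesses the i'th int past p.
Note that the last line of the code above, "return arr[7];", can be written in other ways that are all equivalent, such as "return *(arr+7);".
It's a curious fact, but in most expressions an array name is converted to a pointer to its first element, so arrays are usually indistinguishable from pointers in C or C++.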

Pointers in plain old C

C can be thought of as just a portable assembler--most simple C statements correspond closely to one or a few lines of assembly.

So you can actually learn a lot about how to access memory using pointers by writing some low-level, old school C code.

Here's what you'd write in plain C to do the array arithmetic above.  Be careful!  The "malloc" routine takes as input the number of *bytes* to allocate, not the number of *ints*.  We're using the "sizeof" operator to return the number of bytes in an int.
    int i;
    int *arr=malloc(10*sizeof(int));
    for (i=0;i<10;i++)
        arr[i]=(i*10);
    return arr[7];
(executable NetRun link)

In assembly, you can store a pointer in any register, such as eax, and do pointer arithmetic using the normal arithmetic instructions.  So
    add eax,28
might be a regular arithmetic operation (if you're thinking of eax as a normal value), or it might be pointer arithmetic (if you're thinking of eax as a pointer).  You can't tell without seeing where eax was loaded from, and how it's used--there is no type information in assembly!

To dereference a pointer in assembly, you write it in brackets, like "[eax]".  This treats eax as a pointer, and accesses the memory it points to. 

To summarize,
    C/C++            Assembly
    int *p;          ; Not needed--no types!  (Woo hoo!)  Er, now be careful, kids...
    p=malloc(40);    push 40 ; Function arguments go on the stack in 32-bit x86
                     extern malloc
                     call malloc
                     add esp,4 ; Undo "push"
                     ; Malloc's return value, a pointer, comes back in eax
    p++;             add eax,4 ; Subtle: advance 1 int, by advancing 4 bytes
    int i=*p;        mov ecx, [eax] ; Treat eax as a pointer, and copy out the value it points to

Because ecx is 32 bits, the "mov" above is a 32-bit move--it reads 4 bytes from memory.

But because assembly doesn't have type information, sometimes the assembler can't figure out what you mean by a line, like "mov [eax],3".  Clearly, this sets something to 3.  But do you want to have eax pointing to a single byte (or char), a short, a long, or what?  This will manifest itself as the fairly self-explanatory "error: operation size not specified" in NASM, but sadly you get the confusing "invalid combination of opcode and operands" in YASM.

The solution to this is just to tell the assembler what size data you're trying to point to.  The sizes available in NASM/YASM are: