Pointers to Ints, Arrays, and Functions

CS 301 Lecture, Dr. Lawlor, 2005/10/17

So here's a C subroutine that increments the value p points to:
void bar(int *p) {
  (*p)++;
}
int foo(void) {
  int i=100;
  bar(&i);
  return i;
}

The equivalent subroutine in assembly might look like this:
bar:
  mov 4(%esp),%eax    # Load p (the pointer parameter) from the stack into eax
  incl (%eax)    # Increment the thing eax POINTS to (eax itself is unchanged!)
  ret
Note that the parentheses are critical!  Just plain "incl %eax" increments the register; adding parentheses treats %eax as a pointer, and increments the memory it points to.

Almost all instructions in x86 can mess with memory directly; so:
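Messes with memory (the thing eax *points* to)    Messes with eax itself (and not memory)
incl (%eax)                                       incl %eax
movl $7, (%eax)                                   mov $7, %eax
addl $15, (%eax)                                  add $15, %eax

(The memory forms carry an explicit "l" size suffix, like "incl", because with no register operand the assembler can't infer how many bytes to touch.)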

That is, whether %eax is treated like a pointer or a variable depends on how you write the operand: with parentheses it's a pointer, without them it's a plain value!  If this seems confusing, keep in mind that C also lets you manipulate a pointer and the thing it points to independently:
Messes with memory (the thing p *points* to)    Messes with p itself (and not what it points to)
(*p)++;                                         p++;
*p=7;                                           p=(int *)7;
*p+=15;                                         p+=15;
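
To make the difference concrete, here's a minimal sketch of the first row of that table.  The two function names and the little array are made up for illustration; they're not part of the lecture's examples.

```c
/* Hypothetical helper: changes the int that p points to. */
int bump_pointee(void) {
  int a[2] = {100, 200};
  int *p = &a[0];
  (*p)++;      /* changes memory: a[0] is now 101; p still points at a[0] */
  return a[0];
}

/* Hypothetical helper: changes p itself. */
int bump_pointer(void) {
  int a[2] = {100, 200};
  int *p = &a[0];
  p++;         /* changes p: it now points at a[1]; the array is untouched */
  return *p;
}
```

The first function hands back the modified a[0]; the second hands back a[1], because only the pointer moved.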

Stupid Pointer Tricks

On a 32-bit machine, all pointers are 4 bytes: pointers to ints, pointers to chars, pointers to doubles, and even pointers to functions.  You can get a "generic" pointer type with "void *": this tells the compiler not to worry about what it points to.  A pointer to any data type (but not a pointer to a function) can be silently & implicitly converted to a "void *".
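
You can check the "all pointers are the same size" claim with sizeof.  This is just a sketch, and the function name is made up; it assumes an ordinary flat-memory machine like x86, since the C standard itself doesn't promise that every pointer type has the same size.

```c
/* Hypothetical check: returns 1 if data and function pointers are all the same size. */
int all_pointers_same_size(void) {
  int i = 0;
  void *v = &i;   /* an int * converts to void * silently, no typecast needed */
  (void)v;        /* (just silences the unused-variable warning) */
  return sizeof(int *) == sizeof(char *)
      && sizeof(char *) == sizeof(double *)
      && sizeof(double *) == sizeof(void *)
      && sizeof(void *) == sizeof(int (*)(void));
}
```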

You can explicitly convert between any two different pointer types with a typecast:
int i;
int *p=&i; /* p points to i */
char *c=(char *)p; /* c has the same value as p; so it points to the first *char* of i */
*c=1; /* Sets the first *byte* of i to 1 */
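
Here's that same trick as a complete function, so you can see what happens to the int.  The function name is made up for this sketch, and the result depends on byte order: it assumes nothing beyond what's stated in the comments.

```c
/* Hypothetical demo: write one byte of an int through a char pointer. */
int poke_first_byte(void) {
  int i = 0;
  char *c = (char *)&i; /* c points to the first byte of i */
  *c = 1;               /* only that one byte changes */
  return i;             /* little-endian (x86): the low byte, so i is 1;
                           big-endian: the high byte, so i is some large value */
}
```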
Typecasts are what make "void *" useful.  You can't actually do anything (in C) with a "void *" directly, because the compiler doesn't know what it points to.  But you can typecast it to some real pointer type and then do some work on it, like this:
void increment_generic(char whut,void *p) {
  switch(whut) {
  case 'i': (*(int *)p)++; break; /* Increments the int pointed to by p */
  case 'c': (*(char *)p)++; break; /* Increments the char pointed to by p */
  case 'd': (*(double *)p)++; break; /* Increments the double pointed to by p */
  }
}
int foo(void) {
  int i=100;
  increment_generic('i',&i);
  return i;
}
You tend to have to do this sort of thing for network I/O, where you want to accept input data of any type, and process it the right way using (mostly) the same code.

In assembly, there's no difference between a "void *", "int *", "char *", or even a plain "long".  (On a 32-bit machine, even "int" and pointers are interchangeable.  On a 64-bit machine, an "int" might still be 4 bytes but a pointer 8 bytes; a "long" is usually the same size as a pointer, although 64-bit Windows keeps "long" at 4 bytes.)  This means you can (fairly) safely typecast pointers to longs, and back again:
int i=3;
int *p=&i; /* p contains the address of i */
long j=(long)p; /* j also contains the address of i */
return *(int *)j; /* treat j as an int pointer */
In assembly, there's no difference between p and j!  They can both be stored in a register, copied to and from memory, etc.  So if you were sufficiently silly, you could just forgo pointers and declare all your variables and arguments as "long", and then typecast to pointers right before you use them.  The problem with this?  The same problem as assembly--*you* have to remember what's an int and what's a pointer, and screwing up either causes bizarre problems at runtime (bogus values, segfaults, etc).  If you use the type system correctly, the *compiler* keeps track of what is and isn't a pointer, and gives you decent errors when you screw them up.
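
If you really do need to stash a pointer in an integer, C99's <stdint.h> provides "intptr_t", an integer type meant to be big enough to hold a pointer, so you don't have to guess whether "long" is wide enough.  Here's a minimal sketch of the same round trip using it; the function name is made up.

```c
#include <stdint.h>

/* Hypothetical demo: pointer -> integer -> pointer, using intptr_t instead of long. */
int round_trip(void) {
  int i = 3;
  int *p = &i;               /* p contains the address of i */
  intptr_t j = (intptr_t)p;  /* j holds the same address, as a plain integer */
  int *q = (int *)j;         /* cast back to a pointer before using it */
  return *q;                 /* reads i through the round-tripped pointer */
}
```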

Function Pointers

You can have a pointer to anything in memory: an int, a char, an array of ints or chars, or even a subroutine.  What's a pointer to a subroutine?  Just a pointer to the first byte of its first instruction!

The syntax for a function pointer is perfectly reasonable, but it's got one extra set of parentheses wrapped around it.  It starts with an ordinary function prototype:
int some_fn(int some_arg);
Now you add a "typedef" to make "some_fn" a new type (not a function prototype):
typedef int some_fn(int some_arg);
Now add the silly parentheses:
typedef int (some_fn)(int some_arg);
This is the function itself; you want a pointer to this function, so you add an asterisk like this:
typedef int (*some_fn)(int some_arg);
There!  You can now declare variables as having type "some_fn", pass functions as arguments to subroutines, keep arrays of functions, etc:
typedef int (*some_fn)(int some_arg);
int add_three(int i) { return 3+i; }

int call_fn(some_fn f) { return f(100); }
int foo(void) {
  some_fn f=add_three; /* Make a new function pointer "f", pointing to function "add_three" */
  return call_fn(f);
}
Note that there *isn't* a function called "f" anywhere--"f" is the pointer to a function.
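
Arrays of function pointers work the same way.  Here's a small sketch that reuses the "some_fn" typedef and "add_three" from above; "times_two" and "apply_all" are made-up names added just for this illustration.

```c
typedef int (*some_fn)(int some_arg);
static int add_three(int i) { return 3+i; }
static int times_two(int i) { return 2*i; } /* hypothetical second function */

/* Hypothetical helper: runs x through every function in the table, in order. */
int apply_all(int x) {
  some_fn table[] = { add_three, times_two }; /* an array of function pointers */
  int n = sizeof(table)/sizeof(table[0]);
  int k;
  for (k=0; k<n; k++)
    x = table[k](x); /* call whichever function the k'th pointer points to */
  return x;
}
```

Calling apply_all(100) first adds three and then doubles the result.  Tables like this are a common way to pick a subroutine at runtime without a big switch statement.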

In assembly, if you've loaded up a pointer to a function into eax, you can just say "call *%eax" to jump to that function and start running there. Arguments go on the stack, and the return value goes in eax as usual.  So the equivalent of the C "call_fn" above is:
/* This assembly subroutine calls this function pointer with argument 100 */
extern "C" int call_fn(some_fn f);
__asm__(
"call_fn:\n"
" mov 4(%esp),%eax\n"
" push $100\n" /* Subroutine argument */
" call *%eax\n" /* Call subroutine */
" add $4, %esp\n" /* Pop off argument */
" ret\n"
);
Here's a horrifying example, where we treat a pointer to an *integer* as a pointer to a subroutine, and "execute" the integer!  Luckily, we've given the integer a special value that can be interpreted as an x86 machine language program.  (This only works where data memory is executable; a system that marks the stack non-executable will crash here instead.)
int foo(void) {
  /* This integer's bytes contain a tiny handcrafted subroutine:
       0x31 0xc0 == xor %eax, %eax (xor %eax by itself, giving zero)
       0xc3 == ret (return from subroutine)
     Recall that on x86 (a little-endian machine), the 0x31 comes *first* in memory.
  */
  int i=0xc3c031;
  typedef int (*my_fn)(void);
  my_fn f=(my_fn)&i; /* f points to i, dressed up as a subroutine */
  return f(); /* Call the function pointed to by f: the integer i! */
}