Pointers to Ints, Arrays, and Functions
CS 301 Lecture, Dr. Lawlor, 2005/10/17
So here's a C subroutine that increments the int that p points to:
void bar(int *p) {
    (*p)++;
}
int foo(void) {
    int i=100;
    bar(&i);
    return i;
}
The equivalent subroutine in assembly might look like this:
bar:
mov 4(%esp),%eax # Load p (the pointer parameter) from the stack into eax
incl (%eax) # Increment the thing eax POINTS to (eax itself is unchanged!)
ret
Note that the parentheses are critical! Just plain "incl %eax"
increments the register; adding parentheses treats %eax as a pointer,
and increments the memory it points to.
Almost all instructions in x86 can mess with memory directly; so:
Messes with memory (the thing eax *points* to) | Messes with eax itself (and not memory)
incl (%eax)                                    | incl %eax
mov $7, (%eax)                                 | mov $7, %eax
add $15, (%eax)                                | add $15, %eax
That is, whether %eax is treated like a pointer or a plain value depends
on how you write the operand: with or without parentheses! If this
seems confusing, keep in mind that C also lets you manipulate a pointer
and the thing it points to independently:
Messes with memory (the thing p *points* to) | Messes with p itself (and not what it points to)
(*p)++;                                      | p++;
*p=7;                                        | p=(int *)7;
*p+=15;                                      | p+=15;
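As a rough illustration of the difference between changing the pointer and changing what it points to, here is a minimal sketch. The function name demo_pointee_vs_pointer and the two-element array are made up for this example.

```c
/* Hypothetical demo: change the pointee, then the pointer, then the pointee again. */
int demo_pointee_vs_pointer(void) {
    int arr[2] = {10, 20};
    int *p = arr;   /* p points to arr[0] */
    (*p)++;         /* arr[0] becomes 11; p itself is unchanged */
    p++;            /* p now points to arr[1]; memory is unchanged */
    *p += 15;       /* arr[1] becomes 35 */
    return arr[0] + arr[1];   /* 11 + 35 */
}
```

Calling demo_pointee_vs_pointer() should give 46, since only the two dereferencing statements touch the array.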
Stupid Pointer Tricks
On a 32-bit machine, all pointers are 4 bytes: pointers to ints,
pointers to chars, pointers to doubles, and even pointers to
functions. You can get a "generic" pointer type with "void *": this
tells the compiler not to worry about what it points to. A pointer to
any data type (but not a pointer to a function) can be silently &
implicitly converted to a "void *".
You can explicitly convert between any two different pointer types with a typecast:
int i;
int *p=&i; /* p points to i */
char *c=(char *)p; /* c has the same value as p; so it points to the first *char* of i */
*c=1; /* Sets the first *byte* of i to 1 */
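To see what "the first byte of i" means in practice, here is a small sketch that starts i at zero so the result is predictable. The function name set_first_byte is hypothetical, and the values in the comment assume a 4-byte int.

```c
/* Hypothetical demo: write 1 into the first byte (lowest address) of an int.
   On a little-endian machine such as x86 the first byte is the low-order
   byte, so the int becomes 1. On a big-endian machine with 4-byte ints
   it would become 0x01000000 instead. */
int set_first_byte(void) {
    int i = 0;
    char *c = (char *)&i;  /* c points to the first byte of i */
    *c = 1;                /* change only that one byte */
    return i;
}
```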
You can't actually dereference a "void *" in C, because the compiler
doesn't know what it points to. But you can typecast it to some real
pointer type and then do some work on it, like this:
void increment_generic(char whut,void *p) {
    switch(whut) {
    case 'i': (*(int *)p)++; break; /* Increments the int pointed to by p */
    case 'c': (*(char *)p)++; break; /* Increments the char pointed to by p */
    case 'd': (*(double *)p)++; break; /* Increments the double pointed to by p */
    }
}
int foo(void) {
    int i=100;
    increment_generic('i',&i);
    return i;
}
You tend to have to do this sort of thing for network I/O, where you
want to accept input data of any type, and process it the right way
using (mostly) the same code.
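As a sketch of that idea, the code below copies values of different types into one byte buffer and back out, using a single pair of routines that take a "void *". The names pack_value, unpack_value, and pack_roundtrip are invented for this example, and a real network protocol would also have to deal with byte order and sizes, which this ignores.

```c
#include <string.h>

/* Hypothetical helpers: copy "size" bytes of any value into or out of a buffer. */
void pack_value(unsigned char *buf, const void *value, size_t size) {
    memcpy(buf, value, size);
}
void unpack_value(void *value, const unsigned char *buf, size_t size) {
    memcpy(value, buf, size);
}

int pack_roundtrip(void) {
    unsigned char buf[32];
    int i = 7, i_out = 0;
    double d = 2.5, d_out = 0.0;
    pack_value(buf, &i, sizeof(i));              /* &i converts to void * implicitly */
    pack_value(buf + sizeof(i), &d, sizeof(d));  /* same code handles a double */
    unpack_value(&i_out, buf, sizeof(i_out));
    unpack_value(&d_out, buf + sizeof(i), sizeof(d_out));
    return i_out + (int)(d_out * 2);             /* 7 + 5 */
}
```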
In assembly, there's no difference between a "void *", "int *",
"char *", or even a plain "long". (On a 32-bit machine, even "int" and
pointers are interchangeable. On a 64-bit machine, an "int" might still
be 4 bytes but a pointer 8 bytes; but a "long" is usually the same size
as a pointer.) This means you can (fairly) safely typecast pointers to
longs, and back again:
int i=3;
int *p=&i; /* p contains the address of i */
long j=(long)p; /* j also contains the address of i */
return *(int *)j; /* treat j as an int pointer */
In assembly, there's no difference between p and j! They can both be
stored in a register, copied to and from memory, etc. So if you were
sufficiently silly, you could just forgo pointers and declare all your
variables and arguments as "long", and then typecast to pointers right
before you use them. The problem with this? The same problem as
assembly--*you* have to remember what's an int and what's a pointer,
and screwing up either causes bizarre problems at runtime (bogus
values, segfaults, etc). If you use the type system correctly, the
*compiler* keeps track of what is and isn't a pointer, and gives you
decent errors when you screw them up.
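If you do need to stash a pointer in an integer, one option not used in these notes is the C99 type intptr_t from <stdint.h>, an integer type intended to be wide enough to hold a pointer where the platform provides it. A minimal sketch, with the hypothetical name roundtrip_demo:

```c
#include <stdint.h>

/* Hypothetical demo: pointer -> integer -> pointer, then use the result. */
int roundtrip_demo(void) {
    int i = 3;
    int *p = &i;                /* p contains the address of i */
    intptr_t j = (intptr_t)p;   /* j holds the same address, as an integer */
    int *q = (int *)j;          /* back to a pointer; q points to i again */
    *q += 4;                    /* modifies i through the round-tripped pointer */
    return i;                   /* 3 + 4 */
}
```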
Function Pointers
You can have a pointer to anything in memory: an int, a char, an array
of ints or chars, or even a subroutine. What's a pointer to a
subroutine? Just a pointer to the first byte of its first
instruction!
The syntax for a function pointer is perfectly reasonable, but it's got
one extra set of parentheses wrapped around it. It starts with an
ordinary function prototype:
int some_fn(int some_arg);
Now you add a "typedef" to make "some_fn" a new type (not a function prototype):
typedef int some_fn(int some_arg);
Now add the silly parentheses:
typedef int (some_fn)(int some_arg);
This is the function itself; you want a pointer to this function, so you add an asterisk like this:
typedef int (*some_fn)(int some_arg);
There! You can now declare variables as having type "some_fn",
pass functions as arguments to subroutines, keep arrays of functions,
etc:
typedef int (*some_fn)(int some_arg);
int add_three(int i) { return 3+i; }
int call_fn(some_fn f) { return f(100); }
int foo(void) {
    some_fn f=add_three; /* Make a new function pointer "f", pointing to function "add_three" */
    return call_fn(f);
}
Note that there *isn't* a function called "f" anywhere--"f" is the pointer to a function.
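Keeping an array of functions works the same way. The sketch below reuses the some_fn typedef; the extra functions times_two and negate, and the driver apply_all, are made up for illustration.

```c
typedef int (*some_fn)(int some_arg);

static int add_three(int i) { return 3 + i; }
static int times_two(int i) { return 2 * i; }
static int negate(int i)    { return -i; }

/* Hypothetical driver: run each function in the table on the previous result. */
int apply_all(int start) {
    some_fn table[3] = { add_three, times_two, negate };
    int value = start;
    for (int k = 0; k < 3; k++)
        value = table[k](value);   /* call through the k'th function pointer */
    return value;
}
```

With a starting value of 100 this computes 103, then 206, then -206.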
In assembly, if you've loaded up a pointer to a function into eax, you
can just say "call *%eax" to jump to that function and start running
there. Arguments go on the stack, and the return value goes in eax as
usual. So the equivalent of the C "call_fn" above is:
/* This assembly subroutine calls this function pointer with argument 100 */
extern "C" int call_fn(some_fn f);
__asm__(
"call_fn:\n"
" mov 4(%esp),%eax\n"
" push $100\n" /* Subroutine argument */
" call *%eax\n" /* Call subroutine */
" add $4, %esp\n" /* Pop off argument */
" ret\n"
);
Here's a horrifying example, where we treat a pointer to an *integer*
as a pointer to a subroutine, and "execute" the integer! Luckily,
we've given the integer a special value that can be interpreted as an
x86 machine language program:
int foo(void) {
    /* This integer's bytes contain a tiny handcrafted subroutine:
         0x31 0xc0 == xor %eax, %eax   (xor %eax by itself, giving zero)
         0xc3      == ret              (return from subroutine)
       Recall that on x86 (a little-endian machine), the 0x31 comes *first* in memory.
    */
    int i=0xc3c031;
    typedef int (*my_fn)(void);
    my_fn f=(my_fn)&i; /* f points to i, dressed up as a subroutine */
    return f(); /* Call the function pointed to by f: the integer i! */
}
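To check the byte order without executing anything, here is a small sketch that reads the integer's bytes back one at a time. The helper name code_byte is hypothetical; on a little-endian machine such as x86 the bytes at offsets 0, 1, and 2 should come out as 0x31, 0xc0, and 0xc3, which is the machine code for "xor %eax,%eax" followed by "ret".

```c
/* Hypothetical helper: return byte number k (0 = lowest address)
   of the integer 0xc3c031 as it is laid out in memory. */
int code_byte(int k) {
    int i = 0xc3c031;
    unsigned char *bytes = (unsigned char *)&i;  /* view i as raw bytes */
    return bytes[k];
}
```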