In-memory Ints, Arrays, and the Stack

To Review

Bytes are 8-bit values. You usually write a byte's value as two hex digits (8 bits). In C/C++, the "char" or "unsigned char" datatype is usually used to hold bytes.

Your computer's memory is just a bunch of bytes.

A pointer is just an integer index into the bytes of memory. You can print out a pointer, and it looks like an ordinary integer (32 bits on a 32-bit machine, 64 bits on a 64-bit machine). Here's how:

const char *p="mmrrrow?"; // string constants are read-only, so the pointer should be const
std::cout<<"The pointer value is "<<(void *)p<<"\n";
return 0;

(executable NetRun link)

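(A "void *" is a pointer to... something. You don't have to tell the compiler what! The cast is needed here because printing a "char *" directly would print the string itself rather than its address.) This prints out: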

The pointer value is 0x8049d9e
Program complete.  Return 0 (0x0)

You can do arithmetic on a pointer to change what it points to ("pointer arithmetic"). In assembly or hardcore C/C++, pointer arithmetic is common. In high-level C++, pointer arithmetic is quite rare (although iterators are made to look and act just like pointers!). In Java, Perl, or JavaScript, pointer arithmetic is not possible, and C# allows it only in "unsafe" code.

A C string is just a sequence of bytes in memory, each representing one ASCII character. After the characters of the string, there's a zero byte (the "NUL terminator", which sounds pretty scary!) to indicate the end of the string.

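#include <stdio.h>
void print_bytes(const unsigned char *array,int nBytes);
int foo(void) {
	const char *p="mmrrrow?";
	print_bytes((const unsigned char *)p,9); /* 8 characters plus the zero byte */
	return 0;
}

/* Print nBytes bytes of memory, starting at array, in hex */
void print_bytes(const unsigned char *array,int nBytes) {
	int i;  printf("Bytes at %p:",(const void *)array);
	for (i=0;i<nBytes;i++)
		printf(" 0x%02x",(int)array[i]);
	printf("\n");
}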

(executable NetRun link)

A function is just a set of bytes in memory that represent machine code instructions. The CPU can "execute" these instructions if it jumps to those bytes. You can print the machine code bytes of a function just like those of a string:

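#include <stdio.h>
void print_bytes(const unsigned char *array,int nBytes);
int foo(void) {
	/* Treat foo's own machine code as plain bytes.  This cast isn't strictly
	   portable, but it works on ordinary x86 compilers. */
	print_bytes((const unsigned char *)foo,20);
	return 0;
}

/* Print nBytes bytes of memory, starting at array, in hex */
void print_bytes(const unsigned char *array,int nBytes) {
	int i;  printf("Bytes at %p:",(const void *)array);
	for (i=0;i<nBytes;i++)
		printf(" 0x%02x",(int)array[i]);
	printf("\n");
}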

(executable NetRun link)

Arrays

An array is just a sequence of elements stored back-to-back in memory. In C/C++, an array is usually handled via a pointer to its first element. Subsequent elements of the array are located higher and higher in memory (at higher and higher byte addresses).

So, in a char array, to access array element i, you just need to add i bytes to the starting address for the array, and access memory there. In C/C++, this add-and-access for a "char *p" looks like "*(p+i)". Note that this means "p[i] == *(p+i)". (And, as an aside, "p[i]==*(p+i)==*(i+p)==i[p]" !).

Note that this means "*p" (dereference the pointer p) and "p[0]" (access the first element of the array p) are totally equivalent!

const char *p="ZLQ!?"; // string constants are read-only, so the pointer should be const
std::cout<<" element zero is "<<p[0]<<"\n";
std::cout<<" dereferenced pointer is "<<*p<<"\n";
std::cout<<" element one is "<<p[1]<<"\n";
std::cout<<" dereferenced pointer-plus-one is "<<*(p+1)<<"\n";
std::cout<<" element one (backwards indexing) is "<<1[p]<<"\n";

(executable NetRun link)

So consider "char *p". You can think of this in any of several ways:

This could be a pointer to a single character, like "char c='?'; p=&c;". You can look at the character with "*p".
This could be a pointer to an array of characters, in which case "p[3]" would return you the character at index 3, or equivalently "*(p+3)".
This could be a pointer to the start of a string, which is just an array of characters ending in a 0 byte, like p="Yo!";
This could be a pointer to some memory that doesn't represent printable characters at all, but bytes. "unsigned char" is a bit more common for this, but plain "char" is also one byte long. Remember, *everything* is bytes! The "print_bytes" functions used above treat everything as a character array.

Integers

On most current machines, an "int" is just 4 bytes, representing the 32 bits of the integer. Unfortunately, on x86 the bytes in an integer *aren't* in the order you might expect--the low-order byte comes first:

#include <iostream>

// Print nBytes bytes of memory, starting at array, in hex
void print_bytes(const unsigned char *array,int nBytes) {
	std::cout<<std::hex<<"Bytes at "<<(const void *)array<<":";
	for (int i=0;i<nBytes;i++)
		std::cout<<" 0x"<<(int)array[i];
	std::cout<<"\n";
}

int foo(void) {
	int x=0xA0B1C2D3;
	print_bytes((unsigned char *)&x,sizeof(x));
	
	return 0;
}

(executable NetRun link)

On an ordinary x86 CPU, this code prints:

Bytes at 0xbfd96124: 0xd3 0xc2 0xb1 0xa0

That's because x86 is a *little* endian machine: a pointer to an integer points to its *little* (least significant) byte.

Yet on a PowerPC or MIPS CPU, this same code prints:

Bytes at 0x7ffffae8: 0xa0 0xb1 0xc2 0xd3

That's because PowerPC and MIPS are traditionally run as *big* endian machines: a pointer to an integer points to its *big* (most significant) byte.

In *both* cases, subsequent bytes in an integer are higher in memory (at bigger and bigger addresses) than the first byte.

Arrays of Integers

An integer array is pretty simple. Your pointer points to the first byte of the first int. Subsequent bytes give you the rest of the first int, then you get the first byte of the next int, and so on:

#include <iostream>

// Print nBytes bytes of memory, starting at array, in hex
void print_bytes(const unsigned char *array,int nBytes) {
	std::cout<<std::hex<<"Bytes at "<<(const void *)array<<":";
	for (int i=0;i<nBytes;i++)
		std::cout<<" 0x"<<(int)array[i];
	std::cout<<"\n";
}

int foo(void) {
	unsigned int x[]={ // unsigned: these constants don't fit in a signed 32-bit int
		0xA0B1C2D3,
		0xE0E1E2E3,
		0xF0F1F2F3
	};
	print_bytes((unsigned char *)&x,sizeof(x));
	
	return 0;
}

(executable NetRun link)

So the bytes in our 3-int array (on our little-endian machine) are:

0xd3 0xc2 0xb1 0xa0 0xe3 0xe2 0xe1 0xe0 0xf3 0xf2 0xf1 0xf0

So overall, we've seen that strings, arrays, and integers all "grow up", starting at an address and continuing to higher and higher addresses.

The Stack

In fact, the only data structure on most computers that "grows down" (moving to lower and lower addresses) is the stack.

This is unfortunate, but the stack on nearly every common machine grows down.

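On 32-bit x86, the register esp (the "stack pointer") holds the address of the top of the stack, and pushing a value moves esp to a lower address. So you can fake a "push eax" with a "sub esp,4" followed by a "mov [esp],eax" (the brackets make the assembler access the memory esp *points* to). You can "pop" an integer off the stack by just "add esp,4", possibly with a "mov eax,[esp]" beforehand to load the value.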

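Curiously, because function arguments are usually pushed right-to-left (this is the traditional 32-bit x86 calling convention), the rightmost (last) function argument gets pushed onto the stack first, so it gets the highest address. The leftmost (first) argument gets pushed last, so it gets the lowest address. This means your function arguments are laid out in memory exactly like a little array! (64-bit x86 code passes the first few arguments in registers instead, so the example below only behaves this way in 32-bit code.)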

#include <iostream>

// Print nBytes bytes of memory, starting at array, in hex
void print_bytes(const unsigned char *array,int nBytes) {
	std::cout<<std::hex<<"Bytes at "<<(const void *)array<<":";
	for (int i=0;i<nBytes;i++)
		std::cout<<" 0x"<<(int)array[i];
	std::cout<<"\n";
}
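void bar(int a,int b,int c) {
	// Read 12 bytes starting at a: on 32-bit x86, b and c sit right after it on the stack.
	// (This peeks past the end of a, so it is not portable C++.)
	print_bytes((unsigned char *)&a,12);
}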

int foo(void) {
	bar(0xA0B1C2D3, 0xE0E1E2E3, 0xF0F1F2F3);
	return 0;
}

(executable NetRun link)

On a 32-bit x86 machine, this prints out the bytes

0xd3 0xc2 0xb1 0xa0 0xe3 0xe2 0xe1 0xe0 0xf3 0xf2 0xf1 0xf0

which is exactly like our array example above.