In-memory Ints, Arrays, and the Stack
CS 301 Lecture, Dr. Lawlor
To Review
Bytes are 8-bit values. You usually write a byte's value as two
hex digits (8 bits). In C/C++, the "char" or "unsigned char"
datatype is usually used to hold bytes.
Your computer's memory is just a bunch of bytes.
A pointer is just an integer index into the bytes of memory. You
can print out a pointer, and it looks like an ordinary integer
(32 bits on a 32-bit machine, 64 bits on a 64-bit machine). Here's how:
const char *p="mmrrrow?";
std::cout<<"The pointer value is "<<(void *)p<<"\n";
return 0;
(executable NetRun link)
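(A "void *" is a pointer to memory of unspecified type: you don't have to tell the compiler what it points to. The cast is needed here because std::cout would otherwise print a "char *" as a string.) This prints out: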
The pointer value is 0x8049d9e
Program complete. Return 0 (0x0)
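To make the "pointer is just an integer" idea concrete, here is a minimal sketch that copies a pointer into an integer type. It is not from the original lecture: the helper name pointer_as_integer is hypothetical, and it assumes the compiler provides std::uintptr_t (nearly all do) and that memory is one flat address space. As in the other examples, foo is the NetRun-style entry point.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical helper: reinterpret a pointer's value as a plain unsigned integer.
std::uintptr_t pointer_as_integer(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

int foo(void) {
    const char *p = "mmrrrow?";
    std::cout << "Pointer is " << sizeof(p) << " bytes wide\n";
    std::cout << "As an integer: 0x" << std::hex << pointer_as_integer(p) << "\n";
    // On ordinary flat-memory machines, moving a char pointer ahead by one
    // byte makes the integer value go up by exactly one.
    std::cout << "Next byte:     0x" << pointer_as_integer(p + 1) << "\n";
    return 0;
}
```

The exact addresses change from machine to machine and often from run to run, so only the width and the difference of one between the two values are predictable.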
You can do arithmetic on a pointer, and change what it points to
("pointer arithmetic"). In assembly or hardcore C/C++, pointer
arithmetic is common. In high-level C++, pointer arithmetic is
quite rare (although iterators are made to look and act just like
pointers!). In Java, Perl, or Javascript, pointer arithmetic
is not possible; C# allows it only inside "unsafe" code.
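A small sketch of what pointer arithmetic looks like in practice: walking a range of bytes by moving the pointer itself. The helper count_byte is a hypothetical name made up for this illustration, not something from the lecture.

```cpp
#include <iostream>

// Hypothetical helper: count the bytes equal to 'target' between two pointers,
// stepping the pointer itself instead of using an index.
int count_byte(const char *begin, const char *end, char target) {
    int count = 0;
    for (const char *p = begin; p != end; p++) {  // p++ moves ahead one byte
        if (*p == target) count++;
    }
    return count;
}

int foo(void) {
    const char *s = "mmrrrow?";
    const char *end = s + 8;  // pointer plus integer: 8 bytes past the start
    std::cout << "Bytes in range: " << (end - s) << "\n";  // pointer minus pointer
    std::cout << "Count of r: " << count_byte(s, end, 'r') << "\n";
    return 0;
}
```

The begin/end pair is the same shape that C++ iterators use, which is why iterators feel so much like pointers.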
A C string is just a sequence of bytes in memory that each represent one ASCII
character. After the characters of the string, there's a zero byte
(the "NUL terminator", which sounds pretty scary!) to indicate the end
of the string. That's why the code below prints 9 bytes for an
8-character string:
void print_bytes(unsigned char *array,int nBytes);
int foo(void) {
const char *p="mmrrrow?";
print_bytes((unsigned char *)p,9);
return 0;
}
void print_bytes(unsigned char *array,int nBytes) {
int i; printf("Bytes at %p:",array);
for (i=0;i<nBytes;i++)
printf(" 0x%02x",(int)array[i]);
printf("\n");
}
(executable NetRun link)
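Since the zero byte is the only thing marking the end of a C string, finding the length means scanning for it. The sketch below does by hand roughly what the standard strlen does; my_strlen is a hypothetical name used only for illustration.

```cpp
#include <cstring>
#include <iostream>

// Hypothetical helper: count characters up to (not including) the zero byte.
int my_strlen(const char *s) {
    int n = 0;
    while (s[n] != 0) n++;
    return n;
}

int foo(void) {
    const char *p = "mmrrrow?";
    std::cout << "Characters before the zero byte: " << my_strlen(p) << "\n";
    std::cout << "Library strlen agrees:           " << std::strlen(p) << "\n";
    // sizeof on the literal counts the terminator too, hence one more byte.
    std::cout << "Bytes of storage, including it:  " << sizeof("mmrrrow?") << "\n";
    return 0;
}
```

The three numbers printed are 8, 8, and 9: eight characters plus the one terminating zero byte.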
A function is just a set of bytes in memory that represent machine code
instructions. The CPU can "execute" these instructions if it
jumps to those bytes. You can print the machine code bytes of a
function just like those of a string:
void print_bytes(unsigned char *array,int nBytes);
int foo(void) {
print_bytes((unsigned char *)foo,20);
return 0;
}
void print_bytes(unsigned char *array,int nBytes) {
int i; printf("Bytes at %p:",array);
for (i=0;i<nBytes;i++)
printf(" 0x%02x",(int)array[i]);
printf("\n");
}
(executable NetRun link)
Arrays
An array is just a set of bytes in memory. In C/C++,
an array is represented by a pointer to the first element of the
array. Subsequent elements of the array are located higher and higher in memory (at higher and higher byte addresses).
So, in a char array, to access array element i, you just add i
bytes to the starting address of the array, and access
memory there. In C/C++, this add-and-access for a
"char *p" is written "*(p+i)", which is exactly what "p[i]" means.
(And, as an aside, since addition is commutative, "p[i]==*(p+i)==*(i+p)==i[p]" !).
In particular, "*p" (dereference the pointer p) and "p[0]" (access the first element of the array p) are totally equivalent!
const char *p="ZLQ!?";
std::cout<<" element zero is "<<p[0]<<"\n";
std::cout<<" dereferenced pointer is "<<*p<<"\n";
std::cout<<" element one is "<<p[1]<<"\n";
std::cout<<" dereferenced pointer-plus-one is "<<*(p+1)<<"\n";
std::cout<<" element one (backwards indexing) is "<<1[p]<<"\n";
(executable NetRun link)
So consider "char *p". You can think of this in any of several ways:
- This could be a pointer to a single character, like "char c='?';
p=&c;". You can look at the character with "*p".
- This could be a pointer to an array of characters, in which case
"p[3]" would return you the character at index 3, or equivalently
"*(p+3)".
- This could be a pointer to the start of a string, which is just an array of characters ending in a 0 byte, like p="Yo!";
- This could be a pointer to some memory that doesn't
represent printable characters at all, but bytes. "unsigned char"
is a bit more common for this, but plain "char" is also one byte
long. Remember, *everything* is bytes! The "print_bytes"
functions used above treat everything as a character array.
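The four readings of "char *p" listed above can all go through the same code. In this sketch (byte_at is a hypothetical helper, not part of the lecture), one function reads a byte through each kind of pointer without knowing or caring which kind it was given.

```cpp
#include <iostream>

// Hypothetical helper: read the byte at p[i] as an unsigned value 0..255.
int byte_at(const char *p, int i) {
    return (unsigned char)*(p + i);
}

int foo(void) {
    char c = '?';
    const char *single = &c;              // pointer to one character
    char arr[4] = {'a', 'b', 'c', 'd'};   // array with no zero byte at the end
    const char *str = "Yo!";              // string: 'Y' 'o' '!' then a zero byte
    int n = 0x01020304;                   // not text at all...
    const char *raw = (const char *)&n;   // ...but still just bytes

    std::cout << byte_at(single, 0) << "\n";  // 63, the ASCII code of '?'
    std::cout << byte_at(arr, 3) << "\n";     // 100, the ASCII code of 'd'
    std::cout << byte_at(str, 3) << "\n";     // 0, the string's terminator
    std::cout << byte_at(raw, 0) << "\n";     // depends on the machine's byte order
    return 0;
}
```

Nothing in the pointer itself says which of these cases applies; the programmer has to keep track.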
Integers
On most current machines, an "int" is just 4 bytes, representing the
32 bits of the integer. Unfortunately, on x86 the bytes in an integer
*aren't* in the order you might expect--the low-order byte comes first:
void print_bytes(unsigned char *array,int nBytes) {
std::cout<<std::hex<<"Bytes at "<<(void *)array<<":";
for (int i=0;i<nBytes;i++)
std::cout<<" 0x"<<(int)array[i];
std::cout<<"\n";
}
int foo(void) {
int x=0xA0B1C2D3;
print_bytes((unsigned char *)&x,sizeof(x));
return 0;
}
(executable NetRun link)
On an ordinary x86 CPU, this code prints:
Bytes at 0xbfd96124: 0xd3 0xc2 0xb1 0xa0
That's because x86 is a *little* endian machine: a pointer to an int points at its *little* (low-order) byte.
Yet on a PowerPC or MIPS CPU, this same code prints:
Bytes at 0x7ffffae8: 0xa0 0xb1 0xc2 0xd3
That's because PowerPC and MIPS are both traditionally *big* endian machines: a pointer to an int points at its *big* (high-order) byte.
In *both* cases, subsequent bytes in an integer are higher in memory (at bigger and bigger addresses) than the first byte.
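A program can check its own byte order by looking at the first byte of a known integer, and it can rebuild a value from little-endian bytes with shifts, which works the same on any machine. This is a sketch under the assumption that unsigned int has at least 32 bits; both helper names are hypothetical.

```cpp
#include <iostream>

// Hypothetical helper: true if the lowest-addressed byte of an int is its
// low-order byte (x86 style), false otherwise.
bool is_little_endian(void) {
    unsigned int x = 1;
    unsigned char *first = (unsigned char *)&x;
    return *first == 1;
}

// Hypothetical helper: rebuild a 32-bit value from 4 bytes stored low byte first.
// Shifts work on values, not memory, so this gives the same answer everywhere.
unsigned int from_little_endian(const unsigned char *b) {
    return (unsigned int)b[0]
         | ((unsigned int)b[1] << 8)
         | ((unsigned int)b[2] << 16)
         | ((unsigned int)b[3] << 24);
}

int foo(void) {
    std::cout << (is_little_endian() ? "little" : "big") << " endian\n";
    unsigned char bytes[4] = {0xd3, 0xc2, 0xb1, 0xa0};
    std::cout << std::hex << "0x" << from_little_endian(bytes) << "\n"; // 0xa0b1c2d3
    return 0;
}
```

The byte array here is the same sequence the x86 run printed above, and the shifts put it back together into 0xA0B1C2D3.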
Arrays of Integers
An integer array is pretty simple. Your pointer points to the
first byte of the first int. Subsequent bytes give you the rest
of the first int, then you get the first byte of the next int, and so
on:
void print_bytes(unsigned char *array,int nBytes) {
std::cout<<std::hex<<"Bytes at "<<(void *)array<<":";
for (int i=0;i<nBytes;i++)
std::cout<<" 0x"<<(int)array[i];
std::cout<<"\n";
}
int foo(void) {
int x[]={
0xA0B1C2D3,
0xE0E1E2E3,
0xF0F1F2F3
};
print_bytes((unsigned char *)&x,sizeof(x));
return 0;
}
(executable NetRun link)
So the bytes in our 3-int array (on our little-endian machine) are:
0xd3 0xc2 0xb1 0xa0 0xe3 0xe2 0xe1 0xe0 0xf3 0xf2 0xf1 0xf0
So overall, we've seen that strings, arrays, and integers all "grow up", starting at an address and continuing to higher and higher addresses.
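For an int array, indexing is the same add-and-access as for chars, except that each step is sizeof(int) bytes instead of one. The sketch below does that byte arithmetic by hand; element_by_bytes is a hypothetical helper written just for this illustration.

```cpp
#include <iostream>

// Hypothetical helper: find element i by doing the byte arithmetic by hand.
int element_by_bytes(const int *array, int i) {
    const unsigned char *bytes = (const unsigned char *)array;
    const unsigned char *where = bytes + i * sizeof(int);  // skip i whole ints
    return *(const int *)where;
}

int foo(void) {
    int x[] = {10, 20, 30};
    std::cout << element_by_bytes(x, 2) << "\n";  // same value as x[2]
    // Adjacent elements sit sizeof(int) bytes apart in memory.
    std::cout << (const char *)&x[1] - (const char *)&x[0] << "\n";
    return 0;
}
```

When you write x[2], the compiler does this scaling for you: adding 2 to an "int *" really adds 2*sizeof(int) to the byte address.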
The Stack
In fact, the only data structure on most computers that "grows down" (moving to lower and lower addresses) is the stack.
This is unfortunate, but every machine's stack seems to grow down.
So on 32-bit x86, where esp is the stack pointer, you can fake a
"push eax" with a "sub esp,4" (make room by moving down 4 bytes)
followed by a "mov [esp],eax" (the brackets make the assembler access the
memory esp *points* to). You can "pop" an integer off the stack
by just "add esp,4", possibly with a "mov eax,[esp]" beforehand to load the value.
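The push and pop steps can be modeled in C++ with an ordinary array. This is only a toy model, not real machine code: toy_memory stands in for the stack region and toy_sp for esp, counted in ints rather than bytes, and all of these names are hypothetical. There is no overflow checking.

```cpp
#include <iostream>

// Toy model: the stack lives in toy_memory, and toy_sp plays the role of esp.
// It counts ints, not bytes, so "sub esp,4" becomes toy_sp--.
int toy_memory[16];
int toy_sp = 16;  // starts just past the high end; the stack grows toward index 0

void toy_push(int value) {
    toy_sp--;                    // like "sub esp,4"
    toy_memory[toy_sp] = value;  // like "mov [esp],eax"
}

int toy_pop(void) {
    int value = toy_memory[toy_sp];  // like "mov eax,[esp]"
    toy_sp++;                        // like "add esp,4"
    return value;
}

int foo(void) {
    toy_push(7);
    toy_push(8);
    std::cout << "top is at index " << toy_sp << "\n";  // 14: two slots below 16
    int first = toy_pop();
    int second = toy_pop();
    std::cout << first << " " << second << "\n";  // 8 7: last pushed, first popped
    return 0;
}
```

Notice that the most recently pushed value always sits at the lowest used index, just as the most recent push on x86 sits at the lowest used address.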
Curiously, because the usual 32-bit x86 C calling convention pushes
function arguments right-to-left,
the rightmost (last) function argument gets pushed onto the stack
first, so it gets the highest address. The leftmost (first)
argument gets pushed last, so it gets the lowest address. This
means your function arguments are laid out in memory exactly like a
little array! (On 64-bit x86 the first few arguments travel in
registers instead, so this trick is specific to 32-bit code.)
void print_bytes(unsigned char *array,int nBytes) {
std::cout<<std::hex<<"Bytes at "<<(void *)array<<":";
for (int i=0;i<nBytes;i++)
std::cout<<" 0x"<<(int)array[i];
std::cout<<"\n";
}
void bar(int a,int b,int c) {
print_bytes((unsigned char *)&a,12);
}
int foo(void) {
bar(0xA0B1C2D3, 0xE0E1E2E3, 0xF0F1F2F3);
return 0;
}
(executable NetRun link)
This prints out
0xd3 0xc2 0xb1 0xa0 0xe3 0xe2 0xe1 0xe0 0xf3 0xf2 0xf1 0xf0
which is exactly like our array example above.