Assembly: Memory Layout & Array Indexing
CS 301 Lecture, Dr. Lawlor, 2005/10/07
How big are the various datatypes on your machine? "sizeof" tells
you the size of any type or variable in a program, measured in
"char"s (a "char" is one 8-bit byte on all modern machines). So on a
32-bit machine "sizeof(int)==4", since 32 bits is four bytes. Here
are the usual type sizes, in bytes, on common machines:
               | x86-32 | PowerPC-32 | x86-64 | PowerPC-64 | Java
sizeof(char)   | 1      | 1          | 1      | 1          | 2 (same as C 'wchar_t')
sizeof(short)  | 2      | 2          | 2      | 2          | 2
sizeof(int)    | 4      | 4          | 4      | 4          | 4
sizeof(long)   | 4      | 4          | 8      | 8          | 8
sizeof(void *) | 4      | 4          | 8      | 8          | ? (size not defined)
sizeof(float)  | 4      | 4          | 4      | 4          | 4
sizeof(double) | 8      | 8          | 8      | 8          | 8
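Note that the only difference between the 32-bit and 64-bit versions of
most machines is "long" and pointers--everything else stays the same
size, including (surprisingly) "int". (The 8-byte "long" is the
Linux/Unix convention; 64-bit Windows compilers keep "long" at 4
bytes.) On normal machines, every pointer type is the same size as
"void *".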
Array Indexing
So you've got an array somewhere, perhaps declared like "int
arr[16]". How is this array stored in memory--i.e., how can you
access this array from assembly?
Try running this code:
#include <iostream>
int main() {
    int arr[16];
    for (int i=0;i<16;i++)
        std::cout<<&arr[i]<<std::endl; // print the address of each element
    return 0;
}
On my Linux machine, this prints out:
0xbfffd2e0
0xbfffd2e4
0xbfffd2e8
0xbfffd2ec
0xbfffd2f0
0xbfffd2f4
...
That is, each entry in the array starts just 4 bytes (sizeof(int))
after the one before it. In general, an array of "T" objects is just
a pile of T's next to each other in memory, and a pointer to "the
array" is just a pointer to the first element. To get to arr[1],
you've got to skip over arr[0]: just add 4 bytes and you're there.
To get to arr[2], you've got to jump 8 bytes past the start of the
array.
So how do you get to the i'th array element? Just take a pointer
to the start of the array, and add 4*i bytes (in general,
i*sizeof(T) bytes). That gives you a pointer to the i'th element.
You can then read, write, or modify the element as needed.
The obvious way to do this in assembly is like this:
# Assume the address of the start of the array is in %eax, and the array index i is in %ebx.
imul $4, %ebx # Multiply index by size of each element (4 bytes, since sizeof(int)==4) to give a byte offset
add %eax, %ebx # Add the address of the start of the array to the offset
movl $1234, (%ebx) # Copy a constant into that memory address: start + offset (the "l" suffix makes it a 4-byte store)
Array indexing is a really common task, however, so x86 assembly
provides special support for it--any instruction that can access memory
(which is most instructions!) can use a special "scaled index plus
displacement" addressing mode. The general form is
"disp(base,index,scale)", meaning the address base + index*scale + disp.
With the displacement left off, it looks like this:
movl $1234, (%eax,%ebx,4) # Copy a constant into memory at address %eax + %ebx*4
This does the same multiply, add, and memory access as the three
instructions above, but in a single instruction, and without
overwriting %ebx. It's probably slightly faster, too, mostly because
the multiply isn't totally general purpose: scaled index only works
with a scale factor of 1, 2, 4, or 8. If your array elements aren't
one of these sizes, you have to write out an explicit multiply.
You sometimes see the funny scaled index mode used with the equally
funny "lea" instruction, or Load Effective Address. This
instruction computes an address exactly like "mov", but doesn't
actually *do* the memory access--it just stores the computed address
into a register. So
lea (%eax,%ebx,4), %ecx # Just does %ecx = %eax + %ebx*4 (doesn't touch memory!)
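Compilers and devious assembly programmers will sometimes use "lea" for
normal arithmetic, since it can add two registers and multiply by 1, 2,
4, or 8 in one instruction, but it's more commonly seen inside
subroutines, computing the address of a piece of data.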
Struct Layout
A simple struct, like "plain", is laid out in memory *exactly* like an array: all the elements are just next to each other.
struct plain {
int i,j,k,l,m;
};
So this struct should have sizeof(plain)==5*sizeof(int)==20 bytes (on a
32-bit machine, or any machine where sizeof(int)==4). If I replace
"int" with "short" above, I get sizeof(plain)==5*sizeof(short)==10
bytes (on almost all machines). No surprises; and access is exactly
the same as with an array.
But unlike an array, a struct can hold elements of *different* types, like this (see
NetRun example):
struct me {
short i;
int j,k;
};
The surprising thing about this is that the compiler puts "j" *four*
bytes after the start of "i", even though i is just *two* bytes!
The reason for this is called "alignment"--it's better to store a
4-byte integer at an address that's a multiple of 4, called an
"aligned address". On some machines (e.g., DEC Alpha, PowerPC),
unaligned accesses will crash the program (cause a "bus error"), or
cause a software trap that takes hundreds or thousands of times longer
to process than a normal memory access. On other machines, like x86,
unaligned accesses work, but they're just a bit slower than normal
accesses. See the IBM data alignment page for more details and a good example.
So for performance and correctness, the compiler puts two bytes of
padding after the "i" above, so that the access to "j" will be
aligned. Problems can arise because of misalignment anywhere--in
your code, stack data, or heap allocated data.