Assembly: Memory Layout & Array Indexing

CS 301 Lecture, Dr. Lawlor, 2005/10/07

How big are the various datatypes on your machine? "sizeof" tells you the number of "char"s (which are the same as bytes on all modern machines). "sizeof" works on any type or variable in a program. So on a 32-bit machine "sizeof(int)==4", since 32 bits is four bytes. Here's a collection of normal type sizes on common machines:

	x86-32	PowerPC-32	x86-64	PowerPC-64	Java
sizeof(char)	1	1	1	1	2 (a 16-bit Unicode character)
sizeof(short)	2	2	2	2	2
sizeof(int)	4	4	4	4	4
sizeof(long)	4	4	8	8	8
sizeof(void *)	4	4	8	8	? (size not defined)
sizeof(float)	4	4	4	4	4
sizeof(double)	8	8	8	8	8

Note that the only difference between the 32-bit and 64-bit versions of most machines is "long" and pointers--everything else stays the same size, including (surprisingly) "int". (64-bit Windows is an exception: "long" stays 4 bytes there, and only pointers grow.) All pointers are the same size on normal machines.

Array Indexing

So you've got an array somewhere, perhaps declared like "int arr[16]". How is this array stored in memory--i.e., how can you access this array from assembly?

Try running this code:

#include <iostream>
int main() {
    int arr[16];
    for (int i=0;i<16;i++)
        std::cout<<&arr[i]<<std::endl;  // print the address of each element
    return 0;
}

On my Linux machine, this prints out:

0xbfffd2e0
0xbfffd2e4
0xbfffd2e8
0xbfffd2ec
0xbfffd2f0
0xbfffd2f4
...

That is, each entry in the array starts just 4 bytes after the one before it. In general, an array of "T" objects will just be a pile of T's next to each other in memory. A pointer to "the array" is just a pointer to the first element. To get to arr[1], you've got to skip over arr[0]--just add 4 bytes and you're there. To get to arr[2], you've got to jump 8 bytes past the start of the array. To get to arr[i], you've got to look 4*i bytes past the start of the array.

So how do you get to the i'th array element? Just take a pointer to the start of the array, and add 4*i bytes. That gives you a pointer to the i'th element. You can then read, write, or modify the element as needed.

The obvious way to do this in assembly is like this:

# Assume the address of the start of the array is in %eax, and the array index i is in %ebx.
imul $4, %ebx  # Multiply index by size of each element (4 bytes, since sizeof(int)==4) to give a byte offset
add %eax, %ebx # Add the address of the start of the array to the offset
movl $1234, (%ebx)  # Copy a 4-byte constant into that memory address: start + offset

Array indexing is a really common task, however, so x86 assembly provides special support for it--any instruction that can access memory (which is most instructions!) can use a special "scaled index plus displacement" mode, which looks like this:

movl $1234, (%eax,%ebx,4)  #  Copy a 4-byte constant into memory at address %eax + %ebx*4

This does exactly the same thing (multiply, add, and memory access) as the three lines of assembly above, except that it leaves %ebx unchanged! It's probably slightly faster than the three lines of assembly, mostly because the multiply isn't totally general purpose: scaled index only works with a scale factor of 1, 2, 4, or 8. If your array elements aren't one of these sizes, you have to write out an explicit multiply.

You sometimes see the funny scaled index mode used with the equally funny "lea" instruction, or Load Effective Address. This instruction computes an address exactly like "mov", but doesn't actually *do* the memory access--it just stores the computed address into a register. So

lea (%eax,%ebx,4), %ecx  # Just does %ecx = %eax + %ebx*4  (doesn't touch memory!)

Compilers and devious assembly programmers will sometimes use "lea" for normal arithmetic (it's an add and a small multiply in one instruction), but its more common job is computing addresses inside subroutines.

Struct Layout

A simple struct, like "plain", is laid out in memory *exactly* like an array: all the elements are just next to each other.

struct plain {
    int i,j,k,l,m;
};

So this struct should have sizeof(plain)==5*sizeof(int)==20 bytes (on a 32-bit machine). If I replace "int" with "short" above, I get sizeof(plain)==5*sizeof(short)==10 bytes (on almost all machines). No surprises; and access is exactly the same as with an array.

But unlike an array, a struct can hold elements of *different* types, like this (see NetRun example):

struct me {
    short i;
    int j,k;
};

The surprising thing about this is that the compiler puts j *four* bytes after the start of i, even though i is just *two* bytes!

The reason for this is called "alignment"--it's better to store a 4-byte integer at an address that's a multiple of 4, called an "aligned address". On some machines (e.g., DEC Alpha, PowerPC), unaligned accesses will crash the program (cause a "bus error"), or cause a software trap that takes hundreds or thousands of times longer to process than a normal memory access. On other machines, like x86, unaligned accesses work, but they're a bit slower than normal accesses. See the IBM data alignment page for more details and a good example.

So for performance and correctness, the compiler puts two bytes of padding after the "i" above, so that "j" starts at offset 4 and the access to it will be aligned. Problems can arise because of misalignment anywhere--in your code, stack data, or heap-allocated data.