Structs, Alignment, and Padding

CS 301 Lecture, Dr. Lawlor

(Also read Chapters 3.9 and 3.10 of the Bryant and O'Hallaron textbook for more info on structs)

Structs

A "struct" or "class" is just an object that contains a bunch of subobjects, called "fields".  For example:

struct bar {
    int x;
    int y;
};

This is a struct, bar, that contains two fields x and y.  x and y are usually laid out in memory right next to each other, so bar is a total of 8 bytes long: the first 4 bytes are x, and the last 4 bytes are y.

There's a cool macro called "offsetof(struct,field)", defined in <stddef.h> (or <cstddef> in C++), that returns the number of bytes between the start of the struct and the start of that field.  So "offsetof(bar,x)==0" bytes and "offsetof(bar,y)==4" bytes, while "sizeof(bar)==8" bytes.

In assembly, if eax is pointing to the start of a bar object,
    mov DWORD [eax+4],0x117
would set that bar's y field to 0x117, because the y field starts 4 bytes from the start of bar.

Structs in C

In plain C, not C++, "struct bar" isn't the same as "bar".  So you need to use a typedef to make both a "struct tag" and an actual typename at the same time:

typedef struct bar_tag {
    int x;
    int y;
} bar;

So now "bar" is a typedef for "struct bar_tag", which is just a struct like in C++.  This is one case where the C++ version is so much better that the C way has been almost totally forgotten.

Structs and Alignment

"Alignment" means that an object must sit in memory at an address divisible by some power of two; a 4-byte int, for example, normally must sit at an address divisible by 4.  On some CPUs (PowerPC and DEC Alpha are prominent examples), reading an int from an unaligned address like 0x10000003 can be 1000x slower than reading from an aligned address like 0x10000004!  The penalty for unaligned access on x86 machines is usually undetectable (a few percent at most), but is occasionally a fewfold slowdown.

To avoid unaligned accesses, the compiler may insert "padding" (unused space) into your structs to improve alignment.

For example,
#include <cstddef>   /* for offsetof */
#include <iostream>

struct bar {
    int x;
    char z;
    int y;
};

int main() {
    std::cout<<"sizeof(bar)=="<<sizeof(bar)<<"\n";
    std::cout<<"offsetof(bar,x)=="<<offsetof(bar,x)<<"\n";
    std::cout<<"offsetof(bar,z)=="<<offsetof(bar,z)<<"\n";
    std::cout<<"offsetof(bar,y)=="<<offsetof(bar,y)<<"\n";
    return 0;
}

In a perfect world, this would be a 9-byte struct: two 4-byte ints, and one one-byte char.  But to avoid an unaligned access to the int y, the compiler sticks in 3 bytes of padding after the char, so with 4-byte ints the struct is 12 bytes and y starts at offset 8.

Field        Size
x            4 bytes
z            1 byte
(padding)    3 bytes (to a total of 4)
y            4 bytes

On 32-bit x86 Linux, char is 1-byte aligned (in other words, char never has padding), short is 2-byte aligned (meaning the address must be a multiple of 2), and everything else (int, long, long long, and even double) is 4-byte aligned inside a struct.  On most other machines, including PowerPC and 64-bit x86, a builtin type of N bytes must be N-byte aligned; so double is on 8-byte alignment, and a char followed by a double wastes 7 bytes for alignment!

Fighting Padding Waste

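Padding wastes space, and it puts unpredictable bytes into any struct you copy directly to a disk file or the network (the padding bytes hold whatever garbage was in memory, and different machines pad differently), so we often want to avoid padding.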

Bitfields

A "bitfield" is a struct where you tell the compiler you only care about a subset of the bits in each field.  The syntax is just to put a colon and a number of bits after each field.  For example, "t" is just 2 bits long here because of the ":2":
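#include <cstdio>

struct bar {
    unsigned char src:3;
    unsigned char dest:3;
    unsigned char t:2; /* just 2 bits long! */
};

int main() {
    bar b;
    b.t=3;
    b.dest=6;
    b.src=2;

    /* View the whole struct as one raw byte */
    printf("b in octal is 0%o\n", *(unsigned char *)&b);
    printf("sizeof(b) is %d\n", (int)sizeof(b));
    return 0;
}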

The overall struct is just 1 byte, 8 bits, which is pretty cool.  This example is actually the funk_emu 03ds byte, which is the x86 ModR/M byte.

Be warned that the usual padding and alignment rules apply even to bitfields; so replacing "unsigned char" with "int" above results in a 4-byte struct, because the compiler makes sure "int"s are 4-byte aligned, even in a bitfield!