Classes and Structures

CS 301 Lecture, Dr. Lawlor
To review:

Structs

A "struct" or "class" is just an object that contains a bunch of subobjects, called "fields".

"class" is only in C++. "struct" is the C way. The only difference between a "class" and a "struct" is the default access: members of a "class" are private unless you write "public:", while members of a "struct" are public by default.

So consider a declaration like:

struct bar {
	int x;
	int y;
};

This is a struct, bar, that contains two fields, x and y. x and y are usually laid out in memory right next to each other, so on a typical machine with 4-byte ints, bar is a total of 8 bytes long: the first 4 bytes are x, and the next 4 bytes are y.

There's a handy macro called "offsetof(struct,field)" (from <cstddef>, or <stddef.h> in C) that returns the number of bytes between the start of the struct and the start of that field. So "offsetof(bar,x)==0" bytes and "offsetof(bar,y)==4" bytes, while "sizeof(bar)==8" bytes.

#include <iostream>
#include <cstddef> /* for offsetof */

struct bar {
	int x;
	int y;
};

int main() {
	bar b;
	std::cout<<"Location of bar: "<<(void *)&b<<"\n";
	std::cout<<"Location of bar.x: "<<(void *)&b.x<<"\n";
	std::cout<<"Location of bar.y: "<<(void *)&b.y<<"\n\n";

	std::cout<<"sizeof bar: "<<sizeof(b)<<"\n";
	std::cout<<"offsetof of bar.x: "<<offsetof(bar,x)<<"\n";
	std::cout<<"offsetof of bar.y: "<<offsetof(bar,y)<<"\n";
	return 0;
}


Here's what this program prints out (the addresses will likely differ on your machine). Notice that a pointer to the struct has the same value as a pointer to the first field of the struct.

Location of bar: 0xbfda3a80
Location of bar.x: 0xbfda3a80
Location of bar.y: 0xbfda3a84

sizeof bar: 8
offsetof of bar.x: 0
offsetof of bar.y: 4
Program complete.  Return 0 (0x0)

In assembly, if edx points to the start of bar, then
mov [edx+4],eax
sets bar's y field to eax, because the y field starts 4 bytes from the start of bar.

So:

The fields of a struct get laid out in memory at higher and higher addresses, just like arrays have each index higher than the previous index.
A pointer to a struct is a pointer to the first byte of the first field of the struct, just like arrays.

Structs in plain C

In plain C (not C++), "class" isn't a keyword, so you have to use "struct". Also, declaring "struct bar" doesn't make "bar" by itself a type name; you'd have to write "struct bar" everywhere. So C programmers use a typedef to make both a "struct tag" and an actual typename at the same time:

typedef struct bar_tag {
	int x;
	int y;
} bar;

So now "bar" is a typedef for "struct bar_tag", which is just a struct like in C++. This is one case where the C++ version is so much better that the C way has been almost totally forgotten.

Structs and Alignment

"Alignment" means that a 4-byte object must sit in memory at an address divisible by 4. On some CPUs (PowerPC and DEC Alpha are prominent examples), reading an int from an unaligned address like 0x10000003 can be 1000x slower than reading from an aligned address like 0x10000004! On x86 machines the penalty for unaligned access is small: either unnoticeable or at worst a twofold slowdown. On MIPS or SPARC machines, an unaligned pointer access can kill your program, like so:

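#include <iostream>

int main() {
	unsigned char arr[10];
	int *matey=(int *)(arr+1); /* UNALIGNED int pointer! (undefined behavior in C++) */
	std::cout<<"About to access pointer at "<<matey<<"\n";
	(*matey) = 0xF00dBbad; /* may die with a bus error on MIPS or SPARC */
	std::cout<<"Pointer access went fine!\n";
	return arr[3];
}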


To avoid unaligned accesses, the compiler may insert "padding" (unused space) into your structs to improve alignment.

For example,

#include <iostream>
#include <cstddef> /* for offsetof */

struct bar {
	int x;
	char z;
	int y;
};

int main() {
	std::cout<<"sizeof(bar)=="<<sizeof(bar)<<"\n";
	std::cout<<"offsetof(bar,x)=="<<offsetof(bar,x)<<"\n";
	std::cout<<"offsetof(bar,z)=="<<offsetof(bar,z)<<"\n";
	std::cout<<"offsetof(bar,y)=="<<offsetof(bar,y)<<"\n";
	return 0;
}


In a perfect world, this would be a 9-byte struct: two 4-byte ints and one 1-byte char. But to avoid an unaligned access to y, the compiler sticks in 3 bytes of padding after the char, so y starts at offset 8 and sizeof(bar) is 12.

Field	Offset	Size
x	0	4 bytes
z	4	1 byte
(padding)	5	3 bytes (rounds z up to 4 bytes)
y	8	4 bytes

On 32-bit x86 Linux, char is 1-byte aligned (in other words, char never needs padding before it), short is 2-byte aligned (meaning its address must be a multiple of 2), and everything else (int, long, long long, and even double) is 4-byte aligned. On x86-64, and on most other machines including PowerPC, a builtin type of N bytes must be N-byte aligned, so double is on 8-byte alignment: a char followed by a double wastes 7 bytes for alignment!

Fighting Padding Waste

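Padding wastes space. Worse, the padding bytes hold unspecified leftover values, and the amount of padding can differ between compilers and machines, so writing a struct's raw bytes to a disk file or the network can produce very strange results. So we often want to avoid padding.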

If everything's the same type (for example, all chars, or all ints), alignment will be perfect and there will never be any padding.
If the types already have good alignment (for example, four chars, and then an int), the compiler won't stick in any extra padding. Often you can help the compiler out just by declaring fields of the same size next to each other in the struct or class: first all the chars, then all the shorts, then all the ints, etc.
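Some compilers have command-line flags or source-code pragmas to adjust alignment. Some raise the alignment requirements (GCC's -malign-double on x86, for example), while others relax them (#pragma pack, accepted by GCC, Clang, and MSVC, can remove padding entirely), but a packed struct brings back the slow or fatal unaligned accesses described above!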

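Curiously, padding ignores sub-struct boundaries: a sub-struct has no alignment requirement of its own beyond that of its most strictly aligned field, so the compiler inserts padding around it only if the contained fields would otherwise be misaligned. In particular, this means a struct/class containing nothing but char fields will never itself need padding.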

On some machines, like x86-64, the stack pointer itself must be kept 16-byte aligned at every function call. This means that if a function needs only 4 bytes of stack space, it typically has to take 16 bytes anyway, leaving 12 bytes unused!