Structures and Classes

Structs

A "struct" or "class" is just an object that contains a bunch of subobjects, called "fields".

"class" is only available in C++. "struct" works in C++ or C. The only difference between a "class" and a "struct" in C++ is the default access: members of a "class" are private until you write a "public:" label, while members of a "struct" are public by default.

So consider a declaration like:

struct bar {
	int x;
	int y;
};

This is a struct, bar, that contains two fields x and y. x and y are usually laid out in memory right next to each other, so with the usual 4-byte int, bar is a total of 8 bytes long: the first 4 bytes are x, and the next 4 bytes are y. In memory, it's exactly like a two-element array of int, to the point where you can't always tell whether bar was a struct or an array by looking at the assembly.

There's a cool macro called "offsetof(struct,field)", declared in <stddef.h> (<cstddef> in C++), that returns the number of bytes between the start of the struct and the start of that field. So "offsetof(bar,x)==0" and "offsetof(bar,y)==4", while "sizeof(bar)==8".

struct bar {
	int x;
	int y;
};
bar b;
std::cout<<"Location of bar: "<<(void *)&b<<"\n";
std::cout<<"Location of bar.x: "<<(void *)&b.x<<"\n";
std::cout<<"Location of bar.y: "<<(void *)&b.y<<"\n\n";

std::cout<<"sizeof bar: "<<sizeof(b)<<"\n";
std::cout<<"offsetof of bar.x: "<<offsetof(bar,x)<<"\n";
std::cout<<"offsetof of bar.y: "<<offsetof(bar,y)<<"\n";
return 0;

Here's what this program prints out. Notice that a pointer to the struct has the same value as a pointer to the first element of the struct.

Location of bar: 0xbfda3a80
Location of bar.x: 0xbfda3a80
Location of bar.y: 0xbfda3a84

sizeof bar: 8
offsetof of bar.x: 0
offsetof of bar.y: 4
Program complete.  Return 0 (0x0)

In assembly, if edx was pointing to the start of bar,
mov [edx+4],eax
would set bar's y field to eax, because the y field starts 4 bytes from the start of bar.
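
The same "start address plus 4" store can be sketched in C++ using nothing but a byte pointer and offsetof. This is only an illustration of the address arithmetic; set_y_by_offset and demo_set_y are hypothetical names, not part of the original example.

```cpp
#include <cstddef>   // offsetof
#include <cstring>   // std::memcpy

struct bar {
	int x;
	int y;
};

// Hypothetical helper: store value into b's y field using only a byte offset,
// the way "mov [edx+4],eax" does.
void set_y_by_offset(bar *b, int value) {
	unsigned char *base = reinterpret_cast<unsigned char *>(b);   // plays the role of edx
	std::memcpy(base + offsetof(bar, y), &value, sizeof(value));  // plays the role of [edx+4]
}

int demo_set_y(void) {
	bar b = {1, 2};
	set_y_by_offset(&b, 99);
	return b.y;   // reads back the value stored through the raw offset
}
```

Here demo_set_y() should return 99, and b.x is left alone because the store starts offsetof(bar,y) bytes past the start of the struct.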

So:

The fields of a struct get laid out in memory at higher and higher addresses, just like arrays have each index higher than the previous index.
A pointer to a struct is a pointer to the first byte of the first field of the struct, just like arrays.
A struct passed "by value" (e.g., func(bar b)) has the struct sitting right on the stack. A struct passed by pointer or reference (e.g., func(bar *b) or func(bar &b)) passes only a pointer on the stack.
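
The by-value versus by-pointer distinction is easy to see from the caller's side. Here's a small sketch; the function names are invented for this example.

```cpp
struct bar {
	int x;
	int y;
};

void by_value(bar b)      { b.x = 100; }   // changes only the callee's private copy
void by_pointer(bar *b)   { b->x = 200; }  // changes the caller's struct
void by_reference(bar &b) { b.y = 300; }   // same idea as a pointer, nicer syntax

int demo_passing(void) {
	bar b = {1, 2};
	by_value(b);              // b.x is still 1: the function got a copy
	int after_value = b.x;
	by_pointer(&b);           // b.x is now 200
	by_reference(b);          // b.y is now 300
	return after_value + b.x + b.y;   // 1 + 200 + 300
}
```

So demo_passing() should return 501: the by-value call left the caller's struct untouched, while the pointer and reference calls both wrote through to it.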

Structs in plain C

In plain C, not C++, "class" isn't a keyword, so you have to use "struct". Also, "struct bar" isn't the same as "bar". So you need to use a typedef to make both a "struct tag" and an actual typename at the same time:

typedef struct bar_tag {
	int x;
	int y;
} bar;

So now "bar" is a typedef for "struct bar_tag", which acts just like a struct in C++. This is one case where the C++ version is so much better that the C way has been almost totally forgotten.
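
The C-style typedef still compiles as C++, which makes it easy to compare all the spellings side by side. A quick sketch (demo_names is a made-up function name):

```cpp
typedef struct bar_tag {
	int x;
	int y;
} bar;

int demo_names(void) {
	struct bar_tag a = {1, 2};  // the C "struct tag" spelling
	bar b = {3, 4};             // the typedef name, works in C and C++
	bar_tag c = {5, 6};         // C++ only: the tag by itself is already a type name
	return a.x + b.x + c.x;     // 1 + 3 + 5
}
```

All three variables have exactly the same type; only the third declaration would be rejected by a plain C compiler.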

Structs and Alignment

"Alignment" means a 4-byte object must sit in memory at an address divisible by 4. On some CPUs (PowerPC and DEC Alpha are prominent examples), reading an int from an unaligned address like 0x10000003 can be 30x to 1000x slower than reading from an aligned address like 0x10000004! On x86, the penalty for unaligned access is small--either unnoticeable or at worst a twofold slowdown. On MIPS or SPARC machines, an unaligned pointer access can kill your program! Here's a runnable example:

static const unsigned char arr[8]={0xaa,0xbb,0xcc,0xdd,0xee,0xff,0x00,0x11}; /* unsigned, so 0xaa etc. fit */
const int *matey=(const int *)(arr+1); /* <- treat the middle of the array like an int */
return *matey; /* deliberately unaligned read */

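
If you really do need an int out of the middle of a byte array, one common way to stay safe on every CPU is to copy the bytes into a properly aligned int with memcpy instead of dereferencing a misaligned pointer. A sketch, with hypothetical helper names:

```cpp
#include <cstring>   // std::memcpy
#include <cstdint>   // std::uintptr_t

// Hypothetical helper: read an int from any address, aligned or not.
int read_int_unaligned(const unsigned char *p) {
	int value;
	std::memcpy(&value, p, sizeof(value));  // byte-by-byte copy has no alignment requirement
	return value;
}

// Hypothetical helper: true if the pointer is a multiple of int's alignment.
bool is_int_aligned(const void *p) {
	return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

int demo_unaligned(void) {
	static const unsigned char arr[8] = {0xaa,0xbb,0xcc,0xdd,0xee,0xff,0x00,0x11};
	return read_int_unaligned(arr + 1);     // starts one byte into the array
}
```

The value you get back depends on the machine's byte order, but the read itself never touches memory through a misaligned int pointer.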

To avoid unaligned accesses, the compiler may insert "padding" (unused space) into your structs to improve alignment.

For example,

struct bar {
	int x;
	char z;
	int y;
};

std::cout<<"sizeof(bar)=="<<sizeof(bar)<<"\n";
std::cout<<"offsetof(bar,x)=="<<offsetof(bar,x)<<"\n";
std::cout<<"offsetof(bar,z)=="<<offsetof(bar,z)<<"\n";
std::cout<<"offsetof(bar,y)=="<<offsetof(bar,y)<<"\n";
return 0;

In a perfect world, this would be a 9-byte struct: two 4-byte ints, and one one-byte char. But to avoid an unaligned access to that last int, the compiler sticks in 3 bytes of padding after the char.

Field	Size
x	4 bytes
z	1 byte
(padding)	3 bytes (to a total of 4)
y	4 bytes

On 32-bit x86 Linux, char is 1-byte aligned (in other words, char never needs padding), short is 2-byte aligned (meaning the address must be a multiple of 2), and everything else (int, long, long long, and even double) is 4-byte aligned. On most other machines, including PowerPC and 64-bit x86, a builtin type of N bytes must be N-byte aligned; so double is on 8-byte alignment--a char followed by a double wastes 7 bytes for alignment!
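
You can ask your own compiler what it does instead of memorizing the rules. This sketch uses the C++11 alignof operator plus offsetof; the struct and function names are made up for the example, and the numbers you see depend on your machine.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Hypothetical struct: a char followed by a double.
struct char_then_double {
	char c;
	double d;
};

// Bytes of padding the compiler inserted between c and d on this machine.
std::size_t padding_before_d(void) {
	return offsetof(char_then_double, d) - sizeof(char);
}

// Alignment this compiler requires for a double.
std::size_t double_alignment(void) {
	return alignof(double);
}
```

On a machine where double is 8-byte aligned, padding_before_d() should report the 7 wasted bytes; where double is only 4-byte aligned, it should report 3.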

Fighting Padding Waste

Padding can cause wasted space, and cause very strange values for disk files and the network, so we often want to avoid padding.

If everything's the same type (for example, all chars, or all ints), alignment will be perfect and there will never be any padding.
If the types already have good alignment (for example, four chars, and then an int), the compiler won't stick in any extra padding. Often you can help the compiler out by just declaring types of the same size in the same place in the struct or class--first all the chars, then all the shorts, then all the ints, etc.
Some compilers have command-line flags or source-code pragma options to adjust alignment--but usually it's to increase the alignment requirements, not eliminate them!
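
Here's a sketch of the "group same-sized fields together" trick. The struct names loose and tight are invented for this example; with 4-byte ints a typical compiler gives 12 bytes for the first and 8 for the second, but the exact sizes are up to your compiler.

```cpp
#include <cstddef>   // std::size_t

struct loose {    // char, int, char: padding after each char
	char a;
	int b;
	char c;
};

struct tight {    // int first, then both chars side by side
	int b;
	char a;
	char c;
};

// How many bytes the reordering saves on this machine.
std::size_t bytes_saved(void) {
	return sizeof(loose) - sizeof(tight);
}
```

Both structs hold exactly the same fields; only the declaration order changed.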

Curiously, padding ignores sub-struct boundaries--the compiler will insert padding if the contained fields aren't aligned properly. In particular, this means a struct/class containing nothing but char fields will never itself need padding.
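
A small sketch of the sub-struct case, using made-up struct names. On typical compilers the chars-only struct stays 3 bytes even when nested, while an int placed after it still gets pushed out to an int boundary.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct rgb {                 // nothing but chars: no padding needed
	unsigned char r, g, b;
};

struct pixel {               // a char right after the 3-byte sub-struct
	rgb color;
	unsigned char alpha;     // typically lands at offset 3, with no gap
};

struct tagged {              // an int after the 3-byte sub-struct
	rgb color;
	int id;                  // the compiler pads so id lands on an int boundary
};

std::size_t pixel_size(void) { return sizeof(pixel); }
std::size_t id_offset(void)  { return offsetof(tagged, id); }
```

With 4-byte ints you'd typically see pixel_size() give 4 and id_offset() give 4, so one byte of padding appears in tagged but none in pixel.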

On some machines, like x86-64, the stack itself is 16-byte aligned. This means if you need only 4 bytes of space on the stack, you have to take 16 bytes, and leave 12 bytes unused!

Extra: C++ Virtual Functions and "vtbl"

You can tell C about a whole group of functions that take the same arguments and return types, and then "point" to one of those functions. This is called a "function pointer", which in assembly is just the address of the code to run. The ugliest part about function pointers (by far!) is the syntax--I recommend you always use a typedef to define a function pointer. For example, here's how you make a new type "fn_ptr_t" that takes a short and returns an int:

typedef int (*fn_ptr_t)(short param);

Now you can declare variables of type "fn_ptr_t", assign compatible functions to them, and finally call them.
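
For example, here's a short sketch of declaring, assigning, and calling through "fn_ptr_t". The functions twice, flip_sign, apply, and demo_fn_ptr are hypothetical names made up for this illustration.

```cpp
typedef int (*fn_ptr_t)(short param);

// Two functions with a matching signature: take a short, return an int.
int twice(short param)     { return 2 * param; }
int flip_sign(short param) { return -param; }

// A function pointer can be passed around like any other value.
int apply(fn_ptr_t f, short value) {
	return f(value);          // jumps to whatever code f points at
}

int demo_fn_ptr(void) {
	fn_ptr_t f = twice;       // point f at twice
	int a = f(21);            // calls twice: 42
	f = flip_sign;            // re-point f at a different function
	int b = f(2);             // calls flip_sign: -2
	return a + b;             // 40
}
```

The call site "f(21)" never changes; only the address stored in f decides which function actually runs.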

One common use of function pointers is to fake C++-style "virtual methods" in C. Here's how to do very-C++-style virtual methods in C.

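#include <stdio.h> /* for printf */

struct parent;
/* "this" is an ordinary parameter name in C; it's a reserved keyword in C++ */
typedef void (*fn_parent_bar)(struct parent *this);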
struct parent_vtable { /* "virtual method table", listing function pointers to all virtual methods */
	fn_parent_bar bar;	
};

struct parent {
	const struct parent_vtable *vtable;
	int v;
};

void parent_bar(struct parent *this) { printf("I'm the parent (v=%d)\n",this->v); }
void child_bar(struct parent *this) { printf("I'm the child (v=%d)\n",this->v); }

int foo(void) {
	struct parent_vtable parent_class;  /* set up vtables */
	parent_class.bar=parent_bar;  /* set up function pointer */
	struct parent_vtable child_class;
	child_class.bar=child_bar;

	struct parent p;
	p.vtable=&parent_class;
	p.v=10;
	p.vtable->bar(&p); /* call function pointer */
	
	struct parent c; /* "child class", with different methods */
	c.vtable=&child_class;
	c.v=11;
	c.vtable->bar(&c);

	return 0;
}

The only differences between C++ virtual methods and the above "struct field is a function pointer" trick are:

The C++ syntax is nicer--just put "virtual" ahead of the method name, instead of declaring a separate function.
The "this" pointer is automatically passed into C++ class methods, instead of manually like above.
The compiler creates and lays out the second, smaller, readonly struct called the "vtable". The vtable is pointed to by an invisible class member; all instances of a given class point to the same vtable (for the class).
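
For comparison, here's a sketch of roughly the same parent/child example written with real C++ virtual methods. It returns strings instead of calling printf so the result is easy to check, and call_bar is a hypothetical helper added for the illustration. How the vtable is actually laid out is up to the compiler.

```cpp
#include <string>

class parent {
public:
	int v;
	explicit parent(int v_) : v(v_) {}
	virtual ~parent() {}
	// "virtual" asks the compiler to dispatch this call at run time,
	// typically through a hidden vtable pointer.
	virtual std::string bar() {
		return "I'm the parent (v=" + std::to_string(v) + ")";
	}
};

class child : public parent {
public:
	explicit child(int v_) : parent(v_) {}
	virtual std::string bar() {   // overrides parent::bar for child objects
		return "I'm the child (v=" + std::to_string(v) + ")";
	}
};

// The caller only knows it has a parent; the object itself picks the right bar().
std::string call_bar(parent &p) {
	return p.bar();   // "this" is passed automatically
}
```

Passing a parent(10) to call_bar gives the parent's message, and passing a child(11) gives the child's, with no function pointers visible in the source.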

"sizeof" will actually show you the space used by the "vtable" pointer in a C++ class with virtual methods--a class with virtual methods is 4 bytes bigger than a class without virtual methods, to leave space for the vtable.