Structures and Classes
CS 301 Lecture, Dr. Lawlor
Structs
A "struct" or "class" is just an object that contains a bunch of subobjects, called "fields".
"class" is only available in C++. "struct" works in C++ or C. In C++, the only
difference between a "class" and a "struct" is the default access: members of a
"class" are private unless you add a "public:" label, while members (and base
classes) of a "struct" are public by default.
So consider a declaration like:
struct bar {
int x;
int y;
};
This is a struct, bar, that contains two fields x and y. x and y
are usually laid out in memory right next to each other, so (with 4-byte
ints) bar is a total of 8 bytes long: the first 4 bytes are x, and the next 4 bytes
are y. In memory, it's exactly like a two-element array, to the
point where you can't always tell whether bar was a struct or array by
looking at the assembly.
There's a cool macro called "offsetof(struct,field)" (from <cstddef>, or
<stddef.h> in C) that returns the number of bytes between the start of the
struct and the start of that field. So "offsetof(bar,x)==0" bytes and
"offsetof(bar,y)==4" bytes, while "sizeof(bar)==8" bytes.
#include <iostream>
#include <cstddef> // offsetof
struct bar {
int x;
int y;
};
int main() {
bar b;
std::cout<<"Location of bar: "<<(void *)&b<<"\n";
std::cout<<"Location of bar.x: "<<(void *)&b.x<<"\n";
std::cout<<"Location of bar.y: "<<(void *)&b.y<<"\n\n";
std::cout<<"sizeof bar: "<<sizeof(b)<<"\n";
std::cout<<"offsetof of bar.x: "<<offsetof(bar,x)<<"\n";
std::cout<<"offsetof of bar.y: "<<offsetof(bar,y)<<"\n";
return 0;
}
Here's what this program prints out. Notice that a pointer to the
struct has the same value as a pointer to the first field of the
struct.
Location of bar: 0xbfda3a80
Location of bar.x: 0xbfda3a80
Location of bar.y: 0xbfda3a84
sizeof bar: 8
offsetof of bar.x: 0
offsetof of bar.y: 4
Program complete. Return 0 (0x0)
In assembly, if edx was pointing to the start of bar,
mov [edx+4],eax
would set bar's y field to eax, because the y field starts 4 bytes from the start of bar.
So:
- The fields of a struct get laid out in memory at higher and
higher addresses, just like arrays have each index higher than the
previous index.
- A pointer to a struct is a pointer to the first byte of the first field of the struct, just like arrays.
- A struct passed "by value" (e.g., func(bar b)) is copied in full,
typically onto the stack (64-bit calling conventions often pass small
structs in registers instead). A struct passed by pointer or reference
(e.g., func(bar *b) or func(bar &b)) passes only a pointer.
Structs in plain C
In plain C, not C++, "class" isn't a keyword, so you have to use
"struct". Also, a C struct's name is only a "struct tag": you must
write "struct bar", not just "bar". To get a plain typename, you need
a typedef that declares both a struct tag and an actual typename at
the same time:
typedef struct bar_tag {
int x;
int y;
} bar;
So now "bar" is a typedef for "struct bar_tag", which acts just like a struct
in C++. This is one case where the C++ version is so much
better that the C way has been almost totally forgotten.
Structs and Alignment
"Alignment" means a 4-byte object must sit in memory at an address
divisible by 4. On some CPUs (PowerPC and DEC Alpha
are prominent examples), reading an int from an unaligned address
like 0x10000003 can be 30x to 1000x slower than reading from an
aligned address like 0x10000004! On x86, the penalty for
unaligned access is small: either unnoticeable or at worst a twofold
slowdown. On MIPS or SPARC machines, an unaligned pointer access
can kill your program! Here's a runnable example:
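int main() {
/* unsigned char: 0xaa doesn't fit in a signed char, and C++11 rejects that brace-init */
static const unsigned char arr[8]={0xaa,0xbb,0xcc,0xdd,0xee,0xff,0x00,0x11};
const int *matey=(const int *)(arr+1); /* <- treat the middle of the array like an int (unaligned!) */
return *matey;
}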
To avoid unaligned accesses, the compiler may insert "padding" (unused space) into your structs to improve alignment.
For example,
#include <iostream>
#include <cstddef> // offsetof
struct bar {
int x;
char z;
int y;
};
int main() {
std::cout<<"sizeof(bar)=="<<sizeof(bar)<<"\n";
std::cout<<"offsetof(bar,x)=="<<offsetof(bar,x)<<"\n";
std::cout<<"offsetof(bar,z)=="<<offsetof(bar,z)<<"\n";
std::cout<<"offsetof(bar,y)=="<<offsetof(bar,y)<<"\n";
return 0;
}
In a perfect world, this would be a 9-byte struct: two 4-byte ints, and
one 1-byte char. But to avoid an unaligned access to that last int,
the compiler sticks in 3 bytes of padding after the char, so y starts
at offset 8 and sizeof(bar) is 12.
Field      | Size
-----------|---------------------------
x          | 4 bytes
z          | 1 byte
(padding)  | 3 bytes (to a total of 4)
y          | 4 bytes
On 32-bit x86 Linux, char is 1-byte aligned (in other words, char never
needs padding), short is 2-byte aligned (meaning its address must be a
multiple of 2), and everything else (int, long, long long, and even
double) is 4-byte aligned inside structs. On most other machines, including
PowerPC and x86-64, a builtin type of N bytes must be N-byte aligned; so double is
on 8-byte alignment, and a char followed by a double wastes 7 bytes for
alignment!
Fighting Padding Waste
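Padding wastes space. Worse, padding bytes hold unspecified garbage and the amount of padding can differ between compilers and machines, so writing raw structs to disk files or the network can produce very strange values on the other end. So we often want to avoid padding.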
- If everything's the same type (for example, all chars, or all
ints), alignment will be perfect and there will never be any padding.
- If the types already have good alignment (for example, four
chars, and then an int), the compiler won't stick in any extra
padding. Often you can help the compiler out by just grouping
fields of the same size together in the struct or class: first
all the chars, then all the shorts, then all the ints, etc.
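- Some compilers have command-line flags or source-code pragmas to
adjust alignment. Some of these (like GCC's -malign-double on 32-bit
x86) increase the alignment requirements; others (like "#pragma pack",
supported by GCC, Clang, and MSVC) shrink or remove padding, but only
by making the compiler generate unaligned accesses, with all the costs
described above.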
Curiously, a struct has no alignment requirement of its own beyond its
fields: the compiler aligns a nested struct only as strictly as its
most-aligned field needs, and inserts padding only where those fields
wouldn't be aligned properly. In particular, this means a struct/class containing
nothing but char fields will never itself need padding.
On some machines, like x86-64,
the stack itself is 16-byte aligned. This means if you need only
4 bytes of space on the stack, you have to take 16 bytes, and leave 12
bytes unused!
Extra: C++ Virtual Functions and "vtbl"
You can tell C about a whole group of functions that take the same
arguments and return types, and then "point" to one of those
functions. This is called a "function pointer", which in assembly
is just the address of the code to run. The ugliest part about
function pointers (by far!) is the syntax--I recommend you always use a
typedef to define a function pointer. For example, here's how you
make a new type "fn_ptr_t" that takes a short and returns an int:
typedef int (*fn_ptr_t)(short param);
Now you can declare variables of type "fn_ptr_t", assign compatible functions to them, and finally call them.
One common use of function pointers is to fake C++-style "virtual
methods" in C. Here's how to do very-C++-style virtual methods in C.
#include <stdio.h>
/* Plain C: "this" is an ordinary parameter name here, but a keyword in C++, so compile as C. */
struct parent;
typedef void (*fn_parent_bar)(struct parent *this);
struct parent_vtable { /* "virtual method table", listing function pointers to all virtual methods */
fn_parent_bar bar;
};
struct parent {
const struct parent_vtable *vtable;
int v;
};
void parent_bar(struct parent *this) { printf("I'm the parent (v=%d)\n",this->v); }
void child_bar(struct parent *this) { printf("I'm the child (v=%d)\n",this->v); }
int foo(void) {
struct parent_vtable parent_class; /* set up vtables */
parent_class.bar=parent_bar; /* set up function pointer */
struct parent_vtable child_class;
child_class.bar=child_bar;
struct parent p;
p.vtable=&parent_class;
p.v=10;
p.vtable->bar(&p); /* call function pointer */
struct parent c; /* "child class", with different methods */
c.vtable=&child_class;
c.v=11;
c.vtable->bar(&c);
return 0;
}
int main(void) { return foo(); } /* entry point for a standalone program */
The only differences between C++ virtual methods and the above "struct field is a function pointer" trick are:
- The C++ syntax is nicer--just put "virtual" ahead of the method name, instead of declaring a separate function.
- The "this" pointer is automatically passed into C++ class methods, instead of manually like above.
- The compiler creates and lays out the second, smaller, read-only struct called the
"vtable". Each object holds an invisible member pointing to its class's vtable;
all instances of a given class point to the same vtable.
"sizeof" will actually show you the space used by the "vtable" pointer
in a C++ class with virtual methods: a class with virtual methods is one
pointer bigger (4 bytes on 32-bit machines, 8 bytes on 64-bit machines,
possibly plus padding) than the same class without virtual methods, to
leave space for the vtable pointer.