Table-Driven Programming and Machine Code

CS 301 Lecture, Dr. Lawlor

So C++ allows this handy "array initializer syntax" for setting up a 1D array:
const int arr[]={7,9,23};

(Try this in NetRun now!)

This makes an array, "arr", containing three ints: seven, nine, and twenty-three.

It's the same as declaring:
    int arr[3];
    arr[0]=7;
    arr[1]=9;
    arr[2]=23;
But clearly the initializer is shorter and simpler!

Whitespace inside an initializer doesn't matter, and you can even add comments and stuff in there:
    const int arr[]={
       7, /* dwarves, in decimal */
       0x9, /* square of three, expressed in hexadecimal */
       027, /* number of angels that can dance on the head of a pin, in octal */
    };
It's still the same three ints as before.

If you want to save a little memory in your program, you can make a 1D array of unsigned char (which is like int, but only 8 bits wide, so it can only hold values from 0 to 255):
    const unsigned char arr[]={7,9,23};
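
Here's a quick sketch for checking both claims yourself: the base you write a number in doesn't change its value, and the unsigned char version takes one byte per entry. The names as_int, as_byte, and the three counts are made up for this example.

```cpp
#include <cstddef>

/* Same three values, written in decimal, hexadecimal, and octal. */
const int as_int[]={7, 0x9, 027};
const unsigned char as_byte[]={7, 0x9, 027};

/* Number of entries in each array, computed by the compiler. */
const std::size_t int_count=sizeof(as_int)/sizeof(as_int[0]);
const std::size_t byte_count=sizeof(as_byte)/sizeof(as_byte[0]);

/* Total storage used by the byte version: one byte per entry. */
const std::size_t byte_storage=sizeof(as_byte);
```

The int version takes sizeof(int) bytes per entry, which is 4 on most current machines but is not guaranteed by the language.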

A little array that tells the program what to do is called a "table", and programs written this way are called "table-driven". Here's a simple table-driven program that prints out a certain number of "#" signs on each line, with the exact number determined from a little table:

const unsigned char table[]={
	20,
	20,
	2,
	2,
	2,
	20,
	20,
	0 /* end of the table */
};

int foo(void) {
	int i=0; /* location in the table */
	while (table[i]!=0) { /* print one entry in the table */
		int  n=table[i++]; /* number of times to print */
		char c='#'; /* character to print */
		for (int repeat=0;repeat<n;repeat++) /* print it n times */
			std::cout<<c<<" ";
		std::cout<<std::endl;
	}
	return 0;
}

(Try this in NetRun now!)
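
One catch with the zero terminator: the table can't contain a blank line, since 0 already means "stop". Here's a sketch of one alternative, which asks the compiler for the table length using sizeof instead. The names widths and draw are invented for this example, and it builds a string rather than printing, so the result is easy to inspect.

```cpp
#include <cstddef>
#include <string>

const unsigned char widths[]={
	20,
	20,
	2,
	0, /* now just an empty line, not the end of the table */
	2,
	20,
	20
};

std::string draw(void) {
	std::string out;
	const std::size_t count=sizeof(widths)/sizeof(widths[0]); /* entries in the table */
	for (std::size_t i=0;i<count;i++) {
		out.append((std::size_t)widths[i],'#'); /* widths[i] copies of '#' */
		out+='\n';
	}
	return out;
}
```

The tradeoff is that sizeof only works where the array itself is visible; once you pass the table around as a pointer, you need to carry the count along separately or go back to a terminator.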

Here's a similar table-driven program that reads two entries in the table each time around the loop. The first table entry is treated as a repetition count, and the second table entry is the letter to repeat. Again, the program stops when it hits a repetition count of zero:

const unsigned char table[]={
/* ---  n, character to print n times ---- */
	3,'q',
	2,'@',
	4,'~',
	1,'z',
	0 /* end of the table */
};

int foo(void) {
	int i=0; /* location in the table */
	while (table[i]!=0) { /* print one entry in the table */
		int  n=table[i++]; /* number of times to print */
		char c=table[i++]; /* character to print */
		for (int repeat=0;repeat<n;repeat++) /* print it n times */
			std::cout<<c<<" ";
	}
	return 0;
}

(Try this in NetRun now!)

Note that the above is just a nine-entry table; the relationship between 3 and 'q' is purely conceptual.
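
If you'd rather have the compiler know about that pairing, one option is a table of structs, so each entry carries both its count and its character. This is only a sketch: the Run type, the runs table, and expand_runs are hypothetical names, and the function returns a string instead of printing.

```cpp
#include <string>

/* Hypothetical record type: one repetition count plus one character. */
struct Run {
	unsigned char n; /* number of times to print */
	char c;          /* character to print */
};

const Run runs[]={
	{3,'q'},
	{2,'@'},
	{4,'~'},
	{1,'z'},
	{0,0} /* end of the table */
};

std::string expand_runs(void) {
	std::string out;
	for (int i=0;runs[i].n!=0;i++) /* stop at a repetition count of zero */
		for (int repeat=0;repeat<runs[i].n;repeat++) {
			out+=runs[i].c;
			out+=' ';
		}
	return out;
}
```

The flat byte table is closer to what machine code looks like, though: the CPU just sees bytes, and the grouping lives entirely in how they get read.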

How this relates to CS 301: Machine Code

"Machine code" is a block of binary data that the CPU interprets as a long table of commands. You'll be writing machine code for HW2. There's nothing magical about machine code, and in fact you can easily write a little program that walks through and executes a table of machine code:

const unsigned char table[]={
	0, /*yo! */
	1, /*print x... */
	2, /*       ... two times */
	0, /* yo (two times) */
	0,
	2 /* exit */
};

int foo(void) {
	int i=0; /* our location in the table */
	while (1) /* always keep looping through the table */
	switch (table[i++]) { /* look at the next thing in the table */
	case 0: std::cout<<"Yo!\n"; break; /* single-Yo instruction */
	case 1: { /* multi-x instruction */
		int count=table[i++]; /* next byte is the x repeat count */
		for (int repeat=0;repeat<count;repeat++)
			std::cout<<'x'<<std::endl;
		break;
	}
	case 2: return 0; /* stop looping through the table */
	default:
		std::cout<<"Unrecognized table entry!\n";
		return -999;
	}
}

(Try this in NetRun now!)

Note that 0, a "Yo!" instruction, stands alone in the table, while 1, a "multi-x" instruction, takes two bytes, because the second byte is an x count. 2, the exit command, is also a single-byte instruction.
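
Because instructions have different lengths, you can only tell where each one starts by walking the table from the beginning. Here's a sketch that does just that, counting instructions without running them. The names yo_table, instruction_length, and count_instructions are made up for this example.

```cpp
/* Same bytes as the table above. */
const unsigned char yo_table[]={0, 1, 2, 0, 0, 2};

/* Length in bytes of the instruction that starts with this opcode, or -1 if unknown. */
int instruction_length(unsigned char opcode) {
	switch (opcode) {
	case 0: return 1; /* Yo! */
	case 1: return 2; /* multi-x: opcode plus count byte */
	case 2: return 1; /* exit */
	default: return -1;
	}
}

/* Count instructions up to and including the exit; returns -1 on an unknown opcode. */
int count_instructions(const unsigned char *code) {
	int i=0, count=0;
	while (1) {
		int len=instruction_length(code[i]);
		if (len<0) return -1;
		count++;
		if (code[i]==2) return count; /* exit instruction: stop here */
		i+=len; /* skip the opcode and any data bytes */
	}
}
```

Notice that the 2 at index 2 of the table is never treated as an exit: it's the count byte of the multi-x instruction, so the walk skips right over it. The same byte value means different things depending on where you start reading.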

You can of course use any numbers you like for the table values. Here's the same idea, but with x86-compatible instruction numbers:

const unsigned char table[]={
	0xb0, /*set x = ... */
	7, /*         ... this byte */
	0xc3 /* exit */
};

int foo(void) {
	int x=0;
	int i=0; /* our location in the table */
	while (1) /* always keep looping through the table */
	switch (table[i++]) { /* look at the next thing in the table */
	case 0xb0: { /* set-x instruction */
		x=table[i++]; /* next byte is the new value for x */
		break;
	}
	case 0xc3: return x; /* stop looping through the table */
	default:
		std::cout<<"Illegal instruction!\n";
		return -999;
	}
}

(Try this in NetRun now!)

Our table just has (8-bit) bytes in it, but sometimes we want to be able to set an entire (32-bit) int. The standard x86 solution is "little-endian" byte order: store the lowest byte (bits 0-7) first, then the not-so-low byte, the not-so-high byte, and finally the highest byte, like so:

const unsigned char table[]={
	0xb8, /* set x =... */
	4, /* low byte is 4 (that is, 0x04) */
	1, /* next byte is 1 (that is, 0x01) */
	0, /* highest two bytes are both zero */
	0,
	0xc3 /* return that */
};

int foo(void) {
	int x=0; /* register */
	int i=0;
	while (1) switch (table[i++]) {
	case 0xb8:
		x=table[i]|(table[i+1]<<8)|(table[i+2]<<16)|(table[i+3]<<24); 
		i+=4;
		break; 
	case 0xc3: return x;
	default:
		std::cout<<"Illegal instruction!\n";
		return -999;
	}
}

(Try this in NetRun now!)
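
The byte-assembling step is worth pulling out into its own function. Here's a sketch, with read_le32 as a made-up name. It does the arithmetic in an unsigned 32-bit type, which sidesteps any questions about shifting a byte of 0x80 or more into the sign bit of an int.

```cpp
#include <cstdint>

/* Assemble four table bytes, lowest byte first, into one 32-bit value. */
std::uint32_t read_le32(const unsigned char *p) {
	return  (std::uint32_t)p[0]
	     | ((std::uint32_t)p[1]<<8)
	     | ((std::uint32_t)p[2]<<16)
	     | ((std::uint32_t)p[3]<<24);
}
```

For the table above, the bytes 4, 1, 0, 0 come out as 0x104, which is 260 in decimal.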

Finally, it's handy to have several different registers, and to be able to load different values into these registers--here, we use the low bits of the 0xb8 instruction to determine which register to write to. We'd also like to be able to add our choice of registers together, which we can do in an "add" instruction by reading the register numbers from the next byte, which is called "modR/M" on x86:

const unsigned char table[]={
	0xb8,0x04,0x1,0,0, /* set register 0 to 0x104 */
	0xba,0x05,0x0,0,0, /* set register 2 to 0x5 */
	0x03,0302, /* add register 2 to register 0 */
	0xc3
};

int foo(void) {
	int regs[8]={0}; /* registers, all starting out at zero */
	int i=0;
	while (1) {
	  unsigned char instruction=table[i++];
	  switch (instruction) {
	  case 0x03: {
		int modRM=table[i++]; /* figure out what to add based on next byte */
		int S=modRM&7, D=(modRM>>3)&7, hi=modRM>>6;
		if (hi!=3) return -998; /* both high bits must be 1: that's the register-to-register form */
		regs[D] += regs[S]; break;
	  }
	  case 0xb8: case 0xb9: case 0xba: case 0xbb: case 0xbc: case 0xbd: case 0xbe: case 0xbf: 
		regs[instruction&7]=table[i]|(table[i+1]<<8)|(table[i+2]<<16)|(table[i+3]<<24); 
		i+=4; /* skip over entries in table we just read */
		break; 
	  case 0xc3: return regs[0];
	  default:
		std::cout<<"Unrecognized table entry!\n";
		return -999;
	} }
}

(Try this in NetRun now!)
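
The modR/M byte packs three fields into 8 bits: 2 bits, then 3 bits, then 3 bits. That's why the table writes it in octal, since each octal digit is exactly one field. Here's a sketch that splits the byte apart; the ModRM struct and decode_modrm are hypothetical names for this example.

```cpp
/* Hypothetical helper type holding the three modR/M bit fields. */
struct ModRM {
	int mod; /* top 2 bits: 3 means both operands are registers */
	int reg; /* middle 3 bits: destination register for opcode 0x03 */
	int rm;  /* low 3 bits: source register when mod is 3 */
};

ModRM decode_modrm(unsigned char b) {
	ModRM m;
	m.mod=b>>6;
	m.reg=(b>>3)&7;
	m.rm=b&7;
	return m;
}
```

Decoding 0302 gives mod 3, reg 0, rm 2: add register 2 into register 0, matching the comment in the table.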

And for the final horror: because above I "happened" to choose the same numbers as a real x86 CPU uses in its executable table (that is, x86 machine code), we can now execute our table directly on the hardware, rather than writing an interpreter. Don't worry about the syntax in foo; all the interesting stuff is happening in the bytes of the table, and the CPU hardware that they command!

const unsigned char table[]={
	0xb8,0x04,0x1,0,0, /* set register 0 to 0x104 */
	0xba,0x05,0x0,0,0, /* set register 2 to 0x5 */
	0x03,0302, /* add register 2 to register 0 */
	0xc3
};

int foo(void) {
	typedef int (*myFn)(void); /* defines a function type that returns int */
	myFn f=(myFn)table; /* make "table" into that function type */
	return f(); /* call our newly-defined function */
}

(Try this in NetRun now!)

The bottom line: this notion of "read a table to figure out what to do next" is very powerful--it's the basis for programmable computers, capable of computing anything that can be computed!