Building a CPU From Scratch

Circuit Simulation via TkGate

We're really not going to do spend much time designing actual digital circuits (this isn't an EE course!), but it's important to understand the general way digital logic circuits work.

The easiest way to do this is to just play around with circuits for a few hours. I've prepared the examples below using TkGate, a UNIX-ey open source digital logic designer and simulator.

In Windows, TkGate is only available via the UNIX compatability layer Cygwin. I've prepared one 55MB .zip file with just TkGate and the stuff it requires, but if you want more of Cywgin you can do a Cygwin Raw Install (install from a normal mirror first, then grab TkGate from the www.cs.ubc.ca/local/software/cygwin mirror).
In Linux, TkGate is often available from your package manager (in Ubuntu, "sudo apt-get install tkgate").
In MacOS X, supposedly TkGate can be built from source, which you download here.

The TkGate Documentation is pretty good, and the program runs a simple tutorial when you start it.

Many other circuit simulators exist. In EE 341, I used "LogicWorks", which is a fine commercial program for Windows and MacOS. Sadly, translating circuits between two graphical simulator packages is only rarely possible!

Simple CPUs built from Digital Logic Circuits

Here are a few steps on the evolutionary path to a CPU:

First, an add circuit. We've set up two 8-bit hex input devices (DIP Switches) in TkGate, hooked them to a Make -> ALU -> Adder, and displayed both input and output binary data in 8-LED arrays. Click the above circuit to download the TkGate Verilog-style circuit description.

Here's almost the same circuit, except now we run the input values past a multiply circuit as well as the adder.

Now we've added a "bus". The little three-input triangles below each arithmetic unit are "tri-state buffer/drivers", which enable us to turn the add and multiply outputs on and off, and so combine the outputs of these two circuits. The idea is we turn on the output we want, which at the moment we do manually, by flipping the appropriate output switch. If you turn multiple outputs on at the same time, or turn no outputs on, TkGate shows a yellow indeterminate output state. In real circuits, driving a bus to several different values causes the buffer drivers to heat up and possibly fry themselves!

Now we add a set of "registers", which are just data storage elements. We hook up the output of each register to a tri-state driver, and use the drivers to select which register we want to read as input. We hook up the data input to each register to the arithmetic output bus. Finally, we can manually "clock in" the arithmetic result into any register (via the triangle-shaped register inputs).

Annoyingly, TkGate starts all the registers in indeterminate (yellow) state, rather than zero, so we have to manually clear the registers at simulated startup by flipping the register clear switch off and then on again.

OK, now we're approaching a real CPU! The only thing we're missing is a control unit that will flip all the switches appropriately to execute some instructions!

The big trapezoidal "0 D 3" modules are called decoders (also known as demultiplexors or demuxes). They take a binary input value (the red bus going in the top), and copy the left input to one of the four outputs along the bottom. We've wired these mux outputs to the input lines of our arithmetic bus, register output bus, and register input bus. This means that rather than manually flipping switches, we just need to load a binary number into the "instruction" register!

In this case, we've connected bits 1:0 (the low two bits) of the instruction to the arithmetic operation demux. So if the low bits are 00, the add circuit turns on. If the low bits are 01, the multiply circuit turns on. Bits 5:4 (the low two bits of the high hex digit) connect to the register output. So 00 means output register 0 to the arithmetic input bus, 01 means output register 1, 10 means output register 2, and 11 is an error. Finally, bits 7:6, the high two bits of the instruction, connect to the register input control lines. To write to register 0, the high bits should be zero, and so on. You need to manually flip the "write" switch to make the registers write (this avoids annoying circuit race conditions).
So overall, this CPU's instructions look like:
<destination register: 2 bits> <source register: 2 bits> <unused: 2 bits> <operation: 2 bits>
<constant: 8 bits>
For example, the instruction (hex) "40" (binary 0100 0000) writes to register 1, reads from register 0, and adds.

Here's a philosophically similar CPU, but I'm using multiplexors instead of tristate busses. You can get an 8-bit mux by double-clicking the mux, clicking the "Port" tab, double-clicking on an input (or output) pin, and typing in the bit width you want (possibly after shuffling dialog boxes). The above CPU uses bits 7:6 as the destination register, bit 5 is reserved, bit 4 is the arithmetic operation, and bits 3:2 and 1:0 are both input register numbers.
So this mux-based CPU's instructions look like:
<destination register: 2 bits> <reserved: 1 bit>
<operation: 1 bit>
<source register A: 2 bits> <source register B: 2 bits>
For example, the instruction (hex) "13" (binary 0001 0011) writes to register 0 the sum of register zero and register 3 (the constant).

CPU Design Philosophy

A CPU is actually a very simple circuit that continually does exactly two things:

Figure out what to do next. (fetch)
Do it. (decode and execute)

CPU's usually encode "what to do next" in a chunk of binary data called a machine language instruction, almost always stored in memory somewhere. CPUs usually have a special register, the "program counter" or "instruction pointer", that points to the address in memory with the instruction to execute next.

So "Figure out what to do next" just means "Load the next instruction from the memory address at the instruction pointer." The instruction pointer is usually moved to point to the next instruction as part of this "fetch" phase.

"Do it" just means to decode the machine language instruction and figure out what it's telling you to do, then actually do what it says (execute). We've basically got that under control in the TkGate circuit above.

A Trivially Simple CPU Emulator: UEMU

A full-fledged CPU emulator can be written in just a few dozen lines of C++. For example, here's the core of "UEMU", a silly little made-up CPU I wrote a few years ago:

int CPU::run(void) {
	while (1) {
		int inst=mem[regs[0xf]++]; /* instruction fetch */
		int opcode=(inst>>12)&0xf, i=(inst>>8)&0xf, 
		         j=(inst>> 4)&0xf, k=(inst>>0)&0xf;
		switch (opcode) { /* instruction decode */
		case 0x1: regs[i]=(inst&0xff); break; /* "load Immediate" */
		case 0xA: regs[i]=regs[j]+regs[k]; break; /* Addition */
		case 0xC: if (regs[j]>=regs[k]) regs[i]++; break; /* Cond */
		case 0xE: /* Emulator (or OS) call: */
			switch (i) {
			case 0x0: printf("%d%c",regs[k]," \n\t,"[j]); break;
			case 0x1: printf("Enter a value:\n");  
				scanf("%i",&regs[k]); break; 
			case 0xD: dump(inst&0xff); break; 
			case 0xE: return regs[k]; /* Emulator Exit */
			default: quit("Unknown OS-trap instruction 0x%X at 0x%X!\n",inst,regs[0xf]-1);
			};
			break;
		default: quit("Unknown instruction 0x%X at 0x%X!\n",inst,regs[0xf]-1);
		};
	}
}

(Try this in NetRun now!) or just download the source code and run it yourself.

"regs" is a little array of registers. "mem" is another little array of memory locations.

If you look at the decode step, it should eventually become clear that executing an instruction like "0x1234" will be an "Immediate" load instruction (case 0x1:), loading into register 0x2 the value 0x34.

Similarly, "0xA234" will Add into register 2 the contents of registers 3 and 4.

Memory

Used only for storing program instructions. UEMU is a bit odd in that memory consists of 16-bit instructions, not 8-bit bytes. Normally a single instruction (or integer) stored in memory spans several bytes, but not in UEMU!

Registers

There are 16 registers numbered 0 through F (in hex). Register F is the program counter--it stores the memory address of the next instruction to execute. Note that loading a new value into register F will cause a jump! All other registers are free for programmer use.

Instructions

The instruction encoding is chosen such that everything's a multiple of 4 bits long, which makes it easy to write machine language programs in hex. There are a ridiculously small number of instructions:

0x1ikk: IMMEDIATE. Assign the "immediate" 8-bit value kk to register i. For example, 0x1307 would assign the value 0x07 to register 3. Note that 0x1Fkk assigns to the program counter, and is hence a jump instruction.
0xAijk: ADD. Add registers j and k, and assign the result to register i. For example, if registers 2 and 3 hold values 0x40 and 0x50, 0xA123 puts the sum 0x90 into register 1. Note that 0xA111 doubles the value of register 1.
0xCijk: CONDITIONAL INCREMENT. If register j is greater than or equal to register k, increment register i. For example, if register 2 is 0x40 and register 3 is 0x50, 0xC232 increments register 2 to 0x40, while 0xC223 does nothing. Note that incrementing the program counter (0xCFjk) conditionally skips the next program instruction.
0xE0jk: EMULATOR OUTPUT. Print the value in register k in decimal, followed by a space if j==0, a newline if j==1, a tab if j==2, and a comma if j==3. For example, 0xE012 prints the value of register 2 in decimal followed by a newline.
0xE10k: EMULATOR INPUT. Read a value from standard input into register k. Accepts decimal, hex, and octal using the C convention. For example, 0xE105 loads a value into register 5.
0xEDkk: EMULATOR DUMP. Dump out the values of all the registers. If kk==0, dumps out the program memory as well. If kk=0xFF, does nothing.
0xEE0k: EMULATOR EXIT. Stop running the emulator, but print out the value in register k.

A tiny software emulator like this is a good way to get the feel for how to design a CPU instruction set.