Building a CPU From Scratch

CS 441 Lecture, Dr. Lawlor

Circuit Simulation via TkGate

We're really not going to spend much time designing actual digital circuits (this isn't an EE course!), but it's important to understand in general how digital logic circuits work.

The easiest way to do this is to just play around with circuits for a few hours.  I've prepared the examples below using TkGate, a UNIX-ey open source digital logic designer and simulator.
The TkGate Documentation is pretty good, and the program runs a simple tutorial when you start it.

Many other circuit simulators exist.  In EE 341, I used "LogicWorks", which is a fine commercial program for Windows and MacOS.  Sadly, translating circuits between two graphical simulator packages is only rarely possible!

Simple CPUs built from Digital Logic Circuits

Here are a few steps on the evolutionary path to a CPU:
adder circuit
First, an add circuit.  We've set up two 8-bit hex input devices (DIP Switches) in TkGate, hooked them to a Make -> ALU -> Adder, and displayed both input and output binary data in 8-LED arrays.  Click the above circuit to download the TkGate Verilog-style circuit description.

add and multiply
Here's almost the same circuit, except now we run the input values past a multiply circuit as well as the adder.

tri-state bus circuit
Now we've added a "bus".  The little three-input triangles below each arithmetic unit are "tri-state buffer/drivers", which enable us to turn the add and multiply outputs on and off, and so combine the outputs of these two circuits.  The idea is we turn on the output we want, which at the moment we do manually, by flipping the appropriate output switch.  If you turn multiple outputs on at the same time, or turn no outputs on, TkGate shows a yellow indeterminate output state.  In real circuits, driving a bus to several different values causes the buffer drivers to heat up and possibly fry themselves!
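To make the bus rule concrete, here's a minimal C++ sketch of what the simulator is doing when it resolves the bus. It is not part of the TkGate files; the names `BusValue` and `resolve_bus` are made up for illustration, and it assumes just the two drivers (add and multiply) described above. As a simplification, it treats any double-enable as indeterminate, which matches what TkGate displays.

```cpp
#include <cstdint>

// Result of reading the shared bus: a definite 8-bit value, or "indeterminate".
// (Hypothetical type, for illustration only.)
struct BusValue {
    bool valid;          // false models TkGate's yellow indeterminate state
    std::uint8_t value;  // meaningful only when valid is true
};

// Resolve a two-driver bus.  Each tri-state driver puts its input on the bus
// only while its enable line is on; otherwise it is disconnected.
constexpr BusValue resolve_bus(std::uint8_t add_out, bool add_enable,
                               std::uint8_t mul_out, bool mul_enable) {
    return (add_enable && !mul_enable) ? BusValue{true, add_out}
         : (mul_enable && !add_enable) ? BusValue{true, mul_out}
         : BusValue{false, 0};  // no driver, or two fighting drivers
}
```

With inputs 3 and 4, the adder produces 7 and the multiplier 12, so `resolve_bus(7, true, 12, false)` reads back a valid 7, while enabling both drivers (or neither) gives `valid == false`.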

CPU with manual control
Now we add a set of "registers", which are just data storage elements.  We hook up the output of each register to a tri-state driver, and use the drivers to select which register we want to read as input.  We hook up the data input to each register to the arithmetic output bus.  Finally, we can manually "clock in" the arithmetic result into any register (via the triangle-shaped register inputs).

Annoyingly, TkGate starts all the registers in indeterminate (yellow) state, rather than zero, so we have to manually clear the registers at simulated startup by flipping the register clear switch off and then on again.

OK, now we're approaching a real CPU!  The only thing we're missing is a control unit that will flip all the switches appropriately to execute some instructions!

CPU with instruction decode
The big trapezoidal "0  D  3" modules are called decoders (also known as demultiplexors or demuxes).  They take a binary input value (the red bus going in the top), and copy the left input to one of the four outputs along the bottom.  We've wired these demux outputs to the control lines of our arithmetic bus, register output bus, and register input bus.  This means that rather than manually flipping switches, we just need to load a binary number into the "instruction" register!
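In software terms, a decoder like this turns a 2-bit number into a "one-hot" set of four lines. Here's a rough sketch, assuming a 2-bit select and a single enable input; the function name `decode2to4` and the choice to pack the four output lines into the low bits of one byte are mine, not TkGate's.

```cpp
#include <cstdint>

// 2-to-4 decoder: routes `enable` to exactly one of four output lines.
// Bit n of the result stands for output line n; all other lines stay off.
constexpr std::uint8_t decode2to4(unsigned select, bool enable) {
    return enable ? static_cast<std::uint8_t>(1u << (select & 3u))
                  : std::uint8_t{0};
}
```

So `decode2to4(2, true)` is binary 0100 (only line 2 on), and with the enable input off every line is off regardless of the select value.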

In this case, we've connected bits 1:0 (the low two bits) of the instruction to the arithmetic operation demux.  So if the low bits are 00, the add circuit turns on.  If the low bits are 01, the multiply circuit turns on.  Bits 5:4 (the low two bits of the high hex digit) connect to the register output.  So 00 means output register 0 to the arithmetic input bus, 01 means output register 1, 10 means output register 2, and 11 is an error.  Finally, bits 7:6, the high two bits of the instruction, connect to the register input control lines.  To write to register 0, the high bits should be zero, and so on.  You need to manually flip the "write" switch to make the registers write (this avoids annoying circuit race conditions).
So overall, this CPU's instructions look like:
    <destination register: 2 bits>  <source register: 2 bits>  <unused: 2 bits> <operation: 2 bits>
    <constant: 8 bits>
For example, the instruction (hex) "40" (binary 0100 0000) writes to register 1, reads from register 0, and adds.
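The same field layout can be written as a few shifts and masks. This is only a sketch of the first instruction byte as described above (it ignores the constant); `Fields` and `decode_fields` are hypothetical names.

```cpp
#include <cstdint>

// The three fields of the tri-state CPU's instruction byte.
struct Fields {
    unsigned dest;  // bits 7:6, register to write
    unsigned src;   // bits 5:4, register to read
    unsigned op;    // bits 1:0, arithmetic operation (00 add, 01 multiply)
};

// Pull the fields out of an instruction; bits 3:2 are unused and ignored.
constexpr Fields decode_fields(std::uint8_t inst) {
    return Fields{ (inst >> 6) & 3u, (inst >> 4) & 3u, inst & 3u };
}
```

For the hex "40" example, `decode_fields(0x40)` gives dest 1, src 0, op 0. By the same rule, hex "91" (binary 1001 0001) would give dest 2, src 1, op 1 (multiply).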
CPU with instruction decode
Here's a philosophically similar CPU, but I'm using multiplexors instead of tri-state buses. You can get an 8-bit mux by double-clicking the mux, clicking the "Port" tab, double-clicking on an input (or output) pin, and typing in the bit width you want (possibly after shuffling dialog boxes). The above CPU uses bits 7:6 as the destination register, bit 5 is reserved, bit 4 is the arithmetic operation, and bits 3:2 and 1:0 are the two input register numbers.
So this mux-based CPU's instructions look like:
    <destination register: 2 bits>  <reserved: 1 bit>  <operation: 1 bit>  <source register A: 2 bits>  <source register B: 2 bits>
For example, the instruction (hex) "13" (binary 0001 0011) writes to register 0 the sum of register 0 and register 3 (the constant).
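A mux is the hardware version of an array index: the select bits pick which input passes through. Below is a small sketch of how the two source fields of this instruction format would steer two 4-to-1 muxes. The names (`mux4`, `Operands`, `select_operands`, `dest_register`, `operation_bit`) are invented for this example, and I deliberately leave the meaning of the operation bit uninterpreted, since that depends on how the circuit is wired.

```cpp
#include <cstdint>

// 4-to-1 mux: the 2-bit select picks which of the four inputs passes through.
constexpr std::uint8_t mux4(std::uint8_t in0, std::uint8_t in1,
                            std::uint8_t in2, std::uint8_t in3, unsigned sel) {
    return (sel & 3u) == 0 ? in0
         : (sel & 3u) == 1 ? in1
         : (sel & 3u) == 2 ? in2
         : in3;
}

// The two values fed to the arithmetic units.
struct Operands { std::uint8_t a, b; };

// Bits 3:2 select source register A, bits 1:0 select source register B.
constexpr Operands select_operands(std::uint8_t inst,
                                   std::uint8_t r0, std::uint8_t r1,
                                   std::uint8_t r2, std::uint8_t r3) {
    return Operands{ mux4(r0, r1, r2, r3, (inst >> 2) & 3u),
                     mux4(r0, r1, r2, r3, inst & 3u) };
}

// Bits 7:6 name the destination register; bit 4 is the operation select.
constexpr unsigned dest_register(std::uint8_t inst) { return (inst >> 6) & 3u; }
constexpr unsigned operation_bit(std::uint8_t inst) { return (inst >> 4) & 1u; }
```

For the hex "13" example with registers holding 10, 20, 30, and 40, the muxes deliver a = 10 (register 0) and b = 40 (register 3), the destination is register 0, and the operation bit is 1.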

CPU Design Philosophy

A CPU is actually a very simple circuit that continually does exactly two things:
  1. Figure out what to do next. (fetch)
  2. Do it. (decode and execute)
CPUs usually encode "what to do next" in a chunk of binary data called a machine language instruction, almost always stored in memory somewhere.  CPUs usually have a special register, the "program counter" or "instruction pointer", that holds the memory address of the instruction to execute next.

So "Figure out what to do next" just means "Load the next instruction from the memory address at the instruction pointer."  The instruction pointer is usually moved to point to the next instruction as part of this "fetch" phase.

"Do it" just means to decode the machine language instruction and figure out what it's telling you to do, then actually do what it says (execute).  We've basically got that under control in the TkGate circuit above.

A Trivially Simple CPU Emulator: UEMU

A full-fledged CPU emulator can be written in just a few dozen lines of C++.  For example, here's the core of "UEMU", a silly little made-up CPU I wrote a few years ago:
    int CPU::run(void) {
        while (1) {
            int inst=mem[regs[0xf]++]; /* instruction fetch */
            int opcode=(inst>>12)&0xf, i=(inst>>8)&0xf,
                j=(inst>> 4)&0xf, k=(inst>>0)&0xf;
            switch (opcode) { /* instruction decode */
            case 0x1: regs[i]=(inst&0xff); break; /* "load Immediate" */
            case 0xA: regs[i]=regs[j]+regs[k]; break; /* Addition */
            case 0xC: if (regs[j]>=regs[k]) regs[i]++; break; /* Cond */
            case 0xE: /* Emulator (or OS) call: */
                switch (i) {
                case 0x0: printf("%d%c",regs[k]," \n\t,"[j]); break;
                case 0x1: printf("Enter a value:\n");
                          scanf("%i",&regs[k]); break;
                case 0xD: dump(inst&0xff); break;
                case 0xE: return regs[k]; /* Emulator Exit */
                default: quit("Unknown OS-trap instruction 0x%X at 0x%X!\n",inst,regs[0xf]-1);
                }
                break; /* done with the emulator call */
            default: quit("Unknown instruction 0x%X at 0x%X!\n",inst,regs[0xf]-1);
            }
        }
    }
(Try this in NetRun now!) or just download the source code and run it yourself.

"regs" is a little array of registers.  "mem" is another little array of memory locations.

If you look at the decode step, it should eventually become clear that executing an instruction like "0x1234" will be an "Immediate" load instruction (case 0x1:), loading into register 0x2 the value 0x34.

Similarly, "0xA234" will Add into register 2 the contents of registers 3 and 4.
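If the hex-digit bookkeeping isn't obvious yet, here's the decode step pulled out on its own, using the same shifts and masks as the emulator core. The struct and function names (`UemuFields`, `split_instruction`) are mine, not part of UEMU.

```cpp
// The pieces of a 16-bit UEMU instruction, one hex digit per field.
struct UemuFields {
    int opcode;  // top hex digit
    int i;       // second hex digit, usually the destination register
    int j;       // third hex digit
    int k;       // low hex digit
    int imm8;    // low two hex digits together, used by "load Immediate"
};

// Split an instruction exactly the way the emulator's decode step does.
constexpr UemuFields split_instruction(int inst) {
    return UemuFields{ (inst >> 12) & 0xf, (inst >> 8) & 0xf,
                       (inst >> 4) & 0xf,  inst & 0xf,
                       inst & 0xff };
}
```

So `split_instruction(0x1234)` has opcode 0x1, i = 2, and imm8 = 0x34 (load 0x34 into register 2), while `split_instruction(0xA234)` has opcode 0xA, i = 2, j = 3, k = 4 (register 2 gets register 3 plus register 4).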


Memory is used only for storing program instructions.  UEMU is a bit odd in that memory consists of 16-bit instructions, not 8-bit bytes.  Normally a single instruction (or integer) stored in memory spans several bytes, but not in UEMU!


There are 16 registers numbered 0 through F (in hex).  Register F is the program counter--it stores the memory address of the next instruction to execute.  Note that loading a new value into register F will cause a jump!  All other registers are free for programmer use.
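To see the "loading register F is a jump" trick in action, here's a cut-down sketch of the emulator that handles only the load Immediate (0x1) and Add (0xA) opcodes and runs a five-word program. The `Machine` struct, `step`, and `run_demo` are made up for this example, and running a fixed number of steps stands in for a real exit instruction.

```cpp
#include <cstdint>

// Hypothetical cut-down machine state: 16 registers, 8 words of memory.
struct Machine {
    int regs[16];
    std::uint16_t mem[8];
};

// Execute one instruction, handling only the two opcodes needed here.
constexpr void step(Machine &m) {
    int inst = m.mem[m.regs[0xf]++];  // fetch, then advance the program counter
    int opcode = (inst >> 12) & 0xf, i = (inst >> 8) & 0xf;
    int j = (inst >> 4) & 0xf, k = inst & 0xf;
    if (opcode == 0x1) m.regs[i] = inst & 0xff;                  // load Immediate
    else if (opcode == 0xA) m.regs[i] = m.regs[j] + m.regs[k];   // Addition
}

// Run the demo program and return what ends up in register 3.
constexpr int run_demo() {
    Machine m = { {0}, {
        0x1105,  // address 0: r1 = 5
        0x1F03,  // address 1: rF = 3, i.e. jump to address 3
        0x1163,  // address 2: r1 = 0x63 (skipped by the jump)
        0x1207,  // address 3: r2 = 7
        0xA312   // address 4: r3 = r1 + r2
    } };
    for (int n = 0; n < 4; n++) step(m);  // four instructions actually execute
    return m.regs[3];
}
```

Because the instruction at address 2 is jumped over, register 1 still holds 5 when the add runs, so `run_demo()` returns 12.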


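The instruction encoding is chosen such that everything's a multiple of 4 bits long, which makes it easy to write machine language programs in hex.  There are a ridiculously small number of instructions; the emulator core shown above decodes these four opcodes:
    0x1 i <8-bit value>: load Immediate, regs[i] = low 8 bits of the instruction
    0xA i j k: Add, regs[i] = regs[j] + regs[k]
    0xC i j k: Conditional, increment regs[i] if regs[j] >= regs[k]
    0xE i j k: Emulator (or OS) call, selected by i (print, read, dump, or exit)

A tiny software emulator like this is a good way to get the feel for how to design a CPU instruction set.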