# Operand Forwarding Networks

CS 641 Lecture, Dr. Lawlor

This is an interesting paper comparing several radically different ways of moving data around a single chip:

M. Taylor, W. Lee, S. Amarasinghe, A. Agarwal, "Scalar Operand Networks", IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 2, Feb 2005.

One thing I really like about this paper is the notion that
superscalar, multicore, and distributed networks are all just different
approaches for shuffling data between arithmetic units.

## Multiplexers and Buses

There are many places on a CPU where you need to choose one output from several possible inputs:

- The result of each operation can be driven by any arithmetic unit.

- Each output from the register file can be driven by any register.

The usual way to do this is with a multiplexer, or "mux", typically built from logic gates.
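As a rough sketch of the logic-gate version, here is a one-bit mux written with Python's bitwise operators standing in for AND, OR, and NOT gates. The function names `mux2` and `mux4` are made up for this example, and building the 4-input mux as a tree of 2-input muxes is just one possible arrangement, not necessarily how any particular chip does it.

```python
def mux2(select: int, a: int, b: int) -> int:
    """One-bit 2-to-1 mux: returns a when select is 0, b when select is 1."""
    not_select = 1 - select               # NOT gate
    return (a & not_select) | (b & select)  # two AND gates feeding an OR gate


def mux4(select: int, inputs: list[int]) -> int:
    """One-bit 4-to-1 mux built from three 2-to-1 muxes; select is 0..3."""
    s0 = select & 1          # low select bit
    s1 = (select >> 1) & 1   # high select bit
    low = mux2(s0, inputs[0], inputs[1])
    high = mux2(s0, inputs[2], inputs[3])
    return mux2(s1, low, high)
```

For example, `mux4(2, [0, 0, 1, 0])` picks input number 2. Note that every input passes through its own AND gate whether or not it is selected, which is where the transistor cost comes from.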

But there's another implementation using an analog circuit trick: you
can tie together several output lines, as long as you only enable a
single output line at a time. The big difference is that each
output device can enter a high-impedance "disconnected" state, leading
to three modes: "on", "off", or "disconnected", hence the name tri-state logic.

These can be further simplified by noticing that the AND gates are only needed if the enable is true:

This tri-state solution only uses four transistors per output bit,
which is substantially fewer transistors than the mux approach (at
least 4 transistors per AND gate, plus the OR gate), and hence the
tri-state output bus solution is standard.
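The behavior of a shared tri-state bus can be modeled in a few lines. This is only a software sketch: `HIGH_Z`, `tri_state`, and `shared_bus` are names invented here, and real hardware does not raise an error when two drivers are enabled at once (it fights over the wire instead), so the exception is just a stand-in for "don't do that".

```python
from typing import Optional

HIGH_Z = None  # hypothetical marker for the high-impedance (disconnected) state


def tri_state(value: int, enable: bool) -> Optional[int]:
    """A tri-state driver: passes value through when enabled, else disconnects."""
    return value if enable else HIGH_Z


def shared_bus(outputs: list[Optional[int]]) -> Optional[int]:
    """Tie several tri-state outputs together onto one wire."""
    driving = [v for v in outputs if v is not HIGH_Z]
    if len(driving) > 1:
        # Stand-in for electrical contention between two enabled drivers.
        raise ValueError("bus contention: more than one driver enabled")
    return driving[0] if driving else HIGH_Z
```

With exactly one driver enabled, as in `shared_bus([tri_state(1, False), tri_state(0, True), tri_state(1, False)])`, the bus simply carries that driver's value. The selection happens through the enable lines rather than through a separate mux.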

If you look at the Intel 8008 microarchitecture, there's one big data bus running around the entire CPU.

## Simple CPU Design

Here's a basic register. It stores a value (displayed inside),
always outputs that value on the Q output, clears the value to zero with
the "0" input (bottom right side), and reads a new value from D when
you bring the "en" clock input high. I'm using an
"Input -> Button" as a momentary input to clock in values or reset
the value to zero. Click the image to get a Logisim .circ file to
play with.

Here's the same register's output fed into an adder, which is then fed
back into the register's input again. This circuit just doubles
the value every time you hit "clock in".
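Here is a small software model of that register-plus-adder loop. The `Register` class and `adder` function are hypothetical names, the 8-bit width is an assumption (the notes don't say how wide the Logisim register is), and the register is seeded with 1 because doubling zero just gives zero again.

```python
class Register:
    """Stores a value, shows it on q, and loads a new one when clocked."""

    def __init__(self, bits: int = 8):      # 8-bit width is an assumption
        self.mask = (1 << bits) - 1
        self.q = 0                           # the Q output

    def clear(self) -> None:
        """The "0" input: reset the stored value to zero."""
        self.q = 0

    def clock_in(self, d: int) -> None:
        """The "en" input: latch the value on D, truncated to the register width."""
        self.q = d & self.mask


def adder(a: int, b: int, bits: int = 8) -> int:
    """Fixed-width adder; the carry out of the top bit is dropped."""
    return (a + b) & ((1 << bits) - 1)


reg = Register()
reg.clock_in(1)                        # seed value, since 0 + 0 stays 0
history = []
for _ in range(4):                     # press "clock in" four times
    reg.clock_in(adder(reg.q, reg.q))  # Q feeds both adder inputs, sum feeds D
    history.append(reg.q)
# history is now [2, 4, 8, 16]
```

In this model, four more presses would overflow the 8-bit register and wrap the value back to zero.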

Of course, a real CPU has lots of operations, not just addition.
But we can support different operations by multiplexing which operation
we want, using a "Plexers -> Multiplexer" or "mux". The mux's
select input determines which operation we perform.
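A sketch of the operation mux, assuming just two operations (add and multiply) and an 8-bit data width. The name `alu` and the select encoding (0 for add, 1 for multiply) are choices made for this example only.

```python
def alu(op_select: int, a: int, b: int, bits: int = 8) -> int:
    """Compute every operation, then let the mux's select input pick one."""
    mask = (1 << bits) - 1
    results = [
        (a + b) & mask,   # select 0: adder output
        (a * b) & mask,   # select 1: multiplier output
    ]
    # As in hardware, both units always produce a result; the mux chooses.
    return results[op_select]
```

So `alu(0, 3, 4)` gives 7 and `alu(1, 3, 4)` gives 12. Unlike ordinary software, which would branch and compute only the requested operation, the circuit computes all of them every cycle and throws away the unselected results.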

Finally, a real CPU can store more than one value at once. We can
choose which register to use as an input, by using a mux the same
way. We can now poke the various buttons, and get addition or
multiplication on any of the stored register values, store the value
back to any register, and operate back and forth indefinitely.
Given a list of button values, this circuit can compute stuff.
It's a CPU!
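To make "a list of button values" concrete, here is a minimal simulation of that circuit. Everything about the encoding is an assumption for illustration: each step is a tuple `(op, src_a, src_b, dest)`, op 0 means add and op 1 means multiply, and values are 8 bits wide. The function name `run` is likewise made up.

```python
def run(program: list[tuple[int, int, int, int]],
        regs: list[int], bits: int = 8) -> list[int]:
    """Run a list of (op, src_a, src_b, dest) steps on a copy of the registers."""
    mask = (1 << bits) - 1
    regs = list(regs)                    # don't modify the caller's list
    for op, src_a, src_b, dest in program:
        a = regs[src_a]                  # input mux picks the first operand
        b = regs[src_b]                  # input mux picks the second operand
        results = [(a + b) & mask, (a * b) & mask]
        regs[dest] = results[op]         # operation mux, then clock into dest
    return regs


# r2 = r0 + r1, then r3 = r2 * r2
final = run([(0, 0, 1, 2), (1, 2, 2, 3)], [2, 3, 0, 0])
# final is [2, 3, 5, 25]
```

Each tuple plays the role of one round of button presses: set the selects, then hit the clock. A real CPU's instruction encoding is essentially this same list of mux selects packed into bits.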