Parallel vs. Serial Computation

CS 301 Lecture, Dr. Lawlor

Normal microprocessors have been getting faster and faster partly by getting denser and denser.  The 1971 Intel 4004 used just 2,300 transistors; a modern Core CPU uses hundreds of millions of transistors.  Note that both chips are about the same physical size (a fraction of an inch across), but modern transistors are a lot tinier than 1970s transistors.

Since the 1970s, chip companies have used those extra transistors to extract parallelism from the stream of incoming instructions, by doing pipelining, superscalar execution, out-of-order processing, and caching of loads and stores.  They've been remarkably successful--modern CPUs are amazingly fast, and figure out how to do dozens of things at once even though machine code is still written to a serial "one instruction at a time" model.
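For example, here's a tiny C++ sketch (just an illustration, not from any real program) of the kind of independent work the hardware can overlap on its own:

    #include <iostream>

    int main(void) {
        int a = 2, b = 3, c = 4, d = 5;

        // These two additions don't depend on each other, so a superscalar,
        // out-of-order CPU can execute them in the same clock cycle.
        int x = a + b;
        int y = c + d;

        // This addition depends on both results above, so the hardware has
        // to wait for them before it can finish.
        int z = x + y;

        std::cout << z << "\n"; // prints 14
        return 0;
    }

The CPU finds this kind of parallelism automatically--the programmer never asked for it.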

But a single silicon chip can now contain so many transistors that nobody knows how to use them all to make a single CPU faster.  In other words, that single instruction stream is such a bottleneck that chip designers can't figure out how to do anything else to make a single program faster.  So chip companies have started introducing chips with more than one CPU (multiple 'cores') together on a single piece of silicon (a single 'die').  Intel started doing this with the Core Duo; they've recently introduced the Core 2 Quad, with 4 CPUs on one piece of silicon.
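To actually use that second (or fourth) core, though, the program itself has to split up its work.  Here's a minimal sketch using std::thread (a C++ threading facility newer than this lecture; pthreads would do the same job):

    #include <iostream>
    #include <thread>

    // Sum n ints starting at data, writing the result to *out.
    void half_sum(const int *data, int n, long *out) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += data[i];
        *out = sum;
    }

    int main(void) {
        const int n = 1000000;
        static int data[n];
        for (int i = 0; i < n; i++) data[i] = 1;

        long sumA = 0, sumB = 0;
        // Each half of the work can run on its own core.
        std::thread t(half_sum, data, n / 2, &sumA);
        half_sum(data + n / 2, n - n / 2, &sumB); // main thread does the other half
        t.join();

        std::cout << (sumA + sumB) << "\n"; // prints 1000000
        return 0;
    }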

By contrast, graphics cards have been using the extra transistors to add 'pixel processors' for a long time.  'Pixel processors' are just tiny, cheap CPUs with a fast and graphics-friendly instruction set (for example, they usually don't even have integers, only floating point!).  The nVidia GeForce 8800 has 128 pixel processors that all run simultaneously; my laptop's graphics card, by contrast, has just 4 pixel processors.  The same graphics code can trivially run on both cards.

Graphics processors run "pixel programs" written in a C++ish "shader language"; normal CPUs run normal C++ programs.  The big difference is that a "pixel program" knows it's just one pixel of many, and thus expects to run alongside many other pixels, in parallel.  A C++ program, by contrast, represents one complete program--and if you're only running one program, you only need 1 CPU.
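As a rough illustration (plain C++ standing in for a real shader language here), the "pixel program" is just the per-pixel body; on a CPU you wrap it in a serial loop yourself, while a GPU runs many copies of it at the same time:

    #include <iostream>

    // The "pixel program" view: this function shades exactly one pixel,
    // and knows nothing about its neighbors.  On a graphics card, the
    // hardware runs many copies of a function like this simultaneously.
    float shade_pixel(int x, int y) {
        return 0.5f * x + 0.5f * y; // some made-up per-pixel math
    }

    int main(void) {
        const int w = 4, h = 4;
        float image[h][w];

        // The serial C++ view: one CPU walks over every pixel, one at a time.
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                image[y][x] = shade_pixel(x, y);

        std::cout << image[h - 1][w - 1] << "\n"; // prints 3
        return 0;
    }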

One of the biggest challenges facing computer science today is how to deal with shifting from serial C++ code to parallel machines. 

Serial: One thing happens at a time.
Parallel: Multiple things happen at once.

Serial: A .22 rifle--lots of little cartridges, one bullet each.  8 holes sounds like: Plink! Plink! Plink! Plink! Plink! Plink! Plink! Plink!
Parallel: 12-gauge buckshot--one big cartridge, lots of bullets in each.  8 holes sounds like: Ca-chunk!

Serial: Normal CPU instructions, like "mov" or "add".  Each instruction does one operation on one value.
Parallel: "SSE" instructions, like "movps" or "addps".  Each instruction does one operation on several floats simultaneously (see the SSE sketch after this table).

Serial: A CPU running a C++ program starts at the beginning, does all the instructions one at a time, and runs to the end.
Parallel: A graphics card (GPU) shading an area has lots of pixels to shade, so it runs the 'pixel program' on each pixel simultaneously.

Serial: One normal CPU running one program.
Parallel: A thousand normal CPUs running a thousand copies of one program, working together on one problem.
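Here's the SSE sketch promised above, written with the compiler intrinsics from <xmmintrin.h> rather than raw assembly; the _mm_add_ps call below compiles down to a single "addps" instruction (the intrinsic names are standard, but the surrounding example is just an illustration):

    #include <iostream>
    #include <xmmintrin.h>

    int main(void) {
        // Serial: one add works on one value.
        float a = 1.0f, b = 10.0f;
        float c = a + b;

        // SSE: one addps works on four floats at the same time.
        float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float z[4];
        __m128 vx = _mm_loadu_ps(x);    // movups: load 4 floats at once
        __m128 vy = _mm_loadu_ps(y);
        __m128 vz = _mm_add_ps(vx, vy); // addps: 4 additions in one instruction
        _mm_storeu_ps(z, vz);

        std::cout << c << "  " << z[0] << " " << z[1] << " "
                  << z[2] << " " << z[3] << "\n"; // 11  11 22 33 44
        return 0;
    }

Compile with optimization turned on and look at the generated assembly to spot the addps.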

I'd like to cover SSE and graphics card programming briefly over the next few days.