Parallel Computing and MPI

CS 321 Lecture, Dr. Lawlor, 2006/03/10

This material is "bonus" material, in the sense that I won't ask about it on homeworks or tests.  If this sort of thing interests you, talk to me and I'll see about offering a class on parallel computing--possibly CS 421 (Distributed Operating Systems) or a CS 493 (Special Topics).

Motivation

Every year, chip designers are able to pack more and more transistors onto each piece of silicon.  But it's getting tougher and tougher to figure out where to add transistors to make a single fast processor faster.  You could add more cache, or more arithmetic logic, or more prediction/speculation/control logic, but single-processor machines are mostly limited today by the dependencies between instructions.

An obvious way to add value to a piece of silicon, then, is to put more processors on it.  This was first tried back in the late 1990's with "Simultaneous Multithreading" (SMT), sold by Intel under the brand name HyperThreading, where two hardware threads share one processor's arithmetic units.  The trouble with HyperThreading is that the sharing makes for a more complicated chip, and introduces pipelining problems that don't exist for independent processors.

Intel's latest chip is the Core series.  The name comes from the term "multicore", where two or more totally separate processors sit on one piece of silicon.  Apple is using multi-core chips, including the Core Duo, in all their latest designs.

Intel has publicly discussed moving to multicore systems since 1989.

History of Parallel Computing

Multiple processors running simultaneously means you're doing parallel computing.

The first major parallel machine was the ILLIAC IV, designed at the University of Illinois.

There's a "Top 500" list updated every 6 months of the fastest computers in the world.  Machines are measured in "flops": floating-point operations per second.  A typical desktop nowadays might get around 8 gigaflops with SSE instructions: 0.5ns per SSE add means 2 billion SSE adds per second, and since each SSE add operates on 4 floats, that's 8 gigaflops.

In 2002, the fastest machine in the world was the Japanese Earth Simulator, an NEC SX-6 style vector processor capable of 35 teraflops (TF).  The Earth Simulator cost around $400 million, and had "just" 5,120 processors.

In 2005, IBM's Blue Gene eclipsed the Earth Simulator.  Blue Gene uses custom (low-end!) IBM PowerPC 440 microprocessors on a special very high-density package--it's got about 128,000 processors, and is capable of 280 teraflops.

IBM is also putting a Power-family processor into the Cell chip for the PlayStation 3.

MPI: the Message Passing Interface

MPI is the standard interface for communicating over the network on a parallel machine.  Every big parallel machine has MPI installed, ready to go, and in daily use.  Back in the late 1990's people were still using other interfaces like PVM, and OpenMP is fairly popular for shared-memory programs (especially in Fortran), but the dominant interface on the biggest machines is invariably MPI.
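
To give a flavor of what MPI code looks like, here's a minimal sketch (my example, not from the lecture): each processor prints its rank, then processor 0 passes one integer to processor 1.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start up MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which processor am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processors total? */
        printf("Hello from processor %d of %d\n", rank, size);

        if (size >= 2) {  /* pass one message over the network */
            int msg;
            if (rank == 0) {
                msg = 1234;
                MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("Processor 1 got %d from processor 0\n", msg);
            }
        }
        MPI_Finalize();                         /* shut down MPI */
        return 0;
    }

You'd typically compile this with "mpicc" and launch it with something like "mpirun -np 2 ./hello", though the exact commands depend on the MPI installation.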

The official source of MPI documentation is the MPI standard itself.  You can also find tons of MPI tutorials on the web.  Here's a typical MPI tutorial.