Parallel Computing Theory

CS 441 Lecture, Dr. Lawlor

Fact: Amdahl's Law limits the speedup of a partially parallelized program to 1/F, where F is the fraction of the program's runtime that is still sequential, even assuming everything else is so parallel it takes no time at all.
    speedup = old time / new time = sequential time / parallel time
                    = 1 / ( F + (1-F)/P ) => 1/F     (as number of processors P approaches infinity)

Question: what fraction of real programs MUST be sequential?  I claim it's only a vanishingly small amount--the user interface probably should be sequential in places, since there's only one user (one mouse, one button).  So Amdahl's law is in some sense true but useless--if the sequential part of the code is a 2-ns button click, a 2-second program has F=1/billion, so the speedup is limited to one billion.  Big deal!

However, it really takes programming effort to parallelize some parts of the program--some algorithms are sequential (you need new algorithms), some hardware like I/O is sequential (you need better hardware), etc. 

Mathematically, since
    F proportional to 1/pain
then Ahmdal's law really says
    speedup is proportional to pain

Which is both a true and useful statement.

I/O Performance

One recurring problem in parallel computing is I/O: talking to the disk, network, or output devices.  Oddly, most real hardware has wildly different latency (time for one individual operation) and bandwidth (total bytes per second)--you might expect latency = 1/bandwidth, but in practice that first byte is way more expensive than the next bytes.  For example, loading a single byte from RAM requires the whole row to be addressed and sensed, so reading adjacent bytes gets much cheaper.  Sending one byte over the network requires an entire packet, so sending multiple bytes at once decreases the cost substantially.

Storage Device
Latency (sec/request)
Requests/sec
Bandwidth (bytes/sec)
Source
Floppy Drive
500ms
2
5KB/s - Horrible
Hazy memories
Modem
100ms (round trip)
10
5KB/s - Horrible "
DVD-ROM
150ms (seek time)
8
5MB/s - Sad
Experiment (in class)
Flash Drive
0.5ms
2000
13MB/s - Sad
Experiment (USB flashdrive)
Hard Disk
10ms
100
50MB/s - Decent
Experiment (laptop hard drive)
Gigabit Ethernet
100us
10,000
100MB/s - Decent
Experiment (powerwall)
RAM
50ns
20 million
2GB/s - Great Experiment (netrun Pentium 4)
Cache
1ns
1 billion
4GB/s - Great
Experiment (netrun Pentium 4)