Parallel & Distributed Computing

CS441 Lecture Notes - Ryan Turnquist


In keeping with the goal of making computing faster, designers have basically two choices: make a single-processor computer faster, or split the computation among multiple processors.

The first idea is what we have been talking about so far in this class, and it has been accomplished through pipelining, superscalar processors, out-of-order execution, caching and a host of other means. Unfortunately this approach has an upper limit on performance growth due to the physical properties of materials, and eventually we will reach that limit. The other option is to use multiple processors and split the workload among them. This is called parallel computing, and there are numerous ways of arranging it.


Architecture

Among other criteria, parallel computers can be divided by their memory architectures - shared memory and distributed memory. Shared memory parallel computers are defined by the ability of all CPUs to access all memory as a global address space. There are two main types of shared memory: uniform and non-uniform.

Shared Memory

Uniform Memory Access (UMA) is characterized by identical processors that all have equal access times to memory. Symmetric Multiprocessor (SMP) machines, like a dual-core processor, are the most common UMA machines today.

Non-Uniform Memory Access (NUMA) is characterized by CPUs that do not have equal access times to all of memory. NUMA machines are often built by physically linking multiple SMPs, and access across the link is slower than access to local memory. A multi-CPU computer (one with more than one actual piece of silicon) is typically a NUMA machine.

Cache coherency can be an issue and is usually solved at the hardware level with snooping or snarfing. Advantages of shared memory systems include ease of programming (as compared to distributed memory systems) and fast data sharing between CPUs. Scalability and cost are the biggest disadvantages of shared memory parallel computers: as the number of processors increases, the traffic to memory and cache grows geometrically.
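
As a rough sketch (my own example, not from the lecture), the following POSIX threads program shows what makes shared memory convenient: every thread reads and writes the same global counter directly, and only the synchronization (a mutex here) needs any thought - the hardware keeps the shared data coherent.

    /* Shared-memory sketch with POSIX threads: all threads see the same
       global counter; a mutex prevents lost updates. Compile with -lpthread. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define INCREMENTS  100000

    static long counter = 0;                 /* lives in the shared address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < INCREMENTS; i++) {
            pthread_mutex_lock(&lock);
            counter++;                       /* direct access to shared data */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        printf("counter = %ld\n", counter);  /* expect 4 * 100000 = 400000 */
        return 0;
    }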

Distributed Memory

Distributed memory systems are characterized by each processor having its own memory. Because of this, accessing another processor's memory is up to the programmer and results in non-uniform access times. These systems are much more scalable and cost effective but harder to program, since it is up to the programmer to handle communication among processors. Cache coherency, however, is not an issue.
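
For contrast, here is a minimal sketch of the distributed-memory style using MPI (one common message-passing library; the choice of MPI is mine, not the lecture's). Each process owns its own data, so moving a value to another process takes an explicit send and a matching receive.

    /* Distributed-memory sketch with MPI: each process has private memory,
       so data is exchanged with explicit messages. Run with: mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 42;                  /* exists only in rank 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }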


Organization

The organization of concurrent systems ranges from as tightly coupled as a multithreaded or multicore CPU to as loosely coupled as grid computing. The two main organizations I want to talk about are clusters and grid computing (distributed computing).

Clusters

Cluster computers are very tightly coupled computers that can be viewed as a single computer in most respects. Each node of the cluster is commonly connected to the others over a high-speed local area network. Clusters are highly economical and provide the benefits of performance or reliability more cost-efficiently than a comparable single computer. Scaling up is also much easier with a cluster than with other types of parallel computers. There are generally three categories of clusters: High Availability, High Performance and Load-Balancing.

  • High availability systems provide improved service availability in the event of failure. Extra nodes act as redundancy and can take over when other nodes fail.
  • High performance systems spread workload among several nodes and are most commonly used in scientific computing. The most popular of this type are Beowulf clusters, which are characterized by commodity equipment running a free UNIX-like OS and using open source software to handle communication and parallelism.
  • Load balancing clusters strive to spread service requests evenly among nodes to allow for high throughput. Requests go through a load balancing front end that divvies up the work to back end servers (a toy round-robin sketch follows this list). These clusters usually also implement some sort of high availability or failover features.
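
To make the front-end idea concrete, here is a toy round-robin dispatcher (the node names are made up for illustration); a real balancer would also track node health and current load.

    /* Toy round-robin dispatch, the simplest load-balancing policy:
       each incoming request is handed to the next back-end node in turn. */
    #include <stdio.h>

    static const char *backends[] = { "node-a", "node-b", "node-c" };
    #define NUM_BACKENDS (sizeof backends / sizeof backends[0])

    static const char *next_backend(void) {
        static unsigned next = 0;            /* remembers where we left off */
        return backends[next++ % NUM_BACKENDS];
    }

    int main(void) {
        for (int request = 1; request <= 7; request++)
            printf("request %d -> %s\n", request, next_backend());
        return 0;
    }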

Grid Computing (Distributed Computing)

Grid or distributed computing is very similar to cluster computing; in fact, clusters are basically a special case of distributed computing. The biggest differences are that in grid computing each node is usually not working only on group tasks, nodes do not fully trust one another, nodes are much more heterogeneous, and they are generally separated by geography. Distributed computing almost always falls into one of the following architectures: Client-Server, N-Tier, Tightly Coupled (Clustered) or P2P.

  • Client-Server: Exactly like it sounds - clients request data from a server and report any permanent data back (a bare-bones socket sketch follows this list).
  • N-Tier (Multitier): Like the Client-Server architecture, except one or more middle tiers coordinate between clients and servers.
  • Tightly Coupled: See Clusters
  • P2P: Peers act as clients and servers to other peers and no single overarching entity manages them.
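
As a concrete (if bare-bones) picture of the client-server case, the sketch below uses plain TCP sockets; the port number and messages are invented for illustration and error handling is omitted.

    /* Minimal client-server sketch over TCP (POSIX sockets). Run one copy as
       "./a.out server", then another with no arguments to act as the client. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define PORT 5000

    static void run_server(void) {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(PORT);
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, 1);

        int conn = accept(listener, NULL, NULL);   /* wait for one client */
        char req[64] = {0};
        read(conn, req, sizeof req - 1);           /* read the request    */
        write(conn, "42\n", 3);                    /* reply with the data */
        close(conn);
        close(listener);
    }

    static void run_client(void) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(PORT);
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        connect(sock, (struct sockaddr *)&addr, sizeof addr);

        write(sock, "GET result", 10);             /* request data from the server */
        char buf[64] = {0};
        read(sock, buf, sizeof buf - 1);
        printf("server replied: %s", buf);
        close(sock);
    }

    int main(int argc, char **argv) {
        if (argc > 1 && strcmp(argv[1], "server") == 0)
            run_server();
        else
            run_client();
        return 0;
    }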

Some examples include:

  • Folding@Home - Stanford University Chemistry Department's protein folding program aimed at finding cures for diseases such as Alzheimer's, cancer and cystic fibrosis.
  • SETI@Home - The Space Sciences Laboratory at UC Berkeley's Search for Extraterrestrial Intelligence, which analyzes radio transmissions detected by the Arecibo radio telescope.
  • BOINC - Also from the Space Sciences Laboratory at UC Berkeley; a platform for building distributed computing projects.
  • Distributed.net - Attempts to crack many common cryptographic ciphers
  • Many, many more - just do a Google search.

Disadvantages

While distributed computing allows users and computing resources all around the world to solve problems that would otherwise take an unreasonable amount of time, there are many pitfalls that make it difficult to implement.

Poor planning can lead to system unreliability if critical nodes fail or if synchronization is inadequate. Troubleshooting also becomes much more difficult when dealing with many different systems and platforms. Many problems are not suited for distributed computation, especially those with low parallelism or high amounts of communication, which leads to the problem of overhead: if the bandwidth is too low or the communication needs are too high, the overall efficiency and performance gain may be no better than computing on a different type of system. Proprietary data generated by volunteers could also pose a legal problem. Another major problem is the amount of raw data that is generated (especially in biogenetic problems), which is meaningless until it is sifted through and presented in a usable way - something that could take longer than the actual problem solving.
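
To see how overhead eats the gain, here is a crude back-of-the-envelope model (my own simplification, not a formula from the lecture): the parallel part of the work shrinks with the number of workers, but a fixed communication cost is paid per worker, so the speedup eventually peaks and then falls.

    /* Crude speedup model: serial work + (parallel work / p) + per-worker
       communication cost. With heavy communication, more workers stop helping. */
    #include <stdio.h>

    static double speedup(double serial, double parallel, double comm_per_worker, int p) {
        double t1 = serial + parallel;                           /* one processor          */
        double tp = serial + parallel / p + comm_per_worker * p; /* p distributed workers  */
        return t1 / tp;
    }

    int main(void) {
        /* 10 units of serial work, 990 parallel, 2 units of communication per worker */
        for (int p = 1; p <= 64; p *= 2)
            printf("p = %2d  speedup = %5.2f\n", p, speedup(10.0, 990.0, 2.0, p));
        return 0;
    }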

Along with conceptual problems, there are many implementation problems, which can mostly be summarized by the eight fallacies of distributed computing attributed to Peter Deutsch and James Gosling. Many of these seem to be networking issues, but what is distributed computing without the network…

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Implementation

There are many interesting ideas about making distributed computing work and trying to solve the network issues listed above.

Remote Procedure Calls (RPC) - This is a way to call procedures on a networked machine without the hassle of explicitly coding the interaction between the computers. It gives the programmer the ability to call a remote procedure the same way they would call a local one.
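
The following toy (entirely made up, not a real RPC library) shows the idea: the caller writes rpc_add(2, 3) as if it were local, while a stub marshals the arguments into a message and unmarshals the reply; in a real system the message would travel over the network instead of through a local function call.

    /* Toy illustration of the RPC idea: the stub hides the plumbing so the
       call site looks local. The "server" here is simulated in-process. */
    #include <stdio.h>

    /* ---- pretend server side: unpack request, run the procedure, fill reply ---- */
    static void server_dispatch(const int *request, int *reply) {
        int a = request[0], b = request[1];
        *reply = a + b;                       /* the actual remote procedure */
    }

    /* ---- client-side stub: looks like a local call, hides the network ---- */
    static int rpc_add(int a, int b) {
        int request[2] = { a, b };            /* marshal the arguments       */
        int reply;
        server_dispatch(request, &reply);     /* stand-in for a network hop  */
        return reply;                         /* unmarshal the result        */
    }

    int main(void) {
        printf("rpc_add(2, 3) = %d\n", rpc_add(2, 3));
        return 0;
    }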

Remote Method Invocation (RMI) - This is the object-oriented version of RPC and was originally developed for Java. No surprise there: if you read about how the fallacies list came about, you get a sense that the Sun people have been striving for this kind of thing.

While nearly any programming language with enough access to the system can be used for parallel computing, and specifically distributed computing, there are many, many languages tailored just for this. Many of them are adaptations of C/C++ or Java, and some have gone the other way - for example, Alef, the concurrent language from Bell Labs' Plan 9, was eventually replaced by a thread library for C.