Multi-Core Computing Systems


Introduction

Traditionally, the primary way to increase the performance of a computing system was to increase the clock frequency of the processor. Manufacturers were able to keep pace with Moore's Law; however, in a 2006 whitepaper even Intel admitted that continuing along the same line of development was impractical, due to the power required by such systems and the resulting heat generation. To meet the growing demand for high-speed processing, microprocessor developers came up with “Multi-Core” as their solution.


Origins

Symmetric Multi-Processing (SMP) has been available in computing systems for a long time. In fact, by 1975 Honeywell's Multics systems boasted as many as four “central processors”.


The idea behind SMP is that two or more independent processors are connected in parallel to RAM and other system resources via a shared bus or crossbar.


Unfortunately, seeing performance benefits from multiple processors is not as simple as plugging in new hardware.


Operating systems must support multiple processors by scheduling the execution of processes on the available CPUs. Most modern operating systems support SMP. From the perspective of the operating system, two separate “sockets” each holding a single-core CPU look the same as one “socket” holding a dual-core CPU.
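
As an aside, a program can ask the operating system how many logical processors it sees, and the count makes no distinction between sockets and cores. Here is a minimal sketch for a Linux/glibc system (the _SC_NPROCESSORS_ONLN query is a POSIX-optional extension, so its availability is an assumption):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Number of processors currently online as seen by the scheduler;
       two single-core sockets and one dual-core socket both report 2. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online logical processors: %ld\n", n);
    return 0;
}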


Applications must also support multiple execution paths (threads) to gain the benefit of parallel processing.


Theory


Consider a very basic counting algorithm which, given a set of integers and a “target” integer, returns the number of times the target is found in the set. Every time the target is found, that element is set to zero in the original set.


int count(int *set, size_t n, int target)
{
    int matches = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (set[i] == target)
        {
            matches++;      /* tally the match */
            set[i] = 0;     /* zero out the matched element */
        }
    }

    return matches;
}


This is a simple serial algorithm. So how could we make it take advantage of multiple cores?


The key to developing a parallel algorithm is being able to break the problem down into multiple independent sub-problems. In this case we could have three separate threads run the same counting algorithm on one third of the “set” each, and then add up the counts returned by the threads, as in the sketch below.
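
A minimal sketch of that decomposition using POSIX threads follows; the struct slice type, the parallel_count wrapper, and the choice of three threads are illustrative assumptions, not part of the original algorithm:

#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 3

/* Arguments and result for one worker: a slice of the set. */
struct slice
{
    int *set;
    size_t n;
    int target;
    int matches;    /* per-thread count, summed by the caller */
};

/* Each thread runs the same serial algorithm on its own slice. */
static void *count_slice(void *arg)
{
    struct slice *s = arg;
    s->matches = 0;
    for (size_t i = 0; i < s->n; i++)
    {
        if (s->set[i] == s->target)
        {
            s->matches++;
            s->set[i] = 0;
        }
    }
    return NULL;
}

int parallel_count(int *set, size_t n, int target)
{
    pthread_t tid[NUM_THREADS];
    struct slice s[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    int total = 0;

    for (int t = 0; t < NUM_THREADS; t++)
    {
        s[t].set = set + t * chunk;
        /* the last thread also takes any remainder */
        s[t].n = (t == NUM_THREADS - 1) ? n - t * chunk : chunk;
        s[t].target = target;
        pthread_create(&tid[t], NULL, count_slice, &s[t]);
    }

    for (int t = 0; t < NUM_THREADS; t++)
    {
        pthread_join(tid[t], NULL);
        total += s[t].matches;    /* reassemble the results */
    }

    return total;
}

Because the slices do not overlap, no locking is needed: each thread writes only to its own elements and its own counter.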


This is great if you are using an array to store your data, but what if you are using a linked list? A linked list cannot be indexed into the middle, so each thread must walk from the first node just to reach its portion of the list (see the sketch below), and much of the ability to process the problem in parallel is lost. So even with a great parallel algorithm, your data storage mechanism can impact your application's performance!
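
To make that traversal cost concrete, here is a hypothetical node type and the walk each thread would need just to find its starting point; advance() is an illustrative helper, and its cost grows with k because every step depends on the one before it:

#include <stddef.h>

struct node
{
    int value;
    struct node *next;
};

/* Reaching the k-th node means walking from the head: O(k) serial
   work that must happen before any parallel counting can begin. */
static struct node *advance(struct node *head, size_t k)
{
    while (head != NULL && k-- > 0)
        head = head->next;
    return head;
}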


Overhead is another consideration which must be addressed: coordinating the threads within an application and reassembling their results requires a certain amount of processing in its own right.


More Problems


Cache coherency is a big concern for hardware and application designers. Given the slow response time inherent in most DRAM architectures relative to CPU clock speeds, proper use of the memory cache is important to performance.


Generally, each core has a separate L1 cache and all cores share a single L2 cache (or at least that is how Intel does it).


Consider the following example:

  1. Core 1 reads and caches variable X

  2. Core 2 reads and caches variable X

  3. Core 1 writes and caches a new value for variable X

  4. Core 2 reads variable X from its own cache


Since the L1 cache for each core is separate, there must be a method to keep all caches synchronized. Otherwise, as in the example, the thread running on Core 2 would continue running with stale data.


Two common methods for keeping data uniform across multiple caches are “update” and “invalidate”.


In an update-based system, each cache informs all other caches whenever a write to memory is performed, so that every cache is updated with the new value.


In an invalidate-based system, when a single core writes to memory, an invalidate message is sent to all other caches informing them that their copy is stale and that they must fetch the new value from RAM on the next read attempt.
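
The following toy model sketches the invalidate approach for the four-step example above; the two-core layout and write-through behavior are simplifying assumptions, not a description of real hardware (an update-based system would instead push the new value into the other core's cache rather than clearing its valid flag):

#include <stdbool.h>
#include <stdio.h>

#define NUM_CORES 2

/* One cached copy of variable X per core. */
struct cache_line
{
    int value;
    bool valid;    /* false: the copy is stale and must be re-fetched */
};

static int ram_x = 42;                      /* backing store for X */
static struct cache_line cache[NUM_CORES];  /* each core's private cache */

static int core_read(int core)
{
    if (!cache[core].valid)                 /* miss, or copy was invalidated */
    {
        cache[core].value = ram_x;          /* fetch the current value from RAM */
        cache[core].valid = true;
    }
    return cache[core].value;
}

static void core_write(int core, int value)
{
    ram_x = value;                          /* write through to RAM */
    cache[core].value = value;
    cache[core].valid = true;
    for (int c = 0; c < NUM_CORES; c++)     /* broadcast the invalidate message */
        if (c != core)
            cache[c].valid = false;
}

int main(void)
{
    printf("Core 1 reads %d\n", core_read(0));   /* steps 1 and 2: both cores cache X */
    printf("Core 2 reads %d\n", core_read(1));
    core_write(0, 7);                            /* step 3: Core 1 writes a new value */
    printf("Core 2 reads %d\n", core_read(1));   /* step 4: invalidated copy re-fetched */
    return 0;
}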


Virtual Machines


Virtual machine platforms make use of multi-core systems by allocating CPU resources to individual virtual machines as needed. Advanced hypervisors can be quite efficient about resource scheduling. In fact, it has been suggested that future mobile embedded systems will benefit in terms of power utilization by combining virtual machine and multi-core technologies.


In theory it seems beneficial to use a hypervisor to schedule usage of CPUs, since individual virtual machines do not share common memory resources. This fits the ideal situation of having multiple independent processes running on each CPU core. There are, of course, some performance overheads due to the hypervisor itself.


However, in order to execute across multiple cores, a virtual machine would have to have at least two virtual CPUs. As such, hypervisors are not a silver bullet.



