Parallel Processing Even Trickier than Parallel Parking

Written by:  • Edited by: M.S. Smith
Updated Nov 12, 2009
• Related Guides: Clock Speed | Intel | Parallel Processing

CPUs have more cores than ever, but the software to take advantage of them has yet to arrive, and processor clock speeds have platuead because of heat and power issues. To unlock the potential of multi-core processors in mainstream computing, software development requires new tools and techniques.

What Is Parallel Processing?

People spend a lifetime working on questions of parallel computing, so we’re obviously keeping things brief, and simplifying. If we compare a processor to a factory, a single core processor is like a factory with one assembly line. More cores in the processor represents more assembly lines, which run in parallel.

To achieve the same performance output, either in a factory or processor, you can either upgrade the assembly (processing, in computing terms) line to run faster, or add another line. For about two decades, processor improvements have been based on turning up the clock speed, making the single pipeline able to handle more instructions per second.

The Pentium IV saw the last great increase in clock speed over the previous generation of processors. Its final iteration, the Prescott, was to have been available at up to 3.4 GHz, but those were hard to find because of reliability issues. It was widely believed that the power requirements led to so much heat that the chips couldn’t cope.

Power, Heat, and Memory Walls

A plateau was reached at a little above 3 GHz. The faster a chip runs, the more power it uses, and the more heat it generates. Prescott reached the point where normal cooling solutions, ones that could be placed affordably in mainstream computers, couldn’t keep the CPU in its operating envelope. Factor in the green and mobile computing trends, and cranking up the speed and watts was no longer an option with ecology and battery life to worry about.

Furthermore, CPUs were getting faster much more quickly than memory was. The larger this difference became, the more dramatic the performance hit every time the CPU had to access main memory. The solution was to put ever larger caches onto the CPU. Cache is a very small amount of memory housed in the CPU chip, and running at the CPU’s clock speed.

Using the cache instead of main memory whenever possible helps with the RAM/CPU speed difference, but a larger cache takes up more room. That room could have otherwise gone to more processing power. With that and the heat/power question, the assembly line was running as fast as practically possible, so it was time to start another line.

Parallel Problems

Problems with Parallel: Race Condition Adding lines, like adding cores, introduces complexities that don’t occur when speeding up a single one. The jobs that need to be done have to be divided to make sure things are being done as efficiently as possible. Otherwise, cores sit idle, or worse still, things get done in the wrong order, or twice, or not at all.

Dividing the work results in a certain amount of overhead. People at the different lines often need information from someone at another line: the time they take communicating eats into the time they can spend processing. Even if they can look up the information they need without disturbing the people on the other line, they have to run around and look for it, called cache snooping in CPU terms.

The largest problems occur if two cores are doing different things to the same data at once, called a race condition. If one core is trundling through a database applying a currency conversion to all the numbers in several columns of several tables, and another is calculating column totals in the same tables, they can make quite a mess.

To prevent this, one process can lock the data it is using. This solves the problem, though at the cost of having the other core sit idle. If you lock every step of a program, it runs serially, not in parallel. Even if the work is processed by several cores, it will take just as long if they are each waiting for the previous steps to be completed. Actually, the overhead means it can take longer.

Up Next: The Slow Parallel Road to the Mainstream: When will your dual or quad core benefit from parallelism?

Showing page 1 of 2

 
blog comments powered by Disqus
Email to a friend