Parallel Tight-Rope Walk
We recently discussed how processors stopped getting faster in clock speed because they were drawing too much power and producing too much heat, and how processor improvements since the Pentium 4 have come instead from adding cores. That article also covered some of the complexities of getting mainstream applications to use the several cores they now have available.
Creating mainstream software that takes advantage of several cores is a tall order. Locking (delaying one task until another finishes) to ensure correct results is complicated enough; where the science becomes an art is getting things right with as few locks as possible, so that as many cores as possible stay busy and parallel performance actually scales.
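To make the trade-off concrete, here is a minimal Python sketch (my own illustration, not from the article) of the classic case where a lock is needed: several threads updating one shared counter. The read-modify-write on the counter can interleave across threads, so the lock guards exactly that step and nothing more, keeping the critical section as small as possible.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write below can interleave
        # across threads and lose updates (a race condition).
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: the lock guarantees no update is lost
```

Holding the lock only around the increment, rather than around the whole loop, is the small-scale version of Reinders' advice: every moment spent holding a lock is a moment other cores may spend waiting.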
Intel parallelism guru James Reinders reminded developers last week that they needed to “think parallel.” He went on to suggest several tactics, among them: “avoid locks when possible.” Developers indeed have a lot on their plate, since applications that can take advantage of today’s multi-core CPUs will be at a massive advantage over those trundling along on a single core.
Intel and Microsoft Put Money Behind Talk
Getting mainstream applications to use multi-core processors to their fullest has been slow in coming. Even games, which usually push hardware to its limits, are just beginning to take advantage of several cores, thanks to the efforts of Valve and others.
Intel is trying to help, having just finalized a funding agreement with Microsoft, the University of California at Berkeley, and the University of Illinois at Urbana-Champaign. That’s four very big names in tech, if you are keeping track. The deal sees Microsoft and Intel handing each school $20 million over the next five years.
The schools will throw in $7 or $8 million of their own state funds and set up Universal Parallel Computing Research Centers. They “are expected to create long term, high-impact breakthroughs in parallel programming languages, tools, and supporting architectural features that will enable entirely new classes of consumer and enterprise applications.”
I’m not sure I need an entirely new class of applications. I would settle for the next generation of applications putting that second core under my heat sink to good use. Corporate enthusiasm aside, that kind of money in those kinds of places will get a lot of smart people working feverishly on your problem.
More Parallel Support from Intel
Intel has released several tools, like Parallel Studio, to help programmers use all those cores without having to go back to school for two years. And we shouldn’t forget, Intel is the one loading CPUs with all these cores in the first place, though AMD has followed suit, arguably pulling ahead briefly in some areas.
It’s not just a matter of jamming cores together, though. The Nehalem CPU architecture, which benefited from the time Core 2 (and to some extent AMD’s CPUs) spent in the field, brought many changes to push multi-core performance even further. Hyper-Threading lets one core work on two threads at a time, offering some of the performance of twice as many cores at a much lower financial and power cost.
Nehalem’s three-level cache dedicates the first two levels to individual cores and adds a much larger, inclusive, shared third level. The idea behind combining dedicated and shared caches, and duplicating the dedicated caches’ data in the shared one, is to keep data consistent across cores while cutting out most snooping.
Dedicated caches give each core its own little chunk of memory that it can use without interference, and without having to go very far. If all of the cache were dedicated, however, a core that needed data held in a cache other than its own would have to check the other caches one by one until it found them. This searching, called snooping, obviously slows things down. The shared cache, since it also includes everything in the dedicated ones, is one-stop shopping for data held in another core’s cache.
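The difference can be sketched in a few lines of Python. This is a toy model of my own, not Nehalem's actual coherence protocol: each core gets a private cache, and the shared cache is "inclusive," holding a copy of everything in every private cache. Counting probes shows why the inclusive design wins on a miss.

```python
# Toy model of cache lookup; an illustration, not Nehalem's real protocol.
private_caches = {
    0: {"a": 1},
    1: {"b": 2},
    2: {},
    3: {"c": 3},
}
# Inclusive shared cache: a superset of every private cache's contents.
shared_cache = {"a": 1, "b": 2, "c": 3, "d": 4}

def lookup_with_snooping(core, key):
    """Without a shared cache, a miss forces a check of every peer cache."""
    probes = 1
    if key in private_caches[core]:
        return private_caches[core][key], probes
    for other, cache in private_caches.items():
        if other == core:
            continue
        probes += 1  # one snoop per peer cache checked
        if key in cache:
            return cache[key], probes
    return None, probes  # would fall through to main memory

def lookup_inclusive(core, key):
    """With an inclusive shared cache, a miss costs one extra probe."""
    probes = 1
    if key in private_caches[core]:
        return private_caches[core][key], probes
    probes += 1  # a miss here means the data is cached nowhere on-chip
    return shared_cache.get(key), probes

print(lookup_with_snooping(0, "c"))  # found, but only after probing peers
print(lookup_inclusive(0, "c"))      # found with a single extra probe
```

Core 0 looking for data held by core 3 needs four probes under snooping but only two with the inclusive shared cache, and a miss in the shared cache settles the question for the whole chip at once.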
More Parallel Hardware Coming from Intel
Looking beyond CPU applications, Intel sees an opportunity for many-core parallel computing in graphics processing. Many-core means double digits of cores; the more common multi-core refers to processors with fewer, as in most desktop and laptop CPUs.
GPUs are already highly parallel, using dozens of shader processors to produce frame after frame of graphics. Intel figures it might be able to use a bunch of modified, simple CPU cores in a many-core setup to perform the tasks of a GPU. This project, called Larrabee, is obviously not good news if you are Nvidia, since it means more competition.
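What makes graphics such a good fit for many cores is data parallelism: the same small program (a shader) runs independently on every pixel, so the work splits cleanly across however many processors you have. A hedged Python sketch of that style, using a process pool to stand in for the GPU's army of shader processors (the "shader" here is an invented toy, just a brightness bump):

```python
from multiprocessing import Pool

frame = [0, 50, 100, 150, 200, 250]  # a toy "frame" of grayscale pixels

def shade(pixel):
    """A tiny stand-in for a shader: the same function applied
    independently to each pixel, here brightening it, clamped to 255."""
    return min(pixel + 64, 255)

if __name__ == "__main__":
    # Each worker runs the identical program on its share of the pixels,
    # with no locks needed, since no pixel depends on any other.
    with Pool(processes=4) as pool:
        shaded = pool.map(shade, frame)
    print(shaded)  # [64, 114, 164, 214, 255, 255]
```

Because the pixels are independent, this scales to dozens of workers without any of the locking headaches discussed above, which is exactly why GPUs can afford so many simple processors.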
Nvidia isn’t standing still: like Intel, it is offering software tools, guidance, and investment to drive more parallel software. Nvidia’s efforts revolve around getting applications to run on graphics processors with the help of its CUDA development tools. But building a GPU out of lots of little CPUs, and getting a GPU to run a non-graphics application, is not just a question of parallelism.
Microsoft is also interested in this side of mainstream parallelism, including provisions for “compute” shaders in DirectX 11. Compute shaders and CUDA both involve a newer, and therefore more important, buzzword (that’s how buzzwords work) than parallelism: GPGPU. This unfortunate acronym stretches into General Purpose computing on the GPU (Graphics Processing Unit). We will look at Larrabee, CUDA and GPGPU soon.