What Is GPGPU, and How Does It Affect Average Users?

Intel and AMD have taken us to a market of multi-core processors. But taking advantage of all those cores requires more parallel software. What is easy to overlook is that there is already a highly parallel processor, running thousands of threads at a time, in normal desktops and laptops: the GPU.

Long before CUDA and GPGPU became talking points, and before normal users gave much thought to whether their software was parallel enough to use the whole CPU, graphics processing was thought of in terms of a pipeline. Performance was increased by widening the pipeline, or more accurately, by adding more identical, parallel pipelines.

Some clever programmers and engineers saw all these commercially available, incredibly parallel chips and realized that, where a problem lent itself to parallelization in a way similar to computer graphics, there should be a way to trick a GPU into doing the work a CPU usually does… and doing it faster or for less money.

And GPGPU computing was born. Well, maybe it’s more appropriate to say that the aforementioned clever people had a GPGPU bun in the oven or a twinkle in the eye. There was a long way to go before you and I would start getting our hot little hands on it. But before we get into how GPGPU is finally reaching the mainstream, many of you are desperate to know:

What is GPGPU?

It stands for General-Purpose computing on the GPU (Graphics Processing Unit), and it refers to getting anything other than graphics to run on a GPU. While all that parallel goodness was tempting, putting it to work on “general purpose” applications was a very tall order, for several reasons.

GPUs work with data types that represent colour and other information about pixels, such as 32-bit colour, while normal CPU programs use 32- or 64-bit floating-point numbers. And since GPUs lacked a CPU-style cache, the solution was to trick the texture units, which normally hold the colour and appearance information applied to areas of the screen, into acting like a cache.
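
To see why the data types were such a mismatch, here is a rough, hypothetical sketch (plain C-style host code with made-up helper names, not any real graphics API): round-tripping an ordinary floating-point value through a single 8-bit colour channel, the way pre-floating-point textures stored everything, throws away most of its precision.

/* Hypothetical illustration, not from the article: early GPU textures stored
   8 bits per colour channel, so "general purpose" values had to be quantized
   into colour data. Round-tripping a float through one channel shows the
   precision that was lost before floating-point textures arrived. */
#include <stdio.h>

unsigned char to_channel(float v)      /* value in [0,1] -> 8-bit colour channel */
{
    return (unsigned char)(v * 255.0f + 0.5f);
}

float from_channel(unsigned char c)    /* 8-bit channel -> approximate value */
{
    return c / 255.0f;
}

int main(void)
{
    float original  = 0.123456f;              /* the number a CPU would keep as a float */
    unsigned char t = to_channel(original);   /* what a colour texel could hold */
    float recovered = from_channel(t);
    printf("original %f -> texel %u -> recovered %f\n", original, t, recovered);
    return 0;
}

With only 256 possible values per channel, that is nowhere near what ordinary CPU code expects from a 32-bit float, which is part of why the floating-point support described below mattered so much.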

More Programmable and Flexible GPUs

As a result, GPGPU remained a largely academic pursuit with few real applications. The evolution of graphics cards eventually closed the gap, however. Microsoft’s Direct X API (Application Programming Interface) is oriented toward letting Windows applications get the most from GPUs, traditionally in terms of graphics performance. With that in mind, Direct X 9 introduced support for floating-point numbers, not necessarily to help out GPGPU fans, but because graphics programmers wanted the flexibility.

DX 10 expanded and improved floating-point support but, more importantly, it introduced unified shaders, again in the name of more flexibility for graphics programming. This, however, was also a huge boon to the GPGPU movement.

Unified Shaders Get You Almost All the Way to GPGPU

Shaders, initially very rigid as to their position and function in the graphics pipeline, had become more and more flexible and powerful, but still had specific functions: vertex, geometry, and pixel. Unified shaders are even more flexible, since they must be able to perform any of the jobs that used to be assigned to specialized shaders.

The problem remained, however, that while the GPU became much more flexible, that flexibility was still geared towards generating graphics. The hardware was there, but programmers couldn’t use it effectively for GPGPU, since they could only reach it at a high level through the graphics-oriented Direct X 10 API (Direct X 11 seeks to address this, as do new versions of the competing OpenGL graphics API). Even attacking the problem at a low level left one with a graphics-driven instruction set.

Up Next: Nvidia’s CUDA

Nvidia realized that the GPGPU market was ready to go places, and launched CUDA to help it along. And sell more GPUs, obviously. CUDA is Nvidia’s bid to unlock the general purpose potential in the GPUs they sell in droves. We describe CUDA here.
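
Before that deeper dive, here is a minimal, hypothetical sketch of what general-purpose CUDA code looks like: the classic introductory example of adding two large arrays, one element per GPU thread, rather than anything tied to a specific application. The sizes and names are arbitrary.

// Minimal CUDA sketch: thousands of GPU threads each add one pair of numbers,
// work a CPU would otherwise do in a loop. Build with nvcc.
#include <cstdio>
#include <vector>

__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one array element per thread
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                           // about a million elements
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    float *da, *db, *dc;                             // copies in GPU memory
    cudaMalloc((void **)&da, n * sizeof(float));
    cudaMalloc((void **)&db, n * sizeof(float));
    cudaMalloc((void **)&dc, n * sizeof(float));
    cudaMemcpy(da, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);    // ~4096 blocks of 256 threads

    cudaMemcpy(c.data(), dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", c[0]);                   // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}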