What Is CUDA?
It stands for Compute Unified Device Architecture. It is Nvidia’s plan to take over the universe. That’s a bit of an overstatement, but CUDA does represent a dramatic advance in what GPUs can do. Summed up, it is three things: a set of low-level instructions for the hardware, a driver that lets software access the hardware, and an API. The API is based on C, with some CUDA-specific extensions.
It allows programmers to develop general-purpose applications for GPUs in a familiar C environment. The situation is more complicated than that, though, since standard C code wouldn’t run any faster under CUDA. Most or all of it would still be executed by the CPU anyway (one of CUDA’s achievements is letting work be divided between the CPU and the GPU). It would probably be slower, actually, due to the overhead.
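To make the CPU/GPU split concrete, here is a minimal sketch of what a CUDA program looks like; the function and variable names are made up for illustration. A function marked `__global__` is compiled for and run on the GPU, while everything else is ordinary C run on the CPU:

```cuda
#include <stdio.h>

// __global__ marks a kernel: code that runs on the GPU.
// The rest of the file is plain C, executed by the CPU.
__global__ void scale(float *data, float factor)
{
    int i = threadIdx.x;          /* each GPU thread handles one element */
    data[i] = data[i] * factor;
}

int main(void)
{
    float host[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *dev;

    cudaMalloc((void **)&dev, sizeof(host));   /* allocate GPU memory */
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    scale<<<1, 4>>>(dev, 2.0f);   /* launch 4 GPU threads in one block */

    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("%f\n", host[0]);      /* the GPU did the multiplication */
    return 0;
}
```

The `<<<1, 4>>>` launch syntax is the main departure from standard C: it tells the driver how many blocks and threads to run the kernel with.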
Getting the most out of GPGPU still requires a lot of parallel-programming knowledge. CUDA does a lot of the work of assigning threads at the hardware level, which it has to: there are thousands of threads available. But the programmer, instead of thinking in terms of individual threads, still has to organize their programs into grids of blocks. The hardware then divides the blocks into threads.
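A hedged sketch of that division of labor (array names and sizes are hypothetical): the programmer picks a grid of blocks and a thread count per block, and each thread computes its own position from built-in variables the hardware provides.

```cuda
// Inside a kernel, built-in variables give each thread its position:
// blockIdx (which block in the grid), blockDim (threads per block),
// and threadIdx (which thread within the block). Combining them
// yields a unique global index.
__global__ void add_arrays(const float *a, const float *b,
                           float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    /* guard: the grid may overshoot n */
        out[i] = a[i] + b[i];
}

// Host side: the programmer thinks in blocks; the hardware maps
// blocks onto its thousands of threads. a, b, and out must already
// point to GPU memory.
void launch(const float *a, const float *b, float *out, int n)
{
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  /* round up */
    add_arrays<<<blocks, threadsPerBlock>>>(a, b, out, n);
}
```

The bounds check matters because the grid is sized in whole blocks, so the last block usually contains a few threads past the end of the data.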
Traditional Parallel Uses for CUDA
Because parallel programming in CUDA carries the same complications as parallel programming in general, and because the hardware is suited to highly parallel calculations, the applications we have seen for CUDA so far have been the traditional uses for parallel computing.
High Performance Computing (HPC) has used large numbers of commercial CPUs in parallel for years on big number-crunching problems. Medical research and various kinds of modeling (financial, weather, fluid dynamics, etc.) are some of the dozens of fields that use parallel HPC. And that is where we have seen most of CUDA’s applications.
So what’s so new and great about CUDA? Well, those parallel-friendly applications are far better suited to GPUs in the first place. Users can get the same results from a few GPUs that would otherwise require far more CPUs, at a fraction of the cost. Nvidia’s Tesla brings HPC performance in a rack-mountable or even desktop PCI-E card format.
It still costs thousands of dollars, but it can do things for researchers that would normally require an order of magnitude or two more in expense. Letting community colleges offer their students Ivy League computing power without the Ivy League price is a fine thing, and Nvidia has certainly found itself a lucrative market. We’re still not talking about what CUDA can do for average computers, though.
Badaboom: CUDA for the Mainstream
Another feather in Nvidia’s CUDA cap is the installed base of CUDA-capable hardware: 100 million units. If you have an 8-series or later Nvidia graphics processor, congratulations, you own an itty-bitty supercomputer. You can even put it to work on the kind of research described above through distributed computing projects like SETI@home and Folding@home. Both use idle CPU resources volunteered by users all over the web, and both now use CUDA to harness idle GPUs as well.
In yet another spin on HPC applications, Badaboom has led the way toward software people use day to day. It’s a blessing for anyone who wants to transcode video files to a different format, such as for an iPod or Zune. Badaboom has a slick interface that lets you do your encoding with the help of a CUDA GPU, routinely doubling the performance of a CPU working on the task alone. Unfortunately, Badaboom has been the lone representative of mainstream CUDA apps for a while now.
But Nvidia and owners of CUDA hardware have something to get excited about again, now that Nero is adding CUDA support to the next release of Move it. Move it does essentially the same thing as Badaboom, encoding, which isn’t as exciting as a completely new type of application for CUDA. But it does demonstrate that tasks the CPU used to handle alone can be handled better in conjunction with a GPU, and that if some developers in an industry are making use of CUDA, those who follow suit may gain an advantage.
CUDA’s Upstream Journey
CUDA still has a way to go. It was rumored that Photoshop CS4 would use CUDA, but Adobe ended up using the OpenGL graphics API instead. CUDA would have left all ATI users out in the cold, which Adobe wasn’t willing to do. Microsoft’s DirectX 11 won’t have that vendor problem, since it will run on both companies’ GPUs, but being Windows-only it will be a problem for Linux and Mac users.
Even if CUDA loses out to another API, Nvidia still reaps the rewards of a larger GPGPU market, though it would have to share those rewards with AMD. Intel has also been eyeing GPGPU. Unlike AMD, which makes both CPUs and GPUs, Intel makes only CPUs, so it can’t afford to be neutral about which chip consumers spend more on.
So Intel isn’t sitting on its hands in regard to CUDA and GPGPU. If Nvidia can get a GPU to act like a CPU, can Intel get a CPU to act like a GPU? It is trying just that with Larrabee, which we discuss in a coming article.