Features of the New Nehalems: What is Jammed Into a Core i7? – Scalability and Bandwidth
written by: J. F. Amprimoz•edited by: Lamar Stonecypher•updated: 1/28/2012
Intel has been touting its Nehalem architecture for a while, and the first CPUs of the new family are just around the corner. We explain what is new about these processors.
The new Nehalem architecture has many new features, but Intel isn’t throwing out the successful facets of Core 2. Rather than radically changing Core 2, the new architecture focuses on improving it in two ways: scalability and bandwidth.
If you think of a CPU as an engine, and a core as a cylinder and piston in that engine, Intel has tried to improve two things: scalability, and the bandwidth available to the elements retained from the preceding design. Scalability means taking a good piston design and finding a way to use it in everything from 2-cylinder motorcycle engines to 16-cylinder performance yacht engines, while making sure that two or more engines used in concert work well together.
To Intel, scalability is about making Nehalem-based chips suitable for a broad range of applications. The initial i7 Bloomfield CPUs will be quad core, but next year’s value chips, Havendales for the desktop and Auburndales for laptops, will be dual core, while some server offerings will have 6 or 8 cores. The X58 chipset for the desktop only has room for one CPU, but multi-CPU boards will be made available for servers and workstations. Part of scalability is modularity: how easily Intel can slap different components of a chip together. The slide from a presentation by Ronak Singhal (TPTS001, page 28) at the 2008 Taipei Intel Developer Forum (shown at right) indicates how the number of cores, memory channels, and other factors can be varied for different needs.
Several of the new features scale not only the range of hardware Intel can sell, but also how a given chip handles different tasks to your advantage. A Power Control Unit can shut off unused cores and activate Turbo Boost Technology, automatically overclocking the cores that are working. Hyper-Threading allows a core to run two streams of instructions, so the CPU can run extra streams (albeit more slowly) when needed, without the power or monetary costs of another physical core. These technologies and the new Loop Detection scheme are explained in this article.
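To put rough numbers on the Hyper-Threading point, here is a minimal sketch of how many instruction streams each core count mentioned above could present to the operating system. It assumes two streams per core, as described, and assumes Hyper-Threading is enabled on every part, which the article does not confirm; the function name is purely illustrative.

```python
# Illustrative arithmetic only: two instruction streams per core,
# as the article describes Hyper-Threading.
STREAMS_PER_CORE = 2


def logical_processors(physical_cores: int, hyper_threading: bool = True) -> int:
    """Return how many instruction streams the CPU can run at once."""
    if physical_cores < 1:
        raise ValueError("a CPU needs at least one core")
    return physical_cores * (STREAMS_PER_CORE if hyper_threading else 1)


if __name__ == "__main__":
    # Core counts taken from the article: dual, quad, and 6- or 8-core parts.
    # Whether every part ships with Hyper-Threading enabled is an assumption.
    for cores in (2, 4, 6, 8):
        print(f"{cores} cores: {logical_processors(cores)} streams")
```

The extra streams share each core's execution resources, which is why they run more slowly than streams on separate physical cores.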
Quick Path Interconnect
Quick Path Interconnect (QPI) allows CPUs on multi-processor boards to talk directly to one another. CPUs will, in theory, have one QPI link for each other CPU on the board, plus one that communicates with an I/O Hub, which in turn talks to the PCI-E cards and southbridge. The diagrams below, from a briefing by Intel’s Stephen L. Smith, show how this will work on a multi-CPU server (left) and on the single-CPU i7s for the desktop (right). Intel also has a video demo of how QPI works here.
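The "in theory" link count described above is easy to work out. The sketch below is only arithmetic under that description: every CPU has one link to each other CPU and one link to an I/O Hub. Real boards may use fewer links or share hubs differently, and the function names are made up for illustration.

```python
def qpi_links_per_cpu(num_cpus: int) -> int:
    """Links on one CPU: one per other CPU, plus one to an I/O Hub."""
    if num_cpus < 1:
        raise ValueError("a board needs at least one CPU")
    return (num_cpus - 1) + 1


def total_qpi_links(num_cpus: int) -> int:
    """Total links on the board under the same fully connected assumption."""
    if num_cpus < 1:
        raise ValueError("a board needs at least one CPU")
    cpu_to_cpu = num_cpus * (num_cpus - 1) // 2  # each CPU pair shares one link
    cpu_to_hub = num_cpus                        # assumed: one hub link per CPU
    return cpu_to_cpu + cpu_to_hub


if __name__ == "__main__":
    for cpus in (1, 2, 4):
        print(f"{cpus} CPU(s): {qpi_links_per_cpu(cpus)} link(s) per CPU, "
              f"{total_qpi_links(cpus)} on the board")
```

For the single-CPU desktop case this comes out to one link, the one to the I/O Hub. The count grows quickly as CPUs are added, which is why the fully connected scheme is described as theoretical.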
Obviously we’re getting into the chipset/motherboard side of things, so we’ll come back to QPI in the article after the next. For now, what is important to note is that QPI won’t only help the multi-CPU server crowd with scalability; it will also help desktop users get the other big benefit Intel wants to see from Nehalem…
QPI: Server and Desktop
To resume the engine analogy, bandwidth is somewhat like an engine’s valves. Improving the valve train and timing lets each cylinder take in fuel and air, and expel spent gases, more efficiently. Improving the bandwidth of the CPU likewise lets each core take in data and instructions, and send out results, more efficiently.
In addition to using QuickPath to communicate with other CPUs and with the I/O Hub (and through it the PCI-E cards and southbridge), Nehalem processors will have an onboard memory controller and access memory directly, as indicated by the above slide on the left. These two features more or less spell the end for the Front Side Bus and northbridge as we know them. Like QuickPath, this has a lot to do with the chipset and motherboard on which the CPU will run, so we’ll look at it more closely in the article after the next.
Some other changes were made to make better use of the new memory system and so optimize bandwidth; we’ll look at those now.