Details Emerge About Nehalem Architecture

written by: Lamar Stonecypher•edited by: J. F. Amprimoz•updated: 6/6/2009

Fresh from the Intel Developer Forum 2008 in Taipei, Republic of China come details about the architecture, design, and performance of the forthcoming Nehalem generation of Intel CPUs. In one way, it can be summed up simply as "Quad core comes of age." Get the essential details here.

    On October 21, 2008, Intel Mobile Platforms Group manager Mooly Eden delivered the second-day keynote address at the Intel Developer Forum in Taipei, Republic of China. His address was entitled “Expanding the Frontiers of Mobility,” and he talked about, among other things, the forthcoming Nehalem quad-core technology and showed a prototype notebook running the associated Calpella architecture.

    Notebooks are an important area of concern for Intel. Industry analysts predict that 2009 will be the year that notebook sales overtake desktop sales. 2008, however, will be the year that Intel sells more mobile processors than desktop processors. Calpella is the version of Nehalem for notebooks, and that’s what we’ll look at here.

    Switchable Cores

    The presentation started with Mr. Eden talking about the current quad-core Core 2 Extreme processors. They have about 800 million transistors, and he said it’s only a matter of time before the first one-billion-transistor CPUs arrive. A demonstration of the Intel Extreme Tuning Utility was then shown on a specially cooled notebook, increasing the CPU speed from 2.5 GHz to 3.5 GHz. No mention was made of the effect of this on the warranty, however.

    Further into the presentation, Mr. Eden called for the attention of all “chipheads” and started describing Nehalem and Calpella. The basic features of the microarchitecture are four cores, power switches that turn individual cores on and off, virtually no leakage between active and inactive cores, Turbo Boost technology that regulates the frequency of the cores, and loop detection with simplified logic for executing loops.

    Like the Core 2 Extreme, Nehalem has four cores. Unlike the Extreme, however, each core is equipped with an integrated power switch (a power transistor) that can turn that individual core on and off. The advantage this provides is essentially no leakage between active cores and disabled cores. It is efficient, too: the power-switching transistors themselves use only 35 milliwatts.

    For applications that only utilize two cores, two cores can be switched off to enhance notebook battery life. For an older application that only uses one core, three cores can be disabled.
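
    To get a feel for why switching idle cores off matters, here is a rough sketch in Python. Only the 35 milliwatt switch figure comes from the talk, and it is treated here as a total for all the switches, which is an assumption. The per-core wattages and the function name `package_power` are made up for illustration.

```python
# Toy model of per-core power gating.
# Only SWITCH_OVERHEAD_W comes from the article; the other numbers are hypothetical.
SWITCH_OVERHEAD_W = 0.035  # power used by the switching transistors (assumed to be a total)
ACTIVE_CORE_W = 8.0        # hypothetical power draw of one busy core
IDLE_LEAKAGE_W = 1.5       # hypothetical leakage of an idle core that is NOT switched off
TOTAL_CORES = 4


def package_power(active_cores: int, gated: bool) -> float:
    """Estimate core power in watts with `active_cores` busy and the rest idle."""
    if not 0 <= active_cores <= TOTAL_CORES:
        raise ValueError("active_cores must be between 0 and 4")
    idle_cores = TOTAL_CORES - active_cores
    if gated:
        # Switched-off cores are modeled as leaking nothing; the switches cost a little.
        return active_cores * ACTIVE_CORE_W + SWITCH_OVERHEAD_W
    # Without gating, every idle core keeps leaking.
    return active_cores * ACTIVE_CORE_W + idle_cores * IDLE_LEAKAGE_W


for n in (4, 2, 1):
    ungated = package_power(n, gated=False)
    gated = package_power(n, gated=True)
    print(f"{n} active core(s): ungated {ungated:.3f} W, gated {gated:.3f} W")
```

    With these invented numbers the gap grows as more cores sit idle, which is the situation a single-threaded application creates.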

    Turbo Boost Technology

    As frequency goes up, more heat is generated. This is why new generations of processors often start at slower clock speeds than the previous generation. It happened with the dual cores; the first models were slower than the Pentium M (Dothan) models they replaced. When Intel went to four cores, it had to reduce the frequency of the cores in order to reduce the heat. At high workloads with four cores running, the CPU is at maximum heat dissipation – and maximum power consumption. This is powerful, but not a state one wants the CPU to stay in when operating on the battery.

    Previous Intel processors have had “SpeedStep” technology that allows the CPU to slow down when the notebook is lightly loaded. Nehalem takes the frequency smarts further.

    Intel’s Turbo Boost Technology, or TBT, adjusts the frequency based on how many cores are active. It can do this because the inactive core or cores give it some headroom in heat dissipation. If some of the cores are inactive, the remaining cores can be run faster.

    Let’s look at this. At high workload with four cores working, the CPU is at maximum performance and heat dissipation. With a highly threaded application, four cores can do more work than two cores, even running at a slower frequency, and get the work done sooner.

    So what happens if an application uses only two cores? Is the CPU now less powerful than the previous generation? Not really. Because there is no leakage in the two cores that are deactivated, it is safe to increase the frequency of the two cores that are working. This gives the active cores more headroom to work even faster. (In single-core operation, the frequency can be increased even more.)
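
    The trade-off described above can be sketched in a few lines of Python. The frequency table is invented for illustration and is not an Intel specification. The names `TURBO_GHZ`, `turbo_frequency`, and `run_time` are likewise hypothetical, and the workload is assumed to split perfectly across threads.

```python
# Hypothetical table: number of active cores -> frequency in GHz.
# Illustrative values only; these are not Intel's actual turbo bins.
BASE_GHZ = 2.0
TURBO_GHZ = {4: 2.0, 3: 2.2, 2: 2.6, 1: 3.0}
TOTAL_CORES = 4


def turbo_frequency(active_cores: int) -> float:
    """Return the frequency the active cores may run at, per the table above."""
    if active_cores not in TURBO_GHZ:
        raise ValueError("active_cores must be 1, 2, 3, or 4")
    return TURBO_GHZ[active_cores]


def run_time(work_gcycles: float, threads: int, turbo: bool = True) -> float:
    """Seconds needed for `work_gcycles` billion cycles of perfectly parallel work."""
    cores = min(threads, TOTAL_CORES)
    ghz = turbo_frequency(cores) if turbo else BASE_GHZ
    return work_gcycles / (cores * ghz)


for threads in (4, 2, 1):
    boosted = run_time(24.0, threads)
    fixed = run_time(24.0, threads, turbo=False)
    print(f"{threads} thread(s): {boosted:.2f} s with boost, {fixed:.2f} s without")
```

    Under these assumptions, four cores at the lower frequency still finish a highly threaded job first, while a two-thread or single-thread job finishes sooner with the boost than it would at the fixed base frequency.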

    Eden notes that this automatic switching occurs outside the control of the operating system. So the application itself – the way it’s designed – becomes an important factor.

    Beyond conventional CPU power management, the design of Nehalem, with frequency management that depends on how many cores are active, lets the user complete computational tasks more quickly so the device can be put into a low-power state sooner.
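
    The arithmetic behind finishing sooner and then dropping into a low-power state can be worked through with a small Python sketch. All wattages and times are hypothetical, and so is the function name `energy_joules`. Whether the faster run actually saves energy depends entirely on the real figures.

```python
def energy_joules(active_w: float, active_s: float, idle_w: float, window_s: float) -> float:
    """Energy used over a fixed time window: busy for `active_s`, then idle for the rest."""
    if active_s > window_s:
        raise ValueError("the task must finish within the window")
    return active_w * active_s + idle_w * (window_s - active_s)


# Hypothetical scenario: the same task measured over a 10-second window.
slow = energy_joules(active_w=10.0, active_s=10.0, idle_w=1.0, window_s=10.0)
fast = energy_joules(active_w=14.0, active_s=6.0, idle_w=1.0, window_s=10.0)

print(f"slow and steady: {slow:.0f} J")
print(f"fast, then idle: {fast:.0f} J")
```

    In this made-up case the faster run draws more power while it is busy but uses less energy overall, because the remaining four seconds are spent in the low-power state.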



    Nehalem also offers simultaneous multi-threading on each core, which Intel calls “Hyper-threading.” The Core 2 designs it replaces could only execute one program thread per core. Nehalem doubles that to two. Intel presents this as a better design than adding another core because it requires very little additional power and keeps fabrication costs down.
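
    From the software side, two threads per core means the operating system sees twice as many logical processors as there are physical cores. A minimal Python sketch follows. The helper `logical_cpus` is a hypothetical name, while `os.cpu_count()` is the standard-library call that reports the logical processors on whatever machine runs the code.

```python
import os

THREADS_PER_CORE = 2  # two hardware threads per core, as described for Nehalem


def logical_cpus(physical_cores: int, threads_per_core: int = THREADS_PER_CORE) -> int:
    """Number of logical processors the operating system would see."""
    if physical_cores < 1 or threads_per_core < 1:
        raise ValueError("counts must be positive")
    return physical_cores * threads_per_core


print(logical_cpus(4))                      # quad core with two threads per core -> 8
print(logical_cpus(4, threads_per_core=1))  # quad core without it -> 4
print(os.cpu_count())                       # logical CPUs on this machine (may be None)
```

    An application that sizes its thread pool from the logical count can put the extra hardware threads to work without any other change.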

    The benefits to the user are better multi-tasking, faster media editing and encoding, and better productivity application performance.

    Loop Detector

    The final new feature of the Nehalem microarchitecture described was the new loop detector. This is intended to identify program loops – sequences of instructions that repeat. Computer applications are usually designed to operate in a big loop. Since they spend much of their time waiting for user intervention, this is very practical.

    The traditional method of handling loops in an x86 processor involves steps called branch prediction, fetch, and decode. Once a loop is detected in the Nehalem scheme, these three parts of processor operation can be turned off while the loop executes. Nehalem is also better at identifying loops than previous generations. Together, these changes increase energy efficiency and improve performance.
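
    A toy model in Python shows how much front-end work this can save. It is a sketch of the idea only, not Intel's implementation. The buffer capacity, the warm-up count, and the function name `front_end_work` are all assumptions made for illustration.

```python
BUFFER_CAPACITY = 28  # hypothetical: how many decoded instructions the loop buffer holds
DETECT_AFTER = 2      # hypothetical: iterations seen before the loop is recognized


def front_end_work(body_len: int, iterations: int, detector: bool) -> int:
    """Count instructions that pass through branch prediction, fetch, and decode."""
    if body_len < 1 or iterations < 0:
        raise ValueError("body_len must be >= 1 and iterations >= 0")
    total = body_len * iterations
    if not detector or body_len > BUFFER_CAPACITY:
        # No detector, or the loop is too large to hold: every pass uses the front end.
        return total
    # Only the warm-up passes use the front end; later passes replay from the buffer.
    warmup = min(iterations, DETECT_AFTER)
    return body_len * warmup


print(front_end_work(10, 1000, detector=False))  # 10000
print(front_end_work(10, 1000, detector=True))   # 20
print(front_end_work(40, 1000, detector=True))   # 40000 (loop too big for the buffer)
```

    In this model a short loop that runs many times keeps the front end idle for nearly all of its iterations, which is where the energy saving would come from.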

    Thanks for reading this. We hope that you are enjoying reading and learning at the new Bright Hub.

    Intel Developer Forum 2008

    Netbook vs. Notebook - from Intel's Perspective - Interested in a new mini-computer like a netbook? Wondering exactly what a netbook is and what you'd gain and have to give up over a notebook? We can help. At Intel Developer Forum 2008, Intel manager Mooly Eden gave us Intel's perspective on the question.

    Laptop Cooling - Intel Licenses Laminar Flow Tech - Finally got that powerful laptop you've wanted for so long? Surprised that the first time you heard the fan fully cut in it sounded like a jet taking off? Well, jet engines were the inspirations for Intel's new laminar flow cooling technology for laptops. Here we'll look at what they've announced.