Intel Core i7 (Nehalem Bloomfield) Features: A New Cache Design

New Cache

Intel has been putting bigger and bigger caches on its CPUs to make up for the route from the FSB through the northbridge to memory being rather slow compared to an on-chip memory controller that speaks directly to main memory, which AMD has been doing for years. The bigger the cache, the less often the CPU needs to go looking for things in memory. Now that Nehalem has an onboard memory controller, it can use a smaller cache and spend the reclaimed space and power on more processing.

[Diagram: Nehalem cache hierarchy]

Intel didn’t just shrink the caches; it totally changed how they work. There is still a 64 KB L1 cache on each core, but it is a hair slower than it was on Penryn. Instead of a big, shared L2 cache of 6 MB (Duo) or 12 MB (Quad), i7s will have one 256 KB L2 cache per core and a big, shared 8 MB L3 cache. The diagram at right is from a presentation slide used by Patrick K. Ng (TPDS001, page 6) at the 2008 Taipei Intel Developer Forum.

Also, the L3 cache is inclusive, meaning anything in any of the L1 or L2 caches is also in the L3 cache. This prevents unnecessary data snooping, where a core that can’t find something in the L3 cache starts asking the other cores whether the data is in their caches; with an inclusive L3, a miss there means no core has the data. Data snooping can really slow things down. Avoiding it this way has the disadvantage of wasting some L3 cache space on data that is also in the faster caches, but in the i7’s case that works out to under 16% of the cache. It also helps with the scalability Intel is after (discussed in the previous article), since snooping traffic gets worse the more cores you have.
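The "under 16%" figure can be checked with a little arithmetic. The sketch below uses the cache sizes quoted in this article and assumes the worst case, in which nothing held in a core's L1 is also held in its L2; the function name is made up for illustration.

```python
# Worst-case share of an inclusive L3 that merely duplicates the per-core caches.
# Sizes are the ones quoted in the article, in KB.
CORES = 4
L1_KB = 64          # per core
L2_KB = 256         # per core
L3_KB = 8 * 1024    # shared by all cores


def duplicated_fraction(cores, l1_kb, l2_kb, l3_kb):
    """Fraction of L3 holding copies of L1/L2 contents.

    Assumes (worst case) that no line sits in both a core's L1 and its L2.
    """
    return cores * (l1_kb + l2_kb) / l3_kb


fraction = duplicated_fraction(CORES, L1_KB, L2_KB, L3_KB)
print(f"{fraction:.3%}")  # 15.625%
```

In practice the overlap is likely smaller, since the same data often sits in more than one of the faster caches at once.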

If all this talk of memory controllers on the CPU and a shared L3 cache has you thinking “Don’t AMD’s Phenoms already do all that, and isn’t QuickPath Interconnect kind of like HyperTransport?” the answers are yes and yes. Commercial success doesn’t go to the most elegant or sophisticated solution; it goes to the one that can be made more attractive to the consumer…

Deja CPVu

AMD and its supporters may have been right in arguing that Core 2 was a brute-force approach to the CPU and that the writing for a northbridge-housed memory controller was on the wall. But the faster FSBs and bigger caches were enough to make Intel’s chips the fastest at the time. Intel is at the end of that road, and feels now is the time to use a more elegant design that has some things in common with AMD’s.

The "I told you so, and now you are copying" vs. "you introduced your technology too early and it didn’t work" argument can go back and forth for a while, but what is interesting is that Intel is still going with a bigger cache than AMD. A quad-core Phenom has a total cache size of 4.5 MB to the i7’s 9.25 MB.
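Both totals are simply the per-core caches multiplied by four, plus the shared L3. A quick sketch: the i7 sizes are the ones given in this article, while the Phenom breakdown (128 KB L1 and 512 KB L2 per core, 2 MB shared L3) is not stated here and is an assumption that happens to match the 4.5 MB total.

```python
def total_cache_mb(cores, l1_kb, l2_kb, l3_mb):
    """Total on-die cache in MB: per-core L1 and L2, plus one shared L3."""
    return cores * (l1_kb + l2_kb) / 1024 + l3_mb


core_i7 = total_cache_mb(cores=4, l1_kb=64, l2_kb=256, l3_mb=8)   # sizes from the article
phenom = total_cache_mb(cores=4, l1_kb=128, l2_kb=512, l3_mb=2)   # assumed breakdown

print(core_i7, phenom)  # 9.25 4.5
```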

New Translation Lookaside Buffer

If you don’t know much about the TLB, don’t worry; we’ll try to keep this short and simple.

Computers use virtual memory addresses. By keeping unneeded information from memory on the hard disk, this allows the operating system to support far more tasks than it could if it were restricted to the actual physical memory. This, however, requires that the CPU check each virtual address against a table to find the physical address. The table is kept in main memory, but it can get so big that even part of it gets moved out to the hard drive.
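As a rough illustration of that table lookup, here is a minimal sketch. The 4 KB page size, the tiny single-level table, and the names `translate` and `PageFault` are all assumptions for the example; real page tables have several levels and are walked by hardware.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed here

# Hypothetical page table: virtual page number -> physical frame number.
# None marks a page whose contents currently live on disk.
page_table = {0: 7, 1: 3, 2: None}


class PageFault(Exception):
    """Raised when the page is not in physical memory."""


def translate(virtual_address, table=page_table):
    # Split the address into a page number and an offset within that page.
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    frame = table.get(vpn)
    if frame is None:
        # The operating system would have to fetch the page from disk.
        raise PageFault(f"virtual page {vpn} is not in physical memory")
    return frame * PAGE_SIZE + offset


print(hex(translate(0x1234)))  # 0x3234
```

Only the page number is translated; the offset within the page is carried over unchanged.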

To keep from having to always go off the CPU to find out where something is, even if it is already in the cache, the CPU has a Translation Lookaside Buffer. This is a special cache that stores entries associating recently used virtual and physical memory addresses. It isn’t measured in raw size like the other caches, but by how many entries it can hold.
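The idea can be sketched as a small cache of recent translations sitting in front of the slow table lookup. Everything here is a simplification for illustration: the `TLB` class, the 4 KB page size, and the least-recently-used eviction are assumptions, not a description of how Nehalem's TLB is actually built.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; assumed page size


class TLB:
    """Toy TLB: remembers the most recently used page translations."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity        # size is counted in entries, not bytes
        self.page_table = page_table    # stands in for the table in main memory
        self.entries = OrderedDict()    # virtual page number -> physical frame
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            frame = self.entries[vpn]
            self.entries.move_to_end(vpn)          # mark as most recently used
        else:
            self.misses += 1
            frame = self.page_table[vpn]           # slow path: go off to memory
            self.entries[vpn] = frame
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
        return frame * PAGE_SIZE + offset


page_table = {vpn: vpn + 100 for vpn in range(8)}  # made-up mappings
tlb = TLB(capacity=2, page_table=page_table)
for address in (0x0000, 0x0010, 0x1000, 0x0020, 0x2000, 0x1000):
    tlb.translate(address)

print(tlb.hits, tlb.misses)  # 2 4
```

With only two entries, touching a third page pushes out an older translation, which is why a TLB that holds more entries means fewer trips to the full table.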

Nehalem changes the way the TLB works, so it’s hard to make a straight comparison, but, without getting bogged down in details, it looks like the new TLB will be able to keep track of about two to four times as many memory addresses as its predecessor.

The on-CPU memory controller is the biggest bandwidth-related change. Next, we discuss it and its impact on the X58 chipset that Core i7s run on.

This post is part of the series: Core i7 and X58: Nehalem and Tylersburg Hit the Streets

Intel’s new microarchitecture has been talked about for a long time. The time has come to really see what it is all about, how much better it is, who should get it, and where to get it.
  1. Intel’s New Desktop CPUs: What You Need to Know about these Processors
  2. Features of the New Nehalems: What is Jammed Into a Core i7? – Scalability and Bandwidth
  3. Intel Core i7 (Nehalem Bloomfield) Features: A New Cache Design and Translation Lookaside Buffer (TLB)
  4. X58 Tylersburg: Big Changes to Motherboards Are Coming
  5. Which Motherboard for a Shiny, New, Core i7?
  6. X58 Based Motherboards for Your New Core i7: Gigabyte and MSI
  7. Wrapping Up Our Look at the First Crop of X58 Motherboards
  8. How Fast is Core i7?
  9. Games Not Multithreaded Enough for Core i7 Yet
  10. Remember to Budget for Memory: Triple Channel DDR3 Kits
  11. Who Needs a Core i7?
  12. Core i7 for Professional Applications: Graphics, Audio/Video Editing, or Research
  13. Core i7 965XE Still Fastest, but Not by Much When it Comes to Gaming