Intel Core i7 (Nehalem Bloomfield) Features: A New Cache Design and Translation Lookaside Buffer (TLB)
written by: J. F. Amprimoz•edited by: Lamar Stonecypher•updated: 5/24/2011
The CPUs formerly known as Nehalem Bloomfield have been dubbed Core i7 by Intel. This article discusses the new technologies being used, such as the new L3 cache and Translation Lookaside Buffer.
Intel has been putting bigger and bigger caches on its CPUs to make up for the slow route from the FSB through the northbridge to memory. AMD has avoided that route for years by putting a memory controller on the chip, where it speaks directly to main memory. The bigger the cache, the less often the CPU needs to go looking for things in memory. Now that Nehalem has an onboard memory controller, it can use a smaller cache and spend the reclaimed space and power on more processing.
Intel didn’t just shrink the caches; it totally changed how they work. There is still a 64 KB L1 cache on each core, but it is a hair slower than it was on Penryn. Instead of one big shared L2 cache of 6 MB (Duo) or 12 MB (Quad), i7s will have one 256 KB L2 cache per core and a big 8 MB shared L3 cache. The diagram at right is from a presentation slide used by Patrick K. Ng (TPDS001, page 6) at the 2008 Taipei Intel Developer Forum.
Also, the L3 cache is inclusive, meaning anything in any of the L1 or L2 caches is also in the L3 cache. This cuts down on snooping, where a core that can’t find something in the L3 cache starts asking the other cores whether the data is in their caches. With an inclusive L3, a miss there means the data isn’t in any core’s private cache either, so there is no need to ask. Snooping can really slow things down. The disadvantage of avoiding it this way is that some L3 space is wasted on data that is also in the faster caches, though in the i7’s case that works out to under 16% of the cache. Inclusion also helps the scalability Intel is after (discussed in the previous article), since snoop traffic gets worse the more cores you have.
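The "under 16%" figure can be checked with a little arithmetic. The sketch below uses only the cache sizes quoted in this article for a quad-core i7. The function name is made up for illustration, and the result is a worst case that assumes each core's L1 and L2 hold entirely different data.

```python
# Cache sizes in KB, as quoted in the article for a quad-core Core i7.
L1_KB = 64          # per core
L2_KB = 256         # per core
L3_KB = 8 * 1024    # shared
CORES = 4

def inclusive_overhead(l1_kb, l2_kb, l3_kb, cores):
    """Worst-case fraction of the L3 holding copies of L1/L2 contents.

    Hypothetical helper: assumes no overlap between a core's L1 and L2.
    """
    duplicated_kb = cores * (l1_kb + l2_kb)
    return duplicated_kb / l3_kb

overhead = inclusive_overhead(L1_KB, L2_KB, L3_KB, CORES)
print(f"Up to {overhead:.1%} of the L3 duplicates the smaller caches")
```

With these numbers, 4 x (64 + 256) = 1,280 KB out of 8,192 KB, or about 15.6%.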
If all this talk of memory controllers on the CPU and a shared L3 cache has you thinking “Don’t AMD’s Phenoms already do all that, and isn’t QuickPath Interconnect kind of like HyperTransport?” the answers are yes and yes. Commercial success doesn’t go to the most elegant or sophisticated solution; it goes to the one that can be made more attractive to the consumer…
AMD and its supporters may have been right in arguing that Core 2 was a brute-force approach to the CPU and that the writing for a northbridge-housed memory controller was on the wall. But the faster FSBs and bigger caches were enough to make Intel’s chips the fastest at the time. Intel is at the end of that road, and feels now is the time to use a more elegant design that has some things in common with AMD’s.
The "I told you so, now you are copying" vs. "you introduced your technology too early and it didn’t work" argument can go back and forth for a while, but what is interesting is that Intel is still going with a bigger cache than AMD. A quad-core Phenom has a total cache size of 4.5 MB to the i7’s 9.25 MB.
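For anyone who wants to see where those two totals come from, here is a rough tally. The i7 sizes are the ones given earlier in this article. The Phenom breakdown (128 KB of L1 and 512 KB of L2 per core, plus a 2 MB shared L3) is not stated in the article and is an assumption about the quad-core Phenom parts of the time.

```python
def total_cache_mb(cores, l1_kb, l2_kb, shared_l3_kb):
    """Sum per-core L1 and L2 plus the shared L3, returned in MB."""
    return (cores * (l1_kb + l2_kb) + shared_l3_kb) / 1024

# Core i7 sizes are from the article.
core_i7_mb = total_cache_mb(cores=4, l1_kb=64, l2_kb=256, shared_l3_kb=8 * 1024)

# Phenom per-level sizes are an assumption, not taken from the article.
phenom_mb = total_cache_mb(cores=4, l1_kb=128, l2_kb=512, shared_l3_kb=2 * 1024)

print(f"Core i7: {core_i7_mb} MB, Phenom: {phenom_mb} MB")
```

Note that the Phenom actually has larger L1 and L2 caches per core under this assumption; the i7's lead comes entirely from its much bigger L3.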
New Translation Lookaside Buffer
If you don’t know much about the TLB, don’t worry; we’ll try to keep this short and simple.
Computers use virtual memory addresses. This lets the operating system support far more tasks than it could if it were restricted to the actual physical memory, because information that isn’t currently needed can be moved out of memory and onto the hard disk. This, however, requires that the CPU check each virtual address against a table (the page table) to find the physical address. The table is kept in main memory, but it can get so big that even part of it gets moved out to the hard drive.
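As a rough illustration of that table lookup, here is a toy version. The 4 KB page size, the flat single-level table, and the table's contents are all assumptions made to keep the sketch short; real page tables are multi-level structures managed by the operating system.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages

# Hypothetical page table: virtual page number -> physical frame number.
# None marks a page that has been moved out to disk.
page_table = {0: 7, 1: 3, 2: None}

def translate(vaddr):
    """Turn a virtual address into a physical one via the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # page number and offset in page
    frame = page_table.get(vpn)
    if frame is None:
        # Unmapped or on disk: the OS would have to step in here.
        raise LookupError(f"page fault on virtual page {vpn}")
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # virtual page 1 maps to frame 3
```

Only the page number gets translated; the offset within the page is carried over unchanged.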
To keep from having to always go off the CPU to find out where something is, even if it is already in the cache, the CPU has a Translation Lookaside Buffer. This is a special cache that stores entries associating recently used virtual and physical memory addresses. It isn’t measured in raw size like the other caches, but by how many entries it can hold.
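A toy model shows why the entry count matters. The class name, the least-recently-used replacement policy, and the page table contents below are all illustrative assumptions; this is not a description of how Nehalem's TLB is actually organized.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KB pages

class TinyTLB:
    """Hypothetical TLB: a small LRU cache sitting in front of a page table."""

    def __init__(self, page_table, capacity):
        self.page_table = page_table   # virtual page -> physical frame
        self.capacity = capacity       # number of entries the TLB can hold
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            frame = self.entries[vpn]
            self.entries.move_to_end(vpn)         # mark as most recently used
        else:
            self.misses += 1
            frame = self.page_table[vpn]          # the slow trip to memory
            self.entries[vpn] = frame
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return frame * PAGE_SIZE + offset

tlb = TinyTLB({0: 7, 1: 3, 2: 5}, capacity=2)
for addr in (0x0000, 0x0004, 0x1000, 0x0008, 0x2000, 0x1004):
    tlb.translate(addr)
print(tlb.hits, tlb.misses)
```

With room for only two entries, touching a third page pushes out an older one, so the last access in the loop misses even though that page was translated earlier. A TLB with more entries would have kept it.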
Nehalem changes the way the TLB works, so it’s hard to make a straight comparison, but, without getting bogged down in details, it looks like the new TLB will be able to keep track of about two to four times as many memory addresses as its predecessor.
The on-CPU memory controller is the biggest bandwidth-related change. Next, we discuss it and its impact on the X58 chipset, on which Core i7s run.