Intel has been putting bigger and bigger caches on its CPUs to make up for the FSB-to-northbridge-to-memory route being rather slow compared with having a memory controller on the chip talking directly to main memory, which AMD has been doing for years. The bigger the cache, the less often the CPU needs to go looking for things in memory. Now that Nehalem has an onboard memory controller, it can use a smaller cache and spend the reclaimed space and power on more processing.

Intel didn’t just shrink the caches, they totally changed how they work. There is still a 64 KB L1 cache on each core, but it is a hair slower than it was on Penryn. Instead of one big shared L2 cache of 6 MB (duo) or 12 MB (quad), i7s will have one 256 KB L2 cache per core and a big 8 MB shared L3 cache. The diagram at right is from a presentation slide used by Patrick K. Ng (TPDS001, page 6) at the 2008 Taipei Intel Developer Forum.
Also, the L3 cache is inclusive, meaning anything in any of the L1 or L2 caches is also in the L3 cache. This prevents data snooping, where a core that can’t find something in the L3 cache starts asking the other cores whether the data is in their caches. If the data isn’t in the L3, it isn’t in any core’s cache either, so there is nobody to ask. Data snooping can really slow things down. Avoiding it this way has the disadvantage of spending some L3 space on data that is already in the faster caches, but in the i7’s case that works out to under 16% of the cache. It also helps the scalability Intel is after (discussed in the previous article), since core snooping traffic gets worse the more cores you have.
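For anyone who wants to check the "under 16%" figure, here is a quick back-of-the-envelope sketch in Python. It assumes a quad-core part (the core count and the function name are my own assumptions for illustration) and treats the L1 and L2 contents as entirely distinct, so it is an upper bound on how much of the L3 can be taken up by duplicates rather than an exact measurement.

```python
# Rough upper bound on the L3 space an inclusive design spends on
# copies of data already held in the per-core L1 and L2 caches.
# Assumption: four cores, and no overlap between L1 and L2 contents.

def inclusive_overhead(cores, l1_kib, l2_kib, l3_kib):
    """Return the fraction of L3 occupied by duplicated L1/L2 data."""
    duplicated_kib = cores * (l1_kib + l2_kib)
    return duplicated_kib / l3_kib

# Sizes from the article: 64 KB L1 and 256 KB L2 per core, 8 MB shared L3.
overhead = inclusive_overhead(cores=4, l1_kib=64, l2_kib=256, l3_kib=8 * 1024)
print(f"{overhead:.3%}")  # 15.625%
```

Four cores at 320 KB each is 1,280 KB of duplicated data out of 8,192 KB, which is where the figure comes from. The same function shows why the trade-off depends on the ratio: double the core count without growing the L3 and the overhead doubles too.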
If all this talk of memory controllers on the CPU and a shared L3 cache has you thinking “Don’t AMD’s Phenoms already do all that, and isn’t QuickPath Interconnect kind of like HyperTransport?” the answers are yes and yes. Commercial success doesn’t go to the most elegant or sophisticated solution; it goes to the one that can be made more attractive to the consumer…