Quick Path Interconnect - QPI
In order to unlock Nehalem's magic, Intel first had to remove some of the limitations
that held back last generation's Core 2 Duo and Quad processors. The first thing to
go was the Front Side Bus, the old system interconnect that allowed the CPU, motherboard
and memory to talk to one another. With processors getting faster and memory sizes getting
larger, the FSB became increasingly choked up transferring data between the CPU and RAM.
Intel's solution is a new point-to-point bi-directional bus called Quick Path
Interconnect that transfers data directly between the CPU and the chipset. Every Core i7
processor is actually equipped with two QPI links, which could potentially allow for future multi-processor systems.
It's exciting, but we're not quite there yet.
For now though, QPI simply connects the CPU to the chipset over a set of
20-bit wide links that operate at either 4.8GT/s (Core i7) or 6.4GT/s (Core i7 Extreme Edition).
Since these links are bi-directional and allow the CPU and the chipset to both send and
receive information simultaneously, the end result is 19.2GB/s of bandwidth between the
Intel Core i7 920 and the Intel X58 Express chipset. The important thing to take away
from all this math is that the Core i7 won't become bandwidth bottlenecked anytime soon,
thanks to QPI.
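For readers who want to check the math, here is a rough sketch of how the 19.2GB/s figure falls out. It assumes that 16 of the 20 bits in each transfer carry data (the rest being link overhead) and that both directions are counted together; neither detail is spelled out above, and the function name is purely illustrative.

```python
def qpi_bandwidth_gb_s(transfer_rate_gt_s, data_bits_per_transfer=16, directions=2):
    """Peak QPI payload bandwidth in GB/s.

    Assumption: only 16 of the 20 bits per transfer are payload,
    and both directions of the link are added together.
    """
    bytes_per_transfer = data_bits_per_transfer / 8
    return transfer_rate_gt_s * bytes_per_transfer * directions


for name, rate in (("Core i7 (4.8GT/s)", 4.8), ("Core i7 Extreme (6.4GT/s)", 6.4)):
    print(f"{name}: {qpi_bandwidth_gb_s(rate):.1f} GB/s")
```

Under the same assumptions, the 6.4GT/s link on the Extreme Edition parts works out to 25.6GB/s.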
Memory Controller
The other part of Intel's effort to eliminate bottlenecking was to reduce the amount of bandwidth
that's transferred over the system bus. Following a move AMD made back in the days
of the original Athlon 64, Intel has added a memory controller to the die of the Core i7.
Removing memory traffic from the system bus means that there are no barriers stopping the CPU
from sending information to the chipset at full speed.
Intel has of course taken the opportunity to kick the Core i7's memory controller up a notch.
The DDR3-exclusive memory controller is designed to work with triple-channel memory, officially
supporting speeds from 800MHz up to 1066MHz.
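To put the triple-channel design in perspective, the theoretical peak memory bandwidth can be estimated as below. This is a back-of-the-envelope sketch that assumes the quoted speeds are DDR3 data rates (DDR3-800 and DDR3-1066, in millions of transfers per second) and that each channel is the standard 64 bits wide. Real-world throughput will be lower.

```python
def ddr3_peak_bandwidth_gb_s(transfers_mt_s, channels=3, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s for a multi-channel DDR3 setup.

    Assumptions: transfers_mt_s is the effective data rate in MT/s,
    and every channel is 64 bits (8 bytes) wide.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


for rate in (800, 1066):
    print(f"DDR3-{rate} x3 channels: {ddr3_peak_bandwidth_gb_s(rate):.1f} GB/s")
```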
BCLK and Clock Speeds
With the FSB gone, adjusting the speed of the Core i7 is now done a little differently. The base clock,
or BCLK, is the primary means of over/underclocking the Core i7 920. The processor has a locked
multiplier of 20, and multiplying that by the standard 133MHz BCLK yields its core clock speed,
2.66GHz. The BCLK also affects the speed of the Uncore (the memory controller and L3 cache),
the speed of the DDR3 memory, and the speed of the QPI link.
Overclockers will be able to fiddle around with the QPI and memory controller multipliers on the
Core i7 920, although the CPU multiplier stays locked unless you're willing to pay
for the $999 Core i7 Extreme processor.
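The relationship between BCLK, multiplier and core clock is simple enough to sketch in a few lines. The helper names below are made up for illustration, and the 3.2GHz overclock target is only an example, not a tested setting.

```python
def core_clock_mhz(bclk_mhz, multiplier):
    """Core clock is simply the base clock times the CPU multiplier."""
    return bclk_mhz * multiplier


def bclk_for_target(target_mhz, multiplier):
    """Base clock needed to reach a target core clock with a locked multiplier."""
    return target_mhz / multiplier


# Stock Core i7 920: 133MHz BCLK with the locked 20x multiplier.
print(core_clock_mhz(133, 20), "MHz")

# Hypothetical overclock target of 3.2GHz on the same 20x multiplier.
print(bclk_for_target(3200, 20), "MHz BCLK")
```

Because the Uncore, memory and QPI speeds are derived from the same base clock, raising BCLK pushes all of them up together, which is why their individual multipliers matter to overclockers.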
Intel Socket 1366 Core i7 processors
Processor Model | Thermal Design Power | Clock Speed (GHz) | QPI (GT/s) | Cache | Price (USD) |
Intel Core i7 975 Extreme | 130W | 3.33 | 6.4 | 1MB L2 + 8MB L3 | $999 |
Intel Core i7 965 Extreme | 130W | 3.2 | 6.4 | 1MB L2 + 8MB L3 | $985 |
Intel Core i7 950 | 130W | 3.06 | 4.8 | 1MB L2 + 8MB L3 | $562 |
Intel Core i7 940 | 130W | 2.93 | 4.8 | 1MB L2 + 8MB L3 | $538 |
Intel Core i7 920 | 130W | 2.66 | 4.8 | 1MB L2 + 8MB L3 | $273 |
Cache
In order to keep its four cores supplied with a constant stream of useful data, the Core i7 is
equipped with 256KB of L2 cache per core, and 8MB of shared L3 cache. Intel's adoption of smaller,
individual L2 caches reduces the amount of time each core spends looking for data,
and in the event of a cache miss, there's a good chance the large L3 cache will produce
a hit. While memory latency on the Core i7 is greatly improved thanks to the
on-die memory controller, main memory is still nowhere near as fast as on-die cache.
HyperThreading
To take advantage of all this processor-to-system bandwidth,
Intel has brought back an old friend in the form of HyperThreading.
This multi-threading technique was actually introduced six years ago
with the Pentium 4, but has made a comeback with the Core i7. In standard
multi-core processors, the individual cores often have to sit idle while
waiting for data to arrive from memory or over the system bus. HyperThreading
allows each core to work on a second task (also called a thread)
during that downtime, and swap between the two of them on the fly. This
means that even though the Intel Core i7 920 only has four physical cores,
it can actually process eight threads at once. Later on in this article
we'll be looking at the effects of HyperThreading in applications that
really take advantage of parallel processing.
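If you want to see how HyperThreading looks from the software side, the short sketch below compares the expected logical processor count with what the operating system reports. It assumes two hardware threads per core, as described above; `logical_processors` is an illustrative helper, while `os.cpu_count()` is the standard Python call for the number of logical CPUs.

```python
import os


def logical_processors(physical_cores, threads_per_core=2):
    """Logical processors exposed to the OS when every core runs
    threads_per_core hardware threads."""
    return physical_cores * threads_per_core


# Four physical cores with HyperThreading enabled.
print("Expected on a Core i7 920:", logical_processors(4))

# What this machine reports (may be None if the count cannot be determined).
print("Reported by this system:  ", os.cpu_count())
```

With HyperThreading switched off in the BIOS, a quad-core chip would be expected to report four logical processors instead of eight.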
Thermal Design Power
All of this microprocessing goodness is packed into a 231mm² die, which houses the
731 million transistors that make up the Core i7. The Nehalem family is produced
on the same 45nm node that was used to produce Penryn, which
is both cost and energy efficient.
A core tenet of Intel's tick-tock strategy is that any increase in power
requirements has to be matched by an equal rise in performance.
During the development of the Nehalem architecture, Intel's engineers raised
the stakes even higher, requiring performance to rise twice as fast as power:
a feature that added 1% to power usage had to deliver at least a 2% gain in performance.
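As a quick illustration of how such a rule might be applied, the sketch below checks a proposed feature against a required performance-to-power ratio. It reads the Nehalem requirement as a 2:1 ratio of percentage gains and the earlier rule as 1:1; the function and the sample figures are hypothetical, not Intel's actual design tooling or data.

```python
def passes_power_rule(perf_gain_pct, power_gain_pct, required_ratio=2.0):
    """Return True if a feature's performance gain justifies its power cost.

    required_ratio=2.0 models the Nehalem-era rule as read here
    (performance gain at least twice the power gain);
    required_ratio=1.0 models the older one-for-one rule.
    """
    if power_gain_pct <= 0:
        # A feature that costs no extra power always passes.
        return True
    return perf_gain_pct >= required_ratio * power_gain_pct


# Hypothetical feature: +1.5% performance for +1% power.
print(passes_power_rule(1.5, 1.0, required_ratio=1.0))  # passes the old rule
print(passes_power_rule(1.5, 1.0))                      # fails the 2:1 rule
```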
Intel rates the Core i7 line with a Thermal
Design Power (TDP) of 130W, which is coincidentally the same wattage as
the old Pentium 4s I mentioned above. Let's take a look at some power-usage
scenarios to see if Intel's power-performance rules have really worked out.