Intel now refers to the core's 13 additional instructions, originally dubbed
Prescott New Instructions, as SSE3.
Of course, you won't see any immediate benefit from the technology until
software developers begin employing the instructions in their code. Intel's C++
Compiler for Windows 8.0 supports development with SSE3, so it's only a matter
of time before optimized applications start emerging.
Representatives at Intel surmise that media applications
will be the first with SSE3 support. There are five principal areas that SSE3
improves upon: x87-to-integer conversion, complex arithmetic, video encoding,
graphics, and thread synchronization. Intel's own technical documentation
speculates that SSE3 will enable significant performance gains in the
aforementioned fields; however, it remains the responsibility of software
developers to implement the changes.
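To give a feel for the complex arithmetic case, here is a minimal sketch that models, in plain Python, the lane behavior of three SSE3 operations (MOVSLDUP, MOVSHDUP, and ADDSUBPS) and combines them to multiply packed complex numbers. The function names and the `swap_pairs` helper are illustrative assumptions, and the code only mimics what each instruction does to a four-element vector; it is not SIMD code and is not how a compiler would emit it.

```python
# Plain-Python model of three SSE3 operations on 4-lane vectors.
# It imitates the lane semantics only; nothing here runs as SIMD.

def movsldup(a):
    """Duplicate the even-indexed lanes: [a0, a0, a2, a2]."""
    return [a[0], a[0], a[2], a[2]]

def movshdup(a):
    """Duplicate the odd-indexed lanes: [a1, a1, a3, a3]."""
    return [a[1], a[1], a[3], a[3]]

def addsubps(a, b):
    """Subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]

def mul(a, b):
    """Lane-wise multiply (an ordinary SSE operation, modeled here)."""
    return [x * y for x, y in zip(a, b)]

def swap_pairs(b):
    """Hypothetical helper: swap re/im within each pair (a shuffle in real code)."""
    return [b[1], b[0], b[3], b[2]]

def complex_mul_packed(a, b):
    """Multiply two complex pairs stored as [re0, im0, re1, im1]."""
    first = mul(movsldup(a), b)               # [ar*br, ar*bi, ...]
    second = mul(movshdup(a), swap_pairs(b))  # [ai*bi, ai*br, ...]
    return addsubps(first, second)            # [ar*br - ai*bi, ar*bi + ai*br, ...]

a = [1.0, 2.0, 3.0, -1.0]  # (1+2j) and (3-1j)
b = [3.0, 4.0, 0.5, 2.0]   # (3+4j) and (0.5+2j)
result = complex_mul_packed(a, b)
print(result)  # [-5.0, 10.0, 3.5, 5.5]
```

The alternating subtract/add in `addsubps` is what lets the real and imaginary parts of each product fall out of a single operation, which is why this area is singled out as one that benefits.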
Two of the instructions contained within SSE3 deal specifically with
Hyper-Threading, a technology that allows one physical processor to be recognized
as two logical processors by a compatible operating system.
The basic idea behind Hyper-Threading is that by permitting two threads to execute
simultaneously, more of a processor's resources will be utilized under load. Intel
also improved the way Hyper-Threading handles parallel operations.
Previous versions of the technology
limited the processor to working on one thread at a time for certain types of operations, causing a bottleneck
that impeded performance. With Prescott, Intel has expanded the types
of operations that may be conducted in parallel.
Core Improvements
Beyond the marketable additions
to the Pentium 4, Intel's engineers also spent time improving parts of
Prescott's core. For example, both the static and dynamic branch prediction
algorithms have been enhanced. Within the execution core itself, Intel claims to
have reduced latency on the Pentium 4's double-pumped ALUs and improved
scheduler performance. The hardware prefetcher is also more
efficient.
Given the
number of enhancements Intel is discussing, you'd expect a sizeable performance
increase. However, another of the engineering team's design considerations goes
a long way toward hampering performance. The
Pentium 4's execution pipeline was designed for scalability, which is why the
same micro-architecture has aged so gracefully from 1.5GHz to 3.4GHz.
In order to reach frequencies as high as 4GHz by
the end of this year, Intel is now employing a 31-stage pipeline rather than the
previous 20-stage implementation. In building such a deep pipeline, Intel has
enabled those higher frequencies, but at the cost of IPC (instructions per clock), the number of
instructions successfully executed each clock cycle. Fortunately, the other core
enhancements nearly compensate for the loss, leaving Intel hoping
to scale Prescott's clock speed quickly in order to improve actual performance.
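The trade-off can be put into rough numbers. The sketch below assumes the common simplification that throughput is proportional to IPC multiplied by clock frequency, and it uses a purely hypothetical 5% net IPC loss; the article gives no actual figure, so treat `assumed_ipc_ratio` as a placeholder rather than a measurement. The 3.4GHz and 4GHz values are the frequencies mentioned above.

```python
# Back-of-the-envelope model: throughput ~ IPC x clock frequency.
# This ignores memory, cache, and workload effects entirely.

def relative_performance(ipc, clock_ghz):
    """Instructions per nanosecond under the simplified model."""
    return ipc * clock_ghz

def break_even_clock(old_clock_ghz, ipc_ratio):
    """Clock the new design needs to match the old one, given new_ipc / old_ipc."""
    return old_clock_ghz / ipc_ratio

baseline_ipc = 1.0        # normalized IPC of the 20-stage design
assumed_ipc_ratio = 0.95  # HYPOTHETICAL: net 5% IPC loss after core enhancements
old_clock = 3.4           # GHz, top speed cited for the existing design
target_clock = 4.0        # GHz, the frequency Intel is aiming for

needed_clock = break_even_clock(old_clock, assumed_ipc_ratio)
speedup = (relative_performance(baseline_ipc * assumed_ipc_ratio, target_clock)
           / relative_performance(baseline_ipc, old_clock))

print(f"Break-even clock: {needed_clock:.2f} GHz")
print(f"Speedup at {target_clock} GHz: {speedup:.2f}x")
```

Under that assumed 5% loss, the deeper pipeline would need to run at roughly 3.58GHz just to match a 3.4GHz part, and would be about 12% faster at 4GHz. A larger IPC penalty pushes the break-even point higher, which is why quick frequency scaling matters so much to this design.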
At one
point, it was speculated that Prescott would sport the execution protection (NX) technology featured in
AMD's Athlon 64 family and Intel's Itanium. NX marks memory that holds data as
non-executable, so that code injected through a buffer overflow, a common technique employed
by virus authors, cannot run. Microsoft will enable the feature in Windows XP Service
Pack 2; however, AMD will be the only one to support it, as Intel has not yet
implemented NX in Prescott. According to an Intel representative, extensive
testing needs to be conducted to ensure NX doesn't break compatibility with any
existing software titles.