As with previous nVidia nForce chipsets, the new nForce 4 is a single-chip
solution. This means that instead of dividing the chipset into a Northbridge
(which handles video and memory traffic and communicates directly with the
processor) and a Southbridge (which handles peripherals and drives and
communicates with the Northbridge), all functions have been placed on a single
integrated core logic chip.
This design helps to reduce data bottlenecks by completely eliminating the
data bus between separate Northbridge and Southbridge chips. Given the unique
architecture of the Athlon 64 processor, in which the memory controller is on
the CPU itself rather than in the chipset, this kind of one-chip solution makes
a lot of sense.
nVidia's nForce 4 platform is currently available in three distinct variants,
with a fourth version apparently following in the near future.
The three main variants available now are:
nVidia nForce 4 - a socket 754 solution which supports both Athlon 64 and Sempron processors
nVidia nForce 4 Ultra - a solution for Athlon 64 and FX processors in the socket 939 form factor
nVidia nForce 4 SLI (Scalable Link Interface) - a solution for socket 939 Athlon 64 and FX processors, featuring dual PCI-Express slots for SLI operation with a pair of compatible video cards
As you might expect, there are abundant similarities between these nForce 4
chipsets. Each of the three versions currently released targets a different
kind of system, but all of them offer a similar feature set, the most
significant part of which is PCI Express support.
nVidia's new Scalable Link Interface (SLI) technology links two nVidia-based
video cards together, splitting the rendering load between them to increase 3D
performance. The technology requires a pair of compatible video cards (nVidia
GeForce 6600GT models and above) with SLI connectors (which must be implemented
by the video card manufacturer) and an nForce 4 SLI chipset-based motherboard.
Typical PCI-Express-based motherboards use the PCI-Express x16 slot to interface
with video cards. As you'd imagine, this provides 16 PCI-Express lanes to the
single card for a total available bandwidth of 8GB/s (4GB/s in each direction).
The nVidia nForce 4 SLI solution provides two physical PCI-Express video slots,
and uses a switch to direct 8 PCI-Express lanes to each slot. A single card can
also be used in either slot, and in this case the full 16 PCI-Express lanes are
available. In a typical SLI solution, the cards themselves are also linked by
way of an SLI cable attached to the special MIO 'video bus' connector on the top
of each card.
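To put rough numbers on those lane counts, here is a small Python sketch. It assumes first-generation PCI-Express signalling, which carries about 250MB/s per lane in each direction; the function names are our own and purely illustrative.

```python
# Assumption: first-generation PCI-Express, roughly 250 MB/s of payload
# per lane in each direction.
MB_PER_LANE_PER_DIRECTION = 250


def pcie_bandwidth_gb(lanes, both_directions=True):
    """Peak bandwidth in GB/s (1 GB = 1000 MB) for a link with `lanes` lanes."""
    per_direction = lanes * MB_PER_LANE_PER_DIRECTION / 1000
    return per_direction * 2 if both_directions else per_direction


def lanes_per_card(sli_mode):
    """Lane allocation described in the text: 16 to one card, or 8 to each of two."""
    return [8, 8] if sli_mode else [16]


if __name__ == "__main__":
    for sli_mode in (False, True):
        label = "SLI mode" if sli_mode else "normal mode"
        for slot, lanes in enumerate(lanes_per_card(sli_mode)):
            print(label, "slot", slot, lanes, "lanes,",
                  pcie_bandwidth_gb(lanes), "GB/s total")
```

By this arithmetic, each card in SLI mode gets half the slot bandwidth of a single card in normal mode, while the two slots together still add up to the same 16 lanes.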
In the nForce 4 motherboards we have seen, the SLI switch is implemented on a
small card which must be physically flipped around to go from 'normal mode', in
which the full 16 lanes of PCI-Express goodness are available to a single card,
to 'SLI mode', in which 8 lanes are directed to each physical slot. We're not
sure whether this can be replaced with an auto-sensing switch, but we hope so.
nVidia's SLI works by allowing the two graphics processors to share the
rendering workload, governed by the nVidia Detonator software drivers. The CPU
passes all necessary 3D information to the 'primary' GPU, which then shares the
information with the second card via the video bus interface cable. This keeps
the overhead of synchronizing the two processors off the PCI-Express bus,
allowing improved performance. The video bus link itself apparently runs at up
to 10GB/s, though we doubt that this bandwidth is fully utilized.
Currently, the only nVidia SLI-compatible video processors are the GeForce
6600GT, 6800, 6800GT and 6800 Ultra. The graphics processors in each video card
must be identical, as must the video BIOS revisions, though the cards can be
clocked at different speeds (the SLI system will run both cards at the lower
clock speeds). This means that it is going to be pretty much essential to have
two identical cards from the same manufacturer to get SLI working correctly.
nVidia has introduced a certification program to ensure that users can find
compatible products.
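The pairing rules above can be written down as a simple check. This is only a sketch of the rules as we understand them, not anything taken from nVidia's drivers; the VideoCard fields, the function name and the clock speeds in the example are hypothetical.

```python
from dataclasses import dataclass

# GPUs listed in the text as SLI-compatible at the time of writing.
SLI_CAPABLE_GPUS = {
    "GeForce 6600GT",
    "GeForce 6800",
    "GeForce 6800GT",
    "GeForce 6800 Ultra",
}


@dataclass(frozen=True)
class VideoCard:
    # Hypothetical description of a card, for illustration only.
    gpu: str
    bios_revision: str
    core_mhz: int
    memory_mhz: int


def sli_pairing(a, b):
    """Return the (core, memory) clocks a pair would run at, or None if the
    two cards cannot be paired under the rules described in the text."""
    if a.gpu not in SLI_CAPABLE_GPUS or a.gpu != b.gpu:
        return None  # GPUs must be SLI-capable and identical
    if a.bios_revision != b.bios_revision:
        return None  # video BIOS revisions must match
    # Clocks may differ; both cards then run at the lower speeds.
    return (min(a.core_mhz, b.core_mhz), min(a.memory_mhz, b.memory_mhz))


if __name__ == "__main__":
    stock = VideoCard("GeForce 6600GT", "rev-A", 500, 1000)
    factory_oc = VideoCard("GeForce 6600GT", "rev-A", 525, 1050)
    print(sli_pairing(stock, factory_oc))
```

Note how a factory-overclocked card gains nothing when paired with a stock card, which is one more reason to buy a matched pair.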
The actual SLI rendering process uses one of two modes: Alternate Frame
Rendering (AFR) and Split Frame Rendering (SFR). AFR has each video card render
a separate frame, while SFR, the method that has gotten more publicity, uses
each GPU to render part of one frame. Interestingly, the choice of which method
to use in which games is pre-programmed into the Detonator driver suite, meaning
that if there is no existing profile for the game you are playing, SLI will not
be used for that game. In these cases the driver falls back to a compatibility
mode which disables SLI completely, using only a single GPU (and, we'd assume,
only 8 PCI-Express lanes) for all rendering tasks. nVidia claims that it has
already created profiles for more than 100 of the most popular 3D games, and
that more will follow with Detonator driver updates.
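In effect, the driver is doing a table lookup with a safe default. A minimal sketch of that logic follows; the profile table, the executable names and the function name are made up for illustration and do not reflect how the driver actually stores its profiles.

```python
from enum import Enum


class RenderMode(Enum):
    AFR = "alternate frame rendering"
    SFR = "split frame rendering"
    SINGLE_GPU = "compatibility mode (SLI disabled)"


# Hypothetical profile table; the real profiles ship inside the driver.
PROFILES = {
    "game_a.exe": RenderMode.AFR,
    "game_b.exe": RenderMode.SFR,
}


def select_render_mode(executable, profiles=PROFILES):
    """Use the game's profile if one exists, else fall back to a single GPU."""
    return profiles.get(executable.lower(), RenderMode.SINGLE_GPU)


if __name__ == "__main__":
    for game in ("game_a.exe", "GAME_B.EXE", "unknown_game.exe"):
        print(game, "->", select_render_mode(game).value)
```

The important point is the default: a game the driver has never heard of gets no benefit from the second card at all.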
The Split Frame Rendering mode is probably the most interesting part of
nVidia's SLI technology. Using the Detonator driver to balance and allocate the
video load, each GPU takes on about half of the rendering work for each frame,
then the completed frame is assembled by the primary GPU and output to the
PCI-Express x16 bus. Obviously this will not be 100% efficient, as different
parts of each frame will vary in complexity and some overhead is added in
assembling the frame at the end, but overall this method should result in a
considerable performance increase. You can expect CPU load to increase as well,
since the Detonator software is responsible for balancing the video load between
the cards at all times.
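nVidia has not told us how the driver decides where to divide a frame, so the following is only one plausible way to balance the load. It assumes that an estimated rendering cost is available for each horizontal band of the screen (a hypothetical input) and picks the dividing line that makes the two halves as equal in cost as possible.

```python
def balanced_split(row_costs):
    """Return index s such that rows [0, s) go to GPU 0 and rows [s, n) go to
    GPU 1, minimizing the difference in estimated cost between the two."""
    total = sum(row_costs)
    best_split, best_gap = 0, total  # start with everything on GPU 1
    running = 0
    for i, cost in enumerate(row_costs):
        running += cost  # cost of rows [0, i]
        gap = abs(total - 2 * running)  # |top cost - bottom cost|
        if gap < best_gap:
            best_split, best_gap = i + 1, gap
    return best_split


if __name__ == "__main__":
    # A frame whose lower bands are twice as costly as its upper bands,
    # e.g. plain sky above detailed terrain (illustrative numbers).
    costs = [1, 1, 1, 1, 2, 2]
    split = balanced_split(costs)
    print("split at row", split,
          "top cost", sum(costs[:split]),
          "bottom cost", sum(costs[split:]))
```

With uniform costs the split lands in the middle of the screen; with uneven costs it moves so that the GPU handling the simpler region gets a larger share of the rows.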
Alternate Frame Rendering mode, where the two video cards take turns rendering
whole frames, should give even higher performance, but this technique cannot
always be used with modern 3D games due to certain graphical effects which
require multiple frames to be blended together. Split Frame Rendering has no
such limitation, as both cards are always working on the same frame.
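Conceptually, AFR is just a round-robin assignment of frames to GPUs. Here is a minimal sketch with our own function name, assuming frames are simply numbered from zero:

```python
def afr_schedule(frame_count, gpu_count=2):
    """Assign whole frames to GPUs in round-robin order.

    Returns one list of frame numbers per GPU.
    """
    schedule = [[] for _ in range(gpu_count)]
    for frame in range(frame_count):
        schedule[frame % gpu_count].append(frame)
    return schedule


if __name__ == "__main__":
    for gpu, frames in enumerate(afr_schedule(6)):
        print("GPU", gpu, "renders frames", frames)
```

Because each GPU only ever sees every other frame, any effect that needs the contents of the previous frame has to fetch it from the other card, which is where the limitation described above comes from.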
The major benefit of nVidia's SLI is its ability to more fully utilize the
massive bandwidth of the PCI-Express x16 video solution. A pair of GPUs can
process information up to twice as fast (minus the overhead of the communication
between them) and use the available bandwidth more efficiently, considerably
boosting 3D performance. This should also enable users to get top-tier
performance out of a pair of mid-range 6600GT cards. Interested users should
note that having two video cards also considerably increases power consumption,
and using a pair of 6800 Ultras will demand a hefty power supply.