PCI Express®: Enabling New Opportunities for Graphics

Barry Wagner
Director of Technical Marketing
NVIDIA Corporation

Session Outline
• Increased bandwidth of PCI Express® enables a new direction for GPU design
• Scalability of PCI Express creates a new category of gaming machines
• PCI Express transition creates an opportunity for a new open standard for laptop graphics upgrades

PCI Express – Raising the Bar on Mainstream Graphics
• Graphics companies utilize PCI Express bandwidth in place of dedicated graphics memory
• Substantial cost saving for consumers
• Graphics pipeline changed to allow rendering directly to system memory
• Combination of HW and SW used to manage surface allocation
• Users get better compatibility, stability, and reliability from new, frequently improved drivers and GPU architectures at a lower price

Rendering to System Memory
• Fundamental new direction in discrete GPU design
• Direct rendering to system memory significantly reduces local graphics memory requirements
• Better laptop performance at a lower power and cost due to fewer memories
• Delivers latest graphics architectures and performance to affordable price points
• Enabled by PCI Express

PCI Express Bandwidth Improvements
• Improved write bandwidth is critical for rendering directly to system memory
[Figure: data paths between the GPU and system memory: (1) READ from system memory to GPU, (2) WRITE from GPU to system memory]

Benefits of PCI Express System Cache vs. Conventional AGP Architecture
[Figure: two block diagrams, each showing System DRAM, Core Logic, and a graphics device. Labels recoverable from the figure:]
• Traditional GPU (conventional AGP): 128MB Graphics Memory; cannot render efficiently to system memory
• PCI Express Graphics Device: 32MB local memory plus 96MB dynamically allocated for graphics in System DRAM, for 128MB Graphics Memory
• Bi-directional PCI Express data paths (4GB/s) enable efficient rendering to system memory
• Other labels: 13.6GB/s peak bandwidth; 3.2 GB/s; 5.6 GB/s; memory devices 16Mx16 and 4Mx32

Typical 3D Pipeline
[Figure: pipeline block diagram with Triangle Setup, Z-Cull, Shader Instruction Dispatch, L2 Tex, and Fragment Crossbar, connected to two Graphics Memory blocks]

Shared Memory Through MMU
[Figure: the same pipeline block diagram (Triangle Setup, Z-Cull, Shader Instruction Dispatch, L2 Tex, Fragment Crossbar) with an MMU added, connected to System Memory and two Graphics Memory blocks]
• Render directly to system memory at full speed
• Texture from system memory at full speed
• Dynamically allocate surfaces anywhere
• Present NVIDIA TurboCache™ architecture

Shared Memory Enables Higher Performance at Lower Price
[Chart: 3DMark05, 10x7, 0x AA / 0x Aniso. Configurations compared: 256MB Shared Memory, 128MB Shared Memory, 256MB 128-bit Local Memory, 128MB 64-bit Local Memory, Integrated Graphics]
¹ DRAM pricing changes constantly. A variety of market sources are available to confirm the approximate pricing reflected here. http://www.dramexchange.com is one such source.

No Negative Impact to System Performance
• Shared system memory design outperforms integrated graphics and traditional GPU architectures in a variety of system benchmarks
[Chart: labels shown: SysMark 2004, PCMark 2004, Business Winstone 2004, Biz WS, CC, WB 99 Business Disk, WB 99 High-End Disk]

More Usable PCI Express Bandwidth = Faster Performance
[Chart: 3DMark05, 10x7, 0x AA / 0x Aniso, comparing 6 GB/s and 4 GB/s]
• No need to wait for new applications to use the new bandwidth, as we did with AGP

Key Points
• PCI Express has enabled a new direction in GPU design
• Graphics performance scales with usable PCI Express bandwidth improvements
• Cost reductions for graphics memory enable more consumers to experience games the way they were meant to be played
• Laptops get better performance, lower costs, and longer battery life with system memory cache

PCI Express Returns Scalable Graphics Performance to the PC
• AGP was NOT very scalable
  • Architected for a single graphics slot
  • Scaled frequency over the bus life-cycle
• PCI Express planned for scalability
  • Scalable bus widths: x1, x2, x4, x8, x16
  • Port Splitting
  • Down-shifting
  • Frequency scaling: 2.5GHz (Gen1), 5GHz (Gen2)
• General purpose bus means multiple slots available for scaling performance
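As a rough illustration of how link width and signaling rate combine, the sketch below estimates the raw bandwidth of a link in each direction. It assumes the 8b/10b line encoding used by Gen1 and Gen2 links (10 bits on the wire per data byte), which the slides do not state, and it ignores packet and protocol overhead. The function name `link_bandwidth_mb_s` is hypothetical.

```python
# Per-lane signaling rate in gigatransfers per second
# (the slide quotes these as 2.5GHz for Gen1 and 5GHz for Gen2).
GEN_RATE_GT_S = {1: 2.5, 2: 5.0}

# Link widths listed on the slide.
VALID_WIDTHS = (1, 2, 4, 8, 16)


def link_bandwidth_mb_s(gen, lanes):
    """Approximate raw bandwidth per direction in MB/s (hypothetical helper)."""
    if gen not in GEN_RATE_GT_S:
        raise ValueError(f"unknown generation: {gen}")
    if lanes not in VALID_WIDTHS:
        raise ValueError(f"unsupported link width: x{lanes}")
    bits_per_second = GEN_RATE_GT_S[gen] * 1e9
    # Assumption: 8b/10b encoding, so 10 line bits carry one data byte.
    bytes_per_second = bits_per_second / 10
    return bytes_per_second * lanes / 1e6


for gen in (1, 2):
    for lanes in VALID_WIDTHS:
        print(f"Gen{gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes):.0f} MB/s per direction")
```

Under these assumptions a Gen1 x16 link works out to about 4 GB/s in each direction, which is consistent with the 4GB/s figure quoted earlier in the deck.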

Port-Splitting in the Core Logic
• Dividing a slot into multiple independent links
• Benefits
  • Allows 2 or more GPUs on a single x16 slot without the need for a bridge chip
  • Does not require multiple x16 slots on the motherboard
  • Enables performance for a wider install base of motherboards
• Drawbacks
  • Higher power density on a single card
  • May require larger than standard form factor cards
  • Optional feature in the PCI Express specification, so there is a risk that a specific card may not work in every system

Down-Shifting on the Motherboard
• Wiring fewer lanes to a slot than the max
  • x8 slots wired for x4 link width
  • x16 slots wired for x8 link widths
• Why?
  • There will always be a limited number of lanes
• Benefits
  • Cheapest way to get 2 slots that are capable of fitting the x16 edge connector used on graphics cards
  • 2x the GPU performance with standard cards
  • Avoid inventory management of both x8 and x16 cards
• Drawback
  • Motherboard has to support slot power level
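The effect of down-shifting on a card can be sketched as a width selection: the link settles on the widest width the card supports that still fits the lanes actually wired to the slot. This is a simplified model, not the link-training procedure from the specification, and `negotiated_width` and its parameters are hypothetical names.

```python
def negotiated_width(card_widths, slot_wired_lanes):
    """Widest card-supported link width that fits the wired lanes.

    Simplified model with hypothetical names; it only illustrates the
    outcome of training, not the training procedure itself.
    """
    usable = [w for w in card_widths if w <= slot_wired_lanes]
    if not usable:
        raise ValueError("card supports no width that fits this slot")
    return max(usable)


# An x16 card that also supports x8 operation, in an x16 slot wired for x8.
print(negotiated_width({1, 8, 16}, slot_wired_lanes=8))   # 8

# A card that supports only x1 and x16 would fall back to x1 in the same slot.
print(negotiated_width({1, 16}, slot_wired_lanes=8))      # 1
```

The second case shows why a down-shifted slot relies on graphics cards that can operate at the reduced width.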

Scalable Graphics Impact
• PCI Express High-End Graphics Specification
  • 75W from x16 slot
  • 75W from HE power connector
• Scalable solutions demand more power for the graphics subsystem
• Graphics companies have already started to request the PCI-SIG® begin addressing a demand for greater than 150W cards.
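The power arithmetic above can be written out as a small budget calculation. The two 75W figures come from the slide; the function names and the assumption that every card draws its full allowance are illustrative only.

```python
SLOT_POWER_W = 75           # from the x16 slot (slide figure)
HE_CONNECTOR_POWER_W = 75   # from the HE power connector (slide figure)


def card_power_limit_w(uses_he_connector=True):
    """Maximum power available to one card (hypothetical helper)."""
    extra = HE_CONNECTOR_POWER_W if uses_he_connector else 0
    return SLOT_POWER_W + extra


def graphics_power_budget_w(num_cards, uses_he_connector=True):
    """Worst-case graphics budget, assuming every card draws its full limit."""
    if num_cards < 1:
        raise ValueError("need at least one card")
    return num_cards * card_power_limit_w(uses_he_connector)


print(card_power_limit_w())          # 150
print(graphics_power_budget_w(2))    # 300
```

A two-card system can therefore need up to 300W for graphics, while a design that places two GPUs on one card has to fit both within a single card's 150W, which is one way to read the request for cards above 150W.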

When do Systems Benefit from Dual GPU Designs?
• When you play modern GPU-intensive games at:
  • High resolutions (1600x1200) with:
    • Anti-Aliasing (4xAA)
    • Anisotropic Filtering (8xAF)
Example applications that scale: Battlefield Vietnam, Doom3, Far Cry
Game images are the property of their respective owners. All Rights Reserved.

NVIDIA's Design With Scalable PCI Express
• GPU-to-GPU Interconnect
  • 1GB/s digital link
  • Provides pixel and synchronization information
• Multiple system implementations supported
  • Rigid PCB with standard edge connectors
  • Flexible cables with standard edge connectors
  • Embedded designs
• Optimized within Software
  • Can be enabled or disabled by driver and/or user
  • User may prefer to enable 4 displays for some applications in non-scaled mode

Solutions Possible for Any Compatible System
• Dual x16 slots fully-wired motherboard:
  • 2 compatible standard graphics cards with scalability support
  • Rigid or flexible cable between the cards
• Single x16 slot without Port Splitting:
  • 2 GPUs on a single card plus a x16 to x8 PCI Express bridge
  • Inter-connect embedded in PCB
• Single x16 slot with Port Splitting:
  • 2 GPUs on a single card
  • Inter-connect embedded in PCB
• Dual x16 slots each down-shifted to x8:
  • 2 compatible graphics cards with scalability support
  • Rigid or flexible cable between the cards

NVIDIA's Dynamic Load Balancing Technology
• HW and driver work together to determine the best algorithms to share workload between cards
  • Alternate Frame Rendering (AFR)
  • Split Frame Rendering (SFR)
• Performance improvement depends on the ability to share work across cards
• Works with almost any 3D application
  • Today some applications benefit more than others
  • Some popular applications show 1.7x – 2x increase
  • Future games will trend toward 2x increase as indicated by 3DMark05
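The two work-sharing modes named above can be illustrated with a generic sketch. The slides do not describe NVIDIA's actual algorithm, so everything here is an assumption: AFR is modeled as round-robin frame assignment, SFR as a horizontal split of the frame into two scanline ranges, and load balancing as nudging that split toward the GPU that finished sooner. All function names and the step size are hypothetical.

```python
def afr_gpu_for_frame(frame_index, num_gpus=2):
    """Alternate Frame Rendering: whole frames are handed out round-robin."""
    return frame_index % num_gpus


def sfr_split(height, top_fraction):
    """Split Frame Rendering: return (top, bottom) scanline ranges.

    Each range is a half-open (start, end) pair; GPU 0 takes the top.
    """
    split = round(height * top_fraction)
    return (0, split), (split, height)


def rebalance(top_fraction, top_time_ms, bottom_time_ms, step=0.05):
    """Shift work away from whichever GPU took longer on the last frame."""
    if top_time_ms > bottom_time_ms:
        top_fraction -= step
    elif bottom_time_ms > top_time_ms:
        top_fraction += step
    # Keep both GPUs busy with at least a small share of the frame.
    return min(0.9, max(0.1, top_fraction))


print([afr_gpu_for_frame(i) for i in range(4)])    # [0, 1, 0, 1]
print(sfr_split(1200, 0.5))                        # ((0, 600), (600, 1200))
fraction = rebalance(0.5, top_time_ms=10.0, bottom_time_ms=8.0)
print(sfr_split(1200, fraction))
```

In this model the speedup approaches 2x only when the two halves of the work take about the same time, which is the dependence on sharing work across cards that the slide describes.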

NVIDIA SLI™: A new class of gaming performance
[Chart: benchmarks run at 1600x1200 with 4xAA/8xAF on an nForce 4 SLI motherboard with AMD Athlon 64FX]

Key Points:
• PCI Express provides for better scalability
• Factor dual-GPU architectures into your graphics power budgets

A New Opportunity to Address Mobile Graphics Form Factor
• Desktop PC standard form factors have enabled growth and innovation
  • Rapid time to market
  • Range of choice addresses all market segments
  • Drives competition
• The Mobile PC Platform has not benefited from a common form factor
  • Mobile PC development cycle is long
  • Per-platform custom graphics integration is expensive
  • Custom design limits OEM/ODM choices and ultimately the consumer's

Notebook Graphics Modules Today
• Engineering resources are not leveraged
  • Custom BIOS
  • Custom connectors
  • Custom power delivery
  • Custom cooling
  • Custom power management

PCI Express is a catalyst for change
• PCI Express electrical changes require a new look at the platform
  • Robust differential links can tolerate modular design
  • No longer able to just send out the old footprint-compatible laptop
• PCI Express brings new power management options
  • Scalable link widths
  • Power down links you don't need
• Notebook platform has matured substantially
  • Platform requirements are better understood
  • Lessons learned from current module experience

What issues must be addressed by a new Mobile PC Graphics Architecture?
• Broad industry participation
• Open Architecture
• A common host interface (PCI Express)
• Enable broad range of power, thermal, & mechanical boundaries
• Cover a variety of display output technologies
  • VGA, DVI, LVDS, Video (HDCP)
• A common software layer and partitioning for the video BIOS and system BIOS

MXM – Mobile PCI Express Module
• Open architecture co-developed with many leading ODMs/OEMs
  • AOpen, Arima, Asustek, Clevo, FIC, Mitac, Quanta, Tatung, Uniwill, Wistron and more
• Enables a consistent graphics interface across all PCI Express notebooks
  • Supports up to x16 PCI Express
• One design, many notebooks
  • Use different graphics solutions, from ANY vendor
• Potential for consumer-upgradeable graphics
• Modules already designed for various NVIDIA, ATI, and S3 products

Key Points:
• The PCI Express transition is creating new opportunities for competition in the notebook PC
• The notebook platform is evolving toward a common modular graphics form factor in many segments
• MXM is one open architecture developed with the notebook industry to leverage the limited engineering resources
• Now is the time for the industry to come together on a common form factor

Call To Action
• Consider system memory cache solutions for your mainstream desktop and mobile markets
• For maximum graphics performance, build systems with multiple x16 slots to enable best scalability of graphics hardware
• Consider whether MXM is right for your next laptop design or purchasing decision
• Let your PCI-SIG reps know you want the graphics industry power and scalability issues to continue to be addressed by the PCI-SIG

For More Information
• PCI Express Specifications
  • www.pcisig.com
• NVIDIA PCI Express Product Information
  • http://www.nvidia.com/page/pci_express.html
• NVIDIA TurboCache Technology
  • http://www.nvidia.com/page/turbocache.html
• NVIDIA SLI Technology
  • http://www.nzone.com/object/nzone_sli_home.html
• MXM Technology
  • http://www.nvidia.com/page/mxm.html

Community Resources
• Windows Hardware & Driver Central (WHDC)
  • www.microsoft.com/whdc/default.mspx
• Technical Communities
  • www.microsoft.com/communities/products/default.mspx
• Non-Microsoft Community Sites
  • www.microsoft.com/communities/related/default.mspx
• Microsoft Public Newsgroups
  • www.microsoft.com/communities/newsgroups
• Technical Chats and Webcasts
  • www.microsoft.com/communities/chats/default.mspx
  • www.microsoft.com/webcasts
• Microsoft Blogs
  • www.microsoft.com/communities/blogs
