RADEON™ 9700 Architecture and 3D Performance
Gordon Elder
RADEON™ 9700
What is the RADEON™ 9700?
• Programmability (SMARTSHADER™ 2.0)
  • First Full Floating Point Graphics Pipeline
  • Enables Compilation of High Level Shading Languages
• Performance
  • High Bandwidth
  • Parallelism
  • Efficiency
• Image Quality (SMOOTHVISION™ 2.0)
  • Multisample Antialiasing
  • Anisotropic Texture Filtering
Image Generation with Image Mapping
1st Generation Programmability
Idea: Texture Mapping, Blinn and Newell 1976
Implementation: SGI VGXT 1990
Hardwired Vertex Processing
Hardwired Fragment Processing with a Single Texture
Result: Environment Mapping and other effects
Blinn, J. F. and Newell, M. E. Texture and reflection in computer generated images. Communications of the ACM Vol. 19, No. 10 (October 1976), 542-547.
Image Generation with Texture Composition
2nd Generation Programmability
Idea: Shade Trees, R. Cook 1984
Implementation: RADEON™ 8500, 2001
Limited Vertex Programmability
Limited Fragment Processing
• Multiple Textures
• Fixed Point Data
• Short Programs
Result: Current generation of effects
Cook, R. L. Shade trees. Computer Graphics Vol. 18, No. 3 (July 1984), 223-231.
Image Generation with General Purpose Floating Point Math & Texturing
3rd Generation Programmability
Idea: RenderMan®, Pixar 1987
Implementation: ATI RADEON™ 9700, 2002
Advanced Vertex Programmability
Advanced Fragment Programmability
• Floating Point Data
• Rich Instruction Set
• Large Instruction Store
Result: Enabling Cinematic Rendering
Compiling RenderMan®, Maya, etc.
Reeves, W. T., Salesin, D. H. and Cook, R. L. Rendering antialiased shadows with depth maps. Computer Graphics Vol. 21, No. 4 (July 1987), 283-291.
SMARTSHADER™ 2.0
• Next-generation programmable shader technology
• Enabling cinema-quality effects in real time
• First complete DirectX® 9.0 feature support
  • 2.0 Vertex and Pixel Shaders
  • Floating Point Pixel Pipelines
  • 128-bit Floating Point Texture and Frame Buffer Formats
  • Two-Sided Stencil Shadow Acceleration
  • High Precision 32-bpp (10:10:10:2) Display Mode
  • Higher Order Surface Enhancements
• Full feature set also available for OpenGL®
  • OpenGL® Shading Language Support
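The 10:10:10:2 display mode stores each color channel in 10 bits (1024 levels, versus 256 for 8-bit channels) plus 2 bits of alpha, still fitting in 32 bits per pixel. The sketch below packs and unpacks such a pixel. The bit layout (alpha in the top two bits, then red, green, blue) and the function names are assumptions for illustration, not taken from the slide or from RADEON™ 9700 documentation.

```python
# Hypothetical bit layout (assumption): A in bits 30-31, R in 20-29, G in 10-19, B in 0-9.
R_SHIFT, G_SHIFT, B_SHIFT, A_SHIFT = 20, 10, 0, 30


def quantize(v: float, bits: int) -> int:
    """Clamp v to [0, 1] and map it to an unsigned integer of the given bit width."""
    v = min(max(v, 0.0), 1.0)
    return round(v * ((1 << bits) - 1))


def pack_rgb10_a2(r: float, g: float, b: float, a: float) -> int:
    """Pack normalized RGBA floats into one 32-bit 10:10:10:2 word."""
    return ((quantize(a, 2) << A_SHIFT)
            | (quantize(r, 10) << R_SHIFT)
            | (quantize(g, 10) << G_SHIFT)
            | (quantize(b, 10) << B_SHIFT))


def unpack_rgb10_a2(word: int) -> tuple[float, float, float, float]:
    """Recover normalized RGBA floats from a 10:10:10:2 word."""
    r = ((word >> R_SHIFT) & 0x3FF) / 1023
    g = ((word >> G_SHIFT) & 0x3FF) / 1023
    b = ((word >> B_SHIFT) & 0x3FF) / 1023
    a = ((word >> A_SHIFT) & 0x3) / 3
    return r, g, b, a


print(hex(pack_rgb10_a2(1.0, 0.0, 0.0, 1.0)))  # 0xfff00000
```

The smallest representable step per channel is 1/1023 rather than 1/255, which is what reduces visible banding in smooth gradients.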
Vertex Shaders (SMARTSHADER™ 2.0)
• Flow Control
  • Loops, jumps and subroutines
  • Allows re-use of certain parts of the shader code
  • Avoids repetition and saves instructions
• More Instructions, More Complex Effects
  • Up to 65,280 instructions per pass
  • Vertex shaders can be much more complex than they were in DX8
Pixel Shaders (SMARTSHADER™ 2.0)
• More Complex Shaders by an Order of Magnitude
  • Up to 160 instructions per pass
  • 32 address ops, 64 color ops, 64 alpha ops
  • Compared with 12 instructions total in DX8.0
• Multi-pass rendering support
  • High precision 128-bit floating point data formats for storing intermediate results between passes
  • Shaders can now effectively be thousands of instructions long; performance is the only limitation
• 24-bit per component floating point precision for all pixel shader operations, necessary for cinema-quality effects
• Allows shaders written in any present or future language to run on hardware with SMARTSHADER™ 2.0
  • Even high level languages like RenderMan® can now be compiled to run on RADEON™ 9700 in real time
• Pixel shader can also implement complex Image Processing algorithms
RADEON™ 9700 Performance
Key design elements for best performance: High Bandwidth, Parallelism, & Efficiency
High Bandwidth
• AGP 8x provides 2 GB/sec transfers to or from the CPU or system memory
• 310 MHz 256-bit DDR Memory Interface provides 20 GB/sec access to the Frame Buffer
• Internal 256-bit data buses for Color, Texture and Z
Parallelism
• 4 Vertex Engines running at 325 MHz provide 325 Mtriangles/sec (4 clocks per vertex per engine)
• 8 Pixels/Clock Rasterization Architecture running at 325 MHz provides a peak fill rate of 2.6 Gpix/sec
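The peak figures above follow from simple clock-times-width arithmetic. The sketch below recomputes them from the slide's inputs; the function names are illustrative, and the results are theoretical peaks rather than measured throughput.

```python
def memory_bandwidth_gb_s(clock_mhz: float, bus_bits: int, transfers_per_clock: int = 2) -> float:
    """Peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes). DDR transfers twice per clock."""
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


def vertex_rate_m_per_s(engines: int, clock_mhz: float, clocks_per_vertex: int) -> float:
    """Peak vertices per second, in millions, summed over all vertex engines."""
    return engines * clock_mhz / clocks_per_vertex


def fill_rate_gpix_s(pixels_per_clock: int, clock_mhz: float) -> float:
    """Peak pixel fill rate in Gpixels/s."""
    return pixels_per_clock * clock_mhz / 1000


# Slide inputs: 310 MHz 256-bit DDR memory; 4 vertex engines and 8 pixels/clock at 325 MHz.
print(f"Memory:   {memory_bandwidth_gb_s(310, 256):.2f} GB/s")   # 19.84, quoted as ~20
print(f"Vertices: {vertex_rate_m_per_s(4, 325, 4):.0f} M/s")     # 325
print(f"Fill:     {fill_rate_gpix_s(8, 325):.1f} Gpix/s")        # 2.6
```

Equating 325 M vertices/sec with 325 Mtriangles/sec assumes roughly one new triangle per transformed vertex, as in long triangle strips; indexed meshes with good vertex reuse can do better, and unshared triangles do worse.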
RADEON™ 9700 Performance (cont.)
Efficiency
Graphics systems tend to be memory-bandwidth limited, and the RADEON™ 9700 is no exception, so it is important to use the bandwidth efficiently.
• Hierarchical and Early Z checking allows pixels to be rejected before the pixel shader runs. This is very important when shader programs are long.
• Color, Texture and Z caches reduce memory bandwidth utilization by benefiting from spatial and temporal locality.
• Lossless Color and Z data compression reduces memory bandwidth utilization.
• Compressed Textures can be used to reduce memory bandwidth utilization.
• Fast Color and Z clears eliminate the need to access memory for clears.
HyperZ III
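The value of hierarchical and early Z rejection can be illustrated with a toy depth buffer that keeps, for each screen tile, the farthest depth stored in it. This is a generic sketch of the technique, not ATI's HyperZ III implementation: the tile size, the LESS depth test with 1.0 as the far plane, the constant-depth blocks, and all names here are assumptions.

```python
TILE = 4  # hypothetical tile size in pixels; the hardware's tile size is not given on the slide


class HierZBuffer:
    """Depth buffer with a per-tile 'farthest depth' summary (LESS test, 1.0 = far plane)."""

    def __init__(self, width: int, height: int):
        self.width, self.height = width, height
        self.z = [[1.0] * width for _ in range(height)]  # cleared to the far plane
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        self.tile_max = [[1.0] * tiles_x for _ in range(tiles_y)]

    def _pixels(self, tx: int, ty: int):
        """Yield the (x, y) coordinates covered by tile (tx, ty)."""
        for y in range(ty * TILE, min((ty + 1) * TILE, self.height)):
            for x in range(tx * TILE, min((tx + 1) * TILE, self.width)):
                yield x, y

    def draw_block(self, tx: int, ty: int, depth: float, shade) -> int:
        """Draw a constant-depth block covering tile (tx, ty); return pixel-shader invocations."""
        # Hierarchical test: if the block is not nearer than the farthest stored depth,
        # every pixel in the tile would fail the LESS test, so reject the whole tile at once.
        if depth >= self.tile_max[ty][tx]:
            return 0
        shaded = 0
        for x, y in self._pixels(tx, ty):
            # Early Z: per-pixel depth test before the (possibly long) shader runs.
            if depth < self.z[y][x]:
                shade(x, y)
                self.z[y][x] = depth
                shaded += 1
        # Refresh the tile summary so later occluded draws can be rejected cheaply.
        self.tile_max[ty][tx] = max(self.z[y][x] for x, y in self._pixels(tx, ty))
        return shaded


buf = HierZBuffer(8, 8)
print(buf.draw_block(0, 0, 0.5, lambda x, y: None))  # 16: nearer than the cleared tile
print(buf.draw_block(0, 0, 0.7, lambda x, y: None))  # 0: rejected by the tile test
```

The second draw never touches per-pixel depth or the shader, which is where the bandwidth and shading savings come from when long shader programs sit behind already-drawn geometry.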
RADEON™ 9700 Performance (cont.)
One more interesting thing...
Scalability
• The RADEON™ 9700 architecture is capable of scaling up to 256 simultaneous units
Image Quality (SMOOTHVISION™ 2.0)
Performance matters too
Pixel antialiasing and anisotropic texture filtering improve image quality only if they are enabled.
Just going to higher resolutions isn't the answer for improved image quality:
• Artifacts due to poor texture sampling remain.
• Dynamic aliasing artifacts (such as edges crawling in motion) are still very visible.
Sufficient performance for high resolution display, high quality texture filtering, and antialiasing is needed. The RADEON™ 9700 was architected to do all three simultaneously.
Anti-Aliasing (SMOOTHVISION™ 2.0)
[Figure: Standard edge gradient vs. gamma-corrected edge gradient (output vs. input)]
• Non-Grid Programmable Multi-Sampling
  • 2, 4, or 6 samples per pixel
  • Sample positions provide the maximum quality per sample
  • Lossless Z and Color compression minimizes bandwidth cost of higher sample counts
• Per Sample Gamma Correction
  • Takes gamma into account when blending samples
  • Creates smoother edge transitions
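Per-sample gamma correction means averaging the samples in linear light rather than averaging the gamma-encoded values stored in the frame buffer. The sketch below contrasts the two resolves for an edge pixel half covered by a white polygon. It assumes a plain power-law gamma of 2.2 (not the exact sRGB curve) and is not a description of the hardware's actual resolve circuit; the function names are illustrative.

```python
GAMMA = 2.2  # assumed display gamma: a simple power curve, not the exact sRGB transfer function


def to_linear(v: float) -> float:
    """Gamma-encoded value -> linear light intensity."""
    return v ** GAMMA


def to_display(v: float) -> float:
    """Linear light intensity -> gamma-encoded value."""
    return v ** (1.0 / GAMMA)


def resolve_naive(samples: list[float]) -> float:
    """Average the gamma-encoded samples directly (no gamma correction)."""
    return sum(samples) / len(samples)


def resolve_gamma_correct(samples: list[float]) -> float:
    """Linearize each sample, average in linear light, then re-encode for display."""
    linear = [to_linear(s) for s in samples]
    return to_display(sum(linear) / len(linear))


# Edge pixel: 2 of 4 samples cover a white polygon, 2 see a black background.
edge = [1.0, 1.0, 0.0, 0.0]
print(round(resolve_naive(edge), 3))          # 0.5
print(round(resolve_gamma_correct(edge), 3))  # 0.73
```

An encoded value of 0.5 displays at only about 22% intensity (0.5 ** 2.2), so the naive resolve makes half-covered edge pixels too dark and the edge gradient uneven; the gamma-corrected value of about 0.73 displays at the intended 50% intensity.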
Anisotropic Filtering (SMOOTHVISION™ 2.0)
• Improved Adaptive Algorithm
  • Up to 16 Trilinear Samples (128-tap)
  • Calculates optimal number of samples for each polygon
  • Delivers full image quality benefit while conserving memory bandwidth
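An adaptive anisotropic filter takes more samples only where the pixel's texture footprint is elongated. The sketch below uses the generic rule from the OpenGL anisotropic-filtering extension (sample count from the ratio of the footprint's long and short axes, clamped to a maximum, with the mip level chosen from the long axis divided by that count). It is a swapped-in textbook method, not ATI's proprietary adaptive algorithm; the names are illustrative. Each trilinear sample reads 2 mip levels × 4 texels = 8 taps, which is how 16 samples give the slide's 128 taps.

```python
import math

MAX_ANISO = 16          # slide: up to 16 trilinear samples
TAPS_PER_TRILINEAR = 8  # 2 mip levels x 4 bilinear texels


def aniso_samples(dudx: float, dvdx: float, dudy: float, dvdy: float,
                  max_aniso: int = MAX_ANISO) -> tuple[int, float]:
    """Return (trilinear sample count, mip LOD) for one pixel from texel-space derivatives."""
    px = math.hypot(dudx, dvdx)  # footprint extent along screen x
    py = math.hypot(dudy, dvdy)  # footprint extent along screen y
    p_max, p_min = max(px, py), min(px, py)
    if p_max == 0.0:
        return 1, 0.0  # no footprint at all: a single sample at the base level
    if p_min == 0.0:
        n = max_aniso  # infinitely thin footprint: use the maximum sample count
    else:
        n = min(math.ceil(p_max / p_min), max_aniso)
    # Spread n samples along the long axis; pick the mip level for the remaining extent.
    lod = max(0.0, math.log2(p_max / n))
    return n, lod


for d in [(1, 0, 0, 1), (1, 0, 0, 8), (2, 0, 0, 64)]:
    n, lod = aniso_samples(*d)
    print(d, n, lod, n * TAPS_PER_TRILINEAR)
```

A square footprint gets one trilinear sample (8 taps), a floor seen at a grazing angle with an 8:1 footprint gets 8 samples, and only extreme footprints reach the 16-sample, 128-tap limit, which is how an adaptive scheme conserves bandwidth on most pixels.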