Radeon Graphics Architecture By Paul Zimmons 10/26/2000
Organization • Chip Specs • Memory • TnL Pipeline • Shading Pipeline • Other Features
Overview: What is a Radeon? • Chip announced by ATI in April, but shipping was more like August • Designed by ATI • As opposed to ArtX (next gen/Dolphin) • Meant to compete with nVidia • Pathway to DX8
Radeon Chip Specifications • 30 M transistors, 100 mm² die • 256-bit = 2 x 128-bit units • 0.18 micron fabrication • Handles up to 128 MB of memory • 200 MHz core and 200 MHz memory • DDR makes that effectively 400 MHz memory • 2 rendering pipelines (@ 200 MHz) • 8 hardware lights • 350 MHz DAC
More Chip Specifications • Dual Processor Enabled • 3 textures per clock • 6.4 GB/s memory bandwidth • But how much of this is used? • Programmable TCL (Charisma Engine) • Programmable Pixel Shading (Pixel Tapestry) • Z compression (Hyper Z) • Optimized for 32 bpp rendering
Memory • 6.4 GB/s? That’s a lot, right? • PCI = 132 MB/s • AGP 1x (@ 66 MHz) = 264 MB/s • 2x = 528 MB/s, 4x = 1056 MB/s • System Memory ~ 800 MB/s • Radeon Graphics Card Memory = 5 GB/s • 366-400 MHz effective => 2.7 ns -> 2.5 ns cycle time • But since it is DDR, 5 ns DDR memory can be used
More Memory • Although now PC133 with DDR RAM (266 MHz effective) can provide 2.1 GB/s (PC100 with DDR is 1.6 GB/s)
More Memory • Radeon supports AGP 2x and 4x (about 0.5 and 1 GB/s) • Supports AGP Fast Writes • System memory is bypassed completely, allowing the CPU to talk directly to the Radeon • Chip must accept data at 4x processor write speed • Improvement depends on triangle throughput (might be worse with fast writes on if there are not enough triangles)
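The bandwidth figures on the memory slides can be reproduced from bus width, clock, and transfers per clock. This is a minimal sketch: the bus widths (32-bit PCI and AGP, 64-bit system memory, 128-bit Radeon memory bus) are assumptions chosen to be consistent with the slides' numbers, and `peak_bandwidth_mb_s` is a hypothetical helper name. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MB/s, taking 1 MB as 10**6 bytes."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock

# Bus widths below are assumptions, not figures stated on the slides.
pci = peak_bandwidth_mb_s(32, 33)            # 32-bit PCI at 33 MHz
agp_1x = peak_bandwidth_mb_s(32, 66)         # 32-bit AGP at 66 MHz
agp_4x = peak_bandwidth_mb_s(32, 66, 4)      # AGP 4x: four transfers per clock
pc100 = peak_bandwidth_mb_s(64, 100)         # 64-bit SDRAM at 100 MHz
pc133_ddr = peak_bandwidth_mb_s(64, 133, 2)  # DDR: two transfers per clock
radeon = peak_bandwidth_mb_s(128, 200, 2)    # 200 MHz DDR on a 128-bit bus

for name, value in [("PCI", pci), ("AGP 1x", agp_1x), ("AGP 4x", agp_4x),
                    ("PC100", pc100), ("PC133 DDR", pc133_ddr),
                    ("Radeon", radeon)]:
    print(f"{name}: {value:.0f} MB/s")
```

Under these assumptions the Radeon figure comes out at 6400 MB/s, matching the 6.4 GB/s quoted in the chip specifications.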
Charisma Engine • Transformation, Lighting, and Clipping • CPU generates OpenGL commands and provides vertex data, etc. • Transforms and lights 30 M triangles/s (peak) • In reality only a fraction is delivered
Charisma: Vertex Skinning • Animations are defined by a hierarchy of bones modifying a mesh • A vertex by default follows one of the bones, which causes problems when a joint bends through a large angle • No way to smoothly blend • Introduce 2 world matrices: one for the nearest bone and one for a neighbor
Skinning Continued • Now transforms are applied to the bones, but the vertex gets a mixture of both (weights supplied by the user) • Radeon allows up to 4 transforms per vertex (but can’t be changed within a triangle?)
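The blending described on the skinning slides can be sketched as a weighted sum of the vertex transformed by each bone matrix. This is an illustrative software version, not the Charisma Engine's actual data path; the 3x4 matrix layout and the function names are assumptions.

```python
def transform(matrix, point):
    """Apply a 3x4 affine matrix (rows of [r0, r1, r2, t]) to a 3D point."""
    return tuple(r[0] * point[0] + r[1] * point[1] + r[2] * point[2] + r[3]
                 for r in matrix)

def skin_vertex(vertex, matrices, weights):
    """Blend the vertex transformed by each bone matrix using the weights."""
    assert len(matrices) == len(weights) <= 4  # slides: up to 4 transforms
    assert abs(sum(weights) - 1.0) < 1e-9      # weights should sum to 1
    out = [0.0, 0.0, 0.0]
    for matrix, weight in zip(matrices, weights):
        p = transform(matrix, vertex)
        for i in range(3):
            out[i] += weight * p[i]
    return tuple(out)

# Two hypothetical bones: one leaves the vertex alone, one shifts it by 2 in x.
identity = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0))
shift_x = ((1, 0, 0, 2), (0, 1, 0, 0), (0, 0, 1, 0))
blended = skin_vertex((1.0, 0.0, 0.0), [identity, shift_x], [0.5, 0.5])
print(blended)
```

With equal weights the vertex lands halfway between the two bone-space results, which is what smooths the crease at a bending joint.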
After Transformation -> Setup • Triangle setup takes the x,y,z coordinates of the triangle and fills in the x,y pixels and z values • Setup time is proportional to triangle size • Idle time is proportional to the number of rendering pipelines • 4 pipelines and a 2-pixel triangle => 50% idle
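The 50% idle figure can be reproduced with a very simple model: assume every pipeline works on the same triangle each clock, so any pipeline slots left over in the last clock are wasted. That model and the name `idle_fraction` are assumptions for illustration; real hardware scheduling is more involved.

```python
import math

def idle_fraction(pixels_in_triangle, pipelines):
    """Fraction of pipeline slots left unused while drawing one triangle."""
    cycles = math.ceil(pixels_in_triangle / pipelines)
    slots = cycles * pipelines
    return (slots - pixels_in_triangle) / slots

print(idle_fraction(2, 4))  # the slide's case: 4 pipelines, 2-pixel triangle
print(idle_fraction(2, 2))  # same triangle on a 2-pipeline chip
```

In this model a chip with fewer pipelines wastes less on tiny triangles, which is the trade-off the slide points at.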
Before going on • Pixel Fill Rate = Graphics core clock speed * # of rendering pipelines • Texel Fill Rate = Graphics core clock speed * # of texture units * filtering samples per clock • 1 unfiltered, 4 bilinear, 8 trilinear • Effective Fill Rate = Graphics core clock speed * # of rendering pipelines * # of textures in 1 cycle
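The three fill-rate formulas above translate directly into code. The sketch below plugs in the Radeon numbers from the specification slides (200 MHz core, 2 pipelines, 3 textures per clock); the 183 MHz clock and the count of 6 texture units (2 pipelines x 3) are assumptions used only for illustration.

```python
def pixel_fill_rate(core_mhz, pipelines):
    """Mpixels/s."""
    return core_mhz * pipelines

def texel_fill_rate(core_mhz, texture_units, samples_per_clock):
    """Filter samples fetched per second, in millions."""
    return core_mhz * texture_units * samples_per_clock

def effective_fill_rate(core_mhz, pipelines, textures_per_cycle):
    """Mtexels/s when every pipeline applies all its textures each cycle."""
    return core_mhz * pipelines * textures_per_cycle

# Samples per clock by filter mode, as listed on the slide.
SAMPLES = {"unfiltered": 1, "bilinear": 4, "trilinear": 8}

print(pixel_fill_rate(200, 2))                   # 400 Mpixels/s
print(effective_fill_rate(200, 2, 3))            # 1200 Mtexels/s
print(texel_fill_rate(200, 6, SAMPLES["bilinear"]))
# Assumed 183 MHz part, for comparison with lower-clocked boards:
print(pixel_fill_rate(183, 2), effective_fill_rate(183, 2, 3))
```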
Setup and Triangle Size • Becoming more of a problem • Small triangles reduce fill rate • Smaller resolutions have lower effective fill rate than larger ones • Overdraw also reduces fill rate • Drawing stuff that will be drawn over again • Every visible pixel accesses Z twice • Plus all those reads for non-visible ones • Plus clearing the Z buffer if necessary
Z buffer • Big memory bottleneck • 1600x1200 at 32 bits = 7.68 MB • Read and written at least twice per frame • Say 7.68 MB * 2 * 60 fps = 921.6 MB/s • The most frequently accessed part of local memory
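The Z-buffer arithmetic on this slide can be checked with a few lines. The function names are hypothetical, and the model counts only whole-buffer passes (two per frame, as on the slide), ignoring overdraw.

```python
def z_buffer_bytes(width, height, bits_per_pixel):
    """Size of a full-screen Z buffer in bytes."""
    return width * height * bits_per_pixel // 8

def z_traffic_mb_s(width, height, bits_per_pixel, passes_per_frame, fps):
    """Z-buffer memory traffic in MB/s (1 MB = 10**6 bytes)."""
    size = z_buffer_bytes(width, height, bits_per_pixel)
    return size * passes_per_frame * fps / 1e6

size_mb = z_buffer_bytes(1600, 1200, 32) / 1e6
traffic = z_traffic_mb_s(1600, 1200, 32, passes_per_frame=2, fps=60)
print(f"{size_mb:.2f} MB per buffer, {traffic:.1f} MB/s at 60 fps")
```

With overdraw, the real number of passes per frame is higher than 2, so this is a lower bound on the traffic.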
Hyper Z • Three methods • Hierarchical Z • Z-compression • Fast Z clear
Hierarchical Z • After triangle setup but before rendering • Look up a coarser representation of a part of the Z buffer • The area is kept in a special cache to avoid unnecessary Z-buffer reads
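One common way to build such a coarse representation is to keep the farthest depth per tile and reject incoming geometry that is behind it. The sketch below assumes 8x8 tiles, a less-than depth test (smaller z is closer), and hypothetical function names; the slides do not describe the Radeon's actual layout.

```python
TILE = 8  # assumed tile size, not taken from the slides

def build_coarse_z(zbuf, tile=TILE):
    """Store the farthest (largest) depth found in each tile of the Z buffer."""
    height, width = len(zbuf), len(zbuf[0])
    coarse = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            row.append(max(zbuf[y][x]
                           for y in range(ty, min(ty + tile, height))
                           for x in range(tx, min(tx + tile, width))))
        coarse.append(row)
    return coarse

def tile_rejects(coarse, tile_x, tile_y, nearest_incoming_z):
    """True if the incoming geometry is hidden everywhere in the tile,
    so no per-pixel Z reads are needed for it."""
    return nearest_incoming_z >= coarse[tile_y][tile_x]

# 8 rows x 16 columns: left tile already covered at depth 0.3,
# right tile still cleared to the far plane (1.0).
zbuf = [[0.3] * 8 + [1.0] * 8 for _ in range(8)]
coarse = build_coarse_z(zbuf)
print(coarse)
print(tile_rejects(coarse, 0, 0, 0.5), tile_rejects(coarse, 1, 0, 0.5))
```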
Z compression • Lossless compression of Z buffer coordinates • Well really areas of the Z buffer • Well really the Z buffer cache
Fast Z Clear • 50-64 times faster than conventional Z buffer clearing • Has something to do with the cache • Without writing to the Z buffer
How bad is it? • From an experiment on Tom’s Hardware: a standard GeForce 2 GTS (200 MHz) performs only as fast as a 100 MHz GeForce 2 GTS with ‘infinite’ memory speed would • Similar for ATI, but Hyper Z provides relief (about 20% more)
Pixel Tapestry • Exposed in OpenGL through an extended EXT_texture_env_combine • And ATIX_texture_env_dot3
Pixel Tapestry Ops • Dot product per pixel • Diffuse bump mapping • 3 Textures per pixel per clock • 3D Textures • Cube environment mapping • Environment Mapped Bump Mapping • Projective Texturing • Priority Buffers • Shadow Mapping • Range Based Fog
Impact on Fill Rate • Pixel fill rate is about the same • 366 Mpixel/s vs. 800 Mpixel/s for the GTS • Texel fill rate • 1100 Mtexel/s vs. claimed 1600 Mtexel/s • Because of 3 textures per clock • Hyper Z can push this higher
3 Textures Per Clock • Also provides basic accumulation effects such as soft shadows • Reduces the number of texture memory accesses
3D Textures • Self shadowing BRDF lookups • Volumetric fog/shadows/lighting • General 3D look up table • Also 3D texture compression
Texture Compression • Problem is that lightmaps and sky are low color and low resolution • Hard case for compression • [Comparison images: nVidia, ATI]
Bump Mapping • Several types • Emboss Bump Mapping • Dot Product 3 Bump Mapping (diffuse) • nVidia's is more sophisticated • DX8 self-shadowing bump mapping • Environment Mapped Bump Mapping • Single level chained texture look up • Du/dv maps (?)
Emboss Bump Mapping • Diffuse Map • Half Intensity Height Field • Inverse HIHF
Dot Product 3 • Normal map is derived from a height map • Light is represented as a cube map in world space • Diffuse color only
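The two steps on this slide, deriving a normal from a height map and taking a per-pixel dot product with the light, can be sketched in software as below. Central differences for the normal and a plain light vector (instead of a cube-map lookup) are simplifying assumptions, and the function names are hypothetical.

```python
import math

def normal_from_height(height, x, y, scale=1.0):
    """Unit normal at (x, y) from central differences of a height map."""
    rows, cols = len(height), len(height[0])
    dx = height[y][min(x + 1, cols - 1)] - height[y][max(x - 1, 0)]
    dy = height[min(y + 1, rows - 1)][x] - height[max(y - 1, 0)][x]
    n = (-scale * dx, -scale * dy, 1.0)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def dot3_diffuse(normal, light_dir):
    """Clamped N . L, the diffuse term a dot3 combiner computes per pixel."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

flat = [[0.0] * 3 for _ in range(3)]
ramp = [[0.0, 1.0, 2.0] for _ in range(3)]  # height rises along +x

n_flat = normal_from_height(flat, 1, 1)
n_ramp = normal_from_height(ramp, 1, 1)
print(dot3_diffuse(n_flat, (0.0, 0.0, 1.0)))  # light straight on: full intensity
print(dot3_diffuse(n_ramp, (0.0, 0.0, 1.0)))  # tilted surface: dimmer
```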
EMBM • Perturb the eye space reflected ray according to some other map • Grayscale
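As a rough software analogue, the perturbation can be pictured as offsetting the environment-map lookup coordinates by a (du, dv) pair read from a second map. This sketch assumes a 2D environment map addressed with (u, v) in [0, 1], nearest-texel sampling, and a hypothetical `embm_lookup` helper; the slides do not specify how the hardware applies the offset.

```python
def clamp01(value):
    return min(max(value, 0.0), 1.0)

def embm_lookup(env_map, u, v, du, dv, scale=1.0):
    """Sample env_map at (u, v) after perturbing by the bump offsets."""
    rows, cols = len(env_map), len(env_map[0])
    pu = clamp01(u + scale * du)
    pv = clamp01(v + scale * dv)
    x = min(int(pu * cols), cols - 1)
    y = min(int(pv * rows), rows - 1)
    return env_map[y][x]

env_map = [[10, 20],
           [30, 40]]
print(embm_lookup(env_map, 0.25, 0.25, 0.0, 0.0))  # unperturbed lookup
print(embm_lookup(env_map, 0.25, 0.25, 0.5, 0.0))  # bump shifts it in u
```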
Projective Texture Mapping • Project 3D geometry into a 2D texture map • Like generating screen coordinates, but into texture space • Uses projective texture matrix • Can work in conjunction with a priority buffer to achieve special effects
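The projection step can be sketched as a 4x4 matrix multiply followed by a divide by q, the same way screen coordinates come from a perspective divide. The projector matrix below (projector at the origin looking down -z, output remapped to [0, 1]) is an assumed example, not one taken from the slides.

```python
def project_to_texture(point, tex_matrix):
    """Map a 3D point to (s, t) texture coordinates; None if behind projector."""
    p = (point[0], point[1], point[2], 1.0)
    s, t, r, q = (sum(row[i] * p[i] for i in range(4)) for row in tex_matrix)
    if q <= 0.0:
        return None
    return (s / q, t / q)

# Assumed projector: q = -z, and s/q = 0.5 * (x / -z) + 0.5 (likewise for t).
projector = (
    (0.5, 0.0, -0.5, 0.0),
    (0.0, 0.5, -0.5, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, -1.0, 0.0),
)

print(project_to_texture((0.0, 0.0, -2.0), projector))  # on the axis: center
print(project_to_texture((1.0, 0.0, -2.0), projector))  # off to the right
```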
Priority Buffer • Like Z • Provides a number (starting at 1) to each polygon depending on how close it is to the viewer • Allows for shadow mapping • Can project a light and cast shadows at the same time • Only method that supports easy self shadowing • Radeon is the first consumer hardware with this
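One way to read this slide is that polygons are ranked by distance, the ranks are rendered into a map from the light's point of view, and a pixel is in shadow when the map holds a closer rank than its own polygon's. The sketch below follows that reading; ranking by distance from the light and the function names are assumptions.

```python
def assign_priorities(polygon_distances):
    """Number polygons 1, 2, 3, ... from nearest to farthest."""
    order = sorted(polygon_distances, key=polygon_distances.get)
    return {name: rank + 1 for rank, name in enumerate(order)}

def in_shadow(polygon_priority, priority_in_light_map):
    """A pixel is shadowed if something with a closer rank covers it
    in the priority map rendered from the light."""
    return priority_in_light_map < polygon_priority

# Hypothetical scene: distances of three polygons from the light.
priorities = assign_priorities({"floor": 10.0, "crate": 4.0, "lamp": 1.5})
print(priorities)
print(in_shadow(priorities["floor"], priorities["crate"]))  # crate shadows floor
print(in_shadow(priorities["crate"], priorities["crate"]))  # crate sees the light
```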
Range Based Fog • Uses Euclidean distance rather than depth
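The difference between the two distance measures shows up for points off the view axis. This sketch assumes an eye at the origin looking down -z and a simple linear fog ramp; the function names are hypothetical.

```python
import math

def depth_fog_distance(point):
    """Depth-based fog: only the distance along the view axis counts."""
    return abs(point[2])

def range_fog_distance(point):
    """Range-based fog: true Euclidean distance from the eye."""
    return math.sqrt(sum(c * c for c in point))

def linear_fog_factor(distance, start, end):
    """1.0 = no fog, 0.0 = fully fogged."""
    return min(max((end - distance) / (end - start), 0.0), 1.0)

point = (3.0, 0.0, -4.0)  # off to the side of the view axis
print(depth_fog_distance(point), range_fog_distance(point))
print(linear_fog_factor(depth_fog_distance(point), 0.0, 10.0),
      linear_fog_factor(range_fog_distance(point), 0.0, 10.0))
```

With depth-based fog the point gets less fog than its real distance warrants, and the amount changes as the viewer rotates; range-based fog avoids that.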
Anisotropic Filtering • 16 tap vs. 2 tap for nVidia • Makes a noticeable difference • Especially with text
What does this all mean? • More complete/complicated lighting
N-patches • Triangular Cubic Bezier Surfaces • Supply a triangle (with normals) and a subdivision amount • N new points along each edge
N-patches continued • Project each new vertex onto the plane defined by the normal
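The projection step can be sketched as follows: take a point one third of the way along an edge and drop it onto the plane through the corner vertex perpendicular to that vertex's normal. This follows the usual curved PN-triangle construction, which is an assumption here since the slides do not spell out the formula; the helper names are hypothetical and the normal is assumed to be unit length.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_onto_plane(q, plane_point, unit_normal):
    """Move q along the normal until it lies in the plane."""
    w = dot(sub(q, plane_point), unit_normal)
    return tuple(c - w * n for c, n in zip(q, unit_normal))

def edge_control_point(p1, n1, p2):
    """Control point near p1 on edge p1 -> p2, projected onto p1's plane."""
    third = tuple(a + (b - a) / 3.0 for a, b in zip(p1, p2))
    return project_onto_plane(third, p1, n1)

p1, n1 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
p2 = (3.0, 0.0, 3.0)
print(edge_control_point(p1, n1, p2))
```

Because the control point is pulled into the tangent plane at each corner, the resulting cubic patch bulges away from the flat triangle in a way that agrees with the supplied normals.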
Video (Rage Theater) • Not in Radeon itself but usually on the same board • “on-chip motion compensation, run-level decode, de-zigzag and IDCT hardware, acceleration of MPEG-2, 8-bit per-pixel alpha blending of video and graphics, 4x4-tap filtered scaling, hardware subpicture acceleration, per-pixel de-interlacing and the ability to directly drive component video”
Future Radeon • Can have two Radeons on one board • Names • Radeon II, Radeon MAXX, Radeon Pro? • Rumors of 128 MB board (probably with 2 chips)