
Radeon Graphics Architecture

By Paul Zimmons, 10/26/2000. Organization: Chip Specs, Memory, TnL Pipeline, Shading Pipeline, Other Features. Overview: What is a Radeon? Chip introduced by ATI in April (announced) but more like August. Designed by ATI, as opposed to ArtX (next gen/Dolphin).

Presentation Transcript


  1. Radeon Graphics Architecture By Paul Zimmons 10/26/2000

  2. Organization • Chip Specs • Memory • TnL Pipeline • Shading Pipeline • Other Features

  3. Overview: What is a Radeon? • Chip announced by ATI in April, but actually available closer to August • Designed by ATI • As opposed to ArtX (next gen/Dolphin) • Competes with nVidia • Pathway to DX8

  4. Radeon Chip Specifications • 30 M transistors, 100 mm² • 256-bit = 2 x 128-bit units • 0.18 micron fabrication • Handles up to 128 MB of memory • 200 MHz core and 200 MHz memory • DDR makes the memory effectively 400 MHz • 2 rendering pipelines (@ 200 MHz) • 8 hardware lights • 350 MHz DAC

  5. More Chip Specifications • Dual Processor Enabled • 3 textures per clock • 6.4 GB/s memory bandwidth • But how much of this is used? • Programmable TCL (Charisma Engine) • Programmable Pixel Shading (Pixel Tapestry) • Z compression (Hyper Z) • Optimized for 32 bpp rendering

  6. Chip Block Diagram

  7. Memory • 6.4 GB/s? That’s a lot, right? • PCI = 132 MB/s • AGP 1x (@ 66 MHz) = 266 MB/s • 2x = 533 MB/s, 4x = 1066 MB/s • System Memory ~ 800 MB/s • Radeon Graphics Card Memory = 5 GB/s • 366-400 MHz => 2.7 ns -> 2.5 ns cycle time • But since DDR transfers data on both clock edges, 5 ns DDR memory (200 MHz clock) is enough
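
The peak figures on this slide all follow from bus width times clock times transfers per clock. The sketch below is only illustrative: the bus widths (32-bit PCI, 64-bit system memory, 128-bit Radeon memory bus) are assumptions not stated on the slides, and `bandwidth_mb_s` is a hypothetical helper name.

```python
def bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MB/s, taking 1 MB = 10**6 bytes."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock

# Bus widths below are assumptions, not taken from the slides.
pci = bandwidth_mb_s(32, 33)           # 132.0 MB/s
pc100 = bandwidth_mb_s(64, 100)        # 800.0 MB/s (system memory)
radeon = bandwidth_mb_s(128, 200, 2)   # 6400.0 MB/s (200 MHz DDR)

# Cycle time in ns for an effective 400 MHz data rate.
cycle_ns = 1000 / 400                  # 2.5 ns
```

With these assumptions the 6.4 GB/s claim corresponds to a 200 MHz DDR clock; a lower shipping clock would give a proportionally lower figure.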

  8. More Memory • System memory is catching up, though: DDR RAM on a 133 MHz bus (266 MHz effective) can provide 2.1 GB/s (DDR on a 100 MHz bus gives 1.6 GB/s)

  9. More Memory • Radeon supports AGP 2x, 4x (about 0.5, 1 GB/s) • Supports AGP Fast Writes • The system memory is bypassed completely, allowing the CPU to talk directly with the Radeon • Chip must accept data at 4x processor write speed • Improvement depends on triangle throughput (might be worse with fast writes on if there are not enough triangles)

  10. Charisma Engine • Transformation, Lighting, and Clipping • CPU generates OpenGL commands and provides vertex data, etc. • Transforms and lights 30 M triangles/s (peak) • In reality only a fraction is delivered

  11. Charisma: Vertex Skinning • Animations are defined by a hierarchy of bones modifying a mesh • By default a vertex follows exactly one bone, which causes problems when a joint bends through a large angle • No way to blend smoothly • Introduce 2 world matrices: one for the nearest bone and one for a neighbor

  12. Skinning Continued • Now transforms are applied to the bones but the vertex gets a mixture of both (weighted by user) • Radeon allows up to 4 transforms per vertex (but can’t be changed within a triangle?)
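
A minimal sketch of the blending described above, assuming each bone is a 3x4 affine matrix stored as nested lists and that the weights are normalized before use. The function names are hypothetical, and this is plain software arithmetic rather than anything specific to the Radeon hardware.

```python
def transform(m, p):
    """Apply a 3x4 affine matrix (three rows of four floats) to a 3D point."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)

def skin_vertex(p, matrices, weights):
    """Blend the point as transformed by each bone, weighted per bone."""
    total = sum(weights)  # normalize so the weights sum to 1
    out = [0.0, 0.0, 0.0]
    for m, w in zip(matrices, weights):
        q = transform(m, p)
        for i in range(3):
            out[i] += (w / total) * q[i]
    return tuple(out)

# Two bones: one leaves the vertex in place, one shifts it by 2 along x.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shift_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]
blended = skin_vertex((1.0, 0.0, 0.0), [identity, shift_x], [0.5, 0.5])
```

With equal weights the vertex lands halfway between the two bone results, which is what smooths the crease at a joint.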

  13. After Transformation -> Setup • Triangle setup takes the x,y,z coordinates of the triangle and fills in the x,y pixels and z values it covers • Setup time is proportional to triangle size • Idle time is proportional to the number of rendering pipelines • 4 pipelines and a 2-pixel triangle => 50% idle
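
The 50% figure can be reproduced with a small calculation. This sketch assumes the pipelines work in lockstep on one triangle at a time, which is a simplification not stated on the slide; `idle_fraction` is a hypothetical name.

```python
import math

def idle_fraction(pixels_in_triangle, pipelines):
    """Fraction of pipeline slots left unused while drawing one triangle."""
    cycles = math.ceil(pixels_in_triangle / pipelines)
    slots = cycles * pipelines
    return (slots - pixels_in_triangle) / slots

small_tri = idle_fraction(2, 4)   # 0.5: half the pipelines have nothing to do
two_pipes = idle_fraction(2, 2)   # 0.0: a 2-pipeline chip stays fully busy
```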

  14. Before going on • Pixel Fill Rate = graphics core clock speed * # of rendering pipelines • Texel Fill Rate = graphics core clock speed * # of texture units * filtering samples per clock • 1 unfiltered, 4 bilinear, 8 trilinear • Effective Fill Rate = graphics core clock speed * # of rendering pipelines * # of textures in 1 cycle
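
The three definitions above translate directly into code. The example values (200 MHz core, 2 pipelines, 3 textures per clock) come from the chip specification slides; the function names are illustrative only.

```python
def pixel_fill_rate(core_mhz, pipelines):
    """Mpixels/s: one pixel per pipeline per clock."""
    return core_mhz * pipelines

def texel_fill_rate(core_mhz, texture_units, samples_per_clock):
    """Msamples/s; samples_per_clock is 1 unfiltered, 4 bilinear, 8 trilinear."""
    return core_mhz * texture_units * samples_per_clock

def effective_fill_rate(core_mhz, pipelines, textures_per_cycle):
    """Mtexels/s when every pipeline applies several textures in one cycle."""
    return core_mhz * pipelines * textures_per_cycle

pixels = pixel_fill_rate(200, 2)             # 400 Mpixels/s
effective = effective_fill_rate(200, 2, 3)   # 1200 Mtexels/s
```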

  15. Last Aside (Comparison)

  16. Setup and Triangle Size • Becoming more of a problem • Small triangles reduce fill rate • Smaller resolutions have a lower effective fill rate than larger ones • Overdraw also reduces fill rate • Drawing pixels that will be drawn over again • Every visible pixel accesses Z twice (a read and a write) • Plus all those reads for non-visible ones • Plus clearing the Z buffer if necessary

  17. Setup and Triangle Size

  18. Z buffer • Big memory bottleneck • 1600 x 1200 x 32 bits = 7.68 MB • Read and written at least twice per frame • Say 7.68 * 2 * 60 = 921.6 MB/s • The most frequently accessed part of local memory
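
The arithmetic on this slide, written out as a short sketch. It assumes 1 MB = 10**6 bytes and a 60 Hz frame rate, as the slide's numbers imply; the helper name is hypothetical.

```python
def zbuffer_bytes(width, height, bits_per_pixel):
    """Size of a full-screen Z buffer in bytes."""
    return width * height * bits_per_pixel // 8

size = zbuffer_bytes(1600, 1200, 32)      # 7_680_000 bytes = 7.68 MB
accesses_per_frame = 2                    # at least one read and one write
frames_per_second = 60
traffic_mb_s = size * accesses_per_frame * frames_per_second / 1e6
```

Overdraw multiplies this further, since every hidden pixel still costs a Z read.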

  19. Hyper Z • Three methods • Hierarchical Z • Z-compression • Fast Z clear

  20. Hierarchical Z • After triangle setup but before rendering • Look up into a coarser representation of a part of the Z buffer • The area is kept in a special cache to avoid unnecessary Z-buffer reads
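
One common way to build such a coarse representation is to keep the farthest depth per screen tile and reject geometry that lies behind it. The sketch below illustrates that general idea only; the tile size, the "smaller Z is nearer" convention, and the function names are all assumptions, and the slides do not say how ATI's hardware actually does it.

```python
TILE = 8  # hypothetical tile size in pixels

def build_coarse(zbuf, tile=TILE):
    """One value per tile: the farthest (largest) depth stored in that tile."""
    h, w = len(zbuf), len(zbuf[0])
    return [[max(zbuf[y][x]
                 for y in range(ty, min(ty + tile, h))
                 for x in range(tx, min(tx + tile, w)))
             for tx in range(0, w, tile)]
            for ty in range(0, h, tile)]

def tile_hidden(coarse, tx, ty, nearest_z):
    """True if incoming geometry is behind everything already in the tile."""
    return nearest_z >= coarse[ty][tx]

# 16x8 buffer: left tile already holds near geometry, right tile is empty (far plane).
zbuf = [[0.3] * 8 + [1.0] * 8 for _ in range(8)]
coarse = build_coarse(zbuf)
left_rejected = tile_hidden(coarse, 0, 0, 0.5)    # True: no per-pixel Z reads needed
right_rejected = tile_hidden(coarse, 1, 0, 0.5)   # False: must rasterize normally
```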

  21. Z compression • Lossless compression of Z buffer coordinates • Well really areas of the Z buffer • Well really the Z buffer cache

  22. Fast Z Clear • 50-64 times faster than conventional Z buffer clearing • Has something to do with the cache • Without writing to the Z buffer

  23. How bad is it? • In an experiment by Tom’s Hardware, a standard GeForce 2 GTS (200 MHz) performs only as fast as a 100 MHz GeForce 2 GTS with ‘infinite’ memory speed • Similar for ATI, but Hyper Z provides relief (about 20% more)

  24. Pixel Tapestry • Exposed in OpenGL through an extended EXT_texture_env_combine and ATIX_texture_env_dot3

  25. Pixel Tapestry Ops • Dot product per pixel • Diffuse bump mapping • 3 Textures per pixel per clock • 3D Textures • Cube environment mapping • Environment Mapped Bump Mapping • Projective Texturing • Priority Buffers • Shadow Mapping • Range Based Fog

  26. Impact on Fill Rate • Pixel Fill rate is about the same • 366 Mpixel vs. 800 Mpixel GTS • Texel Fill rate • 1100 Mtexel vs. claimed 1600 Mtexel • Because of 3 textures per clock • Hyper Z can push this higher

  27. 3 Textures Per Clock • Also provides basic accumulation effects such as soft shadows • Reduces the number of texture memory accesses

  28. Example

  29. 3D Textures • Self shadowing BRDF lookups • Volumetric fog/shadows/lighting • General 3D look up table • Also 3D texture compression

  30. Texture Compression • Problem is that lightmaps and sky are low color and low resolution • Hard case for compression • (Comparison images: nVidia, ATI)

  31. Bump Mapping • Several types • Emboss Bump Mapping • Dot Product 3 Bump Mapping (diffuse) • nVidia more sophisticated • DX8 self-shadowing bump mapping • Environment Mapped Bump Mapping • Single level chained texture look up • Du/dv maps (?)

  32. Emboss Bump Mapping • Diffuse Map • Half-Intensity Height Field (HIHF) • Inverse HIHF

  33. Dot Product 3 • Normal map is derived from a height map • Light is represented as a cube map in world space • Diffuse color only
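
A rough software sketch of the per-pixel dot product, assuming the usual encoding in which each 8-bit channel of the normal map stores a component mapped from [-1, 1] to [0, 255]. Here the light vector is normalized directly instead of being looked up in a cube map, and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def decode_normal(rgb):
    """Map 8-bit RGB in [0, 255] back to a unit normal."""
    return normalize([c / 127.5 - 1.0 for c in rgb])

def dot3_diffuse(normal_rgb, light_dir, base_color):
    """Diffuse term: base color scaled by max(0, N . L)."""
    n = decode_normal(normal_rgb)
    l = normalize(light_dir)
    intensity = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(round(c * intensity) for c in base_color)

flat = (128, 128, 255)  # a normal pointing (almost exactly) along +z
lit = dot3_diffuse(flat, (0.0, 0.0, 1.0), (200, 100, 50))     # light facing the surface
unlit = dot3_diffuse(flat, (0.0, 0.0, -1.0), (200, 100, 50))  # light behind the surface
```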

  34. EMBM • Perturb the eye space reflected ray according to some other map • Grayscale

  35. Projective Texture Mapping • Project 3D geometry into 2D texture map • Like generating screen coordinates into texture space • Uses projective texture matrix • Can work in conjunction with a priority buffer to achieve special effects
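
The projection step can be sketched as a 4x4 matrix multiply followed by a divide by the fourth coordinate. The matrix below is a made-up example (a projector at the origin looking down +z, with a scale and bias into the [0, 1] texture range), not a matrix taken from the presentation.

```python
def project_to_texture(point, tex_matrix):
    """Multiply a 3D point (w = 1) by a 4x4 matrix and divide s, t by q."""
    x, y, z = point
    p = (x, y, z, 1.0)
    s, t, r, q = (sum(row[i] * p[i] for i in range(4)) for row in tex_matrix)
    return (s / q, t / q)

# Hypothetical projector: s = 0.5*x + 0.5*z, t = 0.5*y + 0.5*z, q = z.
projector = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
center = project_to_texture((0.0, 0.0, 5.0), projector)   # (0.5, 0.5)
corner = project_to_texture((1.0, 1.0, 2.0), projector)   # (0.75, 0.75)
```

Points on the projector's axis land at the texture center, and points farther off-axis move toward the edges, just as with screen-space projection.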

  36. Priority Buffer • Like Z • Assigns a number (starting at 1) to each polygon depending on how close it is to the viewer • Allows for shadow mapping • Can project a light and cast shadows at the same time • Only method that supports easy self shadowing • Radeon is the first consumer hardware with this

  37. Range Based Fog • Uses Euclidean distance rather than depth
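
The difference shows up for points away from the center of the view. This sketch assumes eye-space coordinates with the viewer at the origin and a simple linear fog ramp; the fog start and end distances are arbitrary example values.

```python
import math

def depth_distance(p):
    """Depth-based fog uses only the distance along the view axis."""
    return abs(p[2])

def range_distance(p):
    """Range-based fog uses the true Euclidean distance from the eye."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)

def linear_fog_factor(d, start, end):
    """1.0 means no fog, 0.0 means fully fogged."""
    return min(1.0, max(0.0, (end - d) / (end - start)))

center = (0.0, 0.0, 10.0)
edge = (7.5, 0.0, 10.0)   # same depth, but farther from the eye

same_depth = depth_distance(center) == depth_distance(edge)            # True
fog_center = linear_fog_factor(range_distance(center), 5.0, 15.0)      # 0.5
fog_edge = linear_fog_factor(range_distance(edge), 5.0, 15.0)          # 0.25
```

With depth-based fog both points would receive the same fog, so objects at the screen edge would appear to gain or lose fog as the camera rotates.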

  38. Anisotropic Filtering • 16 tap vs. 2 tap for nVidia • Makes a noticeable difference • Especially with text

  39. What does this all mean? • More complete/complicated lighting

  40. N-patches • Triangular Cubic Bezier Surfaces • Supply a triangle (with normals) and a subdivision amount • N new points along each edge

  41. N-patches continued • Project each new vertex onto the plane defined by the normal
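
The projection step can be sketched as follows, assuming unit-length normals and the common construction in which an edge control point starts one third of the way along the edge and is then projected onto the tangent plane of the nearer vertex. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_onto_plane(q, p, n):
    """Project point q onto the plane through p with unit normal n."""
    d = dot([qi - pi for qi, pi in zip(q, p)], n)
    return tuple(qi - d * ni for qi, ni in zip(q, n))

def edge_control_point(p1, n1, p2):
    """Point 1/3 of the way from p1 to p2, pulled onto p1's tangent plane."""
    q = [(2 * a + b) / 3 for a, b in zip(p1, p2)]
    return project_onto_plane(q, p1, n1)

# Vertex at the origin with normal +z; its neighbor rises out of the tangent plane.
cp = edge_control_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (3.0, 0.0, 3.0))
```

Here the control point drops from (1, 0, 1) to (1, 0, 0), so the curved patch leaves the vertex tangent to its normal plane.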

  42. N-patch example

  43. N-patch example 2

  44. Video (Rage Theater) • Not in Radeon itself but usually on the same board • “on-chip motion compensation, run-level decode, de-zigzag and IDCT hardware, acceleration of MPEG-2, 8-bit per-pixel alpha blending of video and graphics, 4x4-tap filtered scaling, hardware subpicture acceleration, per-pixel de-interlacing and the ability to directly drive component video”

  45. Future Radeon • Can have two Radeons on one board • Names • Radeon II, Radeon MAXX, Radeon Pro? • Rumors of 128 MB board (probably with 2 chips)
