1 / 89

Commodity Components 1

Commodity Components 1. Greg Humphreys October 10, 2002. Life Is But a (Graphics) Stream. C. C. C. C. T. T. T. T. S. S. S. S. “There’s a bug somewhere …”. You Spent How Much Money On What Exactly?. Several Years of Failed Experiments. 1980-1982. 1975-1980. 1982-1986.

overton
Download Presentation

Commodity Components 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Commodity Components 1 Greg HumphreysOctober 10, 2002

  2. Life Is But a (Graphics) Stream C C C C T T T T S S S S “There’s a bug somewhere…”

  3. You Spent How Much Money On What Exactly?

  4. Several Years of Failed Experiments 1980-1982 1975-1980 1982-1986 1986-Present

  5. The Problem • Scalable graphics solutions are rare and expensive • Commodity technology is getting faster • But it tends not to scale • Cluster graphics solutions have been inflexible

  6. Big Models Scans of Saint Matthew (386 MPolys) and the David (2 GPolys) Stanford Digital Michelangelo Project

  7. Large Displays Window system and large-screen interaction metaphors François Guimbretière, Stanford University HCI group

  8. Modern Graphics Architecture • NVIDIA GeForce4 Ti 4600 • Amazing technology: • 4.8 Gpix/sec (antialiased) • 136 Mtri/sec • Programmable pipeline stages • Capabilities increasing at roughly • 225% per year GeForce4 Die Plot Courtesy NVIDIA • But it doesn’t scale: • Triangle rate is bus-limited – 136 Mtri/sec mostly unachievable • Display resolution growing very slowly • Result: There is a serious gap between dataset complexity and processing power

  9. Why Clusters? • Commodity parts • Complete graphics pipeline on a single chip • Extremely fast product cycle • Flexibility • Configurable building blocks • Cost • Driven by consumer demand • Economies of scale • Availability • Insufficient demand for “big iron” solutions • Little or no ongoing innovation in graphics “supercomputers”

  10. Stanford/DOE Visualization Cluster • 32 nodes, each with graphics • Compaq SP750 • Dual 800 MHz PIII Xeon • i840 logic • 256 MB memory • 18 GB disk • 64-bit 66 MHz PCI • AGP-4x • Graphics • 16 NVIDIA Quadro2 Pro • 16 NVIDIA GeForce 3 • Network • Myrinet (LANai 7 ~ 100 MB/sec)

  11. Virginia Cluster?

  12. Ideas • Technology for driving tiled displays • Unmodified applications • Efficient network usage • Scalable rendering rates on clusters • 161 Mtri/sec at interactive rates to a display wall • 1.6 Gvox/sec at interactive rates to a single display • Cluster graphics as stream processing • Virtual graphics interface • Flexible mechanism for non-invasive transformations

  13. The Name Game • One idea, two systems: • WireGL • Sort-first parallel rendering for tiled displays • Released to the public in 2000 • Chromium • General stream processing framework • Multiple parallel rendering architectures • Open-source project started in June 2001 • Alpha release September 2001 • Beta release April 2002 • 1.0 release September 2002

  14. General Approach • Replace system’s OpenGL driver • Industry standard API • Support existing unmodified applications • Manipulate streams of API commands • Route commands over a network • Track state! • Render commands using graphics hardware • Allow parallel applications to issue OpenGL • Constrain ordering between multiple streams

  15. App Graphics Hardware Display App Graphics Hardware Display App Graphics Hardware Display App Graphics Hardware Display Cluster Graphics • Raw scalability is easy (just add more pipelines) • One of our goals is to expose that scalability to an application

  16. App Server Display App Server Display App Server Display App Server Display Cluster Graphics • Graphics hardware is indivisible • Each graphics pipeline managed by a network server

  17. Cluster Graphics Server App Server Display App Server Display App Server • Flexible number of clients, servers and displays • Compute limited = more clients • Graphics limited = more servers • Interface/network limited = more of both

  18. Server Display Server Display . . . . . . Server Display Output Scalability • Larger displays with unmodifiedapplications • Other possibilities: broadcast, ring network App

  19. 1.0 3.0 2.0 0.5 1.0 0.5 3.0 12 bytes 2.0 1.0 0.5 0.5 1.0 1 byte COLOR3F VERTEX3F COLOR3F VERTEX3F Protocol Design • 1 byte overhead per function call glColor3f( 1.0, 0.5, 0.5 ); glVertex3f( 1.0, 2.0, 3.0 ); glColor3f( 0.5, 1.0, 0.5 ); glVertex3f( 2.0, 3.0, 1.0 );

  20. 1.0 3.0 2.0 0.5 1.0 Data 0.5 3.0 2.0 1.0 0.5 0.5 1.0 COLOR3F Opcodes VERTEX3F COLOR3F VERTEX3F Protocol Design • 1 byte overhead per function call glColor3f( 1.0, 0.5, 0.5 ); glVertex3f( 1.0, 2.0, 3.0 ); glColor3f( 0.5, 1.0, 0.5 ); glVertex3f( 2.0, 3.0, 1.0 );

  21. Efficient Remote Rendering Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

  22. Efficient Remote Rendering Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

  23. Efficient Remote Rendering Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

  24. Efficient Remote Rendering Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle

  25. Efficient Remote Rendering Application draws 60,000 vertices/frame Measurements using 800 MhZ PIII + GeForce2Efficiency assumes 12 bytes per triangle *None: discard packets, measuring pack rate

  26. Sort-first Stream Specialization • Update bounding box per-vertex • Transform bounds to screen-space • Assign primitives to servers (with overlap)

  27. Graphics State • OpenGL is a big state machine • State encapsulates control for geometric operations • Lighting/shading parameters • Texture maps and texture mapping parameters • Boolean enables/disables • Rendering modes • Example: glColor3f( 1.0, 1.0, 1.0 ) • Sets the current color to white • Any subsequent primitives will appear white

  28. glTexImage2D(…) glBlendFunc(…) glEnable(…) glLightfv(…) glMaterialf(…) glEnable(…) Lazy State Update • Track entire OpenGL state • Precede a tile’s geometry with state deltas Ian Buck, Greg Humphreys and Pat Hanrahan, Graphics Hardware Workshop 2000

  29. How Does State Tracking Work? • Tracking state is a no-brainer, it’s the frequent context differences that complicate things • Need to quickly find the elements that are different • Represent state as a hierarchy of dirty bits • 18 top-level categories: buffer, transformation, lighting, texture, stencil, etc. • Actually, use dirty bit-vectors. Each bit corresponds to a rendering server

  30. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 Texture Light 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Fog Line Polygon Buffer Scissor Viewport Light 1 1 1 1 1 1 1 1 1 Light 2 Ambient color (0,0,0,1) 0 0 0 0 0 0 0 0 Light 3 Diffuse color (1,1,1,1) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Light 4 Spot cutoff 180 1 1 1 1 1 1 1 1 Light 5 Light 2 Texture Inside State Tracking glLightf( GL_LIGHT1, GL_SPOT_CUTOFF, 45) Transformation Pixel Lighting 45

  31. Bit 2 set? Bit 2 set? Bit 2 set? Bit 2 set? Bit 2 set? Bit 2 set? Equal Not Equal! Pack Command! Equal 45 GL_SPOT_CUTOFF GL_LIGHT1 LIGHTF_OP Context Comparison Client State Server 2’s State No, skip it No, skip it Yes, drill down No, skip it Yes, drill down (0,0,0,1) (0,0,0,1) (1,1,1,1) (1,1,1,1) 45 180 45 No, skip it

  32. Output Scalability Results • Marching Cubes Point-to-point “broadcast” doesn’t scale at all However, it’s still the commercial solution [SGI Cluster]

  33. Output Scalability Results • Quake III: Arena Larger polygons overlap more tiles

  34. WireGL on the SIGGRAPH show floor

  35. App App Display . . . App Input Scalability • Parallel geometry extraction • Parallel data submission • Ordering? Server

  36. Parallel OpenGL API • Express ordering constraints between multiple independent graphics contexts • Don’t block the application, just encode them like any other graphics command • Ordering is resolved by the rendering server • Introduce new OpenGL commands: • glBarrierExec • glSemaphoreP • glSemaphoreV Homan Igehy, Gordon Stoll and Pat Hanrahan, SIGGRAPH 98

  37. Serial OpenGL Example def Display: glClear(…) DrawOpaqueGeometry() DrawTransparentGeometry() SwapBuffers()

  38. Optional in Chromium Optional in Chromium Parallel OpenGL Example def Init: glBarrierInit(barrier, 2) glSemaphoreInit(sema, 0) def Display: if my_client_id == 1: glClear(…) glBarrierExec(barrier) DrawOpaqueGeometry(my_client_id) glBarrierExec(barrier) if my_client_id == 1: glSemaphoreP(sema) DrawTransparentGeometry1() else: DrawTransparentGeometry2() glSemaphoreV(sema) glBarrierExec(barrier) if my_client_id == 1: SwapBuffers()

  39. Clear Barrier Barrier Opaque Geom 1 Opaque Geom 1 Barrier Barrier SemaV SemaP Transparent Geom 2 Transparent Geom 1 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  40. Barrier Barrier Opaque Geom 1 Opaque Geom 1 Barrier Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  41. Barrier Opaque Geom 1 Opaque Geom 1 Barrier Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  42. Barrier Opaque Geom 1 Opaque Geom 1 Barrier Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  43. Opaque Geom 1 Opaque Geom 1 Barrier Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  44. Opaque Geom 1 Barrier Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  45. Opaque Geom 1 Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  46. Opaque Geom 1 Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  47. Barrier SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  48. SemaP SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

  49. SemaV Transparent Geom 1 Transparent Geom 2 Barrier Barrier Swap Inside a Rendering Server Client 1 Client 2

More Related