Join Jonathan Blow at the Game Developers Conference on October 21, 2002 in Seoul, Korea as he discusses various 3D techniques including LOD management, triangle strip generation, vertex cache optimization, normal map generation, and ordered rendering. He will provide a skeptical review of these techniques and offer insights on using them effectively in game development.
Making the Pieces Fit Together Jonathan Blow Game Developers Conference Reception October 21, 2002 Seoul, Korea
3D Techniques I Will Discuss • Level-of-Detail Management (LOD) • Triangle Strip Generation • Vertex Cache Optimization • Normal Map Generation • Ordered Rendering (sorted output geometry)
How I will discuss them • You can read about these techniques on the internet: hardware vendor sites, programmer hobbyist sites. There is a lot of hype. • Most of this stuff is not written by people actually making ambitious games (they’re busy!). • Most of it is ill-advised. • I want to provide a hype-free, skeptical review.
Lecture in Three Parts • Part 1: A Sense of Perspective • What is 3D rendering for games, today? • Part 2: The Techniques • Explained by a Skeptic… • Part 3: Making Games • How to use 3D techniques without going out of business or building a horrible game.
3D rendering for games is a complicated subject • Partially because we have accomplished a lot • Recent demos, and a few games, are graphically very impressive • (but the demos look much better than the games – why is that??) • The games all draw worlds by projecting a bunch of triangles onto the screen.
Primary Rendering Paradigm • Projecting triangles – but very fancy triangles • Texture maps, normal maps, complex lighting • Alternative representations exist • NURBS, N-Patches, subdivision surfaces • These are used in preprocesses, translated into triangles for the realtime pipeline.
Why have triangles dominated? They are simple and robust.
Suppose we’re inventing realtime rendering from scratch • Project every point of solid object to the screen, use depth buffer • We waste a lot of resources drawing everything inside the solid, which will inevitably be hidden! • Cull out interior points (same result) • Now we have a bunch of solid 2D shells to draw, but each still has a large number of points • We want a more compressed way to represent 2D subsets of 3D
Introducing the Triangle • The triangle is the simplest way to denote a closed region of 2D space.
Start with a point • We have a 0-dimensional space (Figure label: P)
Define one more point • Suddenly we have a 1D space! P + t(Q-P) • That is a lot bigger than 0D. (Figure labels: P, Q)
Add a third point • Now we have a 2D space! P + t(Q-P) + s(R-P) • In a way, the concept of a triangle is the same as the concept of two dimensions. (Figure labels: P, Q, R)
The linearity of the triangle is tremendously useful! • Easy to: • Interpolate • Clip • Intersection test • Bounding volume • Linear equations are the most basic and well-understood kind (see, for example, linearizing differential equations!) • If you are doing something unconventional, the triangle probably won’t get in your way.
Higher-order surfaces cause more problems. • Clipping a curved surface is annoying. • Bounding volumes are also annoying. • The offset of a Bezier surface is not a Bezier surface • So what happens if spline parameters are your base representation, and you need to offset? (Figure: the green surface is a spline; the red one is not.)
Among linear polygons, triangles are the simplest. • Quads can be noncoplanar (vertex lighting will fail!) • Pipeline must handle primitives with varying vertex counts • Games had brief dalliances with quads / n-gons around 1996, but nobody uses them any more to represent general geometry.
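The parametrization P + t(Q-P) + s(R-P) from the previous slide is what makes interpolation cheap. A minimal Python sketch follows; the function names are illustrative, not from the talk, and the inside test assumes the usual convention that the triangle is the region t >= 0, s >= 0, t + s <= 1.

```python
def point_on_triangle(P, Q, R, t, s):
    """Evaluate P + t(Q-P) + s(R-P) for 3D points given as tuples."""
    return tuple(p + t * (q - p) + s * (r - p) for p, q, r in zip(P, Q, R))


def inside_triangle(t, s):
    """The (t, s) pair lies inside the triangle when both are
    non-negative and their sum does not exceed 1."""
    return t >= 0 and s >= 0 and t + s <= 1


def interpolate(value_p, value_q, value_r, t, s):
    """Interpolate any per-vertex scalar (a color channel, a UV
    coordinate) with the same linear formula used for position."""
    return value_p + t * (value_q - value_p) + s * (value_r - value_p)
```

The same two parameters drive position and every vertex attribute, which is why clipping and interpolation stay simple for triangles.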
In Summary • The impressiveness of our current graphics techniques depends on us being able to draw a lot of triangles.
Question: “So how do I draw a lot of triangles?” (Answer: “very carefully.”) Part 2: The Techniques
Rendering Techniques that people like to hear about… but first: • There are two basic kinds of 3D techniques • #1: We would think about them if we had infinitely fast hardware (e.g. projective transform, BRDF) • #2: The kind we only care about because hardware is slow • Type #2 usually introduces complications, and we need to manage those complications
Drawing a lot of triangles: Reduce Data Size • Fancy Triangles = big vertices (60 bytes each) • XYZ position (12 bytes) • Texture UV coordinates (8 bytes) • RGBA color (4 bytes) • Tangent frame (36 bytes; maybe smaller) • 180 bytes per triangle if you just list vertices! (5000 triangles = 900 kbytes) • This makes the hardware run slowly
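The byte counts on this slide can be checked with a few lines of Python. The field sizes are the ones listed above; the dictionary keys and function name are illustrative only.

```python
# Per-vertex field sizes in bytes, as listed on the slide.
VERTEX_LAYOUT = {
    "position_xyz": 12,
    "texture_uv": 8,
    "color_rgba": 4,
    "tangent_frame": 36,
}

BYTES_PER_VERTEX = sum(VERTEX_LAYOUT.values())
BYTES_PER_TRIANGLE = 3 * BYTES_PER_VERTEX  # three full vertices, nothing shared


def unindexed_mesh_bytes(triangle_count):
    """Size of a mesh stored as a flat list of vertices, three per triangle."""
    return triangle_count * BYTES_PER_TRIANGLE
```

With these numbers, `unindexed_mesh_bytes(5000)` gives 900000 bytes, matching the 900 kbytes quoted above.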
Indexed Triangle List • A mesh has a lot of shared vertices • Put the vertices into an array • The triangles are described by indices into this array • Shrinks total amount of data • With k bytes per vertex, i bytes per index, and F = 2V: unindexed size S0 = 3kF; indexed size S1 = kV + 3iF; savings S0 – S1 = V(5k – 6i) • Bonus: Separates topology from position data (Figure: vertices 0 to 4; triangles 0, 1, 2 / 1, 2, 3 / 1, 3, 4)
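A rough sketch of building an indexed list from a flat "triangle soup", plus the size formulas from this slide. It reads k as bytes per vertex and i as bytes per index; the default i = 2 (16-bit indices) is an assumption, and the function names are illustrative.

```python
def build_indexed_list(triangle_soup):
    """Deduplicate vertices. triangle_soup is a list of triangles, each a
    tuple of three hashable vertices. Returns (vertices, indices)."""
    vertices = []
    index_of = {}
    indices = []
    for triangle in triangle_soup:
        for vertex in triangle:
            if vertex not in index_of:
                index_of[vertex] = len(vertices)
                vertices.append(vertex)
            indices.append(index_of[vertex])
    return vertices, indices


def list_sizes(V, k=60, i=2):
    """Return (S0, S1, savings) in bytes for a mesh with V vertices,
    using the slide's approximation F = 2V."""
    F = 2 * V
    s0 = 3 * k * F          # every triangle carries three full vertices
    s1 = k * V + 3 * i * F  # one vertex array plus three indices per triangle
    return s0, s1, s0 - s1
```

For V = 2500 (so F = 5000) this gives 900000 bytes unindexed against 180000 indexed, a saving of 720000 = V(5k - 6i).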
Triangles in a mesh share not only vertices, but edges too (Figure: triangles 0, 1, 2 / 1, 2, 3 / 3, 1, 4)
Triangle Strips • We can compress a list of indexed triangles by forming “strips” that run along the shared edges. (Figure: vertices 0 to 6; as a list: 012, 123, 234, 345, 456; as a strip: 012, 3, 4, 5, 6)
Cost analysis of triangle strips is often somewhat wrong • 3 indices for the 1st triangle, 1 for each thereafter • Incomplete because there also needs to be a way to delimit strips • Example: 3 strips: 01234, 567, 89241 • Index buffer: 0123456789241 • But where do they start and end?
Delimiting Triangle Strips • 3 strips: 01234, 567, 89241 • Explicitly add numbers to describe strip length (Index buffer: 5012343567589241) • DirectX8-style separate API calls (impact on CPU usage, AND adds numbers behind the scenes) (Index buffer: 0123456789241; calls: DrawIndexedPrimitive 0, 5; DrawIndexedPrimitive 5, 3; DrawIndexedPrimitive 8, 5; output stream: 5012343567589241) • Strips start out worse than lists, and have to catch up… the longer the strip, the better you catch up
Because triangle strips are limited, we need to add swaps (Figure: without a swap, strip 012, 3, 4, 5, 6 gives triangles 012, 123, 234, 345, 456; with a swap, strip 012, 3, 2, 4, 5, 6 gives triangles 012, 123, 232, 324, 245, 456)
Triangle Strip Efficiency • Depends on strip length, which depends on your data • It takes a complicated algorithm to make good strips. (Figure captions: 4 strips, 40 indices; 10 strips, 52 indices (no swaps yet))
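The length-prefix scheme from this slide can be sketched as follows. This is an illustration of the bookkeeping only, not any particular API; the function names are made up for the example.

```python
def pack_strips(strips):
    """Flatten strips into one stream, writing each strip's length first."""
    stream = []
    for strip in strips:
        stream.append(len(strip))
        stream.extend(strip)
    return stream


def unpack_strips(stream):
    """Recover the strips by reading a length, then that many indices."""
    strips = []
    position = 0
    while position < len(stream):
        length = stream[position]
        strips.append(stream[position + 1:position + 1 + length])
        position += 1 + length
    return strips


def triangle_count(strips):
    """A strip of n indices encodes n - 2 triangles."""
    return sum(len(strip) - 2 for strip in strips)
```

For the three strips on the slide, the packed stream has 16 entries for 7 triangles, against 21 indices for a plain triangle list. The delimiters eat part of the saving, and short strips such as 567 gain nothing at all (4 entries versus 3).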
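To see what a swap costs, a strip can be expanded back into triangles. The sketch below follows the slide's notation and ignores winding order (real hardware flips every other triangle); the helper names are hypothetical.

```python
def strip_to_triangles(strip):
    """Each run of three consecutive indices in a strip forms a triangle."""
    return [tuple(strip[n:n + 3]) for n in range(len(strip) - 2)]


def is_degenerate(triangle):
    """A triangle that repeats an index has zero area and draws nothing."""
    return len(set(triangle)) < 3


def real_triangles(strip):
    """Triangles that actually contribute geometry."""
    return [t for t in strip_to_triangles(strip) if not is_degenerate(t)]
```

Expanding the swapped strip 0, 1, 2, 3, 2, 4, 5, 6 yields the six triangles shown on the slide, one of which (2, 3, 2) is degenerate: the swap spends an extra index and an extra triangle setup to keep the strip going.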
Triangle Strip Skepticism • In a full game, performance numbers don’t necessarily validate triangle strips… we’ll see why • Strips complicate the implementation • Even with perfect stripping, you only reduce index data (minority of total data) from 6iV to i(2V+2). You won’t have perfect stripping. • Degenerate triangles can cost you.
If you want to make a strip algorithm… • Most papers give you the basic idea, but are not very good in the end • Old SGI source code • STRIPE papers • You really want a non-greedy algorithm • Heuristics based on strip length and cache • Tunneling operator
Vertex Cache and Vertex Shader • We want to cache vertex memory for fast access… • Vertex Shader is a small hardware program that runs for each vertex • Compute lighting, transform, skinning, etc • Hardware caches the results of the evaluated vertex shader • A cache miss means running the shader again • (More expensive than traditional CPU cache miss!) (Figure labels: memory, shader, vertex cache)
You want to order vertices by cache efficiency • Mostly use vertices you just used recently • But this conflicts with triangle strip efficiency! Can’t even do the red path in one triangle strip without inserting a teleport (very expensive!)
Vertex cache effects can be dominant • Multi-pass rendering: you skin the guy multiple times, so shader is expensive! • Or do you skin on the CPU? • Now you begin to have a lot of optimization choices; these can determine who’s dominant • The “right answer” depends on your game and target platform
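One way to compare orderings is to replay an index buffer through a simulated post-transform cache and count how often the shader would run. The sketch below assumes a small FIFO cache; the replacement policy and the cache size are assumptions for illustration, and real hardware varies.

```python
from collections import deque


def count_shader_runs(indices, cache_size):
    """Replay an index stream through a FIFO vertex cache and return the
    number of misses, i.e. how many times the vertex shader must run."""
    cache = deque(maxlen=cache_size)  # oldest entry falls out when full
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)
    return misses
```

With a 4-entry cache, the stream 0, 1, 2, 1, 2, 3, 5, 6, 7 costs 7 shader runs, while the same three triangles ordered as 0, 1, 2, 5, 6, 7, 1, 2, 3 cost 9, because vertices 1 and 2 have been evicted by the time they are reused.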
How do we resolve the conflict between strips and cache? • Maybe you write a triangle stripper that tries to deal with the vertex cache • Complicated to write, degraded results on both sides; Nvidia’s does this • Maybe you ignore vertex caching • Might be okay if your shaders are cheap • Maybe you ignore triangle strips, and just use triangle lists
Quirks of some architectures make strips better • Nvidia triangle setup (Xbox, etc) • Nvidia push buffer bottleneck also makes strips more effective.
Now… we need some kind of LOD • Because even perfect triangle strips / cache hits still draw way too many triangles… we need to go from O(n) to O(log n). • Several types of LOD available: • Dynamic (view-dependent): FORGET IT • Static mesh switching (simple) • Progressive mesh (best algorithm: VIPM)
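Static mesh switching, the simple option in the list above, amounts to picking one of several prebuilt meshes by distance. A minimal sketch, assuming a sorted list of switch distances chosen by the artist or tuned by hand (the thresholds and function name are hypothetical):

```python
import bisect


def select_lod(distance, switch_distances):
    """Return the index of the mesh to draw: 0 is the highest-resolution
    mesh, and each switch distance passed moves to the next coarser one.
    switch_distances must be sorted in increasing order."""
    return bisect.bisect_right(switch_distances, distance)
```

With switch distances of 10, 30, and 90 units, an object 5 units away uses mesh 0, one 50 units away uses mesh 2, and anything beyond 90 uses mesh 3. The whole feature is a lookup into an array of meshes, which is the source of its appeal.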
View Independent Progressive Mesh • Collapse vertices due to base-plane error metric. • Generate one sequence of collapses that takes us from high-res to low-res. • Popping in VIPM is subtle, which is good. • VIPM draws fewer triangles than static switching, since we usually push static switching away in Z to avoid popping.
Problem with VIPM • VIPM slides a window across the index buffer, doing fix-ups. • Need to sort vertices by LOD collapse order • This conflicts with strip / cache sorting • You can’t do all three at once (though you can do stripped VIPM or cache-sorted VIPM) (Figure labels: index buffer, fix-up record)
Sorting Score Card (more items will be added here) • Triangle strip efficiency order • Vertex cache order • LOD collapse order (if VIPM)
Normal Map Generation • Approximate huge amounts of geometry by per-texel normals • Generate the maps by crunching a high-res mesh down onto a low-res one… • When rendering, transform texture normal by iterated tangent frame, and you get the normal of the high-res model (almost) • Object or tangent space?
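The render-time step on this slide, transforming a texture normal by the tangent frame, is a change of basis. A minimal sketch, assuming the frame is given as three object-space vectors T, B, N (tangent, bitangent, normal) and the texel has already been decoded to a unit vector; the names are illustrative.

```python
def tangent_to_object(normal_ts, T, B, N):
    """Express a tangent-space normal (x, y, z) in object space as
    x*T + y*B + z*N, where T, B, N are the tangent-frame axes."""
    x, y, z = normal_ts
    return tuple(x * t + y * b + z * n for t, b, n in zip(T, B, N))
```

A texel of (0, 0, 1) always maps to the surface normal N itself, so a flat normal map leaves the low-res shading unchanged. With object-space maps this transform is skipped, since the texel already stores the object-space normal.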
Normal Map Generation influences LOD choice • With static switching, you just have an array of meshes • With VIPM, you are forced to use object-space normal maps, which probably don’t compress as well as tangent-space maps. • Normal mapping to a high-res model makes static mesh switching look better (much less popping… most popping was due to light)
More Sorting • To render quickly, we want to sort by render state (multiple materials on the same object means we break that object into several passes, decreasing triangle strip and vertex cache effectiveness) • To render quickly, we want to draw front-to-back (fast z-fail) • To render transparent things correctly, we need to draw those back-to-front (break these into a separate pass, decrease stripping and cache effectiveness) • We are robbing ourselves of the benefits we got earlier… so hopefully we didn’t pay very much for them (more on this later)
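The competing sort orders can be made concrete with a small draw-call sorter. This is one possible compromise, not the talk's recommendation: opaque calls are grouped by shader first and ordered front-to-back within each group, and translucent calls go last, back-to-front. The `DrawCall` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawCall:
    name: str
    shader_id: int
    depth: float       # distance from the camera
    translucent: bool


def order_draw_calls(calls):
    """Opaque: sort by render state, then near to far (fast z-fail).
    Translucent: drawn afterwards, far to near, for correct blending."""
    opaque = sorted(
        (c for c in calls if not c.translucent),
        key=lambda c: (c.shader_id, c.depth),
    )
    translucent = sorted(
        (c for c in calls if c.translucent),
        key=lambda c: -c.depth,
    )
    return opaque + translucent
```

Putting shader_id ahead of depth in the key already gives up strict front-to-back order, and none of these keys know anything about strip, cache, or LOD collapse order, which is the conflict the score card tracks.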
Sorting Score Card • Triangle strip efficiency order • Vertex cache order • LOD collapse order (if PM) • Sort by shader • Front-to-back (opaque things) • Back-to-front (translucent things)
How do you LOD a guy with multiple materials? • Materials usually done by one pixel / vertex shader pair, per material • Can only combine triangles so much (can’t cross material boundary) • Can’t combine textures into one (lose lighting effects) • Everybody just kind of punts… this is an important problem to solve for the future.
Trade-Offs • As computer scientists and engineers we are accustomed to the idea of engineering trade-offs (time for space, etc) • Must consider code complexity to be a FINITE RESOURCE that can be traded with time, space, etc.
Complexity as Resource • Every extra line of code or ‘if’ statement must be maintained through the life of the project and must interact with new features • IMPORTANT: most new features are not orthogonal; they will FIGHT with your existing code. • You only have so much complexity to spend over the course of your project; too much and your project will fail.
Cultural Problem • At least in America, many programmers try to prove themselves by doing complicated, impressive-sounding things. • Try to make 3D engine that is the “next big cool thing” • The successful paths of the past have been things that are NOT complicated (triangles are simple!) • Successful paths of the future will probably also be the simpler ones. So…
A Thought • If your engine / algorithms seem very complicated… • …they are unlikely to be on a path that history will make successful • They will NOT be the next big thing
Good Art and Levels are more important than a good engine • If you are adding engine features that make it more difficult to create levels / content (without making the content a lot richer), this is probably a mistake (Image: Max Payne)
Cost-Benefit Analysis • Don’t forget to account for opportunity cost … every minute you spend working on A is a minute not working on B • You need to be an economist, deciding how to get the most net worth out of the resources you have to spend. • YOU need to do it, not just the managers • It is a multiscale (fractal) phenomenon