Learn about the concepts of instancing and stream-out in game development, including how to optimize graphics performance, use instance buffers, and utilize vertex shaders for instancing. Discover how these techniques can be applied to rendering large numbers of similar models, such as particle systems.
Emerging Technologies for Games: Instancing / Stream-Out for Particles (CO3301, Week 12)
Today’s Lecture • Batch Performance Recap • Instancing • Particle Systems • Advanced Instancing • Particle Update using Stream-Out
Recap / Rationale • Changing graphics state carries an overhead • E.g. switching texture, shader, blending mode etc. • Each list of triangles sent to the GPU is called a batch and sending each batch carries an overhead • In addition to the time taken to render the triangles • So to optimise graphics performance: • Use a minimum of state changes • Maximise the batch size / reduce the number of batches per frame • These are similar goals • Saw methods to reduce state changes (bucket rendering) • But batch size increase is more difficult to achieve…
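The batch overhead argument can be made concrete with a toy cost model. This is only a sketch: the function name `frame_cost_ms` and both cost constants are invented for illustration and are not measurements from any real GPU or driver.

```python
def frame_cost_ms(num_batches, num_triangles,
                  batch_overhead_ms=0.02, triangle_cost_ms=0.00001):
    """Toy model: a fixed overhead per batch plus a cost per triangle.

    Both default constants are made-up illustrative values.
    """
    return num_batches * batch_overhead_ms + num_triangles * triangle_cost_ms


# 10,000 models of 100 triangles each: one batch per model vs. one batch in total.
separate = frame_cost_ms(num_batches=10_000, num_triangles=1_000_000)
combined = frame_cost_ms(num_batches=1, num_triangles=1_000_000)
print(f"one batch per model: {separate:.2f} ms, single batch: {combined:.2f} ms")
```

With these assumed constants the triangle cost is identical in both cases, so the whole difference comes from the per-batch overhead.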
Instancing Overview • Instancing is a method to render many models or sprites in a single API draw call (and so a single batch) • We have previously rendered each model one at a time • Each model requiring its own batch • We send a list of instances with the vertex & index data • This list contains what is required to render each model • E.g. a list of positions, rotations, colours etc. • This removes per-model state changes • And allows for massively increased batch sizes • Huge numbers of models drawn in a single API call • Great performance increase – see graph from batching lecture
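The difference between the two approaches can be sketched by counting draw calls. `FakeDevice` below is a hypothetical stand-in for a graphics API, not a real one; it draws nothing and only records how many batches were issued.

```python
class FakeDevice:
    """Hypothetical stand-in for a graphics API; it only counts draw calls."""

    def __init__(self):
        self.draw_calls = 0
        self.models_drawn = 0

    def draw(self, mesh, world_position):
        # One API call (one batch) renders one model.
        self.draw_calls += 1
        self.models_drawn += 1

    def draw_instanced(self, mesh, instance_positions):
        # One API call (one batch) renders the mesh once per instance.
        self.draw_calls += 1
        self.models_drawn += len(instance_positions)


def render_one_at_a_time(device, mesh, positions):
    for position in positions:
        device.draw(mesh, position)


def render_instanced(device, mesh, positions):
    device.draw_instanced(mesh, positions)
```

For 1000 positions, `render_one_at_a_time` issues 1000 draw calls and `render_instanced` issues one, while both draw the same number of models.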
Instance Buffers / State • The instance data is stored on the GPU in an instance buffer (just an array of data, like a vertex/index buffer) • The simplest instance buffer might just contain a list of instance positions. • The model defined by the vertex/index data is rendered once at each position in this buffer - all in one batch • All other model state must be the same - textures, shader etc. • This state requirement for instancing can be an issue • E.g. Often cannot instance a model with multiple materials • However, we can instance its sub-meshes, e.g. render all the tyres on all the cars in a scene in a single batch…
Vertex Shaders for Instancing • Vertex shaders are often unusual when instancing, depending on what is stored in the instance buffer • For example if we only have a position for each instance, then each instance cannot have a specific rotation or scale. • So no need for a world matrix, just add the instance position to model vertex • Very common to store some per-instance data, and randomise other elements • E.g. render 1000 trees at positions given in the instance buffer, but randomise their y-rotation and scale for variety • The vertex shader for this would clearly be unusual
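The tree example might look roughly like the following, written as CPU-side Python rather than shader code. It assumes the shader has access to an integer instance index; the hash function, the scale range of 0.8 to 1.2 and the names `hash01` and `instance_vertex` are illustrative choices, not part of the lecture.

```python
import math


def hash01(n):
    """Cheap deterministic integer hash mapped to a float in [0, 1)."""
    n = (n * 2654435761) & 0xFFFFFFFF
    n ^= n >> 16
    n = (n * 2246822519) & 0xFFFFFFFF
    n ^= n >> 13
    return n / 2**32


def instance_vertex(model_vertex, instance_position, instance_id):
    """Place one model vertex for one instance.

    Only the position comes from the instance buffer; the y-rotation and
    scale are derived from the instance index so each tree looks different.
    """
    angle = hash01(instance_id) * 2.0 * math.pi        # random y-rotation
    scale = 0.8 + 0.4 * hash01(instance_id + 12345)    # random scale in [0.8, 1.2)
    x, y, z = model_vertex
    c, s = math.cos(angle), math.sin(angle)
    rx = c * x + s * z                                 # rotate about the y-axis
    rz = -s * x + c * z
    px, py, pz = instance_position
    return (rx * scale + px, y * scale + py, rz * scale + pz)
```

Because the rotation and scale depend only on the instance index, every vertex of the same instance gets the same transform, and the result is stable from frame to frame without storing anything extra in the instance buffer.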
Instance Buffer Data • Of course we can store more than just position in an instance buffer to give each instance a different look: • Rotation(s), scaling • Or store an entire world matrix per-instance so we can use the normal vertex shader techniques • Although the instance buffer would be large – see issue below • Also can store colour, animation info etc. • Can also store more unusual data: • E.g. A seed value to randomise each instance • Or entity / particle data (such as velocity, or life) to allow the model to be updated on the GPU using stream-out – see later
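As a rough illustration of what "just an array of data" means, the sketch below packs a hypothetical per-instance layout into raw bytes with Python's `struct` module. The chosen fields (position, y-rotation, scale, RGBA colour) and the names `INSTANCE_FORMAT`, `pack_instances` and `unpack_instance` are assumptions for this example only.

```python
import struct

# Hypothetical layout: position (3 floats), y-rotation (1 float),
# scale (1 float), RGBA colour (4 unsigned bytes). Little-endian, no padding.
INSTANCE_FORMAT = "<3f f f 4B"
INSTANCE_SIZE = struct.calcsize(INSTANCE_FORMAT)  # 24 bytes per instance


def pack_instances(instances):
    """Pack a list of instance dicts into one contiguous byte buffer."""
    buffer = bytearray()
    for inst in instances:
        buffer += struct.pack(INSTANCE_FORMAT, *inst["position"],
                              inst["rotation_y"], inst["scale"], *inst["colour"])
    return bytes(buffer)


def unpack_instance(buffer, index):
    """Read instance number `index` back out of the packed buffer."""
    values = struct.unpack_from(INSTANCE_FORMAT, buffer, index * INSTANCE_SIZE)
    return {
        "position": values[0:3],
        "rotation_y": values[3],
        "scale": values[4],
        "colour": values[5:9],
    }
```

Adding a field such as a seed or a velocity would simply extend the format string, at the cost of a larger buffer per instance.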
CPU / GPU Instancing 1 • Simple instancing is processed using both CPU & GPU • The GPU will render the instances • The CPU will update the instances (movement etc.) • So the instance buffer must be made available to both the CPU & GPU • The CPU updates the positions in the instance buffer • The updated buffer must be copied in full to the GPU • Any buffer that changes per-frame is called a dynamic buffer • So space is reserved for instance data in both CPU and GPU memory • The (slower) data bus between GPU and CPU is used per-frame
CPU / GPU Instancing 2 • This constant copying of the instance buffer between GPU and CPU means performance is lower than normal • Whenever using dynamic buffers, we should keep the buffer contents to a minimum • This is why we might not want to store a world matrix for each instance. Instead the data is often compressed: • World matrix uses too much memory, so instead: • Position only, position + rotation, compressed matrix… • Implies vertex shader may have to do additional work to derive the full instance data • Trade-off between shader speed and CPU->GPU copying speed
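The trade-off can be sketched with some arithmetic and a matrix rebuild. The example assumes 4-byte floats, a 4x4 world matrix of 16 floats, and a compact format of position plus a single y-rotation (4 floats). The matrix uses the row-vector convention with translation in the last row; the function names are illustrative.

```python
import math

FLOAT_BYTES = 4


def bytes_per_frame(num_instances, floats_per_instance):
    """Bytes copied to the GPU each frame for a dynamic instance buffer."""
    return num_instances * floats_per_instance * FLOAT_BYTES


def world_matrix_from_compact(position, rotation_y, scale=1.0):
    """Rebuild a 4x4 world matrix from compact per-instance data.

    This is the extra work the vertex shader takes on in exchange for a
    smaller buffer. Row-vector convention, translation in the last row.
    """
    c, s = math.cos(rotation_y), math.sin(rotation_y)
    px, py, pz = position
    return [
        [c * scale, 0.0, -s * scale, 0.0],
        [0.0, scale, 0.0, 0.0],
        [s * scale, 0.0, c * scale, 0.0],
        [px, py, pz, 1.0],
    ]


full = bytes_per_frame(100_000, 16)     # full world matrix per instance
compact = bytes_per_frame(100_000, 4)   # position + y-rotation per instance
print(f"full matrix: {full} bytes/frame, compact: {compact} bytes/frame")
```

Under these assumptions the compact format copies a quarter of the data per frame, but each instance loses any rotation other than about the y-axis.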
Using Instancing • Instancing suits the rendering of large numbers of similar models: • Armies, trees, vegetation, particle systems etc.
Instancing Example: Particle Systems • A particle system is a large number of small elements used to simulate a complex dynamic effect • Particles are all similar • Often camera-facing sprites • But can be models • Each particle moves fairly simply – i.e. update code is simple • Particle systems are an ideal candidate for instancing
Particle System Basics • Each particle stores rendering data such as: • Position, rotation, scale, colour, alpha • Each particle also requires data to update its position/rotation each frame: • Velocity, spin speed, scale/colour/ alpha change • Particles are spawned from emitters • Particles have a life time after which they die • There may be attractors, repulsors and other features added for system complexity / flexibility
Particle Systems – Instancing? • Naïve approach: • Store all data in CPU memory, update using the CPU • Create particle vertex/index buffer on GPU • GPU renders particles one at a time (send a world matrix each) • This is slow, massive batch overhead when many particles • Simple instancing approach • Store render data (position, rotation etc.) in an instance buffer • Shared/copied between CPU/GPU as discussed above • Store update data (velocity, spin speed) in CPU memory • Update particles using CPU then copy entire buffer to GPU • Render particles in one batch using instancing • Much faster, but still requires CPU/GPU copy
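The CPU side of the simple instancing approach could be sketched as follows. Render data and update data are kept in two parallel lists, mirroring the split described above; the dictionary keys, the gravity default and the function name `update_particles` are assumptions made for the example, and the upload to the GPU is left out.

```python
def update_particles(render_data, update_data, dt, gravity=-9.8):
    """Advance every particle by dt seconds and drop the ones that have died.

    render_data holds what would go in the instance buffer (position);
    update_data stays in CPU memory (velocity, remaining life).
    """
    alive_render, alive_update = [], []
    for render, update in zip(render_data, update_data):
        update["life"] -= dt
        if update["life"] <= 0.0:
            continue  # particle has reached the end of its life time
        vx, vy, vz = update["velocity"]
        vy += gravity * dt                      # update velocity first
        update["velocity"] = (vx, vy, vz)
        x, y, z = render["position"]
        render["position"] = (x + vx * dt, y + vy * dt, z + vz * dt)
        alive_render.append(render)
        alive_update.append(update)
    return alive_render, alive_update
```

After each call, the returned render list is what would be copied in full to the GPU instance buffer before drawing all particles in one batch.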
Sprite-based Particle Systems • Smart approach for camera-facing sprite particles: • A camera-facing sprite can be stored as a single point (the centre) • Use the geometry shader to expand this point into a quad • Geometry shader will input one point and output 2 triangles • Since each particle is just a vertex, then the vertex buffer is effectively an instance buffer • We don’t need instancing at all in this case • Store all data for particle with vertex: position, colour, spin etc. • No instance or index buffer required • However, this method can’t be used if the particles are models (e.g. rocks coming out of a volcano) • Then we actually need a full vertex/index buffer for the mesh • And an instance buffer to say where each one should be
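The point-to-quad expansion can be sketched on the CPU as below. The sketch assumes the camera's right and up axes are available as unit vectors in world space; the function name and the triangle winding are illustrative, and a real geometry shader would typically emit a triangle strip instead of two separate triangles.

```python
def expand_point_to_quad(centre, half_size, cam_right, cam_up):
    """Expand one particle centre into two triangles facing the camera.

    cam_right and cam_up are assumed to be the camera's unit axes in world
    space, so the quad always lies parallel to the view plane.
    """
    def corner(sign_right, sign_up):
        return tuple(
            c + sign_right * half_size * r + sign_up * half_size * u
            for c, r, u in zip(centre, cam_right, cam_up)
        )

    bottom_left, bottom_right = corner(-1, -1), corner(1, -1)
    top_left, top_right = corner(-1, 1), corner(1, 1)
    return [
        (bottom_left, top_left, bottom_right),
        (bottom_right, top_left, top_right),
    ]
```

One input point produces six output vertices, which is why the vertex buffer only needs a single entry per particle.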
Advanced Instancing • Instancing can look poor due to a lack of variety • So complex instancing techniques store more state • E.g. Animation data, texture offsets, material settings • Able to render models in different poses, with different textures and material tweaks • Ideal for crowds, complex vegetation etc. • More complex shaders can help here • Flexible “uber-shaders” that can render a variety of techniques in one shader using more advanced shader coding • Latest GPUs deal well with this kind of shader • But it is one area where GPU optimisation is critical (later lecture)
Particles without CPU/GPU Copy • Instancing can be slow due to the CPU update / copy • May be inevitable when instancing entities since we probably need to update them on the CPU anyway (AI, game logic) • But a major slowdown for huge particle systems • One simple workaround – avoid updates: • Each particle follows a mathematically defined path • Path is randomised with a random initial seed • E.g. Parabola with random initial velocity to create fountain • Don’t need particle position, it can be calculated each frame from the current time and an initial seed (stored in the instance data) • No position, nothing to update – can do it all on GPU • Drawback is inflexibility, paths always same • E.g. a fountain can’t be affected by the wind
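A stateless fountain particle might be computed like this. It is a sketch only: the velocity ranges, the looping behaviour and the name `fountain_position` are assumptions, and Python's `random.Random` stands in for the hash a shader would use to turn the seed into repeatable random values.

```python
import math
import random


def fountain_position(seed, time, emitter=(0.0, 0.0, 0.0), gravity=-9.8):
    """Position of one fountain particle from its seed and the current time.

    Nothing is stored between frames; the same seed always gives the same
    parabola, which is why the path cannot react to wind.
    """
    rng = random.Random(seed)                 # stand-in for a shader hash
    angle = rng.uniform(0.0, 2.0 * math.pi)   # horizontal launch direction
    speed_out = rng.uniform(0.5, 1.5)         # horizontal launch speed
    speed_up = rng.uniform(8.0, 10.0)         # vertical launch speed
    flight_time = -2.0 * speed_up / gravity   # time to fall back to emitter height
    t = time % flight_time                    # respawn when the particle lands
    ex, ey, ez = emitter
    return (
        ex + math.cos(angle) * speed_out * t,
        ey + speed_up * t + 0.5 * gravity * t * t,
        ez + math.sin(angle) * speed_out * t,
    )
```

Calling the function for each seed in the instance data at the current time gives every particle's position for the frame, with no per-frame buffer update.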
GPU Stream-Out for Particle Update • [Note: Stream output and instancing are separate techniques, only discussing them together since they both relate to particle systems] • DirectX 10 supports stream output • Allows the GPU to output vertex data back into a vertex buffer instead of sending it on for rendering • Using stream output the GPU can be used to update particle or entity positions, rotations etc. • Both render and update data are stored on the GPU only • Typically we “render” the models twice: • Pass 1: Render the models/particles using instancing or similar • Pass 2: Update models/particles with stream-out (no actual rendering)
Stream Output Considerations • Stream output reads from a GPU buffer and writes back to one, but can’t output to the same buffer that is being input from • However, this is usually what we want to do • Work around this by using double buffering • Create two identical buffers of data • Input from one, and output to the other • Swap the buffers between frames • Stream-out allows GPU-only entities, which is especially effective for particles • Works especially well with the sprite-based particles technique shown on the earlier slide • Vertex buffer, no instancing
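The double-buffering workaround can be sketched on the CPU as a pair of lists that swap roles each frame. The class `ParticleBuffers` and the one-dimensional `move` update are hypothetical; on the GPU the update function would be the stream-out pass and the lists would be vertex buffers.

```python
class ParticleBuffers:
    """Two identical buffers: read from one, write to the other, then swap."""

    def __init__(self, particles):
        self.buffers = [list(particles), [None] * len(particles)]
        self.input_index = 0

    @property
    def current(self):
        """The buffer holding the latest particle state."""
        return self.buffers[self.input_index]

    def update(self, update_fn, dt):
        source = self.buffers[self.input_index]
        target = self.buffers[1 - self.input_index]
        for i, particle in enumerate(source):
            target[i] = update_fn(particle, dt)   # never writes to the source
        self.input_index = 1 - self.input_index   # swap roles for next frame


def move(particle, dt):
    """Example update: a particle is a (position, velocity) pair in 1D."""
    position, velocity = particle
    return (position + velocity * dt, velocity)
```

Each call to `update` reads only from the input buffer and writes only to the other one, so the restriction on reading and writing the same buffer is never violated.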