CSL 859: Advanced Computer Graphics Dept of Computer Sc. & Engg. IIT Delhi
Adrianne Demo
• Skin shader
• 1,400 instructions per pixel
• 15 render passes
• Five bump maps
• Physically-based lighting with sub-surface scattering
• Three skin layers with different scattering properties
• Complex anisotropic hair shader
• Real geometry
• GPU-accelerated character skinning
• Blendshapes
• Sculpt deformers
• Skeletal-driven bump maps
Graphics Pipeline
[Diagram labels: Geometry, Transform, Light, Clip, Setup, Rasterize, Texture, Z-test, Blend, Framebuffer, Picture]
Graphics Pipeline
[Diagram labels: Vertex, Connectivity, Textures, Vertex Shader, Primitive Assembly, Clip & Setup, Rasterize, Fragment Shader, Texture, Raster OPs, Blend, Framebuffer, Picture]
Bottlenecks
• Too many operations
  • Parallelize
• Too many memory accesses
  • Parallelize
[Diagram labels: Geometry operations, Fragment operations, XBAR, Screen tiles]
Parallelization
• Distribute computation to processors
  • Work allocation
• Distribute texture to memory banks
• Tile screen pixels into memory banks
• Do all processors have access to all memory?
  • Distribute access/Replicate data
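One way to picture "tile screen pixels into memory banks" is a fixed interleave of small screen tiles over the banks. The sketch below is only an illustration: the 8x8 tile size, the 2x2 bank pattern, and the function names are assumptions, not values from the slides.

```python
# Hypothetical parameters chosen for illustration only.
TILE_W, TILE_H = 8, 8      # pixels per screen tile
BANKS_X, BANKS_Y = 2, 2    # banks laid out as a repeating 2x2 pattern of tiles


def bank_for_pixel(x: int, y: int) -> int:
    """Return the index of the memory bank that owns pixel (x, y)."""
    tile_x = x // TILE_W
    tile_y = y // TILE_H
    return (tile_y % BANKS_Y) * BANKS_X + (tile_x % BANKS_X)


def bank_histogram(x0: int, y0: int, x1: int, y1: int) -> list:
    """Count pixels per bank inside the half-open rectangle [x0, x1) x [y0, y1)."""
    counts = [0] * (BANKS_X * BANKS_Y)
    for y in range(y0, y1):
        for x in range(x0, x1):
            counts[bank_for_pixel(x, y)] += 1
    return counts
```

With this pattern, a 16x16 block aligned to the tile grid touches all four banks equally, which is the load-spreading effect the interleave is meant to give.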
Sorting Taxonomy
• Sort first
  • Allocate each primitive to the processor responsible for its area of the screen
• Sort middle
  • Optimally perform geometry ops and then distribute to the responsible processor
• Sort last
  • No screen subdivision
  • Optimally perform geometry and fragment ops and then compose results
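The "compose results" step of sort last can be sketched as a per-pixel depth comparison over full-screen partial images. This is a minimal illustration under assumed conventions: each partial image is a list of rows of (depth, color) tuples, smaller depth is nearer, and the names `BACKGROUND` and `compose` are made up for this example.

```python
# (depth, color) for a pixel that no processor covered; assumed convention.
BACKGROUND = (float("inf"), None)


def compose(partials):
    """Merge full-screen partial images by keeping the nearest fragment per pixel.

    Each partial image is a list of rows; each entry is a (depth, color) tuple.
    All partial images are assumed to have the same dimensions.
    """
    height = len(partials[0])
    width = len(partials[0][0])
    out = [[BACKGROUND] * width for _ in range(height)]
    for image in partials:
        for y in range(height):
            for x in range(width):
                if image[y][x][0] < out[y][x][0]:
                    out[y][x] = image[y][x]
    return out
```

Because every processor may produce a fragment for every pixel, the composition traffic grows with screen size rather than with scene complexity.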
Memory Considerations
• Highly pipelined
  • Guard against stalls
• Memory bandwidth
  • How many accesses per second?
• Latency
  • Latency-hiding buffers
• Larger memory atoms
  • e.g., 32-byte atoms
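The effect of larger memory atoms can be seen with a little arithmetic: every access is rounded out to whole atoms, so small or misaligned accesses waste bandwidth. The sketch below uses the 32-byte atom from the slide; the function names are illustrative, and accesses are assumed to be at least one byte long.

```python
ATOM_BYTES = 32  # atom size from the slide's example


def atoms_touched(address: int, nbytes: int, atom: int = ATOM_BYTES) -> int:
    """Number of atoms fetched to satisfy an access of nbytes (> 0) at address."""
    first = address // atom
    last = (address + nbytes - 1) // atom
    return last - first + 1


def efficiency(address: int, nbytes: int, atom: int = ATOM_BYTES) -> float:
    """Fraction of the fetched bytes that the access actually asked for."""
    return nbytes / (atoms_touched(address, nbytes, atom) * atom)
```

For example, a 4-byte read that straddles an atom boundary pulls in two atoms (64 bytes), while an aligned 32-byte read uses everything it fetches.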
Graphics Architecture: A Brief History
• Evans & Sutherland
• Ikonas
• UNC Chapel Hill
• Silicon Graphics (Mushroom: Smart VGA controllers)
• nVIDIA, AMD
IKONAS
• 32-bit data, 24-bit address bus backbone
• Everything memory mapped
• Host interface = address registers to access anything on the bus
• Frame buffer resolution and timing could be set via control registers
• Graphics processor
  • (Micro)programmable
  • 32-bit integer ALU and 16x16-bit integer multiplier
  • Address counters and loop counters
  • 64-bit instruction word
• Plug-in boards
  • 16-bit graphics processor with 16-pixel-at-once parallel write
  • Microprogrammed 16x16-bit matrix multiplier
  • Microprogrammed floating-point matrix multiplier
  • Hardware Z-buffer
  • Real-time alpha-blend hardware for two RGB images
  • Real-time RGB video frame grabber
Pixel-planes 5 (1989)
• 2 GPs per board
• One 128x128 array per board
• Up to 32 GPs, i860, and up to 8 Renderers
Pixel-planes 5 Renderer
• One board had 64 mini-chips, each with 2 columns of 128 pixel processors (with memory)
Renderer
• 64 chips of 256 pixel processing elements (PEs) each
• Each PE has 208 bits of memory
• Each chip contains a quadratic expression evaluator (QEE)
  • Evaluates $Ax + By + C + Dx^2 + Exy + Fy^2$ simultaneously at each pixel
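To see what evaluating the quadratic at every pixel buys, the sketch below computes the same expression over a small grid in software. The hardware does this for all pixels at once; the nested loop here only stands in for that parallelism. The function names and the "value <= 0 means inside" convention are assumptions for illustration.

```python
def evaluate_quadratic(coeffs, width, height):
    """Evaluate Ax + By + C + Dx^2 + Exy + Fy^2 at every pixel of a width x height grid."""
    A, B, C, D, E, F = coeffs
    return [[A * x + B * y + C + D * x * x + E * x * y + F * y * y
             for x in range(width)]
            for y in range(height)]


def inside_mask(coeffs, width, height):
    """Mark pixels where the expression is <= 0 (assumed 'inside' convention)."""
    return [[value <= 0 for value in row]
            for row in evaluate_quadratic(coeffs, width, height)]


def circle_coeffs(cx, cy, r):
    """Coefficients of (x - cx)^2 + (y - cy)^2 - r^2 expanded into the six-term form."""
    return (-2 * cx, -2 * cy, cx * cx + cy * cy - r * r, 1, 0, 1)
```

Setting D = E = F = 0 reduces the expression to the linear form Ax + By + C, which is enough for triangle edge tests and for interpolating depth or color; the quadratic terms allow shapes such as the circle above.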
Basic Algorithm
• Host app transmits model database and new frame requests to MGP
• Screen divided statically into bins of 128x128 pixels
• MGP allocates Renderers to screen regions
• MGP broadcasts database commands to all GPs
• GPs generate Renderer commands for each prim
• Commands inserted into appropriate bins
• GPs send the bins round-robin
• The Renderers send computed pixels to the frame buffer
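The binning step ("commands inserted into appropriate bins") can be sketched with a bounding-box test against the static 128x128 grid. This is a simplified illustration, not the Pixel-planes 5 implementation: the data layout, the function names, and the use of a bounding box (which is conservative and may select a region the primitive does not actually touch) are all assumptions.

```python
import math
from collections import defaultdict

REGION = 128  # bin edge in pixels, as on the slide


def regions_for_bbox(xmin, ymin, xmax, ymax, screen_w, screen_h):
    """List (column, row) of each region overlapped by an inclusive pixel bounding box."""
    xmin, ymin = max(xmin, 0), max(ymin, 0)
    xmax, ymax = min(xmax, screen_w - 1), min(ymax, screen_h - 1)
    if xmin > xmax or ymin > ymax:
        return []  # entirely off screen
    return [(col, row)
            for row in range(ymin // REGION, ymax // REGION + 1)
            for col in range(xmin // REGION, xmax // REGION + 1)]


def bin_primitives(prims, screen_w, screen_h):
    """Append each primitive id to the bin of every region its bounding box overlaps.

    prims is a list of (prim_id, [(x, y), ...]) in screen coordinates.
    """
    bins = defaultdict(list)
    for prim_id, verts in prims:
        xs = [math.floor(x) for x, _ in verts]
        ys = [math.floor(y) for _, y in verts]
        for region in regions_for_bbox(min(xs), min(ys), max(xs), max(ys),
                                       screen_w, screen_h):
            bins[region].append(prim_id)
    return bins
```

A primitive that straddles a region boundary lands in several bins, so it is processed once per overlapped region.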
SGI RealityEngine
• Kurt Akeley, 1993: The implementation is near-massively parallel, employing 353 independent processors in its fullest configuration, resulting in a measured fill rate of over 240 million antialiased, texture mapped pixels per second. Rendering performance exceeds 1 million antialiased, texture mapped triangles per second.
RealityEngine Architecture
• Input FIFO, Command Processor
• 6, 8, or 12 Geom Engines
• 1, 2, or 4 raster boards
• 5 Fragment Generators (each has a texture replica)
• 80 Image Engines
• 1280x1024 Framebuffer, 256 bits/pixel
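The numbers on this slide can be cross-checked with simple arithmetic. The sketch below assumes that the Fragment Generator and Image Engine counts are per raster board and that the Command Processor counts as one processor; under those assumptions the largest configuration adds up to the 353 processors quoted from Akeley. Function and parameter names are illustrative.

```python
def processor_count(geom_engines, raster_boards,
                    fgs_per_board=5, ies_per_board=80, command_processors=1):
    """Total processors, assuming FG and IE counts are per raster board."""
    return (command_processors + geom_engines
            + raster_boards * (fgs_per_board + ies_per_board))


def framebuffer_bytes(width=1280, height=1024, bits_per_pixel=256):
    """Framebuffer storage implied by the resolution and pixel depth on the slide."""
    return width * height * bits_per_pixel // 8


print(processor_count(12, 4))            # 353
print(framebuffer_bytes() // (1 << 20))  # 40 (MiB)
```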
RealityEngine Algorithm
• FIFO geometry distributed by CP to GEs
• GEs do geometry ops including setup
• GEs broadcast triangles to FGs (Raster)
• Finely interleaved pixel assignment
• FGs distribute fragments to IEs
• IEs do raster ops
• IEs are the framebuffer
RealityEngine
[Diagram labels: GE, FG, IE]
PC Architecture
[Diagram labels: CPU, FSB, North Bridge, MEM BUS, PCI Express (up to 2.5 Gbps bi-directional per lane), South Bridge, ATA BUS, PCI BUS]
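The per-lane figure translates into usable bandwidth roughly as follows. The sketch treats the slide's 2.5 Gbps as the raw signaling rate per lane in each direction and applies the 8b/10b line coding used by first-generation PCI Express; packet and protocol overheads are ignored, and the function name is illustrative.

```python
RAW_BITS_PER_SECOND_PER_LANE = 2_500_000_000  # 2.5 Gbps per lane, per direction


def payload_bytes_per_second(lanes: int) -> int:
    """Bytes per second in one direction after 8b/10b coding (8 data bits per 10 line bits)."""
    data_bits = lanes * RAW_BITS_PER_SECOND_PER_LANE * 8 // 10
    return data_bits // 8


print(payload_bytes_per_second(1))   # 250000000  (250 MB/s per lane)
print(payload_bytes_per_second(16))  # 4000000000 (4 GB/s for an x16 link)
```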