Energy-Precision Tradeoffs in the Graphics Pipeline. Jeff Pool, March 19th, 2012. Motivation. Why energy? It matters everywhere: mobile devices, desktop computers, servers and data centers. It’s a bottleneck to performance!
Energy-Precision Tradeoffs in the Graphics Pipeline Jeff Pool March 19th, 2012
Motivation Why energy? It matters everywhere: - Mobile devices - Desktop computers - Servers, data centers It’s a bottleneck to performance! http://www.ornl.gov/ornlhome/images/casl/TVA%20Watts%20Bar.jpg http://img717.imageshack.us/img717/3936/1101771coolitomni.jpg
Motivation Why precision? Sign Exponent Mantissa IEEE 754-2008 Single-Precision Floating-Point Representation
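The three fields named on this slide can be inspected directly. The following is a minimal sketch in Python, not part of the original talk; the helper name `fp32_fields` is made up for illustration. It rounds a value to single precision and splits the 32-bit pattern into its 1-bit sign, 8-bit biased exponent, and 23-bit stored mantissa.

```python
import struct

def fp32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, stored mantissa) of x as an IEEE 754 single."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 stored bits (the leading 1 is implicit)
    return sign, exponent, mantissa

print(fp32_fields(1.0))
print(fp32_fields(-0.75))
```

The stored mantissa has 23 bits; together with the implicit leading one, that gives the 24 bits of precision the talk refers to.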
Don’t do Unnecessary Work • Max precision isn’t needed: • 8-10 bit color buffers • FP32 => 24 bits of precision • Potentially lots of wasted effort! • It’s certainly more complicated, but worth exploring
My Approach Variable-precision computations - Reduce the precision when possible: 12.5 mantissa bits used - Save energy in arithmetic: 70% less energy - Low errors: 0.086% difference Full-Precision Arithmetic Reduced-Precision Arithmetic
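To get a feel for what reduced mantissa precision does to a value, the effect can be emulated in software. This is only a sketch under stated assumptions: the function name `reduce_precision` is hypothetical, it truncates rather than rounds, and it says nothing about how the hardware in the talk is built. `keep_bits` counts significand bits including the implicit leading one, so 24 means full single precision.

```python
import struct

def reduce_precision(x: float, keep_bits: int) -> float:
    """Emulate an FP32 value whose significand keeps only `keep_bits` bits (1..24)."""
    if not 1 <= keep_bits <= 24:
        raise ValueError("keep_bits must be between 1 and 24")
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    drop = 24 - keep_bits                      # low-order stored mantissa bits to clear
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF   # keep sign, exponent, and high mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits & mask))[0]

for n in (24, 12, 8, 4):
    print(n, reduce_precision(0.1, n))
```

Because the low bits are simply cleared, the relative error of a normal value stays below 2^-(keep_bits - 1).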
My Approach Communicate fewer bits - Since fewer bits are used in computation - Most DRAM traffic is already compressed • Variable-precision compression: • (on sample frame) • Geometry improved by 12% • Depth improved by 83% Crysis, 2007
Background: The Graphics Pipeline [Diagram: the GPU and global memory, which holds texture data and the framebuffer]
GPUs: A Brief History [Chart, NOT to scale: programmability and capability over time, from fixed-function hardware, through shader programs, to GPGPU compute programs (CUDA, Stream, OpenCL)]
Thesis Statement Reducing the work done in the modern graphics pipeline through novel communication and variable-precision computation techniques can enable a tradeoff between energy savings and image fidelity, leading to significant energy savings without perceptible loss of image quality.
How? Proving this thesis: • Show that induced errors are imperceptible • Show significant energy savings • Find energy consumed by entire pipeline • Find energy savings possible in each stage
Roadmap • My work • Energy model • Energy savings in computation • Energy savings in communication • Conclusions • Future work
Why an Energy Model? • Shows how much difference saving energy in each stage actually makes, and so where to focus • Provides researchers/developers a tool to predict energy usage
Strategy • Model construction • Experimentally measure energy for each operation • Energy prediction • Profile a scene for operations performed • Predict total energy consumption (dot product) • Validation • Compare prediction with measured energy
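The prediction step above is a dot product between per-operation counts from a profiled scene and per-operation energies from the measurements. A minimal sketch, with a hypothetical function name and made-up numbers (they are not measurements from the talk):

```python
def predict_energy(op_counts: dict[str, int], energy_per_op: dict[str, float]) -> float:
    """Dot product of operation counts and per-operation energies (same units as the table)."""
    missing = set(op_counts) - set(energy_per_op)
    if missing:
        raise KeyError(f"no measured energy for: {sorted(missing)}")
    return sum(count * energy_per_op[op] for op, count in op_counts.items())

# Illustrative values only.
counts = {"ADD": 2, "MUL": 3}
energies = {"ADD": 1.0, "MUL": 2.0, "SIN": 5.0}
print(predict_energy(counts, energies))
```

Validation then amounts to comparing this predicted total against the energy measured while the same scene runs.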
What Operations? • Arithmetic • ADD, MUL, SIN/COS, POW, LOG, … • Memory • Local/Global Load/Store • Programmable • Vertex/Pixel Shaders • Fixed-function • Rasterization, Texture filtering Explicit Implicit
Measuring Energy in the GPU • Explicit operations (GPGPU): runs on same hardware as graphics; no ambiguity in operations; simple microkernels; little/no overhead; 10s runtime; directed tests per operation • Implicit operations (OpenGL): enable/disable the operation in question; the difference in energy is the operation’s contribution; not as straightforward (ex.: texture filtering)
Experimental Setup • NVIDIA 8300GS graphics card • Adex Electronics’ PEX16LX PCI riser to interrupt power from motherboard • Supply metered power to the card • 12V • 3.3V • 12V (fan, not counted in energy) • Log runtimes/framerates, measure current as tests run http://www.pretaktovanie.sk/obr/spotreba/eng/PICTURES/P1010283_ENG.jpg
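Turning logged current and runtime into energy, and then into a per-operation cost, could look like the sketch below. It assumes uniformly spaced current samples per supply rail and uses the mean current; the function names and sample values are hypothetical, and the talk does not describe its exact post-processing.

```python
def rail_energy_joules(voltage: float, current_samples_amps: list[float], runtime_s: float) -> float:
    """Energy on one rail: voltage times mean current times runtime (uniform sampling assumed)."""
    mean_current = sum(current_samples_amps) / len(current_samples_amps)
    return voltage * mean_current * runtime_s

def card_energy_joules(rails: list[tuple[float, list[float]]], runtime_s: float) -> float:
    """Sum the energy over all metered rails, e.g. the 12V and 3.3V supplies."""
    return sum(rail_energy_joules(v, samples, runtime_s) for v, samples in rails)

def per_operation_energy(energy_with: float, energy_without: float, op_count: int) -> float:
    """Attribute the difference between two runs to the operation that was toggled."""
    return (energy_with - energy_without) / op_count

total = card_energy_joules([(12.0, [1.0, 1.0]), (3.3, [2.0, 2.0])], runtime_s=10.0)
print(total)
```

The fan's 12V supply is left out of `rails`, matching the slide's note that it is not counted in energy.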
Profiling Operations Performed • Use Microsoft’s PIX to log a frame of a running application: • Framebuffer contents • Vertex data • Render states • Vertex shaders • Pixel shaders • Per draw call (100-1000s per frame) • From all this data, extract operations
Validation • Three different applications, four scenes • Real-world games to test the developed model • Harvested data, predict energy usage • Measured real energy usage, compare Half Life 2: Lost Coast (High/Low Rendering Qualities) Batman: Arkham Asylum Mass Effect
Validation Results Overheads
Roadmap • My work • Energy model • Energy savings in computation • Energy savings in communication • Conclusions • Future work
Where Does the Power Go? [Diagram: CMOS inverter between power and ground] $P_{total} = P_{dynamic} + P_{static}$
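The slide gives only the split into dynamic and static power. As a hedged illustration, the usual first-order textbook model for CMOS (switching power as activity times capacitance times voltage squared times frequency, leakage as leakage current times voltage) can be written as below. The function name, parameter names, and numbers are assumptions for this sketch, not values from the talk.

```python
def total_power_watts(activity: float, capacitance_f: float, vdd: float,
                      freq_hz: float, leakage_a: float) -> float:
    """First-order CMOS power: P_total = P_dynamic + P_static."""
    p_dynamic = activity * capacitance_f * vdd ** 2 * freq_hz  # switching power
    p_static = leakage_a * vdd                                  # leakage power
    return p_dynamic + p_static

# Illustrative values only.
print(total_power_watts(activity=0.5, capacitance_f=1e-9, vdd=1.0,
                        freq_hz=1e9, leakage_a=0.1))
```

In this model, gating techniques attack different terms: clock and signal gating lower the activity factor, while power gating also removes leakage from the disabled logic.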
Energy-Saving Techniques • Clock gating (Park et al., 2010) • Signal gating (Huang and Ercegovac, 2003) • Power gating: coarse (Usami et al., 2009; Sjalander et al., 2005), fine (my work) $P_{total} = P_{dynamic} + P_{static}$
Example: 1-Bit Adder [Circuit diagram: inputs A, B, Cin, and !Enable; outputs S and Cout]
HW Results SPICE simulations of: Adders: linear savings Multipliers: quadratic savings
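The linear and quadratic trends can be turned into a rough back-of-the-envelope estimate. The sketch below is an idealized scaling model, assuming adder energy is proportional to the number of active bits and multiplier energy to its square; it is not the SPICE data from the talk, and the function names are hypothetical.

```python
FULL_BITS = 24  # full FP32 significand width, including the implicit leading one

def adder_energy_fraction(bits: int) -> float:
    """Idealized adder energy relative to full precision (linear in active bits)."""
    return bits / FULL_BITS

def multiplier_energy_fraction(bits: int) -> float:
    """Idealized multiplier energy relative to full precision (quadratic in active bits)."""
    return (bits / FULL_BITS) ** 2

for bits in (24, 12, 8):
    print(bits, adder_energy_fraction(bits), multiplier_energy_fraction(bits))
```

Under these assumptions, halving the precision to 12 bits halves adder energy but cuts multiplier energy to a quarter.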
Precision in Rendering Variable-Precision fixed-function CPU rendering • Hao and Varshney, 2001 • 3 key differences: GPU, FP32, programmability Depth buffer comparator • Hensley, Singh, and Lastra, 2005 Triangle separation for correct occlusion • Akeley and Su, 2006
So, we have the hardware. Let’s see what happens in variable-precision pixel shaders.
Exaggerated Texture Coordinate Errors Original frame (24 mantissa bits) Blocky textures (8 mantissa bits)
Arithmetic Errors Original frame (24 mantissa bits) … Different? (8 mantissa bits)
Exaggerated Arithmetic Errors Original frame (24 mantissa bits) Clearly different (4 mantissa bits)
Different Errors, Different Tolerances • Colors can be pushed far lower • 12, 10, 8 bits for color components (plus one for rounding) • Texture coordinates may need to be fully precise!
So, Treat Them Separately Could contribute to texture coordinates A
So, Treat Them Separately Could contribute to texture coordinates A B Will NOT contribute to texture coordinates
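One way to separate the two groups is a backward pass over the shader that marks every instruction whose result can flow into a texture coordinate. The sketch below assumes a made-up straight-line instruction format of `(dest, op, sources)`, where the first source of a `TEX` instruction is its coordinate; it is not the analysis used in the talk.

```python
def texcoord_contributors(instructions: list[tuple[str, str, tuple[str, ...]]]) -> set[int]:
    """Indices of instructions whose result may reach a texture coordinate (group A)."""
    tainted: set[str] = set()      # registers that feed a later texture coordinate
    contributors: set[int] = set()
    for index in range(len(instructions) - 1, -1, -1):  # walk backwards
        dest, op, srcs = instructions[index]
        if dest in tainted:
            contributors.add(index)
            tainted.discard(dest)  # this write defines the value; earlier writes are dead
            tainted.update(srcs)   # its inputs now matter too
        if op == "TEX":
            tainted.add(srcs[0])   # the coordinate operand of the lookup
    return contributors

shader = [
    ("r0", "MUL", ("uv", "scale")),      # feeds the texture lookup: group A
    ("r1", "ADD", ("light", "ambient")), # color math only: group B
    ("r2", "TEX", ("r0",)),
    ("r3", "MUL", ("r2", "r1")),
]
print(texcoord_contributors(shader))
```

Instructions outside the returned set belong to group B and are candidates for lower precision.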
Precision Selection Strategies • Statically • Artist-directed • Automatic closed-loop
Static Program Analysis And so on… 9 bits 10 bits 12 bits 11 bits 10 bits 9 bits
Artist-Directed Precisions Precisions are chosen as the effect is designed
Automatic Closed-Loop Precision Selection • Run time feedback control • Per-shader error detection and precision control [Block diagram: the renderer produces reduced-precision pixels for the display and sparsely sampled full-precision pixels; error detection compares the two and feeds the error to a controller, which sets the renderer's precision]
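The feedback loop can be illustrated with a very simple step controller: raise the precision by one bit when the measured error exceeds a threshold, otherwise lower it by one bit. This is only a sketch; the talk does not specify its control schemes, and the function names, the threshold, and the stand-in error model `2 ** -bits` are assumptions.

```python
def update_precision(bits: int, error: float, threshold: float,
                     min_bits: int = 4, max_bits: int = 24) -> int:
    """One controller step: add a bit if the error is too high, else drop a bit."""
    if error > threshold:
        return min(bits + 1, max_bits)
    return max(bits - 1, min_bits)

def simulate(steps: int, threshold: float, start_bits: int = 24) -> list[int]:
    """Run the loop against a toy error model in which error halves per extra bit."""
    bits, history = start_bits, []
    for _ in range(steps):
        error = 2.0 ** -bits   # stand-in for the sparsely sampled pixel comparison
        bits = update_precision(bits, error, threshold)
        history.append(bits)
    return history

print(simulate(steps=40, threshold=2.0 ** -10))
```

With this toy model, the precision falls from 24 bits and then hovers around the lowest setting that keeps the error at or below the threshold.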
Experimental Setup Static analysis • Analyze shaders to find minimum safe operating precision Artist-directed • Modify several demo applications • Allow the artist to choose precisions Automatic closed-loop • Modify the ATTILA GPU simulator • Apply several feedback control schemes • Several test scenes
Results: Precisions Lower is Better!
Results: Closed-Loop Errors Unnoticeable in practice
Results: % Energy Savings Overall Energy: 2/31/5 Higher is Better!
Directed Approach • High savings • 70-80% in arithmetic • 10-20% overall GPU energy • (by arithmetic alone!) • Low errors • Acceptable by design • Quantitatively low (PSNR, % error)
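PSNR, one of the quantitative error measures listed above, compares a reduced-precision frame against the full-precision reference. A minimal sketch using the standard definition over flat lists of 8-bit channel values; the function name and sample pixels are made up for illustration.

```python
import math

def psnr(reference: list[float], test: list[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

print(psnr([10, 20, 30], [11, 21, 31]))
```

Higher values mean the reduced-precision frame is closer to the reference.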
Variable Precision Geometry • Vertex shaders • Similarly high savings (55-80%) • Different types of errors • XY Screen-space • Depth