Using Loop Perforation to Dynamically Adapt Application Behavior to Meet Real-Time Deadlines. Henry Hoffmann, Sasa Misailovic, Stelios Sidiroglou, Anant Agarwal, and Martin Rinard. CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139.
Outline
• Introduction/Motivation
  • Problem
  • Solution: Loop Perforation
• Loop Perforation
  • Finding Loops to Perforate
  • Controlling Perforation Dynamically
• Experiments
  • Using Perforation to Adapt to Faults
• Conclusion
Problem
• Program is too slow
• Misses real-time deadlines
Solution: Loop Perforation
Perforate: to make a hole through an object or structure
• Loop Perforation:
  • Do not execute all iterations
  • Skip some instead
• Steps: profile the program, find the loops that take the most time, perforate those loops
• Original loop:
    for (i = 0; i < n; i++) { … }
• Perforated loop:
    for (i = 0; i < n; i += 2) { … }
• A Perforated Program:
  • Consumes fewer computational resources
  • Runs faster, consumes less energy, or both
  • Can meet its real-time deadlines!
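As a concrete illustration of the i++ to i += 2 transformation above, here is a minimal sketch in C. The function name perforated_mean and its stride parameter are assumptions made for this example; they do not come from the talk. The computation (a mean) is chosen because it tolerates skipped iterations, which is the kind of application the talk targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: mean of data[0..n-1], visiting only every
 * `stride`-th element. stride == 1 is the original loop; stride == 2
 * matches the perforated loop on the slide (i += 2).
 * Returns 0.0 for an empty array. */
double perforated_mean(const double *data, size_t n, size_t stride)
{
    assert(stride > 0);
    double sum = 0.0;
    size_t visited = 0;
    for (size_t i = 0; i < n; i += stride) {
        sum += data[i];
        visited++;
    }
    return visited ? sum / (double)visited : 0.0;
}
```

With stride 2 the loop does half the work and returns an approximation of the exact mean; whether that approximation is acceptable depends on the application's QoS bound.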
Loop Perforation (cont'd)
Q: Won't perforation change the result?
A: Yes, so we target applications that have a range of acceptable outputs
[Diagram: Increase Speed vs. Maintain Acceptable Quality of Service; Perforate or Don't Perforate?]
Static vs. Dynamic Perforation
• Static loop perforation
  • Speeds up an application for some QoS loss
  • Allows applications to be repurposed
  • E.g., a broadcast video encoder can be transitioned to video conferencing
• Dynamic loop perforation
  • Allows full QoS unless something bad happens
  • When something bad happens, the system adapts to maintain speed
• Determine which loops to perforate using profiling
• Our implemented system supports both static and dynamic perforation; this talk focuses on dynamic perforation
A Perforating Compiler
• Responsibility of user (provided as input to the perforating compiler):
  • C/C++ Program
  • Representative Inputs
  • QoS Metric & Bound
  • QoS bound: the maximum acceptable loss of QoS
• Perforating Compiler
  • Maximizes speedup for QoS bound
  • Discards loops which cause:
    • Slowdown
    • Unacceptable QoS loss
    • Dynamic errors in Valgrind
  • Stages shown in diagram: Profile Program, Find costly loops, Perforate, Analyze QoS, Perforatable Loops
• Result
  • Set of Perforatable Loops
  • Speeds up application given QoS bound
  • Perforation may be dynamic
This process is discussed in detail in: Misailovic, Sidiroglou, Hoffmann, Rinard. Quality of Service Profiling. To Appear, ICSE 2010
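The three discard criteria above can be sketched as a simple filter over per-loop profiling results. This is only an illustration under stated assumptions: the struct loop_profile record, its field names, and the function select_perforatable are hypothetical and are not the compiler's actual data structures; the real selection process is described in the cited ICSE 2010 paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-loop profiling record (all fields are assumptions). */
struct loop_profile {
    int    id;             /* loop identifier */
    double speedup;        /* original time / perforated time; <= 1.0 means no gain */
    double qos_loss;       /* fractional QoS loss, e.g. 0.03 = 3% */
    int    dynamic_errors; /* nonzero if a memory checker reported errors */
};

/* Copies the ids of loops that survive all three filters into out_ids
 * (which must have room for n entries) and returns how many survived. */
size_t select_perforatable(const struct loop_profile *loops, size_t n,
                           double qos_bound, int *out_ids)
{
    assert(qos_bound >= 0.0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (loops[i].speedup <= 1.0)       continue; /* causes slowdown */
        if (loops[i].qos_loss > qos_bound) continue; /* unacceptable QoS loss */
        if (loops[i].dynamic_errors)       continue; /* dynamic errors */
        out_ids[kept++] = loops[i].id;
    }
    return kept;
}
```

This sketch judges each loop in isolation; it does not attempt the "maximizes speedup for QoS bound" step, which would also need to consider how perforated loops interact.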
Use PARSEC Benchmarks to Test Approach
• PARSEC Benchmarks* represent emerging workloads
• We pick seven benchmark applications for which we can define a QoS metric
  • x264 (H.264 video encoding)
  • bodytrack (human movement tracking)
  • swaptions (financial analysis)
  • ferret (content-based similarity search)
  • canneal (engineering: circuit place & route)
  • blackscholes (financial analysis)
  • streamcluster (online approx. of k-means)
• We augment the benchmark suite with additional data sets and divide them into
  • Training (about 25% of inputs)
  • Production (remaining 75% of inputs)
*http://parsec.cs.princeton.edu/
Dynamically Controlling Perforation
• Application registers a heartbeat using the Application Heartbeats API*
• Runtime monitors heartbeat
• Heartbeat too slow?
  • Increase perforation to trade QoS for increased performance
• Heartbeat too fast?
  • Decrease perforation to reclaim QoS
[Diagram: Application with Loop 1, Loop 2, Loop i; Heartbeat API; Runtime Monitor; Perforation Selection]
*Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats for Software Performance and Health. PPoPP 2010
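The too-slow / too-fast rule above can be sketched as a small feedback step. Everything named here is an assumption for illustration: struct perf_controller, controller_update, and the idea of an integer "perforation level" are hypothetical, and the sketch does not call the real Application Heartbeats API; it only takes an already-measured heart rate as input.

```c
#include <assert.h>

/* Hypothetical controller state (not part of the Heartbeats API). */
struct perf_controller {
    double min_rate;  /* lowest acceptable heart rate (beats/s) */
    double max_rate;  /* highest useful heart rate (beats/s) */
    int    level;     /* current perforation level, 0 = no perforation */
    int    max_level; /* most aggressive perforation allowed */
};

/* One control step: given the measured heart rate, raise the perforation
 * level when the application is too slow, lower it when the application
 * is too fast, and otherwise leave it alone. Returns the new level. */
int controller_update(struct perf_controller *c, double heart_rate)
{
    assert(c->min_rate <= c->max_rate);
    if (heart_rate < c->min_rate && c->level < c->max_level)
        c->level++;          /* trade QoS for performance */
    else if (heart_rate > c->max_rate && c->level > 0)
        c->level--;          /* reclaim QoS */
    return c->level;
}
```

A runtime monitor would call controller_update once per monitoring window and map the returned level to a choice of which loops to perforate; stepping one level at a time is a design choice of this sketch, not something the slides specify.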
Evaluation Methodology
• Two applications (from PARSEC benchmark suite):
  • x264 (media application that performs H.264 video encoding)
  • bodytrack (computer vision application that tracks a body through a scene)
• Two changing environments:
  • Core Failure: during execution, 3 of 8 cores fail
  • Frequency Scaling: during execution, clock frequency rises and falls
• For each app and scenario:
  • Goal: keep performance within 0.95x to 1.1x that of the system with no failures
  • Measure:
    • Baseline performance (no failure)
    • Performance with failure and no perforation
    • Performance with failure and dynamic perforation
x264 Core Loss Experiment [Figure annotation: lose 3 of 8 cores]
bodytrack Core Loss Experiment [Figure annotation: lose 3 of 8 cores]
bodytrack Results (Core Failure) • Maintains track on head, chest, and legs despite loss of 37.5% of compute
x264 Frequency Scaling Experiment [Figure annotations: frequency rises (1.6 GHz → 2.53 GHz); frequency drops (2.53 GHz → 1.6 GHz)]
bodytrack Frequency Scaling Experiment [Figure annotations: frequency rises (1.6 GHz → 2.53 GHz); frequency drops (2.53 GHz → 1.6 GHz)]
bodytrack Results (Frequency Scaling) • Perforation allows app to maintain track while frequency is low. • When frequency rises again, high-quality track is reestablished.
Conclusion
• Presented loop perforation
  • Speed up programs by making performance/QoS tradeoffs
  • Showed as much as 2x speedup for 5% degradation in QoS
• Presented dynamic loop perforation
  • Allow system to detect performance loss and respond by perforating loops
  • Maintain performance in changing environment
  • Can respond to any environmental change that affects performance
More detail on dynamic perforation available in: Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042. August, 2009.
Perforatable Loops in PARSEC Benchmarks [Figure; axis label: Number of loops]
x264 [Diagram: Uncompressed Video Frame Sequence; Encoder; Compressed Video Stream]
Motion Estimation [Diagram: Reference Frame; Current Frame] All Perforated Loops Are In Motion Estimation Computation
x264 Loop Nest
Sum of Hadamard transformed differences loop nest (computes match metric between cur and ref blocks)
    short temp[4][4];
    for (i = 0; i < h; i += 4) {
        for (j = 0; j < w; j += 4) {
            element_wise_subtract(temp, cur, ref, cs, rs);
            hadamard_transform(temp, 4);
            value += sum_abs_matrix(temp, 4);
        }
        cur += 4*cs;
        ref += 4*rs;
    }
    return value;
Perforated x264 Loop Nest
• Perforation Effect
  • New block match metric
  • Uses block with best match (as measured by metric)
  • New metric works fine
Sum of Hadamard transformed differences loop nest (computes match metric between cur and ref blocks)
    short temp[4][4];
    for (i = 0; i < h; i += 8) {
        for (j = 0; j < w; j += 8) {
            element_wise_subtract(temp, cur, ref, cs, rs);
            hadamard_transform(temp, 4);
            value += sum_abs_matrix(temp, 4);
        }
        cur += 4*cs;
        ref += 4*rs;
    }
    return value;
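The slide code relies on helpers that are not shown. The following self-contained sketch shows one way such a metric could be written so that the effect of the step (4 versus 8) can be tried directly. It is not x264's implementation: the names satd and hadamard4x4 are assumptions, the 4x4 Hadamard transform is left unnormalized, and blocks are addressed by index (i, j) rather than by advancing the cur and ref pointers as the slide does.

```c
#include <assert.h>
#include <stdlib.h>

/* In-place, unnormalized 4x4 Hadamard transform: butterflies over
 * each row, then over each column. */
static void hadamard4x4(int m[4][4])
{
    for (int r = 0; r < 4; r++) {
        int a = m[r][0] + m[r][1], b = m[r][0] - m[r][1];
        int c = m[r][2] + m[r][3], d = m[r][2] - m[r][3];
        m[r][0] = a + c; m[r][1] = b + d; m[r][2] = a - c; m[r][3] = b - d;
    }
    for (int k = 0; k < 4; k++) {
        int a = m[0][k] + m[1][k], b = m[0][k] - m[1][k];
        int c = m[2][k] + m[3][k], d = m[2][k] - m[3][k];
        m[0][k] = a + c; m[1][k] = b + d; m[2][k] = a - c; m[3][k] = b - d;
    }
}

/* Sum of absolute Hadamard-transformed differences between two w x h
 * pixel regions with row strides cs and rs. step == 4 visits every
 * 4x4 block; step == 8 is the perforated variant that skips blocks. */
long satd(const unsigned char *cur, int cs,
          const unsigned char *ref, int rs, int w, int h, int step)
{
    assert(step > 0 && step % 4 == 0);
    long value = 0;
    for (int i = 0; i + 4 <= h; i += step) {
        for (int j = 0; j + 4 <= w; j += step) {
            int t[4][4];
            for (int y = 0; y < 4; y++)
                for (int x = 0; x < 4; x++)
                    t[y][x] = cur[(i + y) * cs + j + x]
                            - ref[(i + y) * rs + j + x];
            hadamard4x4(t);
            for (int y = 0; y < 4; y++)
                for (int x = 0; x < 4; x++)
                    value += abs(t[y][x]);
        }
    }
    return value;
}
```

Because the perforated metric sums over fewer blocks, its absolute value is smaller, but motion estimation only compares candidates against each other, which is consistent with the slide's remark that the new metric works fine.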
Why Not Just Skip Motion Estimation?
• Runs 6.8 times faster
• But encoded video is 3.55 times bigger!
bodytrack • Particle method • Annealing layers • Dispersed particles • Compute with particles
bodytrack • Next annealing layer • Particle dispersion affected by previous layer • Continue until done with annealing layers
bodytrack Loop
    for (i = 0; i < layers; i++) {
        disperse particles for layer
        do particle computation
    }
Perforated bodytrack Loop
• Perforation Effect
  • Perform fewer annealing layers
  • Perform less work, finish faster
    for (i = 0; i < layers; i += 2) {
        disperse particles for layer
        do particle computation
    }
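To make "perform fewer annealing layers" concrete, here is a toy, deterministic stand-in for the loop above. It is not bodytrack's particle filter: the function anneal_1d, the result struct, the 1-D target, and the halving search radius per layer are all assumptions invented for this sketch. Each layer "disperses" two candidates around the current estimate and keeps whichever is closest to the target.

```c
#include <assert.h>

struct anneal_result {
    double estimate;   /* final estimate of the target */
    int    layers_run; /* number of annealing layers actually executed */
};

static double absd(double x) { return x < 0.0 ? -x : x; }

/* Toy 1-D annealed search. Layer i uses radius radius0 / 2^i.
 * step == 1 runs every layer; step == 2 is the perforated loop. */
struct anneal_result anneal_1d(double target, double start,
                               double radius0, int layers, int step)
{
    assert(step > 0);
    struct anneal_result r = { start, 0 };
    for (int i = 0; i < layers; i += step) {
        double radius = radius0;
        for (int k = 0; k < i; k++)
            radius *= 0.5;                 /* radius for layer i */
        /* "disperse particles for layer" */
        double cand[2] = { r.estimate - radius, r.estimate + radius };
        /* "do particle computation": keep the best candidate */
        for (int k = 0; k < 2; k++)
            if (absd(cand[k] - target) < absd(r.estimate - target))
                r.estimate = cand[k];
        r.layers_run++;
    }
    return r;
}
```

In this toy, running with step 2 executes half as many layers and ends with a somewhat coarser estimate, which mirrors the work-for-accuracy trade the slide describes without claiming anything about bodytrack's actual numbers.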
Other Perforated Loops in bodytrack
• Concepts
  • bodytrack maintains a probabilistic model of where body parts are in the previous frame
  • Reads image data from 4 cameras
  • Performs image processing to get information about where it thinks the body is in the current frame
  • Computes probabilistic model for current frame
• Many perforated loops in error calculations
  • Between probabilistic model from previous frame
  • And image data from current frame
  • Used to obtain probabilistic model for current frame
Perforated Image Quality Panning camera