Discovering and Exploiting Program Phases Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder CSE 231 Presentation by Justin Ma
400 Million Instructions [Figure labels: Spec2000 Benchmark, New Compiler, Simulator, New (Non-Existent) Processor]
400 Million Instructions • Suppose you have a time budget… • Less than half a second of execution time • What would you simulate? • Beginning? • Middle? • End?
400 Million Instructions • Programs exhibit diverse modes of behavior [Figure: behavior over time for gzip and gcc]
400 Million Instructions • Suppose you have a time budget… • Less than half a second of execution time • What would you simulate? • Beginning? • Middle? • End? • Samples of different modes of behavior
Program Phases • Observation: programs exhibit various modes of periodic behavior • These modes are program phases • Challenge: Extract these automatically
Phase Basics • Intervals – slices of time • Phases – intervals with similar behavior [Figure: IPC over time (instruction count)]
Defining “Similar Behavior” • Metric for comparing intervals? • Cache misses? • IPC? • Branch misprediction rates? • Problem: Performance alone is too architecture dependent
Defining “Similar Behavior” • Code path traversal • Directly affects time-varying behavior • Execute same code, same performance • Architecture independent • Metrics for code path traversal • Frequency of branches • Frequency of function calls • Frequency of basic block executions
Basic Block Vector • One counter per basic block (B1, B2, B3, B4), starting at 0 0 0 0 in each interval and incremented each time that block executes • Time t: B1 = 2, B2 = 1, B3 = 1, B4 = 2 • Time t + 1: B1 = 2, B2 = 2, B3 = 0, B4 = 2 • Manhattan Distance = |1 – 2| + |1 – 0| = 2 • Euclidean Distance = sqrt((1 – 2)^2 + (1 – 0)^2) = sqrt(2)
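The counting and distance steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names, the toy trace, and measuring the interval length in basic-block executions (rather than instructions) are all assumptions made for the example.

```python
import math
from collections import Counter

def basic_block_vectors(trace, interval_len, blocks):
    """Split a trace of executed basic-block IDs into fixed-length
    intervals and count how often each block runs in each interval."""
    vectors = []
    for start in range(0, len(trace), interval_len):
        counts = Counter(trace[start:start + interval_len])
        vectors.append([counts[b] for b in blocks])
    return vectors

def manhattan(u, v):
    """Sum of absolute per-block differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Square root of the summed squared per-block differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical trace chosen so the two intervals match the slide's vectors.
blocks = ["B1", "B2", "B3", "B4"]
trace = ["B1", "B2", "B4", "B1", "B3", "B4",   # interval t
         "B1", "B2", "B4", "B1", "B2", "B4"]   # interval t + 1
bbv_t, bbv_t1 = basic_block_vectors(trace, 6, blocks)
print(bbv_t, bbv_t1)
print(manhattan(bbv_t, bbv_t1), euclidean(bbv_t, bbv_t1))
```

Only B2 and B3 differ between the two intervals, which is why both distance formulas on the slide have exactly two terms.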
Basic Block Similarity Matrix • BBV similarity between intervals reflects performance similarity [Figure: similarity matrix for gcc]
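A similarity matrix of this kind can be sketched as a table of pairwise distances between interval BBVs. The sketch below assumes each vector is first normalized to sum to 1 and that Manhattan distance is the comparison; neither choice is stated on the slide, and the function names are illustrative.

```python
def normalize(vector):
    """Scale a BBV so its entries sum to 1 (assumed preprocessing step)."""
    total = sum(vector)
    return [x / total for x in vector] if total else list(vector)

def distance_matrix(vectors):
    """Pairwise Manhattan distances between normalized BBVs.
    Entry [i][j] near 0 means intervals i and j ran similar code."""
    unit = [normalize(v) for v in vectors]
    n = len(unit)
    return [[sum(abs(a - b) for a, b in zip(unit[i], unit[j]))
             for j in range(n)]
            for i in range(n)]

# Three toy intervals: the first and third execute the same mix of blocks.
bbvs = [[2, 1, 1, 2], [2, 2, 0, 2], [2, 1, 1, 2]]
for row in distance_matrix(bbvs):
    print(["%.3f" % d for d in row])
```

Plotting such a matrix as an image, with darker cells for smaller distances, makes repeated phases show up as dark off-diagonal blocks.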
Automatic Phase Classification • Classify intervals into phases • We do not know which BBVs correspond to particular phases a priori • k-means clustering • Iterative clustering algorithm • Dimension Reduction • Random Linear Projection • Try different k values • Use BIC to choose best
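The pipeline above (project, cluster for several k, score with BIC) can be sketched in plain Python. Several details here are assumptions rather than anything the slide specifies: the projection matrix entries are drawn uniformly from [-1, 1], k-means is seeded with a deterministic farthest-first rule, the BIC uses spherical Gaussians with one shared variance, and the k with the highest score wins. All function names are illustrative.

```python
import math
import random

def random_projection(vectors, out_dim, rng):
    """Reduce dimensionality by multiplying with a random matrix."""
    in_dim = len(vectors[0])
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(v[i] * matrix[i][j] for i in range(in_dim))
             for j in range(out_dim)] for v in vectors]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:               # keep old centroid if cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

def bic(points, labels, centroids):
    """Log-likelihood of a spherical-Gaussian fit minus a size penalty."""
    n, d, k = len(points), len(points[0]), len(centroids)
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    variance = max(sse / (n * d), 1e-12)   # guard against log(0)
    sizes = [labels.count(c) for c in range(k)]
    log_lik = sum(s * math.log(s / n) for s in sizes if s)
    log_lik -= 0.5 * n * d * (math.log(2 * math.pi * variance) + 1)
    n_params = (k - 1) + k * d + 1         # mixing weights, means, variance
    return log_lik - 0.5 * n_params * math.log(n)

def choose_k(points, max_k):
    """Cluster for k = 1..max_k and keep the k with the highest BIC."""
    scored = []
    for k in range(1, max_k + 1):
        labels, centroids = kmeans(points, k)
        scored.append((bic(points, labels, centroids), k, labels))
    _, best_k, best_labels = max(scored, key=lambda t: t[0])
    return best_k, best_labels

# Toy 2-D points standing in for already-projected BBVs.
toy = [[0, 0], [0, 1], [1, 0], [1, 1],
       [100, 100], [100, 101], [101, 100], [101, 101]]
best_k, phase_of_interval = choose_k(toy, max_k=3)
print(best_k, phase_of_interval)

# Projecting hypothetical 4-D BBVs down to 2-D before clustering.
projected = random_projection([[2, 1, 1, 2], [2, 2, 0, 2]], 2,
                              random.Random(0))
```

The penalty term is what stops the search from always preferring more clusters: each extra phase adds parameters, so it must improve the fit enough to pay for them.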
Automatic Phase Classification • Clustering distinguishes phases accurately and automatically
SimPoint • Simulate large programs on a budget • Perform detailed simulation on representative code snippets • Choose centroid interval from each phase (10 million instructions) • Extrapolate large program performance • Weighted by frequency of phase
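The selection and extrapolation steps can be sketched as follows, assuming phase labels and centroids are already available from clustering and that all intervals contain the same number of instructions. The function names, the toy data, and the per-interval CPI values are hypothetical; in practice `simulate` would invoke a detailed simulator on the chosen interval.

```python
def pick_simulation_points(points, labels, centroids):
    """For each phase, pick the interval closest to the phase centroid
    and weight it by the fraction of intervals that belong to the phase."""
    closest = {}
    for idx, (point, phase) in enumerate(zip(points, labels)):
        dist = sum((a - b) ** 2 for a, b in zip(point, centroids[phase]))
        if phase not in closest or dist < closest[phase][1]:
            closest[phase] = (idx, dist)
    n = len(points)
    return {phase: (idx, labels.count(phase) / n)
            for phase, (idx, _) in closest.items()}

def estimate_metric(sim_points, simulate):
    """Weighted average of the metric measured on each chosen interval."""
    return sum(weight * simulate(idx) for idx, weight in sim_points.values())

# Toy example: four intervals, two phases (three intervals vs. one).
points = [[1.0], [2.0], [3.0], [10.0]]
labels = [0, 0, 0, 1]
centroids = [[2.0], [10.0]]
sim_points = pick_simulation_points(points, labels, centroids)

# Made-up CPI results for the two intervals that get simulated in detail.
fake_cpi = {1: 1.0, 3: 2.0}
print(sim_points)
print(estimate_metric(sim_points, lambda idx: fake_cpi[idx]))
```

Only one interval per phase is simulated in detail, so the simulation cost scales with the number of phases rather than with program length.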
SimPoint • Simulate 400 million instructions total • Accurate estimate despite the limited instruction budget
Why SimPoint Succeeds • Program behavior varies over time • SimPoint intelligently chooses which intervals to simulate • Regularity within program phases allows accurate extrapolation
Online Classification • Detect phases as program is running • Applications • Thread scheduling • Power management • Predicting future phases • Challenges • One pass of input • Limited storage
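One way to meet both challenges (a single pass, bounded storage) is a small table of phase signatures that each new interval is matched against. The sketch below is an assumption about how such a classifier could look, not the design described in the talk: the class name, the Manhattan-distance threshold, and the least-recently-used eviction policy are all illustrative choices.

```python
from collections import OrderedDict

class OnlinePhaseClassifier:
    """Assigns a phase ID to each interval signature as it arrives,
    remembering at most max_phases signatures."""

    def __init__(self, threshold, max_phases):
        self.threshold = threshold
        self.max_phases = max_phases
        self.table = OrderedDict()   # phase ID -> signature, oldest use first
        self.next_id = 0

    def classify(self, signature):
        # Find the stored signature nearest to the new interval.
        best_id, best_dist = None, None
        for phase_id, stored in self.table.items():
            dist = sum(abs(a - b) for a, b in zip(signature, stored))
            if best_dist is None or dist < best_dist:
                best_id, best_dist = phase_id, dist
        if best_id is not None and best_dist <= self.threshold:
            self.table.move_to_end(best_id)      # mark as recently used
            return best_id
        # No match: start a new phase, evicting the stalest entry if full.
        if len(self.table) >= self.max_phases:
            self.table.popitem(last=False)
        phase_id = self.next_id
        self.next_id += 1
        self.table[phase_id] = list(signature)
        return phase_id

classifier = OnlinePhaseClassifier(threshold=1, max_phases=2)
stream = [[2, 1, 1, 2], [2, 1, 1, 2], [2, 2, 0, 2], [2, 1, 1, 2]]
print([classifier.classify(sig) for sig in stream])
```

Each interval is seen once and then discarded; only the signature table persists, so memory use is fixed by `max_phases` regardless of how long the program runs.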
Online Classification • High variance in metrics across the full trace • Low variance within each detected phase shows online classification succeeds in finding phases
Conclusions • Phases are a vital abstraction • Performance varies greatly within a program • Attributable to different modes of behavior • Can discover phases automatically • Offline: k-means clustering • Online • Code path characterization • Strong correlation with actual performance • SimPoint exploits this with great success
Outline • Introduction (motivate) • Basics (definitions, BBV, BBMatrix) • Offline Phase Classification • SimPoints • Online Phase Classification • Conclusions
Bayesian Information Criterion • Fit to Gaussians
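As a worked formula, one common way to score a clustering with k clusters is shown below. It assumes spherical Gaussians with a single shared variance, which the slide does not specify; n is the number of intervals, d the dimensionality after projection, n_c the size of cluster c, and mu_{c(i)} the centroid of the cluster containing point x_i. A higher score is better.

```latex
\begin{align*}
\hat{\sigma}^2 &= \frac{1}{n d} \sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert^2 \\
\hat{\ell}_k &= \sum_{c=1}^{k} n_c \ln\frac{n_c}{n}
  - \frac{n d}{2} \left( \ln\!\left(2\pi\hat{\sigma}^2\right) + 1 \right) \\
p_k &= (k - 1) + k d + 1 \\
\mathrm{BIC}(k) &= \hat{\ell}_k - \frac{p_k}{2} \ln n
\end{align*}
```

The first term rewards a tight fit to the Gaussians, and the second penalizes the number of free parameters (mixing weights, centroid coordinates, and the variance), so adding a cluster only helps when it reduces the within-cluster spread enough.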
Self-Modifying Code [Figure labels: Self-modifying code, 85°, Program Phases]