Program Performance Spectrum
Sudipta Chattopadhyay, Lee Kee Chong, Abhik Roychoudhury
LCTES 2013, Seattle
Insight
    i = 0;
    while (i++ < 100) {
        if (x - y < 5) {
            // low performance
        } else {
            // high performance
        }
    }
x and y are inputs. Example inputs: x = 10, y = 8 (x - y = 2, low-performance branch) and x = 100, y = 1 (x - y = 99, high-performance branch).
• Performance varies with the input values
• Performance bottlenecks might be exposed only for selected inputs (e.g. x=10, y=8)
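The loop on the Insight slide can be made concrete with a small sketch. The function name `count_slow_iterations` and the use of an iteration counter as a stand-in for "low performance" are assumptions made for illustration; they do not appear in the slides.

```c
/* Illustrative sketch of the Insight loop. The branch bodies are replaced by
 * a counter so the input-dependent behaviour can be observed.
 * count_slow_iterations is a hypothetical name, not from the slides. */
int count_slow_iterations(int x, int y)
{
    int slow = 0;
    int i = 0;
    while (i++ < 100) {
        if (x - y < 5) {
            slow++;          /* low-performance branch */
        } else {
            /* high-performance branch: nothing counted */
        }
    }
    return slow;
}
```

Calling `count_slow_iterations(10, 8)` takes the low-performance branch on every iteration, while `count_slow_iterations(100, 1)` never takes it.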
Motivation
[Diagram: Input (?) and Program feed a Performance profiler, which reports a Performance bottleneck (e.g. hot-spots)]
Can we automatically generate such inputs?
Objective
[Diagram: input domain space (inputs: x, y), all possible combinations of x and y values, with regions P1, P2, P3, P4, P5]
Incrementally compute a performance footprint of the program:
x - y > 5 -> performance P1
x - y < 2 -> performance P2
. . .
System Architecture
[Diagram: Processor, Cache, RAM, with the label "In a single chip"]
Cache performance: in a typical modern processor, caches are roughly 100 times faster than main memory
Objective
[Diagram: input domain space (inputs: x, y), all possible combinations of x and y values, with each region labeled "Cache miss"]
Incrementally compute a cache performance signature of the program:
x - y > 5 -> cache miss
x - y < 2 -> cache miss
. . .
Defining a Partition
• Abstracts out loop iterations
• Groups unbounded paths
  - Follow same control flow edges
  - But may vary in loop iterations
[Diagram: program with memory blocks m1, m2. Case 1: m1, m2 conflict in cache (cache thrashing). Case 2: m1, m2 do not conflict in cache]
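To make the "m1, m2 conflict in cache" case concrete, here is a minimal sketch of a direct-mapped cache that counts misses when two memory blocks are accessed alternately inside a loop. The cache geometry (`NUM_SETS`), the block numbers, and the function name `alternating_misses` are all assumptions for illustration; the slides do not specify a cache model.

```c
#define NUM_SETS 4   /* assumed geometry: direct-mapped cache with 4 sets */

/* Counts cache misses when blocks m1 and m2 are accessed alternately
 * for `iters` loop iterations. Hypothetical helper, not from the slides. */
int alternating_misses(unsigned m1, unsigned m2, int iters)
{
    unsigned tag[NUM_SETS];
    int valid[NUM_SETS] = {0};
    int misses = 0;

    for (int i = 0; i < iters; i++) {
        unsigned blocks[2] = { m1, m2 };
        for (int k = 0; k < 2; k++) {
            unsigned set = blocks[k] % NUM_SETS;   /* set index of the block */
            if (!valid[set] || tag[set] != blocks[k]) {
                misses++;                          /* cold or conflict miss */
                valid[set] = 1;
                tag[set] = blocks[k];              /* evict previous occupant */
            }
        }
    }
    return misses;
}
```

With this geometry, blocks 0 and 4 map to the same set and evict each other on every access (thrashing), whereas blocks 0 and 1 map to different sets and only incur the two initial cold misses.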
Path Programs
[Figure: example path programs]
Encoding Partitions
[Figure: control flow of path programs annotated with the predicates pred1 and pred2]
Encoding Partitions
[Figure: control flow annotated with pred1, pred2 and the clauses (pred1 pred2), (pred2 pred1), (pred1 pred2)]
• Initial encoding captures control flow
• pred1 pred2
• Time complexity = O(N^2), N = number of basic blocks
Constructing a partition
[Diagram: Program -> Execute (x = 4) -> Execution trace (x >= 0; pred1, pred2; m1, m2) -> Project -> Partition (x >= 0; pred1, pred2; m1)]
• x is an input
• Partitions are constructed on-the-fly
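One way to picture the projection step is to reduce an execution trace to the set of control flow edges it follows, so that traces differing only in loop iteration counts collapse to the same result. The sketch below is a simplification under stated assumptions: basic blocks are small integer ids below `MAX_BLOCKS`, a partition is identified by its edge set alone, and the names `project_trace` and `same_partition` are hypothetical. The slides do not give the actual projection procedure.

```c
#include <string.h>

#define MAX_BLOCKS 8   /* assumed upper bound on basic-block ids */

/* Projects a trace (sequence of basic-block ids) onto the set of control
 * flow edges it uses: edges[a][b] == 1 iff the trace steps from a to b. */
static void project_trace(const int *trace, int len,
                          unsigned char edges[MAX_BLOCKS][MAX_BLOCKS])
{
    memset(edges, 0, sizeof(unsigned char) * MAX_BLOCKS * MAX_BLOCKS);
    for (int i = 0; i + 1 < len; i++)
        edges[trace[i]][trace[i + 1]] = 1;
}

/* Two traces fall in the same (simplified) partition when they follow the
 * same control flow edges, regardless of how often each edge is taken. */
int same_partition(const int *t1, int len1, const int *t2, int len2)
{
    unsigned char e1[MAX_BLOCKS][MAX_BLOCKS];
    unsigned char e2[MAX_BLOCKS][MAX_BLOCKS];
    project_trace(t1, len1, e1);
    project_trace(t2, len2, e2);
    return memcmp(e1, e2, sizeof e1) == 0;
}
```

For example, the traces 0,1,2,1,2,3 and 0,1,2,1,2,1,2,3 differ only in the number of loop iterations and compare equal, while 0,1,3 follows a different edge and does not.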
Embedding Cache Performance
[Diagram: P -> Instrumentation -> P’. Input -> P -> Output. Input -> P’ -> Cache miss of P]
• Instrumentation computes cache performance on-the-fly
• For an input I: P(I) = output of P, P’(I) = cache miss of P
• P’(I) exactly captures the cache miss suffered on P(I)
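The relationship between P and P’ can be sketched with a toy program whose instrumented twin follows the same control flow but returns a miss count from a tiny cache model. Everything here is assumed for illustration: the two-set direct-mapped cache, the block numbers standing in for m1 and m2, and the names `P`, `P_prime`, `cache_reset`, and `cache_access`. The authors' instrumentation is built on LLVM and is not shown in the slides.

```c
#define NUM_SETS 2               /* assumed: tiny direct-mapped cache */

static int tags[NUM_SETS];       /* block currently held by each set */
static int misses;               /* running miss count of the cache model */

static void cache_reset(void)
{
    for (int s = 0; s < NUM_SETS; s++)
        tags[s] = -1;            /* -1 marks an empty set */
    misses = 0;
}

/* Instrumentation hook: models one access to memory block `block`. */
static void cache_access(int block)
{
    int set = block % NUM_SETS;
    if (tags[set] != block) {
        tags[set] = block;
        misses++;
    }
}

/* P: the original program; its output is the functional result. */
int P(int x)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        if (x >= 0)
            sum += 1;            /* this path touches block m1 only */
        else
            sum += 2;            /* this path touches blocks m1 and m2 */
    }
    return sum;
}

/* P': same control flow as P, but its output is the cache miss count. */
int P_prime(int x)
{
    cache_reset();
    for (int i = 0; i < 4; i++) {
        if (x >= 0) {
            cache_access(0);     /* m1 (assumed block 0) */
        } else {
            cache_access(0);     /* m1 */
            cache_access(2);     /* m2 (assumed block 2, same set as m1) */
        }
    }
    return misses;
}
```

In this toy setting an input with x >= 0 suffers a single cold miss, while a negative x makes m1 and m2 evict each other on every iteration.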
[Diagram: overall workflow. Program (P) -> Instrumentation -> P’. Execute P’ on input I (x = 4) to obtain the cache miss on P(I). Encode the partition (x >= 0; pred1, pred2; m1, m2) with the clauses (pred1 pred2), (pred2 pred1), (pred1 pred2). Project on P’. Analyze. Update.]
Analysis of a partition
• Analyzing cache behavior of a partition
• Deriving a symbolic input condition that captures the partition
[Diagram: original code P (x >= 0; pred1, pred2; m1) next to instrumented code P’ (x >= 0; pred1, pred2; f(miss); m1)]
miss = out(P’) = cache miss of P
Analysis of a partition
[Diagram: instrumented code P’ (x >= 0; pred1, pred2; m1; f(miss)) -> static analysis (symbolic execution + interval abstract domain) -> symbolic input condition (x >= 0)]
min <= miss = out(P’) <= max
[min, max] = cache miss range for the path program
miss = out(P’) = cache miss of P
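The [min, max] range can be illustrated with a few operations of an interval abstract domain. This is a generic textbook-style sketch, not the analysis from the slides: the `Interval` type and the helpers `interval_add`, `interval_join`, `interval_scale`, and `interval_contains` are hypothetical names, and `interval_scale` assumes both operands are non-negative.

```c
/* A closed integer interval [min, max]. */
typedef struct { int min; int max; } Interval;

/* Misses of two code fragments executed in sequence. */
Interval interval_add(Interval a, Interval b)
{
    return (Interval){ a.min + b.min, a.max + b.max };
}

/* Over-approximation of two alternative branches. */
Interval interval_join(Interval a, Interval b)
{
    return (Interval){ a.min < b.min ? a.min : b.min,
                       a.max > b.max ? a.max : b.max };
}

/* Misses of a loop body repeated `iters` times.
 * Assumes both intervals are non-negative. */
Interval interval_scale(Interval per_iter, Interval iters)
{
    return (Interval){ per_iter.min * iters.min, per_iter.max * iters.max };
}

/* Checks that a concrete miss count lies within the computed range. */
int interval_contains(Interval r, int value)
{
    return r.min <= value && value <= r.max;
}
```

For instance, 2 cold misses followed by a loop that runs between 1 and 100 times with 0 to 2 misses per iteration gives `interval_add((Interval){2, 2}, interval_scale((Interval){0, 2}, (Interval){1, 100}))`, which covers every concrete miss count from 2 to 202.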
Implementation
Components:
• KLEE symbolic execution engine
• LLVM compiler
• Minisat satisfiability solver
• STP constraint solver
Roles shown in the diagram:
• Primary driver and instrumentations
• Generates symbolic input conditions
• Encodes partitions
• Generates inputs
Output of framework
Symbolic input condition    Cache miss range
1(x, y)                     [min1, max1]
2(x, y)                     [min2, max2]
3(x, y)                     [min3, max3]
4(x, y)                     [min4, max4]
[Diagram: input domain space (inputs: x, y)]
Performance prediction
[Diagram: an arbitrary input from the input domain space (inputs: x, y) is evaluated against the conditions 1(x, y), 2(x, y), 3(x, y)]
Symbolic input condition    Cache miss range
1(x, y)                     [min1, max1]
2(x, y)                     [min2, max2]
3(x, y)                     [min3, max3]
4(x, y)                     [min4, max4]
Prediction = [min3, max3]
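Prediction amounts to finding the symbolic input condition that an arbitrary input satisfies and reporting the associated cache miss range. The sketch below assumes a signature with three partitions. The first two conditions reuse the predicates from the Objective slide, while the third condition, all numeric ranges, and the names `MissRange`, `Partition`, `signature`, and `predict` are made up for illustration.

```c
#include <stddef.h>

typedef struct { int min; int max; } MissRange;

typedef struct {
    int (*holds)(int x, int y);   /* symbolic input condition as a predicate */
    MissRange range;              /* cache miss range for that partition */
} Partition;

/* Conditions: the first two follow the Objective slide, the third is assumed. */
static int cond1(int x, int y) { return x - y > 5; }
static int cond2(int x, int y) { return x - y < 2; }
static int cond3(int x, int y) { return x - y >= 2 && x - y <= 5; }

/* Hypothetical signature: the miss ranges are invented numbers. */
static const Partition signature[] = {
    { cond1, {  0,  10 } },
    { cond2, { 40,  60 } },
    { cond3, { 90, 100 } },
};

/* Returns the predicted miss range for (x, y), or NULL if no partition
 * computed so far covers the input. */
const MissRange *predict(int x, int y)
{
    size_t n = sizeof signature / sizeof signature[0];
    for (size_t i = 0; i < n; i++)
        if (signature[i].holds(x, y))
            return &signature[i].range;
    return NULL;
}
```

With these invented numbers, `predict(10, 8)` selects the third partition because x - y = 2, and `predict(100, 1)` selects the first.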
[Plot: actual cache miss against the prediction interval]
Performance testing & debugging
Symbolic input condition    Cache miss range
1(x, y)                     [min1, max1]
2(x, y)                     [min2, max2]
3(x, y)                     [min3, max3]
4(x, y)                     [min4, max4]
Hotspots: solve 2(x, y)
[Diagram: input domain space (inputs: x, y)]
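Solving a hotspot's input condition yields a concrete test input that triggers it. The slides use a constraint solver for this step; the sketch below swaps in a brute-force search over a small input range purely for illustration. The `Condition` and `Witness` types, `find_input`, and `hotspot_condition` (which reuses the low-performance predicate from the Insight slide) are hypothetical.

```c
typedef int (*Condition)(int x, int y);

/* Result of the search: found == 1 means (x, y) satisfies the condition. */
typedef struct { int found; int x; int y; } Witness;

/* Brute-force stand-in for a constraint solver: scans lo..hi for both
 * inputs and returns the first pair satisfying the condition. */
Witness find_input(Condition cond, int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (cond(x, y))
                return (Witness){ 1, x, y };
    return (Witness){ 0, 0, 0 };
}

/* Assumed hotspot condition: the low-performance branch of the Insight loop. */
int hotspot_condition(int x, int y)
{
    return x - y < 5;
}
```

A call such as `find_input(hotspot_condition, 0, 100)` returns a witness whose x and y can be fed back to the program to reproduce the hotspot.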
Design space exploration
Design space exploration
[Figure: showing cache thrashing]
Remarks
• Salient features:
  - Computes a cache performance signature of the program
  - Systematically groups paths based on cache performance
  - Usage in performance testing, debugging, prediction and design space exploration
• Future work:
  - Consider other performance metrics
  - Consider interrupted programs
Thank You