
Sudipta Chattopadhyay, Lee Kee Chong, Abhik Roychoudhury



Presentation Transcript


  1. Program Performance Spectrum. Sudipta Chattopadhyay, Lee Kee Chong, Abhik Roychoudhury. LCTES 2013, Seattle

  2. Insight: i = 0; while (i++ < 100) { if (x - y < 5) /* low performance */ else /* high performance */ } where x, y are inputs; example inputs: x = 10, y = 8 and x = 100, y = 1
     • Performance varies with the input values
     • Performance bottlenecks might be exposed only for selected inputs (e.g. x = 10, y = 8)
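
To make the loop on this slide concrete, here is a minimal C sketch that counts how many iterations take the low-performance branch. The function name `count_slow_iterations` is hypothetical, and the sketch assumes that `x - y` does not overflow.

```c
#include <assert.h>

/* Hypothetical helper: counts how many of the 100 loop iterations
   take the low-performance branch for inputs x and y. */
int count_slow_iterations(int x, int y)
{
    int slow = 0;
    int i = 0;

    while (i++ < 100) {
        if (x - y < 5)
            slow++;   /* low-performance branch */
        /* else: high-performance branch, nothing to count */
    }

    /* The branch condition does not change inside the loop,
       so either every iteration is slow or none is. */
    assert(slow == 0 || slow == 100);
    return slow;
}
```

With the slide's inputs, x = 10, y = 8 takes the slow branch in every iteration, while x = 100, y = 1 never does.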

  3. Motivation [Diagram: an unknown input (?) and the program are fed to a performance profiler, which reports performance bottlenecks (e.g. hot-spots)]
     • Can we automatically generate such inputs?

  4. Objective [Diagram: input domain space (inputs: x, y), i.e. all possible combinations of x and y values, divided into regions P1 to P5]
     • Incrementally compute a performance footprint of the program
     • x - y > 5 -> performance P1
     • x - y < 2 -> performance P2
     • ...

  5. System Architecture [Diagram: processor, cache and RAM; processor and cache sit in a single chip]
     • Focus: cache performance
     • In a typical modern processor, caches are 100 times faster than main memory

  6. Objective [Diagram: input domain space (inputs: x, y), i.e. all possible combinations of x and y values, divided into regions, each labeled with a cache miss count]
     • Incrementally compute a cache performance signature of the program
     • x - y > 5 -> cache miss
     • x - y < 2 -> cache miss
     • ...

  7. Defining a Partition [Diagram: program with a loop that accesses memory blocks m1 and m2; case "m1, m2 conflict in cache": cache thrashing; case "m1, m2 do not conflict in cache"]
     • A partition abstracts out loop iterations
     • A partition groups unbounded paths, which
       - follow the same control flow edges
       - but may vary in loop iterations
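
As an illustration of what "m1, m2 conflict in cache" can mean, the following C sketch checks whether two addresses compete for the same set. It assumes a direct-mapped cache whose geometry is passed in by the caller; the slides do not state the cache model, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cache set that the memory block containing addr maps to
   (direct-mapped cache, geometry given by the caller). */
uint32_t cache_set(uint32_t addr, uint32_t block_size, uint32_t num_sets)
{
    assert(block_size > 0 && num_sets > 0);
    return (addr / block_size) % num_sets;
}

/* Two addresses conflict when they lie in different memory blocks
   that map to the same cache set, so each access evicts the other. */
int blocks_conflict(uint32_t a, uint32_t b,
                    uint32_t block_size, uint32_t num_sets)
{
    uint32_t block_a = a / block_size;
    uint32_t block_b = b / block_size;

    return block_a != block_b &&
           cache_set(a, block_size, num_sets) ==
           cache_set(b, block_size, num_sets);
}
```

With 32-byte blocks and 64 sets, addresses 0 and 2048 conflict, while 0 and 32 map to different sets.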

  8. Path Programs [Diagram: control flow graphs of path programs]

  9. Encoding Partitions [Diagram: control flow graphs whose edges are labeled with the predicates pred1 and pred2 and with formulas combining them]

  10. Encoding Partitions [Diagram: control flow graphs with formulas over pred1 and pred2, combined into a single formula as the encoding]
     • Initial encoding captures control flow
     • Time complexity = O(N^2), N = number of basic blocks

  11. Constructing a partition [Diagram: execute the program on the input x = 4 to obtain an execution trace (branch x >= 0, memory blocks m1 and m2, predicates pred1 and pred2); project the trace onto the program to obtain the partition]
     • x is an input
     • Partitions are constructed on-the-fly
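
The projection step can be sketched in C as follows: an execution trace, given as a sequence of basic-block ids, is reduced to the set of control flow edges it takes, so traces that differ only in the number of loop iterations collapse to the same partition. This is a simplified illustration, not the authors' encoding; `MAX_BLOCKS`, `edge_set_t` and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOCKS 8   /* assumed upper bound on basic blocks */

typedef struct {
    /* edge[a][b] == 1 if the trace took the edge a -> b */
    unsigned char edge[MAX_BLOCKS][MAX_BLOCKS];
} edge_set_t;

/* Project a trace onto the set of control flow edges it exercises.
   Repeated edges (extra loop iterations) leave the set unchanged. */
void project_trace(const int *trace, int len, edge_set_t *out)
{
    memset(out, 0, sizeof *out);
    for (int i = 0; i + 1 < len; i++) {
        assert(trace[i] >= 0 && trace[i] < MAX_BLOCKS);
        assert(trace[i + 1] >= 0 && trace[i + 1] < MAX_BLOCKS);
        out->edge[trace[i]][trace[i + 1]] = 1;
    }
}

/* Two traces belong to the same partition if their edge sets match. */
int same_partition(const edge_set_t *a, const edge_set_t *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

For example, the traces 0,1,2,1,2,1,3 and 0,1,2,1,3 run a loop a different number of times but produce the same edge set, whereas 0,1,3 skips the loop body and falls into a different partition.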

  12. Embedding Cache Performance [Diagram: instrumentation transforms P into P’; for the same input, P produces its output and P’ produces the cache miss of P]
     • The instrumentation computes cache performance on-the-fly
     • For an input I: P(I) = output of P, P’(I) = cache miss of P
     • P’(I) exactly captures the cache misses suffered on P(I)
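
One way to picture P’ is a copy of P whose memory accesses go through a small cache model and whose output is the miss count. The C sketch below assumes a direct-mapped cache with a made-up geometry and a toy program P (a branch on x and a loop touching m1 and m2); none of these details come from the slides, and the actual instrumentation is done at the LLVM level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_SIZE = 16, NUM_SETS = 4 };   /* assumed cache geometry */

typedef struct {
    uint32_t tag[NUM_SETS];
    int      valid[NUM_SETS];
    unsigned misses;
} cache_t;

/* Simulate one access to a direct-mapped cache; count a miss
   when the set is empty or holds a different memory block. */
static void cache_access(cache_t *c, uint32_t addr)
{
    assert(c != NULL);
    uint32_t block = addr / BLOCK_SIZE;
    uint32_t set   = block % NUM_SETS;
    uint32_t tag   = block / NUM_SETS;

    if (!c->valid[set] || c->tag[set] != tag) {
        c->misses++;
        c->valid[set] = 1;
        c->tag[set]   = tag;
    }
}

/* Toy P': follows the control flow of a toy P on input x,
   but returns the number of cache misses instead of P's output. */
unsigned p_prime(int x, uint32_t m1, uint32_t m2, unsigned iterations)
{
    cache_t c;
    memset(&c, 0, sizeof c);

    if (x >= 0) {
        for (unsigned i = 0; i < iterations; i++) {
            cache_access(&c, m1);
            cache_access(&c, m2);
        }
    } else {
        cache_access(&c, m1);
    }
    return c.misses;
}
```

If m1 and m2 map to the same set (addresses 0 and 64 here), every access in the loop misses; if they map to different sets (0 and 16), only the first access to each block misses.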

  13. [Diagram: overall workflow. Execute program P on an input I (e.g. x = 4); the instrumentation produces P’, which reports the cache miss on P(I); encode the executed path as a formula over pred1 and pred2; project on P’ to obtain the path program (branch x >= 0, memory blocks m1 and m2); analyze it and update the encoding]

  14. Analysis of a partition [Diagram: original code P (branch x >= 0, memory block m1) next to instrumented code P’ (branch x >= 0, f(miss), memory block m1)]
     • Analyzing the cache behavior of a partition
     • Deriving a symbolic input condition that captures the partition
     • miss = out(P’) = cache miss of P

  15. Analysis of a partition [Diagram: static analysis (symbolic execution + interval abstract domain) of the instrumented code P’ for the path program (branch x >= 0, memory block m1, f(miss)) yields the symbolic input condition x >= 0]
     • miss = out(P’) = cache miss of P
     • min <= miss = out(P’) <= max
     • [min, max] = cache miss range for the path program
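
A minimal C sketch of the interval abstract domain mentioned on this slide: each code fragment gets a [min, max] bound on its cache misses, and bounds are combined along the path program. The type and the three operations are illustrative assumptions, not the analysis actually implemented by the authors.

```c
#include <assert.h>

typedef struct { long min, max; } interval_t;

/* Sequential composition: misses of two fragments add up. */
interval_t interval_add(interval_t a, interval_t b)
{
    assert(a.min <= a.max && b.min <= b.max);
    return (interval_t){ a.min + b.min, a.max + b.max };
}

/* Join at a control flow merge: smallest interval covering both. */
interval_t interval_join(interval_t a, interval_t b)
{
    assert(a.min <= a.max && b.min <= b.max);
    return (interval_t){ a.min < b.min ? a.min : b.min,
                         a.max > b.max ? a.max : b.max };
}

/* Loop whose body suffers `body` misses per iteration and
   runs between lo and hi times. */
interval_t interval_loop(interval_t body, long lo, long hi)
{
    assert(0 <= body.min && body.min <= body.max);
    assert(0 <= lo && lo <= hi);
    return (interval_t){ body.min * lo, body.max * hi };
}
```

For instance, one guaranteed miss followed by a loop that runs 1 to 100 times with 0 to 2 misses per iteration gives the range [1, 201].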

  16. Implementation
     • Components: KLEE symbolic execution engine, LLVM compiler, Minisat satisfiability solver, STP constraint solver
     • Roles: primary driver and instrumentations; generates symbolic input conditions; encodes partitions; generates inputs

  17. Output of framework [Diagram: input domain space (inputs: x, y) divided into four regions]
     • Symbolic input condition -> cache miss range
     • condition 1(x, y) -> [min1, max1]
     • condition 2(x, y) -> [min2, max2]
     • condition 3(x, y) -> [min3, max3]
     • condition 4(x, y) -> [min4, max4]

  18. Performance prediction [Diagram: an arbitrary input is checked against the symbolic input conditions 1(x, y), 2(x, y) and 3(x, y) of the input domain space (inputs: x, y); the conditions 1(x, y) to 4(x, y) carry the ranges [min1, max1] to [min4, max4]]
     • Prediction = [min3, max3]
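
Prediction can be read as a table lookup: find the symbolic input condition that the arbitrary input satisfies and report its cache miss range. In the C sketch below, the three conditions and all ranges are invented for illustration (the conditions loosely follow the x - y examples from the Objective slides), and `region_t` and `predict` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int (*holds)(int x, int y);   /* symbolic input condition */
    long min, max;                /* cache miss range */
} region_t;

static long long diff(int x, int y) { return (long long)x - y; }

static int cond1(int x, int y) { return diff(x, y) > 5; }
static int cond2(int x, int y) { return diff(x, y) < 2; }
static int cond3(int x, int y) { return diff(x, y) >= 2 && diff(x, y) <= 5; }

/* Made-up signature: conditions and ranges are assumptions. */
static const region_t signature[] = {
    { cond1,  0,  10 },
    { cond2, 90, 100 },
    { cond3, 40,  60 },
};

/* Return the region whose condition holds for (x, y),
   or NULL if no known condition covers the input. */
const region_t *predict(int x, int y)
{
    for (size_t i = 0; i < sizeof signature / sizeof signature[0]; i++) {
        assert(signature[i].min <= signature[i].max);
        if (signature[i].holds(x, y))
            return &signature[i];
    }
    return NULL;
}
```

Under these assumed conditions, the input x = 10, y = 8 (difference 2) falls into the third region, so the predicted range is [40, 60].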

  19. [Plot: actual cache miss compared with the prediction interval]

  20. Performance testing & debugging [Diagram: input domain space (inputs: x, y) with conditions 1(x, y) to 4(x, y) and cache miss ranges [min1, max1] to [min4, max4]]
     • Hotspots: solve 2(x, y)
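
Solving a condition means finding a concrete input that satisfies it. The framework uses a constraint solver for this; the C sketch below swaps in a brute-force search over a small bounded domain purely for illustration. `find_input` and the two example conditions are hypothetical, with the first condition taken from the loop on the Insight slide.

```c
#include <assert.h>
#include <limits.h>

/* Search the domain [lo, hi] x [lo, hi] for an input that satisfies
   cond. Returns 1 and stores the witness in *x and *y, or returns 0
   if no input in the domain satisfies the condition. */
int find_input(int (*cond)(int, int), int lo, int hi, int *x, int *y)
{
    assert(cond && x && y);
    assert(lo <= hi && hi < INT_MAX);   /* keeps the loop counters in range */

    for (int a = lo; a <= hi; a++) {
        for (int b = lo; b <= hi; b++) {
            if (cond(a, b)) {
                *x = a;
                *y = b;
                return 1;
            }
        }
    }
    return 0;
}

/* Example conditions for the two branches of the Insight loop. */
int slow_branch(int x, int y) { return x - y < 5; }
int fast_branch(int x, int y) { return x - y >= 5; }
```

A real solver handles unbounded domains and richer constraints; the search here only shows the contract of the step: condition in, concrete test input out.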

  21. Design space exploration

  22. Design space exploration [Plot: showing cache thrashing]

  23. Remarks
     • Salient features:
       - Computes a cache performance signature of the program
       - Systematically groups paths based on cache performance
       - Usage in performance testing, debugging, prediction and design space exploration
     • Future work:
       - Consider other performance metrics
       - Consider interrupted programs

  24. Thank You
