
Analysis of Path Profiling Information Generated with Performance Monitoring Hardware

Alex Shye, Matt Iyer, Tipp Moseley, Dave Hodgdon, Dan Fay, Vijay Janapa Reddi, Dan Connors. University of Colorado at Boulder, Department of Electrical and Computer Engineering.


Presentation Transcript


  1. Analysis of Path Profiling Information Generated with Performance Monitoring Hardware • Alex Shye, Matt Iyer, Tipp Moseley, Dave Hodgdon, Dan Fay, Vijay Janapa Reddi, Dan Connors • University of Colorado at Boulder, Department of Electrical and Computer Engineering • DRACO Architecture Research Group

  2. Introduction • Profile information is critical to the success of optimizers • Point Profile - basic block (BB) counts, edge profiles, etc. • Path Profile - captures correlated branches • Off-line Path Profiling Methods: • Use static/dynamic instrumentation to gather a full path profile [Ball96][Joshi04][Bond05] • On-line Path Profiling Method: • Interpretation: MRET [Bala00][Bruen03] • Both incur high overhead, which is unacceptable for run-time systems • [Figure: example CFG with nodes A-G and edge weights 80, 20, 30, 70] • Edge Profile: ABDFG 70-50 • Path Profile: ABDFG 60, ACDFG 10, …
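The "70-50" range in the edge-profile example can be reproduced with a short calculation. The sketch below assumes the figure shows a CFG in which A branches to B (80) or C (20), both rejoin at D, and D branches to E (30) or F (70), for 100 executions in total. That reading of the figure is an assumption, and `path_bounds` is a hypothetical helper name.

```python
def path_bounds(edge_counts, total):
    """Bound a path's execution count using edge counts alone.

    Assumes every execution passes through each branch exactly once, so
    edge_counts[i] is how often the path's direction was taken at branch i.
    """
    # The path cannot run more often than its least-taken edge.
    upper = min(edge_counts)
    # Executions that miss the path at some branch cannot exceed the
    # combined "other direction" counts; the remainder must follow the path.
    lower = max(0, sum(edge_counts) - (len(edge_counts) - 1) * total)
    return lower, upper


# Path ABDFG follows A->B (80 of 100) and D->F (70 of 100).
low, high = path_bounds([80, 70], total=100)
print(f"edge profile: ABDFG executed between {low} and {high} times")
```

Under these assumptions an edge profile can only place ABDFG somewhere between 50 and 70 executions, while the path profile on the slide pins it at 60.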

  3. Performance Monitoring • Modern processors contain on-chip Performance Monitoring Units (PMUs) • Itanium, Pentium 4, and PowerPC support branch vectors • Sampling PMU • Less information • Non-deterministic, phase behavior • Branch Execution Information • Itanium-2 PMU Branch Trace Buffer (BTB) - up to four branches • Different configurations: last 4 branches, last 4 taken branches, etc. • Compiler can expand this information

  4. PMU-based Path Profiling • Goal: Combine compiler analysis and PMU branch vectors to generate a path profile • For PMU-based path profiling to be effective, it must be comparable to a full path profile, e.g. Ball-Larus path profiling [Ball96] • Other forms of PMU-based profile information have been shown to be effective for run-time optimization - ADORE [Chen03][Lu04] • [Figure: a BTB trace mapped onto a hot path]

  5. Hardware Profiling Approaches • Proposed Techniques: • BTB profile buffer [Conte94] • OS coupled with BTB hardware to fill out an edge profile • Hot Spot Detection[Merten00] • Proposed Branch Behavior Buffer to store branch information to fill out edge profile • Programmable Path Profiler [Vaswani05] • Hardware Path Stack and Path Detector • Performance Monitoring Unit Techniques • Continuous Profiling/Optimization Systems • Simple PMU - event counters • ADORE Dynamic Optimizer [Chen03][Lu04] • Sampling Itanium-2 PMU to drive memory optimizations

  6. Motivation • Characteristics of the Ideal Run-time Profiler • Accuracy - Ability to reflect run-time execution well • Single-Stage - Can profile a binary on the fly without extra compilation stages • Low Overhead - Incurs little to no overhead • Unfortunately, most existing techniques accomplish only one or two of these. • This project aims to combine the accuracy of path profiling with low overhead by using existing performance monitoring hardware.

  7. Itanium-2 PMU Path Profiling • 2 Phases • Online • BTB Trace Collection • Offline • Partial Path Creation • Region Formation • Path Profile Generation • Path Matching • Path Crediting • [Figure: the processor's PMU produces BTB traces; compiler-aided offline analysis turns them into partial paths, then region formation and path matching/crediting produce paths] • Terminology • BTB Trace: Series of addresses from the BTB • Partial Path: Path of ops in the compiler IR • Region: Single-entry region in the CFG • Path: Complete path through a region

  8. BTB Trace Collection • BTB Trace: Sequence of four branches per sample • Configured to sample only taken branches • Allows for longer partial paths to be built • The not taken path is trivial to follow • BTB Trace placed into specialized hash table every sample • If BTB Trace exists, increment count • At the end of execution, BTB Traces and counts are dumped to a file
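The collection step amounts to counting identical traces. The sketch below uses Python's `collections.Counter` as a stand-in for the specialized hash table. The addresses, the `record_sample` and `dump` names, and the output format are all made up for illustration, since the slides do not describe the tool's actual data layout.

```python
from collections import Counter


def record_sample(table, branches):
    """Add one PMU sample (up to four taken-branch addresses) to the table."""
    trace = tuple(branches)   # hashable key identifying this BTB trace
    table[trace] += 1         # unseen traces start at 0, known ones increment


def dump(table):
    """Format traces and counts, hottest first, one trace per line."""
    return [
        " ".join(f"{addr:#x}" for addr in trace) + f" {count}"
        for trace, count in table.most_common()
    ]


table = Counter()
samples = [
    (0x4000, 0x4040, 0x4100, 0x4000),
    (0x4000, 0x4040, 0x4100, 0x4000),
    (0x4200, 0x4280, 0x4000, 0x4040),
]
for sample in samples:
    record_sample(table, sample)

# At the end of execution the table would be written to a file.
print("\n".join(dump(table)))
```

Keeping one entry per distinct trace means the dump size depends on the number of unique traces, not on the number of samples taken.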

  9. Partial Path Creation • Partial Path: List of low-level IR ops • Partial Path Formation • Recreate path from BTB Trace • Partial Path weight = count • Perform Partial Path Extensions • Up until Join Point • Down until Branch Point • [Figure: the partial path recovered from a BTB trace, and the extended partial path reaching up to a join point and down to a branch point]
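The extension rule can be sketched at basic-block granularity, although the slides work on low-level IR ops. The code below is a simplified illustration on a hypothetical CFG: it walks upward while the head has exactly one predecessor, so it stops once a join point becomes the head, and downward while the tail has exactly one successor, so it stops once a branch point becomes the tail. The names `predecessors` and `extend_partial_path` are assumptions.

```python
def predecessors(succs):
    """Invert a successor map into a predecessor map."""
    preds = {node: [] for node in succs}
    for node, targets in succs.items():
        for target in targets:
            preds.setdefault(target, []).append(node)
    return preds


def extend_partial_path(path, succs):
    """Extend a partial path wherever control flow is unambiguous."""
    preds = predecessors(succs)
    path = list(path)
    # Upward: a sole predecessor must have run just before the head.
    # The membership check keeps the walk from circling a loop forever.
    while len(preds[path[0]]) == 1 and preds[path[0]][0] not in path:
        path.insert(0, preds[path[0]][0])
    # Downward: a sole successor must run right after the tail.
    while len(succs[path[-1]]) == 1 and succs[path[-1]][0] not in path:
        path.append(succs[path[-1]][0])
    return path


# Hypothetical CFG: D is a join point (two predecessors),
# F is a branch point (two successors).
succs = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E"], "E": ["F"], "F": ["G", "H"],
    "G": [], "H": [],
}
print(extend_partial_path(["E"], succs))
```

Here the one-block path E grows to D, E, F: it cannot go above D because either B or C may have preceded it, and it cannot go below F because either G or H may follow.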

  10. Path Matching and Crediting • Path Matching • Find the list of all paths that contain the partial path • Path Crediting • Distribute the partial path weight equally among matched paths • Example: [Figure: CFG with nodes A-Y] • Challenge: • Number of paths grows exponentially • Large control flow graphs present a problem
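Matching and crediting can be sketched in a few lines. The example below assumes a small acyclic region (a hypothetical two-diamond CFG) whose paths can be enumerated outright, treats a match as the partial path appearing as a contiguous run of blocks, and uses made-up partial path weights. The function names are illustrative only.

```python
from collections import defaultdict


def enumerate_paths(succs, entry):
    """List every entry-to-exit path of an acyclic region."""
    if not succs[entry]:
        return [[entry]]
    return [
        [entry] + rest
        for nxt in succs[entry]
        for rest in enumerate_paths(succs, nxt)
    ]


def contains(path, partial):
    """True if `partial` occurs as a contiguous run inside `path`."""
    n = len(partial)
    return any(path[i:i + n] == partial for i in range(len(path) - n + 1))


def credit(paths, partial_paths):
    """Split each partial path's weight equally among the paths matching it."""
    profile = defaultdict(float)
    for partial, weight in partial_paths:
        matches = [p for p in paths if contains(p, partial)]
        for p in matches:
            profile["".join(p)] += weight / len(matches)
    return dict(profile)


succs = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": [],
}
paths = enumerate_paths(succs, "A")
# Hypothetical partial paths with their sample counts.
profile = credit(paths, [(list("ABD"), 80), (list("DFG"), 70)])
print(profile)
```

ABDFG matches both partial paths and collects the most weight, but ABDEG and ACDFG each receive half of one partial path regardless of how often they really ran, which is how equal crediting can make a cold path look warmer than it is.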

  11. Region Formation • We use region-based paths • Makes the total # of paths more manageable • Limits the number of matching paths • Rules for Region R: • R must be single entry • R may not cross loop boundaries • Loop Regions created first • R may not cross function boundaries • Total # of paths in R is limited by a threshold • R must be as large as possible • Side Effects of Region Formation • Partial Paths must be split at: • Loop boundaries • Function boundaries • Region boundaries • [Figure: CFG with nodes A-Y divided into Region 1, Region 2, and Region 3]
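The path-count threshold matters because the number of paths doubles with every two-way branch in sequence. The sketch below counts entry-to-exit paths in an acyclic region with a memoized traversal and compares the count against a threshold. `PATH_THRESHOLD`, `diamond_chain`, and the node names are hypothetical; the slides do not give the actual threshold value.

```python
from functools import lru_cache


def count_paths(succs, entry):
    """Count entry-to-exit paths in an acyclic region without listing them."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if not succs[node]:
            return 1                      # an exit block ends exactly one path
        return sum(paths_from(n) for n in succs[node])
    return paths_from(entry)


def diamond_chain(k):
    """Build k if-then-else diamonds in sequence: J0 -> (T0 | E0) -> J1 -> ..."""
    succs = {}
    for i in range(k):
        succs[f"J{i}"] = [f"T{i}", f"E{i}"]
        succs[f"T{i}"] = [f"J{i + 1}"]
        succs[f"E{i}"] = [f"J{i + 1}"]
    succs[f"J{k}"] = []
    return succs


PATH_THRESHOLD = 1000  # hypothetical limit on paths per region

for k in (2, 10, 20):
    n = count_paths(diamond_chain(k), "J0")
    verdict = "fits in one region" if n <= PATH_THRESHOLD else "must be split"
    print(f"{k} diamonds: {n} paths, {verdict}")
```

Ten sequential diamonds already yield 1024 paths, so a region former that grows regions greedily needs a check like this before admitting each new block.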

  12. Path Generation Example • Suppose we encounter these paths: • ABDLMOP • ABDEFHIK • Split into ABD, EFHIK • OPRSUVX • [Figure: CFG with nodes A-Y divided into Region 1, Region 2, and Region 3]
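The split in this example can be sketched as a single pass that cuts a partial path wherever two consecutive blocks belong to different regions. The figure's real block-to-region assignment is not recoverable from the transcript, so the `region_of` map below is an assumption chosen only to be consistent with the split shown (blocks E-K in one region, everything else in another).

```python
def split_at_region_boundaries(path, region_of):
    """Cut a non-empty partial path wherever consecutive blocks change region."""
    pieces, current = [], [path[0]]
    for node in path[1:]:
        if region_of[node] != region_of[current[-1]]:
            pieces.append("".join(current))   # close the piece at the boundary
            current = []
        current.append(node)
    pieces.append("".join(current))
    return pieces


# Hypothetical assignment: E-K form one region, all other blocks another.
region_of = {node: 1 for node in "ABCDLMNOPQRSTUVWXY"}
region_of.update({node: 2 for node in "EFGHIJK"})

for partial in ("ABDLMOP", "ABDEFHIK", "OPRSUVX"):
    print(partial, "->", split_at_region_boundaries(partial, region_of))
```

Each resulting piece lies entirely inside one region, so it can be matched against that region's paths independently; the correlation between the pieces is lost.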

  13. Methodology • Experiments run on Itanium 2 • Developed tool using perfmon kernel interface and libpfm[perfmon] to interface with PMU • Benchmarks • Set of SPEC2000 benchmarks • Compiled with the OpenIMPACT Research Compiler[oicc] • Without aggressive profile-directed optimizations • Off-line analysis with OpenIMPACT module • Compared to full path profile gathered with a PIN path profiling tool

  14. Effect of Sampling Period • Knee of the overhead curve is at a sampling period of ~500K • Number of unique paths grows consistently as the sampling period decreases • Growth levels off somewhat between 50K and 100K

  15. Accuracy Results • Accuracy is measured with a scheme similar to Wall's weight matching [Wall91]
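As a rough illustration of weight matching, the sketch below takes the n hottest paths according to the estimated profile and asks how much of the true weight they cover, relative to the n paths that are actually hottest. The slides only say the metric is "similar to" Wall's scheme, so this may differ from what was measured. The actual counts extend the slide 2 example under an assumed CFG, and the estimated counts are invented.

```python
def weight_match(actual, estimated, n):
    """Fraction of the true top-n weight covered by the estimated top-n paths."""
    top_actual = sorted(actual, key=actual.get, reverse=True)[:n]
    top_estimated = sorted(estimated, key=estimated.get, reverse=True)[:n]
    # Score the estimated hot set using the *actual* execution counts.
    covered = sum(actual.get(path, 0) for path in top_estimated)
    best = sum(actual[path] for path in top_actual)
    return covered / best if best else 1.0


# Hypothetical full path profile and PMU-based estimate.
actual = {"ABDFG": 60, "ABDEG": 20, "ACDFG": 10, "ACDEG": 10}
estimated = {"ABDFG": 75, "ACDFG": 45, "ABDEG": 30}

print(f"top-2 accuracy: {weight_match(actual, estimated, 2):.3f}")
```

The estimate ranks ACDFG second although ABDEG is really the second-hottest path, so its top two paths cover 70 of the 80 executions that the true top two cover.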

  16. Incorrectly Detected Paths • With our path crediting technique: • We can distinguish hot paths within a region • We may incorrectly detect hot paths in the program as a whole • Cold paths may be credited enough to seem hot compared to the rest of the program • [Figure: CFG with nodes A-Y divided into Region 1, Region 2, and Region 3]

  17. Partial Path Length • Length of partial paths drops drastically from splitting on function boundaries and on loop back edges

  18. Function Correlation • MANY partial paths cross function boundaries • Should use function correlation

  19. Multiple Runs • May be possible to use multiple runs to provide more accurate path profile data

  20. Future Work • Region Formation • Characterize quality of our regions • Important because no correlation between regions • Regions stretching across function boundaries • Noise Elimination • Crucial to removing false positives due to path crediting • Effects of Optimization • Find effects of superblocks, inlining, etc. on partial paths and accuracy of path profile

  21. Conclusion • We introduce rationale and initial data of PMU-based path profiling • PMU-based profiling shows promise • At Sampling Period = 5M cycles • ~85% accurate • ~1% overhead Questions?

  22. References [Bala00] V. Bala, E. Duesterwald and S. Banerjia. “Dynamo: A Transparent Dynamic Optimization System”, PLDI 2000. [Ball92] T. Ball and J.R. Larus. “Optimally Profiling and Tracing Programs”, TOPLAS 1992. [Ball96] T. Ball and J.R. Larus. “Efficient Path Profiling”, MICRO-29, 1996. [Bond05] M.D. Bond and K.S. McKinley. “Practical Path Profiling for Dynamic Optimizers”, CGO 2005. [Bruen03] D. Bruening, R. Garnett and S. Amarasinghe. “An Infrastructure for Adaptive Dynamic Optimization”, CGO 2003. [Chen03] H. Chen, W.C. Hsu, J. Lu, P.C. Yew and D.Y. Chen. “Dynamic Trace Selection Using Performance Monitoring Hardware Sampling”, CGO 2003. [Conte94] T.M. Conte, B.A. Patel and J.S. Cox. “Using Branch Handling Hardware to Support Profile-Driven Optimization”, MICRO-27, 1994.

  23. References (cont) [Intel04] Intel. “Intel Itanium 2 Processor Reference Manual: For Software Development and Optimization”, May 2004. [Joshi04] R. Joshi, M.D. Bond and C. Zilles. “Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems”, CGO 2004. [Kistler01] T. Kistler and M. Franz. “Continuous Program Optimization”, IEEE Trans. on Computers, vol. 50, no. 6, June 2001. [Lu04] J. Lu, H. Chen, P.C. Yew and W.C. Hsu. “Design and Implementation of a Lightweight Dynamic Optimization System”, Journal of ILP 6, 2004. [Merten00] M.C. Merten, A.R. Trick, E.M. Nystrom, R.D. Barnes and W.W. Hwu. “A Hardware Mechanism for Dynamic Extraction and Relayout of Program Hot Spots”, ISCA 2000. [oicc] http://gelato.uiuc.edu [pin] http://rogue.colorado.edu/Pin

  24. Extra Slides

  25. ADORE Trace Selection • Goal: Gather hot traces with many cache misses in order to add prefetches • However, hot traces may not be enough to detect full hot paths • Compiler can perform further analysis • Correlate BTB-based traces into longer paths • [Figure: the PMU samples the last 4 taken branches from the Itanium 2 Branch Trace Table; BTB trace compared with the hot path]

  26. Partial Path Characteristics • [Charts: Function Boundaries Spanned; Average Partial Path Lengths] • Partial path extensions increase length by ~20% • However, splitting drastically decreases lengths • ~30% on function boundaries, ~20% more on loop back edges • Many paths span one or more function boundaries • Indicates that a great amount of function correlation is being thrown away
