
Path Profile Estimation and Superblock Formation

Jeff Pang, Jimeng Sun


Presentation Transcript


  1. Path Profile Estimation and Superblock Formation. Jeff Pang, Jimeng Sun

  2. Motivation
     [Figure labels: Compile; Optimize; Run; Profile]
     Why Continuous Profiling?
     • Continuous Optimization
     • Dynamic Optimization
     • Realistic Profiles
     Challenges:
     • Automated
     • Low overhead
     • Accuracy
     Related Work:
     • H. Chen, et al. Dynamic Trace Selection Using Performance Hardware Sampling. CGO, 2003.
     • A. Shye, et al. Analysis of Path Profiling Information Gathered with Performance Monitoring Hardware. ICCA, 2005.

  3. Goals
     [Figure labels: Superblock Formation; Run with Simulated PMU; Path Profile Sample; Path Profile Estimation]
     • Take advantage of modern Performance Monitoring Units (PMUs)
       • Available in the Pentium 4, Itanium, PPC 970, etc.
       • Allow sampling of the last couple of branches
       • "Simulated" for our project using instrumentation
     • Estimate the full path profile from the samples
     • Validate by performing Superblock formation
       • An optimization to improve scheduling on VLIW processors
       • Path-based Superblocks, based on Young (1997)

  4. Design Overview
     • Implemented the PMU simulator and the Superblock optimization as SUIF passes
     • Implemented the estimator offline, using sampled branch profiles and the SUIF CFG
     [Figure labels: source; frontend; instrument (pmu sim); instrumented program; sampled profile; offline estimator; estimated path profile; superblock; backend; optimized program]

  5. Path Sampling
     [Figure: CFG with edges A->B, A->C, B->D, C->D, D->E, D->F, E->G, F->G, each labeled 50]
     Exact paths: ABDEG, ACDFG
     Edge profile: ABDEG, ACDFG and ABDFG, ACDEG
     Sampling: {AB, DE}, {AC, DF} => ABDEG, ACDFG
     • Exact path profile
       • Accurate
       • But expensive
     • Edge profile
       • Inaccurate (due to the independence assumption)
       • Cheap
       • Hard (impossible) to reconstruct the path information
     • Sampled path profile
       • Periodically sample 4 consecutive branches (branch trace buffer)
       • Cheap to collect and more accurate than an edge profile
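To make the independence assumption concrete, here is a minimal sketch using the CFG from this slide. The counts (50 executions per path, and one sample per execution) are assumed for illustration, and the function names are hypothetical rather than taken from the project.

```python
from collections import Counter
from itertools import product

# Exact path profile: only two paths ever execute (counts assumed).
exact_paths = Counter({"ABDEG": 50, "ACDFG": 50})

def edge_profile(paths):
    """Collapse a path profile into per-edge execution counts."""
    edges = Counter()
    for path, count in paths.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return edges

def paths_from_edges(edges):
    """Estimate path counts, treating the branches at A and D as independent."""
    total_a = edges[("A", "B")] + edges[("A", "C")]
    total_d = edges[("D", "E")] + edges[("D", "F")]
    estimate = {}
    for first, second in product("BC", "EF"):
        p_first = edges[("A", first)] / total_a
        p_second = edges[("D", second)] / total_d
        estimate["A" + first + "D" + second + "G"] = total_a * p_first * p_second
    return estimate

edges = edge_profile(exact_paths)
edge_estimate = paths_from_edges(edges)

# A sampled branch trace records consecutive branch outcomes together,
# so the correlation between the A and D branches survives.
samples = Counter({(("A", "B"), ("D", "E")): 50, (("A", "C"), ("D", "F")): 50})
sampled_estimate = Counter()
for ((_, first), (_, second)), count in samples.items():
    sampled_estimate["A" + first + "D" + second + "G"] += count
```

With these inputs the edge-based estimate spreads the weight evenly over all four candidate paths, including ABDFG and ACDEG, which never execute, while the sampled traces recover the two real paths.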

  6. Hot Path Formation
     • Sampled paths are short
     • Sampled paths => longer paths
     • Join 2 paths if they can merge into one simple path and the frequencies of both paths are large
     • e.g. 5000 ABCD, 4000 CDEF => 4000 ABCDEF
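The joining rule can be sketched as follows. This is an illustration under assumptions, not the project's estimator: the minimum overlap of two blocks, the `threshold` parameter, and taking the smaller of the two frequencies are all choices made here (the last one is at least consistent with the 5000/4000 example on the slide).

```python
def join_paths(p, q, min_overlap=2):
    """Merge p and q on their longest suffix/prefix overlap, or return None."""
    for k in range(min(len(p), len(q)), min_overlap - 1, -1):
        if p[-k:] == q[:k]:
            merged = p + q[k:]
            # Accept only simple paths: no block may appear twice.
            return merged if len(set(merged)) == len(merged) else None
    return None

def join_hot(profile, threshold):
    """Join every ordered pair of paths whose frequencies both reach threshold."""
    joined = {}
    for p, freq_p in profile.items():
        for q, freq_q in profile.items():
            if p == q or freq_p < threshold or freq_q < threshold:
                continue
            merged = join_paths(p, q)
            if merged is not None:
                # Assumed rule: the joined path is as frequent as its rarer half.
                joined[merged] = min(freq_p, freq_q)
    return joined

sampled = {"ABCD": 5000, "CDEF": 4000}
hot = join_hot(sampled, threshold=1000)
```

Running one round of joins on the slide's example yields the single path ABCDEF with frequency 4000; longer paths would need the join repeated on its own output.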

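  7. Path Estimation Accuracy
     • We compare the top 100 paths captured by the exact path profile with those of the estimated path profile
     • The success rate is $\frac{\sum_{p \in \mathrm{est} \cap \mathrm{act}} \mathrm{cycle}_{\mathrm{act}}(p)}{\sum_{p \in \mathrm{act}} \mathrm{cycle}_{\mathrm{act}}(p)}$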
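A small sketch of this metric, assuming each profile is a mapping from path to weight and that the weight in the exact profile plays the role of the actual cycle count. The profiles below are made-up numbers for illustration only.

```python
def success_rate(actual, estimated, top_n=100):
    """Share of the actual top-N weight whose paths also appear in the estimated top N."""
    def top(profile):
        return set(sorted(profile, key=profile.get, reverse=True)[:top_n])

    act_top = top(actual)
    est_top = top(estimated)
    covered = sum(actual[p] for p in act_top & est_top)
    total = sum(actual[p] for p in act_top)
    return covered / total if total else 0.0

# Hypothetical profiles, not measured data.
actual = {"ABDEG": 600, "ACDFG": 300, "ABDFG": 100}
estimated = {"ABDEG": 550, "ACDFG": 350, "ACDEG": 80}
rate = success_rate(actual, estimated, top_n=3)
```

Here the estimate finds the two heaviest actual paths and misses ABDFG, so 900 of the 1000 actual units are covered.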

  8. Superblock Formation
     • Creates larger regions to schedule over for hot paths
     [Figure panels: Tail Duplication; Loop Unrolling; Combinations]
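Tail duplication can be illustrated with a short sketch. It is a generic textbook-style version, not the SUIF pass from the project: the CFG is assumed to be a dictionary from block name to successor list, the hot trace is assumed to be a simple path, and cloned blocks are named with a trailing apostrophe.

```python
def tail_duplicate(cfg, trace):
    """Return a new CFG in which trace can only be entered at its first block."""
    pos = {block: i for i, block in enumerate(trace)}

    def is_side_entrance(src, dst):
        # An edge into a non-head trace block from anything but its trace predecessor.
        return dst in pos and pos[dst] > 0 and trace[pos[dst] - 1] != src

    entered = [pos[d] for s, succs in cfg.items() for d in succs
               if is_side_entrance(s, d)]
    if not entered:
        return {block: list(succs) for block, succs in cfg.items()}

    # Clone everything from the first side entrance to the end of the trace.
    tail = trace[min(entered):]
    clone = {block: block + "'" for block in tail}

    new_cfg = {}
    for src, succs in cfg.items():
        # Side entrances are redirected to the cloned tail.
        new_cfg[src] = [clone[d] if is_side_entrance(src, d) else d for d in succs]
    for block in tail:
        # Clones keep the original successors but stay inside the cloned tail.
        new_cfg[clone[block]] = [clone.get(d, d) for d in cfg[block]]
    return new_cfg

cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"],
       "E": ["G"], "F": ["G"], "G": []}
superblock_cfg = tail_duplicate(cfg, list("ABDEG"))
```

For the hot trace ABDEG, the edges C->D and F->G are side entrances, so D, E, and G are cloned and those edges are redirected to the clones. The original trace then has a single entry at A, which is the property a scheduler needs to treat it as one region, at the cost of code growth.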

  9. Superblock Performance
     • Performance results pending
       • Waiting for CASH simulator setup…
     • Superblock formation is not useful on the P4
       • Causes a 0-5% slowdown on tested benchmarks (probably due to icache misses)
       • Need a multi-issue architecture to see scheduling benefits?
