Path Profile Estimation and Superblock Formation
Jeff Pang, Jimeng Sun
Motivation
[Figure: cycle diagram with stages labeled Compile, Optimize, Run, Profile]
Why Continuous Profiling?
• Continuous Optimization
• Dynamic Optimization
• Realistic Profiles
Challenges:
• Automated
• Low overhead
• Accuracy
Related Work:
• H. Chen, et al. Dynamic Trace Selection Using Performance Hardware Sampling. CGO, 2003.
• A. Shye, et al. Analysis of Path Profiling Information Gathered with Performance Monitoring Hardware. ICCA, 2005.
Goals
[Figure: workflow diagram with stages labeled Run with Simulated PMU, Path Profile Sample, Path Profile Estimation, Superblock Formation]
• Take advantage of modern Performance Monitoring Units (PMUs)
  • Available in the Pentium 4, Itanium, PPC 970, etc.
  • Allow sampling of the last few branches executed
  • "Simulated" for our project using instrumentation
• Estimate the full path profile from the samples
• Validate by performing Superblock formation
  • An optimization that improves scheduling on VLIW processors
  • Path-based Superblocks based on Young (1997)
Design Overview
• Implemented PMU simulator and Superblock optimization as SUIF passes
• Implemented Estimator offline using sampled branch profiles and SUIF CFG
[Figure: pipeline diagram with stages labeled source, frontend, instrument (pmu sim), instrumented program, sampled profile, offline estimator, estimated path profile, superblock, backend, optimized program]
Path Sampling
[Figure: CFG in which A branches to B and C (50/50), B and C join at D, D branches to E and F (50/50), and E and F join at G]
• Exact paths: ABDEG, ACDFG
• Edge Profile: ABDEG, ACDFG and ABDFG, ACDEG
• Sampling: {AB, DE} {AC, DF} => ABDEG, ACDFG
• Exact path profile:
  • Accurate
  • But expensive
• Edge profile
  • Inaccurate (due to the independence assumption)
  • Cheap
  • It is hard (impossible) to reconstruct the path information
• Sampling path profile
  • Periodically sample 4 consecutive branches (branch trace buffer)
  • Cheap to collect and more accurate than edge profile
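The contrast between edge profiles and sampled branch windows can be illustrated with a small Python sketch. It is not the SUIF instrumentation pass from the slides: the CFG literal, the function names, the fixed sampling period of 6, and the choice to count only conditional-branch outcomes as "branches" are all assumptions made for illustration. Two executions with identical edge profiles are built, and the 4-branch windows are then shown to tell them apart.

```python
from collections import Counter

# Hypothetical encoding of the CFG on this slide: block -> successors.
CFG = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
       "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}

def edge_profile(runs):
    """Count how often each CFG edge is traversed."""
    counts = Counter()
    for run in runs:
        counts.update(zip(run, run[1:]))
    return counts

def branch_trace(runs):
    """Flatten runs into the sequence of conditional-branch outcomes (src, dst)."""
    trace = []
    for run in runs:
        for src, dst in zip(run, run[1:]):
            if len(CFG[src]) > 1:  # only blocks that end in a conditional branch
                trace.append((src, dst))
    return trace

def sample_windows(trace, period, window=4):
    """Every `period` branches, record the last `window` branch outcomes."""
    samples = Counter()
    for end in range(window, len(trace) + 1, period):
        samples[tuple(trace[end - window:end])] += 1
    return samples

def correlated_pairs(samples):
    """Count (A-outcome, D-outcome) pairs seen back to back inside the samples."""
    pairs = Counter()
    for win, n in samples.items():
        for first, second in zip(win, win[1:]):
            if first[0] == "A" and second[0] == "D":
                pairs[(first[1], second[1])] += n
    return pairs

# Two executions with the same edge profile but different path profiles.
correlated = ["ABDEG", "ACDFG"] * 50
mixed = ["ABDEG", "ACDFG", "ABDFG", "ACDEG"] * 25

if __name__ == "__main__":
    print(edge_profile(correlated) == edge_profile(mixed))
    print(sorted(correlated_pairs(sample_windows(branch_trace(correlated), 6))))
    print(sorted(correlated_pairs(sample_windows(branch_trace(mixed), 6))))
```

A real PMU would typically randomize the sampling interval; the fixed period here is only to keep the sketch deterministic.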
Hot Path Formation
• Sampled paths are short
• Sampled paths => longer paths
• Join 2 paths if they can merge into one simple path and the frequencies of both paths are large
• e.g. 5000 ABCD, 4000 CDEF => 4000 ABCDEF
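A minimal Python sketch of the joining rule follows. The slides do not spell out the algorithm, so several details are assumptions: paths are strings of block names, two paths merge on their longest suffix/prefix overlap, "large" means at or above a caller-supplied `threshold`, and the joined path takes the smaller of the two frequencies (which matches the 5000/4000 => 4000 example). The function names are hypothetical.

```python
def join_paths(p, q):
    """Merge p and q on the longest overlap between p's suffix and q's prefix.

    Returns the merged path, or None if there is no overlap that yields a
    simple path (one in which no block appears twice).
    """
    for k in range(min(len(p), len(q)), 0, -1):
        if p[-k:] == q[:k]:
            merged = p + q[k:]
            if len(set(merged)) == len(merged):
                return merged
    return None

def form_hot_paths(profile, threshold):
    """One joining pass over a {path: frequency} profile.

    Only pairs whose frequencies are both >= threshold are joined; the joined
    path gets the smaller frequency (an assumption based on the slide example).
    """
    joined = {}
    for p, fp in profile.items():
        for q, fq in profile.items():
            if p == q or fp < threshold or fq < threshold:
                continue
            merged = join_paths(p, q)
            if merged and merged not in (p, q):
                joined[merged] = max(joined.get(merged, 0), min(fp, fq))
    return joined

if __name__ == "__main__":
    print(form_hot_paths({"ABCD": 5000, "CDEF": 4000}, threshold=1000))
```

This performs a single pass; growing paths further would mean feeding the joined paths back in until nothing new is produced.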
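Path Estimation Accuracy
• We compare the top 100 paths captured by the exact path profile with the top 100 paths in the estimated path profile
• The success rate is
  $$\text{success rate} = \frac{\sum_{p \,\in\, \mathrm{est} \cap \mathrm{act}} \mathrm{cycle}_{\mathrm{act}}(p)}{\sum_{p \,\in\, \mathrm{act}} \mathrm{cycle}_{\mathrm{act}}(p)}$$
  where est and act are the estimated and exact (actual) top-path sets and $\mathrm{cycle}_{\mathrm{act}}(p)$ is the cycle count of path $p$ in the exact profile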
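The success-rate metric can be computed as in the sketch below. The function name and dictionary layout are assumptions: `actual_cycles` maps each exact path to its cycle count, `estimated` maps each estimated path to its estimated weight, and both are cut to their top `top_n` entries before comparison.

```python
def success_rate(actual_cycles, estimated, top_n=100):
    """Fraction of top-path cycles (exact profile) covered by the estimate."""
    def top(profile):
        return sorted(profile, key=profile.get, reverse=True)[:top_n]

    act = top(actual_cycles)
    est = set(top(estimated))
    total = sum(actual_cycles[p] for p in act)
    covered = sum(actual_cycles[p] for p in act if p in est)
    return covered / total if total else 0.0

if __name__ == "__main__":
    actual = {"ABDEG": 600, "ACDFG": 300, "ABDFG": 100}
    estimate = {"ABDEG": 550, "ACDEG": 200}
    print(success_rate(actual, estimate))
```

Weighting by the exact profile's cycles means that missing a rarely executed path costs little, while missing a hot path is penalized heavily.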
Superblock Formation
• Creates larger regions to schedule over for hot paths
[Figure: example CFGs over blocks A to G illustrating three transformations: Tail Duplication, Loop Unrolling, Combinations]
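Tail duplication, the first of the transformations named on this slide, can be sketched in Python as follows. This is the textbook form of the transformation, not the authors' SUIF pass; the CFG representation (a dict from block name to successor list), the function name, and the `'` suffix for cloned blocks are assumptions. Given a hot trace, every block from the first side entrance onward is cloned, and all side entrances are redirected to the clones so that the original trace can only be entered at its head.

```python
def tail_duplicate(cfg, trace):
    """Return a new CFG in which `trace` has no side entrances.

    cfg:   dict mapping block name -> list of successor names
    trace: list of block names forming the hot path (head first)
    """
    pred_on_trace = {trace[i]: trace[i - 1] for i in range(1, len(trace))}

    def has_side_entrance(block):
        return any(block in succs and src != pred_on_trace[block]
                   for src, succs in cfg.items())

    first = next((i for i in range(1, len(trace))
                  if has_side_entrance(trace[i])), None)
    new_cfg = {b: list(s) for b, s in cfg.items()}
    if first is None:
        return new_cfg  # already a superblock

    tail = trace[first:]
    clone = {b: b + "'" for b in tail}

    # Cloned blocks keep their off-trace successors, but any successor that
    # lies in the tail is remapped to its clone so clones never re-enter
    # the original trace.
    for b in tail:
        new_cfg[clone[b]] = [clone.get(s, s) for s in cfg[b]]

    # Redirect every side entrance into the tail to the cloned copy.
    for src, succs in cfg.items():
        for k, dst in enumerate(succs):
            if dst in clone and src != pred_on_trace[dst]:
                new_cfg[src][k] = clone[dst]
    return new_cfg

if __name__ == "__main__":
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"],
           "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": []}
    for block, succs in tail_duplicate(cfg, ["A", "B", "D", "E", "G"]).items():
        print(block, "->", succs)
```

The cost is code growth: in the example three blocks are duplicated, which is consistent with the instruction-cache concern raised on the performance slide.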
Superblock Performance
• Performance results pending
  • Waiting for CASH simulator setup…
• Superblock formation gave no benefit on the Pentium 4
  • Causes a 0-5% slowdown on the tested benchmarks (probably due to instruction-cache misses)
  • A multi-issue architecture may be needed to see the scheduling benefits