Lengthening Traces to Improve Opportunities for Dynamic Optimization

Chuck Zhao, Cristiana Amza, Greg Steffan (University of Toronto); Youfeng Wu (Intel Research)
Feb. 16, 2007, Interact-12, HPCA

Intel’s StarDBT Project
• StarDBT
  • A Dynamic Binary Translation framework
  • Operates on traces, optimizes hot traces
• Long-term goal: use StarDBT to allow legacy apps to exploit TM support
  • (NOT by automatically parallelizing legacy apps)
  • Allow speculative sequential optimizations
  • Use hardware TM’s checkpoint/restore
• Problem: default traces are too small
  • TM overheads would overwhelm benefits
Challenge: lengthening traces can be tricky

Trace Formation
[Figure: a basic-block profile and the corresponding trace profile over blocks A-G, distinguishing on-trace blocks from off-trace stubs.]
Control flow that goes off-trace can be costly

Trade-offs when Lengthening Traces
• Completion ratio:
  • likelihood of execution staying on trace
  • percentage of execution reaching trace tail
[Figure: traces of different lengths built from blocks A, B, D, F, and G, each side exit labeled with a 5% side-exit ratio. With two side exits the completion ratio is 100% - 10% = 90%; with five side exits it is 100% - 25% = 75%.]
Trade-offs:
• longer traces have more optimization opportunities
• longer traces have more side-exit branches
A sweet spot exists in between; can we find it?

Our Work So Far (i.e., this talk)
• Lengthening traces while maintaining completion ratios
  • Through unrolling and straightening
  • A characterization of the impact on traces
    • length, completion ratio, unroll factor, …
• Improving optimization opportunities on longer traces
  • Improve Local Value Numbering (LVN) hits
  • Measurement of impact on performance is pending
• Performing on-the-fly actions by DBT system
  • Decisions made by instrumenting/sampling code online

Related Work
• Binary Translation Systems
  • Dynamo
  • DynamoRIO
  • PIN
  • StarDBT
    • transparent translation
    • x86 legacy code
• Trace Collection and Optimizations
  • Java JIT
  • Dynamo, DynamoRIO, Mojo
  • StarDBT
    • x86 binary level
    • MRET2 to improve trace formation
    • aggressive trace optimizations
First full analysis of trace-lengthening issues for DBT systems

StarDBT Trace Types
[Figure: traces a, b, c, and d around the dispatcher, illustrating three trace types: self type, other trace type, and elsewhere type.]

Lengthening Traces Through Unrolling
[Figure: trace a with a completion ratio of 90%; successive unrolled copies of a give completion ratios of 90%, 81%, and 72.9%.]
Unrolling increases a trace’s length, but reduces its completion ratio

Finding the Sweet-Spot Unroll Factor
[Figure: trace a unrolled into repeated copies.]
... chosen by system designer
given p_orig = 99% and p_target = 90%
Traces with 100% completion ratio: set N = 10
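The slide's formula did not survive extraction, so the following is only a sketch of one way to pick an unroll factor. It assumes, consistent with the 90%, 81%, 72.9% sequence on the unrolling slide, that N copies of a trace with completion ratio p_orig complete with probability p_orig^N, and it picks the largest N that keeps that value at or above p_target. The function name `unroll_factor` and the `cap` parameter are hypothetical; the default cap of 10 follows the slide's rule for traces with a 100% completion ratio, and applying the same cap to every trace is an assumption.

```python
def unroll_factor(p_orig, p_target, cap=10):
    """Largest N (at most cap) with p_orig**N >= p_target.

    N counts copies of the trace body, so N == 1 means "not unrolled".
    Assumes each copy completes independently with probability p_orig.
    """
    if not 0.0 < p_target <= 1.0:
        raise ValueError("p_target must be in (0, 1]")
    n, p = 0, 1.0
    # Keep adding copies while the lengthened trace still meets the target.
    while n < cap and p * p_orig >= p_target:
        p *= p_orig
        n += 1
    # A trace that already misses the target is simply left as it is.
    return max(n, 1)


print(unroll_factor(0.99, 0.90))  # p_orig = 99%, p_target = 90%
print(unroll_factor(1.00, 0.90))  # 100% completion ratio: limited by cap
```

A loop is used rather than a closed-form logarithm so that rounding at exact boundaries (for example p_orig = 0.9, p_target = 0.81) does not depend on the precision of `log`.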

Lengthening Traces Through Straightening
[Figure: blocks b, c, and d before and after straightening.]
We don’t yet implement/evaluate straightening

Evaluation

Distribution of Original Completion Ratios
[Figure: distribution of hot traces by original completion ratio.]
Majority of hot traces have completion ratios in 90%-100%

Impact of Unrolling on Hot Trace Size
[Figure: average number of instructions vs. completion ratio, annotated "36% longer". Selected SPECint CPU2000 benchmarks with MinneSPEC input.]
Lengthening increases hot trace size by more than 36%

How Much are Traces Unrolled?
[Figure: average unroll factor vs. target completion ratio, annotated "1.38-1.58x" and "Not unrolled".]
Hot traces are unrolled on average by 1.38x or more

Average Completion Ratio After Lengthening
[Figure: completion ratio after lengthening (axis from 10% to 90%) vs. completion ratio, annotated "<0.5%".]
Lengthening traces reduces completion ratio by < 0.5%

Impact of Lengthening on Optimizations

Local Value Numbering (LVN)
• No need to build Control Flow Graph (CFG)
  • Partial info
• No need to perform Data Flow Analysis (DFA)
  • Expensive, relies on CFG
• Can be arranged into a single-pass scan
  • Ease of implementation
  • Relatively lightweight algorithm
• Performs three optimizations:
  • Common Subexpression Elimination (CSE)
  • Copy Propagation (CP)
  • Dead-Code Elimination (DCE)
LVN is common in JIT optimizers

Ex: LVN On a Lengthened Trace
Original traces:
  … c = a + b; d = a; e = b
  f = d + e; d = x …
Lengthened trace (subscripts are value numbers):
  … c3 = a1 + b2; d1 = a1; e2 = b2; f3 = d1 + e2, rewritten to f3 = c3 (CSE hit); d4 = x4 …
Optimized trace:
  … c = a + b; e = b; f = c; d = x … (DCE hit: d = a is removed)
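The example above can be reproduced with a small value-numbering sketch. This is not StarDBT's implementation: the statement encoding `(dest, op, args)`, the function names `lvn` and `dce`, and the `live_out` set are all assumptions made for illustration. The sketch also treats the trace as straight-line code with no side exits between statements, and it does not canonicalize commutative operands.

```python
from itertools import count


def lvn(trace):
    """Forward pass: value numbering with CSE and copy propagation."""
    fresh = count(1)
    var_vn = {}   # variable -> value number it currently holds
    expr_vn = {}  # (op, vn, vn) -> value number of that expression
    out, cse_hits = [], 0

    def vn_of(var):
        if var not in var_vn:
            var_vn[var] = next(fresh)
        return var_vn[var]

    def holder(vn):
        # First variable that still holds this value, if any.
        for var, v in var_vn.items():
            if v == vn:
                return var
        return None

    for dest, op, args in trace:
        vns = tuple(vn_of(a) for a in args)
        key = (op, *vns)
        if op == "copy":
            vn = vns[0]
            stmt = (dest, "copy", (holder(vn),))
        elif key in expr_vn and holder(expr_vn[key]) is not None:
            vn = expr_vn[key]  # same expression seen before: CSE hit
            stmt = (dest, "copy", (holder(vn),))
            cse_hits += 1
        else:
            vn = next(fresh)
            expr_vn[key] = vn
            stmt = (dest, op, tuple(holder(v) for v in vns))
        var_vn[dest] = vn
        out.append(stmt)
    return out, cse_hits


def dce(trace, live_out):
    """Backward pass: drop assignments whose result is never read."""
    live, kept = set(live_out), []
    for dest, op, args in reversed(trace):
        if dest not in live:
            continue  # dead store: DCE hit
        live.discard(dest)
        live.update(args)
        kept.append((dest, op, args))
    kept.reverse()
    return kept


lengthened = [
    ("c", "+", ("a", "b")),
    ("d", "copy", ("a",)),
    ("e", "copy", ("b",)),
    ("f", "+", ("d", "e")),
    ("d", "copy", ("x",)),
]
numbered, hits = lvn(lengthened)
optimized = dce(numbered, live_out={"c", "d", "e", "f"})
for dest, op, args in optimized:
    print(dest, "=", args[0] if op == "copy" else f" {op} ".join(args))
```

The CSE hit on `f = d + e` only appears once the two original traces are joined, because the matching expression `a + b` lives in the first trace. Removing `d = a` is only safe here because the sketch assumes nothing leaves the trace between that statement and `d = x`.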

LVN Hits Improvement (%)
[Figure: % increase in LVN hits vs. target completion ratio.]
10+% more LVN hits are available through lengthening

Ongoing Work
• Complete DBT Optimization Framework
• Evaluate speculative optimizations on long hot traces with high completion ratios
• Automatically determine optimal transaction granularity
• Use HTM to support trace-based speculative optimizations

Control Speculation
A Compiler Framework for Speculative Analysis and Optimizations: Lin et al., PLDI 03
[Figure: a cmp with a 90+% path and a 10-% path, followed by ld x=[y] …]
  ld.s x = [y]
  if(c){
    chk.s x, recovery
  next:
    …
  }
recovery:
  ld x=[y]
  jmp next

Use HTM to Support Trace-based Speculative Optimizations
[Figure: a cmp with a 90+% path and a 10-% path, followed by ld x=[y] …]
  start_tx
  ld x = [y]
  if(c){
    chk x, abort_tx
    …
  }
  commit_tx
Use longer traces with high completion ratio as tx granularity
HTM hardware support simplifies speculative optimization

Conclusion
• Traces can be effectively lengthened
  • increase in trace size by 36+%
  • decrease in completion ratio by less than 0.5%
• Longer traces provide better opportunities for optimization
  • increase in LVN hits by 10+%

Q + A

Complete StarDBT Optimization Framework
• x86 CISC ISA
  • code patching won’t work
  • Really need a code generator and IR
• Design + implement a low-level Runtime IR
  • close to hardware
  • capture + represent all necessary low-level info
  • easy to convert from/to machine code
  • easy to implement analysis and optimizations
• Starting point
  • Dynamo IR
  • LLVM IR
  • GCC RTL
  • …

StarDBT Overall Structure

Trace Formation Heuristics
• MRET: Most Recent Execution Tail
  • originally proposed by Dynamo
  • Trace head
    • loop head (backward branch target)
    • sampling counter reaches a certain threshold
  • Trace tail
    • satisfy certain trace-tail conditions
• MRET2: 2-pass MRET
  • perform 2 independent MRET trace formations
  • intersect traces with common head
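The slide does not say how MRET2 intersects two traces. One plausible reading, sketched below purely as an assumption, is to keep the longest run of blocks on which two independently formed traces with the same head agree. The function name `intersect_traces` and the block labels are hypothetical.

```python
def intersect_traces(first, second):
    """Longest common prefix of two traces that share a head block.

    Each trace is a list of basic-block labels in execution order.
    Returns an empty list when the heads differ.
    """
    if not first or not second or first[0] != second[0]:
        return []
    common = []
    for a, b in zip(first, second):
        if a != b:
            break  # the two passes diverged here
        common.append(a)
    return common


pass1 = ["A", "B", "D", "F", "G"]
pass2 = ["A", "B", "D", "E"]
print(intersect_traces(pass1, pass2))
```

Under this reading, blocks that only one pass followed are left off the trace, which favors paths that were taken in both passes.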

Traces and Hot Traces
• Trace
  • MRET2 recognizes trace heads
  • Trace tails satisfy certain conditions
  • Blocks in between become a trace
• Hot Trace
  • Based on recognized traces
  • Put in additional software counters
    • head: head counter
    • each early-exit branch: off-trace counters
  • sampling: hot trace’s completion ratio
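As a rough sketch of how the counters above could yield a completion ratio, assume the head counter records entries at the trace head and each off-trace counter records how often that early exit was taken. The function name `completion_ratio` and the sample counts are illustrative, not taken from StarDBT.

```python
def completion_ratio(head_count, off_trace_counts):
    """Fraction of trace entries that reach the trace tail.

    head_count: number of times the trace head was entered.
    off_trace_counts: one count per early-exit branch.
    Returns None when the trace has not been entered yet.
    """
    if head_count <= 0:
        return None
    early_exits = sum(off_trace_counts)
    return (head_count - early_exits) / head_count


# Two side exits, each taken on 5% of entries.
print(completion_ratio(1000, [50, 50]))
# Five side exits, each taken on 5% of entries.
print(completion_ratio(1000, [50] * 5))
```

The two calls mirror the 100% - 10% = 90% and 100% - 25% = 75% cases from the trade-offs slide.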
