Trace Fragment Selection within Method-based JVMs

Duane Merrill, Kim Hazelwood. VEE '08.

Presentation Transcript


  1. Trace Fragment Selection within Method-based JVMs • Duane Merrill, Kim Hazelwood • VEE '08

  2. Overview • Would trace fragment dispatch benefit VMs with JITs? • Fragment-dispatch as a feedback-directed optimization • Why? • Improve VM performance via better instruction layout • Overview • Motivation • New scheme for trace selection • Viability in JikesRVM • Evaluate opportunities for code improvement • Evaluate trace selection overhead

  3. Traditional VM Adaptive Code Generation • Phase 3: More Advanced JIT Compilation • Update Class/TOC dispatch tables, perform OSR • Phase 2: JIT Method compilation • Compilation Shape: Source Method • Dispatch Shape: Corresponding MC Code Array & Machine Code Trace Fragment • Phase 1: Interpreter • Compilation Shape: Source Instruction • Dispatch Shape: Corresponding MC Instruction(s) Machine Code Trace Fragment

  4. SDT/DBI/Embedded VM Adaptive Code Generation • Phase 3: More Advanced JIT Compilation • Update Class/TOC dispatch tables, perform OSR • Phase 2: JIT Method compilation • Compilation Shape: Source Method • Dispatch Shape: Corresponding MC Code Array & Machine Code Trace Fragment • Phase 1: Interpreter • Compilation Shape: Source Instruction • Dispatch Shape: Corresponding MC Instruction(s) Machine Code Trace Fragment

  5. Proposed VM Adaptive Code Generation • Phase 3: More Advanced JIT Compilation • Update Class/TOC dispatch tables, perform OSR • Phase 2: JIT Method compilation • Compilation Shape: Source Method • Dispatch Shape(s): Corresponding MC Code Array & Machine Code Trace Fragment • Phase 1: Interpreter • Compilation Shape: Source Instruction • Dispatch Shape: Corresponding MC Instruction(s) Machine Code Trace Fragment

  6. Trace Fragment Dispatch • Trace • A specific sequence of instructions observed at runtime • Span: • Branches • Procedure calls and returns • Potentially arbitrary number of instructions • Trace Fragment • A finite, linear sequence of machine code instructions • Single-entry, multiple-exit (viz. superblock) • Cached, linked [Figure: control-flow graphs of foo() and bar() over basic blocks A, B, C, D, E, M, N, O, P; trace fragment A B D M O P E with exits to C and to N]
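
The single-entry, multiple-exit shape on slide 6 can be illustrated with a minimal sketch. The control-flow edges in `CFG` are an assumption reconstructed from the block labels in the slide's figure, and `side_exits` is a hypothetical helper, not part of any VM.

```python
# Hypothetical control-flow graph: block -> possible successors.
# The edges are assumed for illustration; the slide only shows block labels.
CFG = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["M"],        # assumed call into bar()
    "M": ["N", "O"],
    "N": ["P"],
    "O": ["P"],
    "P": ["E"],        # assumed return to foo()
    "E": ["A"],        # assumed loop back-edge
}


def side_exits(cfg, fragment):
    """Return (block, target) pairs where control can leave the fragment.

    The fragment is a linear list of blocks entered only at its head. The
    last block is treated as continuing at the head, so a back-edge to the
    head is not counted as an exit.
    """
    exits = []
    for i, block in enumerate(fragment):
        on_trace_next = fragment[(i + 1) % len(fragment)]
        for successor in cfg.get(block, []):
            if successor != on_trace_next:
                exits.append((block, successor))
    return exits


print(side_exits(CFG, ["A", "B", "D", "M", "O", "P", "E"]))
# [('A', 'C'), ('M', 'N')]
```

Under these assumed edges, the two exits match the "to C" and "to N" labels in the figure.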

  7. Trace Fragment Dispatch: The Good • Location, Location, Location • “Inlining-like”: • Context sensitive • Partial • Spatial locality provides most of achieved speedup • Simple, low-cost “local” optimizations • Redundancy elimination • Nimbly adjusts to changing behavior • Efficient • Lots of early-exits? Discard fragment and re-trace [Figure: control-flow graphs of foo() and bar() over basic blocks A, B, C, D, E, M, N, O, P; trace fragment A B D M O P E with exits to C and to N]

  8. Trace Fragment Dispatch: The Bad • Lacks optimization power • Data flow analysis • Code motion & loop optimizations • Code expansion • Tail duplication • Exponential growth (if all paths maintained indefinitely) [Figure: control-flow graphs of foo() and bar() over basic blocks A, B, C, D, E, M, N, O, P; trace fragment A B D M O P E with exits to C and to N]

  9. Trace Fragment Dispatch: The Bad • Lacks optimization power • Data flow analysis • Code motion & loop optimizations • Code expansion • Tail duplication • Exponential growth (if all paths maintained indefinitely) [Figure: as on slide 8, plus a second trace fragment C D M O P E with exits to A and to N]

  10. Trace Fragment Dispatch: The Bad • Lacks optimization power • Data flow analysis • Code motion & loop optimizations • Code expansion • Tail duplication • Exponential growth (if all paths maintained indefinitely) [Figure: as on slide 9, plus a third trace fragment N P E with exit to A]
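
The code expansion caused by tail duplication can be counted directly from the three fragments in the slide 10 figure. This is a rough sketch: `duplication_report` is a hypothetical helper, and counting whole blocks (rather than bytes of machine code) is a simplifying assumption.

```python
from collections import Counter

# The three fragments as labelled in the slide 10 figure.
FRAGMENTS = [
    ["A", "B", "D", "M", "O", "P", "E"],
    ["C", "D", "M", "O", "P", "E"],
    ["N", "P", "E"],
]


def duplication_report(fragments):
    """Count block copies across fragments.

    Returns (total_copies, unique_blocks, duplicated), where `duplicated`
    maps each block that appears more than once to its number of copies.
    """
    copies = Counter(block for fragment in fragments for block in fragment)
    total_copies = sum(copies.values())
    unique_blocks = len(copies)
    duplicated = {block: n for block, n in copies.items() if n > 1}
    return total_copies, unique_blocks, duplicated


total, unique, duplicated = duplication_report(FRAGMENTS)
print(total, unique, duplicated)
# 16 9 {'D': 2, 'M': 2, 'O': 2, 'P': 3, 'E': 3}
```

Nine distinct blocks become sixteen block copies once all three fragments are cached; the shared tails (P and E) are copied three times.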

  11. Supplement Method Dispatch with Trace Dispatch • Why? • Improve VM performance via better instruction layout • Easily-disposable fragments reflect current program behavior • How? • JIT compiler inserts instrumentation into method code arrays: • Monitor potential “hot trace headers” • Record control flow • VM runtime assembles & patches trace fragments: • Blocks “scavenged” from compiled code arrays • Conditionals adjusted for proper fallthroughs • Method code arrays patched to transfer control to fragments • New fragments linked to existing fragments
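
One way to read "conditionals adjusted for proper fallthroughs" is that a branch is inverted whenever its taken target is the next block on the trace, so the on-trace successor is reached by falling through. The sketch below assumes that reading; `CondBranch` and `adjust_for_trace` are hypothetical names, and a real VM would patch machine code rather than rewrite strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CondBranch:
    condition: str     # symbolic condition, e.g. "x < 0"
    taken: str         # block reached when the condition holds
    fallthrough: str   # block reached otherwise


def adjust_for_trace(branch, next_on_trace):
    """Return a branch whose fallthrough successor is the next trace block."""
    if branch.fallthrough == next_on_trace:
        return branch  # already laid out correctly
    if branch.taken == next_on_trace:
        # Invert the condition and swap the targets.
        return CondBranch(
            condition=f"not ({branch.condition})",
            taken=branch.fallthrough,
            fallthrough=branch.taken,
        )
    raise ValueError(f"{next_on_trace!r} is not a successor of this branch")


# Block A ends in a branch that jumps to B, but the trace continues A -> B.
original = CondBranch(condition="x < 0", taken="B", fallthrough="C")
print(adjust_for_trace(original, "B"))
```

After adjustment, the off-trace successor C becomes the taken target, which is where the fragment's exit stub would go.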

  12. Easy Fragment Management • Improved trace selection • JIT to identify trace starting locations • VM to determine trace stopping locations • “Friendly” encoding of instructions • Patch spots built-in • Avoid pesky PC-relative jumps (e.g., switch statements) • Knowledge of language implementation features: • Calling conventions • Stack layout • Virtual method dispatch tables

  13. Efficient Fragment Management • “Mixed-mode” scheme: • Execution in both method code arrays & trace fragments • Share the same register allocation • Control flows off-trace into method code arrays • Fewer trace fragments • Manageable code expansion • JVM control is already built into yield points • Disposable trace fragments • No need to redo expensive analysis as behavior changes

  14. Our Work: Trace Fragment Selection • Develop new trace selection methodology • Leverage JIT global analysis, VM runtime • Implement trace selection in JikesRVM and evaluate viability • Do recorded traces indicate room for code improvement? • Do the traces exhibit good characteristics? • Is instrumentation overhead reasonable?

  15. Improved Trace Selection: Starting Locations • Loop Header Locations • Identified by JIT loop analysis • More accurate than “target of backward branch” heuristic • “Early exit” blocks • Allows trace fragments to be “layered” • Method prologue • Catches recursive execution [Figure: control-flow graphs of foo() and bar() over basic blocks A, B, C, D, E, M, N, O, P; trace fragment A B D M O P E with exits to C and to N]

  16. Improved Trace Selection: Starting Locations • Loop Header Locations • Identified by JIT loop analysis • More accurate than “target of backward branch” heuristic • “Early exit” blocks • Allows trace fragments to be “layered” • Method prologue • Catches recursive execution [Figure: as on slide 15, plus a second trace fragment N P E with exit to A]

  17. Improved Trace Selection: Starting Locations • Loop Header Locations • Identified by JIT loop analysis • More accurate than “target of backward branch” heuristic • “Early exit” blocks • Allows trace fragments to be “layered” • Method prologue • Catches recursive execution [Figure: control-flow graph of foo() over basic blocks A, B, C, D; trace fragment A B D with exits to Epilogue and to C]

  18. Improved Trace Selection: Stopping Criteria • Cycle: returned to the loop header • Abutted: arrived at another loop header • Length Limited (unusual): 128 basic blocks encountered • Rejoined (unusual): returned to a basic block already in trace • Exited (unusual): exited the method without meeting above conditions (identifiable by stack height) [Figure: control-flow graphs of foo() and bar() over basic blocks A, B, C, D, E, M, N, O, P; trace fragments A B D M O P E with exits to C and to N, and N P E with exit to A]
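
The five stopping criteria can be sketched as a small classifier over a stream of recorded blocks. The `(block, stack_depth)` event format, the `record_trace` name, and the order in which the criteria are checked are all assumptions of this sketch; only the criteria themselves and the 128-block limit come from the slide.

```python
MAX_BLOCKS = 128  # length limit quoted on the slide


def record_trace(header, events, loop_headers, start_depth=0):
    """Follow (block, stack_depth) events until a stopping criterion fires.

    Returns (trace, reason). The precedence among criteria is assumed.
    """
    trace = [header]
    for block, depth in events:
        if depth < start_depth:
            return trace, "exited"        # left the method (stack height)
        if block == header:
            return trace, "cycle"         # back at the starting loop header
        if block in loop_headers:
            return trace, "abutted"       # reached a different loop header
        if block in trace:
            return trace, "rejoined"      # revisited a block already recorded
        trace.append(block)
        if len(trace) >= MAX_BLOCKS:
            return trace, "length-limited"
    return trace, "incomplete"            # event stream ended first


# The path from the figure: through foo(), into bar() (depth 1), and back.
events = [("B", 0), ("D", 0), ("M", 1), ("O", 1), ("P", 1), ("E", 0), ("A", 0)]
print(record_trace("A", events, loop_headers={"A"}))
# (['A', 'B', 'D', 'M', 'O', 'P', 'E'], 'cycle')
```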

  20. JIT-Inserted Instrumentation • (a) Assembly of original method code-block (loop header): A B C D • (b) Assembly of code-block to be used for tracing • Low-fidelity instrumentation (loop header counters): JUMP_BLOCK TRACE_HEAD_A A TRACE_HEAD_B B C D TRAMPOLINE_A TRAMPOLINE_B • High-fidelity instrumentation (paths through blocks): INSTRUM_A A’ INSTRUM_B B’ INSTRUM_C C’ INSTRUM_D D’ TRAMPOLINE_A’ TRAMPOLINE_B’ TRAMPOLINE_C’ TRAMPOLINE_D’

  23. JIT-Inserted Instrumentation • (a) Assembly of original method code-block (loop header): A B C D • (b) Assembly of code-block to be used for tracing • Low-fidelity instrumentation (loop header counters): JUMP_BLOCK A TRACE_HEAD_B B C D TRAMPOLINE_A TRAMPOLINE_B • High-fidelity instrumentation (paths through blocks): INSTRUM_A A’ INSTRUM_B B’ INSTRUM_C C’ INSTRUM_D D’ TRAMPOLINE_A’ TRAMPOLINE_B’ TRAMPOLINE_C’ TRAMPOLINE_D’
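
The two instrumentation levels can be modelled as a cheap counter per loop header that, once hot, hands over to full path recording. The slides do not state how or when the switch happens, so the threshold value, the `HotHeaderMonitor` class, and the switching rule below are all assumptions made for illustration.

```python
class HotHeaderMonitor:
    """Toy model of low-fidelity counting followed by high-fidelity recording."""

    def __init__(self, loop_headers, threshold=3):
        self.loop_headers = set(loop_headers)
        self.threshold = threshold   # assumed value, not from the slides
        self.counts = {}             # low-fidelity: executions per loop header
        self.path = None             # high-fidelity: blocks seen, once active

    def enter(self, block):
        """Call on entry to each basic block."""
        if self.path is not None:
            # High-fidelity mode: record every block on the path.
            self.path.append(block)
        elif block in self.loop_headers:
            # Low-fidelity mode: only loop headers pay for a counter update.
            self.counts[block] = self.counts.get(block, 0) + 1
            if self.counts[block] >= self.threshold:
                self.path = [block]  # header is hot: start recording


monitor = HotHeaderMonitor(loop_headers={"A"}, threshold=3)
for block in "ABDABDABD":
    monitor.enter(block)
print(monitor.counts, monitor.path)
# {'A': 3} ['A', 'B', 'D']
```

In this model, non-header blocks cost nothing until a header becomes hot, which is the intent behind keeping the default instrumentation low-fidelity.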

  24. Improvement Opportunity [Figure: control-flow graphs of foo() and bar() over basic blocks A, B, C, D, E, M, N, O, P; block layout A B D E C M N P O]

  25. Improvement Opportunity [Figure: as on slide 24, with the block layout placed in the virtual address space (1GB), from 5B0480C6 (low) to 9BFE8D1F (high)]

  26. Trace Layouts in Address Space (227_MTRT) [Figure: traces plotted across the virtual address space (1GB), from 5B0480C6 (low) to 9BFE8D1F (high)]

  27. Improvement Opportunity [Figure: as on slide 24, with transitions between blocks marked as gap transitions or fallthrough transitions]
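
The gap/fallthrough distinction can be made concrete with a short sketch. It assumes that the block sequence A B D E C M N P O in the figure is the order of blocks in memory, and that the executed path is A B D M O P E from the earlier slides; `classify_transitions` is a hypothetical helper.

```python
def classify_transitions(layout, path):
    """Label each transition on `path` as 'fallthrough' or 'gap'.

    A transition is a fallthrough when the destination block sits
    immediately after the source block in `layout`; otherwise it is a gap.
    """
    position = {block: index for index, block in enumerate(layout)}
    labelled = []
    for src, dst in zip(path, path[1:]):
        kind = "fallthrough" if position[dst] == position[src] + 1 else "gap"
        labelled.append((src, dst, kind))
    return labelled


layout = list("ABDECMNPO")   # assumed memory order, read off the figure
path = list("ABDMOPE")       # executed path from the earlier slides

for src, dst, kind in classify_transitions(layout, path):
    print(f"{src} -> {dst}: {kind}")
```

Under these assumptions only A -> B and B -> D are fallthroughs; the other four transitions are gaps, which a linear trace fragment would turn into fallthroughs.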

  28. Trace Continuity: DaCapo & SPECjvm98 Benchmarks • 1/3 of traces necessarily fragmented (inter-procedural) • Most intra-procedural traces non-contiguous

  29. Transitions between basic blocks • Appropriate fallthrough block 80% of the time • 15% misprediction rate for local control flow • 20% of all transitions could benefit from trace fragment dispatch

  30. Trace Characteristics • Cycle and abutted traces make up the majority • Few length-limited, rejoined traces • Surprisingly large number of exited traces • Sporadic loops

  31. Instrumentation Overhead (Startup) • One-iteration tests. (40x) • Mixed slowdown results: 7.4% (jython), -6.5% (_227_mtrt) • Average startup overhead: 1.7%

  32. Instrumentation Overhead (Steady State) • 40-iteration tests. (8x) • Average steady-state overhead: 1.7%

  33. Summary • Envision trace fragment dispatch as a feedback-directed optimization • Locality optimizations not addressed by JIT compiler • Adapt to changing behavior without recompilation • More accurate trace selection • Enabled by the co-location with the JIT and VM runtime • Evaluated opportunity and cost • 20% of basic block transitions do not use sequential fallthrough • 25% of taken branches/calls transfer control flow to locations outside the VM page • Minimal startup and maintenance overhead for trace selection

  34. Questions?

  35. Improved Trace Selection: Starting Locations • Loop Header Locations • Identified by JIT loop analysis • More accurate than “target of backward branch” heuristic • “Early exit” blocks • Allows trace fragments to be “layered” • Method prologue • Catches recursive execution [Figure: control-flow graph of foo() over basic blocks A, B, C, D; trace fragment B C with exit to D]

  36. Improved Trace Selection: Starting Locations • Loop Header Locations • Identified by JIT loop analysis • More accurate than “target of backward branch” heuristic • “Early exit” blocks • Allows trace fragments to be “layered” • Method prologue • Catches recursive execution [Figure: as on slide 35, plus a second trace fragment labelled D A with exit to A]

  37. Normalized Trace Layouts (227_MTRT) [Figure: normalized layout plot, axis label “Traces”]
