
Performance Optimizations in Dyninst

Learn about the performance optimizations in Dyninst, a complex instrumentation tool that relocates code, improves register usage, and optimizes code generation.


Presentation Transcript


  1. Performance Optimizations in Dyninst
     Andrew Bernat, Matthew Legendre

  2. Instrumentation is Complicated
     • User perspective:
       • “Insert some new code here, here, and here.”
     • Dyninst’s perspective:
       • Relocation – Move code to make space for instrumentation
       • Infrastructure – Save/restore machine state
       • Instrumentation – Generate user-provided code

  3. Sources of Overhead
     • Relocation
       • Extra jumps
       • Unnecessary emulation
       • Traps
     • Infrastructure
       • Extra register saves
       • Tramp guards
     • Instrumentation
       • Inefficient register usage
       • Poor code generation
     • Optimizations
       • Inlining instrumentation
       • Compiler optimizations of generated code
       • 665% -> 32%

  4. History
     • Enable fast (and frequent) insertion and removal of code
       • “Linked list” model
       • Insert/remove by patching branches
     • Model has evolved over time
       • Long-lived instrumentation (particularly with static rewriter)
       • Focus on speed of execution instead of speed of insertion

  5. Outlined Instrumentation
     [Figure: Original Code; Relocated Code (relocated blocks and relocated functions); Instrumentation/Infrastructure (basetramps and minitramps); the pieces are connected by branches.]

  6. Outlined System
     • Fast insertion and removal
       • Simple to update
       • Original serves as a “handle”
     • Reduced code relocation
       • Block or instruction
     • Hard to optimize
       • New code can be inserted without warning
     • Poor code locality

  7. Partial Inlining
     [Figure: Original Code; Relocated Code (relocated blocks and relocated functions); basetramps; minitramps; the pieces are connected by branches.]

  8. Full Inlining
     [Figure: Original Code; Relocated Code & Instrumentation (relocated blocks and relocated functions); the remaining branches, one of them marked “?”.]

  9. Branch Reduction
     • Inlining removed three levels of branching
       • Function to block to basetramp to minitramp
     • One level is left
       • Function original to relocated copy
     • Can we remove this branch as well?
       • Identify and rewrite calls to relocated functions
       • Regenerate whenever target is moved
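The idea of rewriting calls to relocated functions can be pictured with a small sketch. This is only an illustration, not Dyninst's implementation: the call-site list, the `relocated` map, and the function name `retarget_calls` are all hypothetical.

```python
# Hypothetical model: each call site is an (address, target) pair, and
# `relocated` maps an original function entry to its current relocated copy.

def retarget_calls(call_sites, relocated):
    """Point each call directly at the relocated copy, if one exists."""
    return [(site, relocated.get(target, target)) for site, target in call_sites]

call_sites = [(0x1000, 0x2000), (0x1010, 0x3000)]
relocated = {0x2000: 0x9000}   # assumed: function at 0x2000 now lives at 0x9000

patched = retarget_calls(call_sites, relocated)

# When the relocated copy moves again, regenerate from the original call sites.
relocated[0x2000] = 0xA000
repatched = retarget_calls(call_sites, relocated)
```

Regenerating from the original call sites, rather than patching the already-patched ones, keeps the sketch correct no matter how many times a target moves.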

  10. Optimizing BaseTramps and MiniTramps
      • DyninstAPI contains a built-in compiler
        • Converts ASTs to machine code
        • Used for BaseTramps and MiniTramps
        • Designed to be cross-platform (x86, x86_64, ppc32, ppc64, IA-64, Sparc)
      • Build new optimizations into compiler
        • Some optimizations from classic compilers
        • Some optimizations are instrumentation-specific

  11. Optimizing Code Generation
      Register Saves: saving too many registers
          pusha
          pushf
      Stack frame (Setup): stack frame unnecessary
          push %ebp
          mov %esp,%ebp
          sub $128,%esp
      Trampoline Guard (Check): tramp guards unnecessary
          mov 0x805a490,%eax
          mov (%eax),%ecx
          test %ecx,%ecx
          je done
          mov $0x0,(%ecx)
      Instrumentation: extraneous register usage; “virtual” registers unnecessary; inefficient instrumentation
          mov $1,%eax
          mov %eax,4(%ebp)
          mov 0x805a494,%ebx
          mov 4(%ebp),%eax
          add %eax,%ebx
          mov %ebx,0x805a494
      Trampoline Guard (Restore): recalculating old value
          mov 0x805a490,%eax
          mov $0x1,(%eax)
      Stack frame (Clean)
      done:
          leave
      Register Restores
          popf
          popa

  12. Register Saves
      • Calculate live registers at the instrumentation point
      • Calculate registers used by instrumentation
      • Save intersection
      • Use more efficient flag saves
      Before:
          pusha
          pushf
      After:
          push %eax
          lahf
          push %eax
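The "save intersection" step is plain set arithmetic: a register needs saving only if it is both live at the point and clobbered by the instrumentation. A minimal sketch, assuming the two register sets have already been computed (the function name and the example sets are made up for illustration):

```python
def registers_to_save(live_at_point, used_by_instrumentation):
    """Registers that are both live and overwritten must be preserved."""
    return sorted(set(live_at_point) & set(used_by_instrumentation))

live = {"eax", "ecx", "esi"}   # assumed result of a liveness analysis
used = {"eax", "ebx"}          # assumed registers touched by the instrumentation

to_save = registers_to_save(live, used)
saves = [f"push %{reg}" for reg in to_save]
```

Here `ebx` is used but dead, and `ecx`/`esi` are live but untouched, so only `eax` is saved instead of everything `pusha` would push.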

  13. Virtual Registers
      • “Virtual Registers” were stack slots on x86
        • Load from virtual register to eax
        • Operate on eax
        • Store from eax to virtual register
      • Now use real register allocation algorithm, with spilling
      Before:
          mov $1,%eax
          mov %eax,4(%ebp)
          mov 4(%ebp),%eax
      After:
          mov $1,%eax
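To make "register allocation with spilling" concrete, here is a deliberately naive sketch. It is not the algorithm Dyninst uses (the slides do not say which one that is): it ignores liveness, hands out free registers in order, and falls back to stack slots only when the pool runs dry. The slot layout and all names are assumptions.

```python
def allocate(values, free_registers):
    """Give each value a real register; spill to a stack slot when none is left."""
    assignment = {}
    pool = list(free_registers)
    next_slot = 0
    for value in values:
        if pool:
            assignment[value] = f"%{pool.pop(0)}"
        else:
            # Hypothetical spill layout: 4-byte slots at 4(%ebp), 8(%ebp), ...
            assignment[value] = f"{4 * (next_slot + 1)}(%ebp)"
            next_slot += 1
    return assignment

placement = allocate(["t0", "t1", "t2"], ["eax", "ebx"])
```

The contrast with the old scheme is that a stack slot is now the exception rather than the home of every temporary.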

  14. AST to Machine Code Compilation
      • Each AST node is converted to an instruction
      • Not optimal on CISC systems
      • Recognize sequences of ASTs, emit optimized code
      AST: = ( 0x805a494, + ( 0x805a494, 1 ) )
      One instruction per node:
          mov $1,%eax
          mov 0x805a494,%ebx
          add %eax,%ebx
          mov $0x805a494,%ecx
          mov %ebx,(%ecx)
      Optimized:
          incl 0x805a494
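Recognizing an AST shape and emitting a single CISC instruction can be sketched as a pattern check in front of the generic code generator. The node classes and `emit` below are hypothetical stand-ins, not Dyninst's AST types; the generic path handles only the one shape shown on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Mem:
    addr: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Assign:
    dest: Mem
    src: object

def emit(node):
    """Emit AT&T-syntax x86 for `mem = mem_or_other + const`."""
    add = node.src
    # Pattern: mem = mem + 1 collapses to one read-modify-write instruction.
    if add.left == node.dest and add.right == Const(1):
        return [f"incl {node.dest.addr:#x}"]
    # Generic path: one instruction per AST node.
    return [
        f"mov ${add.right.value},%eax",
        f"mov {add.left.addr:#x},%ebx",
        "add %eax,%ebx",
        f"mov ${node.dest.addr:#x},%ecx",
        "mov %ebx,(%ecx)",
    ]

counter = Mem(0x805A494)
fast = emit(Assign(counter, Add(counter, Const(1))))
slow = emit(Assign(counter, Add(counter, Const(2))))
```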

  15. Optional Infrastructure
      • Some tramp infrastructure is not always required, e.g.:
        • Stack frame only needed for register spilling
        • Tramp guard only needed for function calls
      • Save only necessary infrastructure
      Tramp Guard:
          mov 0x805a490,%eax
          mov (%eax),%ecx
          test %ecx,%ecx
          je done
          mov $0x0,(%ecx)
      Stack Frame:
          push %ebp
          mov %esp,%ebp
          sub $0x32,%esp
          ...
      FP Saves:
          mov %esp,%eax
          sub $512,%esp
          and $0xfffffff0,%esp
          fxsave (%esp)
          push %eax
      Stack Shift:
          lea 0x128(%rsp),%rsp
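The two rules stated on this slide amount to a small decision function. The sketch below encodes only those two rules; the conditions for FP saves and the stack shift are not given in the slides, so they are left out. The function and flag names are hypothetical.

```python
def required_infrastructure(spills_registers, calls_functions):
    """Pick only the tramp pieces this piece of instrumentation needs."""
    needed = []
    if calls_functions:
        needed.append("tramp guard")   # guard only matters when calling functions
    if spills_registers:
        needed.append("stack frame")   # frame only holds spilled registers
    return needed

counter_increment = required_infrastructure(spills_registers=False, calls_functions=False)
with_call = required_infrastructure(spills_registers=True, calls_functions=True)
```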

  16. Fixed Point Code Generation
      • Optimizations may be interlinked, e.g.:
        • Removing code may leave registers unused
        • Removing unused registers eliminates saves
        • Eliminating saves removes stack access
        • Removing stack accesses may eliminate stack shift
      • Typical code generation requires 2 passes
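Because one optimization can enable another, the passes are rerun until nothing changes. A minimal sketch of such a fixed-point loop follows, with a toy state and two toy passes that compress the chain above into "unused registers need no saves" and "no saves means no stack shift". The state layout and pass names are assumptions made for illustration.

```python
def optimize_to_fixed_point(state, passes):
    """Run every pass repeatedly until a full round changes nothing."""
    rounds = 0
    while True:
        rounds += 1
        before = dict(state)   # values are immutable, so a shallow copy suffices
        for run_pass in passes:
            run_pass(state)
        if state == before:
            return rounds

def drop_unused_saves(state):
    # Only registers the instrumentation still uses need to be saved.
    state["saved"] = state["saved"] & state["used"]

def drop_stack_shift(state):
    # With nothing saved on the stack, the stack shift is unnecessary.
    if not state["saved"]:
        state["stack_shift"] = False

state = {
    "used": frozenset(),                  # earlier passes removed all register use
    "saved": frozenset({"eax", "ebx"}),
    "stack_shift": True,
}
rounds = optimize_to_fixed_point(state, [drop_stack_shift, drop_unused_saves])
```

With the passes in this order, the first round removes the saves, the second removes the stack shift, and a third round confirms that nothing else changes.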

  17. Optimizing Code Generation
      Before:
          Register Saves
              pusha
              pushf
          Stack frame (Setup)
              push %ebp
              mov %esp,%ebp
              sub $128,%esp
          Trampoline Guard (Check)
              mov 0x805a490,%eax
              mov (%eax),%ecx
              test %ecx,%ecx
              je done
              mov $0x0,(%ecx)
          Instrumentation
              mov $1,%eax
              mov %eax,4(%ebp)
              mov 0x805a494,%ebx
              mov 4(%ebp),%eax
              add %eax,%ebx
              mov %ebx,0x805a494
          Trampoline Guard (Restore)
              mov 0x805a490,%eax
              mov $0x1,(%eax)
          Stack frame (Clean)
          done:
              leave
          Register Restores
              popf
              popa
      Optimized instrumentation:
          incl 0x805a494

  18. Results
      • Basic block instrumentation on ‘go’ from SPEC2000
      [Chart: instrumented run time (base: 12.25s) and instrumentation time.]

  19. Conclusions
      • Optimizations in DyninstAPI instrumentation
        • Inline instrumentation levels
        • Generate more efficient code
      • Significant performance gains
        • Instrumentation code runs faster
        • More time spent generating instrumentation

  20. Questions?
