Smarter Code Generation for Dyninst
Nick Rutar
Why do we need better code generation?
• Dyninst has evolved through its releases
  • Originally designed with Paradyn in mind
  • Frequent changes to instrumentation
• Current code generation requirements
  • Must not adversely affect the pre-existing program
  • Tuned to handle future changes to instrumentation
• Certain optimizations currently used in compilers can be applied to Dyninst
  • Dataflow analysis
  • Register allocation
  • Because Dyninst is a dynamic environment, certain modifications need to be made
Methods to Improve Code Generation
• Decrease Register Spills
  • No Function Call
    • Save only the registers used by the mini-tramp
  • Function Call
    • Do analysis to see which registers need saving
• Merge Base Tramp & Mini-Tramp
  • Need to create flag after first instrumentation
  • Only one mini-tramp created per site
• Dataflow Analysis for Dead Registers
  • Useful for arbitrary instrumentation points
Current Register Implementation
• Base tramp
  • Saves/restores all volatile (caller-save) registers
• Mini tramp
  • Uses volatile registers as needed
• Problems
  • Many small code snippets have minimal register usage
  • POWER platform
    • 11 volatile GPRs
    • 14 volatile FPRs
New Register Implementation
• Base Tramp Generation
  • Only registers explicitly used within the base tramp are saved
  • A series of placeholder no-ops is generated for the registers not saved
  • A jump is created at the last save/restore to the end of the no-op group
• Mini Tramp Generation
  • Keeps track of all volatile registers used
• After Mini Tramp Generation
  • No-ops within the base tramp are replaced with save(s)/restore(s)
  • Jump is updated
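The placeholder scheme above can be sketched as a toy model. This is only an illustration under stated assumptions: `BaseTramp`, `generateBaseTramp`, and `patchBaseTramp` are hypothetical names, instructions are plain strings, only the save half is modeled (restores would mirror it), and the layout "saves, jump, no-ops" with the jump sliding forward as slots are filled is a guess at what "Jump is updated" means, not Dyninst's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of the save half of a base tramp (hypothetical, not Dyninst code).
struct BaseTramp {
    std::vector<std::string> code;  // one string per instruction slot
    std::set<int> saved;            // registers that already have a save
    std::size_t jumpIndex = 0;      // slot holding the jump over unused no-ops
    std::size_t freeSlots = 0;      // no-op placeholders still available
};

// Emit saves only for registers the base tramp itself uses, then a jump
// followed by one no-op placeholder per volatile register left unsaved.
BaseTramp generateBaseTramp(const std::set<int>& usedByBase,
                            const std::set<int>& volatileRegs) {
    BaseTramp t;
    for (int r : usedByBase) {
        t.code.push_back("save r" + std::to_string(r));
        t.saved.insert(r);
    }
    for (int r : volatileRegs)
        if (!t.saved.count(r)) ++t.freeSlots;
    t.jumpIndex = t.code.size();
    t.code.push_back("jump +" + std::to_string(t.freeSlots));
    t.code.insert(t.code.end(), t.freeSlots, "nop");
    return t;
}

// After the mini tramp is generated: turn placeholders into saves for the
// registers it used, then rewrite the jump to skip the remaining no-ops.
void patchBaseTramp(BaseTramp& t, const std::set<int>& usedByMini) {
    for (int r : usedByMini) {
        if (t.saved.count(r)) continue;  // already saved by the base tramp
        assert(t.freeSlots > 0 && "no placeholder left for this register");
        t.code[t.jumpIndex++] = "save r" + std::to_string(r);
        t.saved.insert(r);
        --t.freeSlots;
    }
    t.code[t.jumpIndex] = "jump +" + std::to_string(t.freeSlots);
}
```

The tramp never changes size in this model, so patching can happen in place after the mini tramp's register usage is known.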
Example (from POWER)
• Old Base Tramp (saves)
    stu  r1,-540(r1)
  GPR saves:
    st   r12,312(r1)
    st   r11,308(r1)
    st   r10,304(r1)
    st   r9,300(r1)
    st   r8,296(r1)
    st   r7,292(r1)
    st   r6,288(r1)
    st   r5,284(r1)
    st   r4,280(r1)
    st   r3,276(r1)
    st   r0,264(r1)
  FPR saves:
    stfd f0,152(r1)
    stfd f1,160(r1)
    stfd f2,168(r1)
    stfd f3,176(r1)
    stfd f4,184(r1)
    stfd f5,192(r1)
    stfd f6,200(r1)
    stfd f7,208(r1)
    stfd f8,216(r1)
    stfd f9,224(r1)
    stfd f10,232(r1)
    stfd f11,240(r1)
    stfd f12,248(r1)
    stfd f13,256(r1)
• Mini Tramp
    liu  r12,8192
    l    r12,1416(r12)
    cal  r11,1(r12)
    liu  r10,8192
    st   r11,1416(r10)
    br
• Old Base Tramp (restores)
  GPR restores:
    l    r12,312(r1)
    l    r11,308(r1)
    l    r10,304(r1)
    l    r9,300(r1)
    l    r8,296(r1)
    l    r7,292(r1)
    l    r6,288(r1)
    l    r5,284(r1)
    l    r4,280(r1)
    l    r3,276(r1)
    l    r0,264(r1)
  FPR restores:
    lfd  f0,152(r1)
    lfd  f1,160(r1)
    lfd  f2,168(r1)
    lfd  f3,176(r1)
    lfd  f4,184(r1)
    lfd  f5,192(r1)
    lfd  f6,200(r1)
    lfd  f7,208(r1)
    lfd  f8,216(r1)
    lfd  f9,224(r1)
    lfd  f10,232(r1)
    lfd  f11,240(r1)
    lfd  f12,248(r1)
    lfd  f13,256(r1)
    cal  r1,540(r1)
Example (continued)
• New Base Tramp
    stu  r1,-540(r1)
    st   r12,312(r1)
    st   r11,308(r1)
    st   r10,304(r1)
    st   r6,288(r1)
    st   r5,284(r1)
    st   r0,264(r1)
    stfd f10,232(r1)
    b
    nop
    . . .
    nop
    brl
• Mini-Tramp
    liu  r12,8192
    l    r12,1416(r12)
    cal  r11,1(r12)
    liu  r10,8192
    st   r11,1416(r10)
    br
• Reduces Base Tramp by 34 instructions
  • Eliminates 18 saves and 18 restores
  • Adds two jumps
Experiments (POWER)
• Simple mutatee
    for (a = 0; a < 0xfffff; a++) {
        x = x + a;
        x += 5 * a;
        if (x > 6000)
            x = 2;
        else
            x *= 4;
    }
• Instrumentation
  • Increments a global variable by one
  • Mini-tramp is six instructions
  • Inserted at every node of the program's CFG
  • Four base tramps for every iteration of the loop
Results (POWER)
• Instructions Completed
  • Version 4.1.1 – 30,393,823
  • New Code Generation – 21,485,346
• FPU produced a result
  • Version 4.1.1 – 4,716,269
  • New Code Generation – 1,310,013
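To put the counter values above in relative terms, a small helper can compute the percentage drop between the two versions. The function name `percentReduction` is hypothetical and only serves this worked arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Percentage by which `after` is smaller than `before`.
double percentReduction(std::uint64_t before, std::uint64_t after) {
    assert(before > 0 && after <= before);
    return 100.0 * static_cast<double>(before - after) /
           static_cast<double>(before);
}
```

Applied to the figures on this slide, `percentReduction(30393823, 21485346)` comes to roughly 29% fewer instructions completed, and `percentReduction(4716269, 1310013)` to roughly 72% fewer FPU results.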
Dealing with Function Calls
• Linear scan of the instructions of the function called from the mini-tramp
  • Record all registers modified within the function
  • Recursively scan callees when needed
  • At a certain cut-off point, assume all registers were clobbered
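The scan described above might look like the following sketch. Everything here is an assumption made for illustration: the `Instruction` and `Program` types, the function names, a depth limit standing in for the "cut-off point", and the choice to treat unknown callees as clobbering everything. Only volatile registers are reported, on the assumption that callees preserve non-volatile ones per the ABI.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of a decoded instruction.
struct Instruction {
    std::vector<int> writes;  // registers this instruction modifies
    std::string callee;       // non-empty if the instruction is a call
};

using Program = std::map<std::string, std::vector<Instruction>>;

// Linear scan of `fn`, recursing into callees. Returns false when the
// cut-off is reached or a callee is unknown, meaning "assume everything".
bool collectClobbered(const Program& prog, const std::string& fn,
                      int depthLeft, std::set<int>& out) {
    auto it = prog.find(fn);
    if (depthLeft < 0 || it == prog.end()) return false;
    for (const Instruction& insn : it->second) {
        out.insert(insn.writes.begin(), insn.writes.end());
        if (!insn.callee.empty() &&
            !collectClobbered(prog, insn.callee, depthLeft - 1, out))
            return false;
    }
    return true;
}

// Registers the tramp must save around a call to `fn`.
std::set<int> registersToSave(const Program& prog, const std::string& fn,
                              int maxDepth,
                              const std::set<int>& volatileRegs) {
    std::set<int> clobbered;
    if (!collectClobbered(prog, fn, maxDepth, clobbered))
        return volatileRegs;  // cut-off: assume all volatile registers clobbered
    std::set<int> result;
    for (int r : clobbered)
        if (volatileRegs.count(r)) result.insert(r);
    return result;
}
```

A self-recursive callee simply runs into the depth limit, so the scan always terminates and falls back to the conservative answer.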
Merging Base & Mini Tramp
• Original design decisions for Dyninst were made to suit Paradyn’s instrumentation usage pattern
  • Large amount of instrumentation, changed frequently
• Can generate better code for various reasons
  • Eliminates no-ops for registers in base tramp
  • Eliminates link register modifications and branches
  • Makes assembly more streamlined
    • And readable … if you’re into that kind of thing
• One instrumentation point installed … that’s it
  • Functionality somewhat limited
  • Tradeoff: speed at the cost of ease of further instrumentation
  • Delete then reinsert (Replace)
How will it work?
• Create flag for BPatch class in API
  • Once the flag is set, merging is enabled
  • When the flag is reset, the system reverts to the old style
  • void setMergeTramp(bool x)
  • Similar to the recursion flag currently in Dyninst
• No effect on current Dyninst use
  • Default flag setting is no merging
  • Most users will probably leave it at one setting based on instrumentation needs
Mini-tramp operation comparison
• No Merging
  • Insert
    • Same as before, unlimited per instrumentation site
  • Delete
    • Deletes instance of mini-tramp
• Merging
  • Insert
    • Only one mini-tramp allowed to be inserted; instrumentation point is locked after the first mini-tramp is generated
  • Delete
    • Deletes instance of mini-tramp and base tramp
  • Replace
    • Delete, then Insert new
    • Possible to save AST information at the old mini-tramp to be used for new instrumentation
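The insert/delete/replace rules in this comparison can be modeled with a small stand-in class. `InstPoint` and its members are hypothetical and are not Dyninst's classes; the constructor argument plays the role of the value passed to `setMergeTramp`, and snippets are plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one instrumentation point (not Dyninst's API).
class InstPoint {
public:
    explicit InstPoint(bool mergeTramp) : merge_(mergeTramp) {}

    // Merging: only the first insert succeeds, then the point is locked.
    // No merging: unlimited inserts.
    bool insert(const std::string& snippet) {
        if (merge_ && !miniTramps_.empty()) return false;
        miniTramps_.push_back(snippet);
        baseTrampPresent_ = true;
        return true;
    }

    // Deletes one mini-tramp; with merging, the base tramp goes with it.
    bool remove(std::size_t index) {
        if (index >= miniTramps_.size()) return false;
        miniTramps_.erase(miniTramps_.begin() +
                          static_cast<std::ptrdiff_t>(index));
        if (merge_) baseTrampPresent_ = false;
        return true;
    }

    // Replace is delete followed by insert of the new snippet.
    bool replace(const std::string& snippet) {
        return remove(0) && insert(snippet);
    }

    std::size_t count() const { return miniTramps_.size(); }
    bool hasBaseTramp() const { return baseTrampPresent_; }

private:
    bool merge_;
    bool baseTrampPresent_ = false;
    std::vector<std::string> miniTramps_;
};
```

With merging on, a second `insert` is refused, so changing instrumentation at a point has to go through `replace`.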
Dataflow analysis for Dead Registers
• Register use after instrumentation
  • Overwritten before accessed
    • We are free to use them in the tramp without having to spill them
  • Not overwritten
    • Spill to stay on the cautious side
• Do analysis before tramp generation
  • Dead registers have highest priority
  • Currently the same registers are used regardless
Dataflow Analysis Example
• Analyze code before and after an arbitrary instrumentation point
• Dead registers are given priority for register use in tramps
• Uninstrumented Program
    . . .
    cal  r11,1(r11)
    cal  r10,3(r12)
    st   r11,1416(r10)
    l    r4,280(r1)
    cal  r3,2(r11)
    st   r10,304(r1)
    **Potential Inst Point**
    cal  r3,2(r11)
    cal  r4,5(r10)
    l    r10,304(r1)
    l    r11,308(r1)
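For the straight-line code after the potential instrumentation point, the "overwritten before accessed" test can be sketched as below. This is a simplification under stated assumptions: `Insn` and `deadRegisters` are hypothetical names, the read/write sets are filled in by hand, and there is no control flow, whereas a real analysis would have to follow the CFG. Registers that are never touched are treated as live, to stay on the cautious side.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical summary of one instruction's register effects.
struct Insn {
    std::set<int> reads;
    std::set<int> writes;
};

// A candidate register is dead at the instrumentation point if the first
// instruction after the point that touches it writes it without reading it.
std::set<int> deadRegisters(const std::vector<Insn>& after,
                            const std::set<int>& candidates) {
    std::set<int> dead;
    std::set<int> decided;  // registers whose first access was already seen
    for (const Insn& insn : after) {
        // Reads are handled first, so "cal r3,2(r3)" keeps r3 live.
        for (int r : insn.reads) decided.insert(r);
        for (int r : insn.writes)
            if (candidates.count(r) && decided.insert(r).second)
                dead.insert(r);
    }
    return dead;
}
```

Encoding the four instructions after the marked point as reads {r11} writes {r3}, reads {r10} writes {r4}, reads {r1} writes {r10}, and reads {r1} writes {r11}, the function reports r3 and r4 as dead, while r10 and r11 stay live because they are read before being reloaded.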
Other Speed-ups for Dyninst
• Partial Parsing of functions
  • Grab symbol table and create function objects
  • Delay analysis until a function is actually accessed
  • User can’t see non-symbol-table functions
    • Therefore, we don’t have to worry about them
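The deferred-analysis idea can be illustrated with a minimal sketch. `LazyImage` and `FunctionObject` are hypothetical names, and the "analysis" is reduced to setting a flag and bumping a counter; the point is only that the cost is paid on first access rather than when the symbol table is read.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical function object: cheap to create, expensive to analyze.
struct FunctionObject {
    unsigned long address = 0;
    bool analyzed = false;
};

class LazyImage {
public:
    // Cheap step: one unanalyzed object per symbol-table entry.
    void addSymbol(const std::string& name, unsigned long address) {
        functions_[name].address = address;
    }

    // Expensive step is deferred until the function is first looked up.
    // Functions without a symbol are not visible here at all.
    const FunctionObject* find(const std::string& name) {
        auto it = functions_.find(name);
        if (it == functions_.end()) return nullptr;
        if (!it->second.analyzed) analyze(it->second);
        return &it->second;
    }

    int analysesRun() const { return analysesRun_; }

private:
    void analyze(FunctionObject& f) {
        f.analyzed = true;  // placeholder for the real parsing work
        ++analysesRun_;
    }

    std::map<std::string, FunctionObject> functions_;
    int analysesRun_ = 0;
};
```

Functions that are never looked up are never analyzed, which is where the startup saving would come from.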
Status
• Completed and in New Release (POWER)
  • New Register Spilling for Basic Snippets
  • Registers Spilled for Function Calls from a Mini Tramp
  • Partial Parsing (All platforms)
• Currently Being Implemented (POWER)
  • Linear Code Scan for function calls
  • Base Tramp, Mini Tramp Merging
  • Data Flow Analysis for Dead Registers
  • Will eventually be on all platforms
Questions • ???