
Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation



Presentation Transcript


  1. Efficient x86 Instrumentation: Dynamic Rewriting and Function Relocation
     Itai Gurari, gurari@cs.wisc.edu
     Computer Science Department, University of Wisconsin
     1210 W. Dayton St., Madison, WI 53706-1685
     Paradyn/Condor Week, Madison, WI, March 12-14, 2001

  2. Introduction
     Dynamic instrumentation:
     • Inserts instrumentation into an application while it is executing
     • Is used by Paradyn to gather performance data
     • Is inserted by Paradyn at three types of points: function entry, exit, and call

  3. Paradyn Instrumentation Points
     Executable code: foo () { call <bar> }

  4. Paradyn Instrumentation Points
     Executable code: foo () { call <bar> }
     Points: Entry, Call, Exit

  5. Paradyn Instrumentation Points
     Executable code: foo () { call <bar> }
     Instrumentation: startTimer() at Entry, counter++ at Call, stopTimer() at Exit

  6. Goal
     Transfer from the function to the instrumentation code as quickly as possible.

  7. Control Transfer
     To switch execution from a function to its instrumentation code:
     • Overwrite instructions in the function with a control transfer instruction.
     • Copy the equivalent of the overwritten instructions to the code patch area.
     • On the x86, Paradyn uses a 5-byte jump by default to transfer control to the instrumentation code.
     • The range of a 5-byte jump is the whole address space.
     • If a 5-byte instruction won't fit, use a 1-byte trap (the int3 instruction).
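The 5-byte jump on this slide corresponds to the x86 near jump with a 32-bit relative displacement (opcode 0xE9). The following is a minimal sketch of how such a patch could be encoded; the function name and the addresses are hypothetical and are not taken from Paradyn.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
JMP_REL32_LEN = 5        # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(jump_addr: int, target_addr: int) -> bytes:
    """Return the 5 bytes of a jump located at jump_addr that lands on target_addr."""
    # The displacement is measured from the address of the NEXT instruction.
    disp = target_addr - (jump_addr + JMP_REL32_LEN)
    # Wrap to 32 bits so that backward jumps are stored in two's complement.
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<I", disp & 0xFFFFFFFF)


# Hypothetical addresses: an instrumentation point at 0x1000, patch code at 0x2000.
patch = encode_jmp_rel32(0x1000, 0x2000)
print(patch.hex())
```

Because the displacement is a full 32 bits, such a jump can reach any address in a 32-bit address space, which is why it can be the default choice whenever five bytes are available.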

  8. Inserting Control Transfer Instructions
     • Dynamically rewrite the function in place
     • Different techniques for different types of instrumentation points

  9. Jumps and Traps: Instrument Entry Point, Case 1
     Code: push, mov, sub

  10. Jumps and Traps: Instrument Entry Point, Case 1
      Code: push, mov, sub
      Enough room to replace the instructions with a jump: jmp <instrumentation>

  11. Jumps and Traps: Instrument Entry Point, Case 2
      Code: push, mov, jmp

  12. Jumps and Traps: Instrument Entry Point, Case 2
      Code: push, mov, jmp
      Inserting a jump instruction (jmp <instrumentation>) interferes with the target of the backward jump.

  13. Jumps and Traps: Instrument Entry Point, Case 2
      Code: push, mov, jmp
      Must use a trap instruction to get to the instrumentation: int3, mov, jmp
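The two entry-point cases can be sketched as a small decision function. This is an illustration under stated assumptions, not Paradyn's code: `choose_transfer`, the instruction lengths, and the addresses are all hypothetical, and a real implementation would obtain lengths and branch targets from a disassembler.

```python
JMP_LEN = 5  # size in bytes of the 5-byte jump


def choose_transfer(point_addr, instr_lengths, branch_targets):
    """Pick "jump" or "trap" for an instrumentation point.

    instr_lengths: byte lengths of the instructions starting at point_addr.
    branch_targets: addresses that other instructions in the function jump to.
    """
    covered = 0
    for length in instr_lengths:
        if covered >= JMP_LEN:
            break
        covered += length  # whole instructions must be overwritten
    if covered < JMP_LEN:
        return "trap"  # not enough bytes to hold the jump
    # A branch target strictly inside the overwritten bytes would land mid-jump.
    overwritten = range(point_addr + 1, point_addr + covered)
    if any(target in overwritten for target in branch_targets):
        return "trap"
    return "jump"


# Case 1 (assumed lengths): push=1, mov=2, sub=3, no branches into the entry.
case1 = choose_transfer(0x1000, [1, 2, 3], branch_targets=[])
# Case 2 (assumed lengths): push=1, mov=2, jmp=2, with a backward jump to the mov.
case2 = choose_transfer(0x1000, [1, 2, 2], branch_targets=[0x1001])
print(case1, case2)
```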

  14. Jumps and Traps: Instrument Call Point
      Code: call <Foo>

  15. Jumps and Traps: Instrument Call Point
      Code: call <Foo>
      Enough room to replace the instruction with a jump: jmp <instrumentation>

  16. Jumps and Traps: Instrument Exit Point, Case 1
      Code: mov, leave, ret

  17. Jumps and Traps: Instrument Exit Point, Case 1
      Code: mov, leave, ret
      Back up far enough to replace the instructions with a jump: jmp <instrumentation>

  18. Jumps and Traps: Instrument Exit Point, Case 2
      Code: call <Foo>, leave, ret

  19. Jumps and Traps: Instrument Exit Point, Case 2
      Code: call <Foo>, leave, ret
      A jump (jmp <instrumentation>) interferes with the preceding call.

  20. Jumps and Traps: Instrument Exit Point, Case 2a
      Code: call <Foo>, leave, ret
      The next function begins on a 4-byte boundary.

  21. Jumps and Traps: Instrument Exit Point, Case 2a
      Code: call <Foo>, leave, ret, ? ? ?
      The compiler pads with "bonus bytes" (? ? ?) up to the beginning of the next function (4-byte boundary).

  22. Jumps and Traps: Instrument Exit Point, Case 2a
      Code: call <Foo>, leave, ret, ? ? ?
      The compiler pads with "bonus bytes" (? ? ?) up to the beginning of the next function (4-byte boundary).
      Replace the instructions with a jump: call <Foo>, jmp <instrumentation>

  23. Jumps and Traps: Instrument Exit Point, Case 2b
      Code: call <Foo>, leave, ret, ?
      Not enough "bonus bytes" (if any) to overwrite with a jump.

  24. Jumps and Traps: Instrument Exit Point, Case 2b
      Code: call <Foo>, leave, ret, ?
      Not enough "bonus bytes" (if any) to overwrite with a jump.
      Overwrite the return with a trap: call <Foo>, leave, int3, ?
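The exit-point cases come down to counting bytes: those that can safely be overwritten before the return, the return itself, and any padding before the next function. A rough sketch of that arithmetic follows; `exit_transfer` and all addresses are hypothetical, and the one-byte `ret` assumes the plain near-return encoding.

```python
JMP_LEN = 5  # size in bytes of the 5-byte jump
RET_LEN = 1  # a plain near `ret` is a single byte


def exit_transfer(ret_addr, next_func_addr, safe_bytes_before_ret):
    """Decide how an exit point could be instrumented.

    safe_bytes_before_ret: bytes directly before `ret` that may be overwritten
    (for example a `leave`), stopping short of a preceding call.
    """
    bonus = next_func_addr - (ret_addr + RET_LEN)  # padding after the ret
    if safe_bytes_before_ret + RET_LEN >= JMP_LEN:
        return "jump"                       # Case 1: back up far enough
    if safe_bytes_before_ret + RET_LEN + bonus >= JMP_LEN:
        return "jump using bonus bytes"     # Case 2a: padding makes room
    return "trap"                           # Case 2b: overwrite ret with int3


# Case 1: assume 4 overwritable bytes (e.g. a short mov plus leave) before ret.
c1 = exit_transfer(0x1014, 0x1018, safe_bytes_before_ret=4)
# Case 2a: only `leave` (1 byte) is safe, but 3 padding bytes follow the ret.
c2a = exit_transfer(0x1014, 0x1018, safe_bytes_before_ret=1)
# Case 2b: only `leave` is safe and a single padding byte follows the ret.
c2b = exit_transfer(0x1016, 0x1018, safe_bytes_before_ret=1)
print(c1, c2a, c2b, sep=" | ")
```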

  25. Jumps and Traps: Extra Slot
      No jumps to the first ten bytes of the function.
      Code: push, mov, sub, mov

  26. Jumps and Traps: Extra Slot
      No jumps to the first ten bytes of the function.
      Code: push, mov, sub, mov
      Enough space to overwrite the entry with a jump: jmp <instrumentation>, mov

  27. Jumps and Traps: Extra Slot
      No jumps to the first ten bytes of the function.
      Code: push, mov, sub, mov
      Enough space to overwrite the entry with a jump.
      Make a 2-byte jump to the "extra slot" and overwrite the "extra slot" with a jump to the instrumentation: jmp <instrumentation>, jmp <instrumentation>
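One reading of the extra-slot slides is a two-hop transfer: a point with room for only two bytes gets a short jump (opcode 0xEB, 8-bit displacement) into the spare five bytes near the entry, and those bytes hold the long jump to the instrumentation. The sketch below encodes both hops; the helper names and every address are hypothetical.

```python
import struct


def encode_jmp_rel8(jump_addr, target_addr):
    """Encode a 2-byte short jump; fails if the target is out of reach."""
    disp = target_addr - (jump_addr + 2)  # relative to the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a 2-byte jump")
    return bytes([0xEB]) + struct.pack("<b", disp)


def encode_jmp_rel32(jump_addr, target_addr):
    """Encode a 5-byte near jump with a 32-bit displacement."""
    disp = target_addr - (jump_addr + 5)
    return bytes([0xE9]) + struct.pack("<I", disp & 0xFFFFFFFF)


def two_hop_patch(point_addr, slot_addr, instrumentation_addr):
    """Return {address: bytes}: short jump into the slot, long jump out of it."""
    return {
        point_addr: encode_jmp_rel8(point_addr, slot_addr),
        slot_addr: encode_jmp_rel32(slot_addr, instrumentation_addr),
    }


# Hypothetical layout: point at 0x1020, extra slot at 0x1005, patch at 0x8000.
patches = two_hop_patch(0x1020, 0x1005, 0x8000)
for addr, data in patches.items():
    print(hex(addr), data.hex())
```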

  28. Control Transfer
      Traps on x86:
      • Generate an exception that is caught by either the application (Solaris, Linux) or the Paradyn daemon (Windows NT).
      • The address of the trap instruction is used to calculate which instrumentation code to execute.
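The second bullet describes a lookup keyed by the trap's address. A toy model of that dispatch is shown below; the table contents and names are invented for illustration, and how the handler obtains the trap address is platform-specific and not modeled here.

```python
# Hypothetical table filled in when traps are inserted:
# address of the int3 instruction -> address of its instrumentation code.
trap_table = {
    0x08048400: 0x40001000,
    0x08048457: 0x40001040,
}


def instrumentation_for_trap(trap_addr):
    """Return the instrumentation address for a trap, or None if it is not ours."""
    return trap_table.get(trap_addr)


known = instrumentation_for_trap(0x08048457)
unknown = instrumentation_for_trap(0x08049000)
print(hex(known), unknown)
```

Every trap pays for exception delivery before this lookup even starts, which is the cost the following slides set out to avoid.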

  29. Problem
      Trap handling is slow:
      • On Solaris 2.6, jumps are over 1000 times faster than traps.
      • On Linux 2.2, jumps are over 200 times faster than traps.
      Traps limit instrumentation:
      • Can't insert as much instrumentation, or at as fine a granularity
      Trap handling logic is difficult:
      • Susceptible to bugs
      • Difficult to understand and maintain

  30. Solution
      Rewrite functions that do not have enough room for jumps into functions that do.
      • Rewrite the function on the fly, combining dynamic instrumentation and binary rewriting.

  31. Dynamic Rewriting

  32. Dynamic Rewriting
      Overwrite existing instructions

  33. Dynamic Rewriting
      Overwrite existing instructions
      Expand instrumentation points

  34. Dynamic Rewriting
      Overwrite existing instructions
      Expand instrumentation points
      Relocate function

  35. Function Rewriting and Relocation
      In Paradyn we rewrite a function:
      • only if the function contains an instrumentation point that would require a trap to instrument
      • the first time a request to instrument the function is made
      • even if the instrumentation to be inserted is not for a point that requires a trap (e.g. the exit needs a trap, the entry can use a jump, and the request is to instrument the entry)

  36. Function Rewriting and Relocation (continued)
      • All instrumentation points that cannot use a jump are expanded.

  37. Rewriting a Function
      Points: Entry, Call, Call, Exit
      Code: push, mov, call <Foo>, call <Bar>, ret

  38. Rewriting a Function
      Points: Entry, Call, Call, Exit
      Insert nop at entry.
      Code: push, nop, mov, call <Foo>, call <Bar>, ret

  39. Rewriting a Function
      Points: Entry, Call, Call, Exit
      Insert nop at entry.
      Code: jmp <instrumentation>, call <Foo>, call <Bar>, ret

  40. Rewriting a Function
      Points: Entry, Call, Call, Exit
      Insert nop at entry. Insert nops at exit.
      Code: jmp <instrumentation>, call <Foo>, nop, nop, nop, nop, call <Bar>, ret

  41. Rewriting a Function
      Points: Entry, Call, Call, Exit
      Insert nop at entry. Insert nops at exit.
      Code: jmp <instrumentation>, call <Foo>, call <Bar>, jmp <instrumentation>
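The expansion step on these slides pads each instrumentation point with nops until a 5-byte jump fits. A simplified model is sketched below. It treats the function as a list of regions rather than real machine code; `expand_points` and the byte lengths are assumptions chosen so that the entry needs one nop and the exit four, as on the slides.

```python
JMP_LEN = 5  # every instrumentation point must be able to hold a 5-byte jump


def expand_points(regions):
    """Pad instrumentation points to JMP_LEN bytes.

    regions: list of (label, length_in_bytes, is_instrumentation_point).
    Returns (rewritten regions, old-offset -> new-offset map, new total size).
    """
    rewritten, offset_map = [], {}
    old_off = new_off = 0
    for label, length, is_point in regions:
        offset_map[old_off] = new_off
        pad = max(0, JMP_LEN - length) if is_point else 0  # nops to insert
        rewritten.append((label, length + pad))
        old_off += length
        new_off += length + pad
    return rewritten, offset_map, new_off


# Assumed lengths for the slide's function: push+mov = 4 bytes, each call = 5, ret = 1.
regions = [
    ("entry: push, mov", 4, True),
    ("call Foo", 5, True),
    ("call Bar", 5, True),
    ("exit: ret", 1, True),
]
rewritten, offset_map, new_size = expand_points(regions)
print(rewritten, offset_map, new_size)
```

The offset map is what a later fix-up pass would need in order to correct branch displacements, since instructions after an expanded point no longer sit at their original offsets.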

  42. Rewriting a Function
      Original function
      Points: Entry, Call, Call, Exit
      Code: push, mov, call <Foo>, call <Bar>, ret

  43. Rewriting a Function
      Original function
      Points: Entry, Call, Call, Exit
      Overwrite the entry of the original function with a jump to the rewritten function.
      Code: jmp <rewritten function>, call <Foo>, call <Bar>, ret

  44. Update Jumps and Calls
      • PC-relative jump and call instructions:
        • those with destinations outside the function will have incorrect displacements
        • some jumps to locations inside the function will have incorrect displacements
      • 2-byte jumps:
        • have a range of 127 bytes forward and 128 bytes backward, measured from the end of the jump instruction
        • if the target address is no longer in range, replace the 2-byte instruction with a 5-byte instruction that has a longer reach
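Both fix-ups follow from the fact that a PC-relative displacement is measured from the end of the instruction. The sketch below recomputes a displacement after a move and widens a short jump when its signed 8-bit displacement no longer fits. All names and addresses are hypothetical, and the sketch ignores that widening a jump shifts later code, which a real rewriter would have to iterate on.

```python
def relocated_disp(old_addr, new_addr, instr_len, old_disp, target_shift=0):
    """New displacement for a PC-relative instruction moved from old_addr to new_addr.

    target_shift is how far the destination itself moved (0 if it lies
    outside the relocated function).
    """
    target = old_addr + instr_len + old_disp + target_shift
    return target - (new_addr + instr_len)


def reencode_short_jump(old_addr, new_addr, old_disp, target_shift):
    """Return (instruction length, displacement), widening to 5 bytes if needed."""
    target = old_addr + 2 + old_disp + target_shift
    disp8 = target - (new_addr + 2)
    if -128 <= disp8 <= 127:          # still fits in a signed byte
        return 2, disp8
    return 5, target - (new_addr + 5)  # switch to the 32-bit form


# A 5-byte call at 0x1000 to an outside function, relocated to 0x5000.
call_disp = relocated_disp(0x1000, 0x5000, 5, 0x100)
# A short jump moved by 0x4000 whose target moved by the same amount.
same = reencode_short_jump(0x1010, 0x5010, 120, target_shift=0x4000)
# The same jump when 10 bytes of nops were inserted before its target.
wider = reencode_short_jump(0x1010, 0x5010, 120, target_shift=0x400A)
print(call_disp, same, wider)
```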

  45. Status
      Dynamic rewriting and function relocation are operational in Paradyn release 3.2 for x86 (Solaris, Linux, Windows NT).

  46. Current Limitations
      We do not relocate a function if:
      • the application is executing within the function we want to instrument
      • the function has a jump table
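The two limitations can be expressed as a simple guard. This is only an illustrative predicate under assumptions: `can_relocate` is a hypothetical name, and the list of code addresses stands in for whatever the tool inspects (for example, thread program counters and return addresses on the stack).

```python
def can_relocate(func_start, func_size, active_addrs, has_jump_table):
    """Return True if neither stated limitation applies.

    active_addrs: code addresses where the application is currently executing
    or will return to (an assumption about what "executing within" covers).
    """
    func_end = func_start + func_size
    executing_inside = any(func_start <= addr < func_end for addr in active_addrs)
    return not executing_inside and not has_jump_table


idle = can_relocate(0x1000, 0x40, active_addrs=[0x2000], has_jump_table=False)
busy = can_relocate(0x1000, 0x40, active_addrs=[0x1010], has_jump_table=False)
table = can_relocate(0x1000, 0x40, active_addrs=[0x2000], has_jump_table=True)
print(idle, busy, table)
```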

  47. Jumps vs. Traps
      Average time to get to the instrumentation and back (time in microseconds):

                 Trap    Jump
      Solaris    37.6    0.03
      Linux       8.3    0.04
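Reading the figures as 37.6 µs per trap and 0.03 µs per jump on Solaris, and 8.3 µs per trap and 0.04 µs per jump on Linux, the ratios agree with the earlier claims of "over 1000 times" and "over 200 times". A quick check of that arithmetic:

```python
# Round-trip times in microseconds, as read from the slide.
times_us = {
    "Solaris": {"trap": 37.6, "jump": 0.03},
    "Linux": {"trap": 8.3, "jump": 0.04},
}

# How many times slower a trap is than a jump on each platform.
ratios = {name: t["trap"] / t["jump"] for name, t in times_us.items()}

for name, ratio in ratios.items():
    print(f"{name}: a trap costs roughly {ratio:.0f} times as much as a jump")
```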

  48. Jumps vs. Traps
      • Relocating functions that are performance bottlenecks leads to the greatest speedup.
      • More instrumentation can be inserted, since perturbation to the system is minimized.
      • In Paradyn, the ratio of speedup depends on the type of metric (e.g. CPU time, number of procedure calls).

  49. Some Results
      bubba (circuit layout)
      • instrumented 9 functions for CPU
      • all required a trap for the exit point
      • 5 relocated functions
      • called 400 thousand times
      • consumed 20% of CPU
      • 23 seconds to execute using relocation
      • 42 seconds to execute without relocation

  50. Some Results
      fspx (2-D heat transfer simulation)
      • 4 of 46 functions required traps
      • all for exit points
      • instrumented __atan for CPU
      • required a trap for the exit
      • called 107 million times
      • consumed 25% of CPU
      • 7.5 minutes to execute using relocation
      • 115 minutes to execute without relocation
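The reported run times translate into speedup factors as follows. This is plain arithmetic on the figures above (23 s vs. 42 s for bubba, 7.5 min vs. 115 min for fspx), with no additional measurements assumed.

```python
# (time with relocation, time without relocation), both in seconds.
runs = {
    "bubba": (23.0, 42.0),
    "fspx": (7.5 * 60, 115.0 * 60),
}

# Speedup = time without relocation / time with relocation.
speedups = {name: without / with_reloc for name, (with_reloc, without) in runs.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.1f}x faster with relocation")
```

The much larger factor for fspx is consistent with its instrumented function being called 107 million times, so the per-call cost of a trap dominates the run.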
