Efficient Debugging using Dynamic Instrumentation (EDDI)
Qin Zhao (Singapore-MIT Alliance), Rodric Rabbah (IBM TJ Watson Center), Saman Amarasinghe (CSAIL, MIT), Larry Rudolph (VMware), Weng-Fai Wong (National Univ of Singapore)
Debugging is hard
• Today’s applications are huge
• Many files and components
• Run on complex systems
Source: Wikipedia and M Squared Technologies
EDDI – Zhao et al.
Debugging today is very myopic
• Inspect relatively simple predicates at individual program points
Example using GDB
    (gdb) break dist_spu.c:19
    (gdb) run
    (gdb) print cb
    $1 = {a_addr = 25286272, b_addr = 25269248, res_addr = 25269888, padding = 0}
    (gdb) cond 1 (cb.padding != 0)
    (gdb) run
    $2 = {a_addr = 25282312, b_addr = 2483423, res_addr = 25269888, padding = 10}
The state of the art: printf()
• An exaggeration of course…
• … but how many of you use printf() for debugging?
A case of misplaced priorities?
• Program instructions are located in memory
• Instructions are read from memory
• Instructions manipulate memory
• But debugging practices are not optimized for watching memory
• Instruction breakpoints are quite fast
• Watching memory is quite slow
Breakpoint vs. Watchpoint
• Breakpoint: break when the instruction at a specific address executes
• Watchpoint: break when the data at a specific address mutates
Typical support for watchpoints
• Hardware support for a small number of watchpoints
• GDB uses one of four x86 debug breakpoint registers
• Software fallback for a large number of watchpoints
• Single step execution and check linked list of watchpoints
• More than 1000x slowdown observed
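To see why the software fallback scales poorly, here is a minimal Python model of single-stepping with a linear scan over the watchpoint list. It is a sketch, not GDB's actual code: the names `Watchpoint` and `run_single_stepped`, and the `(address, value)` trace format, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Watchpoint:
    addr: int    # first watched byte
    size: int    # number of watched bytes
    old: bytes   # value observed at the previous step


def run_single_stepped(memory, steps, watchpoints):
    """Apply one store per step, then re-check every watchpoint.

    Returns (hits, checks): the (step index, address) pairs where a watched
    value changed, and the total number of watchpoint comparisons performed.
    """
    hits, checks = [], 0
    for index, (addr, value) in enumerate(steps):
        memory[addr] = value                 # "execute" one store instruction
        for wp in watchpoints:               # linear scan after every step
            checks += 1
            current = bytes(memory[wp.addr:wp.addr + wp.size])
            if current != wp.old:
                hits.append((index, wp.addr))
                wp.old = current
    return hits, checks
```

In this model the number of comparisons is the number of executed steps times the number of watchpoints, regardless of whether any step touches watched data. That product is the cost the slide attributes to the software fallback.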
Main insight underlying EDDI
• Dynamic binary instrumentation can dramatically improve support for watchpoints
• Watch orders of magnitude more locations than is feasible today
• Better watchpoint support enables many new debugging features
Examples of new debugging capabilities facilitated by EDDI
• EDDI can provide all of these and other debugging features in a single unified framework
Efficient Watchpoints using EDDI
• Carefully crafted strategy featuring and combining
• Fast-access shadow memory
• Optimized watchpoint tracking data structure
• Full instrumentation
• Slow and detailed instrumentation of every memory access
• Partial instrumentation
• Focused heuristics for fast instrumentation
• Compiler optimizations
• Dynamic binary rewriting
Outline
• EDDI framework
• Fast-access shadow memory
• Full instrumentation
• Partial instrumentation
• Case studies
• Future work
EDDI Overview
• Accelerate and extend debugger functionality by dynamic co-optimization of debugger and application code
Diagram labels: User; Translate and dispatch command; Front-End; Command interpreter; Debugger (e.g., GDB); DBI; Signals, IPC; User Application
EDDI and Watchpoints
• Associate guarding predicates with watched memory locations
• Individual or aggregate addresses
• Instrument potentially all memory operations
• Check if operation modifies watched location
• Update location if guarding predicate allows it
• Otherwise interrupt execution
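The guarded-write behaviour described above can be sketched in Python as follows. This is an illustrative model only: `GuardedMemory`, `WatchpointInterrupt`, and the `predicate(old, new)` signature are hypothetical names, not part of EDDI.

```python
class WatchpointInterrupt(Exception):
    """Raised when a guarding predicate rejects a write."""


class GuardedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.guards = {}   # address -> predicate(old_value, new_value) -> bool

    def watch(self, start, length, predicate):
        """Guard a single address (length=1) or an aggregate range."""
        for addr in range(start, start + length):
            self.guards[addr] = predicate

    def store(self, addr, value):
        guard = self.guards.get(addr)          # is this location watched?
        if guard is not None and not guard(self.data[addr], value):
            # Predicate forbids the update: interrupt instead of writing.
            raise WatchpointInterrupt(f"write of {value} to {addr:#x} rejected")
        self.data[addr] = value                # unwatched, or predicate allows it
```

For example, `mem.watch(4, 4, lambda old, new: new == 0)` guards a four-byte field that must stay zero, similar in spirit to the `cb.padding != 0` condition in the GDB example.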
Shadow memory
• On-demand shadow page tracks watchpoints (set of watched locations)
• Shadow memory optimized for constant overhead
• Lookup table stores displacement between application and shadow pages
• Trade-off space for time
Diagram labels: Lookup Table; Application Pages; Shadow Pages
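A displacement-based lookup table can be modeled in a few lines of Python. This is a sketch under assumptions the slide does not state: 4 KiB pages, one shadow byte per application byte, a Python dict standing in for the lookup table, and a growable `bytearray` standing in for the shadow address space. `ShadowMemory` and its method names are hypothetical.

```python
PAGE_SHIFT = 12                  # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class ShadowMemory:
    def __init__(self):
        self.table = {}          # page index -> displacement to its shadow page
        self.arena = bytearray() # backing store; shadow pages appended on demand

    def _displacement(self, addr, create=False):
        page = addr >> PAGE_SHIFT
        if page not in self.table and create:
            shadow_base = len(self.arena)
            self.arena.extend(bytes(PAGE_SIZE))        # allocate a zeroed shadow page
            self.table[page] = shadow_base - (page << PAGE_SHIFT)
        return self.table.get(page)                    # None: page has no shadow

    def watch(self, addr, size=1):
        for a in range(addr, addr + size):
            self.arena[a + self._displacement(a, create=True)] = 1

    def is_watched(self, addr):
        disp = self._displacement(addr)
        # One table lookup plus one add: constant work per memory access.
        return disp is not None and self.arena[addr + disp] != 0
```

The shadow location of an address is the address plus the stored displacement, so the lookup cost does not depend on how many locations are watched. The price is one shadow page of space for every application page that holds a watched location.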
Instrumentation
• DBI instruments application code to monitor reads and writes from/to memory
1. Save context
2. Look up address in shadow memory
3. Handle watched address according to user commands
4. Restore context and resume execution
Example of full instrumentation
    01: mov %ecx -> [ECX_slot]                    ! Save register
    02: mov %eax -> [EAX_slot]
    03: seto [OF_slot + 3]                        ! Save oflag
    04: lahf                                      ! Save eflags
    05: mov %eax -> [AF_slot]
    06: mov [EAX_slot] -> %eax                    ! Restore eax
    07: lea [%eax, %ebx] -> %ecx                  ! Get address
                                                  ! Compute table index
    08: shr %ecx, $12 -> %ecx                     ! Shift right
    09: cmp table[%ecx, 4], $0                    ! Check entry
    10: je 16:
                                                  ! Check if tag is set to ‘watched’
    11: add %eax, table[%ecx, 4] -> %eax
    12: testb $0xAA, [%eax, %ebx]
    13: jz 15:
    14: trap                                      ! Trap
    15: sub %eax, table[%ecx, 4] -> %eax
    16: mov [AF_slot] -> %eax                     ! Restore all
                                                  ! Restore oflag by triggering overflow
                                                  ! if necessary
    17: add [OF_slot], $0x7f000000 -> [OF_slot]
    18: sahf                                      ! Restore eflags
    19: mov [EAX_slot] -> %eax
    20: mov [ECX_slot] -> %ecx
• Context Save: Lines 1-6
• Address Calculation: Line 7
• Tag Checks: Lines 8-15
• Context Restore: Lines 16-20
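Line 17 of the listing restores the overflow flag with an ordinary add, and the arithmetic behind that trick can be checked with a small Python model. The sketch assumes that `seto` on line 03 wrote 0 or 1 into the most significant byte of a 32-bit little-endian slot whose other three bytes are zero; the slide does not state this explicitly. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """32-bit add; returns (result, signed_overflow) as the x86 OF flag would."""
    result = (a + b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    overflow = sign_a == sign_b and sign_r != sign_a
    return result, overflow


def restore_oflag(saved_of):
    """Model of lines 03 and 17: recreate OF from the saved byte."""
    slot = (1 if saved_of else 0) << 24      # seto stored 0/1 at byte offset 3
    _, overflow = add32(slot, 0x7F000000)    # line 17: add $0x7f000000
    return overflow
```

With a saved flag of 1 the slot holds 0x01000000, and adding 0x7f000000 gives 0x80000000: two positive operands produce a negative result, which sets OF. With a saved flag of 0 the sum is 0x7f000000 and OF stays clear.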
Experimental Results
• SPEC 2000 (GCC 4.0 -O3)
• 2.66 GHz Intel Core 2 with 2GB RAM
• Linux FC4
Full instrumentation overhead: Slowdown compared to native (chart)
Lowering instrumentation overhead
• Classic optimizations
• Context switch reduction
• Group checks
• Local variables check elimination
• Watchpoint specific optimizations
• Merge checks
• Stack displacement
• Reduce overhead for stack variables via shadow stack
Optimized instrumentation: Slowdown compared to native (chart)
Performance overhead as a function of watchpoints (chart)
Partial instrumentation
• Key idea: two-stage instrumentation
• Coarse grained fast checks to entire pages
• Fine grained instrumentation within a page when necessary
1. Protect pages containing watched data locations
2. Catch SIGSEGV signals when access to protected page occurs
3. Instrument code for fine-grained watchpoint checks
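The three steps above can be simulated in Python to show which accesses pay for a check. This is a model only: real page protection and SIGSEGV delivery are replaced by set lookups, instructions are identified by arbitrary labels, and `PartialInstrumentation` with its fields is a hypothetical name. Pages are assumed to be 4 KiB.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages


class PartialInstrumentation:
    def __init__(self):
        self.watched = set()           # exact watched addresses
        self.protected_pages = set()   # pages holding at least one watched address
        self.instrumented = set()      # instructions rewritten with fine-grained checks
        self.faults = 0                # simulated SIGSEGV count
        self.hits = []                 # (instruction, address) watchpoint hits

    def watch(self, addr):
        self.watched.add(addr)
        self.protected_pages.add(addr >> PAGE_SHIFT)       # step 1: protect the page

    def access(self, insn, addr):
        if insn in self.instrumented:                      # step 3: fine-grained check
            if addr in self.watched:
                self.hits.append((insn, addr))
            return
        if (addr >> PAGE_SHIFT) in self.protected_pages:   # step 2: simulated SIGSEGV
            self.faults += 1
            self.instrumented.add(insn)                    # rewrite this instruction
            self.access(insn, addr)                        # re-execute, now checked
        # Otherwise the instruction runs uninstrumented, with no check at all.
```

In this model only instructions that have touched a protected page at least once carry checks afterwards, and each such instruction faults once.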
Partial instrumentation (PI): rewrite after SIGSEGV hit
    mov %ecx [ECX_SLOT]            ! steal ecx
    lea [%eax+0x10] %ecx           ! calculate address
    ...                            ! save eflags
    shr %ecx, 20 %ecx              ! right shift
    cmp table[%ecx], $0            ! check table entry
    je LABEL_ORIG
    ...                            ! check tag status
    ...                            ! restore eflags and ecx
    mov 0 [%eax + 0x030010]        ! redirected reference
    jmp LABEL_NEXT
  LABEL_ORIG:
    ...                            ! restore eflags and ecx
    mov 0 [%eax+0x10]              ! access original location
  LABEL_NEXT:
    ...                            ! continue execution
Performance evaluation
• Randomly select heap objects to watch
• Intercept malloc
• Randomly allocate each object from a protected page or a non-protected page
• Object sizes vary
Runtime overhead using partial instrumentation (chart)
The value of having many watchpoints: Case Study 1
• Watch for Return Address Access
• Some functions try to obtain the current pc
• A watchpoint is automatically
• Set on the return address of a function when it is called
• Cleared on return
• Ret, setjmp
The value of having many watchpoints: Case Study 2
• Dynamic Pointer Analysis
• Using 181.mcf
• Watch all 33,112 instances of the node data-type
• Identified 468 (static) instructions that accessed objects of this type 1.08 × 10^10 times during execution
The value of having many watchpoints: Case Study 3
• Read of Uninitialized Variable
• Again using 181.mcf
• Changed calloc() to malloc()
• Watch all malloc’ed memory
• When a location is initialized, its watchpoint is cleared
• The first uninitialized read occurs 0.001 secs from the start of execution
• EDDI reports the error in 0.037 secs
• Overall, the instrumented execution is 83% slower using PI and 250% slower using FI
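The detection scheme in this case study (watch on allocation, clear on first write, report on read) can be sketched in Python. The class `UninitReadDetector`, its bump allocator, and the starting address are hypothetical stand-ins for intercepting malloc() in a real process.

```python
class UninitReadDetector:
    def __init__(self):
        self.watched = set()      # allocated addresses that were never written
        self.next_addr = 0x1000   # hypothetical bump allocator start
        self.errors = []          # addresses read before initialization

    def malloc(self, size):
        base = self.next_addr
        self.next_addr += size
        self.watched.update(range(base, base + size))   # watch all new memory
        return base

    def write(self, addr):
        self.watched.discard(addr)   # location initialized: clear its watchpoint

    def read(self, addr):
        if addr in self.watched:     # still watched means never written
            self.errors.append(addr)
```

Usage: after `p = d.malloc(8)` and `d.write(p)`, a `d.read(p)` is silent, while `d.read(p + 1)` records an error because that byte was never written.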
The value of having many watchpoints: Case Study 4
• Software Security
• Using the 20 Wilander Buffer Overflow Benchmarks
• Watched the end of all buffers
• Successfully identified all violations
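One way to read "watched the end of all buffers" is a watchpoint on the byte immediately after each buffer. The Python sketch below models that reading; the one-byte guard placement is an assumption, and `GuardedHeap` and `BufferOverflowDetected` are hypothetical names.

```python
class BufferOverflowDetected(Exception):
    """Raised when a store lands on a watched guard byte."""


class GuardedHeap:
    def __init__(self, size):
        self.data = bytearray(size)
        self.guards = set()   # watched addresses just past each buffer
        self.top = 0          # next free address

    def alloc(self, size):
        base = self.top
        self.guards.add(base + size)    # watch the byte after the buffer
        self.top = base + size + 1      # reserve that guard byte
        return base

    def store(self, addr, value):
        if addr in self.guards:
            raise BufferOverflowDetected(hex(addr))
        self.data[addr] = value

    def copy_in(self, base, payload):
        """Unchecked copy, like strcpy: relies on the watchpoint to catch overruns."""
        for i, byte in enumerate(payload):
            self.store(base + i, byte)
```

Copying 8 bytes into an 8-byte buffer succeeds, while copying 9 bytes raises on the ninth store, before the adjacent allocation is touched.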
Summary
• Efficient debugging using dynamic instrumentation enables new opportunities that increase the feature set available for debugging
• Paper demonstrates using EDDI to significantly improve support for debugging using watchpoints
• Practical to watch millions of memory locations with 3x average slowdown
• A large number of watchpoints makes it possible to explore new debugging scenarios
• Holistic debugging methodology
Main thrust for future work
• EDDI for multicores and parallel programs
• Main idea: rather than watch execution and interleaving to catch data races and deadlocks…
• … watch memory, record accesses, and on a data race or deadlock, inspect records to determine the source of the bug