Source Level Debugging of Parallel Programs Roland Wismüller LRR-TUM, TU München Germany
Outline • Introduction: source level debuggers • Debuggers for parallel programs • Current / future work at LRR-TUM
What is a Debugger? • A tool to remove bugs? • No! • A tool to find bugs? • No! • A tool to examine program executions? • Yes!
Continue Execution • 4) cont must execute original instruction • at the breakpoint the code reads: call foo; trap; move r0,r1 (the trap replaced add #4,sp) • replace trap with original instruction: call foo; add #4,sp; move r0,r1 • execute a single step • insert trap again: call foo; trap; move r0,r1 • continue execution
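The single-step variant of cont can be sketched on a toy machine. This is a minimal Python sketch under stated assumptions, not DETOP's implementation: memory is a list of instruction strings, the string "trap" stands for the breakpoint instruction, the PC is a list index, and every instruction (including call foo) simply falls through to the next one. The toy stops before executing a trap, so no PC adjustment is needed. ToyProcess, Breakpoint and cont_with_single_step are hypothetical names.

```python
TRAP = "trap"  # stands for the breakpoint instruction

class ToyProcess:
    """Hypothetical toy target: straight-line code, PC is a list index."""

    def __init__(self, code):
        self.code = list(code)
        self.pc = 0
        self.trace = []  # instructions actually executed, in order

    def step(self):
        """Execute exactly one instruction (models hardware single-stepping)."""
        self.trace.append(self.code[self.pc])
        self.pc += 1

    def run(self):
        """Run until a trap is reached (return True) or the code ends (False)."""
        while self.pc < len(self.code):
            if self.code[self.pc] == TRAP:
                return True
            self.step()
        return False

class Breakpoint:
    def __init__(self, proc, addr):
        self.proc, self.addr = proc, addr
        self.saved = proc.code[addr]  # remember the original instruction
        proc.code[addr] = TRAP        # ...and overwrite it with a trap

def cont_with_single_step(bp):
    """Resume a process that is stopped at breakpoint bp."""
    p = bp.proc
    p.code[bp.addr] = bp.saved  # replace trap with original instruction
    p.step()                    # execute a single step
    p.code[bp.addr] = TRAP      # insert trap again
    return p.run()              # continue execution

p = ToyProcess(["call foo", "add #4,sp", "move r0,r1"])
bp = Breakpoint(p, 1)
print(p.run(), p.pc)              # True 1: stopped at the trap
print(cont_with_single_step(bp))  # False: ran to the end of the code
print(p.trace)                    # ['call foo', 'add #4,sp', 'move r0,r1']
```

After cont, the trap is back at address 1, so the breakpoint still fires the next time execution reaches it.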
Continue Execution • 4) cont must execute original instruction • Problem: • there may be no support for single stepping • Alternative: • replace the next instruction with a trap: call foo; add #4,sp; trap • continue execution (only add #4,sp runs before the new trap fires) • re-insert the original trap & restore the original next instruction: call foo; trap; move r0,r1 • continue execution • Still a problem: • original instruction may be a jump / call / ret, so the next instruction executed is not necessarily the one that follows it in memory • we have to emulate these instructions!
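Without single-step support, the same effect can be obtained with a temporary trap on the following instruction. The Python sketch below uses the same hypothetical toy model (instruction strings, "trap" as breakpoint, list-index PC); ToyProcess and cont_without_single_step are illustrative names. Note that it computes the next instruction as addr + 1, which is exactly the assumption that breaks for jump / call / ret.

```python
TRAP = "trap"  # stands for the breakpoint instruction

class ToyProcess:
    """Hypothetical toy target WITHOUT single-step support: it can only run."""

    def __init__(self, code):
        self.code = list(code)
        self.pc = 0
        self.trace = []  # instructions actually executed, in order

    def run(self):
        """Run until a trap is reached (return True) or the code ends (False)."""
        while self.pc < len(self.code):
            if self.code[self.pc] == TRAP:
                return True
            self.trace.append(self.code[self.pc])
            self.pc += 1
        return False

def cont_without_single_step(proc, addr, saved):
    """Resume from the breakpoint at addr; saved is the original instruction."""
    # Assumption: saved is not a jump / call / ret and addr is not the last
    # instruction, so the next instruction executed is the one at addr + 1.
    nxt = addr + 1
    saved_next = proc.code[nxt]
    proc.code[addr] = saved   # put the original instruction back
    proc.code[nxt] = TRAP     # replace next instruction with a temporary trap
    if not proc.run() or proc.pc != nxt:
        raise RuntimeError("did not reach the temporary trap")
    proc.code[addr] = TRAP        # re-insert the original breakpoint trap
    proc.code[nxt] = saved_next   # restore the next instruction
    return proc.run()             # continue execution

p = ToyProcess(["call foo", TRAP, "move r0,r1"])  # the trap replaced add #4,sp
print(p.run(), p.pc)                                # True 1
print(cont_without_single_step(p, 1, "add #4,sp"))  # False
print(p.trace)  # ['call foo', 'add #4,sp', 'move r0,r1']
```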
Continue Execution • 4) cont must execute original instruction • execute a single step (the trap is temporarily removed: call foo; add #4,sp; move r0,r1) • A different problem: • multithreading: • another thread may bypass our breakpoint while the trap is removed
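The bypass problem can be reproduced with a hand-fixed interleaving of two threads that share one code segment. This Python sketch is a hypothetical toy (Thread, tick and single_step_continue_with_race are illustrative names); in a real program the interleaving is chosen by the scheduler, so the missed breakpoint only shows up occasionally.

```python
TRAP = "trap"  # stands for the breakpoint instruction

class Thread:
    """Hypothetical toy thread: just a name, a PC and a stopped flag."""
    def __init__(self, name):
        self.name, self.pc, self.stopped = name, 0, False

def tick(code, t, hits):
    """Let thread t execute one instruction of the shared code."""
    if t.stopped or t.pc >= len(code):
        return
    if code[t.pc] == TRAP:       # breakpoint hit: the thread stops
        t.stopped = True
        hits.append((t.name, t.pc))
        return
    t.pc += 1

def single_step_continue_with_race():
    code = ["call foo", TRAP, "move r0,r1"]  # the trap replaced "add #4,sp"
    a, b, hits = Thread("A"), Thread("B"), []
    tick(code, a, hits)
    tick(code, a, hits)       # A runs into the breakpoint and stops
    code[1] = "add #4,sp"     # debugger removes the trap to single-step A...
    tick(code, b, hits)
    tick(code, b, hits)       # ...but meanwhile B executes address 1 unnoticed
    a.stopped = False
    tick(code, a, hits)       # A executes its single step
    code[1] = TRAP            # trap re-inserted: too late for B
    return hits, b.pc

hits, b_pc = single_step_continue_with_race()
print(hits)  # [('A', 1)]: only A was ever reported at the breakpoint
print(b_pc)  # 2: B has already passed the breakpoint address
```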
Continue Execution • 4) cont must execute original instruction • Solution: • don’t remove the trap (the code stays call foo; trap; move r0,r1) • execute original instruction (add #4,sp) somewhere else • Still a problem: • instruction may depend on the PC value • we have to emulate these instructions!
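Executing the original instruction "somewhere else" (a technique GDB calls displaced stepping) can be sketched as follows: copy the instruction into a scratch buffer, append a jump back behind the trap, and run from there. The Python toy below is hypothetical (ToyCPU, SCRATCH and cont_out_of_line are illustrative; "jmp N" is an absolute jump and "mov pc,r0" stands for any PC-dependent instruction). The second example shows why PC-dependent instructions still need emulation.

```python
TRAP = "trap"  # stands for the breakpoint instruction
SCRATCH = 100  # hypothetical address of a debugger-owned scratch buffer

class ToyCPU:
    """Hypothetical toy CPU; memory maps address -> instruction string."""

    def __init__(self, mem):
        self.mem, self.pc, self.regs, self.trace = dict(mem), 0, {}, []

    def run(self):
        """Run until a trap (return True) or an unmapped address (False)."""
        while self.pc in self.mem:
            insn = self.mem[self.pc]
            if insn == TRAP:
                return True
            self.trace.append(insn)
            if insn.startswith("jmp "):        # absolute jump
                self.pc = int(insn.split()[1])
                continue
            if insn == "mov pc,r0":            # a PC-dependent instruction
                self.regs["r0"] = self.pc
            self.pc += 1
        return False

def cont_out_of_line(cpu, addr, saved):
    """Continue from the trap at addr without ever removing it."""
    cpu.mem[SCRATCH] = saved                  # copy of the original instruction
    cpu.mem[SCRATCH + 1] = f"jmp {addr + 1}"  # then jump back behind the trap
    cpu.pc = SCRATCH
    return cpu.run()

cpu = ToyCPU({0: "call foo", 1: TRAP, 2: "move r0,r1"})  # trap replaced add #4,sp
print(cpu.run(), cpu.pc)                      # True 1
print(cont_out_of_line(cpu, 1, "add #4,sp"))  # False; the trap at 1 stays
print(cpu.trace)  # ['call foo', 'add #4,sp', 'jmp 2', 'move r0,r1']

# A PC-dependent instruction sees the scratch address, not its real one:
cpu2 = ToyCPU({0: "call foo", 1: TRAP, 2: "move r0,r1"})  # trap replaced mov pc,r0
cpu2.run()
cont_out_of_line(cpu2, 1, "mov pc,r0")
print(cpu2.regs["r0"])  # 100, although the instruction belongs at address 1
```

Because the trap is never removed, other threads reaching address 1 in the meantime still stop there.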
Optimization Effects • variable table entry: i | register i5 | short • print i reads i5 • and prints z!! (the optimized code has reused i5 for z)
Parallel Debugging • Additional properties of parallel programs • Requirements for parallel debuggers • Problems and solution techniques
Parallel Programs • Multiple processes and/or threads • created dynamically • many of them • Program distributed across several hosts • Additional state components: • communication subsystem
Multiple Processes / Threads • Naming processes / threads • system IDs • may not be unique, not persistent • not user friendly • debugger-generated IDs • usually: small integers • selection based on additional information • naming processes / threads that do not exist yet • DETOP: pattern matching
Thread Selection in DETOP • [figure: thread selection dialog with the fields debugger id, function, executable, system id, node list, and a selection pattern]
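Pattern-based selection over the fields shown in the dialog can be sketched as below. DETOP's actual pattern syntax is not given here, so shell-style globs via Python's fnmatch module are an assumption, as are the names ThreadInfo, matches and select. Because a pattern is evaluated against thread attributes rather than a fixed ID, it can be kept and applied to threads that are created later.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass
class ThreadInfo:
    """Hypothetical per-thread record with the fields from the dialog."""
    debugger_id: int
    function: str    # function the thread is currently executing
    executable: str
    system_id: int
    node: str

def matches(t, **patterns):
    """True if every given attribute of t matches its shell-style pattern."""
    return all(fnmatchcase(str(getattr(t, field)), pat)
               for field, pat in patterns.items())

def select(threads, **patterns):
    """Debugger ids of all threads matching the pattern."""
    return [t.debugger_id for t in threads if matches(t, **patterns)]

threads = [
    ThreadInfo(1, "main", "solver", 4711, "node1"),
    ThreadInfo(2, "worker", "solver", 4712, "node2"),
    ThreadInfo(3, "worker_io", "io_server", 4713, "node2"),
]
print(select(threads, executable="solver"))               # [1, 2]
print(select(threads, function="worker*", node="node2"))  # [2, 3]

# The same pattern also names a thread that did not exist when it was written:
later = ThreadInfo(4, "worker", "solver", 4714, "node3")
print(matches(later, executable="solver"))                # True
```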
Scalability • Input: use process / thread sets • commands are executed for each member • e.g. [1,2,3] print i or [2,7] break 123 • sometimes: named sets • problems: • command semantics may differ between the processes, e.g. due to different executables / call stacks • when should named sets be evaluated?
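The set notation on this slide can be handled by a small parser that expands lists and ranges and then runs the command once per member. This Python sketch assumes the bracket syntax seen in the examples ([1,2,3], [2,7], [1,3-5]); parse_set and run_on_set are hypothetical names, and execute stands for whatever actually performs the command on one process.

```python
def parse_set(text):
    """Parse a process set such as '[1,3-5]' into a sorted list of ids."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a process set: {text!r}")
    ids = set()
    for part in text[1:-1].split(","):
        part = part.strip()
        if "-" in part:                       # a range like 3-5
            lo, hi = (int(x) for x in part.split("-"))
            ids.update(range(lo, hi + 1))
        elif part:
            ids.add(int(part))
    return sorted(ids)

def run_on_set(line, execute):
    """Split '[set] command' and execute the command once for each member."""
    end = line.index("]") + 1
    ids = parse_set(line[:end])
    command = line[end:].strip()
    return {pid: execute(pid, command) for pid in ids}

print(parse_set("[1,3-5]"))  # [1, 3, 4, 5]
print(run_on_set("[2,7] break 123", lambda pid, cmd: f"{cmd} @ {pid}"))
# {2: 'break 123 @ 2', 7: 'break 123 @ 7'}
```

The per-member dispatch is also where the slide's semantic problem appears: break 123 may refer to different code in processes running different executables.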
Scalability • Output: aggregation • simple case: aggregate identical results, e.g. [1]: 12.3 [2]: 4.1 [3]: 12.3 [4]: 12.3 [5]: 12.3 becomes [1,3-5]: 12.3 [2]: 4.1 • complex case: aggregate partially identical results • impossible cases: asynchronous events
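The simple aggregation case (identical results) can be sketched by grouping process IDs by their output and printing each group in compact range notation. This Python sketch covers only that simple case; compress and aggregate are hypothetical names, and results are assumed to arrive as a mapping from process ID to output text.

```python
def compress(ids):
    """Format sorted ids as ranges, e.g. [1, 3, 4, 5] -> '1,3-5'."""
    parts, start, prev = [], None, None
    for i in ids:
        if start is not None and i == prev + 1:
            prev = i                  # extend the current run
            continue
        if start is not None:         # close the previous run
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = i
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)

def aggregate(results):
    """Merge identical outputs: {pid: text} -> list of '[ids]: text' lines."""
    groups = {}                      # text -> ids, in order of first pid
    for pid in sorted(results):
        groups.setdefault(results[pid], []).append(pid)
    return [f"[{compress(ids)}]: {text}" for text, ids in groups.items()]

results = {1: "12.3", 2: "4.1", 3: "12.3", 4: "12.3", 5: "12.3"}
for line in aggregate(results):
    print(line)
# [1,3-5]: 12.3
# [2]: 4.1
```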
Concurrency Issues • What happens if a thread stops? • stop all threads in all processes (breakpoint option) • stop all threads in the same process (breakpoint option) • stop only that thread • What happens if I continue a thread? • start all threads in all processes (separate command) • start all threads in the same process (use pattern) • start only that thread • When does the debugger accept input? • only when all processes are stopped • always
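Treating the stop behaviour as a per-breakpoint option can be sketched as a small policy function that decides which threads to halt when a breakpoint is hit. The Python sketch below is hypothetical (StopScope and threads_to_stop are illustrative names, and threads are identified by (process id, thread id) pairs); it does not describe DETOP's internals.

```python
from enum import Enum

class StopScope(Enum):
    """Hypothetical breakpoint option: what else to stop when the BP is hit."""
    ALL_PROCESSES = "all threads in all processes"
    SAME_PROCESS = "all threads in the same process"
    THREAD_ONLY = "only that thread"

def threads_to_stop(hit, all_threads, scope):
    """hit and all_threads use (process id, thread id) pairs."""
    if scope is StopScope.ALL_PROCESSES:
        return set(all_threads)
    if scope is StopScope.SAME_PROCESS:
        return {t for t in all_threads if t[0] == hit[0]}
    return {hit}

all_threads = {(1, 1), (1, 2), (2, 1)}
print(sorted(threads_to_stop((1, 2), all_threads, StopScope.SAME_PROCESS)))
# [(1, 1), (1, 2)]
```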
Additional State Components • e.g. message buffers, blocked processes • Usually no support from debuggers • support would add a dependency on the implementation of the programming library • Often other tools (visualizers) will show this information • use them together with the debugger (?) • hence: interoperable tools
Interoperable Tools • Multiple, loosely coupled tools are used on the same program • Concrete scenario: • a debugger that supports ‘time-warp’ • i.e. returning to previous program states without rerunning the program • speeds up the debugging cycle of long-running programs
‘Time-Warp’ Debugger • Tools that need to interoperate: • parallel debugger (DETOP) • checkpointing system for parallel programs (CoCheck, based on Condor) • deterministic execution controller (codex) • means to specify the state to return to (VISTOP: state based program flow visualizer)
Preconditions for Interoperability • Common monitoring infrastructure • OMIS / OCM • Mechanisms for informing tools about state modifications made by other tools • e.g. VISTOP must know when DETOP stops a process, as the event buffer must then be read • Mechanisms for direct tool interaction • e.g. VISTOP to CoCheck: ‘restart from checkpoint’
OMIS • Basis: • objects + services • event / action paradigm • scalability by using object sets • location transparency • Example: thread_creates_proc([t_1,t_2]): thread_stop([$proc, $new_proc]) thread_get_backtrace([$thread],0)
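The event / action paradigm in the example request can be sketched as a toy monitor: a request names an event, the object set it applies to, and a list of actions whose $-variables are bound from the event that occurred. This Python sketch is hypothetical and is NOT the OMIS/OCM interface; ToyMonitor, define, event_occurred and resolve are illustrative names, actions are only recorded rather than performed, and the meaning of $thread, $proc and $new_proc is inferred from their names.

```python
def resolve(arg, bindings):
    """Replace $-variables (possibly inside lists) by the event's values."""
    if isinstance(arg, list):
        return [resolve(a, bindings) for a in arg]
    if isinstance(arg, str) and arg.startswith("$"):
        return bindings[arg[1:]]
    return arg

class ToyMonitor:
    """Hypothetical event/action monitor; NOT the real OMIS/OCM interface."""

    def __init__(self):
        self.requests = []  # (event name, object set, actions)
        self.executed = []  # actions that were triggered, with resolved args

    def define(self, event, objects, actions):
        self.requests.append((event, set(objects), actions))

    def event_occurred(self, event, **bindings):
        """Called when `event` happens on thread bindings['thread']."""
        for name, objects, actions in self.requests:
            if name == event and bindings["thread"] in objects:
                for service, args in actions:
                    self.executed.append((service, resolve(args, bindings)))

m = ToyMonitor()
# thread_creates_proc([t_1,t_2]): thread_stop([$proc, $new_proc])
#                                 thread_get_backtrace([$thread],0)
m.define("thread_creates_proc", ["t_1", "t_2"],
         [("thread_stop", [["$proc", "$new_proc"]]),
          ("thread_get_backtrace", [["$thread"], 0])])
m.event_occurred("thread_creates_proc", thread="t_1", proc="p_1", new_proc="p_2")
m.event_occurred("thread_creates_proc", thread="t_3", proc="p_3", new_proc="p_4")
print(m.executed)
# [('thread_stop', [['p_1', 'p_2']]), ('thread_get_backtrace', [['t_1'], 0])]
```

The event from t_3 triggers nothing because t_3 is not in the request's object set.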
Interoperability Problems • A tool may violate preconditions of another tool • DETOP can stop a process • checkpointing is initiated by sending a signal • a stopped process won’t handle the signal! • we cannot hide the state change from the checkpointer • this case cannot be handled easily
The End • Debuggers are far from trivial • Parallel debuggers are even more complex • Lots of open (maybe unsolvable) research issues • Interoperability may ease implementation of enhanced functionality