This paper discusses the importance of debugging parallel programs and introduces Millipede, a multi-level interactive parallel debugger. Examples, implementation details, and the benefits of using sequential tools for debugging parallel code are presented.
Sequential Debugging of Parallel Message Passing Programs Using Millipede
Jan Bækgaard Pedersen, Alan Wagner
Department of Computer Science, The University of British Columbia, Vancouver, BC, Canada
Parallel Computation Lab, University of British Columbia
Overview
• Importance of debugging
• Sequential debugging
• Using sequential tools to debug parallel programs
• Millipede
• Examples
• Implementation
• Conclusion
How Important Is It To Debug?
• As much time is spent on debugging as on writing the code [Pancake]
• 35-90% of parallel programmers still use only print statements for debugging [Pancake]
• Some reasons tools are not widely used:
  • Lack of focus
  • Information overload
  • Having to learn new tools with a GUI
Sequential Debugging
[Diagram: a sequential program is examined with a sequential debugging tool. Labels: pointer errors, variable inspection, breakpoints, memory leaks, stack trace.]
Sequential Tools In Parallel Environments
[Diagram: parallel debugging is split into message debugging, debugging straight-line code, protocol debugging, and visualization. The straight-line code part is extracted.]
Use a sequential tool to debug the sequential code!
• Exploit existing sequential tools
  • Well known
  • Trusted
  • Larger selection
Solution: Debugging Parallel Programs
Millipede: Multi-Level Interactive Parallel Debugger
Use a tool that is tailored to the specific debugging task
• Sequential tool to debug sequential code
• Other tools to debug
  • Message passing errors
  • Deadlocks
  • Message content
  • Protocol errors
  • Overall performance
Millipede
• Communication Visualization Module: graphical view of the message passing / protocol
• Deadlock Detection & Correction Module: detects and analyzes deadlocks, and reports the cause and a fix
• Comm. Protocol Verification Module: online verification of the communication protocol while the program is running
• Message Debugging Module: inspect, control, and change the contents of messages
• Sequential Debugging Module: debugging of the sequential code of the parallel program
The Sequential Debugging Module
• How?
  • Recompile the program with the -DMILLIPEDE flag
  • Set the environment variable MILLIPEDE_RCM
    • Run the program the normal way
    • Messages are sent and recorded: Millipede collects message information and stores it in log files for each process
  • Set the environment variable MILLIPEDE_REM
    • Debug any of the processes using any sequential tool
    • Millipede intercepts all message passing calls and supplies the process with messages from the log file
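The mode switch described above can be pictured as a small lookup on the two environment variables. This is only a sketch: the names `mp_mode` and `mp_mode_from_env` are hypothetical, reading RCM and REM as "record" and "replay" is an assumption (the slides do not expand the abbreviations), and the rule that MILLIPEDE_REM wins when both are set is also an assumption.

```c
#include <stdlib.h>

/* Hypothetical names, not taken from Millipede itself. */
typedef enum { MP_NORMAL, MP_RECORD, MP_REPLAY } mp_mode;

/* Chooses a mode from the environment variables named in the slides.
 * A variable counts as set even when its value is empty, which is what
 * csh's "setenv NAME" produces.
 * Assumption: if both are set, MILLIPEDE_REM takes precedence. */
mp_mode mp_mode_from_env(void)
{
    if (getenv("MILLIPEDE_REM") != NULL)
        return MP_REPLAY;
    if (getenv("MILLIPEDE_RCM") != NULL)
        return MP_RECORD;
    return MP_NORMAL;
}
```

With neither variable set, the sketch falls back to `MP_NORMAL`, that is, an ordinary run without logging.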
Example 1
Example code:
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(&n, 1, 1);
    .
    e = n % nproc;
    .
• If nproc = 0: division by zero, and the program crashes / disappears
• With many processes running and one or more of them crashing, it is hard to work out why they crashed.
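Once the faulty line has been located, one possible fix is to validate the unpacked value before using it as a divisor. The helper below is a hypothetical illustration (`safe_remainder` is not part of PVM or Millipede) and assumes that a non-positive process count should be treated as an error.

```c
#include <stdio.h>

/* Hypothetical helper: computes n % nproc without trapping.
 * Returns 0 and stores the remainder in *out on success.
 * Returns -1 and leaves *out untouched when nproc is not positive. */
int safe_remainder(int n, int nproc, int *out)
{
    if (nproc <= 0) {
        fprintf(stderr, "invalid nproc=%d unpacked from message\n", nproc);
        return -1;
    }
    *out = n % nproc;
    return 0;
}
```

A caller would check the return value right after unpacking `nproc` and `n`, so that a bad message produces a diagnostic instead of a silent crash.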
Example 1 (continued)
Debugging using any sequential debugger:
(1) gcc -g -DMILLIPEDE -o pgm pgm.c -lpvm3
(2) setenv MILLIPEDE_RCM
(3) pgm
(4) unsetenv MILLIPEDE_RCM; setenv MILLIPEDE_REM
(5) gdb pgm
    Replay filename: MILLIPEDE_RPF-pgm-262152
    .
    (gdb) step
    45 e = n % nproc
    Program received signal SIGFPE, Arithmetic exception
    in ()
    (gdb)
Example 2
Example code:
    x = calloc(node,sizeof(double));
    y = calloc(node,sizeof(double));
    .
    for (i=1; i<=nodes; i++)
        x[i] = …..;
Memory error (write past the end of the array)
• Segmentation fault
• Wrong result
Some of the processes might crash; others might compute a result.
Example 2 (continued)
Using Purify:
• ABW: Array Bounds Write. This is occurring while in:
    main [wave_slave.c:57]
        for (i=1; i<=nodes;i++)
            x[i] = ……;
  Writing 8 bytes to 0xdc630 in the heap.
  Address 0xdc630 is 1 byte past the end of malloc’d block at 0xdc5a8 of 136 bytes.
  This block was allocated from:
    main [wave_slave.c:50]
        x = calloc(node,sizeof(double))
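The Purify report points at the 1-based loop: the last iteration writes one element past the allocated block. A minimal sketch of one way to repair it uses 0-based indexing over the same count that was allocated. The helper name `make_vector` and the constant fill value are hypothetical, since the original right-hand side is elided in the slides.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates `nodes` doubles and fills them.
 * The loop runs over indices 0 .. nodes-1, so every write stays
 * inside the block returned by calloc.
 * Returns NULL if the allocation fails. */
double *make_vector(size_t nodes, double value)
{
    double *x = calloc(nodes, sizeof(double));
    if (x == NULL)
        return NULL;
    for (size_t i = 0; i < nodes; i++)
        x[i] = value;
    return x;
}
```

If the surrounding algorithm really needs 1-based indices, the alternative is to allocate `nodes + 1` elements and leave slot 0 unused.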
Implementation
[Diagram: Application Program → Millipede → PVM, with Millipede attached to a log file. The application calls pvm_send(…) and pvm_receive(…).]
• RCM: write log, then call PVM via _PVM_send(…)
• REM: read log
All pvm_xxx calls are replaced by Millipede versions, which in turn call the real PVM functions (renamed _PVM_xxx).
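The interposition idea can be sketched without PVM at all. In the sketch below, `real_receive` is a stub standing in for a renamed real receive call, `logged_receive` is a hypothetical wrapper, and messages are reduced to a single `int` written in binary to a log stream. None of these names or the log format come from Millipede; they are assumptions made so the example runs on its own.

```c
#include <stdio.h>

/* Stub standing in for the real, renamed receive call.
 * Assumption: it simply hands out 100, 101, 102, ... */
static int real_receive(int *value)
{
    static int next = 100;
    *value = next++;
    return 0;
}

/* Hypothetical interposed receive.
 * replay == 0: call the real receive and append the message to the log.
 * replay != 0: read the next message from the log, no communication at all.
 * Returns 0 on success, -1 on failure or when the log is exhausted. */
int logged_receive(int replay, FILE *log, int *value)
{
    if (!replay) {
        if (real_receive(value) != 0)
            return -1;
        return fwrite(value, sizeof *value, 1, log) == 1 ? 0 : -1;
    }
    return fread(value, sizeof *value, 1, log) == 1 ? 0 : -1;
}
```

Because the replayed process never touches the network, it can be run alone under a sequential debugger and will still see the same incoming messages in the same order as in the recorded run.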
Conclusion
• Millipede allows extraction of any sequential process from a parallel system
• Millipede enables the programmer to use any sequential tool for debugging / performance tuning on the extracted process
• Millipede supports multi-level debugging:
  • Message debugging
  • Deadlock detection / correction
  • Protocol verification