Debugging and Validation Tools on Parallel Systems, 2012 | Bernd Mohr, Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC)
Debugging of Parallel Programs • TotalView (Rogue Wave) • DDT (Allinea) • Marmot • STAT
Avoiding Bugs • Careful design and coding is the best way to avoid bugs! • Almost impossible to recover from a bad initial design • without starting again from scratch • Clear source code structure & comments • More complex code requires more comprehensive comments • Straightforward code generally requires no additional comments • Good comments in source code help maintainability • Poor or invalid comments indicate danger • Regular testing is the best way to catch bugs early! • Assertions verify expected consistency • A verbose/logging mode can help follow execution trail • Unit tests validate distinct functionality or operations • Bugs hide in the code/cases that aren't tested!
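As a small illustration of the assertion and verbose-mode advice, here is a minimal sketch in C. The `verbose` flag, the `LOG` macro and the `mean` function are hypothetical names chosen for the example, not part of any tool discussed in these slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical verbosity switch; a real program might set it from a
   command-line option such as -v. */
static int verbose = 0;

/* Print a trace message only in verbose mode (to stderr, so it does not
   mix with the program's regular output). */
#define LOG(...) do { if (verbose) fprintf(stderr, __VA_ARGS__); } while (0)

/* Mean of n values. The assertions document and check the expected
   preconditions; they are compiled out when NDEBUG is defined. */
static double mean(const double *x, size_t n)
{
    assert(x != NULL);
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i];
        LOG("mean: i=%zu partial sum=%g\n", i, sum);
    }
    return sum / (double)n;
}
```

A unit test for this function then reduces to checks on known inputs, e.g. `assert(mean(v, 3) == 2.0)` for `v = {1.0, 2.0, 3.0}`; turning `verbose` on leaves the result unchanged but prints the execution trail.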
Serial Bugs • Also manifest in parallel applications • but often reproducible with a single process or thread • Compiler (or lint) warnings indicate uncertainties • often symptomatic of unsafe/unportable code • Compilers can automatically insert run-time checks • use of uninitialized variables • null-pointer dereferencing • indices out of bounds • floating-point exceptions • … check your compiler manual for details • Specific tools for memory/heap errors • including leaks, use after free, corruption • e.g., memcheck/valgrind, Insure++, Purify
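The sketch below shows two defensive C idioms that make the serial bug classes above easier for run-time checks to catch. `FREE_AND_NULL` and `checked_get` are hypothetical helpers written for this illustration; the `assert` stands in for the index checks a compiler can insert, and it is not a replacement for memcheck/valgrind or for sanitizers such as GCC/Clang `-fsanitize=address`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Free a heap block and reset the pointer, so an accidental later use
   becomes a null-pointer dereference instead of a silent use-after-free. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Read a[i] with an explicit bounds check; assert() plays the role of a
   compiler-inserted index check and is removed by -DNDEBUG. */
static double checked_get(const double *a, size_t n, size_t i)
{
    assert(a != NULL);
    assert(i < n);
    return a[i];
}
```

Running such code under memcheck (e.g. `valgrind --leak-check=full ./prog`) would still report any block that is never freed.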
Parallel Bugs • Multi-threading: race conditions, deadlocks, etc. • e.g., Intel Thread Analysis Tool, Oracle SS Thread Analyzer • additional run-time checks of OpenMP/POSIX lock usage • often limited scalability • often report false positives (which can be ignored/filtered) • sometimes have false negatives (errors missed due to timing) • Message-passing: incorrect/inconsistent arguments, datatype matching, resource/buffer usage, deadlocks, etc. • e.g., Intel MPI Checker, Marmot, Umpire, MUST • additional run-time checks (local & global) • often limited scalability • extra moderator processes can change execution behaviour • can sometimes miss potential deadlocks (due to timing) • identify unsafe/non-compliant MPI usage (portability bugs)
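To make "race condition" concrete, here is a minimal POSIX-threads sketch of the pattern a thread checker inspects. The names `worker`, `run_workers` and the constants are hypothetical. With the mutex in place the result is deterministic; deleting the lock/unlock pair turns `c->value++` into an unsynchronized read-modify-write, which tools such as the ones listed above (or GCC/Clang `-fsanitize=thread`) would report as a data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16
#define ITERATIONS 100000

/* Shared counter and the mutex that protects it. */
struct counter {
    long value;
    pthread_mutex_t lock;
};

/* Worker: increments the shared counter ITERATIONS times under the lock. */
static void *worker(void *arg)
{
    struct counter *c = arg;
    for (int i = 0; i < ITERATIONS; ++i) {
        pthread_mutex_lock(&c->lock);
        c->value++;                 /* protected read-modify-write */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Starts nthreads workers, waits for all of them, returns the final count. */
static long run_workers(int nthreads)
{
    struct counter c;
    pthread_t tid[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    c.value = 0;
    pthread_mutex_init(&c.lock, NULL);
    for (int t = 0; t < nthreads; ++t) {
        int rc = pthread_create(&tid[t], NULL, worker, &c);
        assert(rc == 0);
        (void)rc;                   /* silence unused warning under NDEBUG */
    }
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tid[t], NULL);
    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```

Calling `run_workers(1)` gives a serial, deterministic run, which is the easiest setting in which to reproduce a bug. If two mutexes were involved, always acquiring them in the same order would be the usual way to avoid the deadlocks mentioned above.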
Parallel Debuggers • Multiple instances of serial debuggers: e.g., dbx, gdb, idb • manually attach to processes of interest in separate windows • type examination/control commands in each window • Lightweight parallel debuggers: e.g., Guard, STAT • produce condensed aggregated reports of where MPI processes have failed and their state at that point • allow considerable scalability and low overhead • Full-featured parallel debuggers: e.g., DDT, TotalView • provide complete control of parallel executions • individual processes/threads or (dynamic) groups thereof • comprehensive examination of state at breakpoints • individually or collectively • can attach to specific processes • have recently demonstrated significant scalability
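When attaching a serial debugger such as gdb to individual processes, a common trick is to make each process pause early until the debugger releases it. The sketch below assumes a hypothetical opt-in environment variable `WAIT_FOR_DEBUGGER`; the function name is likewise illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Pause the calling process until a debugger attaches and releases it.
   WAIT_FOR_DEBUGGER is a hypothetical environment variable, so normal
   runs are unaffected. Returns 1 if it waited, 0 otherwise. */
static int wait_for_debugger(void)
{
    const char *flag = getenv("WAIT_FOR_DEBUGGER");
    if (flag == NULL || flag[0] != '1')
        return 0;

    /* volatile so the loop re-reads it; set it to 1 from the debugger. */
    volatile int released = 0;
    fprintf(stderr, "PID %ld waiting for debugger\n", (long)getpid());
    while (!released)
        sleep(1);
    return 1;
}
```

In an MPI program one would presumably call this right after `MPI_Init` and also print the rank. Then, for the process of interest: `gdb -p <PID>`, select the `wait_for_debugger` frame (e.g. with `up` or `frame`), run `set var released = 1`, and `continue`.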
Parallel Debugging • Complicated by multiple processes & threads • which need to be managed and monitored as they execute • and which may execute differently each time due to inherent non-determinism • making it difficult to reproduce consistently • To make debugging easier, try to reproduce the buggy execution with as few threads and processes as possible • serial executions are (more) deterministic • debugging takes time and consumes resources • debug runs will be slower than otherwise • single-stepping line-by-line can be very slow • deadlocks (and livelocks) will never terminate!
Making Debugging Easier • Debugging without symbols is hard! • Compile & link with “-g” to include symbolic information • good compilers won't disable optimization, though they may not be able to produce complete symbolic information • Debugging optimized code is even harder! • Optimized code may bear little resemblance to the source • instructions will be added/removed/substituted/rearranged • Use the lowest optimization level which reproduces the bug • Sometimes the compiler/optimizer itself is buggy! • Debugging compilers or MPI libraries is no fun at all! • Try reproducing the bug with a different compiler or MPI • including older/newer versions • Just because a bug isn't reproducible with another compiler doesn't guarantee that the bug is not in your source code!
TotalView Parallel Debugger • UNIX Symbolic Debugger for C, C++, f77, f90, PGI HPF, assembler programs • "Standard" debugger • Special, non-traditional features • Multi-process and multi-threaded • C++ support (templates, inheritance, inline functions) • F90 support (user types, pointers, modules) • 1D + 2D Array Data visualization • Support for parallel debugging (MPI: automatic attach, message queues, OpenMP, pthreads) • Scripting and batch debugging • Memory Debugging • Reverse Debugging with ReplayEngine • http://www.totalviewtech.com
TotalView: Startup 1. Select Toolbar "Parallel" 2. Select MPI for your system 3. Select desired number of tasks
TotalView: Main Window [screenshot labels: toolbar for common options, stack trace, local variables for selected stack frame, breakpoints, source code window]
TotalView: Non-standard Features • Call graph • Data visualization • Message queue graph
TotalView: Batch Debugging
• % mpicc -g hello-mpi.c -o hm
• In an interactive session or batch script:
• % tvscript -mpi <MPI> -np 4 -create_actionpoint 'main#10=>print myrank' ./hm
• % less hm-<date/time>.slog
******************************************
* TotalView Debugger Script Log File
* Date: 11-26-2009_17:11:33
* Target: ./hm
* Actionpoint/Action Directives:
*   10 => print myrank
******************************************
Running target hm
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Print
!
! Process: ./hm (Debugger Process ID: 1)
! Thread: Debugger ID: 1.1
! Rank: 0
! Time Stamp: 11-26-2009 17:11:35
! Triggered from event: actionpoint
! Results:
!   myrank = 0x00000000 (0)
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Same output for other 3 ranks ...
DDT Parallel Debugger • UNIX Graphical Debugger for C, C++, f77, f90 programs • Modern, easy-to-use debugger • Special, non-traditional features • Multi-process and multi-threaded • 1D + 2D Array Data visualization • Support for MPI parallel debugging (automatic attach, message queues) • Support for OpenMP (Version 2.x and later) • Job submission from within debugger • http://www.allinea.com
DDT: JUROPA Startup 1. Check that MPI is selected, otherwise click "Change" 2. Select desired number of tasks
DDT: JUROPA MPI Setup • Under "System", choose the MPI implementation • Under "Job Submission", enter the specifics of your batch system
DDT: Main Window [screenshot labels: process controls, process groups, variables, source code, expression evaluator, stack trace]
DDT: Non-standard Features • Message queue graph • Multi-Dimensional Array Viewer • Memory Usage
Marmot • MPI correctness and portability checker • http://www.hlrs.de/organization/av/amt/projects/marmot/ • Marmot reports • Errors: violations of the MPI standard • Warnings: unusual behavior or possible problems • Notes: harmless but remarkable behavior • Also: deadlock detection • Usage • Compile with marmotcc, marmotcxx, marmotf90 • Run your application with one additional process • View the report as a plain-text file, as HTML, or as a Cube report
STAT: Aggregating Stack Traces for Debugging • Existing debuggers don’t scale • Inherent limits in the approaches • Need for new, scalable methodologies • Need to pre-analyze and reduce data • Fast tools to gather state • Help select nodes to run conventional debuggers on • Scalable tool: STAT • Stack Trace Analysis Tool • Goal: Identify equivalence classes • Hierarchical and distributed aggregation of stack traces from all tasks • Stack trace merge <1s from 200K+ cores • (Project by LLNL, UW, UNM)
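The core idea of grouping tasks into equivalence classes can be illustrated by merging stack traces into a call-prefix tree whose nodes record which tasks pass through them. The sketch below is a serial toy version written for this explanation, not STAT's actual implementation; `frame_node`, `merge_trace` and the 64-task bitmask limit are assumptions of the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8   /* illustrative fixed fan-out */

/* Node of a call-prefix tree: one frame plus the set of tasks whose stack
   passes through it (bit r = task r, so at most 64 tasks in this sketch). */
struct frame_node {
    const char *func;
    uint64_t tasks;
    struct frame_node *child[MAX_CHILDREN];
    int nchild;
};

static struct frame_node *node_new(const char *func)
{
    struct frame_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->func = func;
    return n;
}

/* Merge one task's stack trace (outermost frame first) into the tree:
   shared prefixes reuse existing nodes, and every visited node records
   the task in its set. */
static void merge_trace(struct frame_node *root, int task,
                        const char *const *trace, int depth)
{
    assert(task >= 0 && task < 64);
    uint64_t bit = (uint64_t)1 << task;
    struct frame_node *cur = root;
    cur->tasks |= bit;
    for (int d = 0; d < depth; ++d) {
        struct frame_node *next = NULL;
        for (int c = 0; c < cur->nchild; ++c) {
            if (strcmp(cur->child[c]->func, trace[d]) == 0) {
                next = cur->child[c];
                break;
            }
        }
        if (next == NULL) {
            if (cur->nchild == MAX_CHILDREN)
                abort();            /* fan-out limit of this sketch */
            next = node_new(trace[d]);
            cur->child[cur->nchild++] = next;
        }
        next->tasks |= bit;
        cur = next;
    }
}

/* Number of tasks in a node's set, i.e. the size of its class. */
static int task_count(const struct frame_node *n)
{
    int count = 0;
    for (uint64_t t = n->tasks; t != 0; t &= t - 1)
        ++count;
    return count;
}

static void tree_free(struct frame_node *n)
{
    for (int c = 0; c < n->nchild; ++c)
        tree_free(n->child[c]);
    free(n);
}
```

Tasks that end at the same leaf had identical stack traces and form one equivalence class, so a debugger only needs to be attached to one representative per class. In a hierarchical, distributed scheme, intermediate nodes would presumably merge partial trees with the same prefix matching; this sketch simply merges all traces serially.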
3D-Trace Space/Time Analysis [figure: stack traces gathered from many application (Appl) processes]
Scalable Representation 288 Nodes / 10 Snapshots