Debugging and Validation Tools on Parallel Systems

Explore debugging best practices, bug-prevention strategies, and specific tools for detecting bugs in parallel programs on parallel systems, and learn how to manage parallel processes efficiently.

watsonr

Presentation Transcript


  1. Debugging and Validation Tools on Parallel Systems
     2012 | Bernd Mohr, Institute for Advanced Simulation (IAS), Jülich Supercomputing Centre (JSC)

  2. Debugging of Parallel Programs
     • TotalView (Rogue Wave)
     • DDT (Allinea)
     • Marmot
     • STAT

  3. Avoiding Bugs
     • Careful design and coding is the best way to avoid bugs!
        • It is almost impossible to recover from a bad initial design without starting again from scratch
        • Clear source code structure & comments
           • More complex code requires more comprehensive comments
           • Straightforward code generally requires no additional comments
           • Good comments in source code help maintainability
           • Poor or invalid comments indicate danger
     • Regular testing is the best way to catch bugs early!
        • Assertions verify expected consistency
        • A verbose/logging mode can help follow the execution trail
        • Unit tests validate distinct functionality or operations
        • Bugs hide in the code/cases that aren't tested!
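
The assertion, verbose-mode and unit-test practices above can be illustrated with a short C sketch. The routine `average()`, the `verbose` flag and `log_msg()` are hypothetical names chosen for illustration only. Note that compiling with `-DNDEBUG` removes all `assert()` checks, so keep them enabled in debug and test builds.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch for a verbose/logging mode; a real code might set it
   from a command-line option or an environment variable. */
static int verbose = 0;

/* Print a trace message to stderr, but only in verbose mode. */
static void log_msg(const char *fmt, ...)
{
    if (!verbose)
        return;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* Hypothetical routine: arithmetic mean of n values.
   The assertions document and check the expected preconditions. */
double average(const double *values, size_t n)
{
    assert(values != NULL);  /* caller must pass a valid array */
    assert(n > 0);           /* the mean of zero values is undefined */

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += values[i];

    log_msg("average: n=%zu sum=%g\n", n, sum);
    return sum / (double)n;
}

/* Minimal unit test for average(): returns 0 on success, 1 on failure. */
int test_average(void)
{
    const double data[] = {1.0, 2.0, 3.0, 4.0};
    return average(data, 4) == 2.5 ? 0 : 1;
}
```

Calling `test_average()` from a test driver exercises the consistency checks; setting `verbose = 1` before a run additionally prints the execution trail.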

  4. Serial Bugs
     • Also manifest in parallel applications
        • but are often reproducible with a single process or thread
     • Compiler (or lint) warnings indicate uncertainties
        • often symptomatic of unsafe/unportable code
     • Compilers can automatically insert run-time checks for
        • use of uninitialized variables
        • null-pointer dereferencing
        • indices out of bounds
        • floating-point exceptions
        • … check your compiler manual for details
     • Specific tools for memory/heap errors
        • including leaks, use after free, corruption
        • e.g., memcheck/valgrind, Insure++, Purify
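
As a minimal C illustration of the heap errors mentioned above, the sketch below (with hypothetical helpers `make_squares()` and `sum_ints()`) marks where an off-by-one loop bound or a missing `free()` would introduce a bug. With GCC or Clang, such errors can be caught by building with run-time instrumentation, e.g. `gcc -g -fsanitize=address,undefined`, or by running an uninstrumented binary under `valgrind --leak-check=full`. These are just one option; check your own compiler manual for its equivalent checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate an array holding the squares 0, 1, 4, ..., (n-1)^2.
   The caller owns the memory and must release it with free();
   forgetting to do so is a leak that memcheck reports at exit. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    /* Writing "i <= n" here would be a classic off-by-one heap overflow,
       reported by AddressSanitizer or memcheck as an invalid write. */
    for (size_t i = 0; i < n; ++i)
        a[i] = (int)(i * i);
    return a;
}

/* Sum the first n entries of a; reading a[n] would be out of bounds. */
long sum_ints(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```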

  5. Parallel Bugs
     • Multi-threading: race conditions, deadlocks, etc.
        • e.g., Intel Thread Analysis Tool, Oracle SS Thread Analyzer
        • additional run-time checks of OpenMP/POSIX lock usage
        • often limited scalability
        • often report false positives (which can be ignored/filtered)
        • sometimes have false negatives (errors missed due to timing)
     • Message-passing: incorrect/inconsistent arguments, datatype matching, resource/buffer usage, deadlocks, etc.
        • e.g., Intel MPI Checker, Marmot, Umpire, MUST
        • additional run-time checks (local & global)
        • often limited scalability
        • extra moderator processes can change execution behaviour
        • sometimes can miss potential deadlocks (due to timing)
        • identifies unsafe/non-compliant MPI usage (portability bugs)
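
A classic multi-threading race condition can be shown with POSIX threads. In this sketch (all function and constant names are illustrative), `increment_racy()` updates a shared counter without synchronization, while `increment_locked()` protects it with a mutex. ThreadSanitizer (`-fsanitize=thread` in GCC/Clang) is named here only as a freely available stand-in for the thread checkers listed on the slide; it would flag the racy variant as a data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITER    100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* RACY: counter++ is a non-atomic read-modify-write, so concurrent
   increments from different threads can be lost (a data race). */
static void *increment_racy(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; ++i)
        counter++;
    return NULL;
}

/* Correct: the mutex serializes every increment. */
static void *increment_locked(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; ++i) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run NTHREADS copies of one worker and return the final counter value.
   Error checking of the pthread calls is omitted for brevity. */
long run_counter(int use_lock)
{
    pthread_t tid[NTHREADS];
    void *(*worker)(void *) = use_lock ? increment_locked : increment_racy;

    counter = 0;
    for (int t = 0; t < NTHREADS; ++t)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], NULL);

    /* Only the locked variant guarantees the exact total. */
    assert(!use_lock || counter == (long)NTHREADS * NITER);
    return counter;
}
```

Because a race depends on timing, `run_counter(0)` may well return the correct total on some runs, which is exactly why such bugs are hard to reproduce and why checkers can have false negatives.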

  6. Parallel Debuggers
     • Multiple instances of serial debuggers: e.g., dbx, gdb, idb
        • manually attach to processes of interest in separate windows
        • type examination/control commands in each window
     • Lightweight parallel debuggers: e.g., Guard, STAT
        • produce condensed, aggregated reports of where MPI processes have failed and their state at that point
        • allow considerable scalability and low overhead
     • Full-featured parallel debuggers: e.g., DDT, TotalView
        • provide complete control of parallel executions
           • individual processes/threads or (dynamic) groups thereof
        • comprehensive examination of state at breakpoints
           • individually or collectively
        • can attach to specific processes
        • have recently demonstrated significant scalability
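
When attaching serial debuggers to individual processes, a widely used trick is to make the process of interest print its PID and spin until a debugger releases it. The sketch below assumes a hypothetical environment variable `WAIT_FOR_DEBUGGER` to switch the pause on; in an MPI program it would typically be called right after `MPI_Init()`. With gdb you would attach with `gdb -p <PID>`, select the frame of `wait_for_debugger()` (e.g. with `up`), run `set var holding = 0`, and `continue`.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical opt-in switch: pause only when this variable is set. */
#define WAIT_ENV "WAIT_FOR_DEBUGGER"

/* Print PID and host, then spin until a debugger clears `holding`.
   Returns 1 if it waited, 0 if no pause was requested. */
int wait_for_debugger(void)
{
    if (getenv(WAIT_ENV) == NULL)
        return 0;

    volatile int holding = 1;  /* volatile: re-read on every iteration */
    char host[256] = "unknown";
    gethostname(host, sizeof host);
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate on truncation */

    fprintf(stderr, "PID %ld on %s waiting for debugger\n",
            (long)getpid(), host);
    fflush(stderr);

    /* Attach with "gdb -p <PID>", select this frame (e.g. "up"),
       then "set var holding = 0" and "continue". */
    while (holding)
        sleep(1);
    return 1;
}
```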

  7. Parallel Debugging
     • Complicated by multiple processes & threads
        • which need to be managed and monitored as they execute
        • and which may execute differently each time due to inherent non-determinism
           • making it difficult to reproduce bugs consistently
     • To make debugging easier, try to reproduce the buggy execution with as few threads and processes as possible
        • serial executions are (more) deterministic
     • Debugging takes time and consumes resources
        • debug runs will be slower than otherwise
        • single-stepping line-by-line can be very slow
        • deadlocks (and livelocks) will never terminate!
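
One concrete source of run-to-run differences is floating-point reduction order: addition is not associative, so a parallel sum that combines partial results in a different order on each run can give different answers, whereas a serial loop always adds in the same order. This sketch assumes IEEE 754 double precision evaluated without extended precision (the usual case on x86-64 and AArch64) and a build without `-ffast-math`, which would allow the compiler to reorder the additions.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n doubles strictly left to right. Floating-point addition is not
   associative, so a parallel reduction that combines the same values in
   a different order can return a different result. */
double sum_in_order(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}
```

With `{1e16, 1.0, -1e16}` the left-to-right sum is 0.0, because `1e16 + 1.0` rounds back to `1e16`; the same values in the order `{1e16, -1e16, 1.0}` sum to 1.0.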

  8. Making Debugging Easier
     • Debugging without symbols is hard!
        • Compile & link with “-g” to include symbolic information
           • good compilers won't disable optimization, though they may not be able to produce complete symbolic information
     • Debugging optimized code is even harder!
        • Optimized code may bear little resemblance to the source
           • instructions will be added/removed/substituted/rearranged
        • Use the lowest optimization level which reproduces the bug
     • Sometimes the compiler/optimizer itself is buggy!
        • Debugging compilers or MPI libraries is no fun at all!
        • Try reproducing the bug with a different compiler or MPI
           • including older/newer versions
        • Just because a bug isn't reproducible with another compiler doesn't guarantee that the bug is not in your source code!
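
Since the same bug may appear or vanish with a different compiler, version or optimization level, it can help to have the program report how it was built. This sketch relies on the predefined macros `__OPTIMIZE__` and `__VERSION__`, which GCC and Clang provide; other compilers may not define them, in which case the fallbacks are used. The function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this file was compiled with optimization enabled, else 0.
   GCC and Clang define __OPTIMIZE__ in optimizing compilations. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Compiler identification string (GCC and Clang define __VERSION__). */
const char *compiler_version(void)
{
#ifdef __VERSION__
    return __VERSION__;
#else
    return "unknown compiler";
#endif
}

/* Print the build configuration, e.g. at program start, so that every
   failing run records which compiler and optimization produced it. */
void print_build_info(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "compiler: %s, optimized: %s\n",
            compiler_version(), built_with_optimization() ? "yes" : "no");
}
```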

  9. Debugging of Parallel Programs
     • TotalView (Rogue Wave)
     • DDT (Allinea)
     • Marmot
     • STAT

  10. TotalView Parallel Debugger
     • UNIX symbolic debugger for C, C++, f77, f90, PGI HPF, assembler programs
     • "Standard" debugger
     • Special, non-traditional features
        • Multi-process and multi-threaded
        • C++ support (templates, inheritance, inline functions)
        • F90 support (user types, pointers, modules)
        • 1D + 2D array data visualization
        • Support for parallel debugging (MPI: automatic attach, message queues, OpenMP, pthreads)
        • Scripting and batch debugging
        • Memory debugging
        • Reverse debugging with ReplayEngine
     • http://www.totalviewtech.com

  11. TotalView: Startup
     1. In the toolbar, select "Parallel"
     2. Select the MPI for your system
     3. Select the desired number of tasks

  12. TotalView: Main Window
     • Toolbar for common options
     • Stack trace
     • Local variables for the selected stack frame
     • Breakpoints
     • Source code window

  13. TotalView: Non-standard Features
     • Call graph
     • Data visualization
     • Message queue graph

  14. TotalView: Batch Debugging
     • Compile with debug symbols:
          % mpicc -g hello-mpi.c -o hm
     • In an interactive session or batch script:
          % tvscript -mpi <MPI> -np 4 -create_actionpoint 'main#10=>print myrank' ./hm
          % less hm-<date/time>.slog
     • Log file excerpt:
          ******************************************
          * TotalView Debugger Script Log File
          * Date: 11-26-2009_17:11:33
          * Target: ./hm
          * Actionpoint/Action Directives:
          *   10 => print myrank
          ******************************************
          Running target hm
          !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
          ! Print
          !
          ! Process: ./hm (Debugger Process ID: 1)
          ! Thread: Debugger ID: 1.1
          ! Rank: 0
          ! Time Stamp: 11-26-2009 17:11:35
          ! Triggered from event: actionpoint
          ! Results:
          !   myrank = 0x00000000 (0)
          !
          !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
          Same output for the other 3 ranks ...

  15. DDT Parallel Debugger
     • UNIX graphical debugger for C, C++, f77, f90 programs
     • Modern, easy-to-use debugger
     • Special, non-traditional features
        • Multi-process and multi-threaded
        • 1D + 2D array data visualization
        • Support for MPI parallel debugging (automatic attach, message queues)
        • Support for OpenMP (version 2.x and later)
        • Job submission from within the debugger
     • http://www.allinea.com

  16. DDT: JUROPA Startup
     • Check that MPI is selected; otherwise use "Change"
     • Select the desired number of tasks

  17. DDT: JUROPA MPI Setup
     • Under "System", choose MPI
     • Under "Job Submission", enter the specifics of your batch system

  18. DDT: Main Window
     • Process controls
     • Process groups
     • Variables
     • Source code
     • Expression evaluator
     • Stack trace

  19. DDT: Non-standard Features
     • Message queue graph
     • Multi-dimensional array viewer
     • Memory usage

  20. Marmot
     • MPI correctness and portability checker
     • http://www.hlrs.de/organization/av/amt/projects/marmot/
     • Marmot reports
        • Errors: violations of the MPI standard
        • Warnings: unusual behavior or possible problems
        • Notes: harmless but remarkable behavior
        • Also: deadlock detection
     • Usage
        • Compile with marmotcc, marmotcxx, marmotf90
        • Run your application with one additional process
        • See the report as a plain text file, as HTML, or as a CUBE report

  21. Marmot HTML Output Example

  22. STAT: Aggregating Stack Traces for Debugging
     • Existing debuggers don't scale
        • Inherent limits in the approaches
        • Need for new, scalable methodologies
     • Need to pre-analyze and reduce data
        • Fast tools to gather state
        • Help select nodes to run conventional debuggers on
     • Scalable tool: STAT
        • Stack Trace Analysis Tool
        • Goal: identify equivalence classes
        • Hierarchical and distributed aggregation of stack traces from all tasks
        • Stack trace merge in <1 s from 200K+ cores
     • (Project by LLNL, UW, UNM)
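
The core idea of grouping tasks into equivalence classes by their stack traces can be sketched in C. This is a deliberately simplified, flat version: it compares whole traces for exact equality within a single process, whereas STAT, as described above, aggregates traces hierarchically and in a distributed fashion. The trace string format (e.g. `"main>MPI_Barrier"`) and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One equivalence class: all tasks whose stack traces are identical. */
struct trace_class {
    const char *trace;  /* hypothetical format, e.g. "main>solve>MPI_Waitall" */
    int count;          /* number of tasks in this class */
    int first_rank;     /* lowest rank in the class, used as representative */
};

/* Group the stack traces of ntasks tasks (indexed by rank) into classes.
   Returns the number of classes, or -1 if more than max_classes are needed. */
int group_traces(const char *const traces[], int ntasks,
                 struct trace_class classes[], int max_classes)
{
    assert(traces != NULL && classes != NULL);
    int nclasses = 0;
    for (int rank = 0; rank < ntasks; ++rank) {
        int c = 0;
        /* Linear search for an existing class with the same trace. */
        while (c < nclasses && strcmp(classes[c].trace, traces[rank]) != 0)
            ++c;
        if (c == nclasses) {  /* trace not seen before: open a new class */
            if (nclasses == max_classes)
                return -1;
            classes[c].trace = traces[rank];
            classes[c].count = 0;
            classes[c].first_rank = rank;
            ++nclasses;
        }
        classes[c].count++;
    }
    return nclasses;
}
```

Instead of inspecting every rank, one could then attach a conventional debugger only to the `first_rank` of each class, which matches the slide's goal of selecting nodes for conventional debuggers.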

  23. Distinguishing Behavior with Stack Traces

  24. 3D-Trace Space/Time Analysis
     [Figure: diagram of multiple application (Appl) processes]

  25. Scalable Representation
     [Figure: 288 nodes / 10 snapshots]
