Introduction to parallel debugging. John Donners (john.donners@surfsara.nl), SURFsara, Amsterdam, The Netherlands
The age of debugging • Mark II, 1947 • Electromechanical computer • Built at Harvard University, financed by the Navy • About 5 flops
The age of debugging (2) • Moth found trapped between points at Relay # 70, Panel F. • Incident logged with the entry:"First actual case of bug being found". • Coined the term: "debugging a computer program".
A more modern example: Ariane 5 • 10 years of development, costing $7 billion • June 4, 1996: the Ariane 5 rocket lifts off from Kourou, French Guiana. • It explodes 37 seconds after lift-off. • The inertial guidance system uses gyroscopes and accelerometers to guide the rocket's course. • Converting a 64-bit floating-point number to a 16-bit signed integer causes an overflow error and shuts down the system. • The on-board computer steers off-course based on the erroneous input.
Overview • 1300-1315 Introduction explaining a few terms • 1315-1345 Tools: strace, ldd, gdb, valgrind, .. • 1345-1400 Debugger commands • 1400-1430 Some exercises • 1430-1500 Coffee break • 1530-1545 Compiler options • 1545-1630 Some more exercises • 1630-1650 Tea break • 1650-1715 Parallel debugging: padb, DDT/TotalView, OpenMPI • 1715-1800 Final exercises
Glossary • compiling, linking • object file, shared object, dynamic library, static library • process, thread • stack, heap • stack trace, core dump • runtime environment
Thread A thread of execution is the smallest unit of processing that can be scheduled by an operating system.
Process A process is an instance of a computer program that is being executed. It contains the program code and its current activity. A process may be made up of multiple threads of execution that execute instructions concurrently.
Compiler, object A program that transforms source code written in a programming language into binary form known as an object file. An object is a function or subroutine and its data in binary form. E.g. GCC is the GNU Compiler Collection for C, C++, Fortran, etc.
Linker/loader A program that takes one or more objects (from object files, static or dynamic libraries) generated by a compiler and combines them into a single executable program. On Unix this is the 'ld' command. Usually, the compiler calls the linker for you.
Static library A collection of object files. At the linking stage, only the object files that contain needed symbols are copied into the executable. Example: .a files on Linux
Dynamic library/shared object A collection of object files whose filename is linked into the executable at the linking stage. The dynamic library is loaded (and searched!) at runtime. Example: .so files on Linux or .dll files on Windows
Stack A stack is a data structure that stores information about the active subroutines of a computer program. When a function is called, the return address and the function's local variables are pushed onto the stack; when it returns, they are popped off again. In a multi-threaded program each thread has its own stack.
Stack trace / backtrace / traceback The stack trace lists all the functions awaiting return values on the stack at one instant (usually at the time of an error).
Heap When arrays are dynamically allocated, their memory is taken from the heap, a pool of unused memory. Example: malloc() in C, ALLOCATE() in Fortran
Core dump An image of the program and all its data at one instant (usually at the time of an error). Core dumps can be read by a debugger for a post-mortem analysis of the program state.
Prevention rather than cure • Comment your code: • Purpose of functions • Use meaningful variable names • Meaning and units of variables • Check your exit codes/return values • Use a version control system (git, ..)
Mars Climate Orbiter • Goal: study the weather, climate and CO2 budget of Mars. Cost: $330 million • Intended to orbit Mars 140-150 km above the surface, but it reached as low as 57 km. • The spacecraft was destroyed by atmospheric stresses and friction. • The navigation error arose because the contractor's software produced thruster data in US customary units (pound-force seconds), while NASA's navigation software expected SI units (newton seconds).
Errors & exception handling • Especially important for C routines • Check the return value and errno • malloc returns a NULL pointer if it's out of memory • Error checking in bash scripts • cd my_working_directory || { echo "cd failed"; exit 1; } • Use braces, not parentheses: an exit inside ( ... ) only leaves the subshell, not the script • Fortran errors are usually fatal, except • I/O routines with optional ERR= or IOSTAT= argument • ALLOCATE with optional STAT= argument
Errors & exception handling (2) • Exception handling in Python: an uncaught exception prints a stack trace and exits • try .. except • Errors in MPI are fatal by default, so there is little need to check the return code of an MPI communication call • MPI-IO is an exception: its default error handler returns error codes instead • The error handler can be changed per communicator, window or file
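A minimal C sketch of the return-value checks described above. The helper names alloc_doubles and open_or_report are hypothetical, chosen only for illustration. The sketch saves errno before printing a message, because fprintf may itself change errno.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n doubles on the heap.
   Returns NULL (after printing a message) if the request overflows or malloc fails. */
double *alloc_doubles(size_t n) {
    if (n > SIZE_MAX / sizeof(double)) {    /* n * sizeof(double) would overflow */
        fprintf(stderr, "alloc_doubles: %zu doubles is too large\n", n);
        errno = ENOMEM;
        return NULL;
    }
    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL) {
        int saved = errno;                   /* fprintf may change errno */
        fprintf(stderr, "alloc_doubles: malloc of %zu doubles failed\n", n);
        errno = saved;
    }
    return buf;
}

/* Hypothetical helper: open a file for reading and report why it failed. */
FILE *open_or_report(const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;                   /* keep the reason fopen failed */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fp;
}
```

The caller still decides what to do with a NULL result; the point is that the failure is noticed where it happens instead of as a crash much later.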
Explaining your code to someone else (even a teddy bear) is wonderfully effective. Kernighan & Pike (aka ‘rubber duck debugging’)
which: what executable am I running? • If you don't use an absolute path to your executable, the shell executes the first one it finds in the directories listed in your $PATH environment variable. • But are you sure what your $PATH contains? • Check the location of your executable with: which program • Add it to your job script to be sure which executable your job used.
ldd: print shared library dependencies
donners@p6012:~> ldd /sara/sw/gromacs/4.0.7-sp/bin/mdrun_mpi
    linux-vdso64.so.1 => (0x0000000000100000)
    libgslcblas.so.0 => /sara/sw/gsl/1.11/lib/libgslcblas.so.0 (0x0000040000040000)
    libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00000400000d0000)
    libz.so.1 => /lib64/libz.so.1 (0x00000400002d0000)
    libgsl.so.0 => /sara/sw/gsl/1.11/lib/libgsl.so.0 (0x0000040000300000)
    libm.so.6 => /lib64/power6/libm.so.6 (0x00000400005a0000)
    libesslsmp.so.1 => /usr/lib64/libesslsmp.so.1 (0x0000040000670000)
    libxlf90_r.so.1 => /opt/ibmcmp/lib64/libxlf90_r.so.1 (0x0000040002460000)
• can be used on dynamic executables and dynamic libraries
• libraries are searched in RPATH, $LD_LIBRARY_PATH and system locations (/etc/ld.so.conf)
ltrace / strace: trace library / system calls
donners@lisa:~$ strace ./les3d.hybrid
write(1, "Loading Flamelet Generated Manifo"..., 38Loading Flamelet Generated Manifolds: ) = 38
getcwd("/home/donners/DEISA/turflame/turflame-openmp3"..., 4096) = 46
stat("/home/donners/DEISA/turflame/turflame-openmp3/FGM_DIFF.dat", {st_mode=S_IFREG|0644, st_size=102255881, ...}) = 0
getcwd("/home/donners/DEISA/turflame/turflame-openmp3"..., 4096) = 46
open("/home/donners/DEISA/turflame/turflame-openmp3/FGM_DIFF.dat", O_RDWR|O_CREAT, 0666) = 23
+ Useful for I/O or network-related issues.
- Output can be overwhelming. Redirect the output to a file (strace -o) and search for the last output from the program, or filter the system calls (strace -e).
pstack: print stack trace of a running process
[donners@tcn559 ~]$ pstack 12614
#0 0x00002ada1eb4ae99 in mkl_lapack_ps_avx2_dgtts2 () from /hpc/eb/RHEL/imkl/11.3.3.210-iimpi-2016b/mkl/lib/intel64/libmkl_avx2.so
#1 0x00002ada13a5d8d4 in mkl_lapack_xdgttrs () from /hpc/eb/RHEL/imkl/11.3.3.210-iimpi-2016b/mkl/lib/intel64/libmkl_core.so
#2 0x00002ada11a45ec8 in dgttrs_ () from /hpc/eb/RHEL/imkl/11.3.3.210-iimpi-2016b/mkl/lib/intel64/libmkl_intel_lp64.so
#3 0x0000000000593f21 in solveimpeqnupdate_x_ ()
#4 0x000000000059bd71 in implicitandupdatevx_ ()
#5 0x00000000005a88eb in timemarcher_ ()
#6 0x00000000005928cf in MAIN__ ()
#7 0x000000000040c3ce in main ()
• Useful if your program hangs (or seems to..)
addr2line: convert address into source code location
...
*** glibc detected *** ./a.out: double free or corruption (fasttop): 0x0938e008 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xf0f0d1]
/lib/tls/i686/cmov/libc.so.6(cfree+0x6d)[0xf138ad]
./a.out[0x80485ea]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xebab56]
...
> addr2line -e ./a.out 0x80485ea
/home/donners/src/program.c:51
• Can be useful if your program crashed after a day of running without a core file, but with a stack trace
• addr2line often points to the line after the error: most addresses in a stack trace are return addresses, i.e. the instruction following the call
nm: list symbols
donners@p6012:~/DEISA/turflame/turflame-openmp3> nm pois.o
                 U __domaindecomposition_NMOD_usedomain
0000000000000000 D __multigridpoissonsolver_NMOD__&&_multigridpoissonsolver
0000000000000048 D __multigridpoissonsolver_NMOD_maxvalue
0000000000000018 D __multigridpoissonsolver_NMOD_pois
00000000000000a8 D __multigridpoissonsolver_NMOD_pplus
                 U _xlfBeginIO
                 U _xlfEndIO
                 U mpi_allreduce
• works on object files, executables, libraries
• U: undefined symbol, to be resolved by the linker; D: symbol in the initialized data section
• Gives you the addresses relative to the start of the file.
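To make the $PATH lookup concrete, here is a simplified C sketch of what which does: walk the directories of $PATH in order and stop at the first regular file you may execute. The function name find_in_path is hypothetical, and the sketch simplifies one detail: strtok_r skips empty $PATH entries, which POSIX treats as the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: a simplified `which`.
   Writes the first match into out and returns 0, or returns -1 if none is found. */
int find_in_path(const char *prog, char *out, size_t outlen) {
    const char *path = getenv("PATH");
    if (path == NULL)
        return -1;
    char *copy = strdup(path);              /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;
    int result = -1;
    char *saveptr = NULL;
    for (char *dir = strtok_r(copy, ":", &saveptr); dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr)) {
        int len = snprintf(out, outlen, "%s/%s", dir, prog);
        if (len < 0 || (size_t)len >= outlen)
            continue;                       /* candidate path does not fit */
        struct stat st;
        if (stat(out, &st) == 0 && S_ISREG(st.st_mode) && access(out, X_OK) == 0) {
            result = 0;                     /* first match wins, as in the shell */
            break;
        }
    }
    free(copy);
    return result;
}
```

If two directories in $PATH both contain a program with the same name, only the first one is ever run, which is exactly why checking with which is worthwhile.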
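Stack traces like the glibc report above can also be printed from your own code. A minimal sketch, assuming glibc (execinfo.h is a GNU extension); print_stack_trace is a hypothetical name. Function names only appear for exported symbols (link with -rdynamic), but the raw addresses can be passed to addr2line. For position-independent executables, glibc typically prints an offset such as (+0x...) next to the executable name, and that offset is what addr2line expects.

```c
#include <execinfo.h>
#include <unistd.h>

/* Hypothetical helper: print the current call stack to stderr using glibc's
   backtrace() and backtrace_symbols_fd(). Returns the number of frames captured. */
int print_stack_trace(void) {
    void *frames[64];                               /* capture at most 64 frames */
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO); /* writes directly, no malloc */
    return n;
}
```

backtrace_symbols_fd is used instead of backtrace_symbols because it does not allocate memory, which matters when the heap may already be corrupted.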
valgrind: Memcheck tool Checks for: • Use of uninitialized memory • malloc errors: • Use of freed memory • Double free • Reading/writing past malloced memory • Lost memory pointers (leaks) • Mismatched malloc/new & free/delete • Stack write errors • Overlapping source and destination arguments to functions like memcpy.
valgrind: example
testing binary file i/o
GZIOOP: input file open error
Could not open binary input file
binary output file open
==19296== Conditional jump or move depends on uninitialised value(s)
==19296==    at 0x4044B75: gzwrite (in /lib/libz.so.1.2.3.3)
==19296==    by 0x8049B3C: gzputs_ (ftn_gzio.c:136)
==19296==    by 0x80492E4: gzbwrite_ (gzbwrite.F:31)
==19296==    by 0x8048EA8: MAIN__ (test.f:21)
==19296==    by 0x8049CCA: main (in /home/donnerslocal/libfgz/libfgz-0.3/test)
==19296==  Uninitialised value was created by a stack allocation
==19296==    at 0x8048B67: MAIN__ (test.f:1)
Using a debugger • Pretty much all programming languages have a debugger: C, Fortran, python, R, matlab • All debuggers have the same basic features: • Run your program line-by-line (“stepping”) • Set breakpoints on specific locations, for specific conditions • Examine stack, variables • Change variables, code • gdb has lots of options, but it is command-driven • DDT, TotalView: helpful GUI, fully parallel debugging, MPI+OpenMP+CUDA support • reverse debugging (TotalView), GUI/good parallel data view (DDT)
gdb: some useful commands
• start: start the program and pause on the first executable line
• step: step until the next source code line
• next: step, but skip over routine calls
• continue: continue the program
• run: run the code until a breakpoint or completion
• finish: run until the end of the current function
• break location: set a breakpoint at a line or address
• delete breaknumber: remove a breakpoint
• info breakpoints: list current breakpoints
• list line or function: list source code
• backtrace: print all stack frames
• frame nr: change to another stack frame
• print: print variables, expressions
• display: display variables, expressions after each command
• dprintf loc,fmt,var: add a 'dynamic printf'
GDB: useful commands
• help command: get help on a command
• [return]: repeat the last command (useful for stepping)
• attach pid: attach to a running process
• set var name=value: change variables
• info threads: list all threads
• thread nr: change to thread nr
• quit: exit gdb
Some features of DDT & TotalView • Breakpoints • Can be added by clicking before the line number. • Right-clicking gives a context-sensitive menu to add conditions, etc. • evalpoints (TotalView) and tracepoints (DDT) are the equivalent of GDB's 'dynamic printf' • Parallel backtrace • DDT shows this by default; TotalView has 'Parallel backtrace view' and 'Call graph' under 'Tools' • MPI message queue & graph • Data exploration • Hover over a variable to see its value • Select an expression & hover to evaluate it • DDT has 'sparklines' • Double-click to see more details, across processes/threads, statistics, plots • Filters, e.g. <0
How to get unbuffered output • "My output seems cut off." "My program does nothing." • Output is normally buffered for better performance, so if the program hangs, crashes or is killed, output still in the buffer never appears. Buffering can be disabled. • Fortran environment variables • Intel: FORT_BUFFERED=no • GNU: GFORTRAN_UNBUFFERED_ALL=Y • IBM XL: XLFRTEOPTS=buffering=disable_all • C • #include <stdio.h> • setvbuf(stdout, NULL, _IONBF, 0); (call this before the first output) • OpenMPI: OMPI_MCA_orte_base_help_aggregate=0
How to get a core dump • A core dump is generated at the moment a program is aborted. It can be analysed post-mortem using a debugger (gdb, TotalView or DDT) • Only useful if compiled with debug information. • However, on many systems the maximum size of a core dump is zero by default. Change the limit before running to get a core dump: • bash: ulimit -c unlimited • csh: limit coredumpsize unlimited • Intel Fortran runtime environment variable to generate a core dump when a runtime error occurs: • Intel: FOR_DUMP_CORE_FILE=TRUE
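The C option above, plus a cheaper alternative, as a short sketch. The function names make_stdout_unbuffered and report_progress are hypothetical. Unbuffered output costs performance, so flushing only after important messages is often enough.

```c
#include <stdio.h>

/* Hypothetical helper: turn off buffering of stdout so each write appears
   immediately. setvbuf must be called before anything is written to stdout.
   Returns 0 on success. */
int make_stdout_unbuffered(void) {
    return setvbuf(stdout, NULL, _IONBF, 0);
}

/* Hypothetical helper: keep buffering, but flush after each progress message
   so the last one is visible even if the program later hangs or crashes. */
void report_progress(int step) {
    printf("step %d done\n", step);
    fflush(stdout);
}
```

A middle ground is line buffering, setvbuf(stdout, NULL, _IOLBF, 0), which flushes at every newline.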
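The shell commands above can also be done from inside the program. A minimal sketch, assuming a POSIX system; enable_core_dumps is a hypothetical name. An unprivileged process can only raise its soft limit up to the hard limit, so this matches ulimit -c unlimited only when the hard limit is already unlimited. On Linux, where the core file is written is further controlled by /proc/sys/kernel/core_pattern.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit of this process
   to its hard limit. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;            /* soft limit := hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Call it early in main, before the code that might crash; the limit only applies to this process and its children.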
Debugging (optimized) code • Debug flag: -g • Debug information can be combined with optimization. However, the shortcuts taken by optimized code will produce surprising results: • some variables you declared may not exist at all; • flow of control may move where you did not expect it; • some statements may not be executed because they compute constant results or their values were already at hand; • some statements may execute in different places because they were moved out of loops.
Floating-point exceptions • are usually not trapped, for performance reasons. • But you might want to check whether your program produces divisions by zero, overflow, invalid operations or underflow. • Don't check for 'inexact' operations, since virtually all operations are inexact.
Floating-point exceptions (Fortran) • Intel: -fpe0 -traceback • GNU: -ffpe-trap=zero,invalid,overflow • PGI: -Ktrap=divz,inv,ovf / -Ktrap=fp (no traceback) • IBM: -qflttrap=en:zero:ov:inv
Floating-point exceptions (C) • _GNU_SOURCE must be defined before any #include:
#define _GNU_SOURCE
#include <fenv.h>
err = feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
• Link with -lm • Some compilers also have flags: -fp-trap=ov (Intel), -Ktrap (PGI)
Array out-of-bounds error • If you read or write data outside the array bounds, your program may crash, run correctly or give unexpected or incorrect results. • The (Fortran) compiler can put in automatic checks every time an array is read or written. • An error message when a subscript is out-of-range. • Program will run several times slower. • Not so useful if your code is Fortran 77, where arrays are often passed with dummy dimensions such as a(*) or a(1), which defeat the check • IBM/Intel/PathScale: -C • PGI: -Mbounds • GNU: -fcheck=bounds (-fbounds-check before 4.5.0)
Uninitialized variables • If arrays or variables are not initialized, starting conditions might be random, which is usually undesirable. • Some compilers can automatically initialize arrays with signalling NaNs. • In combination with floating-point exception trapping, your program will fail if a variable is not initialized correctly. • Intel Fortran: -init=snan,array -g -traceback • GNU Fortran: -finit-real=inf -finit-integer=99999 (not for allocatables) • IBM C/Fortran: -qinitauto=FF
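C compilers do not insert such checks by default. As an illustration of what a compiler-generated check does, here is a hand-written C sketch; checked_get and the AT macro are hypothetical names, and every access pays for an extra comparison, which is where the slowdown comes from.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a[i] from an array of length n, or print the
   source location and abort if i is out of range. */
double checked_get(const double *a, size_t n, size_t i, const char *file, int line) {
    if (i >= n) {
        fprintf(stderr, "%s:%d: index %zu out of bounds [0, %zu)\n", file, line, i, n);
        abort();                            /* stops here, so a core dump or debugger shows the spot */
    }
    return a[i];
}

/* Hypothetical macro that records where the access happened. */
#define AT(a, n, i) checked_get((a), (n), (i), __FILE__, __LINE__)
```

For example, AT(v, 3, 2) returns v[2], while AT(v, 3, 3) aborts with a message naming the file and line of the access.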
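A hand-written C analogue of the compiler options above, as a sketch; poison and count_unset are hypothetical names. It uses the standard NAN macro, which is a quiet NaN: unlike a signalling NaN it does not raise an invalid-operation exception when used, so the sketch detects unset elements with an explicit isnan check instead of relying on trapping. Note that options like -ffast-math can make isnan unreliable.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: fill an array with NaN so elements that are never
   assigned stand out later. */
void poison(double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = NAN;
}

/* Hypothetical helper: count elements that are still NaN, i.e. never set. */
size_t count_unset(const double *a, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (isnan(a[i]))
            count++;
    return count;
}
```

For example, after poison(a, 4) and assigning only a[0] and a[2], count_unset(a, 4) returns 2.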
Parallel debugging • Cross-process and cross-thread comparison • E.g. does each process have the same value of a variable (e.g. a global parameter)? • Easy access to complex data structures, e.g. multi-dimensional arrays • Easy navigation between processes, threads, frames and data • MPI message queue • Parallel stack trace • Integrated memory debugging