TotalView
Strategies for debugging hybrid codes with TotalView on the IBM clusters
Overview of this presentation
• Print statements are the easiest form of debugging
• TotalView functionality and limitations
• Settings needed to run a job on an IBM cluster (whether you use TotalView or not)
• Demonstration of a simple TotalView session
• Run the CCSM under TotalView
Print statement debugging
• Simple to learn when errors are easy to find
• A repeating postmortem operation
• Only works when you place the print statements in the right places
• Uses system resources each time a job re-runs
• No flexibility to explore other parts of the code while the job is running
TotalView functionality
• Interactive debugger
• Debugs serial, threaded, and message-passing codes, and any combination of these
• Supports source debugging in Fortran 77, Fortran 90-95, C, and C++
• Interoperates with IBM's Parallel Operating Environment (POE), and with OpenMP/Pthreads
Debugging with TotalView
• You can watch variables as the code runs and interactively refine your investigation
• You can calculate derived quantities based on data calculated during the run
• You can interactively stop the run when you see abnormal events occur
• You can set conditions that will stop the run; this helps uncover trouble spots and shortens the time needed to discover the problem
TotalView limitations
• Its use is limited to LoadLeveler's interactive pool (also called the interactive class or queue)
• The number of parallel processes it can run as a job in the interactive pool is capped (threads within a process do not count):
  • bluesky: 96 processes
  • blackforest: 26 processes
  • babyblue: 48 processes
• It is an X application and requires a high-bandwidth network connection
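The process caps above can be checked before a session is requested. The following is only a minimal sketch: the function names `max_tv_procs` and `fits_tv_pool` are hypothetical helpers, and the limits are copied from the slide, so they would need updating if the pools change.

```shell
# Hypothetical helper: print the interactive-pool process cap for a machine.
# The numbers are the ones quoted on the slide above.
max_tv_procs() {
  case "$1" in
    bluesky)     echo 96 ;;
    blackforest) echo 26 ;;
    babyblue)    echo 48 ;;
    *)           return 1 ;;   # unknown machine
  esac
}

# Hypothetical helper: succeed if the requested process count fits the cap.
# Threads inside each process are not counted, matching the slide.
fits_tv_pool() {
  limit=$(max_tv_procs "$1") || return 2
  [ "$2" -le "$limit" ]
}

if fits_tv_pool babyblue 32; then
  echo "32 processes fit in the babyblue interactive pool"
else
  echo "32 processes exceed the babyblue interactive pool"
fi
```

A 32-process job, like the CCSM demonstration later in this talk, passes the check for babyblue but would not pass for blackforest.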
Building applications for debugging sessions
Include the following compiler parameters on the command line or in the FFLAGS/CFLAGS statement in the makefile:
• -g (generates symbol table)
• -Ox (use the minimum optimization that reproduces the problem)
• -qsmp=omp:noopt (if threads are used, don't optimize them)
• -qfullpath (generates full path representations to sources, objects, and executables in the symbol table so that TotalView can find the sources)
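As a rough illustration, the flags above might be assembled as follows. This is a sketch under assumptions: `-O0` is chosen here as the starting optimization level, `mpxlf90_r` is assumed to be the thread-safe MPI Fortran compiler wrapper on the system, and `hybrid.f90` is a hypothetical source file. The command is printed rather than executed so the snippet can be tried anywhere.

```shell
# Assumption: start with no optimization and raise the level only if
# the problem stops reproducing.
OPT_LEVEL="-O0"

# Debug flags from the slide: symbol table, unoptimized OpenMP, full paths.
FFLAGS="-g $OPT_LEVEL -qsmp=omp:noopt -qfullpath"

# Assumption: mpxlf90_r is the compiler wrapper; hybrid.f90 is a
# hypothetical source file. Print the command instead of running it.
echo "mpxlf90_r $FFLAGS -o hybrid hybrid.f90"
```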
POE runtime resource parameters
POE parameters needed before using TotalView:
• Resource pool
  MP_RMPOOL=1 (necessary)
• stdout/stderr organization
  MP_INFOLEVEL=3 (useful)
  MP_LABELIO=yes (useful)
  MP_STDOUTMODE=ordered (can be useful)
• Location of executables
  MP_PGMMODEL=[spmd/mpmd] (spmd is the default)
  MP_CMDFILE=cmdfile (if mpmd, then this is necessary)
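In a Bourne-style shell these settings could be exported before the session starts. This is a sketch only: the values are the ones listed on the slide, and the choice of `spmd` is an assumption made for a single-executable run.

```shell
# Resource pool (listed as necessary on the slide).
export MP_RMPOOL=1

# stdout/stderr organization (listed as useful).
export MP_INFOLEVEL=3
export MP_LABELIO=yes
export MP_STDOUTMODE=ordered

# Assumption: one executable on all tasks, so spmd (the default) is kept.
# An mpmd run would instead need MP_PGMMODEL=mpmd plus MP_CMDFILE.
export MP_PGMMODEL=spmd
```

Users of csh or tcsh would use `setenv NAME value` in place of `export NAME=value`.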
POE runtime resource parameters
Node resources
• Specify two of these three:
  MP_NODES
  MP_PROCS
  MP_TASKS_PER_NODE
• Or specify a POE-submitted LoadLeveler script (for example, to set task geometry):
  MP_LLFILE=llfile
Note: Some things cannot be set with POE parameters, so they must be set using LoadLeveler. MP_LLFILE allows POE to access LoadLeveler.
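Only two of the three node-resource values are needed because the third follows from them: the process count is the node count times the tasks per node. A small sketch of that arithmetic, where `procs_from_geometry` is a hypothetical helper and the values 4 and 8 are illustrative assumptions:

```shell
# Hypothetical helper: given a node count and tasks per node,
# print the process count that the two values imply.
procs_from_geometry() {
  echo $(( $1 * $2 ))
}

# Illustrative values (assumptions): 4 nodes with 8 tasks each.
export MP_NODES=4
export MP_TASKS_PER_NODE=8

echo "implied MP_PROCS: $(procs_from_geometry "$MP_NODES" "$MP_TASKS_PER_NODE")"
```

With these values the implied process count is 32, the same size as the CCSM demonstration run.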
POE runtime resource parameters
Communication path and node usage
• On node:
  MP_SHARED_MEMORY=yes
  MP_CPU_USE=multiple
• Off node:
  MP_EUIDEVICE=csss
  MP_ADAPTER_USE=shared
  MP_EUILIB=ip
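Before launching, it can help to export the communication settings and then list every `MP_` variable in the environment to confirm what POE will see. The sketch below uses the values from the slide. The launch line in the final comment is an assumption about the usual way TotalView is started on a POE job, and `./hybrid` is a hypothetical executable name.

```shell
# On-node communication and CPU sharing.
export MP_SHARED_MEMORY=yes
export MP_CPU_USE=multiple

# Off-node communication path.
export MP_EUIDEVICE=csss
export MP_ADAPTER_USE=shared
export MP_EUILIB=ip

# Sanity check: show all POE settings currently in the environment.
env | grep '^MP_' | sort

# Assumption: a typical launch would then look like
#   totalview poe -a ./hybrid
# where ./hybrid is a hypothetical executable built with the debug flags.
```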
Demo: Basic TotalView skills
• An MPI-OMP demonstration code
• Understanding what TotalView shows you
  • Root Window
  • Program Window
    • Source code
    • Stack trace
    • Stack frame
• Allows you to view processes and threads to see their asynchronous behavior
Demo: Establishing Action Points
Using the source window to establish:
• Breakpoints and global barriers
• Watch points
• Evaluation points
• Evaluation at any action point
Demo: Diving into subprograms
• How to "dive" in the Frame Stack and the Program Source subwindows
• How to find variables and data structures in the Frame Stack subwindow
Demo: Exploring threads and processes with the Program Window
• Tabbing through processes and threads with P and T tabs
• Using the Root Window to select processes
Demo: Configuring TotalView for the run
• Setting the TotalView search paths
• Setting the signal processing environment
TotalView demo using the CCSM
This demonstration runs an "active" model, the current version of the CCSM, on 32 processes with threads. It will trace coupler startup communication, examine land model structures, and use threads.
Defining CCSM3 TotalView machine on babyblue
Actions to define a case "dan" and set up a machine "totalview":
  $ROOTDIR/scripts/create_newcase -case "dan"
  $ROOTDIR/scripts/dan/addmach totalview
  $ROOTDIR/scripts/dan/configure -mach totalview
Files created or affected in {$ROOTDIR/scripts/ccsm_utils/Machines}:
• env_mach_pes.totalview
• run.ibm.totalview
• batch.ibm.totalview
• env.totalview
Defining CCSM3 TotalView machine on babyblue (continued)
Files created or affected in $ROOTDIR/models/{utils, bld}:
• Changes to makefiles
• Changes to Macros.AIX:
  set FFLAGS = -g -qsmp=omp:noopt