
TotalView


jana




Presentation Transcript


  1. TotalView Strategies for debugging hybrid codes with TotalView on the IBM clusters

  2. Overview of this presentation • Print statements are the easiest form of debugging • TotalView functionality and limitations • Settings needed to run a job on an IBM cluster (whether you use TotalView or not) • Demonstration of a simple TotalView session • Run the CCSM under TotalView

  3. Print statement debugging • Simple to learn, and effective when errors are easy to find • A repeated postmortem operation: add prints, rebuild, rerun • Only works when the print statements are in the right places • Uses system resources each time the job is rerun • No flexibility to explore other parts of the code while the job is running

  4. TotalView functionality • Interactive debugger • Debugs serial, threaded, and message-passing codes, and any combination of these • Supports source-level debugging in Fortran 77, Fortran 90/95, C, and C++ • Interoperates with IBM's Parallel Operating Environment (POE) and with OpenMP and Pthreads

  5. Debugging with TotalView • You can watch variables as the code runs and interactively refine your investigation • You can calculate derived quantities based on data calculated during the run • You can interactively stop the run when you see abnormal events occur • You can set conditions that will stop the run; this helps uncover trouble spots and shortens the time needed to discover the problem

  6. TotalView limitations • Its use is limited to LoadLeveler's interactive pool (also called the interactive class or queue) • It has a maximum number of parallel processes it can run as a job in the interactive pool (not counting threads within a process): • bluesky: 96 processes • blackforest: 26 processes • babyblue: 48 processes • It is an X Window System application and requires a high-bandwidth network connection
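Before requesting an interactive TotalView job, the process count can be checked against the limits above. The following is a minimal sh sketch; the function names `max_tv_procs` and `check_tv_request` are hypothetical, and only the three machine limits come from the slide. Threads are deliberately not counted, matching the slide.

```shell
#!/bin/sh
# Interactive-pool process limits from the slide (threads are not counted).
max_tv_procs() {
    case "$1" in
        bluesky)     echo 96 ;;
        blackforest) echo 26 ;;
        babyblue)    echo 48 ;;
        *)           echo 0 ;;   # unknown machine: no limit recorded here
    esac
}

# Hypothetical helper. Usage: check_tv_request MACHINE NPROCS
# Returns 0 if NPROCS fits, 1 if it exceeds the limit, 2 if the machine is unknown.
check_tv_request() {
    limit=$(max_tv_procs "$1")
    if [ "$limit" -eq 0 ]; then
        echo "$1: no limit recorded"
        return 2
    elif [ "$2" -le "$limit" ]; then
        echo "$1: $2 processes OK (limit $limit)"
        return 0
    else
        echo "$1: $2 processes exceeds limit $limit"
        return 1
    fi
}

# Example: the 32-process CCSM demonstration on babyblue fits under its limit.
check_tv_request babyblue 32
```

A `case` table keeps the limits in one place, so updating a machine's limit is a one-line change.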

  7. Building applications for debugging sessions Include the following compiler options on the command line or in the FFLAGS/CFLAGS settings in the makefile: • -g (generates the symbol table) • -Ox: use the minimum optimization level that reproduces the problem • -qsmp=omp:noopt (if threads are used, do not optimize the threaded code) • -qfullpath (records full paths to sources, objects, and executables in the symbol table so that TotalView can find the sources)
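The flags above can be assembled once and reused for Fortran and C. This is a sketch under stated assumptions: `OPT_LEVEL` is a hypothetical variable that is left empty by default (the XL compilers do not optimize unless an `-O` option is given), and the source file name in the echoed command is made up for illustration.

```shell
#!/bin/sh
# Debug build flags for the IBM XL compilers (sketch).
# ASSUMPTION: OPT_LEVEL is empty unless the bug only appears with optimization;
# in that case set, e.g., OPT_LEVEL=-O2 before running this script.
OPT_LEVEL="${OPT_LEVEL:-}"

FFLAGS="-g -qfullpath"              # symbol table plus full source paths for TotalView
if [ -n "$OPT_LEVEL" ]; then
    FFLAGS="$FFLAGS $OPT_LEVEL"     # lowest level that still reproduces the problem
fi
FFLAGS="$FFLAGS -qsmp=omp:noopt"    # OpenMP enabled, threaded code not optimized
CFLAGS="$FFLAGS"                    # same options for the C sources
export FFLAGS CFLAGS

# Hypothetical file names; mpxlf90_r is the thread-safe MPI Fortran 90 driver on AIX.
echo "mpxlf90_r $FFLAGS -o mpi_omp_demo mpi_omp_demo.f90"
```

Building the string incrementally avoids a stray empty option when no optimization level is requested.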

  8. POE runtime resource parameters POE parameters needed before using TotalView: • Resource pool: MP_RMPOOL=1 (necessary) • stdout/stderr organization: MP_INFOLEVEL=3 (useful), MP_LABELIO=yes (useful), MP_STDOUTMODE=ordered (can be useful) • Location of executables: MP_PGMMODEL=[spmd/mpmd] (spmd is the default), MP_CMDFILE=cmdfile (necessary if mpmd)
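As a sketch, these settings could be exported in the sh/ksh session that will start TotalView (csh users would use `setenv` instead). The MPMD branch and its program names are hypothetical; the command file is assumed to list one executable name per line, one line per task.

```shell
#!/bin/sh
# POE settings from the slide above, in sh/ksh syntax.
export MP_RMPOOL=1              # resource pool number given on the slide
export MP_INFOLEVEL=3           # more detailed POE diagnostic messages
export MP_LABELIO=yes           # prefix each output line with its task number
export MP_STDOUTMODE=ordered    # buffer each task's stdout and print it in task order
export MP_PGMMODEL=spmd         # one executable for every task (the default)

# MPMD only: a command file naming the executable for each task.
# The program names are hypothetical.
if [ "$MP_PGMMODEL" = mpmd ]; then
    printf '%s\n' ./ocean ./ocean ./atm ./atm > cmdfile
    export MP_CMDFILE=cmdfile
fi

# Show what POE will see.
env | grep '^MP_' | sort
```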

  9. POE runtime resource parameters Node resources: specify two of these three: MP_NODES, MP_PROCS, MP_TASKS_PER_NODE. Or specify a LoadLeveler command file for POE to submit, for example to set the task geometry: MP_LLFILE=llfile. Note: Some settings cannot be made with POE parameters and must be set through LoadLeveler; MP_LLFILE lets POE pass them to LoadLeveler.
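The two alternatives can be sketched as follows. The node and task counts are assumptions chosen for illustration, and the LoadLeveler file name is hypothetical. The `task_geometry` keyword is one example of a layout that POE variables alone cannot express, so the LoadLeveler file should be used instead of, not together with, the node-resource variables.

```shell
#!/bin/sh
# Option A (assumed sizes): give POE two of the three node-resource variables.
# 2 nodes x 4 tasks per node; POE derives the third value (8 processes).
export MP_NODES=2
export MP_TASKS_PER_NODE=4
echo "POE will start $((MP_NODES * MP_TASKS_PER_NODE)) tasks"

# Option B: describe the layout in a LoadLeveler command file instead.
# task_geometry puts tasks 0-3 on one node and tasks 4-7 on another.
LLFILE="${TMPDIR:-/tmp}/totalview_debug.ll"   # hypothetical file name
cat > "$LLFILE" <<'EOF'
# @ job_type      = parallel
# @ task_geometry = {(0,1,2,3)(4,5,6,7)}
# @ queue
EOF
echo "For option B: unset MP_NODES MP_TASKS_PER_NODE; export MP_LLFILE=$LLFILE"
```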

  10. POE runtime resource parameters Communication path and node usage • On node: MP_SHARED_MEMORY=yes, MP_CPU_USE=multiple • Off node: MP_EUIDEVICE=csss, MP_ADAPTER_USE=shared, MP_EUILIB=ip
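In sh/ksh form, the communication settings might be exported like this. The comments describe what each variable requests; the reason for sharing CPUs and adapters (interactive-pool nodes being shared among users) is an assumption, not something the slide states.

```shell
#!/bin/sh
# Within a node: MPI messages go through shared memory, and CPUs may be
# shared with other tasks (assumed to suit shared interactive nodes).
export MP_SHARED_MEMORY=yes
export MP_CPU_USE=multiple

# Between nodes: IP protocol over the csss switch device named on the slide,
# with the adapter shared with other jobs.
export MP_EUIDEVICE=csss
export MP_ADAPTER_USE=shared
export MP_EUILIB=ip

env | grep -E '^MP_(SHARED_MEMORY|CPU_USE|EUIDEVICE|ADAPTER_USE|EUILIB)=' | sort
```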

  11. Demo: Basic TotalView skills • An MPI-OMP demonstration code • Understanding what TotalView shows you: • Root Window • Program Window • Source code • Stack trace • Stack frame • Viewing processes and threads to see their asynchronous behavior
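A hybrid demo like this one is typically started by letting TotalView launch POE. The sketch below assumes the usual `totalview poe -a ...` form, in which the arguments after `-a` are passed to `poe`; the executable name and the thread and task counts are hypothetical. It only prints the command when `totalview` is not on the PATH.

```shell
#!/bin/sh
# Launch a hybrid MPI+OpenMP program under TotalView on POE (sketch).
# ASSUMPTIONS: ./mpi_omp_demo is a placeholder name, and the POE variables
# from the earlier slides are already exported in this shell.
PROGRAM=./mpi_omp_demo
export OMP_NUM_THREADS=2   # threads per MPI task (not counted against the pool limit)
NPROCS=4                   # MPI tasks; keep this under the interactive-pool limit

# TotalView starts poe itself; everything after -a is handed to poe.
set -- totalview poe -a "$PROGRAM" -procs "$NPROCS"

if command -v totalview >/dev/null 2>&1; then
    "$@"
else
    echo "totalview not found; would run: $*"
fi
```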

  12. Demo: Establishing action points • Using the Source Window to establish: breakpoints and global barriers, watchpoints, and evaluation points • Evaluation at any action point

  13. Demo: Diving into subprograms • How to "dive" in the Frame Stack and the Program Source subwindows • How to find variables and data structures in the Frame Stack subwindow

  14. Demo: Exploring threads and processes with the Program Window • Tabbing through processes and threads with the P and T tabs • Using the Root Window to select processes

  15. Demo: Configuring TotalView for the run • Setting the TotalView search paths • Setting the signal processing environment

  16. TotalView demo using the CCSM This demonstration runs an "active" model configuration of the current CCSM version on 32 processes with threads. It traces coupler startup communication, examines land model data structures, and uses threads.

  17. Defining a CCSM3 TotalView machine on babyblue Actions to define a case "dan" and set up a machine "totalview": • $ROOTDIR/scripts/create_newcase -case "dan" • $ROOTDIR/scripts/dan/addmach totalview • $ROOTDIR/scripts/dan/configure -mach totalview • Files created or affected in $ROOTDIR/scripts/ccsm_utils/Machines: env_mach_pes.totalview, run.ibm.totalview, batch.ibm.totalview, env.totalview
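The three steps can be wrapped in a small script that prints each command and, when asked to, runs it and stops at the first failure. This is a sketch: `ROOTDIR` defaults to a placeholder path, and `DRY_RUN` and the `run` helper are hypothetical additions; the commands themselves are taken from the slide.

```shell
#!/bin/sh
# Dry-run wrapper for the CCSM3 steps on the slide above (sketch).
# ASSUMPTIONS: ROOTDIR points at the CCSM3 tree; DRY_RUN=1 (the default) only
# prints the commands; DRY_RUN=0 executes them and stops at the first failure.
ROOTDIR="${ROOTDIR:-/path/to/ccsm3}"   # placeholder path
CASE=dan
MACH=totalview
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@" || exit 1
    fi
}

run "$ROOTDIR/scripts/create_newcase" -case "$CASE"
run "$ROOTDIR/scripts/$CASE/addmach" "$MACH"
run "$ROOTDIR/scripts/$CASE/configure" -mach "$MACH"
```

Printing before executing makes it easy to review the exact paths before changing anything in the source tree.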

  18. Defining a CCSM3 TotalView machine on babyblue (continued) Files created or affected in $ROOTDIR/models/{utils, bld}: • Changes to makefiles • Changes to Macros.AIX: set FFLAGS = -g -qsmp=omp:noopt
