Parallel Debugging with TotalView on Blue Horizon. Yifeng Cui and Laura C. Carrington yfcui@sdsc.edu San Diego Supercomputing Center. Overview. Parallel Debugging What is Totalview Availability of Totalview How to compile for TotalView How to run TotalView on Blue Horizon
Overview
• Parallel Debugging
• What is TotalView
• Availability of TotalView
• How to compile for TotalView
• How to run TotalView on Blue Horizon
• Navigating TotalView
• Available Documentation
• Lab Session
Parallel Debugging: Why Use a Debugger?
• To find out where the program crashed
• To gain a better understanding of what the program is doing
• To inspect the values of a distributed array
• Use it as a last resort
Parallel Debugging
• All the problems of serial programming
• Plus:
  • Increased difficulty in verifying program correctness
  • Increased difficulty in debugging N parallel processes
  • New parallel problems:
    • Deadlock
    • Race conditions
    • Irreproducibility
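To make the race condition and irreproducibility items concrete, here is a minimal Python sketch. It is not taken from the slides: it simulates two processes that each read a shared counter and then write back the value plus one, and it enumerates every possible interleaving of those four steps. The names run_schedule, all_schedules and outcomes are hypothetical, chosen only for this illustration.

```python
from itertools import permutations


def run_schedule(schedule):
    """Run two simulated processes; each does 'read counter', then 'write counter + 1'.

    schedule is a tuple of process ids (0 or 1) giving the order in which
    the four steps execute.
    """
    counter = 0
    seen = {}                  # value each process read from the counter
    next_step = {0: 0, 1: 0}   # 0 = read is next, 1 = write is next
    for pid in schedule:
        if next_step[pid] == 0:
            seen[pid] = counter          # read the shared value
        else:
            counter = seen[pid] + 1      # write back a possibly stale value
        next_step[pid] += 1
    return counter


def all_schedules():
    """Every distinct ordering of the four steps (two per process)."""
    return sorted(set(permutations([0, 0, 1, 1])))


def outcomes():
    """Map each schedule to the final counter value it produces."""
    return {schedule: run_schedule(schedule) for schedule in all_schedules()}


if __name__ == "__main__":
    for schedule, final in outcomes().items():
        print(schedule, "-> counter =", final)
```

Only the two schedules in which one process finishes before the other starts leave the counter at 2; the other four lose an update and leave it at 1. A real run picks one of these schedules according to timing, which is why the same program can give different answers from one run to the next.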
Parallel Debugging: Parallel Debuggers
• Most vendor debuggers have some support for parallel programs
• Conventional debuggers such as Unix dbx/gdb/adb may be of little value
• Debugging parallel programs is hard but possible
• A parallel debugger is expected to:
  • Be portable across different platforms
  • Show information without requiring typed commands
  • Debug multiprocess/multithreaded programs
  • Automatically detect and attach to running processes
What is TotalView?
• Parallel debugger
• Source-level debugging for C, C++, F77, F90, HPF
• MPI, OpenMP, Pthreads, PVM
• SMPs, MPPs, PVPs, clusters
• Available on all major Unix platforms and most supercomputers
• GUI (platform-independent, except on Cray T3E)
  • TotalView 4.x on Tcl/Tk
  • TotalView 5.x on Motif
Availability of TotalView
• Compaq Digital Alpha
• HP-UX
• IBM RS6000 and SP Power
• SGI MIPS
• Sun SPARC, SunOS 5
• Linux Intel IA32 (Red Hat)
• Linux Alpha (Red Hat)
• Cray T3E by Cray
• Hitachi SR2201 by sofTek, SR8000
• NEC SX-4 by sofTek, SX-5 beta
How To Compile
Just add the "-g" flag to the compiler command:
    mpxlf90 -g stf_01.f
    mpcc -g stc_01.c
How to Run TotalView
• On a single process:
    % totalview myprog -a [args]
• To debug an IBM POE program:
    % totalview poe -a myprog [args]
• Another way to start TotalView:
    % totalview
  This brings up the Root window; select the File menu, then New Program
• To start the TotalView Command Line Interface:
    % totalviewcli
How to Run TotalView
• Create a "runme" script with LoadLeveler information, similar to the following:
    #! /usr/bin/csh -f
    setenv MP_RMPOOL 1
    setenv MP_TASKS_PER_NODE 2
    setenv MP_NODES 1
    setenv MP_EUILIB ip
    setenv MP_EUIDEVICE en0
    setenv MP_CPU_USAGE unique
    setenv MP_SHARED_MEMORY yes
    setenv MP_NODE_USAGE not_shared
    totalview poe -a a.out
  The last line starts TotalView.
• Launch TotalView by running the "runme" script.
• The same LoadLeveler keywords given as interactive command options:
    poe a.out -nodes 1 -tasks_per_node 2 -rmpool 1 \
        -euilib ip -euidevice en0
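The slide shows the same job settings twice, once as MP_* environment variables and once as poe command options. The Python sketch below spells out that correspondence for the five pairs that appear in both forms. The pairing is inferred from the matching names on the slide, and the helper poe_command and the table SLIDE_PAIRS are hypothetical names used only here. Variables whose option form the slide does not show (MP_CPU_USAGE, MP_SHARED_MEMORY, MP_NODE_USAGE) are reported as skipped rather than guessed.

```python
# The five variable/option pairs that appear in both forms on the slide.
# Anything not listed here is deliberately left untranslated.
SLIDE_PAIRS = {
    "MP_NODES": "-nodes",
    "MP_TASKS_PER_NODE": "-tasks_per_node",
    "MP_RMPOOL": "-rmpool",
    "MP_EUILIB": "-euilib",
    "MP_EUIDEVICE": "-euidevice",
}


def poe_command(program, settings):
    """Build a poe command line from MP_* settings.

    Returns (command_string, skipped_names), where skipped_names lists the
    settings that have no option form shown on the slide.
    """
    parts = ["poe", program]
    for name, option in SLIDE_PAIRS.items():
        if name in settings:
            parts += [option, str(settings[name])]
    skipped = [name for name in settings if name not in SLIDE_PAIRS]
    return " ".join(parts), skipped


if __name__ == "__main__":
    runme_settings = {
        "MP_RMPOOL": 1,
        "MP_TASKS_PER_NODE": 2,
        "MP_NODES": 1,
        "MP_EUILIB": "ip",
        "MP_EUIDEVICE": "en0",
        "MP_CPU_USAGE": "unique",
    }
    command, skipped = poe_command("a.out", runme_settings)
    print(command)
    print("left as environment variables:", skipped)
```

With the values from the "runme" script, this produces the option list shown on the slide; the skipped settings would still have to be supplied through the environment.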
Navigating TotalView
[Figure: the TotalView windows: Root Window, Process Window, Unattached Processes Window, Data Windows]
Navigating TotalView: Process Window
• Process and thread motion buttons
• Stack Trace pane
• Local variables for the selected frame
• Source pane
• Thread pane
• Action Points pane
Navigating TotalView: Root Window
• Process name
• Process ID
• Expand list
• Number of threads
• Thread list (tid/systid)
• Process/thread status:
  • B: Breakpoint
  • R: Running
  • T: Stopped
  • E: Error
Navigating TotalView: Mouse Buttons
• Left button is Select:
  • Chooses an item of interest, or
  • Starts editing an item
• Middle button is Menu:
  • Raises a menu of actions you can perform
  • All menus have a Help (^?) entry
• Right button is Dive:
  • Gets more information about an item
  • Shift+Dive forces open a new window
Navigating TotalView: Center Button
• Use the center mouse button to pop up menus in any of the windows
• In the source pane, select "Go Group" to start running
Navigating TotalView: Left Button
• A gridded box marks a possible site for a breakpoint; select it to set one
• [Figure labels: current function and source file, current point of execution, breakpoint]
Navigating TotalView: Action Points
• Breakpoints: stop execution of the processes and threads that reach them
• Barrier Breakpoints: hold each thread and process that reaches them until all threads and processes in the group have reached them
• Evaluation Points: cause a code fragment to execute when they are reached
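The hold-until-everyone-arrives behaviour of a barrier breakpoint can be pictured with an ordinary thread barrier. The Python sketch below is an analogy only and says nothing about how TotalView implements the feature: it uses the standard library's threading.Barrier, and the function name run_with_barrier is hypothetical. Each worker records when it reaches the barrier and when it is released.

```python
import threading


def run_with_barrier(n_workers=4):
    """Start n_workers threads that all stop at one barrier.

    Returns the list of ("reached" | "released", rank) events in the order
    they happened.
    """
    events = []
    events_lock = threading.Lock()           # protects the shared event list
    barrier = threading.Barrier(n_workers)   # releases only when all have arrived

    def worker(rank):
        with events_lock:
            events.append(("reached", rank))
        barrier.wait()                       # held here until every worker arrives
        with events_lock:
            events.append(("released", rank))

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


if __name__ == "__main__":
    for kind, rank in run_with_barrier():
        print(kind, rank)
```

No "released" event can appear before the last "reached" event, whatever order the threads happen to run in. An ordinary breakpoint, by contrast, stops each thread as it arrives without waiting for the rest of the group.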
Navigating TotalView: Right Button
• "Dive" into a function or subroutine (view its source) by right-clicking on the routine name
• The source pane then displays the code for the routine, here "do_jacobi"
• Left-click to return to the main routine's source
Navigating TotalView: Right Button
• "Dive" into data (view its values) by right-clicking on an array name
• A new Data Window opens showing the array values
• Arrays have a "Slice" field that you can edit to specify which dimensions to display
Navigating TotalView: Right Button
• In the new Data Window, use the center mouse button to select the "Visualize" menu option, which pops up a graph of the data
• Click on the image with the center mouse button to rotate it
Navigating TotalView: Customize TotalView
Add lines to your .Xdefaults file such as:
    totalview*searchPath: /my/src/dir1,/my/src/dir2
    totalview*parallelAttach: {yes | no | ask}
    totalview*sourcePaneTabWidth: n
    totalview*font: fontname
    Visualize*graph.height: height
    …
To load the X resource file:
    xrdb -load $HOME/.Xdefaults
Documentation
• NPACI Blue Horizon documentation
  http://www.npaci.edu/BlueHorizon
• NPACI Blue Horizon Tools Page
  http://www.npaci.edu/BlueHorizon/guide_linked/bh_tools_txt.html
• Etnus Web Page: Getting Started with TotalView
  http://www.etnus.com/Products/TotalView/started/getting_started.html
• Etnus Web Page: TotalView User's Guide
  http://www.etnus.com/pub/totalview/tv4.1.0/doc/User_Guide.pdf
  http://www.etnus.com/Support/docs/online_doc/user_guide/index.html
Lab Session for TotalView: Environment Setup
Setup for running X-windows applications on PCs:
1. Log in to b80login.sdsc.edu using CRT (located in Applications common).
2. Launch Exceed (located either in Applications common or as a shortcut on your desktop called "Humming Bird").
3. Set your environment. For csh:
     setenv DISPLAY t-wolf.sdsc.edu:0.0
   where "t-wolf" is the name of the PC you are using.
4. Copy the files from the Tools_examples directory into your own working space:
   * Create a directory for working with TotalView and Xprofiler:
       mkdir Tools
   * Change into the new directory:
       cd Tools
   * Copy the files into the new directory:
       cp /work/Training/Tools_examples/* .
NOTE: On a 2-button mouse, click the right and left buttons together to act as the center mouse button.
Lab Session for TotalView: Running TotalView
1. Compile either the Fortran or the C example (st_01) with one of the following:
     mpxlf90 -g stf_01.f
     mpcc -g -lm stc_01.c
2. Launch TotalView using the TotalView_runme script:
     tf004i% TotalView_runme
3. After the script launches, in the main frame of the Process window (the largest frame of the largest window), use the center mouse button to select the "Go Group" menu item.
4. You will be prompted with the following (NOTE: this may take 2-5 minutes while LoadLeveler searches for available CPUs):
     "Process poe has started the parallel tasks. Do you want to stop the parallel task before they enter MAIN?"
   Select "Yes".
5. After a few minutes the Process window should show the code. Using the left mouse button, place a breakpoint after the do_jacobi call in the main loop.
6. Use the center mouse button in the Process window to select the "Go Group" menu item. This runs the code to the breakpoint. Use the right mouse button to dive into the "do_jacobi" routine and also into the "psi" array.
7. Continue to explore TotalView. When you are done, exit TotalView by using the center mouse button in the Root Window to select "Quit Debugger".