A New Parallel Debugger for Franklin: DDT
Katie Antypas
User Services Group
kantypas@lbl.gov
NERSC User Group Meeting
September 17, 2007
Outline
• Parallel debugger usage at NERSC
• Comparison of Totalview and Allinea DDT
• Selecting a parallel debugger for NERSC: Allinea DDT
  • Functionality
  • License model and Price
• Current Status
  • Acceptance Testing
  • User availability
Motivation
Because parallel debuggers are valuable yet expensive tools for HPC centers, we surveyed actual debugger usage at NERSC on Seaborg and Bassi to see whether resources can be used more efficiently.
Totalview Usage on Seaborg and Bassi
[Figure: two charts. Panel titles: "Number of times users have run Totalview on Bassi in the past 18 months" and "Number of times users have run Totalview on Seaborg in the past year". Axis label: Number of times.]
• 27 users ran Totalview fewer than 5 times
• 23 users ran Totalview between 10 and 25 times
Totalview usage
• Very roughly 15-20% of active users have run Totalview
• Functionality requested is basic
  • Find cause for crashes and code hangs
  • Examine variables across processors
  • Users typically aren’t using Totalview for analysis
• Users are running at lower concurrencies than we expected
  • Many users debug codes locally and run in production mode at NERSC
  • In many codes an error at 512 processors can be detected at 32 processors
  • Totalview runs interactively and users must wait a longer time for more nodes
  • Debuggers can run slowly at 256+ processors
• Rarely were all licenses checked out
Another Debugger in the Market: Allinea Software’s DDT
• DDT (Distributed Debugging Tool)
• Some HPC Customers
  • Lawrence Livermore National Lab (LLNL)
  • Texas Advanced Computing Center (TACC)
  • Barcelona Supercomputing Center (BSC)
  • Leibniz Computing Center (LRZ)
  • HPC Center Stuttgart (HLRS)
  • CEA, IPGP, ONERA - France
  • CINECA, CASPUR - Italy
  • AWE, RAL - UK
• Spring 2007: tested DDT on NERSC platforms
  • Low learning curve for Totalview users
  • Basic debugging functionality worked as expected
  • Found some bugs, all on AIX
  • Responsive developers
• Viable alternative to Totalview
• Created an RFP to get best response from vendors
Weighing the Debuggers ...
Totalview
• Established company and technology with large market share
• Totalview debugger ported to most platforms and tested on many codes
• Full featured parallel debugger with advanced features such as debugging with multiple executables, GAS languages, sophisticated analysis tools
• Inflexible license server model
• Expensive
Allinea DDT
• Younger company, established market in Europe but smaller American presence
• Basic Parallel Debugging functionality
• Linux strongest supported operating system. (Increasing support for AIX)
• Responsive developers
• Flexible license model
• Lower price
DDT Licensing Model and Price
• Flexible model
  • 1024 processors
  • Can be divided any way
    • One 1024 processor job
    • Two 512 processor jobs
    • One 512, one 256, four 64 processor jobs
• Significantly cheaper than Totalview
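The arithmetic behind "can be divided any way" can be checked with a short sketch. This is only an illustration of the processor-pool idea from the slide, not a description of how the DDT license server works; the name `fits_license` and the constant `LICENSE_POOL` are hypothetical.

```python
LICENSE_POOL = 1024  # processors covered by the license, as stated on the slide


def fits_license(job_sizes, pool=LICENSE_POOL):
    """Return True if the concurrent debug jobs together fit in the pool."""
    if any(size <= 0 for size in job_sizes):
        raise ValueError("job sizes must be positive")
    return sum(job_sizes) <= pool


# The three splits listed on the slide; each adds up to exactly 1024.
examples = {
    "one 1024": [1024],
    "two 512": [512, 512],
    "one 512, one 256, four 64": [512, 256, 64, 64, 64, 64],
}
for label, jobs in examples.items():
    print(label, sum(jobs), fits_license(jobs))
```

Under this model any mix of job sizes is acceptable as long as the total stays at or below 1024 processors.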
DDT Functionality
• Parallel Debugger
  • Support for MPI, OpenMP, pthreads
  • Fortran, C, C++
• Typical serial debugging features
  • Set breakpoints and watches, step through program, dive into arrays, evaluate expressions, analyze core files
• Parallel debugging features
  • Step through processors
  • View variables across processors
  • Grouping processors
  • Parallel Stack View
• Other Features
  • Memory Debugging
  • Visualization Tools
User Interface
Parallel Stack View
[Figure: Parallel Stack View screenshot, with labels 8, 8, 8, 8, 8, 8, 2]
• Allows user to see the position of each processor in the code in the same window
• Essentially groups processors by location in code, the only reasonable strategy at high concurrencies
• Can easily find a stray processor
• Can create sub-groups of processors
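The grouping idea can be sketched in a few lines. This is a minimal illustration of grouping ranks by their call stack, not DDT's implementation; the functions `group_by_stack` and `stray_ranks`, the frame names, and the sample data are all made up for the example.

```python
from collections import defaultdict


def group_by_stack(stacks):
    """Map each distinct call stack (a tuple of frames) to the ranks sitting in it."""
    groups = defaultdict(list)
    for rank, frames in stacks.items():
        groups[tuple(frames)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in groups.items()}


def stray_ranks(groups):
    """Return the ranks that are not in the most populated group."""
    largest = max(groups.values(), key=len)
    return sorted(r for ranks in groups.values() if ranks is not largest for r in ranks)


# Hypothetical snapshot: 8 ranks, 7 waiting in a collective, 1 still computing.
stacks = {rank: ["main", "solve", "MPI_Allreduce"] for rank in range(8)}
stacks[5] = ["main", "solve", "update_cells"]

groups = group_by_stack(stacks)
for stack, ranks in groups.items():
    print(f"{len(ranks)} ranks at {' > '.join(stack)}: {ranks}")
print("stray:", stray_ranks(groups))
```

Because the output has one line per distinct stack rather than one per rank, it stays readable as the rank count grows, which is the point the slide makes about high concurrencies.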
Current Status
• Acceptance testing DDT on Franklin
  • Running 5-6 codes with DDT at various concurrencies
  • Testing MPI, OpenMP, Fortran, C, C++, mixed-mode applications
• Demo on Thursday
• Available for users to try
  • Please let us know if you have any problems
• We are excited to have DDT on Franklin and think it is good for the HPC community to have options in parallel debugging
Questions?