Breakout Group: Debugging
David E. Skinner and Wolfgang E. Nagel
IESP Workshop 3, October, Tsukuba, Japan
Exascale Debugging
• Debugging: finding problems in the execution of code, that is, identifying and dealing with sources of:
  • incorrectness (application and architecture)
  • application failure (deadlock, hang, segfault)
  • critical application bottlenecks (standstill, performance cliff)
• Exascale issues
  • Expense of debugging at extreme concurrency
  • Scalability of debugger methodologies (data and interfaces)
  • Frequency of errors/failures that grows with concurrency
  • Heterogeneity and lightweight operating systems
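The point that failure frequency grows with concurrency can be made concrete with a back-of-the-envelope estimate. This assumes, as an idealization rather than a claim from the workshop, that node failures are independent and exponentially distributed with a common per-node rate. The system fails when any node fails, so the rates add. The numbers in the last step (a per-node MTBF of 10 years, about 87,600 hours, and 10^5 nodes) are purely illustrative.

```latex
\lambda_{\mathrm{sys}} = N\,\lambda_{\mathrm{node}}
\qquad\Longrightarrow\qquad
\mathrm{MTBF}_{\mathrm{sys}} = \frac{1}{N\,\lambda_{\mathrm{node}}} = \frac{\mathrm{MTBF}_{\mathrm{node}}}{N},
\qquad
\frac{87{,}600\ \mathrm{h}}{10^{5}} = 0.876\ \mathrm{h} \approx 53\ \mathrm{min}
```

Under these assumptions a machine built from very reliable nodes still fails roughly hourly, which is why a debugger at this scale should expect faults during the debugging session itself.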
Exascale Trends relevant to debugging
To which broad exascale trends is debugging related?
• Concurrency ✓
• Reliability ✓
• Power Costs
• Heterogeneity in a node ✓
• I/O and memory: ratios and breakthroughs
What’s different about exascale debugging?
• Because many things can and will go wrong at the same time, faults and problems will require triage, filtering, and clustering
• Focus on multi-level debugging: communicating details of faults between software layers
• Synthesis of fault information into understanding, in the context of both application and architecture
• Simulation of concurrency when possible
• Excision of buggy code snippets so they can be run at lower concurrency
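Clustering faults for triage can be illustrated with a much simplified, exact-match version of the idea behind stack-trace merging tools such as LLNL's Stack Trace Analysis Tool (STAT). The sketch below assumes each rank's call stack has already been collected as a tuple of frame names; the function names `cluster_by_stack` and `compress_ranks` and the sample stacks are hypothetical. Ranks with identical stacks collapse into one class, reported as compact rank ranges, with the smallest classes first because outliers are usually where triage starts.

```python
from collections import defaultdict


def compress_ranks(ranks):
    """Render ranks as compact ranges, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    parts = []
    start = prev = None
    for r in sorted(ranks):
        if start is None:
            start = prev = r
        elif r == prev + 1:
            prev = r  # extend the current contiguous run
        else:
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = r
    if start is not None:
        parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def cluster_by_stack(stacks):
    """Group ranks whose call stacks are identical.

    stacks: dict mapping rank -> tuple of frame names (outermost first).
    Returns a list of (stack, sorted_ranks), smallest group first.
    """
    groups = defaultdict(list)
    for rank, stack in stacks.items():
        groups[tuple(stack)].append(rank)
    return sorted(((s, sorted(r)) for s, r in groups.items()),
                  key=lambda item: (len(item[1]), item[1][0]))


# Hypothetical snapshot: 8 ranks wait in MPI_Allreduce, rank 5 is still computing.
stacks = {r: ("main", "solve", "MPI_Allreduce") for r in range(8)}
stacks[5] = ("main", "solve", "compute_flux")
for stack, ranks in cluster_by_stack(stacks):
    print(f"{len(ranks):>4} ranks [{compress_ranks(ranks)}]: {' > '.join(stack)}")
```

The output size depends on the number of distinct behaviors rather than the number of processes, which is what makes this kind of view usable at high concurrency.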
Debugging Priority Research Direction
Key challenges / Summary of research direction
• Basic challenge of concurrency (hard and expensive)
• Interoperability with compiler, library, runtime, OS, and I/O
• Debugging without stopping (resilient analysis of victim processes)
• Vertical integration of debug and performance information across software layers
• Layered contexts of debugging (just MPI, just I/O, or framework/application defined)
• Scalable clustering of application process states and contexts; filter/search within the debugger
• Automatically triggered debugging
Potential impact on software component / Potential impact on usability, capability, and breadth of community
• More eyes on debug information besides the person running the debugger
• Multi-layered debug histories become available to, and useful for, system-wide monitoring
• Debugging meets performance analysis
• Debugging informs system software
• Lower overhead of, and barriers to, debugging at large scale
• Debuggers begin to communicate user-level metrics, so debugging becomes more meaningful
• Greater certainty in the scientific validity of exascale computational results: trust
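Automatically triggered debugging without stopping the job can be sketched as a progress watchdog. This assumes each rank periodically reports a monotonically increasing progress counter (for example, iterations completed) to a monitor; the class `ProgressWatchdog`, its `timeout` parameter, and the reporting scheme are hypothetical, not an existing tool's API. Healthy ranks are never paused; the monitor only names suspected victim processes, which a debugger could then attach to.

```python
import time


class ProgressWatchdog:
    """Flag ranks whose progress counter has not advanced within `timeout` seconds.

    Hypothetical design: ranks call report() with a monotonically increasing
    counter; the watchdog never pauses any rank, it only lists suspects.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last = {}  # rank -> (highest counter seen, time it last advanced)

    def report(self, rank, counter):
        """Record a progress report; only an increase resets the rank's timer."""
        now = self.clock()
        prev = self.last.get(rank)
        if prev is None or counter > prev[0]:
            self.last[rank] = (counter, now)

    def stalled(self):
        """Return the sorted ranks with no progress for longer than timeout."""
        now = self.clock()
        return sorted(rank for rank, (_, t) in self.last.items()
                      if now - t > self.timeout)


# Usage with a fake clock: rank 2 stops advancing after its first report.
t = [0.0]
wd = ProgressWatchdog(timeout=30.0, clock=lambda: t[0])
for rank in range(4):
    wd.report(rank, 1)
t[0] = 20.0
for rank in (0, 1, 3):
    wd.report(rank, 2)
t[0] = 45.0
print(wd.stalled())  # [2]
```

What "trigger" means (logging, attaching a debugger, requesting a core dump) is left to the caller. One limitation of this sketch: a rank that never reports at all is invisible, so a real monitor would be seeded with the full set of expected ranks.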
[Figure: Roadmap for exascale debugging. Scale of debugging plotted against year, 2010 to 2019. Labelled milestones: Planning & Workshops; LWDB @ 1e5 cores; Simulation @ 1e6 cores; Breakthroughs needed for 1e6-core production debug; Near-production exascale.]
4.x Debugging narrative
• Technology drivers
• Alternative R&D strategies
• Recommended research agenda
• Crosscutting considerations
Roadmap sections on debugging tools
• Technology drivers for debugging
• Alternative R&D strategies for debugging
• Recommended research agenda for debugging
+ Identify cross-cutting considerations and connections (compilers, resiliency, and performance)
+ Identify key regional interests, expertise, and resources
State of the art
• Debuggers scale to 10K processes
• Vendors are developing solutions for new debugging contexts (memory, communication, etc.)
• Some progress in clustering and data aggregation
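Data aggregation at scale is often done hierarchically, for example over tree-based overlay networks such as MRNet, so that no single front end has to ingest one message per process. The sketch below is a hypothetical, in-memory stand-in for that pattern: each leaf holds a map from a state signature (here a call-stack tuple) to the set of ranks in that state, and partial results are merged level by level through a tree with a chosen fanout. The names `merge` and `tree_reduce` are illustrative only.

```python
from collections import defaultdict


def merge(groups_list):
    """Union several {signature: set_of_ranks} maps into one."""
    out = defaultdict(set)
    for groups in groups_list:
        for sig, ranks in groups.items():
            out[sig] |= ranks
    return dict(out)


def tree_reduce(leaves, fanout=2):
    """Merge leaf maps level by level through a fanout-ary tree.

    Returns (merged_map, number_of_levels). Each level divides the number of
    partial results by roughly `fanout`, so depth grows like log_fanout(N).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(leaves)
    depth = 0
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        depth += 1
    return (level[0] if level else {}), depth


# Hypothetical 8-rank snapshot: rank 5 is in a different state from the rest.
A = ("main", "solve", "MPI_Allreduce")
B = ("main", "solve", "compute_flux")
leaves = [{A: {r}} for r in range(8)]
leaves[5] = {B: {5}}
merged, depth = tree_reduce(leaves, fanout=2)
print(depth, sorted(merged[A]), sorted(merged[B]))
```

Because identical signatures are merged at every level, the size of each partial result is bounded by the number of distinct states rather than by the number of ranks below that node, and a wider fanout trades fewer levels for more work per node.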