
HPCToolkit: Advanced Performance Analysis Platform for Complex Applications

HPCToolkit is a set of tools for identifying performance bottlenecks in complex applications. It aims to give programmers accessible tools for understanding application performance, and it supports large, multi-lingual applications on several platforms. Its features include platform, language, and compiler independence, scalable data collection, and effective presentation of analysis results.


Presentation Transcript


  1. HPCToolkit: Multi-platform Tools for Performance Analysis. John Mellor-Crummey, Robert Fowler, Nathan Tallent, Gabriel Marin. Department of Computer Science, Rice University. Los Alamos Computer Science Institute. http://www.hipersoft.rice.edu/hpctoolkit/

  2. The Big Picture • Long-term: compiler and architecture research requires detailed performance understanding • identify performance bottlenecks in complex applications • understand the mismatch between application needs and architecture capabilities • automate strategies for performance improvement • Short-term result: programmer-accessible tools for understanding application performance

  3. Performance Analysis and Tuning • Increasingly necessary • gap between typical and peak performance is growing • Increasingly hard • complex architectures are harder to program effectively • deeply-pipelined microprocessors • VLIW or superscalar • complex memory hierarchy • non-blocking, multi-level caches • large-scale scientific applications pose challenges for tools

  4. LACSI HPCToolkit • Support large, multi-lingual applications • a mix of Fortran, C, C++ • hundreds of thousands of lines, many procedures • external libraries • Eliminate manual labor from the run, analyze, tune cycle • use optimized application binaries directly • no manual instrumentation, build process changes, or recompilation • Platform, language, and compiler independence • emphasis on LANL ASC platforms (Origin, AlphaServer, Opteron) • multiple data sources → cross-platform comparisons • Scalable data collection • Effective presentation of analysis results • intuitive, top-down user interface • hierarchical program structure with loop-level metrics

  5. HPCToolkit System Overview [Workflow diagram; stage labels: application source, compilation, linking, binary object code, profile execution, performance profile, binary analysis, program structure, interpret profile, source correlation, hyperlinked database, hpcviewer]

  6. HPCToolkit System Overview [Workflow diagram repeated from slide 5] • launch unmodified, optimized application binaries • collect statistical profiles of events of interest
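The idea of a statistical profile on slide 6 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the slide does not say how samples are taken, so the code assumes one sample is recorded every `period` events and charged to the instruction address executing at that moment. The function name `sample_profile` and the trace format are hypothetical and are not part of HPCToolkit.

```python
from collections import Counter


def sample_profile(trace, period):
    """Build a histogram of samples from a simulated instruction trace.

    trace  -- iterable of (address, events) pairs: an executed instruction
              and the number of events (for example cycles) it generated.
              This format is an assumption made for illustration.
    period -- number of events between two consecutive samples.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    histogram = Counter()
    pending = 0  # events accumulated since the last sample
    for address, events in trace:
        pending += events
        # Each time the event counter crosses the period, charge one
        # sample to the instruction that was executing.
        while pending >= period:
            histogram[address] += 1
            pending -= period
    return histogram


if __name__ == "__main__":
    # A three-instruction loop body executed ten times (made-up costs).
    trace = [(0x400, 3), (0x404, 1), (0x408, 6)] * 10
    for address, samples in sorted(sample_profile(trace, period=4).items()):
        print(hex(address), samples)
```

The sample counts only approximate the true event distribution; a shorter period gives a closer approximation at the price of more samples to record.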

  7. HPCToolkit System Overview [Workflow diagram repeated from slide 5] • decode instructions and combine with profile data

  8. HPCToolkit System Overview [Workflow diagram repeated from slide 5] • extract loop nesting information from executables
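Slide 8 does not say how loop nesting is recovered from an executable. One standard compiler technique, shown here purely as an illustration and not as HPCToolkit's actual method, is to build a control-flow graph of basic blocks, compute dominators, and treat every edge whose target dominates its source as a loop back edge. The graph encoding (a dict from block name to successor list, with every block present as a key) and all function names are assumptions.

```python
def predecessors(cfg):
    """Invert the successor map of a control-flow graph."""
    preds = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)
    return preds


def dominators(cfg, entry):
    """Iteratively compute the set of dominators of every block."""
    preds = predecessors(cfg)
    dom = {node: set(cfg) for node in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for node in cfg:
            if node == entry:
                continue
            incoming = [dom[p] for p in preds[node]]
            new = {node} | (set.intersection(*incoming) if incoming else set())
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


def natural_loops(cfg, entry):
    """Map each loop header to the set of blocks in its natural loop."""
    dom = dominators(cfg, entry)
    preds = predecessors(cfg)
    loops = {}
    for tail, succs in cfg.items():
        for head in succs:
            if head not in dom[tail]:
                continue  # not a back edge
            body = loops.setdefault(head, {head})
            stack = [tail]
            while stack:  # walk backwards from the tail up to the header
                node = stack.pop()
                if node not in body:
                    body.add(node)
                    stack.extend(preds[node])
    return loops


def nesting_depth(loops):
    """Count how many loops contain each block."""
    blocks = set().union(*loops.values())
    return {b: sum(b in body for body in loops.values()) for b in blocks}


if __name__ == "__main__":
    # Hypothetical graph: an outer loop headed by A containing an inner loop headed by B.
    cfg = {"entry": ["A"], "A": ["B", "exit"], "B": ["C", "D"],
           "C": ["B"], "D": ["A"], "exit": []}
    print(nesting_depth(natural_loops(cfg, "entry")))
```

In this example blocks B and C sit at depth 2 and blocks A and D at depth 1, which is the kind of hierarchy a viewer can attach loop-level metrics to.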

  9. HPCToolkit System Overview [Workflow diagram repeated from slide 5] • synthesize new metrics by combining metrics • relate metrics, structure, and program source
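Synthesizing a metric from existing ones, as slide 9 describes, amounts to applying a formula to the measured values of each program scope. The sketch below assumes metrics are held in a dict keyed by scope name; the scope names, metric names, numbers, and the helpers `safe_ratio` and `add_derived` are all made up for illustration and do not reflect HPCToolkit's own data format.

```python
def safe_ratio(numerator, denominator):
    """Divide two metric values, returning None when the denominator is zero."""
    return numerator / denominator if denominator else None


def add_derived(metrics, name, formula):
    """Return a copy of per-scope metrics with one extra derived column.

    metrics -- dict mapping a scope (procedure, loop, line) to its raw metrics
    name    -- name of the new metric
    formula -- function computing the new value from one scope's raw metrics
    """
    return {scope: {**values, name: formula(values)}
            for scope, values in metrics.items()}


if __name__ == "__main__":
    # Hypothetical raw measurements for two scopes.
    raw = {
        "solve (loop at line 120)": {"cycles": 8000, "flops": 2000},
        "setup": {"cycles": 1000, "flops": 0},
    }
    derived = add_derived(raw, "cycles_per_flop",
                          lambda m: safe_ratio(m["cycles"], m["flops"]))
    for scope, values in derived.items():
        print(scope, values)
```

A ratio such as cycles per floating-point operation is often more telling than either raw count, because it points at scopes that use the machine inefficiently rather than scopes that merely run for a long time.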

  10. HPCToolkit System Overview [Workflow diagram repeated from slide 5] • support top-down analysis with interactive viewer • analyze results anytime, anywhere

  11. HPCViewer Screenshot [Screenshot with labeled panes: annotated source view, navigation, metrics]

  12. Impact on LANL Code Teams • HPCToolkit deployed on Origin • improved SAGE by 2x on one example (see next slide) • First performance workshop (Feb 03) • Feedback: needed on Q, smaller DB on large codes • Improvements: Sophisticated support for Alpha/Tru64 platform, new Java browser using compact database • Second performance workshop (July 03) • Feedback: ready to use, binary analysis too slow on large codes • Improvement: sped up binary analysis on large codes by 30x • HPCToolkit deployed on secure machines (July 03) • used to evaluate FLAG for ASCI burn code review (Aug 03) • Ongoing interactions • Feedback: better support for shared libraries and Opteron • Improvement: new support for shared libraries installed on Q • Ongoing work: LANL/Rice collaboration for Opteron support

  13. Sage Solver Performance Improvement

  14. Future • Collect and present dynamic context • what path gets us to expensive computations • accurate call-graph profiling of unmodified binaries • analysis and presentation of dynamic context to explain performance • solver is slow only when called on non-preconditioned matrices • MPI wait cost is incurred in the backsolve • Statistical clustering • effective analysis of large collections of processes • Performance diagnosis • why rather than what
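The "dynamic context" goal on slide 14 can be made concrete with a calling context tree, a structure that keeps costs separate for each call path instead of merging them per procedure. The sketch below is an assumption-laden illustration, not the planned HPCToolkit design: it assumes each sample carries a full call stack and a cost, and the procedure names and numbers are invented.

```python
class ContextNode:
    """One procedure as reached through one particular call path."""

    def __init__(self, name):
        self.name = name
        self.inclusive = 0   # cost of this context and everything it calls
        self.children = {}   # callee name -> ContextNode


def build_cct(samples):
    """Merge (call_stack, cost) samples into a calling context tree.

    Each call_stack lists frames from the outermost caller to the
    innermost callee.
    """
    root = ContextNode("<root>")
    for stack, cost in samples:
        root.inclusive += cost
        node = root
        for frame in stack:
            if frame not in node.children:
                node.children[frame] = ContextNode(frame)
            node = node.children[frame]
            node.inclusive += cost
    return root


def cost_of(root, path):
    """Inclusive cost of one call path, or 0 if it was never sampled."""
    node = root
    for frame in path:
        node = node.children.get(frame)
        if node is None:
            return 0
    return node.inclusive


if __name__ == "__main__":
    # Hypothetical samples: the same solver reached through two callers.
    samples = [
        (("main", "solve_preconditioned", "solver"), 10),
        (("main", "solve_raw", "solver"), 90),
    ]
    cct = build_cct(samples)
    print(cost_of(cct, ["main", "solve_preconditioned", "solver"]))
    print(cost_of(cct, ["main", "solve_raw", "solver"]))
```

A flat profile would report a single total for `solver`; keeping the paths apart shows which caller accounts for most of it, which is the kind of explanation the slide's solver example asks for.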
