ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis Robert Bell, Allen D. Malony, Sameer Shende {bertie,malony,shende}@cs.uoregon.edu Department of Computer and Information Science Computational Science Institute / NeuroInformatics Center University of Oregon
Outline • Motivation • ParaProf Objectives • Related Work • ParaProf Features and Functionality • Examples • 512-processor SAMRAI execution • Interactive demonstration • Software engineering of ParaProf • Recent advancements • Future work • Concluding remarks
Motivation • Profiling is a well-known and broadly applied technique • Profiling tools are not the same • Different profile instrumentation and measurement • Sequential vs. parallel profiling • System-specific, proprietary, and incompatible • Complicates cross-platform performance studies • Slows development of portable, robust profile analysis • Increased detail and complexity of profile data • Hardware performance counters • Integration of system and application performance data • Parallel profile data / analysis and large-scale parallelism
ParaProf Objectives • Portable, extensible, and scalable tool for profile analysis • Offer “best of breed” capabilities to performance analysts • Build as profile analysis framework for extensibility • Work with different (most) types of profile data • Support input of profile data from different sources • Universal performance profile analysis capabilities • Large-scale analysis and display support • Multi-profile (multi-experiment) • Programmable analysis • Modular, object-oriented software engineering • Broadly applied
Related Work • Rich history of sequential and parallel profiling tools • Sequential profilers • prof and gprof • Unix profiling of execution time using sampling method • gprof includes callgraph profiling (parent-child distribution) • cxperf and ssrun (SGI) • Hardware performance counter profiling • vprof (Visual Profiler) • DynaProf • PAPI-based profiling using dynamic instrumentation • HPCView • Support for multiple profile analysis
Related Work (continued) • Parallel profilers • GuideView and VGV • OpenMP applications (VGV also supports MPI profiling) • Proprietary • Aksum • Targeted to Linux systems with multiple experiment support • SvPablo • Cross-platform with source-based views • Expert • Trace-generated profile data • Performance property/problem analysis and display • HPM Toolkit
ParaProf Features • Parallel profile data • “Experiment” gives profile for every thread of execution • Multiple performance metrics (time, hardware performance counters, …) • Based on TAU performance system • Event-based profiles • Support for callpath profiles • Profile data input • Post-mortem from raw files • Post-mortem from performance database • Online from running program (in progress) • Multiple experiment profiles active simultaneously
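The idea that an experiment holds one profile per thread of execution, with several metrics per event, can be sketched as a small data structure. This is only an illustration: the class name `Experiment`, its methods, the (node, context, thread) key, and the metric labels are all assumptions made for the example, not ParaProf's or TAU's actual data model or file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key: (node, context, thread), following the TAU computation model.
ThreadId = Tuple[int, int, int]


@dataclass
class Experiment:
    """Illustrative container: profiles[thread][event][metric] -> value."""
    profiles: Dict[ThreadId, Dict[str, Dict[str, float]]] = field(default_factory=dict)

    def record(self, thread_id: ThreadId, event: str, metric: str, value: float) -> None:
        # Create the nested dictionaries on first use, then store the value.
        self.profiles.setdefault(thread_id, {}).setdefault(event, {})[metric] = value

    def value(self, thread_id: ThreadId, event: str, metric: str) -> float:
        return self.profiles[thread_id][event][metric]

    def metrics(self) -> List[str]:
        # Every metric name seen in any thread's profile, sorted.
        return sorted({m for events in self.profiles.values()
                       for per_metric in events.values()
                       for m in per_metric})


# Made-up sample values; the metric labels are illustrative only.
exp = Experiment()
exp.record((0, 0, 0), "main", "TIME", 12.5)
exp.record((0, 0, 0), "main", "PAPI_FP_OPS", 3.0e9)
exp.record((1, 0, 0), "main", "TIME", 13.1)
```

Keeping the thread identifier as the outermost key mirrors the slide's statement that an experiment gives a profile for every thread; a real tool may well organize the data differently.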
ParaProf Features (continued) • Profile analysis • Statistical analysis per thread and across threads • Individual events and event groups • Value-based and percent-based analysis • Derived statistics and distribution statistics for scalability • Experiment profile integration • Profile performance displays • Bargraph displays • Hyperlink navigation
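A minimal sketch of what statistical analysis across threads and value-based versus percent-based views might look like for a single event. The sample numbers, the function names, and the choice of population standard deviation are assumptions for illustration; the slides do not specify how ParaProf computes these statistics.

```python
from statistics import mean, pstdev

# Hypothetical exclusive times (seconds) for one event, keyed by thread.
exclusive_time = {0: 4.0, 1: 6.0, 2: 5.0, 3: 5.0}
# Hypothetical total time per thread, used for the percent-based view.
total_time = {0: 10.0, 1: 12.0, 2: 10.0, 3: 8.0}


def across_thread_stats(values):
    """Summarize one event's values over all threads."""
    vals = list(values.values())
    return {
        "min": min(vals),
        "max": max(vals),
        "mean": mean(vals),
        "stddev": pstdev(vals),  # population standard deviation (an assumption)
    }


def percent_of_total(values, totals):
    """Percent-based view: each thread's value relative to that thread's total."""
    return {t: 100.0 * values[t] / totals[t] for t in values}


stats = across_thread_stats(exclusive_time)
percent = percent_of_total(exclusive_time, total_time)
```

With these made-up inputs, thread 3 spends 62.5% of its time in the event even though its absolute value equals the mean, which is the kind of difference a percent-based display would surface.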
TAU Performance System Framework • Tuning and Analysis Utilities (aka Tools Are Us) • Performance system framework for scalable parallel and distributed high-performance computing • Targets a general complex system computation model • nodes / contexts / threads • Multi-level: system / software / parallelism • Measurement and analysis abstraction • Integrated toolkit for performance instrumentation, measurement, analysis, and visualization • Portable performance profiling/tracing facility • Open software approach
TAU Performance System Architecture [architecture diagram; surviving labels: Paraver, EPILOG]
Full Profile Display (SAMRAI) 512 processes
Profile Statistics Histogram (SAMRAI) • Need to address profile display scalability • Statistical analysis to show performance distributions • Value histogramming showing # threads in value range • Define # bins and value distribution function [Histogram panels: execution time (wallclock); floating point operations]
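Value histogramming, as described on this slide, counts how many threads fall into each value range. The sketch below assumes equal-width bins between the minimum and maximum value; the slide also mentions a configurable value distribution function, which is not modeled here. The function name and the sample times are hypothetical.

```python
def thread_histogram(values, num_bins):
    """Count how many threads fall into each of num_bins equal-width ranges."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal (zero range).
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical per-thread wallclock times (seconds) for one event.
times = [10.0, 10.5, 11.0, 12.0, 12.5, 14.0, 18.0, 20.0]
edges, counts = thread_histogram(times, 5)
print(counts)  # [3, 2, 1, 0, 2]
```

A display built this way stays the same size whether the run has 8 threads or 512, since only the number of bins determines how many bars are drawn.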
Recent ParaProf Enhancements • Integration of ParaProf with DynaProf • Convert DynaProf profile data to TAU format
Future Work • Profile translators • Sequential: prof/gprof (vprof), cxperf/ssrun • Parallel: SvPablo, Aksum, HPM Toolkit • Cross-experiment analysis • Generalized programmable analysis engine • Integration with online performance profiling in TAU • Online profile monitor in TAU currently available • Analysis of profiles generated from trace phase analysis • Trace-based phase profile tool in development • More sophisticated performance display graphics • Use 3D performance visualization library (in progress)
Concluding Remarks • ParaProf is a portable parallel profile analysis tool • ParaProf provides broad, integrated functionality • Designed to analyze and display large-scale profiles • Designed for multi-experiment performance studies • Intended to serve as a universal profile analysis system • Robust design and software engineering • Future work on extended analysis and visualization • Future work on performance database integration
More Information • TAU performance system www.cs.uoregon.edu/research/paraducks/tau • Acknowledgements • DOE project, “Performance Technology for Tera-Class Parallel Computers: Evolution of the TAU Performance System,” 2001-2004.