A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications

Kai Li, Allen D. Malony, Robert Bell, Sameer Shende {likai,malony,bertie,sameer}@cs.uoregon.edu Department of Computer and Information Science, Computational Science Institute, NeuroInformatics Center, University of Oregon

Outline • Problem description • Scaling and performance observation • Interest in online performance analysis • General online performance system architecture • Access models • Profiling issues and control issues • Framework for online performance analysis • TAU performance system • SCIRun computational and visualization environment • Experiments • Conclusions and future work

Problem Description • Need for parallel performance observation • Instrumentation, measurement, analysis, visualization • In general, there is the concern for intrusion • Seen as a tradeoff with accuracy of performance diagnosis • Scaling complicates observation and analysis • Issues of data size, processing time, and presentation • Online approaches add capabilities as well as problems • Performance interaction, but at what cost? • Tools for large-scale performance observation online • Supporting performance system architecture • Tool integration, effective usage, and portability

Scaling and Performance Observation • Consider “traditional” measurement methods • Profiling: summary statistics calculated during execution • Tracing: time-stamped sequence of execution events • More parallelism → more performance data overall • Performance specific to each thread of execution • Possible increase in number of interactions between threads • Harder to manage the data (memory, transfer, storage, …) • More parallelism / performance data → harder analysis • More time consuming to analyze • More difficult to visualize (meaningful displays) • Need techniques to address scaling at all levels
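The difference in data volume between the two measurement methods can be illustrated with a small sketch. This is not TAU code: the `Profiler` and `Tracer` classes, the `simulate` driver, and the fixed event durations are all hypothetical and chosen only to show that a profile stays at one summary entry per event while a trace grows with every event occurrence.

```python
from collections import defaultdict


class Profiler:
    """Profiling: keep only summary statistics (call count, total time) per event."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total = defaultdict(float)

    def record(self, event, duration):
        self.calls[event] += 1
        self.total[event] += duration


class Tracer:
    """Tracing: keep every enter/exit as a time-stamped record."""

    def __init__(self):
        self.records = []

    def record(self, timestamp, event, kind):
        self.records.append((timestamp, event, kind))


def simulate(n_iterations):
    """Run a made-up loop of two events per iteration on a simulated clock."""
    profiler, tracer = Profiler(), Tracer()
    clock = 0.0
    for _ in range(n_iterations):
        # Hypothetical events with fixed, illustrative durations.
        for event, duration in (("compute", 2.0), ("MPI_Recv", 0.5)):
            tracer.record(clock, event, "enter")
            clock += duration
            tracer.record(clock, event, "exit")
            profiler.record(event, duration)
    return profiler, tracer


if __name__ == "__main__":
    profiler, tracer = simulate(100)
    print("profile entries:", len(profiler.calls))
    print("trace records:  ", len(tracer.records))
```

In this sketch the profile size depends only on the number of distinct events, whereas the trace size is proportional to run length. Multiplying either by the number of threads gives a rough sense of why more parallelism means more data to manage.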

Why Complicate Matters with Online Methods? • Adds interactivity to performance analysis process • Opportunity for dynamic performance observation • Instrumentation change • Measurement change • Allows for control of performance data volume • Post-mortem analysis may be “too late” • View on status of long running jobs • Allow for early termination • Computation steering to achieve “better” results • Performance steering to achieve “better” performance • Online performance observation may be intrusive

General Online Performance Observation System [Figure: block diagram; component labels: Performance Instrument, Performance Measurement, Performance Data, Performance Control, Performance Analysis, Performance Visualization]

Models of Performance Data Access (Monitoring) • Push Model • Producer/consumer style of access and transfer • Application decides when/what/how much data to send • External analysis tools only consume performance data • Availability of new data is signaled passively or actively • Pull Model • Client/server style of performance data access and transfer • Application is a performance data server • Access decisions are made externally by analysis tools • Two-way communication is required • Push/Pull Models
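The two access models can be contrasted in a minimal sketch. Everything here is an assumption made for illustration: the class names, the in-process `queue.Queue` standing in for the transfer channel, and the `handle_request` method standing in for a real client/server protocol. None of it comes from TAU or SCIRun.

```python
import queue


class PushApplication:
    """Push model: the application decides when and what to send."""

    def __init__(self, channel, every):
        self.channel = channel    # stand-in for a file, socket, or pipe
        self.every = every        # application-chosen sending interval
        self.profile = {}

    def step(self, iteration, timings):
        for event, t in timings.items():
            self.profile[event] = self.profile.get(event, 0.0) + t
        # The producer, not the tool, chooses when data leaves the application.
        if iteration % self.every == 0:
            self.channel.put((iteration, dict(self.profile)))


class PullApplication:
    """Pull model: the application acts as a performance data server."""

    def __init__(self):
        self.profile = {}

    def step(self, timings):
        for event, t in timings.items():
            self.profile[event] = self.profile.get(event, 0.0) + t

    def handle_request(self, events):
        # The external tool chooses when to ask and which events it wants,
        # which is why two-way communication is needed.
        return {e: self.profile.get(e, 0.0) for e in events}


if __name__ == "__main__":
    channel = queue.Queue()
    pusher = PushApplication(channel, every=2)
    for i in range(1, 5):
        pusher.step(i, {"MPI_Recv": 1.0})
    while not channel.empty():
        print("pushed:", channel.get())

    server = PullApplication()
    server.step({"MPI_Recv": 1.0, "compute": 3.0})
    print("pulled:", server.handle_request(["compute"]))
```

A combined push/pull design would keep the periodic push but let the tool adjust `every` or the event selection through a request path like `handle_request`.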

TAU Performance System Architecture [Figure: architecture diagram; surviving labels: Paraver, EPILOG, ParaProf]

Online Profile Measurement and Analysis in TAU • Standard TAU profiling • Per node/context/thread • Profile “dump” routine • Context-level • Profile file per thread in context • Appends to profile file • Selective event dumping • Analysis tools access files through shared file system • Application-level profile “access” routine
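The dump-and-read pattern described on this slide (each thread appends profile samples to its own file, and an analysis tool reads them through a shared file system) might look roughly like the sketch below. It does not use TAU's API or file format: the function names, the `profile.<node>.<context>.<thread>` naming used here, and the one-JSON-object-per-line record layout are assumptions made so the example is self-contained.

```python
import json
import os
import tempfile


def dump_profile(directory, node, context, thread, sample_id, profile):
    """Append one profile sample to the per-thread file (hypothetical format)."""
    path = os.path.join(directory, f"profile.{node}.{context}.{thread}")
    with open(path, "a") as f:
        f.write(json.dumps({"sample": sample_id, "events": profile}) + "\n")
    return path


def read_new_samples(path, offset):
    """Read complete samples appended since byte `offset`.

    Returns (samples, new_offset). A trailing partial line is left for the
    next call, so a reader polling a file that is still being written does
    not consume a half-written sample.
    """
    samples = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a sample still being written
            samples.append(json.loads(line))
            offset += len(line)
    return samples, offset


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = dump_profile(d, 0, 0, 0, 1, {"MPI_Recv": 1.5})
        samples, offset = read_new_samples(path, 0)
        print(samples)
        dump_profile(d, 0, 0, 0, 2, {"MPI_Recv": 3.0})
        samples, offset = read_new_samples(path, offset)
        print(samples)
```

Tracking a byte offset per file is one simple way for a reader to pick up only new samples; with one file per thread, the number of offsets to track grows with the thread count.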

Online Performance Analysis and Visualization [Figure: data-flow diagram; labels: Application, TAU Performance System, performance data streams, file system, Performance Data Reader, Performance Data Integrator (sample sequencing, reader synchronization), accumulated samples, Performance Analyzer, performance data output, Performance Visualizer, Performance Steering, SCIRun (Univ. of Utah)]

Profile Sample Data Structure in SCIRun [Figure: data structure diagram; labels: node, context, thread]
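Only the labels node, context, and thread survive from this slide's diagram, so the actual SCIRun structure cannot be reconstructed here. As a hedged illustration of how a single profile sample could be indexed by those three levels, the sketch below uses a hypothetical `ProfileSample` class with nested dictionaries; the class, its fields, and its methods are assumptions, not SCIRun's types.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileSample:
    """One profile sample, indexed node -> context -> thread -> event -> value."""

    sample_id: int
    nodes: dict = field(default_factory=dict)

    def set_value(self, node, context, thread, event, value):
        events = (
            self.nodes.setdefault(node, {})
            .setdefault(context, {})
            .setdefault(thread, {})
        )
        events[event] = value

    def iter_threads(self):
        """Yield ((node, context, thread), events) in sorted order."""
        for node in sorted(self.nodes):
            for context in sorted(self.nodes[node]):
                for thread in sorted(self.nodes[node][context]):
                    yield (node, context, thread), self.nodes[node][context][thread]

    def values(self, event):
        """Collect one event's value across all threads (0.0 if absent)."""
        return [events.get(event, 0.0) for _, events in self.iter_threads()]


if __name__ == "__main__":
    sample = ProfileSample(sample_id=1)
    sample.set_value(1, 0, 0, "MPI_Recv", 2.5)
    sample.set_value(0, 0, 0, "MPI_Recv", 1.0)
    sample.set_value(0, 0, 1, "compute", 4.0)
    print(sample.values("MPI_Recv"))
```

A per-event view such as `values("MPI_Recv")` is the kind of flat array a visualization module could consume directly.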

Performance Analysis/Visualization in SCIRun [Figure: SCIRun program]

Uintah Computational Framework (UCF) • University of Utah • UCF analysis • Scheduling • MPI library • Components • 500 processes • Use for online and offline visualization • Apply SCIRun steering

“Terrain” Performance Visualization

Scatterplot Displays • Each point coordinate determined by three values: MPI_Reduce, MPI_Recv, MPI_Waitsome • Min/Max value range • Effective for cluster analysis • Relation between MPI_Recv and MPI_Waitsome
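One plausible reading of the scatterplot construction is that each thread becomes a 3D point whose coordinates are its MPI_Reduce, MPI_Recv, and MPI_Waitsome times, each scaled into the min/max range observed for that event. The sketch below assumes that interpretation; the `scatter_points` function and the sample numbers are hypothetical and are not taken from the SCIRun visualizer.

```python
def scatter_points(profiles, axes=("MPI_Reduce", "MPI_Recv", "MPI_Waitsome")):
    """Map each per-thread profile to a 3D point scaled to [0, 1] per axis."""
    lows = [min(p[a] for p in profiles) for a in axes]
    highs = [max(p[a] for p in profiles) for a in axes]
    points = []
    for p in profiles:
        point = tuple(
            # An axis with no spread collapses to 0.0 to avoid dividing by zero.
            (p[a] - lo) / (hi - lo) if hi > lo else 0.0
            for a, lo, hi in zip(axes, lows, highs)
        )
        points.append(point)
    return points


if __name__ == "__main__":
    # Made-up per-thread times in seconds, for illustration only.
    profiles = [
        {"MPI_Reduce": 1.0, "MPI_Recv": 2.0, "MPI_Waitsome": 5.0},
        {"MPI_Reduce": 3.0, "MPI_Recv": 2.0, "MPI_Waitsome": 9.0},
        {"MPI_Reduce": 2.0, "MPI_Recv": 2.0, "MPI_Waitsome": 7.0},
    ]
    for point in scatter_points(profiles):
        print(point)
```

Scaling each axis independently keeps an event with small absolute times from being flattened by one with large times, which helps when looking for clusters of threads.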

Online Uintah Performance Profiling • Demonstration of online profiling capability • Colliding elastic disks • Test material point method (MPM) code • Executed on 512 processors of ASCI Blue Pacific at LLNL • Example 1 (Terrain visualization) • Exclusive execution time across event groups • Multiple time steps • Example 2 (Bargraph visualization) • MPI execution time and performance mapping • Example 3 (Domain visualization) • Task time allocation to “patches”

Example 1 (Event Groups)

Example 2 (MPI Performance)

Example 3 (Domain-Specific Visualization)

Possible Improvements • Profile merging at context level to reduce number of files • Merging at node level may require explicit processing • Concurrent trace merging could also reduce files • Hierarchical merge tree • Will require explicit processing • Could consider IPC transfer • MPI (e.g., used in mpiP for profile merging) • Create own communicators • Sockets or PACX between computer server and analyzer • Leverage large-scale systems infrastructure • Parallel profile analysis
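The hierarchical merge tree idea can be sketched as repeated rounds of group-wise merging, so that no single merger handles all profiles at once. The code below is an illustration under stated assumptions: `merge_profiles` sums event values, which discards per-thread detail and is only one possible merge policy, and `tree_merge` runs sequentially in one process rather than across nodes. Neither function reflects TAU's or mpiP's implementation.

```python
def merge_profiles(a, b):
    """Merge two profiles by summing values per event (one possible policy)."""
    merged = dict(a)
    for event, value in b.items():
        merged[event] = merged.get(event, 0.0) + value
    return merged


def tree_merge(profiles, fan_in=2):
    """Merge profiles level by level; return (merged_profile, number_of_rounds)."""
    level = list(profiles)
    rounds = 0
    while len(level) > 1:
        next_level = []
        # Each group of `fan_in` profiles is merged by one (conceptual) merger.
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            combined = group[0]
            for profile in group[1:]:
                combined = merge_profiles(combined, profile)
            next_level.append(combined)
        level = next_level
        rounds += 1
    return (level[0] if level else {}), rounds


if __name__ == "__main__":
    per_thread = [{"MPI_Recv": 1.0} for _ in range(8)]
    merged, rounds = tree_merge(per_thread, fan_in=2)
    print(merged, "after", rounds, "rounds")
```

With a fan-in of k, the number of rounds grows roughly with the logarithm (base k) of the number of profiles, which is the usual argument for a tree over a single flat merge.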

Concluding Remarks • Interest in online performance monitoring, analysis, and visualization for large-scale parallel systems • Need to intelligently use • Benefit from other scalability considerations of the system software and system architecture • Seen as an extension to the parallel system architecture • Avoid solutions that have portability difficulties • In part, this is an engineering problem • Need to work with the system configuration you have • Need to understand if approach is applicable to problem • Not clear if there is a single solution

Future Work • Build online support in TAU performance system • Extend to support PULL model capabilities • Develop hierarchical data access solutions • Performance studies of full system • Latency analysis • Bandwidth analysis • Integration with other performance tools • System performance monitors • ParaProf parallel profile analyzer • Development of 3D visualization library • Portability focus
