A Framework for Online Performance Analysis and Visualization of Large-Scale Parallel Applications Kai Li, Allen D. Malony, Robert Bell, Sameer Shende {likai,malony,bertie,sameer}@cs.uoregon.edu Department of Computer and Information Science, Computational Science Institute, NeuroInformatics Center, University of Oregon
Outline • Problem description • Scaling and performance observation • Interest in online performance analysis • General online performance system architecture • Access models • Profiling issues and control issues • Framework for online performance analysis • TAU performance system • SCIRun computational and visualization environment • Experiments • Conclusions and future work
Problem Description • Need for parallel performance observation • Instrumentation, measurement, analysis, visualization • In general, there is the concern for intrusion • Seen as a tradeoff with accuracy of performance diagnosis • Scaling complicates observation and analysis • Issues of data size, processing time, and presentation • Online approaches add capabilities as well as problems • Performance interaction, but at what cost? • Tools for large-scale performance observation online • Supporting performance system architecture • Tool integration, effective usage, and portability
Scaling and Performance Observation • Consider “traditional” measurement methods • Profiling: summary statistics calculated during execution • Tracing: time-stamped sequence of execution events • More parallelism ⇒ more performance data overall • Performance specific to each thread of execution • Possible increase in number of interactions between threads • Harder to manage the data (memory, transfer, storage, …) • More parallelism / performance data ⇒ harder analysis • More time consuming to analyze • More difficult to visualize (meaningful displays) • Need techniques to address scaling at all levels
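The difference in data volume between the two measurement methods can be illustrated with a minimal sketch. The `Tracer` and `Profiler` classes and the synthetic event stream are hypothetical illustrations, not TAU code: the tracer stores every time-stamped event, while the profiler keeps only per-event summaries.

```python
from collections import defaultdict


class Tracer:
    """Tracing: keep every time-stamped enter/exit event."""

    def __init__(self):
        self.events = []  # grows with the number of events

    def record(self, kind, name, timestamp):
        self.events.append((timestamp, kind, name))


class Profiler:
    """Profiling: keep only running summary statistics per event name."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.inclusive = defaultdict(float)
        self._starts = {}

    def record(self, kind, name, timestamp):
        if kind == "enter":
            self._starts[name] = timestamp
        else:
            self.calls[name] += 1
            self.inclusive[name] += timestamp - self._starts.pop(name)


def replay(sink, iterations):
    """Feed a synthetic (made-up) event stream into a measurement sink."""
    t = 0.0
    for _ in range(iterations):
        sink.record("enter", "compute", t)
        t += 2.0
        sink.record("exit", "compute", t)
        sink.record("enter", "MPI_Recv", t)
        t += 1.0
        sink.record("exit", "MPI_Recv", t)


tracer, profiler = Tracer(), Profiler()
replay(tracer, 1000)
replay(profiler, 1000)
```

In this sketch the trace grows linearly with the number of events, while the profile stays at one entry per event name regardless of run length; both grow with the number of threads being observed.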
Why Complicate Matters with Online Methods? • Adds interactivity to performance analysis process • Opportunity for dynamic performance observation • Instrumentation change • Measurement change • Allows for control of performance data volume • Post-mortem analysis may be “too late” • View on status of long-running jobs • Allow for early termination • Computation steering to achieve “better” results • Performance steering to achieve “better” performance • Online performance observation may be intrusive
General Online Performance Observation System [Figure: system diagram with components labeled Performance Instrument, Performance Measurement, Performance Data, Performance Control, Performance Analysis, Performance Visualization]
Models of Performance Data Access (Monitoring) • Push Model • Producer/consumer style of access and transfer • Application decides when/what/how much data to send • External analysis tools only consume performance data • Availability of new data is signaled passively or actively • Pull Model • Client/server style of performance data access and transfer • Application is a performance data server • Access decisions are made externally by analysis tools • Two-way communication is required • Push/Pull Models
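The two access models can be contrasted in a small in-process sketch. Everything here is an assumption made for illustration (the class names, the `dump_every` parameter, and the use of a queue in place of a real transport); it is not how TAU or SCIRun implement monitoring.

```python
import queue


class PushApplication:
    """Push model: the application decides when and what to send."""

    def __init__(self, outbox, dump_every):
        self.outbox = outbox          # stand-in for a stream to the tool
        self.dump_every = dump_every  # application-chosen send interval
        self.profile = {}

    def step(self, iteration, timings):
        for event, seconds in timings.items():
            self.profile[event] = self.profile.get(event, 0.0) + seconds
        if iteration % self.dump_every == 0:
            self.outbox.put((iteration, dict(self.profile)))


class PullApplication:
    """Pull model: the application acts as a performance data server."""

    def __init__(self):
        self.profile = {}

    def step(self, iteration, timings):
        for event, seconds in timings.items():
            self.profile[event] = self.profile.get(event, 0.0) + seconds

    def handle_request(self, events):
        # The external tool chooses which events it wants, and when.
        return {e: self.profile.get(e, 0.0) for e in events}


outbox = queue.Queue()
pusher = PushApplication(outbox, dump_every=5)
puller = PullApplication()
for i in range(1, 11):
    timings = {"compute": 2.0, "MPI_Recv": 0.5}
    pusher.step(i, timings)
    puller.step(i, timings)

# Push: the tool only consumes whatever the application chose to send.
pushed = []
while not outbox.empty():
    pushed.append(outbox.get())

# Pull: the tool issues a request, so communication is two-way.
pulled = puller.handle_request(["MPI_Recv"])
```

The push side needs only a one-way channel, whereas the pull side needs a request path into the running application, which is the two-way communication requirement noted above.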
TAU Performance System Architecture [Figure: architecture diagram; labels include Paraver, EPILOG, ParaProf]
Online Profile Measurement and Analysis in TAU • Standard TAU profiling • Per node/context/thread • Profile “dump” routine • Context-level • One profile file for each thread in context • Appends to profile file • Selective event dumping • Analysis tools access files through shared file system • Application-level profile “access” routine
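The append-and-read pattern described on this slide can be sketched as follows. This is a sketch under stated assumptions: JSON lines stand in for the profile format (this is not TAU's actual file format), and `dump_profile`, `ProfileReader`, and the file name are hypothetical. The reader remembers its file offset so each poll returns only samples appended since the previous poll.

```python
import json
import os
import tempfile


def dump_profile(path, sequence, profile):
    """Application side: append one profile sample to the thread's file."""
    with open(path, "a") as f:
        f.write(json.dumps({"seq": sequence, "profile": profile}) + "\n")


class ProfileReader:
    """Tool side: read new samples through the shared file system."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # position just after the last complete sample

    def poll(self):
        samples = []
        with open(self.path, "r") as f:
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line.endswith("\n"):
                    break  # end of file, or a sample still being written
                samples.append(json.loads(line))
                self.offset = f.tell()
        return samples


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "profile.0.0.0.jsonl")  # hypothetical name
reader = ProfileReader(path)

dump_profile(path, 0, {"compute": 1.0})
first = reader.poll()
dump_profile(path, 1, {"compute": 2.5})
dump_profile(path, 2, {"compute": 4.0})
second = reader.poll()
```

Skipping lines that lack a trailing newline is one simple way to avoid consuming a partially written sample; a real deployment on a parallel file system would likely need stronger synchronization.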
Online Performance Analysis and Visualization [Figure: data-flow diagram. Labels: Application, TAU Performance System, // performance data streams, // performance data output, file system, Performance Data Reader, Performance Data Integrator, accumulated samples, Performance Analyzer, Performance Visualizer, SCIRun (Univ. of Utah), Performance Steering] • sample sequencing • reader synchronization
Profile Sample Data Structure in SCIRun [Figure: profile sample data organized by node, context, and thread]
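One way to picture a profile sample indexed by node, context, and thread is the sketch below. The actual SCIRun data structure is not given in the slides, so the `ProfileSample` and `ThreadProfile` classes, their fields, and the time units are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadProfile:
    # event name -> exclusive time (seconds assumed)
    exclusive: dict = field(default_factory=dict)


@dataclass
class ProfileSample:
    sequence: int  # position of this sample in the sample sequence
    # (node, context, thread) -> ThreadProfile
    threads: dict = field(default_factory=dict)

    def add(self, node, context, thread, event, seconds):
        key = (node, context, thread)
        profile = self.threads.setdefault(key, ThreadProfile())
        profile.exclusive[event] = profile.exclusive.get(event, 0.0) + seconds

    def event_vector(self, event):
        """Values of one event across all threads, in (node, context, thread) order."""
        return [self.threads[k].exclusive.get(event, 0.0)
                for k in sorted(self.threads)]


sample = ProfileSample(sequence=0)
sample.add(0, 0, 0, "MPI_Recv", 1.5)
sample.add(0, 0, 1, "MPI_Recv", 0.5)
sample.add(1, 0, 0, "MPI_Recv", 2.0)
vector = sample.event_vector("MPI_Recv")
```

A flat per-event vector like `vector` is the kind of input a visualization module could map onto a display axis.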
Performance Analysis/Visualization in SCIRun [Figure: SCIRun program]
Uintah Computational Framework (UCF) • University of Utah • UCF analysis • Scheduling • MPI library • Components • 500 processes • Use for online and offline visualization • Apply SCIRun steering
Scatterplot Displays • Each point coordinate determined by three values: MPI_Reduce, MPI_Recv, MPI_Waitsome • Min/Max value range • Effective for cluster analysis • Relation between MPI_Recv and MPI_Waitsome
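The mapping from per-thread profile values to scatterplot coordinates can be sketched as below. It assumes that the min/max value range is used to scale each axis to [0, 1]; the slides do not state this, and the function names and sample timings are made up for illustration.

```python
def normalize(values):
    """Scale values to [0, 1] using their min/max range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate axis: all values equal
    return [(v - lo) / (hi - lo) for v in values]


def scatter_points(profiles, axes):
    """One 3D point per thread; each coordinate is one event's scaled time."""
    columns = [normalize([p[a] for p in profiles]) for a in axes]
    return list(zip(*columns))


# Made-up per-thread times for the three MPI events named on the slide.
profiles = [
    {"MPI_Reduce": 1.0, "MPI_Recv": 4.0, "MPI_Waitsome": 10.0},
    {"MPI_Reduce": 3.0, "MPI_Recv": 2.0, "MPI_Waitsome": 20.0},
    {"MPI_Reduce": 2.0, "MPI_Recv": 3.0, "MPI_Waitsome": 15.0},
]
points = scatter_points(profiles, ("MPI_Reduce", "MPI_Recv", "MPI_Waitsome"))
```

With every axis on a common scale, threads that behave alike land near each other, which is what makes such a display usable for cluster analysis.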
Online Uintah Performance Profiling • Demonstration of online profiling capability • Colliding elastic disks • Test material point method (MPM) code • Executed on 512 processors of ASCI Blue Pacific at LLNL • Example 1 (Terrain visualization) • Exclusive execution time across event groups • Multiple time steps • Example 2 (Bargraph visualization) • MPI execution time and performance mapping • Example 3 (Domain visualization) • Task time allocation to “patches”
Possible Improvements • Profile merging at context level to reduce number of files • Merging at node level may require explicit processing • Concurrent trace merging could also reduce files • Hierarchical merge tree • Will require explicit processing • Could consider IPC transfer • MPI (e.g., used in mpiP for profile merging) • Create own communicators • Sockets or PACX between computer server and analyzer • Leverage large-scale systems infrastructure • Parallel profile analysis
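A hierarchical merge tree for profiles can be sketched in a few lines. This is an illustrative, sequential sketch only: `merge_two`, `merge_tree`, and the `fanout` parameter are hypothetical names, merging is assumed to mean summing per-event times, and a real implementation would run each level's merges in parallel over some transport such as MPI.

```python
def merge_two(a, b):
    """Combine two profiles by summing the time recorded for each event."""
    merged = dict(a)
    for event, seconds in b.items():
        merged[event] = merged.get(event, 0.0) + seconds
    return merged


def merge_tree(profiles, fanout=2):
    """Merge profiles level by level; return the result and the tree depth."""
    level = list(profiles)
    if not level:
        return {}, 0
    rounds = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            combined = group[0]
            for other in group[1:]:
                combined = merge_two(combined, other)
            next_level.append(combined)
        level = next_level
        rounds += 1
    return level[0], rounds


# Eight made-up thread profiles merged with a binary tree.
profiles = [{"compute": 1.0, "MPI_Recv": 0.25} for _ in range(8)]
total, rounds = merge_tree(profiles)
```

With a fan-out of 2, eight profiles are combined in three levels rather than by one process reading eight files, which is the reduction in file count and analyzer load that the merge tree is meant to provide.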
Concluding Remarks • Interest in online performance monitoring, analysis, and visualization for large-scale parallel systems • Need to intelligently use • Benefit from other scalability considerations of the system software and system architecture • See as an extension to the parallel system architecture • Avoid solutions that have portability difficulties • In part, this is an engineering problem • Need to work with the system configuration you have • Need to understand if approach is applicable to problem • Not clear if there is a single solution
Future Work • Build online support in TAU performance system • Extend to support PULL model capabilities • Develop hierarchical data access solutions • Performance studies of full system • Latency analysis • Bandwidth analysis • Integration with other performance tools • System performance monitors • ParaProf parallel profile analyzer • Development of 3D visualization library • Portability focus