Instrumentation and Measurement Strategies for Flexible and Portable Empirical Performance Evaluation. Sameer Shende, Allen D. Malony, Robert Ansell-Bell {sameer,malony,bertie}@cs.uoregon.edu Computer & Information Science Department Computational Science Institute University of Oregon.
Empirical Performance Technology
• The evolution of parallel systems challenges empirical performance evaluation
  • Shared- and distributed-memory parallelism
  • Layered, hierarchical software environments
  • Multi-level performance semantics
• Dual performance technology goals
  • Robust performance observation
  • Semantic-based performance mapping
• Strategies for instrumentation and measurement
  • Flexibility
  • Portability
Talk Outline
• Performance Observation Requirements
• Flexibility and Portability
• Strategies for Empirical Performance Evaluation
• TAU Performance System
  • Computation model for performance technology
  • TAU performance system toolkit
• Performance Case Study
• Conclusions
TAU Performance System
• Tuning and Analysis Utilities
• Performance system framework
  • for scalable parallel and distributed HPC
• Targets a general complex system computation model
  • nodes / contexts / threads
  • Multi-level: system / software / parallelism
  • Measurement and analysis abstraction
• Integrated performance toolkit
  • instrumentation, measurement, analysis, visualization
• Portable facility based on an open software approach
• Robust and widely applied
General Complex System Computation Model
• Node: physically distinct shared-memory machine
  • Message-passing node interconnection network
• Context: distinct virtual memory space within a node
• Thread: execution thread (user/system) in a context
[Figure: physical view (SMP nodes, each with node memory, joined by an interconnection network carrying inter-node message communication) and model view (nodes containing VM-space contexts, each holding threads)]
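The node / context / thread model amounts to tagging every measurement with a location triple. A minimal C sketch, assuming a hypothetical `perf_location` struct and illustrative per-node limits (`MAX_CONTEXTS`, `MAX_THREADS`) that are not taken from TAU:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical limits, used only to flatten a location into a dense index. */
#define MAX_CONTEXTS 4
#define MAX_THREADS 8

/* Where a measurement was taken, following the node/context/thread model. */
typedef struct {
    int node;    /* physically distinct shared-memory machine */
    int context; /* distinct virtual memory space within the node */
    int thread;  /* user or system thread within the context */
} perf_location;

/* Write "node.context.thread" into buf; returns the length snprintf reports. */
int location_label(const perf_location *loc, char *buf, size_t len) {
    return snprintf(buf, len, "%d.%d.%d", loc->node, loc->context, loc->thread);
}

/* Map a location to a slot in a flat table of per-thread performance data. */
int location_index(const perf_location *loc) {
    assert(loc->context >= 0 && loc->context < MAX_CONTEXTS);
    assert(loc->thread >= 0 && loc->thread < MAX_THREADS);
    return (loc->node * MAX_CONTEXTS + loc->context) * MAX_THREADS + loc->thread;
}
```

Keeping the triple explicit lets the same analysis code treat message-passing processes (separate contexts) and threads within a process alike.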
Empirical Performance Experimentation Space
• Where in the program are performance measurements made?
• When is performance instrumentation done?
• How are performance measurements defined, and how are instrumentation alternatives chosen?
Instrumentation Alternatives
• Source-to-source translation using preprocessor-level instrumentation
  • PDT
• MPI wrapper library-level instrumentation
  • VampirTrace
• Binary instrumentation using runtime code patching
  • DyninstAPI
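Library-level instrumentation such as an MPI wrapper library interposes a routine with the same name as the library call; the routine records measurements and then forwards to the real implementation, which MPI's profiling interface exposes under the name-shifted `PMPI_` prefix. The sketch below shows the pattern without requiring MPI: `real_send`, `wrapped_send`, and `send_stats` are hypothetical stand-ins for `PMPI_Send`, the interposed `MPI_Send`, and the wrapper's per-routine data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for the real library routine (the role PMPI_Send plays for MPI).
 * Hypothetical: it only pretends that all bytes were sent. */
static int real_send(const void *buf, size_t bytes) {
    (void)buf;
    return (int)bytes;
}

/* Per-routine data accumulated by the wrapper (hypothetical layout). */
static struct {
    long calls;
    size_t bytes;
    double seconds;
} send_stats;

/* Monotonic wallclock reading in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* The interposed entry point: time the call, forward it, and record it. */
int wrapped_send(const void *buf, size_t bytes) {
    assert(buf != NULL || bytes == 0);
    double start = now_seconds();
    int rc = real_send(buf, bytes);
    send_stats.seconds += now_seconds() - start;
    send_stats.calls++;
    send_stats.bytes += bytes;
    return rc;
}
```

In an actual MPI wrapper library the function would be named `MPI_Send` and call `PMPI_Send`; linking the wrapper ahead of the MPI library intercepts every call without changing or recompiling the application source.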
Measurement Strategies
• Statistical profiles of software actions
  • timing or counting (sampled or direct methods)
• Statistical profiles of hardware actions
  • hardware performance data
• Program event tracing
  • temporal dynamic behavior
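The difference between a profile and a trace shows up in what the enter/exit hooks keep. In this minimal C sketch (hypothetical names throughout, with timestamps passed in by the caller rather than read from a particular timer), the same hooks update a per-event profile of call counts and inclusive time, and also append timestamped trace records that retain the temporal order:

```c
#include <assert.h>

#define MAX_EVENTS 8
#define MAX_DEPTH 16
#define MAX_RECORDS 64

/* Profile: aggregate statistics per event; temporal order is lost. */
typedef struct { long count; double inclusive; } profile_entry;

/* Trace: one timestamped record per enter/exit; temporal order is kept. */
typedef struct { double time; int event; int is_enter; } trace_record;

static profile_entry profile[MAX_EVENTS];
static trace_record trace[MAX_RECORDS];
static int ntrace;

/* Call stack of active events and their entry timestamps. */
static int stack_event[MAX_DEPTH];
static double stack_start[MAX_DEPTH];
static int depth;

/* Append a trace record, silently dropping it if the buffer is full. */
static void record(double t, int event, int is_enter) {
    if (ntrace < MAX_RECORDS) {
        trace[ntrace++] = (trace_record){ t, event, is_enter };
    }
}

void event_enter(int event, double t) {
    assert(event >= 0 && event < MAX_EVENTS && depth < MAX_DEPTH);
    stack_event[depth] = event;
    stack_start[depth] = t;
    depth++;
    record(t, event, 1);
}

void event_exit(int event, double t) {
    assert(depth > 0 && stack_event[depth - 1] == event);
    depth--;
    profile[event].count++;
    profile[event].inclusive += t - stack_start[depth];
    record(t, event, 0);
}
```

Both views here come from direct instrumentation; a sampled profile would instead attribute periodic interrupts to whichever event is active. This simplified version also double-counts inclusive time for recursive calls to the same event, which a real profiler has to handle.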
Measurement Alternatives
• Wallclock time
  • gettimeofday (default)
  • low-overhead nanosecond timers [PAPI]
• CPU time (user + sys)
• Process virtual time (user)
• Hardware performance counters [PCL, PAPI]
  • floating-point instructions
  • primary and secondary data and instruction cache misses
  • ...
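The three software clocks on this slide can be read with standard POSIX calls: `gettimeofday` for wallclock time, and `getrusage` for CPU time (`ru_utime` plus `ru_stime`) and process virtual time (`ru_utime` alone). A minimal sketch, assuming a POSIX system; the PAPI and PCL timers and hardware counters are not shown:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a timeval to seconds. */
static double tv_seconds(struct timeval tv) {
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Wallclock time: gettimeofday, the slide's default timer. */
double wallclock_seconds(void) {
    struct timeval tv;
    int rc = gettimeofday(&tv, NULL);
    assert(rc == 0);
    (void)rc;
    return tv_seconds(tv);
}

/* Resource usage of the calling process. */
static struct rusage self_usage(void) {
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return ru;
}

/* CPU time: user plus system time consumed by this process. */
double cpu_seconds(void) {
    struct rusage ru = self_usage();
    return tv_seconds(ru.ru_utime) + tv_seconds(ru.ru_stime);
}

/* Process virtual time: user-mode time only. */
double virtual_seconds(void) {
    return tv_seconds(self_usage().ru_utime);
}
```

For a compute-bound region the three values grow at similar rates; when the process blocks on I/O or messages, wallclock time keeps advancing while CPU and virtual time do not, which is why the choice of clock changes what a profile reports.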
Runtime Instrumentation and Measurement
[Figure: DyninstAPI+TAU profiles (wallclock, PAPI wallclock, FP)]
Dynamic Instrumentation
• TAU uses DyninstAPI for runtime code patching
• tau_run (mutator) loads the measurement library
• Instruments the mutatee
• MPI issues:
  • one mutator per executable image [TAU, DynaProf]
  • one mutator for several executables [Paradyn, DPCL]
Measurement Alternatives
[Figure: DyninstAPI+TAU (event tracing, CPU-time profile)]
Profiling Using Multi-Level Instrumentation
[Figure: PDT (source-level) and MPI (library-level) instrumentation combined]
Performance Perturbation
• Measurement alternatives
  • PAPI wallclock overhead is 27% lower than the gettimeofday system call under IA-32 Linux 2.x
• Source vs. runtime instrumentation
  • source instrumentation overhead is 23% lower than runtime instrumentation for TAU profiling
• Need to balance alternatives
  • abstractions
  • instrumentation levels
  • flexibility / simplicity
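Perturbation figures like these come from measuring what each timer call itself costs. A rough C sketch, assuming POSIX `clock_gettime(CLOCK_MONOTONIC)` as the alternative timer (it is not the PAPI timer from the slide, and this code does not reproduce the 27% or 23% figures): it times many back-to-back calls, reports the average cost per call, and computes the relative reduction 100·(a − b)/a behind statements such as "27% lower".

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Elapsed seconds between two monotonic timestamps. */
static double elapsed(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Average cost, in nanoseconds, of one gettimeofday call over n calls. */
double gettimeofday_overhead_ns(long n) {
    struct timespec start, end;
    struct timeval tv;
    assert(n > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++) gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed(start, end) * 1e9 / (double)n;
}

/* Average cost, in nanoseconds, of one clock_gettime call over n calls. */
double clock_gettime_overhead_ns(long n) {
    struct timespec start, end, ts;
    assert(n > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++) clock_gettime(CLOCK_MONOTONIC, &ts);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed(start, end) * 1e9 / (double)n;
}

/* Relative saving of cost_b over cost_a, as a percentage of cost_a. */
double percent_lower(double cost_a, double cost_b) {
    assert(cost_a > 0.0);
    return 100.0 * (cost_a - cost_b) / cost_a;
}
```

Results depend on the kernel, C library, and clock source (on current Linux systems both calls are often served without entering the kernel), so only numbers measured on the target platform are meaningful.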
Conclusions
The flexibility and portability of performance technology can be improved by integrating instrumentation and measurement strategies. This integration helps create robust and ubiquitous performance technology for analyzing and tuning parallel and distributed software and systems in the presence of (evolving) complexity.
More Information and Acknowledgments
• URLs
  • TAU: www.cs.uoregon.edu/research/paracomp/tau
• Grant support (TAU)
  • DOE 2000 ACTS
    • http://www-unix.mcs.anl.gov/DOE2000
    • http://www.nersc.gov/ACTS
  • DOE ASCI Level 3 (LANL, LLNL)
  • DARPA