Tools for Engineering Analysis of High Performance Parallel Programs
David Culler, Frederick Wong, Alan Mainwaring
Computer Science Division, U.C. Berkeley
http://www.cs.berkeley.edu/~culler/talks
Traditional Parallel Programming Tools
• Focus on showing “what program did” and “when it did it”
  • microscopic analysis of deterministic events
  • oriented towards initial development of small programs on small data sets and small machines
• Instrumentation
  • traces, counters, profiles
• Visualization
• Examples
  • AIMS, PTOOLS, PPP
  • pablo + paradyn + ... => delphi
  • ACTS TAU - tuning and analysis util.
LLNL ASCI III
Example: Pablo
Beyond Zeroth-order Analysis
• Basic level: get to a system design that is reasonable and behaves properly under “ideal conditions”
• Subject the system to various stresses to understand its operating regime and gain deeper insight into its dynamic behavior
• Combine empirical data with analytical models
• Iterate
  • from “What?” to “What if?”
[Figure: plot with axes “max displacement” and “Wind Speed”]
Approach: Framework for Parameterized Sensitivity Analysis
• framework performs analysis over numerous runs
• statistical filtering
• vary parameter of interest
• provides means of combining data to isolate effects of interest
=> ROBUSTNESS
[Diagram labels: Problem Data Set Generator; Well-developed Parallel Program; Instrumentation Tools; Study Parameter; Machine Characterizers; Procs, Comm. perf., Cache, Scheduling, ...; visualization, modeling]
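The parameter-sweep idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' framework: `sweep` and `run_once` are hypothetical names, and the median is used here as one simple form of statistical filtering.

```python
import statistics

def sweep(run_once, values, repeats=5):
    """Run `run_once(value)` several times per parameter value and summarize.

    `run_once` is a hypothetical callable that launches the program with the
    study parameter set to `value` and returns its run time in seconds.
    """
    results = {}
    for value in values:
        samples = [run_once(value) for _ in range(repeats)]
        results[value] = {
            "median": statistics.median(samples),  # robust to outlier runs
            "min": min(samples),                   # closest to an unperturbed run
            "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        }
    return results

# Usage with a stand-in for a real program launch (assumed scaling, no noise):
summary = sweep(lambda procs: 100.0 / procs, values=[1, 2, 4, 8], repeats=3)
```

Keeping both the median and the spread per point lets the same sweep feed a performance curve and a repeatability check.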
Simplest Example: Performance (P)
• NPB2.2 on NOW and Origin 2000 (250)
Where Time is Spent (P)
• Reveal basic processor and network loading (vs P)
• Basis for model derivation - comm(P)
Where Time is Spent (P) - cont
• Reveal basic processor and network loading (vs P)
Communication Volume (P)
Communication Structure (P)
Understanding Efficiency (P, M)
• Want to understand both what load the program places on the system and how well the system handles that load
=> characterize the capability of the system via simple benchmarks (rather than advertised peaks)
=> combine with measured load for a predictive model, and compare
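One way to combine benchmark-derived capability with measured load is a simple linear cost model. The sketch below assumes a latency-plus-bandwidth model, which the slides do not specify; every function name and number in it is hypothetical.

```python
def predicted_comm_time(n_msgs, n_bytes, latency_s, bandwidth_bytes_per_s):
    """Predicted communication time under an assumed linear model:
    a per-message cost plus a per-byte cost, both taken from microbenchmarks."""
    return n_msgs * latency_s + n_bytes / bandwidth_bytes_per_s

def comm_efficiency(predicted_s, measured_s):
    """Ratio of model-predicted time to measured time (1.0 = matches the model)."""
    return predicted_s / measured_s

# Hypothetical load (from instrumentation) and capability (from benchmarks).
predicted = predicted_comm_time(
    n_msgs=1000, n_bytes=4_000_000,
    latency_s=50e-6, bandwidth_bytes_per_s=40e6,
)
efficiency = comm_efficiency(predicted, measured_s=0.30)
```

With these made-up inputs the model predicts 0.15 s, so a measured 0.30 s corresponds to an efficiency of about 0.5; a low value points at either the system or the program as a place to look.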
Communication Efficiency
Tools => Improvements in Run Time
• Efficiency analysis (vs parameters) gives insight into where to improve the system or the program
  • use traditional profiling to see where in the program the ‘bad stuff’ happens
  • or go back and tune the system to do better
Cache Behavior (P, $)
• Combining trace generation with simulation provides new structural insight
• Here: clear knees in the program working set ($); these shift with machine size (P)
Cache Behavior (P, $)
• Clear knees in the program working set ($), not affected by P
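A minimal sketch of trace-driven cache simulation shows how a working-set knee appears. It assumes a fully associative LRU cache over block addresses and a synthetic trace; the simulator behind these slides is not described, so treat every name here as illustrative.

```python
from collections import OrderedDict

def lru_miss_rate(trace, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return misses / len(trace)

def miss_curve(trace, capacities):
    """Miss rate at each cache size; a sharp drop marks a working-set knee."""
    return {c: lru_miss_rate(trace, c) for c in capacities}

# Synthetic trace: a loop that touches 8 blocks, repeated 100 times.
trace = list(range(8)) * 100
curve = miss_curve(trace, capacities=[2, 4, 8, 16])
```

For this trace the miss rate stays at 1.0 until the cache holds all 8 blocks, then drops to 0.01: the knee sits at the loop's working-set size. Repeating the sweep with traces from different machine sizes would show whether the knee moves with P.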
Sensitivity to Multiprogramming
• Parallel machines are increasingly general purpose
  • multiprogramming, at least interrupts and daemons
• Many ‘ideal’ programs are very sensitive to perturbations
• Message passing is loosely coupled, but the implementation may not be!
Tools => Improvements in Run Time
• MPI implementation spin-waits: on send until the network is available (or the queue is not full), or on recv until completion
• Should use two-phase spin-block
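The two-phase spin-block policy named on this slide can be illustrated as follows. This is a sketch of the waiting policy only, using a Python `threading.Event` as a stand-in for "message arrived" or "queue has room"; it is not how any particular MPI library implements it, and `spin_then_block` is a hypothetical name.

```python
import threading
import time

def spin_then_block(event, spin_seconds=0.001, timeout=None):
    """Wait for `event` in two phases.

    Phase 1: poll for at most `spin_seconds`, which is cheap if the event is
    about to fire. Phase 2: give up the processor and block until signalled.
    Returns which phase observed the event, or "timeout".
    """
    deadline = time.monotonic() + spin_seconds
    while time.monotonic() < deadline:
        if event.is_set():
            return "spin"
    if event.wait(timeout):
        return "block"
    return "timeout"

# Usage: the event fires after the spin budget, so the waiter ends up blocking.
arrived = threading.Event()
threading.Timer(0.1, arrived.set).start()
outcome = spin_then_block(arrived, spin_seconds=0.001, timeout=5.0)
```

Bounding the spin phase keeps the fast path fast when the partner is running, while the blocking phase frees the processor when the partner has been descheduled, which is the case that hurts under multiprogramming.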
Sensitivity to Seemingly Unrelated Activity
• The mechanism for doing parameter studies is naturally extended to get statistically valid data through multiple samples at each point
  • tend to get crisp, fast results in the wee hours
• Extend study outside the app
• Example: two programs on big Origin

  Run time (sec)         alone    together on 64 P
  8 processor IS run      4.71     6.18
  36 processor SP run    26.36    65.28
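The interference in the Origin example is easier to compare as a slowdown factor. The short calculation below uses only the four timings from the slide and assumes both columns are in seconds.

```python
def slowdown(alone_s, together_s):
    """Factor by which a run slows down when sharing the machine."""
    return together_s / alone_s

# (alone, together on 64 P) timings from the slide
runs = {
    "IS, 8 processors": (4.71, 6.18),
    "SP, 36 processors": (26.36, 65.28),
}

for name, (alone_s, together_s) in runs.items():
    print(f"{name}: {slowdown(alone_s, together_s):.2f}x")
```

This gives roughly 1.31x for IS and 2.48x for SP, so the larger run is affected far more by the other program's activity.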
Repeatability
• The variance across repeated runs is a key result for production codes: the real world is not ideal
Plans
• Integrate our instrumentation and analysis tools with ACTS TAU
  • port to UCB Millennium environment
  • experiment with ASCI platforms
• Refine and complete the automated sensitivity analysis framework
• Backend performance data storage
  • Pablo SPPF?
• Next Year
  • integrate performance model development, prediction