This paper explores the use of TAU, a performance measurement and modeling infrastructure, for optimizing component assemblies in high-performance computing environments. It discusses the challenges of modeling single components, creating a global model, and selecting optimal implementations. The paper also presents a case study and concludes with the benefits of using a proxy-based measurement system for non-intrusive performance measurement.
“Performance Modeling of Component Assemblies with TAU” Nick Trebon, Alan Morris, Jaideep Ray, Sameer Shende, Allen Malony {ntrebon, amorris, malony, sameer}@cs.uoregon.edu, jairay@ca.sandia.gov Performance Research Laboratory, Department of Computer and Information Science, University of Oregon; Sandia National Laboratories
Outline • Motivation • Introduction and Background • Performance Measurement in HPC Component Environment • Performance Measuring and Modeling Infrastructure • Proxies • TAU component • Mastermind • Component Assembly Optimization • Conclusions
Motivations • Given a set of components, where each component has multiple implementations, what is the optimal subset of implementations that solves a given problem? • How to model a single component? • How to create a global model from a set of component models? • How to select an optimal subset of implementations? • From a performance perspective, a component by itself has no meaning. A component needs a context. • Context is affected by: • The problem being solved • Parameters (e.g., size of an array) • Mismatched data structures
Performance in HPC Component Environments • Traditional role of performance measurement and modeling • Analysis-and-optimization phase • e.g., porting a stable code base to a new architecture • Performance model => predict scalability • In a component environment • Applications are dynamically composed at runtime • Application developers typically do not implement all of their own components • Performance measurements need to be non-intrusive • Users are interested in coarse-grained performance
What does performance mean? • Given a problem (characterized by tuple P), what time Te does a component C need to solve it? i.e., • Te = f(P); what is f? • To create a performance model f(P), we need: • Te = Execution time for a method call • Tm = Execution time of message-passing calls within a method • Tc = Compute time for a given method (Tc = Te - Tm) • Input parameters that affect performance (e.g., size of an array) • For our purposes, start with simplifying assumptions • Blocking communication and no overlap of communication and computation • Ignore disk I/O
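The relationship between the three timings can be illustrated with a small Python sketch. The record layout, the field names, and the sample values below are assumptions made for illustration; they are not taken from the TAU or CCA code. Note that Tc = Te - Tm is only meaningful under the simplifying assumptions listed above (blocking communication, no overlap).

```python
from dataclasses import dataclass


@dataclass
class MethodTiming:
    """Timings for one method invocation (hypothetical record layout)."""
    params: dict  # performance-relevant inputs, e.g. {"array_size": 1000}
    te: float     # Te: total execution time of the method call, in seconds
    tm: float     # Tm: time spent in message-passing calls, in seconds

    @property
    def tc(self) -> float:
        # Tc = Te - Tm; valid only if communication is blocking and
        # does not overlap with computation.
        return self.te - self.tm


# Made-up numbers, purely illustrative.
sample = MethodTiming(params={"array_size": 1000}, te=0.50, tm=0.125)
print(sample.tc)
```

A performance model f(P) would then be fitted to many such records, with `params` playing the role of the tuple P.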
How to measure performance? • Need to “instrument” the code • But has to be non-intrusive • What kind of performance infrastructure can achieve this? • Previous research suggests proxies • Proxies serve to intercept and forward method calls
CCA Performance Infrastructure • The proxy measurement system infrastructure: • Proxy • Lightweight: simply a switch that turns measurement on and off • 1 proxy per component • Tuning and Analysis Utilities (TAU) component • Utilizes the TAU measurement library • Provides a measurement port • Responsible for making the measurements • Mastermind component • Responsible for gathering, storing, and reporting measurement data (timing data from TAU as well as input parameters from proxies) • Queries the TAU component for method-level measurements
Proxy • A proxy uses and provides the same ports that the actual component provides • Also uses a MonitorPort • Identifies performance-dependent parameters [Figure: component wiring before (C1, C2) and after inserting the proxy (C1, P2, C2, MM)]
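The intercept-and-forward idea can be sketched in Python as follows. This is a minimal illustration, not the CCA or TAU API: the `Monitor`, `Integrator`, and `IntegratorProxy` classes and their methods are hypothetical names, and a real proxy would report through a MonitorPort rather than a plain Python object.

```python
import time


class Monitor:
    """Stand-in for the measurement side (hypothetical, not the MonitorPort API)."""

    def __init__(self):
        self.records = []

    def record(self, method, params, elapsed):
        self.records.append({"method": method, "params": params, "elapsed": elapsed})


class Integrator:
    """Toy component exposing a single port method."""

    def integrate(self, values):
        return sum(values)


class IntegratorProxy:
    """Exposes the same method as Integrator; times each call, then forwards it."""

    def __init__(self, target, monitor):
        self._target = target
        self._monitor = monitor

    def integrate(self, values):
        # Capture the performance-dependent parameter before forwarding.
        params = {"n": len(values)}
        start = time.perf_counter()
        result = self._target.integrate(values)
        elapsed = time.perf_counter() - start
        self._monitor.record("integrate", params, elapsed)
        return result


monitor = Monitor()
component = IntegratorProxy(Integrator(), monitor)  # the caller sees the same interface
total = component.integrate([1, 2, 3])
```

Because the caller only ever talks to the proxy, neither the caller nor the wrapped component needs to be modified, which is what makes the measurement non-intrusive.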
Automatic Proxy Generation • A tool based upon the Program Database Toolkit (University of Oregon) • 1 proxy created per port
MasterMind • A record is created for each instrumented routine and stores, for each invocation: • Measurement data (e.g., execution time, communication time, cache hits, etc.) • Input parameters • Currently, the MasterMind outputs all records at application completion • In the future, perhaps the MasterMind could output a performance model for a given component (based upon a linear regression)?
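As a rough illustration of how recorded invocations could be turned into a model by linear regression, the Python sketch below fits a straight line, time = slope * size + intercept, by ordinary least squares. The record values are invented for the example, and the choice of a single input parameter with a linear form is an assumption; the slides do not specify the form of the model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: (array size, measured execution time in seconds).
records = [(100, 1.5), (200, 2.5), (300, 3.5), (400, 4.5)]
sizes = [size for size, _ in records]
times = [t for _, t in records]

slope, intercept = fit_line(sizes, times)


def predict(size):
    """Predicted execution time for an unseen input size."""
    return slope * size + intercept
```

With real measurements the fit would not be exact, and a component may need a separate model per context.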
TAU Component • TAU component is a wrapper to the TAU library • Provides access to timers to measure execution time and communication time • Also provides access to hardware metrics (e.g., cache hits) via external libraries such as PAPI or PCL • See http://www.cs.uoregon.edu/research/paracomp/tau
Using performance timings to select optimal components • To find an optimal solution, we need to reduce the solution space • Eliminate “insignificant” components • 2-step heuristic • Are children, as a group, insignificant to a parent? • Is an individual node insignificant relative to its siblings? • Optimize the reduced core for an approximately optimal solution
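One possible reading of the 2-step heuristic is sketched below in Python. The slides do not define "insignificant" precisely, so the rules used here are assumptions: children are dropped as a group when their combined time is below a threshold fraction of the parent's time, and an individual child is dropped when its time is below that fraction of the siblings' combined time. The `Node` class and the sample call graph are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    time: float                                  # inclusive time attributed to this node
    children: list = field(default_factory=list)


def prune(node, threshold=0.10):
    """Return a copy of the call tree with 'insignificant' nodes removed."""
    kids_total = sum(child.time for child in node.children)
    # Step 1 (assumed rule): drop all children if, as a group, they are
    # small relative to the parent.
    if node.time <= 0 or kids_total < threshold * node.time:
        return Node(node.name, node.time)
    # Step 2 (assumed rule): drop individual children that are small
    # relative to their siblings' combined time.
    kept = [prune(child, threshold)
            for child in node.children
            if child.time >= threshold * kids_total]
    return Node(node.name, node.time, kept)


def count(node):
    """Number of nodes in the tree rooted at node."""
    return 1 + sum(count(child) for child in node.children)


# Made-up call graph with inclusive times.
tree = Node("main", 100.0, [
    Node("A", 60.0, [Node("A1", 3.0), Node("A2", 2.0)]),
    Node("B", 35.0, [Node("B1", 20.0), Node("B2", 1.0)]),
    Node("C", 5.0),
])
core = prune(tree)
```

The remaining core is the part of the assembly over which alternative implementations would then be compared.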
Case Study Example • Core identification was run on a hydro shock simulation developed at Sandia National Laboratories • 10% thresholds • The original call graph of 18 nodes was reduced to 8 nodes
Conclusions • The proxy-based measurement system allows for non-intrusive measurement of components • A single component may have multiple performance models based on different contexts • Eliminating “insignificant” components can ease the identification of an approximately optimal solution.
Future Work • Synthesize a composite performance model from individual component models • Generalize performance models (e.g., parameterize models by processor speed and a cache model to make them architecture independent) • Model representation • XML? • Quality-of-Service • Dynamic Implementation Selection