
“Performance Modeling of Component Assemblies with TAU”

This paper explores the use of TAU, a performance measurement and modeling infrastructure, for optimizing component assemblies in high-performance computing environments. It discusses the challenges of modeling single components, creating a global model, and selecting optimal implementations. The paper also presents a case study and concludes with the benefits of using a proxy-based measurement system for non-intrusive performance measurement.



  1. “Performance Modeling of Component Assemblies with TAU” • Nick Trebon, Alan Morris, Sameer Shende, Allen Malony ({ntrebon, amorris, malony, sameer}@cs.uoregon.edu), Department of Computer and Information Science, Performance Research Laboratory, University of Oregon • Jaideep Ray (jairay@ca.sandia.gov), Sandia National Laboratories

  2. Outline • Motivation • Introduction and Background • Performance Measurement in HPC Component Environment • Performance Measuring and Modeling Infrastructure • Proxies • TAU component • Mastermind • Component Assembly Optimization • Conclusions

  3. Motivations • Given a set of components, where each component has multiple implementations, what is the optimal subset of implementations that solves a given problem? • How to model a single component? • How to create a global model from a set of component models? • How to select an optimal subset of implementations? • From a performance perspective, a component by itself has no meaning. A component needs a context. • Context is affected by: • The problem being solved • Parameters (e.g., size of an array) • Mismatched data structures

  4. Performance in HPC Component Environments • Traditional role of performance measurement and modeling • Analysis-and-optimization phase • e.g., porting a stable code base to a new architecture • Performance model => predict scalability • In a component environment • Applications are dynamically composed at runtime • Application developers typically do not implement all of their own components • Performance measurements need to be non-intrusive • Users are interested in coarse-grained performance

  5. What does performance mean? • Given a problem (characterized by a tuple P), what time Te does a component C need to solve it? That is, Te = f(P); what is f? • To create a performance model f(P), we need: • Te = execution time for a method call • Tm = execution time of message-passing calls within a method • Tc = compute time for a given method (Tc = Te - Tm) • Input parameters that affect performance (e.g., size of an array) • For our purposes, start with simplifying assumptions: • Blocking communication and no overlap of communication and computation • Ignore disk I/O
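The timing decomposition on this slide can be sketched in a few lines of Python. This is only an illustration: the `Invocation` class, its field names, and the sample numbers are assumptions made for the example, not part of the TAU or CCA APIs.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """One measured method call. All names here are illustrative."""
    array_size: int  # an example input parameter from the tuple P
    t_exec: float    # Te: execution time of the method call, in seconds
    t_msg: float     # Tm: time in message-passing calls within the method

    @property
    def t_compute(self) -> float:
        # Tc = Te - Tm is valid only under the slide's assumptions:
        # blocking communication, no overlap of communication and
        # computation, and disk I/O ignored.
        return self.t_exec - self.t_msg


# Made-up measurement, for illustration only.
call = Invocation(array_size=1000, t_exec=0.50, t_msg=0.125)
print(call.t_compute)
```

If communication overlapped with computation, the subtraction would underestimate compute time, which is why the slide states the assumptions before defining Tc.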

  6. How to measure performance? • Need to “instrument” the code • But has to be non-intrusive • What kind of performance infrastructure can achieve this? • Previous research suggests proxies • Proxies serve to intercept and forward method calls

  7. CCA Performance Infrastructure • The proxy measurement system infrastructure: • Proxy • Lightweight: simply a switch that turns measurement on and off • 1 proxy per component • Tuning and Analysis Utilities (TAU) component • Utilizes the TAU measurement library • Provides a measurement port • Responsible for making the measurements • Mastermind component • Responsible for gathering, storing, and reporting measurement data (timing data from TAU as well as input parameters from proxies) • Queries the TAU component for method-level measurements

  8. Proxy • A proxy uses and provides the same ports that the actual component provides • Also uses a MonitorPort • Identifies performance-dependent parameters • [Figure: component wiring before (C1, C2) and after (C1, P2, C2, MM) the proxy is inserted]
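The intercept-and-forward idea can be illustrated with a minimal Python sketch. Everything here is hypothetical: `Monitor` stands in for the MonitorPort, `Integrator` for an arbitrary component, and `time.perf_counter` replaces the TAU timers that the real infrastructure would use.

```python
import time


class Monitor:
    """Hypothetical stand-in for the MonitorPort; not the CCA interface."""

    def __init__(self):
        self.enabled = True  # the proxy acts as an on/off switch
        self.records = []

    def record(self, method, params, elapsed):
        self.records.append((method, params, elapsed))


class Integrator:
    """Hypothetical component being measured."""

    def integrate(self, values):
        return sum(values)


class IntegratorProxy:
    """Exposes the same method as Integrator and forwards each call."""

    def __init__(self, target, monitor):
        self._target = target
        self._monitor = monitor

    def integrate(self, values):
        if not self._monitor.enabled:
            return self._target.integrate(values)
        start = time.perf_counter()
        result = self._target.integrate(values)  # forward the call
        elapsed = time.perf_counter() - start
        # Report the performance-dependent parameter (here, input size).
        self._monitor.record("integrate", {"n": len(values)}, elapsed)
        return result


monitor = Monitor()
proxy = IntegratorProxy(Integrator(), monitor)
result = proxy.integrate([1, 2, 3])
```

The caller and the wrapped component are both unmodified, which is the sense in which the measurement is non-intrusive.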

  9. Automatic Proxy Generation • A tool based upon the Program Database Toolkit (University of Oregon) • 1 proxy created per port

  10. MasterMind • A record is created for each instrumented routine and stores, for each invocation: • Measurement data (e.g., execution time, communication time, cache hits) • Input parameters • Currently, the MasterMind outputs all records at application completion • In the future, perhaps the MasterMind could output a performance model for a given component (based upon a linear regression)?
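As a rough illustration of the regression idea raised on this slide, the sketch below fits a straight line to recorded (input size, execution time) pairs using ordinary least squares. The function name `fit_line` and the sample data are made up for the example; the slides do not specify the model form or any measured values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: array sizes and execution times in seconds.
sizes = [100, 200, 300, 400]
times = [0.25, 0.45, 0.65, 0.85]

slope, intercept = fit_line(sizes, times)


def predict(size):
    # The fitted line plays the role of the model Te = f(P),
    # with a single parameter (array size) standing in for P.
    return slope * size + intercept
```

A linear form is only one option; a component whose cost grows non-linearly with input size would need a different model or transformed inputs.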

  11. TAU Component • TAU component is a wrapper to the TAU library • Provides access to timers to measure execution time and communication time • Also provides access to hardware metrics (e.g., cache hits) via external libraries such as PAPI or PCL • See http://www.cs.uoregon.edu/research/paracomp/tau

  12. TAU Performance System Architecture

  13. Using performance timings to select optimal components • To find optimal solution, need to reduce solution space • Eliminate “insignificant” components • 2-step heuristic • Are children, as a group, insignificant to a parent? • Is an individual node insignificant relative to its siblings? • Optimize reduced core for an approximately optimal solution
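The slide does not spell out the exact tests, so the following Python sketch is one possible reading of the 2-step heuristic, applied to a call tree annotated with inclusive times. The `Node` class, the use of inclusive time, the comparison rules, and the sample tree are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inclusive_time: float
    children: list = field(default_factory=list)


def prune(node, threshold=0.10):
    """Return a copy of the tree with 'insignificant' nodes removed."""
    child_total = sum(c.inclusive_time for c in node.children)
    # Step 1: are the children, as a group, insignificant to the parent?
    if child_total < threshold * node.inclusive_time:
        return Node(node.name, node.inclusive_time)
    # Step 2: is an individual child insignificant relative to its siblings?
    kept = [
        prune(c, threshold)
        for c in node.children
        if c.inclusive_time >= threshold * child_total
    ]
    return Node(node.name, node.inclusive_time, kept)


def count(node):
    return 1 + sum(count(c) for c in node.children)


def names(node):
    return [node.name] + [n for c in node.children for n in names(c)]


# Made-up call tree, for illustration only.
tree = Node("driver", 100.0, [
    Node("solver", 80.0, [
        Node("flux", 60.0, [Node("helper", 1.0)]),
        Node("boundary", 2.0),
    ]),
    Node("io", 5.0),
])

core = prune(tree, threshold=0.10)
```

With this made-up tree and a 10% threshold, only the nodes that dominate the runtime remain in `core`, and implementation selection would then be carried out over that reduced set.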

  14. Case Study Example • Core identification was run on a hydro shock simulation developed at Sandia National Labs • 10% thresholds • The original call graph of 18 nodes was reduced to 8 nodes

  15. Conclusions • The proxy-based measurement system allows for non-intrusive measurement of components • A single component may have multiple performance models based on different contexts • Eliminating “insignificant” components can ease the identification of an approximately optimal solution.

  16. Future Work • Synthesize a composite performance model from individual component models • Generalize performance models (e.g., parameterize models by processor speed and a cache model to make them architecture independent) • Model representation • XML? • Quality-of-Service • Dynamic Implementation Selection
