Discussion: How to Address Tools Scalability
Allen D. Malony
malony@cs.uoregon.edu
Department of Computer and Information Science
TAU Performance Research Laboratory
University of Oregon
Scale and Scaling
• What is meant by scale?
  • Processors: execution concurrency/parallelism
  • Memory: memory behavior, problem size
  • Network: concurrent communications
  • File system: parallel file operations / data size
• Scaling in the physical size / concurrency of the system
• What else?
  • Program: code size / interacting modules
  • Power: electrical power consumption
  • Performance potential: computational power
• Dimension
  • Terascale … Petascale … and beyond
Tools Scalability
• Types of tools
  • Performance: analytical / simulation / empirical
  • Debugging: detect / correct concurrency errors
  • Programming: parallel languages / computation
  • Compiling: parallel code / libraries
  • Scheduling: systems allocation and launching
• What does it mean for a tool to be scalable?
  • Tool dependent (different problems and scaling aspects)
• What changes about the tool?
  • Naturally scalable vs. change in function / operation
  • Is a paradigm shift required?
• To what extent is portability important?
• What tools would you say are scalable? How? Why?
Focus – Parallel Performance Tools/Technology
• Tools for performance problem solving
  • Empirical-based performance optimization process
  • Performance technology concerns
[Diagram: the empirical performance optimization cycle. Performance hypotheses drive Performance Diagnosis; diagnosis guides Performance Experimentation (supported by experiment management and performance data storage); experimentation rests on Performance Observation (instrumentation, measurement, analysis, visualization); observed properties and characterization feed Performance Tuning, closing the loop. Performance technology underpins each stage.]
Large Scale Performance Problem Solving
• How does our view of this process change when we consider very large-scale parallel systems?
• What are the significant issues that will affect the technology used to support the process?
• Parallel performance observation is required
  • In general, there is the concern for intrusion
  • Seen as a tradeoff with performance diagnosis accuracy
  • Scaling complicates observation and analysis
• Nature of application development may change
  • What will enhance productive application development?
• Paradigm shift in performance process and technology?
Instrumentation and Scaling
• Make events visible to the measurement system
• Direct instrumentation (code instrumentation)
  • Static instrumentation modifies code prior to execution (sketched below)
    • does not get removed (always will get executed)
    • source instrumentation may alter optimization
  • Dynamic instrumentation modifies code at runtime
    • can be inserted and deleted at runtime
    • incurs runtime cost
• Indirect instrumentation generates events outside of code
• Does scale affect the number of events?
• Runtime instrumentation is more difficult with scale
  • Affected by increased parallelism
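Below is a minimal sketch of direct (static, source-level) instrumentation, assuming a hypothetical RAII event probe rather than TAU's actual API. It illustrates why statically inserted probes always execute and why they can interact with optimization:

```cpp
// Hypothetical source-level instrumentation: a RAII timer emits
// enter/exit events for every execution of the instrumented scope.
#include <chrono>
#include <cstdio>

struct ScopedEvent {
    const char* name;
    std::chrono::steady_clock::time_point start;
    explicit ScopedEvent(const char* n)
        : name(n), start(std::chrono::steady_clock::now()) {
        // enter event: a real tool would write to a measurement buffer
    }
    ~ScopedEvent() {
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("event %s: %lld ns\n", name, static_cast<long long>(ns));
    }
};

// Statically instrumented function: the probe is compiled in and
// cannot be removed at runtime (contrast with dynamic instrumentation).
void compute() {
    ScopedEvent ev("compute");
    // ... application work ...
}

int main() { compute(); }
```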
Measurement and Scaling
• What makes performance measurement not scalable?
  • More parallelism → more performance data overall
    • performance data specific to each thread of execution
    • possible increase in the number of interactions between threads
  • Harder to manage the data (memory, transfer, storage)
  • Issues of performance intrusion
• Performance data size
  • Number of events generated × metrics per event (see the estimate below)
  • Are there really more events? Which are important?
• Control the number of events generated
• Control what is measured (to a point)
• Need for performance data versus cost of obtaining it
• Portability!
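To ground "number of events generated × metrics per event," here is a back-of-the-envelope trace-volume estimate; every number (16K threads, 10^6 events per thread, 32-byte records) is an assumption for illustration:

```cpp
// Illustrative trace-volume arithmetic; all numbers are assumptions.
#include <cstdio>

int main() {
    const double threads  = 16 * 1024;   // e.g., a Miranda-scale run
    const double events   = 1e6;         // events per thread
    const double recBytes = 32;          // timestamp + event id + metric
    double bytes = threads * events * recBytes;
    std::printf("trace volume ~ %.2f TB\n", bytes / 1e12);
    // 16384 * 1e6 * 32 B ~ 0.52 TB. Per-thread profiles, by contrast,
    // stay at (events of interest x metrics) regardless of run length.
    return 0;
}
```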
Measurement and Scaling (continued)
• Consider “traditional” measurement methods
  • Profiling: summary statistics calculated during execution
  • Tracing: time-stamped sequence of execution events
  • Statistical sampling: indirect triggers, PC + metrics
  • Monitoring: access to performance data at runtime
• How does the performance data grow?
  • How does per-thread profile / trace size grow?
  • Consider communication
• Strategies for scaling
  • Control performance data production and volume
  • Change in measurement type or approach
  • Event and/or measurement control
  • Filtering, throttling, and sampling (throttling sketched below)
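One common event-control mechanism is throttling. The sketch below disables measurement for an event once it has fired many times at a low per-call cost; the names and thresholds are invented for illustration (TAU offers a comparable throttle with its own tunable defaults):

```cpp
// Hypothetical per-event throttle: stop measuring an event once it has
// fired often and its mean per-call time is too small to matter.
#include <cstdint>
#include <string>
#include <unordered_map>

struct EventStats {
    uint64_t calls = 0;
    double totalUsec = 0.0;
    bool enabled = true;
};

std::unordered_map<std::string, EventStats> table;

// Illustrative thresholds only.
constexpr uint64_t kMaxCalls   = 100000;
constexpr double   kMinPerCall = 10.0;   // microseconds

bool shouldMeasure(const std::string& name) {
    return table[name].enabled;
}

void recordEvent(const std::string& name, double usec) {
    EventStats& s = table[name];
    s.calls++;
    s.totalUsec += usec;
    // Throttle: frequent, cheap events add data volume with little value.
    if (s.calls > kMaxCalls && (s.totalUsec / s.calls) < kMinPerCall)
        s.enabled = false;
}
```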
Concern for Performance Measurement Intrusion
• Performance measurement can affect the execution
  • Perturbation of “actual” performance behavior
  • Minor intrusion can lead to major execution effects
  • Problems exist even with a small degree of parallelism
• Intrusion is an accepted consequence of standard practice
  • Consider intrusion (perturbation) of trace buffer overflow
• Scale exacerbates the problem … or does it?
  • Traditional measurement techniques tend to be localized
  • Suggests scale may not compound local intrusion globally
  • Measuring parallel interactions likely will be affected
• Use accepted measurement techniques intelligently (a probe-cost calibration sketch follows)
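Local intrusion can be bounded by calibrating the probe cost itself. This illustrative sketch times an empty instrumented region; it is not any particular tool's calibration routine:

```cpp
// Estimate per-event measurement overhead by timing an empty
// instrumented region many times (illustrative calibration only).
#include <chrono>
#include <cstdio>

int main() {
    using clk = std::chrono::steady_clock;
    const int n = 1000000;
    auto t0 = clk::now();
    for (int i = 0; i < n; ++i) {
        auto e0 = clk::now();          // stands in for "event enter"
        auto e1 = clk::now();          // stands in for "event exit"
        (void)e0; (void)e1;
    }
    auto t1 = clk::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
    std::printf("~%.0f ns per instrumented event\n", ns);
    // Multiply by the expected per-thread event rate to bound local
    // intrusion; parallel interactions can still amplify it globally.
    return 0;
}
```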
Analysis and Visualization Scalability
• How to understand all the performance data collected?
• Objectives
  • Meaningful performance results in meaningful forms
  • Want tools to be reasonably fast and responsive
  • Integrated, interoperable, portable, …
• What does “scalability” mean here?
  • Performance data size
    • Large data size should not impact analysis tool use
    • Data complexity should not overwhelm interpretation
  • Results presentation should be understandable
  • Tool integration and usability
Analysis and Visualization Scalability (continued)
• Online analysis and visualization
  • Potential interference with execution
• Single experiment analysis versus multiple experiments
• Strategies
  • Statistical analysis
    • data dimension reduction, clustering, correlation, … (clustering sketched below)
  • Scalable and semantic presentation methods
    • statistical, 3D; relate metrics to physical domain
  • Parallelization of analysis algorithms (e.g., trace analysis)
  • Increase system resources for analysis / visualization tools
  • Integration with performance modeling
  • Integration with parallel programming environment
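As one concrete instance of dimension reduction, per-thread profile vectors (one value per instrumented event) can be clustered so the analyst inspects a few representative behaviors rather than thousands of threads. A minimal k-means sketch under that assumption, with toy data:

```cpp
// Minimal k-means over per-thread profile vectors (illustrative only):
// each row is one thread's exclusive time per instrumented event.
#include <cstdio>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

double dist2(const Vec& a, const Vec& b) {
    double d = 0;
    for (size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

std::vector<int> kmeans(const std::vector<Vec>& rows, int k, int iters) {
    std::vector<Vec> centers(rows.begin(), rows.begin() + k); // naive seeding
    std::vector<int> label(rows.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // Assign step: nearest center by squared Euclidean distance.
        for (size_t i = 0; i < rows.size(); ++i) {
            double best = std::numeric_limits<double>::max();
            for (int c = 0; c < k; ++c) {
                double d = dist2(rows[i], centers[c]);
                if (d < best) { best = d; label[i] = c; }
            }
        }
        // Update step: each center becomes the mean of its members.
        std::vector<Vec> sum(k, Vec(rows[0].size(), 0.0));
        std::vector<int> cnt(k, 0);
        for (size_t i = 0; i < rows.size(); ++i) {
            cnt[label[i]]++;
            for (size_t j = 0; j < rows[i].size(); ++j)
                sum[label[i]][j] += rows[i][j];
        }
        for (int c = 0; c < k; ++c)
            if (cnt[c])
                for (size_t j = 0; j < sum[c].size(); ++j)
                    centers[c][j] = sum[c][j] / cnt[c];
    }
    return label;
}

int main() {
    // Toy data: two obvious behavior groups among four "threads".
    std::vector<Vec> profiles = {{1, 1}, {1.1, 0.9}, {9, 9}, {8.8, 9.2}};
    std::vector<int> label = kmeans(profiles, 2, 10);
    for (size_t i = 0; i < label.size(); ++i)
        std::printf("thread %zu -> cluster %d\n", i, label[i]);
    return 0;
}
```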
Role of Intelligence and Specificity
• How to make the process more effective (productive)?
• Scale forces performance observation to be intelligent
  • Standard approaches deliver a lot of data with little value
• What are the important performance events and data?
  • Tied to application structure and computational mode
  • Tools have poor support for application-specific aspects
• Process and tools can be more application-aware
  • Will allow scalability issues to be addressed in context
  • More control and precision of performance observation
  • More guided performance experimentation / exploration
  • Better integration with application development
Role of Automation and Knowledge Discovery
• Even with intelligent and application-specific tools, the decisions of what to analyze may become intractable
• Scale forces the process to become more automated
  • Performance extrapolation must be part of the process
• Build autonomic capabilities into the tools
  • Support broader experimentation methods and refinement
  • Access and correlate data from several sources
  • Automate performance data analysis / mining / learning
  • Include predictive features and experiment refinement
• Knowledge-driven adaptation and optimization guidance
• Address scale issues through increased expertise
ParaProf – Histogram View (Miranda)
[Screenshots: histogram views at 8K and 16K processors]
ParaProf – 3D Full Profile (Miranda)
[Screenshot: 3D full-profile view at 16K processors]
ParaProf – 3D Scatterplot (Miranda)
• Each point is a “thread” of execution
• A total of four metrics shown in relation
• ParaVis 3D profile visualization library
• JOGL
Vampir Next Generation (VNG) Architecture
• Classic analysis: monolithic, sequential
• VNG: a parallel analysis server (master plus m workers) processes merged traces from the file system; event streams flow via message passing and parallel I/O (skeletal sketch below)
• A visualization client connects to the analysis server over the Internet
[Screenshot: timeline with 16 visible traces out of 768 processes, with segment indicator and thumbnail; parallel program, monitor system, and file system shown feeding the analysis server]
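In the spirit of the VNG analysis server (though not its actual protocol or code), a master/worker reduction over trace segments might look like the following; all names and the message layout are assumptions:

```cpp
// Master/worker trace-analysis skeleton (illustrative, not VNG's code):
// each worker reduces its trace segment to a summary; the master merges
// the summaries and would serve them to a remote visualization client.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each worker would read its trace segment via parallel I/O and
    // reduce it to a fixed-size summary (here: a stand-in event count).
    double localSummary = (rank == 0) ? 0.0 : rank * 1000.0;

    double merged = 0.0;
    MPI_Reduce(&localSummary, &merged, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0) {
        // Master: holds merged results, so the visualization client
        // never has to touch the full trace volume.
        std::printf("merged event count: %.0f from %d workers\n",
                    merged, size - 1);
    }
    MPI_Finalize();
    return 0;
}
```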