Explore the motivation, tools, and techniques for evaluating performance in a Grid computing environment, including measurement, analysis, visualization, and prediction.
CG Task 2.4: Interactive and semiautomatic performance evaluation (W. Funika)
Outline
• Motivation
• Influencing the Performance
• Tools Environment Architecture
• Performance Measurement
• High Level Performance Analysis
• Performance Visualization
• Performance Prediction for Grid execution
• Summary
Motivation
• Large number of tools, but mainly non-Grid oriented ones; the existing Grid-oriented ones offer only fixed capabilities
• Need for generic performance data, application-specific data, and performance prediction
• A monitoring system for the needs of tools' operation:
  • accessible via a well-defined interface
  • with a comprehensive range of possibilities
  • extensible functionality
• Re-usability of existing tools
• When interactive tools are difficult or impossible to apply, (semi)automatic ones are of help
Influencing performance
• Does the allocation of resources correspond to the request? If not, ask for more resources
• How well is the application running on this machine?
• The application has some parameters which can be tuned
• Determine the optimal mapping
• Component distribution based on benchmarking:
  • computation speed: dependences between components
  • communication speed
  • the ratio between them
• On-line optimization: brokers/actuators (modify the allocation policy)
• Application performance should be adjustable dynamically
More fine-grained performance actions
• Get statistical information, e.g. "measure critical path"
• Tuning of individual components of a distributed application
• Mapping -> component-specific, based on the description of monitoring
• Controlling what data is collected:
  • define measurements which can be controlled off-line
  • define sensors which can realize measurements (system, user-defined)
  • define relations between measurements at different levels
Categories of performance evaluation tools
Interactive, manual performance analysis:
• Off-line tools
  • trace based (combined with visualization)
  • profile based (no time reference)
  • problem: strong influence on the application when measurements are fine grained
• On-line tools
  • possible definition (restriction) of the measurements at run-time
  • suitable for cyclic programs: new measurements based on the previous results
  => automation of the bottleneck search is possible
Semi-automatic and automatic tools:
• Batch-oriented use of the computational environment (e.g. Grid)
• Now moving to interactive use of the Grid
• Basis: a search model that enables refining of measurements
Semiautomatic Analysis
• Why (semi-)automatic on-line performance evaluation?
  • ease of use - guide the users to performance problems
  • Grid: exact performance characteristics of computing resources and network are often unknown to the user
  • the tool should assess actual performance w.r.t. achievable performance
• Interactive applications are not suited for tracing:
  • applications run 'all the time'
  • on-line analysis can focus on specific execution phases
  • detailed information via selective refinement
Component Structure of the Environment (diagram)
• G-PM comprises the Performance Measurement Component, the High Level Analysis Component, the Performance Prediction Component, and the User Interface and Visualization Component.
• External elements: Benchmarks (Task 2.3), Applications (WP1) executing on the Grid testbed, Grid Monitoring (Task 3.3), and the application source code.
• Data flows: raw monitoring data (RMD) from Grid Monitoring, performance measurement data (PMD) between the G-PM components, plus manual information transfer from the application source code.
Legend: RMD – raw monitoring data, PMD – performance measurement data
G-PM tool
(a) a performance measurement component (PMC):
  • basic performance measurements of Grid applications and Grid environments
  • input to HLAC
(b) a component for high level analysis (HLAC):
  • problem-specific analysis
  • application-specific performance metrics
  • specification language
(c) a component for performance prediction (PPC), based on analytical performance models of application kernels:
  • computationally specific kernels
  • predicting the behaviour of codes under different HW conditions
(d) a user interface and visualization component (UIVC):
  • defining measurements
  • presenting the resulting performance information
Example measurement
A request for the whole volume of communication between two processes.
Defining the measurement comprises:
• the delay in receive and delay in send metrics,
• the process identifiers,
• the interval of the measurement (when dimension),
• the whole program (where dimension).
Output in a visualization window:
• summary information on the request,
• presentation with one of the possible displays, e.g. two bargraphs (one per direction of message passing) or a communication matrix, where a (column, row) pair denotes a sending process / receiving process pair with the corresponding communication volume values.
Request submission: either before the application is started, or when it is running.
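To make the what/when/where structure of such a request concrete, here is a minimal Python sketch; the field and display names are hypothetical, chosen to mirror the elements listed above rather than G-PM's actual interface.

```python
# Minimal sketch (hypothetical field/display names): the elements of the
# communication-volume measurement request described above.
from dataclasses import dataclass

@dataclass
class MeasurementRequest:
    metrics: tuple     # e.g. ("delay_in_receive", "delay_in_send")
    processes: tuple   # the two process identifiers
    when: str          # interval of the measurement (when dimension)
    where: str         # location of the measurement (where dimension)
    display: str       # chosen output window, e.g. "communication_matrix"

req = MeasurementRequest(
    metrics=("delay_in_receive", "delay_in_send"),
    processes=("p_1", "p_2"),
    when="whole_execution",
    where="whole_program",
    display="communication_matrix",
)
print(req)
```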
Performance data
Application-related:
• Data transfer
• Resource utilization (CPU, memory access, disk access, cache ops)
• Delays due to communication, synchronization, I/O
Grid-related:
• Availability of resources
• Dynamic resource information (node, link)
• Static resource information (node, link)
Performance data: granularity in space and time
Space:
• Sites
• Hosts
• Processes
• Functions inside a process's code
Time:
• Single performance value comprising the whole execution
• Summary values for a selectable period of time
• Progression of performance metrics over time with selectable resolution (min. 1 s)
How a measurement is realized
Example: summing up all the messages sent between two processes
• Capturing each "message sent" event
• Adding the volume of the message (an event parameter) to the sum
What does capturing an event mean? In our case, capturing each "message sent" event means being notified of each invocation of the library's "send message" function.
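A minimal Python sketch of this mechanism follows; the event dictionary layout and the handler registration are assumptions for illustration, not the actual monitoring interface.

```python
# Minimal sketch (hypothetical event format): accumulating the communication
# volume between two processes by reacting to "message sent" notifications.
# The monitoring layer is assumed to call on_send_event() for every
# invocation of the library's "send message" function.

class CommVolumeMeasurement:
    def __init__(self, sender, receiver):
        self.sender = sender      # id of the sending process
        self.receiver = receiver  # id of the receiving process
        self.total_bytes = 0      # running sum of message volumes

    def on_send_event(self, event):
        # the event is assumed to carry sender, receiver and message size
        if event["src"] == self.sender and event["dst"] == self.receiver:
            self.total_bytes += event["size"]

m = CommVolumeMeasurement(sender="p_1", receiver="p_2")
m.on_send_event({"src": "p_1", "dst": "p_2", "size": 4096})
print(m.total_bytes)  # 4096
```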
The APART approach
• object-oriented performance data model
  • different kinds and sources, e.g. profiles, traces, ...
  • makes use of existing monitoring tools
• formal specification of performance properties
  • possible bottlenecks in an application
  • specific to the programming paradigm
• APART specification language (ASL)
• specification of the automatic analysis process
APART specification language
• The specification of a performance property has three parts:
  • CONDITION: when does the property hold?
  • CONFIDENCE: how sure are we? (depends on the data source; 0-1)
  • SEVERITY: how important is the property?
• Basis for determining the most important performance problems
• A specification can combine different types of performance data
• Data from different hosts => global properties, e.g. load imbalance (see the sketch below)
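As an illustration of the three parts, here is a small Python analogue (not actual ASL syntax) of a load-imbalance property evaluated over per-host CPU times; the threshold value is an assumption.

```python
# Minimal sketch (Python analogue, not ASL syntax): a load-imbalance property
# expressed through the three parts named above.

def load_imbalance(cpu_times, threshold=0.10):
    mean = sum(cpu_times) / len(cpu_times)
    deviation = (max(cpu_times) - min(cpu_times)) / mean
    return {
        "condition": deviation > threshold,  # does the property hold?
        "confidence": 1.0,                   # measured data, hence fully trusted
        "severity": deviation,               # used to rank detected problems
    }

print(load_imbalance([12.0, 11.5, 18.2]))
# -> {'condition': True, 'confidence': 1.0, 'severity': 0.48...}
```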
HLAC functionality for the user #1
• Combine/correlate performance data from different sources
• Measure application-specific performance properties
• Realized through inserting probes – calls to a special function at strategic points (e.g. at the beginning of an iteration)
• ASL is used to specify how application-specific properties are computed from the data delivered by the probes and from generic PMD
• Different properties from the same probes, e.g. min/max/mean time of an iteration, amount of communication for each iteration
• ASL specifications are stored in a configuration file
HLAC functionality for the user #2
• Marking relevant places in the code (e.g. the beginning and end of a computation phase) by inserting probes
• Scalar parameters are passed to the probe (e.g. problem size)
• Values to be measured can be:
  • accumulated over the whole run-time,
  • plotted as a graph against real time,
  • plotted as a graph against the # of executions (i.e. against the application's execution phases)
• Measurement is performed on request via the UIVC
• Generic derived metrics: load imbalance, relative CPU and network performance – application performance as % of the optimal performance
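The following sketch shows what inserting such probes into application code might look like; the probe function name and its behaviour are hypothetical stand-ins for the real instrumentation call.

```python
# Minimal sketch (hypothetical probe API): marking a computation phase and
# passing a scalar parameter (the problem size) along with each probe call.

def probe(name, *params):
    # In the real tool the probe would notify the monitoring system;
    # here it only logs the call for illustration.
    print("probe:", name, params)

def solver_iteration(problem_size):
    probe("iteration_start", problem_size)  # beginning of the phase
    # ... the actual numerical work of the iteration ...
    probe("iteration_end", problem_size)    # end of the phase

solver_iteration(10_000)
```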
HLAC inside
• Transformations in PMC are fixed; in HLAC they are configurable w.r.t.:
  • performance measurement data (PMD) as input
  • high level properties for the user
  • how properties are computed from different types of performance measurements
• ASL makes it possible to specify:
  • the data model for incoming performance measurement data
  • the data model for performance properties
Performance prediction component
• Analytical models for kernels – functions of SW and HW characteristics
• Predict the cost of communications, computation times, load balancing (LB), overall runtime
• SW characteristics:
  • # FLOPs
  • number, type and size of communications
  • locality in the access to data
• HW characteristics:
  • network latency and bandwidth between pairs of nodes
  • computation power of each node
  • memory hierarchy features
Performance Prediction Component (cont'd)
• Example: Conjugate Gradient (sparse solver); # FLOPs per iteration: 19n + 7a - 3, where n is the # of columns of the matrix and a is the # of communications
• Some models can be applied at runtime, using real HW parameters provided by the monitoring services
• The user can analyse new situations close to the actual HW conditions
• Results of these models at runtime can be used to modify dynamic parameters of the parallelisation of the kernels
• Based on methods and SW from the PARAISO library and the AVISPA tool
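To show how such an analytical model is used, here is a short sketch that evaluates the stated FLOP count per CG iteration and turns it into a predicted computation time; the FLOP rate argument is an assumed input, e.g. obtained from monitoring services.

```python
# Minimal sketch: evaluating the kernel cost model stated above (19n + 7a - 3
# FLOPs per CG iteration) and deriving a predicted computation time.

def cg_flops_per_iteration(n, a):
    """n: # of columns of the matrix, a: # of communications (per the model)."""
    return 19 * n + 7 * a - 3

def predicted_time(n, a, iterations, flops_per_second):
    return iterations * cg_flops_per_iteration(n, a) / flops_per_second

# e.g. a 100000-column system, 4 communications per iteration, 500 iterations,
# on a node sustaining 2 GFLOP/s (assumed figures):
print(predicted_time(100_000, 4, 500, 2e9))  # ~0.475 seconds
```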
Request and data transformation in G-PM (diagram)
• Request path: the UIVC issues high-level measurement requests to the HLAC, which are transformed into PM requests to the PMC and further into monitoring requests to the Monitoring Services.
• Data path: raw monitoring (RM) data from the Monitoring Services is transformed into PM data by the PMC and into HL/AS (high-level / application-specific) metrics by the HLAC.
• An ASL metrics specification configures the HLAC's transformations.
Interface to monitoring services (MS) (Task 3.3)
Processing of requests/replies involving services of 3 types:
• Information requests – supplying information
• Manipulation requests – executing actions
• Event requests – capturing events and triggering actions
All this is based on the OMIS specification.
Retrieving information from R-GMA: directly or via OMIS?
What MS are expected to enable #1
For PMC:
• Example 1: to measure the duration of a "send" function call involving two processes p_1 and p_2, two event requests to the MS can be issued, on the events "function has started" and "function has ended".
• Example 2: to get the volume of data transferred, one event request can be issued: whenever a "send" function call has started in process p_1, return the relevant parameter (volume of data sent) of the function call.
• Example 3: to get information on the CPU utilization of 3 nodes, one information request containing a list of node identifiers can be issued.
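A rough Python sketch of how these three requests might be issued is shown below; the client class and method names are hypothetical and do not reproduce the actual OMIS request syntax.

```python
# Minimal sketch (hypothetical client API, not the actual OMIS interface):
# issuing the three kinds of requests from the examples above.

class MonitoringServices:
    def event_request(self, event, targets, action):
        ...  # register an action to be triggered when the event occurs
    def info_request(self, item, targets):
        ...  # return current information for the listed objects

ms = MonitoringServices()

# Example 1: duration of a "send" call - two event requests
ms.event_request("function_started", ["p_1", "p_2"], action="record_timestamp")
ms.event_request("function_ended", ["p_1", "p_2"], action="record_timestamp")

# Example 2: volume of transferred data - one event request returning a parameter
ms.event_request("function_started", ["p_1"], action="return_param:send_volume")

# Example 3: CPU utilization of three nodes - one information request
ms.info_request("cpu_utilization", ["n_1", "n_2", "n_3"])
```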
What MS are expected to enable #2
For HLAC: monitoring of probes inserted into the application's code, similar to "function A has started".
Possible actions:
• Start (enable) / stop (disable) specified measurements
• Increment a named counter by an amount (one of the parameters passed to the probe)
• Increment a named integrating counter by an amount
• Store a record in an event trace
• Send event information directly to the performance tool
PMC Functions
A typical measurement comprises:
• the value to be measured (time, data volume)
• the object(s) specification (site, host, process, thread)
• the time interval (complete execution, time/program phases)
• the location (program, module, function, block)
• constraints (metrics limit values, communication counterparts)
Example: whole execution time of the program PR_1 on nodes n_1, n_2, n_3
Four-phase execution:
1. A measurement is defined as the difference between the end time and the start time
2. The measurement is transformed into two requests to the MS, to capture the start and the end of the program's execution
3. The requests are submitted to the MS
4. Data of the occurred events are processed and returned as the required measurement value to the requesting component
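The sketch below illustrates phases 1 and 4 of this example, assuming the monitoring services deliver timestamped start and end events for PR_1; the event record layout is hypothetical.

```python
# Minimal sketch (hypothetical event records): the "whole execution time"
# measurement as the difference of end and start timestamps (phase 1),
# applied to the event data returned by the MS (phase 4).

def whole_execution_time(start_event, end_event):
    return end_event["timestamp"] - start_event["timestamp"]

start = {"event": "program_started", "timestamp": 1000.0}
end = {"event": "program_ended", "timestamp": 1384.5}
print(whole_execution_time(start, end))  # 384.5 seconds
```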
User Interface and Visualization Component
• Defining measurements with a measurement specification dialog
• Resolution in:
  • space (measurement location – whole system, sets of nodes, processes, threads, functions, code fragments)
  • time (complete execution or particular phases)
• Specifying the counterpart of the location-related part of the measurement in communication-related measurements
• Display specification dialog window to choose the output window
• Kinds of output windows:
  • performance data over a time axis
  • single values of a metric on an interval
  • particular event occurrences over the measurement interval
Summary: definition and design work
• data models for performance measurements
• hierarchy and naming policy of objects to be monitored
• tool/monitor interface, based on expressing measurement requests in terms of the monitoring specification's standard services
• filtering and grouping policy for the tools
• granularity of measurement representation and visualization modes
• modes of delivering performance data for particular measurements