This paper introduces MATE, a tool that enables dynamic performance optimization of parallel/distributed applications without user intervention. It provides monitoring, analysis, and tuning techniques to improve application performance during runtime. The tool allows for tuning without recompiling and rerunning, adapting to existing conditions, and evaluating the profitability of dynamic tuning.
Paradyn/Condor Week 2004, April 2004. MATE: Monitoring, Analysis and Tuning Environment. Anna Morajko, Tomàs Margalef and Emilio Luque, Universitat Autònoma de Barcelona
Content • Introduction • Dynamic Performance Tuning • MATE • Tuning Techniques • Conclusions and future work
Introduction Application performance • Demand for high-performance computation • The main goal of parallel/distributed applications: solve a given problem as fast as possible • Performance is one of the most important issues • Developers must optimize application performance to provide efficient and useful applications
Introduction Application performance optimization Steps: • monitoring • analysis • tuning [Figure: optimization cycle linking application development (source), monitored execution (instrumentation, measurements), performance data, performance analysis (bottlenecks, source code relation, solutions) and tuning (modifications, changes)]
Introduction Application performance optimization • Difficulties in finding bottlenecks and determining their solutions for parallel/distributed applications • Many tasks that cooperate with each other • A high degree of expertise is required • Application behavior may change with input data or environment • A difficult task, especially for non-expert users
Introduction Our goals • Investigate whether it is possible to optimize the performance of parallel/distributed applications dynamically without user intervention • Investigate the applicability of dynamic tuning • Create a tool that is able to dynamically optimize applications: • automatically improve application performance • improve the application execution during run time • tune without recompiling and rerunning • adapt the application to existing conditions • Evaluate the profitability of dynamic tuning in practice
Introduction Dynamic automatic tuning [Figure: the user handles application development (problem/solution, source); a tool handles the running application: instrumentation, monitoring (events, performance data), performance analysis and tuning (modifications)]
Dynamic Performance Tuning Requirements • No user intervention • No source recompilation • Performance analysis on the fly • Global analysis • Decisions taken in a short time • Not complex analysis and modifications • Run time monitoring • Run time tuning • Modifications performed carefully • Parallel/distributed application control • Low intrusion
Dynamic Performance Tuning Key question: What can be tuned in an application? • Application knowledge: limited information about the application • Tuning layers • Approaches to tuning
Dynamic Performance Tuning Tuning layers • Application-specific code • Standard and custom libraries (API + code) • Operating system libraries (API + code) • Hardware [Figure: layer stack, top to bottom: application code, libraries (API, code), operating system (API, kernel), hardware]
Dynamic Performance Tuning Application • Application code changes • Different bottlenecks that depend on the application implementation Libraries • Library code changes • API usage • Standard • C/C++ library -> memory management, dynamic containers • Custom • PVM, MPI -> communication OS • Kernel code changes • API usage • Adjustment of options (e.g. TCP/IP socket), I/O request grouping More bottlenecks common to a wider group of applications
Dynamic Performance Tuning Approaches to tuning • Cooperative • Application must be prepared for tuning • Application-specific knowledge is provided • Automatic - black-box • Tuning of any application • No application-specific knowledge is required • Knowledge about the bottleneck is required • No changes are introduced into the application source code Trade-off: more cooperative, more application-specific information available; more automatic, more generic
Dynamic Performance Tuning Knowledge representation • Measure points • Where the instrumentation must be inserted to provide measurements • Performance model • Determines minimal execution time of the entire application • Tuning points/actions/synchronization • What and when can be changed in the application • point – element that may be changed • action – what to invoke on a point • synchronization – when a tuning action can be invoked to ensure application correctness [Figure: performance model: measurements -> formulas and conditions for optimal behavior -> optimal values]
Dynamic Performance Tuning Application knowledge [Figure: the layer stack (application code, libraries, operating system, hardware) annotated with who supplies the knowledge (measure points, performance model, tuning point/action/sync): provided by the user, or provided automatically by a tuning system]
Dynamic Performance Tuning Manipulation of a running application • monitoring – collect information about the behavior of a running application • tuning – insert tuning code into a running application that improves its performance Dynamic instrumentation – DynInst
Dynamic Performance Tuning Dynamic modifications of a running application with DynInst • Function replacement • Function invocation • One-time function invocation • Function call elimination • Function parameter changes • Variable changes
MATE MATE – Monitoring, Analysis and Tuning Environment • prototype implementation in C++ • for PVM based applications • Sun Solaris 2.x / SPARC
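MATE • Application Controller - AC • Dynamic Monitoring Library - DMLib • Analyzer [Figure: Machines 1 and 2 each run a PVM daemon (pvmd), an AC and application tasks (Task1, Task2, Task3) with DMLib loaded; the ACs apply instrumentation and modifications to the tasks; the DMLibs send events to the Analyzer on Machine 3]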
MATE: Application Controller Services • Distributed application control • Startup/exit of tasks (Tasker) • Startup/exit of PVM daemons, slave ACs (Hoster) • Clock synchronization • Application model management (Task Manager) • Performance monitoring (Monitors) • Manage monitoring instrumentation • Provide monitoring API for Analyzer • Performance tuning (Tuners) • Manage tuning instrumentation • Provide tuning API for Analyzer
MATE: Application Controller • Monitors • Instrumentation management via DynInst • Dynamically load DMLib • Generate monitoring snippets that call appropriate library functions • Insert/remove snippets in/from requested points • API • AddEventTrace(tid, eventId, funcName, instrPlace, attrs) • RemoveEventTrace(tid, eventId) [Figure: the Analyzer on Machine 2 sends add event/remove event requests to the AC Monitor on Machine 1, which instruments Task1 and Task2 (with DMLib) via DynInst]
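The Monitor API on this slide can be pictured as a registry of active event traces. The following is a minimal sketch, not MATE's implementation: the class `MonitorRegistry`, the `InstrPlace` enum, the parameter types and the boolean return values are assumptions made for illustration; only the two operation names and their argument lists come from the slide. A real AC would generate and insert DynInst snippets where this sketch merely records the request.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed instrumentation places; the slides only show an "entry" event.
enum class InstrPlace { Entry, Exit };

// What the monitor would need to remember about one requested trace.
struct EventTrace {
    std::string funcName;            // function to instrument, e.g. "pvm_send"
    InstrPlace place;                // where in the function the snippet goes
    std::vector<std::string> attrs;  // attributes to record (hypothetical encoding)
};

// Hypothetical bookkeeping behind AddEventTrace / RemoveEventTrace.
class MonitorRegistry {
public:
    // Returns false if the (tid, eventId) pair is already being traced.
    bool addEventTrace(int tid, int eventId, const std::string& funcName,
                       InstrPlace place, std::vector<std::string> attrs) {
        assert(!funcName.empty());
        return traces_.emplace(std::make_pair(tid, eventId),
                               EventTrace{funcName, place, std::move(attrs)}).second;
    }

    // Returns false if no such trace was registered.
    bool removeEventTrace(int tid, int eventId) {
        return traces_.erase(std::make_pair(tid, eventId)) > 0;
    }

    std::size_t activeTraces() const { return traces_.size(); }

private:
    std::map<std::pair<int, int>, EventTrace> traces_;
};
```

Keying the registry on the (tid, eventId) pair mirrors the argument list of RemoveEventTrace, which identifies a trace by exactly those two values.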
MATE: Application Controller • Tuners • Tuning via DynInst • Generate tuning snippet according to the request • Insert tuning snippet • API • LoadLibrary(tid, path) • SetVariableValue(tid, params, brkpt) • ReplaceFunction(…) • InsertFunctionCall(…) • OneTimeFunctionCall(…) • RemoveFunctionCall(…) • FunctionParamChange(…) [Figure: the Analyzer on Machine 2 sends apply tuning requests to the AC Tuner on Machine 1, which tunes Task1 and Task2 via DynInst]
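To illustrate why a tuning request may carry a synchronization argument, the sketch below defers a variable change until the task reaches a named point. It is a toy model under stated assumptions: the class `TunerSketch`, the string-named sync point and the in-memory variable table are hypothetical, and reading `brkpt` as "the point where the change may safely be applied" is an interpretation of the slide, not a documented signature. MATE itself modifies the running process through DynInst.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A change that waits for its synchronization point (hypothetical layout).
struct PendingChange {
    std::string variable;
    double value;
    std::string syncPoint;
};

class TunerSketch {
public:
    // An empty syncPoint means "apply immediately"; otherwise the change is queued.
    void setVariableValue(int tid, const std::string& variable, double value,
                          const std::string& syncPoint) {
        assert(!variable.empty());
        if (syncPoint.empty()) {
            vars_[tid][variable] = value;
            return;
        }
        pending_[tid].push_back({variable, value, syncPoint});
    }

    // Called when task `tid` reaches `point`: applies the changes waiting for it.
    void onReached(int tid, const std::string& point) {
        std::vector<PendingChange>& queue = pending_[tid];
        std::vector<PendingChange> keep;
        for (const PendingChange& c : queue) {
            if (c.syncPoint == point) {
                vars_[tid][c.variable] = c.value;
            } else {
                keep.push_back(c);
            }
        }
        queue.swap(keep);
    }

    // Current value as seen by the task; throws std::out_of_range if unknown.
    double value(int tid, const std::string& variable) const {
        return vars_.at(tid).at(variable);
    }

private:
    std::map<int, std::map<std::string, double>> vars_;
    std::map<int, std::vector<PendingChange>> pending_;
};
```

Deferring the write keeps the task from observing a new value in the middle of a computation that assumed the old one, which is the correctness concern the synchronization element addresses.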
MATE: Dynamic Monitoring Library Services • Register event • What – event type (id, place) • When – global timestamp • Where – task identifier • Requested attributes – e.g. function call parameters, return value • Deliver event to the Analyzer • API • DMLib_InitLogger(tid, analyzerHost, port, clockDiff) • DMLib_OpenEvent(id, nAttrs) • DMLib_AddIntAttr(value) • DMLib_AddFloatAttr(value) • DMLib_AddCharAttr(value) • DMLib_AddStringAttr(value) • DMLib_CloseEvent() • DMLib_DoneLogger() [Figure: on Machine 1, a snippet at the entry of pvm_send(p1, p2) in Task1 calls DMLib_OpenEvent(), DMLib_AddIntAttr() twice and DMLib_CloseEvent(); the resulting event record (e.g. 1 0 64884 524247262149 1) is sent over TCP/IP to the Analyzer]
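The open/add/close sequence above can be sketched as a small record builder. This is an illustrative stand-in, not DMLib: the class `EventLogger`, the space-separated field order, the explicit timestamp argument and the in-memory `sink` that replaces the TCP/IP connection are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the DMLib logger of one task.
class EventLogger {
public:
    // `sink` collects finished records; the real library delivers them to the Analyzer.
    EventLogger(int tid, std::vector<std::string>& sink) : tid_(tid), sink_(sink) {}

    // Mirrors DMLib_OpenEvent(id, nAttrs). The timestamp is passed in here so the
    // sketch stays deterministic; a real logger would read a synchronized clock.
    void openEvent(int id, int nAttrs, std::int64_t timestamp) {
        assert(!open_ && nAttrs >= 0);
        buf_.str("");
        buf_ << id << ' ' << tid_ << ' ' << timestamp << ' ' << nAttrs;
        expected_ = nAttrs;
        added_ = 0;
        open_ = true;
    }

    // Mirrors DMLib_AddIntAttr(value): appends one attribute to the open record.
    void addIntAttr(int value) {
        assert(open_ && added_ < expected_);
        buf_ << ' ' << value;
        ++added_;
    }

    // Mirrors DMLib_CloseEvent(): the record is complete and handed to the sink.
    void closeEvent() {
        assert(open_ && added_ == expected_);
        sink_.push_back(buf_.str());
        open_ = false;
    }

private:
    int tid_;
    std::vector<std::string>& sink_;
    std::ostringstream buf_;
    int expected_ = 0;
    int added_ = 0;
    bool open_ = false;
};
```

Declaring the attribute count up front, as DMLib_OpenEvent(id, nAttrs) does, lets the receiver know how many fields to expect before the record is closed.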
MATE: Analyzer Services • Automatic performance analysis on the fly • Request events • Collect incoming events • Find bottlenecks among events by applying the performance model • Find solutions that overcome the bottlenecks • Send tuning requests • The Analyzer is provided with application knowledge about performance problems • The information related to one problem is called a tuning technique • A tuning technique describes a complete performance optimization scenario
MATE: Analyzer Tunlets • Each technique is implemented in MATE as a tunlet • A tunlet contains specific code (analysis logic) related to one concrete performance problem • measure points – what events are needed • performance model – how to determine bottlenecks and solutions • tuning actions/points/synchronization – what to change, where, when • A tunlet is a C/C++ library dynamically loaded into the Analyzer process [Figure: a tunlet inside the Analyzer, containing the performance model, the measure points and the tuning point/action/sync]
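The three parts of a tunlet can be sketched as an abstract interface plus one toy implementation. Everything here is hypothetical: the `Tunlet` interface, the `Event` and `TuningRequest` structs, the event id 100 and the "halve the factor when idle time is too high" rule are placeholders chosen for illustration, and they do not reproduce MATE's DTAPI or any of its performance models.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Event { int tid; int eventId; double value; };            // assumed event shape
struct TuningRequest { int tid; std::string variable; double newValue; };

// Hypothetical tunlet interface: measure points in, tuning requests out.
class Tunlet {
public:
    virtual ~Tunlet() = default;
    // Measure points: which event ids this tunlet needs to receive.
    virtual std::vector<int> measurePoints() const = 0;
    // Performance model and decision: returns true and fills `out` when tuning is needed.
    virtual bool onEvent(const Event& e, TuningRequest& out) = 0;
};

const int kIdleEventId = 100;  // placeholder id for an "idle time" event

// Toy tunlet: asks for a smaller factor whenever reported idle time exceeds a limit.
class IdleTimeTunlet : public Tunlet {
public:
    IdleTimeTunlet(double maxIdleSeconds, double initialFactor)
        : maxIdle_(maxIdleSeconds), factor_(initialFactor) {
        assert(maxIdleSeconds >= 0.0 && initialFactor > 0.0);
    }

    std::vector<int> measurePoints() const override { return {kIdleEventId}; }

    bool onEvent(const Event& e, TuningRequest& out) override {
        if (e.eventId != kIdleEventId || e.value <= maxIdle_) {
            return false;  // no bottleneck detected for this event
        }
        factor_ *= 0.5;    // placeholder rule, not a real performance model
        out = TuningRequest{e.tid, "factor", factor_};  // tuning point and action
        return true;
    }

private:
    double maxIdle_;
    double factor_;
};
```

An Analyzer built on such an interface would subscribe to the ids returned by `measurePoints()`, pass each incoming event to `onEvent`, and forward any resulting request to the tuner.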
MATE: Analyzer [Figure: Analyzer internals. Components: Event Collector (thread), Controller, AC Proxy, Event Repository, DTAPI, application model, tunlets. Flows, all via TCP/IP: events from DMLibs, metadata from ACs, instrumentation requests to monitors, tuning requests to tuners]
Tuning Example Workload balancing (App layer) • Imbalance problem: • Heterogeneous computing and communication power • Varying amount of distributed work • Goal: • minimize the idle time by balancing the work among the processes, taking the efficiency of the machines into account • Balancing -> faster machines process more work than slower ones • The work cannot be balanced statically before program execution (it depends on input data, network load, machine power and load)
Tuning Example Workload balancing (App layer) • Many scheduling methods -> Factoring Scheduling method • Work is divided into tuples of different sizes according to the factor • Application must be tunable: • a well-known variable represents the factor • the factor must be checked before each iteration of the work distribution • the work tuples are calculated using the factoring scheduling method and according to the current factor value
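How a factor turns into tuples of different sizes can be sketched as follows. This is one common formulation of factoring, assumed here rather than taken from the slides: in each batch, a fraction `factor` of the remaining work is split evenly among the workers, so tuples shrink as the work runs out. The function name `factoringTuples` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Splits `totalWork` units into tuple sizes using a factoring-style schedule.
// Each batch hands out roughly `factor` of the remaining work, divided evenly
// among `workers`; the last tuples are clipped to whatever work is left.
std::vector<std::size_t> factoringTuples(std::size_t totalWork,
                                         std::size_t workers,
                                         double factor) {
    assert(workers > 0);
    assert(factor > 0.0 && factor <= 1.0);

    std::vector<std::size_t> tuples;
    std::size_t remaining = totalWork;
    while (remaining > 0) {
        // Tuple size for this batch; ceil of a positive value is at least 1,
        // so the loop always makes progress.
        const std::size_t chunk = static_cast<std::size_t>(
            std::ceil(static_cast<double>(remaining) * factor /
                      static_cast<double>(workers)));
        for (std::size_t w = 0; w < workers && remaining > 0; ++w) {
            const std::size_t size = std::min(chunk, remaining);
            tuples.push_back(size);
            remaining -= size;
        }
    }
    return tuples;
}
```

With 100 work units, 2 workers and a factor of 0.5, the first batch yields two tuples of 25 units and later batches yield progressively smaller ones. In a tunable application, the factor would be re-read before each distribution iteration, so a tuner that changes the variable at run time changes the sizes of all subsequent tuples.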
Tuning Example Example application • Forest fire propagation – Xfire • High computation cost Scenarios and benefits: 1) homogeneous and dedicated: up to 2% 2) heterogeneous and dedicated: up to 49% 3) heterogeneous and non-dedicated: up to 48%
Conclusions • The principal conclusion: dynamic tuning works, is applicable, effective and useful in certain conditions • Limits of such tuning -> incomplete application information • Classification of layers where tuning can be performed (OS, libraries, apps) • Approaches to tuning: automatic and cooperative • Application knowledge representation: • measure points, performance model, tuning point/action/sync
Conclusions • Working prototype environment – MATE – that automatically monitors, analyses and tunes running applications • Practical experiments conducted with MATE and parallel/distributed applications prove that it automatically adapts application behavior to existing conditions during run time!
Future work • Global and local analysis • Scalability (problems with global analysis) • Some problems can be treated locally • Performance analysis • How tuning techniques influence other techniques • Other approaches than performance model • Metrics • Complementary information provided by metrics • Provision of the application knowledge • Tunlet provided externally in a declarative manner • Instrumentation evaluation • Prediction of monitoring and tuning instrumentation cost
Future work • Tuning techniques • OS layer • TCP/IP options (e.g. sending without delay – Nagle’s algorithm) • I/O operations (e.g. read/write operations, I/O buffer size) • Library layer • Investigation of problems in MPI, numerical libraries • Application layer • Automatic selection of algorithm (e.g. sorting algorithm) • Recommendations • Provision of good explanation to the user • Towards grid
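As a concrete instance of the OS-layer option mentioned above, sending without delay corresponds to the standard `TCP_NODELAY` socket option, which disables Nagle's algorithm. The sketch below uses the POSIX sockets API; the helper names `setNoDelay` and `getNoDelay` are hypothetical, and a dynamic tuner would have to make the equivalent call inside the running process rather than in its source code.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Enables or disables TCP_NODELAY on a TCP socket; returns true on success.
// With the option enabled, small writes are sent immediately instead of being
// coalesced by Nagle's algorithm.
bool setNoDelay(int fd, bool enable) {
    assert(fd >= 0);
    int flag = enable ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}

// Reads the current TCP_NODELAY setting into `enabled`; returns true on success.
bool getNoDelay(int fd, bool& enabled) {
    assert(fd >= 0);
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) {
        return false;
    }
    enabled = (flag != 0);
    return true;
}
```

Whether disabling Nagle's algorithm helps depends on the traffic pattern: it tends to reduce latency for many small messages at the cost of more packets on the wire.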
Thesis March, 2004 Thank you very much