
Paradyn/Condor Week 2005, March 2005, UAB. Dynamic Tuning of Master/Worker Applications. Anna Morajko, Paola Caymes Scutari, Tomàs Margalef, Eduardo Cesar, Joan Sorribes and Emilio Luque, Universitat Autònoma de Barcelona. Outline: Introduction, MATE, Number of workers, Data distribution.


Presentation Transcript


  1. Paradyn/Condor Week 2005 March 2005 UAB Dynamic Tuning of Master/Worker Applications Anna Morajko, Paola Caymes Scutari, Tomàs Margalef, Eduardo Cesar, Joan Sorribes and Emilio Luque Universitat Autònoma de Barcelona

  2. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  3. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  4. Introduction Application performance • The main goal of parallel/distributed applications: solve the given problem as fast as possible • Performance is one of the most important issues • Developers must optimize application performance to provide efficient and useful applications

  5. Introduction (II) • It is difficult to find bottlenecks and determine their solutions in parallel/distributed applications • Many tasks cooperate with each other • Application behavior may change depending on input data or environment • This is a difficult task, especially for non-expert users

  6. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  7. Problem / Solution [Figure: application development (user, source) and application execution, with a tool that performs monitoring, performance analysis and tuning; labels: events, performance data, DynInst instrumentation, modifications] MATE • Monitoring, Analysis and Tuning Environment • Dynamic automatic tuning of parallel/distributed applications

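  8. MATE (II) [Figure: MATE architecture. Machines 1 and 2 each run pvmd and an Application Controller (AC); tasks Task1, Task2 and Task3 run with DMLib; the ACs apply instrumentation (instr.) and modifications (modif.); events flow to the Analyzer on Machine 3] • Application Controller (AC) • Dynamic Monitoring Library (DMLib) • Analyzer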

  9. MATE (II) • Analyzer • Carries out the application performance analysis • Detects problems “on the fly” and requests changes

  10. MATE (II) • Application Controller (AC) • Controls the execution of the application • Has a Monitor module that manages instrumentation via DynInst and gathers execution information • Has a Tuner module that performs tuning via DynInst

  11. MATE (II) • Dynamic Monitoring Library (DMLib) • Facilitates instrumentation and data collection • Responsible for registering events

  12. MATE (III) • Automatic performance analysis on the fly • Find bottlenecks among events by applying a performance model • Find solutions that overcome the bottlenecks • The Analyzer is provided with application knowledge about performance problems • The information related to one problem is called a tuning technique • A tuning technique describes a complete performance optimization scenario

  13. MATE (IV) [Figure: a tunlet inside the Analyzer, consisting of measure points, a performance model, and tuning point, action and sync] • Each tuning technique is implemented in MATE as a “tunlet” • A tunlet is a C/C++ library dynamically loaded into the Analyzer process • measure points – what events are needed • performance model – how to determine bottlenecks and solutions • tuning actions/points/synchronization – what to change, where, and when
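
The three parts of a tunlet can be pictured as a small interface. The sketch below is a language-neutral illustration in Python, not MATE's actual C/C++ tunlet API: the names `Tunlet`, `TuningAction` and `NumWorkersTunlet`, the event names, and the `sync` hint are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable


@dataclass
class TuningAction:
    """What to change, where, and when (hypothetical structure)."""
    task: str      # where: the task to modify
    variable: str  # what: the application variable to change
    value: int     # the new value
    sync: str      # when: hypothetical synchronization hint


class Tunlet(ABC):
    """Hypothetical tunlet interface: measure points plus a performance model."""

    @abstractmethod
    def measure_points(self) -> list:
        """Names of the events this tunlet needs."""

    @abstractmethod
    def analyze(self, events: list) -> list:
        """Apply the performance model to events; return TuningActions."""


class NumWorkersTunlet(Tunlet):
    """Toy tunlet: asks for a new 'numworkers' value when the model disagrees."""

    def __init__(self, model: Callable[[list], int]):
        self.model = model  # maps collected events to a worker count

    def measure_points(self) -> list:
        # Hypothetical event names for send/receive entry and exit.
        return ["send_entry", "send_exit", "recv_entry", "recv_exit"]

    def analyze(self, events: list) -> list:
        if not events:
            return []
        current = events[-1]["numworkers"]
        best = self.model(events)
        if best == current:
            return []  # no bottleneck detected, nothing to tune
        return [TuningAction("master", "numworkers", best, "next_iteration")]
```

Keeping the performance model as a plug-in callable separates "how to detect the problem" from "what to change", which is the split the slide describes.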

  14. MATE (V) [Figure: internal structure of the Analyzer. Components: Event Collector (thread), Controller, AC Proxy, Event Repository, DTAPI, application model and several tunlets. TCP/IP traffic: events from the DMLibs, metadata from the ACs, tuning requests to the tuner, instrumentation requests to the monitor]
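
One way to picture the event path on this slide: an Event Collector thread drains incoming events into an Event Repository, and the tunlets are then evaluated over the stored events to produce tuning requests. In this Python sketch a `queue.Queue` stands in for the TCP/IP connections from the DMLibs, a `None` sentinel marks the end of the stream, and every name (`EventRepository`, `event_collector`, `run_tunlets`) is hypothetical; the AC Proxy and the real transport are omitted.

```python
import queue
import threading
from collections import defaultdict


class EventRepository:
    """Thread-safe store of events grouped by event type (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_type = defaultdict(list)

    def add(self, event: dict) -> None:
        with self._lock:
            self._by_type[event["type"]].append(event)

    def get(self, event_type: str) -> list:
        with self._lock:
            return list(self._by_type[event_type])


def event_collector(inbox: queue.Queue, repo: EventRepository) -> None:
    """Move events from the inbox into the repository until a None sentinel."""
    while True:
        event = inbox.get()
        if event is None:
            break
        repo.add(event)


def run_tunlets(repo: EventRepository, tunlets: list) -> list:
    """Evaluate each (event_type, model) pair; collect non-None tuning requests."""
    requests = []
    for event_type, model in tunlets:
        request = model(repo.get(event_type))
        if request is not None:
            requests.append(request)
    return requests
```

Usage: start `threading.Thread(target=event_collector, args=(inbox, repo))`, feed events into `inbox`, put `None`, `join()` the thread, and then call `run_tunlets`.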

  15. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  16. Number of Workers [Figure: one master and four workers] • Master/Worker paradigm • Easy-to-understand concept, but with some bottlenecks • Example: inadequate number of workers • fewer workers → master idle • more workers → more communication

  17. Number of Workers (II) • Execution trace of a homogeneous master/worker application • (homogeneous in message size and worker execution time) [Figure: execution trace of the master and the workers, with a conditional expression "if tl > … + … then … else …"] Where: tl = latency; λ = inverse bandwidth; vi = size of the tasks sent to worker i, in bytes; n = current number of workers in the application.

  18. Number of Workers (II) • Execution trace of a homogeneous master/worker application • (homogeneous in message size and worker execution time) [Figure: execution trace of the master and the workers, marking tci] Where: tci = time that worker i spends processing a task

  19. Number of Workers (II) • Execution trace of a homogeneous master/worker application • (homogeneous in message size and worker execution time) [Figure: execution trace of the master and the workers, marking the result message time tl + λ·vm] Where: tl = latency; λ = inverse bandwidth; vm = size of the results sent back to the master
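
The execution trace on these slides can be turned into a small simulator that shows the trade-off from slide 16: fewer workers mean more computation per worker, while more workers mean more messages through the master. This is a sketch under stated assumptions, not the analytical model of the talk, whose formulas are not in this transcript. It assumes that task data V, result data Vm and total computation Tc are split evenly among the n workers; that every message costs tl + λ·v; and that the master first sends all tasks one after another and then receives the results one at a time. The names `MWParams`, `iteration_time` and `best_num_workers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MWParams:
    """Illustrative parameters of one homogeneous master/worker iteration."""
    tl: float   # latency per message (s)
    lam: float  # inverse bandwidth (s/byte)
    V: float    # total bytes of tasks sent by the master
    Vm: float   # total bytes of results returned to the master
    Tc: float   # total computation time of the iteration on one worker (s)


def iteration_time(n: int, p: MWParams) -> float:
    """Simulated duration of one iteration with n workers."""
    vi = p.V / n                 # bytes per task (even split)
    vm = p.Vm / n                # bytes per result (even split)
    tci = p.Tc / n               # computation time per worker
    send = p.tl + p.lam * vi     # one master -> worker message
    recv = p.tl + p.lam * vm     # one worker -> master message
    # Tasks leave the master one after another; worker i finishes computing
    # tci seconds after its task has arrived.
    ready = [(i + 1) * send + tci for i in range(n)]
    # Once all tasks are sent, the master receives results in order, one at
    # a time; each receive starts when both the master and the result are ready.
    clock = n * send
    for r in ready:
        clock = max(clock, r) + recv
    return clock


def best_num_workers(p: MWParams, max_workers: int) -> int:
    """Worker count in 1..max_workers with the smallest simulated time."""
    return min(range(1, max_workers + 1), key=lambda n: iteration_time(n, p))
```

On a 100 Mb/s link (12.5 MB/s), λ is about 8e-8 s/byte; with illustrative values such as `MWParams(tl=0.001, lam=8e-8, V=5e6, Vm=1e6, Tc=60.0)`, calling `best_num_workers(p, 32)` searches n = 1..32 for the smallest predicted iteration time.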

  20. Number of Workers (III)

  21. Number of Workers (IV)

  22. Number of Workers: Tunlet [Figure: timelines of Machine A (master) and Machine B (worker), marking the entry and exit of each send and receive call] • Measure points: • The amount of data sent to the workers and received by the master • The total computation time of the workers • The network overhead and bandwidth
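
The network overhead and bandwidth measure points can be turned into the tl and λ parameters of the model. A minimal sketch, assuming each sample pairs a message size in bytes with a transfer time (for example, receive exit on Machine B minus send entry on Machine A, which presumes synchronized clocks): an ordinary least-squares fit of t = tl + λ·v. The function name `fit_latency_bandwidth` is hypothetical.

```python
def fit_latency_bandwidth(samples):
    """Least-squares fit of t = tl + lam * v.

    samples: list of (message_size_bytes, transfer_time_seconds) pairs.
    Returns (tl, lam): latency in seconds and inverse bandwidth in s/byte.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_v = sum(v for v, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    # Spread of message sizes; zero means every message had the same size.
    sxx = sum((v - mean_v) ** 2 for v, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct message sizes")
    sxy = sum((v - mean_v) * (t - mean_t) for v, t in samples)
    lam = sxy / sxx             # slope: seconds per byte
    tl = mean_t - lam * mean_v  # intercept: per-message latency
    return tl, lam
```

Messages of several clearly different sizes are needed: with equal sizes, latency and bandwidth cannot be told apart.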

  23. Number of Workers: Tunlet (II) • Performance function: • Calculation of the optimal number of workers: • Tuning actions: • Change the value of “numworkers” to add or remove as many workers as needed

  24. Experimentation • Example application • Forest Fire Propagation simulator – Xfire • Compute-intensive master/worker application • Simulation of fireline propagation • Calculates the next position of the fireline considering the current fireline position and weather factors, vegetation, etc. • Platform • Cluster of Pentium 4, 1.8 GHz, SuSE Linux 8.0, connected by a 100 Mb/s network

  25. Experimentation (II) • Load in the system • We designed different external load patterns • They simulate the system’s time-sharing • They allow us to reproduce experiments • Case studies • Xfire executed with different fixed numbers of workers without any tuning, introducing external loads • Xfire executed under MATE, introducing external loads

  26. Experimentation (III) [Figure: execution time (sec., 0 to 1400) for each case study: Xfire with a fixed number of workers (1, 2, 4, 6, 8, …, 26) and Xf+MATE, which starts with 1 worker and adapts it] • Note that... • The execution time of Xfire under MATE is close to the best execution times obtained. • Under MATE, resources are devoted to the application only when they are really needed.

  27. Experimentation (IV) • Statically, the model fits • Dynamically, there are some problems • Nopt could be extremely high • The computing power added or removed may not be significant compared with the computing power already in use • Solution • Find a “reasonable” number of workers that defines a trade-off between resource utilization and execution time.
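
One possible rule for a “reasonable” number of workers, offered only as a hypothetical illustration (the talk lists this as an open line and gives no rule): given predicted execution times for candidate worker counts, pick the smallest count whose time is within a tolerance of the best one, so that workers are added only when they buy a noticeable improvement. The function name and the 5% default tolerance are assumptions.

```python
def reasonable_num_workers(predicted: dict, tolerance: float = 0.05) -> int:
    """Smallest worker count whose predicted time is within (1 + tolerance)
    of the best predicted time.

    predicted: mapping from worker count to predicted execution time (s).
    """
    if not predicted:
        raise ValueError("no candidates")
    best = min(predicted.values())
    limit = (1.0 + tolerance) * best
    return min(n for n, t in predicted.items() if t <= limit)
```

With predictions {1: 100, 2: 55, 4: 32, 8: 30.5, 16: 30} seconds, a 5% tolerance selects 8 workers instead of 16, giving up 0.5 s to use half the resources.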

  28. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  29. Data Distribution [Figure: master and workers, contrasting an unbalanced iteration with a balanced iteration] • Imbalance problem: • Heterogeneous computing and communication powers • Varying amount of distributed work

  30. Data Distribution (II) • Goal: • Minimize idle time by balancing the work among the processes, taking into account the efficiency of the machines • Performance model • Factoring scheduling method • Work is divided into tuples of different sizes according to the factor
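
Factoring schedules work in rounds. In each round, part of the remaining work is handed out as one equal-size tuple per worker, so tuples shrink as the computation proceeds: large early tuples keep overhead low, and small late tuples even out the finish times. The sketch below assumes that the factor F (0 < F ≤ 1) is the fraction of the remaining work distributed in each round, so F = 0.5 gives the classic halving and F = 1 gives one static block per worker. How MATE's `TheFactorF` is actually defined may differ, and the function name is hypothetical.

```python
import math


def factoring_tuples(total_units: int, workers: int, factor: float) -> list:
    """Tuple sizes produced by factoring, in the order they are handed out.

    Each round gives every worker one tuple of ceil(factor * R / workers)
    units, where R is the work remaining at the start of the round.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    if workers < 1:
        raise ValueError("need at least one worker")
    tuples = []
    remaining = total_units
    while remaining > 0:
        size = max(1, math.ceil(factor * remaining / workers))
        for _ in range(workers):
            if remaining == 0:
                break
            chunk = min(size, remaining)
            tuples.append(chunk)
            remaining -= chunk
    return tuples
```

For example, `factoring_tuples(100, 4, 0.5)` produces four tuples of 13 units, then four of 6, four of 3, four of 2 and four of 1.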

  31. Data Distribution: Tunlet • Measure points: • The work unit processing time • The latency and bandwidth • Performance function: • Calculation of the factor • The Analyzer simulates the execution with different factors and then selects the best one • We are currently working on an analytical model to determine the factor • Tuning actions: • Change the value of “TheFactorF”
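
The Analyzer's "simulate, then pick the best factor" step can be sketched as a small discrete-event simulation. Each tuple goes to whichever worker becomes free first, the usual self-scheduling behavior of a master/worker application, and the predicted iteration time is the latest finish time. Running it on the tuple list produced by each candidate factor and keeping the factor with the smallest result mirrors the procedure on the slide. This is a sketch under simplifying assumptions: a fixed per-tuple overhead stands in for tl + λ·v, each worker has a measured per-unit processing time, and contention at the master is ignored. The name `simulate_makespan` is hypothetical.

```python
import heapq


def simulate_makespan(tuples, unit_times, overhead=0.0):
    """Predicted iteration time for handing out `tuples` in order.

    tuples: tuple sizes (work units), in the order the master sends them.
    unit_times: per-worker processing time of one work unit (s).
    overhead: fixed per-tuple communication cost (s).
    """
    # Heap of (time the worker becomes free, worker index).
    free = [(0.0, w) for w in range(len(unit_times))]
    heapq.heapify(free)
    makespan = 0.0
    for size in tuples:
        start, w = heapq.heappop(free)  # earliest available worker
        finish = start + overhead + size * unit_times[w]
        makespan = max(makespan, finish)
        heapq.heappush(free, (finish, w))
    return makespan
```

Consider two workers with unit times of 1 s and 3 s and 100 units of work. Ten tuples of 10 units give 90 s, because the last small tuple lands on the slow worker. The decreasing list 25, 25, 13, 13, 6, 6, 3, 3, 2, 2, 1, 1 gives 75 s, which matches the combined processing rate of the two workers.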

  32. Experimentation • Example application • Forest Fire Propagation simulator – Xfire • Platform • Cluster of Pentium 4, 1.8 GHz, SuSE Linux 8.0, connected by a 100 Mb/s network

  33. Experimentation (II) • Load in the system • We designed different external load patterns • They simulate the system’s time-sharing • They allow us to reproduce experiments • Case studies • Xfire executed without any tuning • Xfire, introducing controlled variable external loads • Xfire executed under MATE, introducing variable external loads

  34. Experimentation (III) [Figure: execution time (sec., 0 to 18000) versus number of workers (1, 2, 4, 8, 16, 30) for Xfire, Xfire+Load and Xfire+Load+MATE] • Note that… • Introducing an extra load increases the execution time. • Under MATE, the factor value is corrected, which improves the execution time.

  35. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  36. Conclusions and open lines • Conclusions • Prototype environment – MATE – automatically monitors, analyses and tunes running applications • Practical experiments conducted with MATE on parallel/distributed applications demonstrate that it automatically adapts application behavior to the existing conditions at run time • In particular, MATE is able to tune Master/Worker applications and overcome their possible bottlenecks: number of workers and data distribution • Dynamic tuning works and is applicable, effective and useful under certain conditions.

  37. Conclusions and open lines • Open Lines • Determining the “reasonable” number of workers. • Considering interaction between different tunlets. • Providing the system with other tuning techniques.

  38. Thank you…
