Towards Automated Tuning of Parallel Programs
Jeffrey K. Hollingsworth
hollings@cs.umd.edu
Department of Computer Science
University of Maryland, College Park, MD 20742
Why Automated Performance Tuning?
• Software commonly has parameters that affect its performance.
• Optimal parameter values usually vary and are unpredictable.
• Automated tuning can adjust these parameters adaptively in complex software.
Automated Performance Tuning
• Goal: Maximize achieved performance
• Problems:
  • Large number of parameters to tune
  • Shape of objective function unknown
  • Multiple libraries and coupled applications
  • Analytical model may not be available
• Requirements:
  • Runtime tuning for long-running programs
  • Don’t try too many configurations
  • Avoid gradients
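The requirements above (few configurations tried, no gradients) can be illustrated with a minimal sketch. It is a generic budgeted pattern search written for this example and is not Active Harmony's search. The names `budgeted_pattern_search` and `measure`, along with the `step` and `budget` defaults, are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Config = Tuple[int, ...]


def budgeted_pattern_search(
    measure: Callable[[Config], float],
    start: Config,
    bounds: Tuple[Tuple[int, int], ...],
    step: int = 4,
    budget: int = 30,
) -> Tuple[Config, float]:
    """Maximize measure(config) using only performance comparisons.

    No gradient is computed, and at most `budget` distinct configurations
    are ever measured (hypothetical helper, for illustration only).
    """
    cache: Dict[Config, float] = {}

    def evaluate(cfg: Config) -> float:
        # Each distinct configuration is measured once and then reused.
        if cfg not in cache:
            cache[cfg] = measure(cfg)
        return cache[cfg]

    best = start
    best_perf = evaluate(best)
    while step >= 1 and len(cache) < budget:
        improved = False
        for dim in range(len(best)):
            for delta in (-step, step):
                if len(cache) >= budget:
                    break  # measurement budget exhausted
                lo, hi = bounds[dim]
                cand = list(best)
                cand[dim] = min(hi, max(lo, cand[dim] + delta))
                cand_cfg = tuple(cand)
                perf = evaluate(cand_cfg)
                if perf > best_perf:
                    best, best_perf = cand_cfg, perf
                    improved = True
        if not improved:
            step //= 2  # no neighbor was better: refine the step size
    return best, best_perf
```

In a real runtime setting, `measure` would apply the configuration to the running program and report an observed metric, so each call is expensive. That cost is why the sketch caches results and enforces a hard budget.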
Active Harmony
• Runtime performance optimization
  • Can also support training runs
• Automatic library selection (code)
  • Monitor library performance
  • Switch library if necessary
• Automatic performance tuning (parameter)
  • Monitor system performance
  • Adjust runtime parameters
• Hooks for Compiler Frameworks
  • Working to integrate USC/ISI CHiLL
  • Looking at others too
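The "monitor library performance, switch library if necessary" idea can be sketched as follows. This does not use the Active Harmony API. `LibrarySelector`, `probe_every`, and the averaging rule are hypothetical choices made for this example.

```python
import time
from typing import Callable, Dict, Optional


class LibrarySelector:
    """Dispatch calls to the implementation with the lowest observed time.

    Hypothetical helper: each implementation is timed on every call it
    serves, and a slower one is re-probed every `probe_every` calls in
    case conditions have changed.
    """

    def __init__(
        self,
        implementations: Dict[str, Callable],
        probe_every: int = 10,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        self.impls = implementations
        self.avg_time: Dict[str, Optional[float]] = {n: None for n in implementations}
        self.probe_every = probe_every
        self.clock = clock
        self.calls = 0
        self.current = next(iter(implementations))

    def _choose(self) -> str:
        # First, make sure every implementation has been measured once.
        for name, avg in self.avg_time.items():
            if avg is None:
                return name
        # Periodically re-probe implementations in round-robin order.
        if self.calls % self.probe_every == 0:
            names = list(self.impls)
            return names[(self.calls // self.probe_every) % len(names)]
        # Otherwise use the implementation with the lowest average time.
        return min(self.avg_time, key=self.avg_time.get)

    def __call__(self, *args, **kwargs):
        name = self._choose()
        start = self.clock()
        result = self.impls[name](*args, **kwargs)
        elapsed = self.clock() - start
        old = self.avg_time[name]
        # Simple moving average; the 0.5 weight is an arbitrary choice.
        self.avg_time[name] = elapsed if old is None else 0.5 * old + 0.5 * elapsed
        self.calls += 1
        measured = [n for n, a in self.avg_time.items() if a is not None]
        self.current = min(measured, key=self.avg_time.get)
        return result
```

For example, `LibrarySelector({"a": impl_a, "b": impl_b})` can be called like either function, assuming both implementations accept the same arguments and return equivalent results.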
Example: Cluster-Based Web Server
• 3-tier system
• Harmony provides
  • Parameter updates for DB and app servers
• TPC-W Benchmark
  • Transactional web benchmark
  • Mimics operations of an e-commerce site
  • Uses Java implementation from Univ. of Wisconsin
• Performance metrics
  • Web Interactions Per Second (WIPS)
Cluster-Based Web Service Tuning
Best configuration after 200 iterations: improvement compared to the default configuration

Workload    Improvement
Browsing    15%
Shopping    16%
Ordering    5%
A Bit More About Harmony Search
• Pre-execution
  • Sensitivity Discovery Phase
  • Used to order, not eliminate, search dimensions
• Online
  • Use Parallel Rank Order Search
  • Different configurations on different nodes
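The two phases above can be sketched roughly as follows. The slide gives no algorithmic detail, so both parts are assumptions made for illustration. Sensitivity is taken here as the performance spread when one parameter varies alone. The online step is a simplified rank-ordered simplex move (reflect through the best point, otherwise shrink toward it) with threads standing in for cluster nodes. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Measure = Callable[[Point], float]


def order_by_sensitivity(
    measure: Measure, default: Point, ranges: Sequence[Sequence[float]]
) -> List[int]:
    """Rank dimensions by how much performance moves when each varies alone.

    Every dimension is kept; only the order changes.
    """
    spreads = []
    for dim, values in enumerate(ranges):
        perfs = []
        for v in values:
            probe = list(default)
            probe[dim] = v
            perfs.append(measure(tuple(probe)))
        spreads.append(max(perfs) - min(perfs))
    return sorted(range(len(ranges)), key=lambda d: spreads[d], reverse=True)


def evaluate_parallel(measure: Measure, points: List[Point]) -> List[float]:
    # One worker per point, mimicking one configuration per node.
    with ThreadPoolExecutor(max_workers=len(points)) as pool:
        return list(pool.map(measure, points))


def rank_order_step(measure: Measure, simplex: List[Point]) -> List[Point]:
    """One simplified step; the simplex needs at least two points."""
    perfs = evaluate_parallel(measure, simplex)
    ranked = [p for _, p in sorted(zip(perfs, simplex), reverse=True)]
    best, others = ranked[0], ranked[1:]
    best_perf = max(perfs)
    # Reflect every other point through the best one and test all at once.
    reflected = [tuple(2 * b - x for b, x in zip(best, p)) for p in others]
    if max(evaluate_parallel(measure, reflected)) > best_perf:
        return [best] + reflected
    # No reflection helped: shrink the simplex toward the best point.
    return [best] + [tuple((b + x) / 2 for b, x in zip(best, p)) for p in others]


def parallel_rank_order_search(
    measure: Measure, simplex: List[Point], steps: int = 20
) -> Point:
    for _ in range(steps):
        simplex = rank_order_step(measure, simplex)
    perfs = evaluate_parallel(measure, simplex)
    return max(zip(perfs, simplex))[1]
```

The sketch re-measures the whole simplex at every step for simplicity. A practical tuner would reuse earlier measurements, since each one costs real execution time.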
Benefits of Searching in Parallel
[Figure: Performance for Two Programs. Panels: High Performance Linpack; POP Ocean Code]
Performance of Matrix Multiply
[Figure: ActiveHarmony + CHiLL vs ATLAS and MKL]
Must Coordinate Auto Tuners
• Problem: Warring auto-tuning systems
  • Multiple components “auto tuning” at once
  • Tuning based on multiple changes at once
• Solution:
  • Need some level of coordination
• Possible Answer:
  • Exposing different tuning systems
  • Part of PERI Auto-tuning Effort
Conclusion
• Active Harmony
  • An infrastructure for runtime tuning
  • Automatic tuning and environment adaptation
  • It works!
• Auto Tuning Integration
  • Need to integrate multiple tools
  • Compilers, runtime, application
Acknowledgements
• Coding and Experiments
  • I-Hsin Chung (IBM Watson)
  • Vahid Tabatabaee
  • Ananta Tiwari
• CHiLL Integration Effort
  • Mary Hall (Utah)
  • Chun Chen (USC/ISI)
  • Jacqueline Chame (USC/ISI)
• Funding
  • DOE – PERC/PERI
  • NSF
  • LTS (DoD)