
Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization

This presentation explores a hybrid approach to code optimization that combines the speed of analytic models with the accuracy of empirical search. The authors demonstrate the approach using matrix multiplication as a case study. The results show that the hybrid approach reduces the time spent on empirical search and improves on the performance obtained from the model alone.

Presentation Transcript


  1. Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization
     A. Epshteyn¹, M. Garzaran¹, G. DeJong¹, D. Padua¹, G. Ren¹, X. Li¹, K. Yotov², K. Pingali²
     ¹ University of Illinois at Urbana-Champaign; ² Cornell University

  2. Two approaches to code optimization:
     • Empirical search
       • E.g., execute and measure different versions of the matrix multiplication (MM) code with different tile sizes.
       • Slow
       • Accurate, because of feedback
     • Models
       • E.g., calculate the best tile size for MM as a function of the cache size.
       • Fast
       • May be inaccurate
       • No verification through feedback

  3. Hybrid Approach
     • Faster than empirical search
     • More accurate than the model
     • Use the model as a prior
     • Use active sampling to minimize the amount of searching

  4. Why is Speed Important?
     • Adaptation may have to be applied at runtime, where running time is critical.
     • Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator).
     • Library routines can be used as benchmarks to evaluate alternative machine designs.

  5. Problem: Matrix Multiplication
     • Tiling
       • Improves the locality of references
       • Cache blocking (NB): the matrix is decomposed into smaller subblocks of size NB×NB
     • Matrix multiplication: an illustrative example for testing the hybrid approach
     • Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
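
A minimal sketch of the NB×NB cache blocking described on this slide, written in Python for readability. The function name `tiled_matmul` and the particular loop order are hypothetical choices, and pure Python will not show the cache effects that make tiling pay off in compiled kernels such as ATLAS's; the sketch only illustrates the loop structure that the tile size NB controls.

```python
from typing import List

Matrix = List[List[float]]


def tiled_matmul(A: Matrix, B: Matrix, NB: int) -> Matrix:
    """Return C = A * B for n x n matrices, computed tile by tile with NB x NB blocks."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    # The three outer loops step over NB x NB tiles of C, A, and B.
    for ii in range(0, n, NB):
        for kk in range(0, n, NB):
            for jj in range(0, n, NB):
                # The inner loops multiply one tile of A by one tile of B and
                # accumulate into the matching tile of C; min() handles edge
                # tiles when n is not a multiple of NB.
                for i in range(ii, min(ii + NB, n)):
                    row_a, row_c = A[i], C[i]
                    for k in range(kk, min(kk + NB, n)):
                        a_ik, row_b = row_a[k], B[k]
                        for j in range(jj, min(jj + NB, n)):
                            row_c[j] += a_ik * row_b[j]
    return C


print(tiled_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], NB=1))
# [[19.0, 22.0], [43.0, 50.0]]
```

Choosing NB trades off tile reuse against cache capacity: tiles that are too large no longer fit in the cache, which is exactly the parameter the following slides try to pick.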

  6. Empirical Search: ATLAS
     • Try values of the tiling parameter NB across a fixed range, in steps of 4
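
An ATLAS-style empirical search can be sketched as an exhaustive loop that runs the kernel for every candidate NB and keeps the fastest. The helper names `measure_mflops` and `empirical_search`, the candidate range 16 to 80 in the usage example, and the toy performance curve are hypothetical stand-ins chosen for illustration, not ATLAS's actual code or settings.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_mflops(kernel: Callable[[int, int], object], n: int, nb: int,
                   repeats: int = 3) -> float:
    """Run kernel(n, nb) several times and return MFLOPS for the fastest run.

    Assumes the kernel performs an n x n matrix multiplication (2 * n**3 flops).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(n, nb)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero-length timing on coarse clocks
    return 2.0 * n ** 3 / best / 1e6


def empirical_search(candidates: Iterable[int],
                     measure: Callable[[int], float]) -> Tuple[Optional[int], float]:
    """Measure every candidate NB and return the best (NB, performance) pair.

    Each call to `measure` executes real code, which is why this search is slow.
    """
    best_nb, best_perf = None, float("-inf")
    for nb in candidates:
        perf = measure(nb)
        if perf > best_perf:
            best_nb, best_perf = nb, perf
    return best_nb, best_perf


# A toy performance curve stands in for real measurements here; with a real
# (hypothetical) kernel one would pass lambda nb: measure_mflops(kernel, 256, nb).
toy_curve = lambda nb: 500.0 - (nb - 44) ** 2
print(empirical_search(range(16, 81, 4), toy_curve))  # (44, 500.0)
```

The cost of this search grows with the number of candidates times the cost of each run, which is the slowness the hybrid approach aims to reduce.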

  7. Model (Yotov et al.)
     • Compute the NB that optimizes the use of the L1 cache.
     • Constructed by analyzing the memory access trace of the matrix multiplication code.
     • Formula:
     • Has been extended to optimize the use of the L2 cache.
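
As a rough illustration of how a model computes NB from the cache size without running any code, the sketch below picks the largest NB satisfying a simple capacity constraint, NB² + 3·NB + 1 ≤ C, where C is the L1 capacity measured in matrix elements. This constraint and the function name `model_nb` are assumptions made for illustration (the constraint ignores cache-line size and associativity), not necessarily the formula used by Yotov et al.

```python
def model_nb(cache_bytes: int, elem_bytes: int = 8) -> int:
    """Largest NB satisfying the assumed constraint NB**2 + 3*NB + 1 <= C.

    C is the cache capacity in matrix elements (8-byte doubles by default).
    No code is executed or timed: the answer comes from arithmetic alone.
    """
    capacity = cache_bytes // elem_bytes
    nb = 0
    # Grow NB while the next value still satisfies the capacity constraint.
    while (nb + 1) ** 2 + 3 * (nb + 1) + 1 <= capacity:
        nb += 1
    return nb


print(model_nb(32 * 1024))  # 62 for a 32 KiB L1 data cache holding doubles
```

Applying the same arithmetic to the L2 capacity would give only a crude stand-in for the L2 extension mentioned on the slide.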

  8. Model in action:
     [Figure: performance curve as a function of NB; vertical lines mark the model-predicted L1 and L2 blocking factors]
     • Whether to tile for the L1 or the L2 cache depends on the architecture and the application.

  9. Hybrid approach
     • Model performance with a family of regression curves
     • Regression (nonparametric)
       • Minimize the average error
     • Regression (maximum likelihood, ML)
       • Define a distribution over regression curves
       • Pick the most likely curve

  10. Regression (Bayesian)
     • Prior distribution P(curve) over regression curves
       • Make regression curves with model-predicted maxima more likely
     • Posterior distribution given the data (Bayes' rule):
       P(curve | data) = P(data | curve) · P(curve) / P(data)
     • Pick the maximum a-posteriori (MAP) curve
       • When the data sample is small, this picks curves with peaks at the model-predicted locations
       • When the sample is large, it picks the curves that fit the data best
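
The MAP selection can be sketched with a deliberately small, hypothetical curve family: single-peak Gaussian bumps with peak location mu, height h, and width w. The prior favors peaks near the model-predicted NB, the likelihood assumes Gaussian measurement noise, and the MAP curve is found by grid search. The curve family, noise model, grids, and every numeric default are assumptions; the slide does not specify them. Because P(data) does not depend on the curve, it can be dropped when maximizing.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (NB, measured performance)


def bump(x: float, mu: float, h: float, w: float) -> float:
    """Hypothetical single-peak performance curve: height h at NB = mu, width w."""
    return h * math.exp(-((x - mu) ** 2) / (2.0 * w ** 2))


def log_posterior(params, data: Sequence[Point], model_nb: float,
                  prior_sd: float, noise_sd: float) -> float:
    """log P(data | curve) + log P(curve), dropping terms that do not depend on the curve."""
    mu, h, w = params
    # Gaussian measurement noise: the log-likelihood is a scaled sum of squared errors.
    log_lik = -sum((y - bump(x, mu, h, w)) ** 2 for x, y in data) / (2.0 * noise_sd ** 2)
    # Gaussian prior on the peak location, centered on the model-predicted NB;
    # h and w get a flat prior over their grids.
    log_prior = -((mu - model_nb) ** 2) / (2.0 * prior_sd ** 2)
    return log_lik + log_prior


def map_curve(data: Sequence[Point], model_nb: float, prior_sd: float = 4.0,
              noise_sd: float = 20.0, mus=range(16, 81, 2),
              hs=range(100, 1001, 50), ws=(5, 10, 20)) -> Tuple[int, int, int]:
    """Grid search for the maximum a-posteriori (mu, h, w).

    A very large prior_sd makes the prior nearly flat, which approximates
    the maximum-likelihood (ML) fit.
    """
    return max(product(mus, hs, ws),
               key=lambda p: log_posterior(p, data, model_nb, prior_sd, noise_sd))


print(map_curve([], model_nb=40)[0])  # 40: with no data, the prior puts the peak at the model's NB
samples = [(x, bump(x, 60, 800, 10)) for x in range(16, 81, 4)]
print(map_curve(samples, model_nb=40)[0])  # 60: many samples override the prior
```

The two usage lines mirror the slide's two regimes: with little data the model-predicted peak wins, and with enough data the best-fitting curve wins.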

  11. Active sampling
     • Objectives:
       • (1) Sample at lower tile sizes, which take less time
       • (2) Explore: don't oversample in the same region
       • (3) Get information about the dominant peak

  12. Solution: Potential Fields (objectives 1 and 2)
     • Positive charge at the origin
     • Negative charges at previously sampled points
     • Sample at the point that minimizes the field
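
A minimal sketch of the potential-field rule for objectives 1 and 2, assuming Coulomb-style 1/r potentials and treating the candidate sample point as a negative test charge: the positive charge at the origin then pulls it toward small, cheap tile sizes, and the negative charges at already-sampled points push it away from explored regions. The sign convention, the 1/r form, the charge magnitudes, and the function names are assumptions, not details given on the slide.

```python
from typing import Iterable, Sequence


def potential(x: float, sampled: Sequence[float],
              origin_charge: float = 4.0, sample_charge: float = 1.0) -> float:
    """Potential energy of a negative test charge at tile size x > 0."""
    # Attraction toward the positive charge at the origin favors small tile sizes.
    u = -origin_charge / x
    # Repulsion from the negative charges at previously sampled tile sizes.
    for s in sampled:
        u += sample_charge / abs(x - s)
    return u


def next_sample(candidates: Iterable[int], sampled: Sequence[int], **charges: float) -> int:
    """Return the unsampled candidate NB that minimizes the potential."""
    remaining = [x for x in candidates if x not in sampled]
    return min(remaining, key=lambda x: potential(x, sampled, **charges))


sampled = []
for _ in range(2):
    sampled.append(next_sample(range(16, 81, 4), sampled))
print(sampled)  # [16, 32]
```

The first pick is the smallest tile size, because only the origin charge acts; the second moves away from it, because the repulsion from NB = 16 now outweighs the pull of the origin for nearby candidates.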

  13. Potential Fields (objective 3)
     • Positive charge in the region of the dominant peak
     • How do we know which peak dominates?
       • From the distribution over regression curves, we can compute P(peak 1 is located at x), P(peak 2 is located at x), P(peak 1 has height h), and P(peak 2 has height h)
       • Hence, we can compute P(peak 1 dominates peak 2)
     • Impose a positive charge in the region of each peak, proportional to its probability of dominating
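
The dominance computation can be sketched by representing the distribution over regression curves as weighted samples, each summarizing a two-peak curve by its peak locations and heights. That sample-based representation, the even split of ties, the choice to place each charge at the peak's weighted mean location, and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# One weighted posterior sample: (weight, (x1, h1), (x2, h2)), where (x, h) is
# the location and height of a peak on that regression curve.
Sample = Tuple[float, Tuple[float, float], Tuple[float, float]]


def dominance_probabilities(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Return (P(peak 1 dominates peak 2), P(peak 2 dominates peak 1))."""
    total = sum(w for w, _, _ in samples)
    # A sample counts for peak 1 when its peak is higher; ties are split evenly.
    p1 = sum(w * (1.0 if h1 > h2 else 0.5 if h1 == h2 else 0.0)
             for w, (_, h1), (_, h2) in samples) / total
    return p1, 1.0 - p1


def peak_charges(samples: Sequence[Sample], scale: float = 1.0) -> List[Tuple[float, float]]:
    """Return (location, charge) per peak, with charge proportional to P(dominates)."""
    total = sum(w for w, _, _ in samples)
    loc1 = sum(w * x1 for w, (x1, _), _ in samples) / total
    loc2 = sum(w * x2 for w, _, (x2, _) in samples) / total
    p1, p2 = dominance_probabilities(samples)
    return [(loc1, scale * p1), (loc2, scale * p2)]


samples = [(0.5, (24, 700), (60, 800)),
           (0.3, (24, 750), (60, 650)),
           (0.2, (28, 700), (64, 700))]
print(dominance_probabilities(samples))  # approximately (0.4, 0.6)
```

The resulting charges can be added to the potential field from objectives 1 and 2, so that sampling is drawn toward the peak that is more likely to dominate.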

  14. Results I – Regression Curves

  15. Results II – Time, Performance
     [Figure: search time (minutes) and performance (MFLOPS)]
     • Sparc: the hybrid search for NB yields an actual improvement of about 10%
     • SGI: improvement over both the model and ATLAS, due to choosing to tile for the L2 cache

  16. Results III – Library Performance

  17. Conclusion
     • Approach: incorporates the model as a prior.
     • Active sampling: actively chooses to sample in the most informative region.
     • Decreases the search time of the empirical search and improves on the model's performance.
