This paper explores a hybrid approach to code optimization, combining the speed of analytic models with the accuracy of empirical search. The authors demonstrate the effectiveness of this approach using matrix multiplication as a case study. The results show significant performance improvements compared to traditional methods.
Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization
A. Epshteyn1, M. Garzarán1, G. DeJong1, D. Padua1, G. Ren1, X. Li1, K. Yotov2, K. Pingali2
1 University of Illinois at Urbana-Champaign
2 Cornell University
Two approaches to code optimization:
• Empirical Search
  • E.g., execute and measure different versions of MM code with different tile sizes.
  • Slow
  • Accurate because of feedback
• Models
  • E.g., calculate the best tile size for MM as a function of cache size.
  • Fast
  • May be inaccurate
  • No verification through feedback
Hybrid Approach
• Faster than empirical search
• More accurate than the model
• Use the model as a prior
• Use active sampling to minimize the amount of searching
Why is Speed Important?
• Adaptation may have to be applied at runtime, where running time is critical.
• Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator).
• Library routines can be used as a benchmark to evaluate alternative machine designs.
Problem: Matrix Multiplication
• Tiling
  • Improves the locality of references
  • Cache blocking (NB): the matrix is decomposed into smaller subblocks of size NB x NB
• Matrix multiplication: an illustrative example for testing the hybrid approach
• Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
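The cache blocking the slide describes can be sketched as follows. This is a minimal illustration of NB x NB tiling, not the ATLAS-generated kernel; the function name and NumPy-based inner product are my own choices:

```python
import numpy as np

def tiled_matmul(A, B, NB):
    """Blocked matrix multiplication C = A @ B using NB x NB tiles.

    Processing one tile of A, B, and C at a time keeps the working
    set small, improving the locality of references.
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    for ii in range(0, n, NB):          # tile row of C
        for jj in range(0, n, NB):      # tile column of C
            for kk in range(0, n, NB):  # reduction dimension
                C[ii:ii+NB, jj:jj+NB] += (
                    A[ii:ii+NB, kk:kk+NB] @ B[kk:kk+NB, jj:jj+NB]
                )
    return C
```

NumPy slicing clips at array bounds, so the sketch also handles matrix sizes that are not multiples of NB.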
Empirical Search: ATLAS
• Try values of the tiling parameter NB over a range of candidates, in steps of 4
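The execute-and-measure loop behind this slide can be sketched as below. The kernel and the candidate list here are illustrative stand-ins (the slide does not give the range bounds), not ATLAS's actual search driver:

```python
import time
import numpy as np

def run_blocked(A, B, nb):
    """Minimal blocked matmul used as the kernel under test."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                C[ii:ii+nb, jj:jj+nb] += A[ii:ii+nb, kk:kk+nb] @ B[kk:kk+nb, jj:jj+nb]
    return C

def empirical_search(A, B, candidates):
    """Execute and time the kernel for each candidate NB; keep the fastest.

    Slow (every candidate is actually run) but accurate, because the
    measurement is real feedback from the machine.
    """
    best_nb, best_time = None, float("inf")
    for nb in candidates:
        start = time.perf_counter()
        run_blocked(A, B, nb)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb
```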
Model (Yotov et al.)
• Compute the NB which optimizes the use of the L1 cache.
• Constructed by analyzing the memory access trace of the matrix multiplication code.
• Formula:
• Has been extended to optimize the use of the L2 cache.
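The exact formula is not reproduced on this slide. As a stand-in, a textbook-style working-set heuristic (not Yotov et al.'s model) picks the largest NB for which one NB x NB tile each of A, B, and C fits in the L1 cache:

```python
import math

def model_nb(cache_bytes, elem_bytes=8):
    """Largest NB such that three NB x NB tiles fit in the cache.

    Solves 3 * NB^2 * elem_bytes <= cache_bytes.  A simplified
    working-set heuristic, not the paper's exact formula.
    """
    return int(math.sqrt(cache_bytes / (3 * elem_bytes)))
```

For a 32 KB L1 cache and 8-byte doubles this gives NB = 36; the same reasoning applied to the L2 cache capacity yields the L2 blocking factor.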
Model in Action
• Performance curve:
• Vertical lines: model-predicted L1 and L2 blocking factors
• Whether to tile for the L1 or the L2 cache depends on the architecture and the application
Hybrid Approach
• Model performance with a family of regression curves
• Regression (nonparametric)
  • Minimize the average error
• Regression (maximum likelihood)
  • Distribution over regression curves
  • Pick the most likely curve
Regression (Bayesian)
• Prior distribution P(curve) over regression curves
  • Makes regression curves with model-predicted maxima more likely
• Posterior distribution given the data (Bayes' rule):
  • P(curve | data) = P(data | curve) P(curve) / P(data)
• Pick the maximum a-posteriori curve
  • Picks curves with peaks in model-predicted locations when the data sample is small
  • Picks curves which fit the data best when the sample is large
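A toy version of this MAP selection can be sketched as follows. The curve family (Gaussian bumps indexed by peak location), the Gaussian prior on the peak, and all widths are my own illustrative assumptions, not the paper's regression family:

```python
import numpy as np

def map_curve(xs, ys, peak_candidates, model_peak, prior_width=8.0, noise=1.0):
    """Pick the maximum a-posteriori curve from a family of peaked curves.

    Hypothetical family: a Gaussian bump per candidate peak location.
    The prior prefers curves whose peak is near the model's prediction,
    so with little data the model wins; with enough data the fit wins.
    """
    def curve(x, peak):                  # illustrative curve family
        return np.exp(-0.5 * ((x - peak) / 10.0) ** 2)

    best, best_logpost = None, -np.inf
    for peak in peak_candidates:
        # log P(data | curve): Gaussian residuals around the curve
        resid = ys - curve(xs, peak)
        log_lik = -0.5 * np.sum((resid / noise) ** 2)
        # log P(curve): prior centred on the model-predicted peak
        log_prior = -0.5 * ((peak - model_peak) / prior_width) ** 2
        if log_lik + log_prior > best_logpost:
            best, best_logpost = peak, log_lik + log_prior
    return best
```

With a single sample the prior dominates and the model-predicted peak is chosen; with many samples the likelihood dominates and the best-fitting peak is chosen, matching the behavior described on the slide.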
Active Sampling
• Objectives:
  1. Sample at lower tile sizes – takes less time
  2. Explore – don't oversample in the same region
  3. Get information about the dominant peak
Solution: Potential Fields (objectives 1, 2)
• Positive charge at the origin
• Negative charges at previously sampled points
• Sample at the point which minimizes the field
Potential Fields (objective 3)
• Positive charge in the region of the dominant peak
• How do we know which peak dominates?
  • Distribution over regression curves
  • Can compute: P(peak1 is located at x), P(peak2 is located at x), P(peak1 is of height h), P(peak2 is of height h)
  • Hence, can compute P(peak1 dominates peak2)
• Impose a positive charge in the region of each peak, proportional to its probability of domination
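The three charges can be combined into a single potential that is minimized over the candidate tile sizes. The particular potential shapes below (linear pull toward the origin and toward the peak, inverse-distance repulsion from past samples) are illustrative assumptions; the paper does not specify them on these slides:

```python
def next_sample(candidates, sampled, peak_region=None, peak_weight=1.0):
    """Pick the next tile size to sample by minimizing a 1-D potential.

    The origin charge pulls sampling toward small tile sizes (cheap to
    run); charges at already-sampled points push new samples away
    (exploration); an optional charge near the dominant peak pulls
    samples toward the most informative region.
    """
    def field(x):
        f = float(x)                            # attraction to the origin
        for s in sampled:
            f += 1.0 / (abs(x - s) + 1e-6)      # repulsion from past samples
        if peak_region is not None:
            f += peak_weight * abs(x - peak_region)  # attraction to the peak
        return f

    return min((c for c in candidates if c not in sampled), key=field)
```

With no peak information the rule favors small, unexplored tile sizes; once a dominant peak is identified, a strong enough `peak_weight` redirects sampling into its region.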
Results II – Time, Performance
[Charts: search time (mins) and performance (MFLOPS) per platform]
• Sparc: actual improvement due to the hybrid search for NB: ~10%
• SGI: improvement over both the model and ATLAS due to choosing to tile for the L2 cache
Conclusion
• Hybrid approach: incorporates the model as a prior.
• Active sampling: picks the most informative region to sample next.
• Decreases the search time of empirical search while improving on the model's performance.