Efficiently Exploring Compiler Optimization Sequences With Pairwise Pruning. Milind Chabbi, John Mellor-Crummey, Keith Cooper. RICE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE.
Efficiently Exploring Compiler Optimization Sequences With Pairwise Pruning • Milind Chabbi, John Mellor-Crummey, Keith Cooper • RICE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE • This work is funded by the Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Lab (AFRL).
Compiler Optimization Phase-Ordering Problem • Order of application of compiler optimizations drastically changes measured performance • Kulkarni et al. [CGO '06] show 38% average code size reduction • Zhao et al. [CGO '09] show up to 32% speedup • Production compilers still use a fixed order • Exascale systems multiply the cost of poor node performance [Figure credit: Zhao et al., CGO '09]
Phase-Order Selection Is Hard • Selecting the best phase order is non-trivial • Program dependent • Relations between optimizations are complex • One optimization can enable/disable another • Exhaustive empirical exploration is expensive and unrealistic • 20 optimizations ⇒ 20! ≈ 2.4 × 10^18 possible optimization sequences • “Exhaustive optimization phase order space exploration.” [Kulkarni et al. CGO '06] • Many optimization orders lead to structurally identical function instances • Approaches • Analytically modeling code and the effects of optimizations is non-trivial and still in its infancy • “A framework for exploring optimization properties.” [Zhao et al. CC '09] • Other techniques have been tried and shown to be effective • Genetic algorithms [Cooper et al. SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems 1999]
Roadmap • Phase order selection using pairwise constraints between optimizations • Graph model • Regression model • Conditional Sampling model • We show effectiveness on a sample numerical program, FMIN, throughout the discussion, with dynamic instruction count (DIC) as our optimization metric
Interaction Is Significant Between Pairs • Interaction is significant between pairs • Capture the ordering of pairs without regard to their absolute positions [Figure: sequences with a somewhere before b are labeled Good; sequences with b somewhere before a are labeled Bad]
Pruning Using Pairwise Constraints • Generate all possible optimization sequences of length 2 (pairs) and record their performance characteristics • n(n−1) pairs to empirically evaluate • 20 optimizations ⇒ 380 pairs vs. 20! ≈ 2.4 × 10^18 sequences • For k-wise, it will be n!/(n−k)! groups to empirically evaluate • Compare the performance of each pair with its reverse to build pair-ordering constraints • If ab < ba, then the final sequence will look like: … a … b … • Reduces the search space from O(n!) to O(n²) • Not a silver bullet strategy • Can be used to augment other search space pruning techniques
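The pair-evaluation step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `build_pair_constraints`, `measure`, and the `toy_cost` table are hypothetical names, and the cost numbers are made up so that the example runs without a compiler.

```python
from itertools import permutations

def build_pair_constraints(opts, measure):
    """Evaluate every ordered pair and keep the better order of each.

    Returns {(a, b): gain}: running a before b beat b before a by `gain`.
    `measure` is a hypothetical callback returning a cost (lower is better).
    """
    cost = {pair: measure(pair) for pair in permutations(opts, 2)}  # n*(n-1) runs
    constraints = {}
    for (a, b), c_ab in cost.items():
        c_ba = cost[(b, a)]
        if c_ab < c_ba:
            constraints[(a, b)] = c_ba - c_ab
    return constraints

# Made-up costs standing in for "compile with this 2-pass sequence and run".
toy_cost = {
    ("a", "b"): 100, ("b", "a"): 120,
    ("a", "c"): 90,  ("c", "a"): 105,
    ("b", "c"): 130, ("c", "b"): 100,
}
constraints = build_pair_constraints("abc", toy_cost.__getitem__)
print(constraints)
print(len(list(permutations(range(20), 2))))  # ordered pairs for 20 optimizations
```

Ties (equal cost in both orders) produce no constraint in this sketch; how ties should be handled is a design choice the slides do not specify.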
Background And Effectiveness Of Pairwise Pruning • Used by the testing community • In software testing, multiple input variables taking multiple values cause combinatorial explosion • Pairwise (a.k.a. all-pairs) testing is based on the observation that most faults are caused by interactions of at most two factors • Pairwise-generated test suites cover all combinations of two factors and are therefore much smaller than exhaustive ones, yet still very effective in finding defects • K. Burr and W. Young [STAR '98] • D. R. Wallace and D. R. Kuhn [International Journal of Reliability, Quality and Safety Engineering, 2001]
Roadmap • Phase order selection using pairwise constraints between optimizations • Graph model • Regression model • Conditional Sampling model
Graph Model • Nodes represent optimizations, e.g. {a, b, c} • Directed edges represent optimization orders • Graph construction • Empirically evaluate all pairs to add edges • ab < ba ⇒ edge (a,b) • ac < ca ⇒ edge (a,c) • cb < bc ⇒ edge (c,b) • Add weights to edges based on profitability • E.g. (ab) vs. (ba) has a profit of 20% • Graph may be cyclic or acyclic [Figure: graph over nodes a, b, c with edge weights 20, 15, 30]
Phase Order Selection For Acyclic Graphs • Topologically sort the graph nodes to get a sequence • Such a sequence (if it exists) maintains all pair-ordering constraints • Model found best sequence [Figure: graph over nodes a, b, c with edge weights 20, 15, 30]
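For the acyclic case, the topological sort can be illustrated with Python's standard `graphlib` module (Python 3.9+). The edge set below follows the three example edges of the graph model slide; which weight belongs to which edge is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Edge (u, v) means "u should run before v". Weight assignment is assumed.
edges = {("a", "b"): 20, ("a", "c"): 15, ("c", "b"): 30}

sorter = TopologicalSorter()
for before, after in edges:
    sorter.add(after, before)  # `after` depends on `before`

# static_order() raises graphlib.CycleError if the constraints are cyclic.
order = "".join(sorter.static_order())
print(order)  # acb
```

The weights play no role here: in an acyclic graph every constraint can be honored, so only edge direction matters.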
Phase Order Selection For Graphs With Cycles • Cyclic ordering constraints: • ab < ba ⇒ edge (a,b) • bc < cb ⇒ edge (b,c) • ca < ac ⇒ edge (c,a) • Select an edge to break in each cycle • Select edges to minimize the total weight of deleted edges (minimizes the cost of pair-ordering constraint violations) • E.g. break edge (c,a) • Optimal sequence is: abc [Figure: cycle over nodes a, b, c with edge weights 20, 15, 30]
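When the graph has cycles, choosing which edges to drop amounts to finding the ordering with the smallest total weight of violated edges. The sketch below does this by brute force over all orderings, which is only feasible for a handful of optimizations (the general minimum feedback arc set problem is NP-hard); it is a stand-in to show the objective, not the selection procedure used in the talk. The weight assigned to each edge is assumed.

```python
from itertools import permutations

def violated_weight(seq, edges):
    """Total weight of edges (u, v) where u ends up after v in seq."""
    pos = {opt: i for i, opt in enumerate(seq)}
    return sum(w for (u, v), w in edges.items() if pos[u] > pos[v])

def best_order(nodes, edges):
    """Exhaustively pick the ordering that violates the least edge weight."""
    return min(permutations(nodes), key=lambda seq: violated_weight(seq, edges))

# Cyclic constraints from the slide; weights per edge are an assumption.
cyclic_edges = {("a", "b"): 20, ("b", "c"): 30, ("c", "a"): 15}
best = "".join(best_order("abc", cyclic_edges))
print(best, violated_weight(best, cyclic_edges))  # abc 15
```

With these weights the cheapest edge to break is (c,a), which matches the slide's example of breaking (c,a) and obtaining abc.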
Graph Model On FMIN • 13 optimizations • Search space of ~6 billion sequences (13!) • Measure the benefit of 13 × 12 = 156 pairwise orderings • The model-found best sequence had 1111 DIC • 1103 DIC was the best among 5000 random sequences • The model was within 0.73% of that best • 3.9% of the sequences in the random sample were better than the model-found best
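The figures on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the inputs (13 optimizations, 1111 and 1103 DIC) are the ones quoted above.

```python
from math import factorial

n = 13
search_space = factorial(n)      # number of full-length orderings
ordered_pairs = n * (n - 1)      # pairs evaluated by the graph model
model_best, sampled_best = 1111, 1103
gap_percent = 100 * (model_best - sampled_best) / sampled_best

print(search_space, ordered_pairs, round(gap_percent, 2))  # 6227020800 156 0.73
```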
Performance Estimation • Want to predict performance of any random sequence • Useful to ensure that a given sequence optimized for one objective function does not dramatically worsen another objective • E.g. Speed vs. Code size • Provides an analytical model for performance prediction
Graph Model For Performance Estimation • The graph model has a built-in ability to estimate the performance of a given sequence • To estimate the performance of a random sequence: • Perform a walk on the graph using the given sequence • Add the weights of the ordering preferences violated along the walk to the measured performance of the model-found best sequence (already known)
Example Graph Model For Performance Estimation • Let the observed performance of the model-found best sequence (abcd) be 1200 instructions • Estimated performance of sequence dacb is: 1200 + 140 = 1340, where 140 is the total weight of the ordering preferences that dacb violates • Edges are decorated with absolute differences, not relative % [Figure: graph over nodes a, b, c, d with edge weights 120, 60, 50, 40, 30, 20]
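The estimate can be reproduced with a short sketch. The six weights and the totals 1200 and 1340 come from the slide; the assignment of each weight to a specific edge is an assumption, chosen so that the edges violated by dacb sum to 140.

```python
def estimate(seq, edges, best_cost):
    """Best-sequence cost plus the weights of all violated edge preferences."""
    pos = {opt: i for i, opt in enumerate(seq)}
    penalty = sum(w for (u, v), w in edges.items() if pos[u] > pos[v])
    return best_cost + penalty

# Edge (u, v): u before v is preferred; weight is the absolute DIC difference.
# Which weight sits on which edge is assumed for illustration.
edges = {
    ("a", "b"): 120, ("a", "c"): 60, ("a", "d"): 30,
    ("b", "c"): 20,  ("b", "d"): 50, ("c", "d"): 40,
}

print(estimate("abcd", edges, 1200))  # 1200: no preference violated
print(estimate("dacb", edges, 1200))  # 1340: violates (a,d), (b,d), (c,d), (b,c)
```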
Performance Estimation With Graph Model On FMIN • 6 optimizations i.e. 720 sequences • Divergence + Phase mismatch
Issues With Graph Model • Considered just pairs of optimizations of length 2 • Neglected global behavior of optimizations • Assumed weights or behaviors of pairs to be context-insensitive (i.e. same even in full length sequence) Want a model that is context-sensitive
Roadmap • Phase order selection using pairwise constraints between optimizations • Graph model • Regression model • Conditional Sampling model
Getting Context Sensitive With Regression Model • Take into account the context of the pairs by sampling full-length sequences • Represent sequences by regression equations • Represent all possible pairs as a parameter vector • Presence / absence of pairs in a sequence as input variables • Observed performance of a sequence as the measured value • In matrix form: $A\,X = y$, where $A$ holds the input variables, $X$ is the parameter vector, and $y$ holds the measured values
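Example Linear Regression Model • Optimizations: {a, b, c} • Sequence: abc • Equation: $X_{ab} + X_{ac} + X_{bc} = 1045$ • Parameter vector: $X = (X_{ab}, X_{ba}, X_{ac}, X_{ca}, X_{bc}, X_{cb})^{T}$ • Rows for sequences abc and bac: $$\begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ \vdots & & & & & \vdots \end{bmatrix} \begin{bmatrix} X_{ab} \\ X_{ba} \\ X_{ac} \\ X_{ca} \\ X_{bc} \\ X_{cb} \end{bmatrix} = \begin{bmatrix} 1045 \\ 1050 \\ \vdots \end{bmatrix}$$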
Analytical Model For A Sequence • Sample unique sequences • Solve the linear regression to obtain the value of each $X_{ij}$ • Given a sequence: cba • Analytically projected performance is: $X_{cb} + X_{ca} + X_{ba}$
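A small numerical sketch of the regression idea, using NumPy's least-squares solver. Only the values 1045 (abc) and 1050 (bac) appear in the slides; the hidden per-pair costs in `hidden` are made up so that those two values are reproduced, and the helper names are hypothetical.

```python
from itertools import permutations
import numpy as np

# Column order of the parameter vector: ab, ba, ac, ca, bc, cb.
pairs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("c", "b")]

def row(seq):
    """0/1 indicators: 1 if the ordered pair (i, j) occurs in seq."""
    pos = {opt: k for k, opt in enumerate(seq)}
    return [1.0 if pos[i] < pos[j] else 0.0 for i, j in pairs]

# Made-up "true" pair costs used only to synthesize measurements.
hidden = dict(zip(pairs, [345, 350, 340, 360, 360, 355]))
def measured(seq):
    return sum(hidden[p] for p, used in zip(pairs, row(seq)) if used)

samples = ["".join(p) for p in permutations("abc")]   # 100% sampling
A = np.array([row(s) for s in samples])
y = np.array([measured(s) for s in samples])

X, *_ = np.linalg.lstsq(A, y, rcond=None)             # fitted X_ij values

def project(seq):
    return float(np.dot(row(seq), X))

print(measured("abc"), measured("bac"))   # 1045 1050
print(round(project("cba"), 6))           # X_cb + X_ca + X_ba
```

Because every sequence contains exactly one of ij and ji, the 0/1 design matrix is rank deficient, so the individual fitted values need not equal the hidden ones; the projected sequence costs still agree with the measurements in this synthetic setup.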
Regression Model On FMIN • Sequence of length 6 • 6! = 720 total sequences • No phase mismatch, less divergence
Analysis of Regression Equation: Optimization Grouping Effect • Sequence of length 6 • 6! = 720 total sequences [Plot annotations: pair groups lg, lm and gn, ln, mn]
Refined Regression Model • 100% sampling to solve the regression equation • Superior projections, perfect correlation
Regression Model With Reduced Sampling Rate • 12% sampling
Roadmap • Phase order selection using pairwise constraints between optimizations • Graph model • Regression model • Conditional Sampling model
Properties of Pairs Across Phase Shifts • Frequency of each pair on the two sides of a phase shift: (m,n) = 0% vs. 66.6% • (l,n) = 0% vs. 66.6% • (g,n) = 0% vs. 66.6%
Properties of Pairs Across Phase Shifts • Frequency of each pair on the two sides of a phase shift: (l,g) = 0% vs. 75% • (l,m) = 0% vs. 75% [Plot annotation: mn, ln, gn shift]
Properties of Pairs Across Phase Shifts • (c,d) = 100% vs. 0% [Plot annotations: lm, lg shift (0% vs. 100%); mn, ln, gn shift (0% vs. 100%)]
Conditional Sampling Model • Sample k << n! full-length sequences that satisfy a set of pairwise ordering constraints C • Initially C = {} • We sampled 100 sequences in our implementation • Identify the largest phase shift • Obtain the pattern on either side of the largest phase shift • e.g. pairs present with 100% or 0% frequency on one side • Add the pairwise constraints favoring better performance to C • Repeat sampling and refining C until we reach a performance plateau
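The loop above can be sketched as follows. This is one possible reading of the steps under stated assumptions, not the authors' implementation: the "largest phase shift" is taken to be the largest gap between adjacent costs after sorting the samples, constrained sampling is done by simple rejection, and `toy_cost` is a made-up cost function (lower is better) standing in for measuring DIC.

```python
import random
from itertools import permutations

def sample(opts, constraints, k, rng):
    """Rejection-sample k orderings that satisfy every (a before b) constraint."""
    out = []
    while len(out) < k:
        seq = opts[:]
        rng.shuffle(seq)
        pos = {o: i for i, o in enumerate(seq)}
        if all(pos[a] < pos[b] for a, b in constraints):
            out.append(tuple(seq))
    return out

def freq(group, a, b):
    """Fraction of sequences in group that place a before b."""
    return sum(s.index(a) < s.index(b) for s in group) / len(group)

def refine(opts, cost, k=100, rounds=10, seed=0):
    rng = random.Random(seed)
    constraints = set()                               # C = {}
    for _ in range(rounds):
        scored = sorted((cost(s), s) for s in sample(list(opts), constraints, k, rng))
        gaps = [scored[i + 1][0] - scored[i][0] for i in range(k - 1)]
        cut = max(range(k - 1), key=gaps.__getitem__)  # largest phase shift
        if gaps[cut] == 0:
            break                                      # plateau: all samples equal
        good = [s for _, s in scored[:cut + 1]]
        bad = [s for _, s in scored[cut + 1:]]
        new = {(a, b) for a, b in permutations(opts, 2)
               if freq(good, a, b) == 1.0 and freq(bad, a, b) == 0.0}
        if new <= constraints:
            break                                      # no distinct pattern found
        constraints |= new
    return constraints

def toy_cost(seq):
    """Made-up cost: big penalty for d before c, small one for n before l."""
    return 1000 + 300 * (seq.index("d") < seq.index("c")) + 50 * (seq.index("n") < seq.index("l"))

found = refine("cdln", toy_cost)
print(sorted(found))
```

Rejection sampling becomes slow as C grows; a real implementation would more likely generate constraint-respecting orderings directly, for example as random linear extensions of the constraint graph.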
Conditional Sampling On FMIN • 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z} • Pair frequency on the two sides of the phase shift: (o,d) = 100% vs. 17%
Conditional Sampling On FMIN • 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z} • Pair frequency on the two sides of the phase shift: (v,d) = 60% vs. 100%
Conditional Sampling On FMIN • On one side of the phase shift: an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov = 100% • On the other side: an = 39%, oa = 80%, cn = 39%, bn = 46%, dn = 43%, on = 71%, oq = 79%, qn = 37%, zn = 61%, gn = 37%, mn = 40%, tn = 37%, vn = 13%, ov = 100%, ln = 37%, ol = 79%
Conditional Sampling On FMIN • 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z} • Pair frequency on the two sides of the phase shift: (c,d) = 100% vs. 0% • (c,v) = 100% vs. 0%
Conditional Sampling On FMIN • Required 500 samples, i.e. 8 × 10^-6 % sampling
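The sampling percentage follows directly from the size of the 13-optimization search space; a quick check (variable names are illustrative):

```python
from math import factorial

samples = 500
fraction_percent = 100 * samples / factorial(13)   # share of all 13! orderings
print(f"{fraction_percent:.1e} %")                 # 8.0e-06 %
```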
Summary • Order of application of compiler optimizations has a dramatic effect on performance • “Pairwise pruning” reduces the empirical search space by several orders of magnitude, yet remains effective • Three models of pairwise pruning • Context-insensitive graph model • Context-sensitive regression model • Context-sensitive Conditional Sampling model • Initial results are encouraging • The technique can be used to augment other search space pruning techniques
Backup slides • In our implementation we represent the presence of a pair by 1 and its absence by -1 • Reduces unknowns to n(n−1)/2 • We add a residue term X_residue to account for the residual minimum advantage of applying each optimization i • Each X_ij accounts only for the advantage/disadvantage of the ordered pair (i,j) • Standalone strengths of optimizations i and j are accounted for in X_residue
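A sketch of how a regression row might look under the ±1 encoding. It assumes one unknown per unordered pair (+1 when i precedes j, -1 otherwise) plus a constant column for X_residue; the helper name `encode` and this exact layout are assumptions, not details confirmed by the slides.

```python
from itertools import combinations

def encode(seq, opts):
    """+1/-1 per unordered pair (i, j), followed by a 1 for the residue term."""
    pos = {opt: k for k, opt in enumerate(seq)}
    signs = [1 if pos[i] < pos[j] else -1 for i, j in combinations(opts, 2)]
    return signs + [1]

print(encode("abc", "abc"))   # [1, 1, 1, 1]
print(encode("bac", "abc"))   # [-1, 1, 1, 1]

n = 13
unknowns = n * (n - 1) // 2 + 1   # pair terms plus the residue term
print(unknowns)                   # 79
```

Under this layout the number of pair unknowns is half that of the 0/1 encoding, because a single signed coefficient covers both ij and ji.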
Challenges And Opportunities • Not a silver bullet strategy • Sometimes patterns may not be as distinct as 0% or 100%; we may have to choose a pattern based on the higher percentage on one side • E.g. 90% on the left vs. 30% on the right • In our experiments we always took 100 samples; this number can be tuned with various techniques • Vuduc et al. [International Journal of High Performance Computing Applications, 2004] propose a statistical early stopping criterion that indicates when sampling can be stopped
Graph Model On FMIN • Six optimizations: {c, d, g, l, m, n} • Model-found optimal sequence: cndgml • The model-found sequence had a dynamic instruction count of 1221, the best among all 720 possible sequences