Efficiently Exploring Compiler Optimization Sequences With Pairwise Pruning

Milind Chabbi, John Mellor-Crummey, Keith Cooper • Rice University, Department of Computer Science • This work is funded by the Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Lab (AFRL).

Compiler Optimization Phase-Ordering Problem • The order in which compiler optimizations are applied drastically changes measured performance • Kulkarni et al. [CGO '06] show 38% average code size reduction • Zhao et al. [CGO '09] show up to 32% speedup • Production compilers still use a fixed order • Exascale systems multiply the cost of poor node performance • Figure credit: Zhao et al. [CGO '09]

Phase-Order Selection Is Hard • Selecting the best phase order is non-trivial • Program dependent • Relations between optimizations are complex • One optimization can enable or disable another • Exhaustive empirical exploration is expensive and unrealistic • 20 optimizations → roughly $2.5 \times 10^{18}$ possible optimization sequences • “Exhaustive optimization phase order space exploration.” [Kulkarni et al. CGO '06] • Many optimization orders lead to structurally identical function instances • Approaches • Analytically modeling code and the effects of optimizations is non-trivial and still in its infancy • “A framework for exploring optimization properties.” [Zhao et al. CC '09] • Other techniques have been tried and shown to be effective • Genetic algorithms [Cooper et al. SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems 1999]

Roadmap • Phase order selection using pairwise constraints between optimizations • Graph model • Regression model • Conditional sampling model • Effectiveness is shown throughout the discussion on a sample numerical program, FMIN, with dynamic instruction count (DIC) as the optimization metric

Interaction Is Significant Between Pairs • Capture the relative ordering of pairs without regard to their absolute positions • [Figure: the two orders of a pair, a b and b a, labeled Good and Bad]

Pruning Using Pairwise Constraints • Generate all possible optimization sequences of length 2 (ordered pairs) and record their performance characteristics • $n(n-1)$ pairs to empirically evaluate • 20 optimizations → 380 pairs vs. roughly $2.5 \times 10^{18}$ sequences • For k-wise pruning, there are $n!/(n-k)!$ groups to empirically evaluate • Compare the performance of each pair with its reverse to build pair-ordering constraints • If ab < ba, then the final sequence will look like: … a … b … • Reduces the search space from $O(n!)$ to $O(n^2)$ • Not a silver-bullet strategy • Can be used to augment other search space pruning techniques
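The pair-versus-reverse comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ordering_constraints`, the `measure` callback, and the instruction counts in `toy_dic` are all hypothetical, and a lower measurement is assumed to be better.

```python
from itertools import permutations
from math import factorial, perm

def ordering_constraints(opts, measure):
    """Map (x, y) -> gain, meaning 'x before y' beat 'y before x' by `gain`."""
    constraints = {}
    for x, y in permutations(opts, 2):
        xy, yx = measure((x, y)), measure((y, x))
        if xy < yx:  # lower is better, e.g. dynamic instruction count
            constraints[(x, y)] = yx - xy
    return constraints

# Hypothetical measurements for three optimizations (not data from the talk).
toy_dic = {("a", "b"): 100, ("b", "a"): 120,
           ("a", "c"): 90,  ("c", "a"): 105,
           ("c", "b"): 80,  ("b", "c"): 110}

constraints = ordering_constraints("abc", toy_dic.__getitem__)
print(constraints)

# Cost of the two strategies for 20 optimizations: ordered pairs vs. full orders.
print(perm(20, 2), factorial(20))
```

Each ordered pair is measured once, so the number of empirical runs grows as n(n-1) rather than n!.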

Background And Effectiveness Of Pairwise Pruning • Used by the software testing community • In software testing, multiple input variables taking multiple values cause combinatorial explosion • Pairwise (a.k.a. all-pairs) testing is based on the observation that most faults are caused by interactions of at most two factors • Pairwise-generated test suites cover all combinations of two factors, so they are much smaller than exhaustive ones yet still very effective in finding defects • K. Burr and W. Young [STAR '98] • D. R. Wallace and D. R. Kuhn [International Journal of Reliability, Quality and Safety Engineering, 2001]

Roadmap • Phase order selection using pairwise constraints between optimizations • Graph model • Regression model • Conditional Sampling model

Graph Model • Nodes represent optimizations, e.g. {a, b, c} • Directed edges represent optimization orders • Graph construction • Empirically evaluate all pairs to add edges • ab < ba → edge (a,b) • ac < ca → edge (a,c) • cb < bc → edge (c,b) • Add weights to edges based on profitability • E.g. (ab) vs. (ba) has a profit of 20% • Graph may be cyclic or acyclic • [Figure: graph on nodes a, b, c with edge weights 20, 15, 30]

Phase Order Selection For Acyclic Graphs • Topologically sort the graph nodes to get a sequence • Such a sequence (if it exists) maintains all pair-ordering constraints • [Figure: graph on nodes a, b, c with edge weights 20, 15, 30, captioned "Model found best sequence"]
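For the acyclic case, the topological sort can be sketched with the Python standard library (`graphlib`, Python 3.9 or later). The assignment of weights to edges below is an assumption made for illustration; the weights play no role in the sort itself.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical weighted edges: (x, y) means "x before y" measured as better.
edges = {("a", "b"): 20, ("a", "c"): 15, ("c", "b"): 30}

sorter = TopologicalSorter()
for before, after in edges:
    # TopologicalSorter.add(node, *predecessors): `before` must precede `after`.
    sorter.add(after, before)

# static_order() raises graphlib.CycleError if the constraints are cyclic.
sequence = "".join(sorter.static_order())
print(sequence)  # acb
```

With these three edges the order is unique, so every pair-ordering constraint is satisfied by the single sequence acb.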

Phase Order Selection For Graphs With Cycles • Cyclic ordering constraints: • ab < ba → edge (a,b) • bc < cb → edge (b,c) • ca < ac → edge (c,a) • Select an edge to break in each cycle • Select edges to minimize the total weight of deleted edges (minimizes the cost of pair-ordering constraint violations) • E.g. break edge (c,a) • Optimal sequence is: abc • [Figure: graph on nodes a, b, c with edge weights 20, 15, 30]
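One way to see the edge-breaking rule in action is a brute-force search for the order that violates the least total edge weight. This is a sketch for tiny graphs only, not necessarily how the authors break cycles; it enumerates every permutation, and the mapping of weights to edges is an assumption, since the slide does not state it.

```python
from itertools import permutations

def best_order(nodes, edges):
    """Return (order, violated_weight) minimizing the weight of violated edges."""
    def violated(order):
        pos = {n: i for i, n in enumerate(order)}
        # An edge (x, y) is violated when x ends up after y.
        return sum(w for (x, y), w in edges.items() if pos[x] > pos[y])
    best = min(permutations(nodes), key=violated)
    return "".join(best), violated(best)

# Hypothetical weights on the cycle a -> b -> c -> a.
cyclic_edges = {("a", "b"): 20, ("b", "c"): 30, ("c", "a"): 15}

order, cost = best_order("abc", cyclic_edges)
print(order, cost)  # abc 15
```

Under these assumed weights the cheapest edge to give up is (c,a), which yields the sequence abc, matching the example on the slide.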

Graph Model On FMIN • 13 optimizations • Search space of 13! ≈ 6 billion sequences • Measure the benefit of $13 \times 12 = 156$ pairwise orderings • Model-found best sequence had a DIC of 1111 • 1103 DIC was the best among 5000 random sequences • The model was within 0.73% of that best • 3.9% of the sequences in the random sample were better than the model-found best
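The figures on this slide can be checked with a short calculation. The variable names are illustrative; the inputs (13 optimizations, DIC values 1111 and 1103) come from the slide.

```python
from math import factorial, perm

search_space = factorial(13)            # all orderings of 13 optimizations
pair_runs = perm(13, 2)                 # ordered pairs to measure: 13 * 12
gap_pct = (1111 - 1103) / 1103 * 100    # model-found best vs. best random sample

print(search_space, pair_runs, round(gap_pct, 2))  # 6227020800 156 0.73
```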

Performance Estimation • Want to predict performance of any random sequence • Useful to ensure that a given sequence optimized for one objective function does not dramatically worsen another objective • E.g. Speed vs. Code size • Provides an analytical model for performance prediction

Graph Model For Performance Estimation • The graph model has a built-in ability to estimate the performance of a given sequence • To estimate the performance of a random sequence: • Perform a walk on the graph using the given sequence • Add the weights of the ordering preferences violated along the walk to the performance number of the model-found best sequence (already known)

Example Graph Model For Performance Estimation • Let the observed performance of the model-found best sequence (abcd) be 1200 instructions • Estimated performance of sequence dacb is: 1200 + 140 = 1340, where 140 is the total weight of the violated edges • Edges are decorated with absolute differences, not relative % • [Figure: graph on nodes a, b, c, d with edge weights 20, 30, 40, 50, 60, 120]
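The estimate can be sketched as a sum over violated edges. The weights 20, 30, 40, 50, 60 and 120 appear in the figure, but which edge carries which weight is an assumption here, chosen so that the violated edges of dacb total 140.

```python
def estimate(sequence, edges, best_perf):
    """best_perf plus the weights of the ordering preferences `sequence` violates."""
    pos = {opt: i for i, opt in enumerate(sequence)}
    penalty = sum(w for (x, y), w in edges.items() if pos[x] > pos[y])
    return best_perf + penalty

# Hypothetical edge-to-weight mapping (absolute instruction counts).
edges = {("a", "b"): 120, ("a", "c"): 60, ("a", "d"): 30,
         ("b", "c"): 50,  ("b", "d"): 20, ("c", "d"): 40}

# dacb violates (a,d), (b,c), (b,d) and (c,d): 30 + 50 + 20 + 40 = 140.
print(estimate("dacb", edges, best_perf=1200))  # 1340
print(estimate("abcd", edges, best_perf=1200))  # 1200
```

Because the penalty is built only from pairwise weights, the estimate cannot reflect interactions that depend on the rest of the sequence.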

Performance Estimation With Graph Model On FMIN • 6 optimizations, i.e. 6! = 720 sequences • Estimates show divergence and phase mismatch

Issues With Graph Model • Considered only pairs of optimizations (sequences of length 2) • Neglected global behavior of optimizations • Assumed weights or behaviors of pairs to be context-insensitive (i.e. the same even in a full-length sequence) → Want a model that is context-sensitive

Getting Context Sensitive With Regression Model • Take into account the context of the pairs by sampling full-length sequences • Represent sequences by regression equations • Represent all possible pairs as a parameter vector • Presence / absence of pairs in a sequence as input variables • Observed performance of a sequence as measured value • Measured value = Input variables × Parameter vector

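Example Linear Regression Model • Optimizations: {a, b, c} • Parameter vector: $(X_{ab}, X_{ba}, X_{ac}, X_{ca}, X_{bc}, X_{cb})$ • Sequence abc: input row 1 0 1 0 1 0, equation $X_{ab} + X_{ac} + X_{bc} = 1045$ • Sequence bac: input row 0 1 1 0 1 0, equation $X_{ba} + X_{ac} + X_{bc} = 1050$ • …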

Analytical Model For A Sequence • Sample unique sequences • Solve the linear regression to obtain the value of each $X_{ij}$ • Given a sequence: cba • Analytically projected performance is: $X_{cb} + X_{ca} + X_{ba}$
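The encoding and projection steps can be sketched as follows. The column order (ab, ba, ac, ca, bc, cb) is assumed, and the coefficient values in `X` are hypothetical, picked only so that abc and bac project to 1045 and 1050; fitting the coefficients from samples is not shown.

```python
from itertools import combinations

def pair_columns(opts):
    """Ordered pairs in the assumed column order: ab, ba, ac, ca, bc, cb."""
    cols = []
    for x, y in combinations(opts, 2):
        cols += [(x, y), (y, x)]
    return cols

def encode(sequence, cols):
    """1 if the ordered pair (x before y) is present in the sequence, else 0."""
    pos = {o: i for i, o in enumerate(sequence)}
    return [1 if pos[x] < pos[y] else 0 for x, y in cols]

def project(sequence, cols, X):
    """Projected performance: sum of X_ij over the pairs present."""
    return sum(x * bit for x, bit in zip(X, encode(sequence, cols)))

cols = pair_columns("abc")
X = [340, 345, 350, 360, 355, 365]  # hypothetical X_ab, X_ba, X_ac, X_ca, X_bc, X_cb

print(encode("abc", cols), project("abc", cols, X))  # [1, 0, 1, 0, 1, 0] 1045
print(encode("bac", cols), project("bac", cols, X))  # [0, 1, 1, 0, 1, 0] 1050
print(project("cba", cols, X))                       # X_cb + X_ca + X_ba = 1070
```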

Regression Model On FMIN • Sequence of length 6 • 6! = 720 total sequences • No phase mismatch, less divergence

Analysis Of Regression Equation: Optimization Grouping Effect • Sequence of length 6 • 6! = 720 total sequences • [Plot annotations: pair groups lg, lm and gn, ln, mn]

Refined Regression Model • 100% sampling to solve the regression equation • Superior projections, perfect correlation

Regression Model With Reduced Sampling Rate 12% sampling

Properties of Pairs Across Phase Shifts • [Plot annotations: pair frequency on either side of a phase shift] • (m,n): 0% vs. 66.6% • (l,n): 0% vs. 66.6% • (g,n): 0% vs. 66.6%

Properties of Pairs Across Phase Shifts • [Plot annotations: pair frequency on either side of a phase shift] • (l,g): 0% vs. 75% • (l,m): 0% vs. 75% • mn, ln, gn shift

Properties of Pairs Across Phase Shifts • [Plot annotations: pair frequency on either side of a phase shift] • (c,d): 100% vs. 0% • lm, lg shift: 0% vs. 100% • mn, ln, gn shift: 0% vs. 100%

Conditional Sampling Model • Sample k << n! full-length sequences that satisfy a set of pairwise ordering constraints C • Initially C = {} • We sampled 100 sequences in our implementation • Identify the largest phase shift • Obtain the pattern on either side of the largest phase shift • e.g. pairs present with 100% or 0% frequency on one side • Add pairwise constraints favoring better performance to C • Repeat sampling and refining C until we reach a performance plateau
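The sample-and-refine loop can be sketched as below. Everything here is an illustrative assumption rather than the authors' code: `toy_cost` stands in for a measured dynamic instruction count, the "largest phase shift" is taken to be the biggest jump between consecutive sorted costs, and a pair is added to C only when it occurs in 100% of the sequences on the better side and in fewer on the worse side.

```python
import random
from itertools import permutations

def sample(opts, constraints, rng):
    """Rejection-sample one ordering satisfying every 'x before y' constraint."""
    while True:
        seq = rng.sample(opts, len(opts))
        pos = {o: i for i, o in enumerate(seq)}
        if all(pos[x] < pos[y] for x, y in constraints):
            return seq

def frequency(runs, x, y):
    """Fraction of sampled sequences in which x appears before y."""
    return sum(seq.index(x) < seq.index(y) for _, seq in runs) / len(runs)

def refine(opts, cost, k=100, max_rounds=10, seed=0):
    opts, rng, constraints = list(opts), random.Random(seed), set()
    for _ in range(max_rounds):
        seqs = [sample(opts, constraints, rng) for _ in range(k)]
        runs = sorted((cost(s), s) for s in seqs)
        gaps = [runs[i + 1][0] - runs[i][0] for i in range(k - 1)]
        cut = max(range(k - 1), key=gaps.__getitem__)  # largest phase shift
        if gaps[cut] == 0:                             # flat curve: plateau
            break
        better, worse = runs[:cut + 1], runs[cut + 1:]
        new = {(x, y) for x, y in permutations(opts, 2)
               if frequency(better, x, y) == 1.0 and frequency(worse, x, y) < 1.0}
        if not new:
            break
        constraints |= new
    return constraints

def toy_cost(seq):
    # Hypothetical cost: penalize d before c heavily and n before l mildly.
    return 1000 + 200 * (seq.index("d") < seq.index("c")) \
                + 60 * (seq.index("n") < seq.index("l"))

constraints = refine("cdglmn", toy_cost)
print(sorted(constraints))
```

With this toy cost the loop is expected to recover "c before d" in the first round and "l before n" in the second, then stop once all sampled costs are equal. Rejection sampling is used only for brevity and becomes slow as C grows.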

Conditional Sampling On FMIN • 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z} • [Plot annotations: (o,d) = 100% on one side of the phase shift, (o,d) = 17% on the other]

Conditional Sampling On FMIN • 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z} • [Plot annotations: (v,d) = 60% on one side of the phase shift, (v,d) = 100% on the other]

Conditional Sampling On FMIN • [Plot annotations: pair frequency on either side of the phase shift] • One side: an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov = 100% • Other side: an = 39%, oa = 80%, bn = 46%, cn = 39%, dn = 43%, gn = 37%, ln = 37%, ol = 79%, mn = 40%, on = 71%, qn = 37%, tn = 37%, vn = 13%, zn = 61%, oq = 79%, ov = 100%

Conditional Sampling On FMIN • 13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z} • [Plot annotations: (c,d) = 100% vs. 0% and (c,v) = 100% vs. 0% on either side of the phase shift]

Conditional Sampling On FMIN • Required 500 samples, i.e. $8 \times 10^{-6}$ % sampling
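The sampling rate follows from dividing the 500 samples by the 13! possible sequences; a quick check, with illustrative variable names:

```python
from math import factorial

samples = 500
pct = samples / factorial(13) * 100  # share of the 13! sequences, in percent

print(f"{pct:.1e} %")  # 8.0e-06 %
```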

Summary • The order in which compiler optimizations are applied has a dramatic effect on performance • “Pairwise pruning” reduces the empirical search space by several orders of magnitude, yet remains effective • Three models of pairwise pruning • Context-insensitive graph model • Context-sensitive regression model • Context-sensitive conditional sampling model • Initial results are encouraging • The technique can be used to augment other search space pruning techniques

Backup slides • In our implementation we represent the presence of a pair by 1 and its absence by -1 • Reduces unknowns to $n(n-1)/2$ • We add a residue term $X_{residue}$ to account for the residual minimum advantage of applying each optimization i • Each $X_{ij}$ accounts only for the advantage/disadvantage of the ordered pair (i,j) • Standalone strengths of optimizations i and j are accounted for in $X_{residue}$
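A minimal sketch of the ±1 encoding with a residue term, fitted by ordinary least squares, is given below. The solver (normal equations with Gauss-Jordan elimination, standard library only) is a stand-in rather than the authors' method, and the coefficients in `truth` are hypothetical values used to generate synthetic measurements.

```python
from itertools import combinations, permutations

def encode(seq, pairs):
    """+1 if i precedes j, -1 otherwise, plus a constant 1 for X_residue."""
    pos = {o: i for i, o in enumerate(seq)}
    return [1 if pos[i] < pos[j] else -1 for i, j in pairs] + [1]

def solve_least_squares(rows, y):
    """Solve the normal equations (A^T A) x = A^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[-1] for row in M]

opts = "abc"
pairs = list(combinations(opts, 2))   # one unknown per unordered pair: n(n-1)/2
truth = [-10.0, -5.0, 15.0, 1050.0]   # hypothetical X_ab, X_ac, X_bc, X_residue

rows = [encode(s, pairs) for s in permutations(opts)]
measured = [sum(t * v for t, v in zip(truth, r)) for r in rows]  # synthetic data
fitted = solve_least_squares(rows, measured)
print([round(v, 6) for v in fitted])
```

Because a pair and its reverse share one coefficient with opposite signs, three optimizations need three pair unknowns plus the residue instead of six.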

Challenges And Opportunities • Not a silver-bullet strategy • Sometimes patterns may not be as distinct as 0% or 100%; we may have to choose a pattern based on the higher percentage on one side • E.g. 90% on left vs. 30% on right • In our experiments we always took 100 samples; this number can be tuned with various techniques • Vuduc et al. [International Journal of High Performance Computing Applications, 2004] suggest a statistical early stopping criterion that indicates when sampling can be stopped

Graph Model On FMIN • Six optimizations: {c, d, g, l, m, n} • Model-found optimal sequence: cndgml • The model-found sequence had a dynamic instruction count of 1221, which was the best among all 720 possible sequences
