Optimizing General Compiler Optimization M. Haneda, P.M.W. Knijnenburg, and H.A.G. Wijshoff
Problem: Optimizing optimizations • A compiler usually has many optimization options (e.g., peephole optimization, delayed-branch scheduling) • gcc 3.3 has 54 optimization options • gcc 4 has over 100 possible settings • Very little is known about how these options affect each other • Compiler writers typically provide switches that bundle many optimization options together • gcc -O1, -O2, -O3
…but can we do better? • It is possible to perform better than these predefined optimization settings, but doing so requires extensive knowledge of the code as well as the available optimization options • How do we define one set of options that would work well with a large variety of programs?
Motivation for this paper • Because there are so many optimization settings, an exhaustive search would cost too much • gcc 3: 2^50 different combinations! • We want a systematic method to find the best settings while searching only a reduced space • Ideally, we would like to do this with minimal knowledge of what the options actually do
Big Idea • We want to find the largest subsets of compiler options that positively interact with each other • Once we obtain these subsets, we try to combine them, under the condition that they do not negatively affect each other • We select the final compiler setting from the result of these set combinations
Full vs. Fractional Factorial Design • Full Factorial Design: explores the entire search space, with every possible combination • Given k options, this requires 2^k experiments • Fractional Factorial Design: explores a reduced search space that is representative of the full one • This can be done using orthogonal arrays
Orthogonal Arrays • An orthogonal array is a matrix of 0s and 1s • The rows represent the experiments to be performed • The columns represent the factors (here, the compiler options) that the experiment analyzes • Any option is equally likely to be turned on or off • Given a particular experiment with a particular option turned on, all the other options are still equally likely to be turned on or off
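To make the balance property concrete, here is a minimal Python sketch (not from the paper) that checks a textbook 4-run, 3-option orthogonal array of strength 2: every pair of columns contains each on/off combination equally often.

```python
from itertools import combinations, product

# A textbook strength-2 orthogonal array: 4 experiments (rows) x 3 options
# (columns). In the paper's setting, each row would be one compiler run and
# each column one optimization flag (1 = on, 0 = off).
OA = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

def is_strength_two(oa):
    """Check the defining property: for every pair of columns, every on/off
    combination (00, 01, 10, 11) appears equally often."""
    n_cols = len(oa[0])
    for c1, c2 in combinations(range(n_cols), 2):
        counts = {pair: 0 for pair in product((0, 1), repeat=2)}
        for row in oa:
            counts[(row[c1], row[c2])] += 1
        if len(set(counts.values())) != 1:
            return False
    return True

print(is_strength_two(OA))  # True
```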
Algorithm – Step 1 • Finding maximum subsets of positively interacting options • Step 1.1: Find the options that give the best overall improvement • For each single option i, compute the average speedup over all settings in the search space in which i is turned on • Select the M options with the highest average improvement
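A minimal sketch of Step 1.1, assuming `oa` is an orthogonal array like the one above and `speedups[i]` is the measured speedup of the compiler setting in row i; the data and helper names are hypothetical, for illustration only.

```python
def rank_options(oa, speedups, m):
    """Step 1.1 sketch: for each option (column), average the measured
    speedup over all experiments (rows) in which that option is turned on,
    then return the M options with the highest average improvement."""
    n_cols = len(oa[0])
    averages = []
    for col in range(n_cols):
        on = [s for row, s in zip(oa, speedups) if row[col] == 1]
        averages.append(sum(on) / len(on))
    ranked = sorted(range(n_cols), key=lambda c: averages[c], reverse=True)
    return ranked[:m]

# Hypothetical usage: speedups measured for the 4 experiments in OA above.
# print(rank_options(OA, [1.00, 1.12, 1.08, 1.15], m=2))
```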
Algorithm – Step 1 (cont.) • Step 1.2: Iteratively add new options to the already obtained sets, to grow a maximum set of positively reinforcing optimizations • Example: If using options A and B together performs better than using A alone, then add B • If using {A, B} together with C performs better than {A, B}, then add C to {A, B}
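A simplified greedy sketch of this iterative growth; `speedup_of(options)` is a hypothetical routine that compiles and runs the benchmark with exactly those options enabled and returns the measured speedup (the paper drives these decisions from orthogonal-array experiments rather than one run per trial).

```python
def grow_set(seed, candidates, speedup_of):
    """Step 1.2 sketch: starting from a seed set of options, keep an extra
    option whenever switching it on improves the measured speedup of the set.
    `speedup_of(options)` is assumed to measure the speedup of a setting in
    which exactly those options are enabled."""
    current = set(seed)
    best = speedup_of(current)
    for opt in candidates:
        trial = current | {opt}
        trial_speedup = speedup_of(trial)
        if trial_speedup > best:        # positive interaction: keep it
            current, best = trial, trial_speedup
    return current
```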
Algorithm – Step 2 • Take the sets obtained so far and try to combine them, provided they do not negatively influence each other • This is done to maximize the number of options turned on in each combined set • Example: • If {A, B, C} and {D, E} do not counteract each other, combine them into {A, B, C, D, E} • Otherwise, leave them separate
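A rough sketch of Step 2 under the same hypothetical `speedup_of`: two sets are merged only when their union performs at least as well as either set alone.

```python
def combine_sets(option_sets, speedup_of):
    """Step 2 sketch: repeatedly merge two option sets whenever their union
    performs at least as well as each set on its own, i.e. the sets do not
    counteract each other. `speedup_of` is the same hypothetical measurement
    routine as in the Step 1 sketch."""
    sets = [set(s) for s in option_sets]
    merged = True
    while merged:
        merged = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                union = sets[i] | sets[j]
                if speedup_of(union) >= max(speedup_of(sets[i]),
                                            speedup_of(sets[j])):
                    sets[i] = union
                    del sets[j]
                    merged = True
                    break
            if merged:
                break
    return sets  # Step 3 then picks max(sets, key=speedup_of)
```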
Algorithm – Step 3 • Take the resulting sets from step 2, and select the one with the best overall improvement. • The result would be the ideal combination of optimization settings, according to this methodology.
Comparing results • The compiler setting obtained by this methodology outperforms -O1, -O2, and -O3 on almost all the SPECint95 benchmarks • -O3 performs better on li (39.2% vs. 38.4%) • The new setting delivers the best performance on perl (18.4% vs. 10.5%)
Conclusion • The paper introduced a systematic way of combining compiler optimization settings • Used a reduced search space, constructed as an orthogonal array • Can be done with no knowledge of actual options • Can be done independently of architecture • Can be applied to a wide variety of applications
Future work • Using the same methodology to find a good optimization setting for a particular domain of applications • Applying the methodology to newer versions of the gcc compiler, such as gcc 4.0.1