Statistical Analysis of Inlining Heuristics in Jikes RVM Jing Yang jy8y@cs.virginia.edu Department of Computer Science, University of Virginia
Outline • Introduction • Data Set Description • Data Analysis, Summarization and Interpretation • Future Work • Conclusion
Introduction • Java • highly portable programming language • portability is often at the cost of execution speed • seeking optimizations to improve the performance • inlining – one of the heavily used optimizations • reduce overhead associated with the function call • code expansion • aggressive inlining can hurt performance
Introduction • Inlining Heuristic • used in Java Virtual Machines to decide whether to inline or not • needs to be tuned to achieve the optimal solution • current tuning techniques • manual – not accurate • genetic algorithm – time-consuming due to the large number of parameters
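To make the inline-or-not decision concrete, a size-and-depth heuristic of this kind might look like the sketch below. The four threshold names are the parameters analyzed later in this talk; the default values, the `should_inline` function and the order of the checks are assumptions made for illustration, not the Jikes RVM implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InlineParams:
    # Names follow the talk; the numeric defaults are placeholders.
    callee_max_size: int = 20
    always_inline_size: int = 10
    max_inline_depth: int = 5
    caller_max_size: int = 2000


def should_inline(callee_size: int, caller_size: int, depth: int,
                  p: InlineParams = InlineParams()) -> bool:
    """Hypothetical decision rule; sizes are in arbitrary instruction units."""
    if callee_size <= p.always_inline_size:
        return True   # tiny callee: the call overhead dominates
    if depth > p.max_inline_depth:
        return False  # already nested too deeply
    if caller_size > p.caller_max_size:
        return False  # caller has grown too large
    return callee_size <= p.callee_max_size


print(should_inline(callee_size=15, caller_size=100, depth=1))
print(should_inline(callee_size=30, caller_size=100, depth=1))
```

Each threshold is one dimension of the tuning search, which is why the cost of tuning grows quickly with the number of parameters.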
Introduction • Goal • locate parameters used in the inlining heuristic that actually impact its performance • focus on these “effective” parameters • reduce time for tuning process • simplify the inlining heuristic to reduce its running overhead without sacrificing its performance
Data Set Description • Data Collection • a complete combination of all the parameter values – 500 configurations • for each configuration, run the whole SPECjvm98 benchmark suite (nine Java programs) – 9 • within each run, collect three performance metrics (compiled code size, compilation time, execution time) – 3 • raw data – 500*9*3=13,500 observations • reshape the raw data into different formats for the different statistical analyses
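The size of the raw data set follows from a full factorial design, which can be sketched as below. The per-parameter value lists are hypothetical (the talk only gives the total of 500 configurations); they are chosen so that their product is 500.

```python
from itertools import product

# Hypothetical value grids: only the total number of combinations (500)
# comes from the talk, the individual values do not.
GRID = {
    "CALLEE_MAX_SIZE": [10, 20, 30, 40, 50],
    "ALWAYS_INLINE_SIZE": [4, 8, 12, 16, 20],
    "MAX_INLINE_DEPTH": [1, 3, 5, 7, 9],
    "CALLER_MAX_SIZE": [500, 1000, 2000, 4000],
}
N_BENCHMARKS = 9  # programs in SPECjvm98
METRICS = ("code_size", "compile_time", "exec_time")

# One dict per configuration: every combination of the values above.
configs = [dict(zip(GRID, values)) for values in product(*GRID.values())]
n_observations = len(configs) * N_BENCHMARKS * len(METRICS)

print(len(configs), n_observations)  # 500 13500
```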
Data Analysis, Summarization and Interpretation • Among the Performance Metrics • 3 variables – compiled code size, compilation time, execution time • geometric mean over the first seven benchmarks in SPECjvm98 • 500 observations • standardization before analysis
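The two preprocessing steps above can be sketched as follows: each configuration's metric is summarized by a geometric mean over seven benchmarks, and the resulting column is then standardized to zero mean and unit variance. The timing values are synthetic, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def standardize(column):
    """Z-scores: subtract the mean, divide by the sample standard deviation."""
    mu, sd = mean(column), stdev(column)
    return [(x - mu) / sd for x in column]


# Synthetic execution times (seconds): three configurations x seven benchmarks.
exec_times = [
    [4.1, 9.8, 6.0, 12.3, 7.7, 5.2, 3.9],
    [3.8, 9.1, 5.6, 11.0, 7.9, 5.0, 3.5],
    [4.6, 10.4, 6.3, 13.1, 8.2, 5.9, 4.4],
]

per_config = [geometric_mean(row) for row in exec_times]  # one value per configuration
z_scores = standardize(per_config)
print([round(v, 3) for v in per_config])
print([round(z, 3) for z in z_scores])
```

Standardizing matters here because code size, compilation time and execution time are measured in different units.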
Data Analysis, Summarization and Interpretation • Principal Component Analysis
Data Analysis, Summarization and Interpretation • First Principal Component • explains 71.58% of the total sample variance • correlations between the first principal component and the three variables • compiled code size and compilation time contrast with execution time • accords with our intuition about inlining
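A minimal sketch of how such a first principal component can be obtained from standardized variables is given below. It uses synthetic data constructed to show the same kind of contrast (it does not reproduce the 71.58% figure), and power iteration is an illustrative choice, not necessarily the method used for the talk. For standardized variables, the correlation between the component and variable i is the i-th eigenvector entry times the square root of the eigenvalue.

```python
import math
import random
from statistics import mean, stdev


def standardize(col):
    mu, sd = mean(col), stdev(col)
    return [(x - mu) / sd for x in col]


def correlation_matrix(columns):
    """Sample correlation matrix of the given columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(R, iters=500):
    """Largest eigenvalue and unit eigenvector of R by power iteration."""
    v = [1.0 / (i + 1) for i in range(len(R))]  # generic start vector
    for _ in range(iters):
        w = [sum(r * x for r, x in zip(row, v)) for row in R]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(vi * sum(r * x for r, x in zip(row, v)) for vi, row in zip(v, R))
    return lam, v


# Synthetic metrics: more inlining raises size and compile time, lowers run time.
random.seed(0)
base = [random.gauss(0, 1) for _ in range(200)]
code_size = [b + random.gauss(0, 0.3) for b in base]
compile_time = [b + random.gauss(0, 0.3) for b in base]
exec_time = [-b + random.gauss(0, 0.3) for b in base]

R = correlation_matrix([code_size, compile_time, exec_time])
lam, vec = first_component(R)
proportion = lam / len(R)                       # share of total variance
loadings = [e * math.sqrt(lam) for e in vec]    # corr(PC1, variable i)
print(f"{proportion:.2%}", [round(x, 3) for x in loadings])
```

In this synthetic case the loadings for code size and compilation time share one sign and the loading for execution time has the opposite sign, which is the contrast pattern described above.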
Data Analysis, Summarization and Interpretation • Performances versus Parameters • 7 variables – four parameters and three performance metrics • 500 observations • standardization before analysis
Data Analysis, Summarization and Interpretation • Canonical Correlation Analysis
Data Analysis, Summarization and Interpretation • Only Consider One Performance Metric
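When only one performance metric is considered, the canonical correlation reduces to the multiple correlation between that metric and the parameters, and the canonical weights are proportional to the standardized regression coefficients. The sketch below computes both on a tiny synthetic design; the helper names and the data are assumptions for illustration.

```python
import math
from statistics import mean, stdev


def standardize(col):
    mu, sd = mean(col), stdev(col)
    return [(x - mu) / sd for x in col]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def single_metric_canonical(params, metric):
    """Return (correlation, weights) for one metric against several parameters."""
    zx = [standardize(c) for c in params]
    zy = standardize(metric)
    n = len(zy)
    Rxx = [[sum(a * b for a, b in zip(u, v)) / (n - 1) for v in zx] for u in zx]
    rxy = [sum(a * b for a, b in zip(u, zy)) / (n - 1) for u in zx]
    weights = solve(Rxx, rxy)  # Rxx^{-1} rxy
    rho = math.sqrt(sum(w * r for w, r in zip(weights, rxy)))
    return rho, weights


# Synthetic 3 x 2 factorial design: the metric depends on the first parameter only.
p1 = [1, 2, 3, 1, 2, 3]
p2 = [0, 0, 0, 1, 1, 1]
metric = [2 * a + 5 for a in p1]
rho, weights = single_metric_canonical([p1, p2], metric)
print(round(rho, 6), [round(abs(w), 6) for w in weights])
```

A weight near zero for a parameter is the kind of evidence that would suggest the parameter can be neglected for that metric.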
Data Analysis, Summarization and Interpretation • Conclusion • CALLEE_MAX_SIZE and MAX_INLINE_DEPTH are important • ALWAYS_INLINE_SIZE and CALLER_MAX_SIZE can be neglected unless only optimizing execution time
Data Analysis, Summarization and Interpretation • Verification • classify data from the remaining two benchmarks • the more important a parameter is, the better the classification based on it performs • Fisher's discriminant function • pairwise classifications based on CALLEE_MAX_SIZE • multi-population classification based on CALLEE_MAX_SIZE • most separate pairwise classification based on ALWAYS_INLINE_SIZE, MAX_INLINE_DEPTH and CALLER_MAX_SIZE
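A pairwise classification with Fisher's linear discriminant function can be sketched as below. The two populations stand for groups of runs that differ in one parameter value, and the features stand for two standardized performance metrics; this reading of the setup, the data and the helper names are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def mean_vector(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def scatter(rows, m):
    """Sum of outer products of deviations from the mean vector m."""
    d = len(m)
    S = [[0.0] * d for _ in range(d)]
    for r in rows:
        dev = [r[j] - m[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                S[i][j] += dev[i] * dev[j]
    return S


def fisher_rule(pop1, pop2):
    """Return a classifier mapping an observation to population 1 or 2."""
    m1, m2 = mean_vector(pop1), mean_vector(pop2)
    S1, S2 = scatter(pop1, m1), scatter(pop2, m2)
    dof = len(pop1) + len(pop2) - 2
    pooled = [[(a + b) / dof for a, b in zip(r1, r2)] for r1, r2 in zip(S1, S2)]
    w = solve(pooled, [a - b for a, b in zip(m1, m2)])   # S_pooled^{-1} (m1 - m2)
    cut = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m2))
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= cut else 2


# Synthetic observations: (code size, execution time) for two parameter settings.
small_callee = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
large_callee = [(2.0, 1.0), (2.1, 1.2), (1.9, 0.9), (2.2, 1.1)]
rule = fisher_rule(small_callee, large_callee)
print([rule(x) for x in small_callee + large_callee])
```

The fraction of held-out observations assigned to the correct population then serves as a measure of how strongly the parameter separates the performance data.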
Future Work • Verification by Jikes RVM • Better Discriminant Function • Statistical Method to Find Optimal Solution • Program Properties
Conclusion • Locate and Verify the Important Parameters • Reduce Time for Tuning Process • Simplify the Inlining Heuristic