Study the correlation among input dimensions and develop a greedy algorithm to select representative code variants. Explore the root cause of input sensitivity and determine how to mitigate it. Select the best code variant for new inputs using ML-based prediction or analytical models.
Input sensitivity Study • Input-sensitivity • Develop a greedy algorithm to select representative code variants
What can we do? • Study the correlation among input dimensions? • D=10 • There are 2^10 combinations of dimensions • Which to study? • What is the purpose? • PCA? Feature selection? etc. • Or clustering and slicing?
What can we do? • Study the root cause of sensitivity • Why is one code variant much faster on certain inputs but slower on others? • How to get rid of such sensitivity? • Does ICC PGO help with multiple data sets? • Or do we instead need a high-level solution?
What can we do? • Given a new input, select the best code variant • ML-based prediction • Classification: SVM, decision tree, clustering • Analytical model: • Linear regression • Or combined?
Code variant selection (1) • Greedy • If a code variant's performance on input I is within P = 3% of the best performance achieved by any code variant, consider it a best variant for I • At each step, select the code variant that is a per-input best for the most inputs • Perfect (greedy) • P is 0%, always the best performance • Perfect (true) • Always pick the per-input best • 11.7% across 1000 inputs
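The greedy selection described above can be sketched as a set-cover style loop. Several details here are assumptions rather than the authors' implementation: performance is taken to be runtime (lower is better), "within P" is read as runtime no more than (1 + P) times the per-input best, each step counts only inputs not yet covered by an earlier pick, and the names `greedy_select`, `tolerance`, and `max_variants` are hypothetical.

```python
def greedy_select(runtimes, tolerance=0.03, max_variants=None):
    """Greedily pick code variants that cover the most inputs.

    `runtimes[v][i]` is the runtime of variant v on input i (lower is better).
    A variant "covers" input i if its runtime is within `tolerance` of the
    best runtime any variant achieves on i. Setting tolerance=0 corresponds
    to the "Perfect (greedy)" case.
    """
    variants = list(runtimes)
    n_inputs = len(runtimes[variants[0]])
    # Best runtime per input across all variants.
    best = [min(runtimes[v][i] for v in variants) for i in range(n_inputs)]
    # Inputs on which each variant is within tolerance of the best.
    covers = {
        v: {i for i in range(n_inputs)
            if runtimes[v][i] <= best[i] * (1 + tolerance)}
        for v in variants
    }
    uncovered = set(range(n_inputs))
    selected = []
    while uncovered and (max_variants is None or len(selected) < max_variants):
        # Pick the variant covering the most still-uncovered inputs.
        pick = max(variants, key=lambda v: len(covers[v] & uncovered))
        gained = covers[pick] & uncovered
        if not gained:
            break
        selected.append(pick)
        uncovered -= gained
    return selected


# Hypothetical runtimes for 3 variants on 4 inputs (not data from the study).
example = {
    "A": [1.00, 1.00, 1.00, 2.00],
    "B": [1.02, 1.50, 2.00, 1.00],
    "C": [2.00, 2.00, 1.01, 1.00],
}
order = greedy_select(example)
```

The order in which variants are returned reflects how many additional inputs each one covers, so truncating the list with `max_variants` gives the most representative few.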
Code variant selection (3) • Observations • Different code variants perform best for different subsets of inputs (input sensitivity) • 3 or 4 code variants are sufficient to harvest most of the performance benefit (code representativeness)
Code variant and critical flags • Observations • Code variants selected in order: 89, 13, 64, 66
Discussion • What is the goal of such ad-hoc analysis? • If our goal is prediction with machine learning techniques, we should try decision trees, neural networks, etc. • If our goal is prediction with analytical modeling techniques, what could be the target of such models? • Performance modeling: predict the duration of a run from its input parameters • Or, we can use information derived from analytical models to improve machine learning • The greedy algorithm could be the target
Plan • Continue studying Kripke • More analysis (analytical approach) • Univariate and multivariate analysis of inputs • Per-code-variant analysis • Develop a prediction model (ML approach)