Motivation: Why automatic parameter tuning ? (1)

Performance Prediction andAutomated Tuning of Randomized and Parametric Algorithms:An Initial Investigation Frank Hutter1, Youssef Hamadi2, Holger Hoos1, and Kevin Leyton-Brown11University of British Columbia, Vancouver, Canada2Microsoft Research Cambridge, UK

Motivation: Why automatic parameter tuning? (1) • Most approaches for solving hard combinatorial problems (e.g. SAT, CP) are highly parameterized • Tree search • Variable/value heuristic • Propagation • Whether and when to restart • How much learning • Local search • Noise parameter • Tabu length in tabu search • Strength of penalty increase and decrease in DLS • Pertubation, acceptance criterion, etc. in ILS Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Motivation: Why automatic parameter tuning? (2) • Algorithm design: new algorithm/application: • A lot of time is spent for parameter tuning • Algorithm analysis: comparability • Is algorithm A faster than algorithm B because they spent more time tuning it ?  • Algorithm use in practice: • Want to solve MY problems fast, not necessarily the ones the developers used for parameter tuning Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Related work in automated parameter tuning • Best fixed parameter setting for instance set • [Birattari et al. ’02, Hutter ’04, Adenso-Daz & Laguna ’05] • Algorithm selection/configuration per instance • [Lobjois and Lemaître, ’98, Leyton-Brown, Nudelman et al. ’02 & ’04, Patterson & Kautz ’02] • Best sequence of operators / changing search strategy during the search • [Battiti et al, ’05, Lagoudakis & Littman, ’01 & ‘02] Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Overview • Previous work on empirical hardness models[Leyton-Brown, Nudelman et al. ’02 & ’04] • EH models for randomized algorithms • EH models automatic tuning for parametric algorithms • Conclusions Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Empirical hardness models:Basics (1 algorithm) • Training: Given a set of t instances s1,...,st • For each instance si • Compute instance features xi = (xi1,...,xim) • Run algorithm and record its runtime yi • Learn function f: features  runtime, such that yi f(xi) for i=1,…,t • Test: Given a new instance st+1 • Compute features xt+1 • Predict runtime yt+1 = f(xt+1) Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Empirical hardness models:Which instance features? • Features should be polytime computable (for us, in seconds) • Basic properties, e.g. #vars, #clauses, ratio (13) • Graph-based characterics (10) • Estimates of DPLL search space size (5) • Local search probes (15) • Combine features to form more expressive basis functions • Basis functions  = (1,...,q) can be arbitrary combinations of the features x1,...,xm • Basis functions used for SAT in [Nudelman et al. ’04] • 91 original features: xi • Pairwise products of features: xi * xj • Only subset of these (drop useless basis functions) Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Empirical hardness models:How to learn function f: features  runtime? • Runtimes vary by orders of magnitudes, and we need to pick a model that can deal with that • Log-transform the output e.g. runtime is 103 secyi = 3 • Learn linear function of the basis functionsf(i) = i * wT yi • Learning reduces to fitting the weights w (ridge regression: w = ( + T )-1Ty) Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Algorithm selection based on empirical hardness models (e.g. satzilla) • Given portfolio of n different algorithms A1,...,An • Pick best algorithm for each instance • Training: • Learn n separate functions fj: features  runtime of algorithm j • Test (for each new instance st+1): • Predict runtime yjt+1 = fj(t+1) for each algorithm • Choose algorithm Aj with minimal yjt+1 Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Empirical hardness models for randomized algorithms • Can this same approach predict the run-time of incomplete, randomized local search algorithms? • Yes! • Incomplete • Limitation to satisfiable instances (train & test) • Local search • No changes needed • Randomized • Ultimately, want to predict entire run-time distribution (RTDs) • For our algorithms, these RTDs are typically exponential and can thus be characterized by a single sufficient statistic, such as median run-time Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Prediction of median run-time (only red stuff changed) • Training: Given a set of t instances s1,...,st • For each instance si • Compute features xi = (xi1,...,xim) • Run algorithm multiple timesto get its runtimes yi1, …, yik • Compute median miof yi1, …, yik • Learn function f: features median run-time, mif(xi) • Test: Given a new instance st+1 • Compute features xt+1 • Predict median run-time mt+1 = f(xt+1) Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Experimental setup: solvers • Two SAT solvers • Novelty+ (WalkSAT variant) • SAPS (Scaling and Probabilistic Smoothing) • Adaptive version of Novelty+ won SAT04 competition for random instances, SAPS came second • Runs cut off after 15 minutes Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Experimental setup: benchmarks • Three unstructured distributions: • CV-fix: c/v ratio 4.26 20,000 instances with 400 variables, (10,011 satisfiable) • CV-var: c/v ratio between 3.26 and 5.26 20,000 instances with 400 variables, (10,129 satisfiable) • SAT04: 3,000 instances created with the generators for the SAT04 competition (random instances) with same parameters (1,420 satisfiable) • One structured distribution: • QWH: quasi groups with holes, 25% to 75% holes, 7,498 instances, satisfiable by construction • All data sets were split 50:25:25 for train/valid/test Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Prediction based onmedians of 1000 runs Results for predicting (median) run-time for SAPS on CV-var Prediction based onsingle runs Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Results for predicting median run-time based on 10 runs Novelty+ on SAT04 SAPS on QWH Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Predicting complete run-time distribution for exponential RTDs (from extended CP version) RTD of SAPS on q0.75 instance of QWH RTD of SAPS on q0.25 instance of QWH Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Empirical hardness models for parametric algorithms (only red stuff changed) • Training: Given a set of instances s1,...,st • For each instance si • Compute features xi • Run algorithm with some settings pi1,...,pinito get runtimes yi1,...,yini • Basis functions i j = (xi, pi j) of features and parameter settings(quadratic expansion of params, multiplied by instance features) • Learn a function g:basis functions run-time, g(i j)  yi j • Test: Given a new instance st+1 • Compute features xt+1 • For each parameter setting p of interest, construct t+1p=(xt+1, p) and predict run-time g(t+1) Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Experimental setup • Parameters • Novelty+: noise  {0.1,0.2,0.3,0.4,0.5,0.6} • SAPS: all 30 combinations of  {1.2,1.3,1.4} and  {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} • One additional data set • Mixed = union of QWH and SAT04 Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

(from extended CP version) 5 instances, one symbol per instance 1 instance in detail, (blue diamonds in left figure) Results for predicting SAPS runtime with 30 different parameter settings on QWH Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Results for automated parameter setting Not the best algorithm to tune Do you have a better one? Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Results for automated parameter setting, Novelty+ on Mixed Compared to best fixed parameters Compared to random parameters Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Overview • Previous work on empirical hardness models[Leyton-Brown, Nudelman et al. ’02 & ’04] • EH models for randomized algorithms • EH models and automated tuning for parametric algorithms • Conclusions Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Conclusions • We can predict the run-time of randomized and incomplete parameterized local search algorithms • We can automatically find good parameter settings • Better than the default settings • Sometimes better than the best possible fixed setting • There’s no free lunch • Long initial training time • Need domain knowledge to define features for a domain (only once per domain) Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Future work • Even better predictions • More involved ML techniques, e.g. Gaussian processes • Predictive uncertainty to know when our predictions are not reliable • Reduce training time • Especially for high-dimensional parameter spaces, the current approach will not scale • Use active learning to choose best parameter configurations to train on • We need compelling domains: please come talk to me! • Complete parametric SAT solvers • Parametric solvers for other domains where features can be defined (CP? Even planning?) • Optimization algorithms Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

The End • Thanks to • Holger Hoos, Kevin Leyton-Brown,Youssef Hamadi • Reviewers for helpful comments • You for your attention  Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Experimental setup: solvers • Two SAT solvers • Novelty+ (WalkSAT variant) • Default noise setting 0.5 (=50%) for unstructured instances • Noise setting 0.1 used for structured instances • SAPS (Scaling and Probabilistic Smoothing) • Default setting (alpha, rho) = (1.3, 0.8) • Adaptive version of Novelty+ won SAT04 competition for random instances, SAPS came second • Runs cut off after 15 minutes Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Which features are most important?(from extended CP version) • Results consistent with those for deterministic tree-search algorithms • Graph-based and DPLL-based features • Local search probes are even more important here • Only very few features needed for good models • Previously observed for all-sat data [Nudelman et al. ’04] • A single quadratic basis function is often almost as good as the best feature subset • Strong correlation between features • Many choices yield comparable performance Hutter, Hamadi, Hoos, Leyton-Brown: Automatic Parameter Tuning

Motivation: Why automatic parameter tuning ? (1)

Motivation: Why automatic parameter tuning ? (1)

Presentation Transcript

Solaris/Linux Performance Measurement and Tuning

Motivation: In Learning and Teaching

Tuning Fork Tests

Automatic Verification of Industrial Designs

Demand-side and Supply-side Policies

Automatic Transmission Fundamentals

Motivation and Emotion

ASE106: Tuning ASE for PeopleSoft Applications

Automatic Text Summarization

Performance Tuning Workshop - Architecture

An Introduction to Parallel Processing

Theories of Motivation Hunger Motivation Eating Disorders

Computational Physics An Introduction to High-Performance Computing

Conditional Random Fields for Automatic Speech Recognition

Manual and Automatic Subjectivity and Sentiment Analysis

Performance Tuning Tips

Motivation

Module 24 Encoding: Getting Information In

Automatic Voltage Regulator

Conditional Random Fields for Automatic Speech Recognition