
Dynamic Restarts: Optimal Randomized Restart Policies with Observation



  1. Dynamic Restarts: Optimal Randomized Restart Policies with Observation. Henry Kautz, Eric Horvitz, Yongshao Ruan, Carla Gomes and Bart Selman

  2. Outline • Background • heavy-tailed run-time distributions of backtracking search • restart policies • Optimal strategies to improve expected time to solution using • observation of solver behavior during particular runs • predictive model of solver performance • Empirical results

  3. Backtracking Search • Backtracking search algorithms often exhibit a remarkable variability in performance among: • slightly different problem instances • slightly different heuristics • different runs of randomized heuristics • Problematic for practical application • Verification, scheduling, planning

  4. Heavy-tailed Runtime Distributions [figure: runtime histogram with many very short runs and a tail of very long runs] • Observation (Gomes 1997): distributions of runtimes of backtrack solvers often have heavy tails • infinite mean and variance • probability of long runs decays by power law (Pareto-Levy), rather than exponentially (Normal)
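
A heavy (Pareto-Levy) tail can be written as a power-law survival function; a standard formulation, stated here for context rather than taken from the slide:

    \Pr[T > t] \sim C\,t^{-\alpha} \quad (t \to \infty), \qquad \alpha > 0,

in contrast to exponential decay \Pr[T > t] \sim e^{-\lambda t}. For \alpha \le 1 the mean is infinite, and for \alpha \le 2 the variance is infinite, which is the sense of the "infinite mean and variance" bullet.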

  5. Formal Models of Heavy-tailed Behavior • Imbalanced tree search models (Chen 2001) • Exponentially growing subtrees occur with exponentially decreasing probabilities • Heavy-tailed runtime distribution can arise in backtrack search for imbalanced models with appropriate parameters p and b • p is the probability of the branching heuristic making an error • b is the branching factor
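
One way to see how a heavy tail arises in the imbalanced model, reconstructing the standard argument with the slide's parameters p and b: if a subtree of roughly b^i nodes is searched with probability (1-p)p^i, then

    \Pr[T \approx b^i] = (1-p)\,p^i \;\Rightarrow\; \Pr[T > t] \approx p\,t^{-\log_b(1/p)},

a power law with tail exponent \alpha = \log_b(1/p), and E[T] = (1-p)\sum_i (pb)^i diverges whenever pb \ge 1.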

  6. Randomized Restarts • Solution: randomize the systematic solver • Add noise to the heuristic branching (variable choice) function • Cutoff and restart search after some number of steps • Provably eliminates heavy tails • Effective whenever search stagnates • Even if RTD is not formally heavy-tailed! • Used by all state-of-the-art SAT engines • Chaff, GRASP, BerkMin • Superscalar processor verification
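
A minimal sketch of the restart wrapper itself, in Python, assuming a hypothetical randomized-solver interface solve(instance, seed, max_steps) that returns a solution or None when the cutoff is exhausted (the names are illustrative; this is not the Satz-Rand or Chaff API):

    import random

    def restart_solve(instance, solve, cutoff, max_restarts=None):
        """Run a randomized solver with a fixed cutoff, restarting until success.

        solve(instance, seed, max_steps) is assumed to return a solution, or
        None if max_steps backtracks are exhausted (hypothetical interface)."""
        attempt = 0
        while max_restarts is None or attempt < max_restarts:
            seed = random.randrange(2**31)          # fresh noise for the branching heuristic
            result = solve(instance, seed=seed, max_steps=cutoff)
            if result is not None:                  # solved within the cutoff
                return result
            attempt += 1                            # cutoff reached: restart with new randomization
        return None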

  7. Complete Knowledge of RTD [figure: the runtime distribution P(t) of D plotted against t]

  8. Complete Knowledge of RTD [figure: the same RTD P(t) of D, with a fixed cutoff T* marked on the t axis]

  9. Complete Knowledge of RTD [figure: the RTD P(t) of D with the optimal fixed cutoff T*]
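
Slides 7-9 show this graphically: when the RTD is fully known, the best restart policy is a fixed cutoff (Luby, Sinclair & Zuckerman 1993). The standard expression, reconstructed here rather than copied from the slides, is

    E[T_c] \;=\; \frac{\int_0^c \big(1 - F(t)\big)\,dt}{F(c)} \;=\; \frac{E[\min(T, c)]}{F(c)},

where F is the CDF of the RTD D, and the optimal fixed cutoff is T^* = \arg\min_c E[T_c].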

  10. No Knowledge of RTD • Open cases: • Partial knowledge of RTD (CP 2002) • Additional knowledge beyond RTD
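
With no knowledge of the RTD, the classic fallback is the universal schedule of Luby et al. (the "Luby universal policy" compared on slide 31), which cycles cutoffs 1, 1, 2, 1, 1, 2, 4, ... A small sketch of that sequence, reconstructed from the published definition rather than from the talk:

    def luby(i):
        """Return the i-th term (1-indexed) of the Luby universal restart sequence:
        1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ..."""
        k = 1
        while (1 << k) - 1 < i:                # smallest k with i <= 2^k - 1
            k += 1
        if i == (1 << k) - 1:                  # i = 2^k - 1  ->  the term is 2^(k-1)
            return 1 << (k - 1)
        return luby(i - (1 << (k - 1)) + 1)    # otherwise repeat the earlier prefix

    # Example: [luby(i) for i in range(1, 16)]
    # -> [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]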

  11. Example: Runtime Observations • Idea: use observations of early progress of a run to induce finer-grained RTD’s [figure: P(t) for the combined RTD D and the component RTD’s D1 and D2, with T1, T2 and T* marked on the t axis]

  12. Example: Runtime Observations • What is the optimal policy, given the original & component RTD's and a classification of each run? • Lazy: use the static optimal cutoff T* for the combined RTD [figure: D1, D2 and the combined D with the single cutoff T* marked on the P(t) vs. t plot]

  13. Example: Runtime Observations • What is the optimal policy, given the original & component RTD's and a classification of each run? • Naïve: use the static optimal cutoff for each component RTD [figure: D1 and D2 with their separate cutoffs T1* and T2* on the P(t) vs. t plot]

  14. Results • Method for inducing component distributions using: • Bayesian learning on traces of the solver • Resampling of runtime observations • Optimal policy where the observation assigns each run to a component distribution • Conditions under which the optimal policy prunes one (or more) distributions • Empirical demonstration of speedup

  15. I. Learning to Predict Solver Performance

  16. Formulation of Learning Problem [figure: timeline of a run with an observation horizon at the start; runs are labeled short or long relative to the median run time] • Consider a burst of evidence over the observation horizon • Learn a runtime predictive model using supervised learning (Horvitz et al., UAI 2001)

  17. Runtime Features • Solver instrumented to record at each choice (branch) point: • SAT & CSP generic features: number of free variables, depth of the search tree, amount of unit propagation, number of backtracks, … • CSP domain-specific features (QCP): degree of balance of uncolored squares, … • Gather statistics over 10 choice points: • initial / final / average values • 1st and 2nd derivatives • SAT: 127 variables, CSP: 135 variables
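
A rough sketch of the "statistics over 10 choice points" step; the feature names are illustrative stand-ins, not the instrumented solver's actual feature set:

    def summarize_trace(trace):
        """Collapse a trace of per-choice-point feature dicts (e.g. 10 of them)
        into summary statistics: initial, final, and average values plus
        discrete 1st- and 2nd-derivative averages."""
        summary = {}
        for name in trace[0]:                    # e.g. 'free_vars', 'depth', 'unit_props', 'backtracks'
            values = [step[name] for step in trace]
            d1 = [b - a for a, b in zip(values, values[1:])]      # 1st differences
            d2 = [b - a for a, b in zip(d1, d1[1:])]              # 2nd differences
            summary[f"{name}_initial"] = values[0]
            summary[f"{name}_final"] = values[-1]
            summary[f"{name}_avg"] = sum(values) / len(values)
            summary[f"{name}_d1_avg"] = sum(d1) / len(d1) if d1 else 0.0
            summary[f"{name}_d2_avg"] = sum(d2) / len(d2) if d2 else 0.0
        return summary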

  18. Learning a Predictive Model • Training data: runs sampled from the original RTD, recorded as (summary features, length of run) pairs • Learn a decision tree that predicts whether the current run will complete in less than the median run time • 65%-90% accuracy
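
A minimal sketch of that training step, assuming scikit-learn as a stand-in learner (the talk does not name a toolkit) and the summarize_trace features above:

    from statistics import median
    from sklearn.tree import DecisionTreeClassifier

    def train_runtime_model(runs):
        """runs: list of (summary_feature_dict, run_length) pairs from solver traces.
        Returns a tree predicting 'shorter than median', plus the feature order and median."""
        feature_names = sorted(runs[0][0])
        X = [[feats[name] for name in feature_names] for feats, _ in runs]
        median_len = median(length for _, length in runs)
        y = [int(length < median_len) for _, length in runs]   # 1 = run finished below the median
        model = DecisionTreeClassifier(max_depth=5)            # depth chosen arbitrarily for the sketch
        model.fit(X, y)
        return model, feature_names, median_len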

  19. Generating Distributions by Resampling the Training Data • Reasons: • The predictive models are imperfect • Analyses that include a layer of error analysis for the imperfect model are cumbersome • Resampling the training data: • Use the inferred decision trees to define different classes • Relabel the training data according to these classes

  20. Creating Labels • The decision tree reduces all the observed features to a single evidential feature F • F can be: • Binary valued • Indicates the prediction: shorter than median runtime? • Multi-valued • Indicates the particular leaf of the decision tree that is reached when the trace of a partial run is classified
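
A sketch of the resampling step: relabel each training run by the value of F it produces (here the multi-valued case, i.e. the decision-tree leaf it reaches) and collect one empirical RTD per label. model and feature_names come from the training sketch above.

    from collections import defaultdict

    def induce_component_rtds(runs, model, feature_names):
        """Relabel training runs by the evidential feature F (the tree leaf reached
        by each run's summary features) and return one empirical RTD per leaf."""
        rtds = defaultdict(list)
        for feats, length in runs:
            x = [[feats[name] for name in feature_names]]
            leaf = int(model.apply(x)[0])        # multi-valued F: index of the leaf reached
            rtds[leaf].append(length)
        return dict(rtds)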

  21. Result • Decision tree can be used to precisely classify new runs as random samples from the induced RTD’s [figure: the combined RTD P(t) of D with its median marked; after "Make Observation", the observed value of F assigns the run to one of the induced RTD’s]

  22. II. Creating Optimal Control Policies

  23. Control Policies • Problem Statement: • A process generates runs randomly from a known RTD • After the run has completed K steps, we may observe features of the run • We may stop a run at any point • Goal: Minimize expected time to solution • Note: using induced component RTD’s implies that runs are statistically independent • Optimal policy is stationary

  24. Optimal Policies Straightforward generalization to multi-valued features
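
The slide's policy expressions are not reproduced in this transcript; a reconstruction of the binary case from the stated assumptions (i.i.d. runs, an observation after K steps that assigns a run to D1 or D2 with probabilities p1 and p2, cutoffs T1, T2 >= K), not copied from the slide:

    E[T_{T_1,T_2}] \;=\; \frac{p_1\,E_{D_1}[\min(T,T_1)] \;+\; p_2\,E_{D_2}[\min(T,T_2)]}{p_1 F_1(T_1) \;+\; p_2 F_2(T_2)},

where F_i is the CDF of component D_i. The optimal dynamic policy chooses the pair (T_1^{**}, T_2^{**}) minimizing this expression; these are the cutoffs that slide 32 marks as T1** and T2**.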

  25. Case (2): Determining Optimal Cutoffs
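
As an illustration of the cutoff computation, a brute-force Python sketch that searches for the cutoffs minimizing the expected-time expression above over empirical component RTDs; this is a reconstruction for clarity, not the derivation used on the slide:

    def expected_time(samples, cutoff):
        """Average cost and success probability of one run drawn from the
        empirical RTD `samples` when it is truncated at `cutoff`."""
        cost = sum(min(t, cutoff) for t in samples) / len(samples)
        p_solve = sum(t <= cutoff for t in samples) / len(samples)
        return cost, p_solve

    def best_dynamic_cutoffs(rtd1, rtd2, p1, candidates):
        """Grid-search cutoffs (T1, T2) minimizing expected total time to solution
        for a binary dynamic policy; p1 is the probability a run is classified as D1."""
        best = None
        for t1 in candidates:
            for t2 in candidates:
                c1, q1 = expected_time(rtd1, t1)
                c2, q2 = expected_time(rtd2, t2)
                q = p1 * q1 + (1 - p1) * q2
                if q == 0:
                    continue                     # no run ever finishes under these cutoffs
                e_total = (p1 * c1 + (1 - p1) * c2) / q
                if best is None or e_total < best[0]:
                    best = (e_total, t1, t2)
        return best                              # (expected time, T1, T2)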

  26. Optimal Pruning • Runs from component D2 should be pruned (terminated) immediately after observation when:
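
In terms of the reconstruction above, pruning D2 corresponds to the minimizer having T_2^{**} = K, i.e.

    E[T_{T_1,\,K}] \;\le\; E[T_{T_1,\,T_2}] \quad \text{for all } T_2 > K

at the optimal T_1, so a run classified as D2 is cut off as soon as the observation is made.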

  27. III. Empirical Evaluation

  28. Backtracking Problem Solvers • Randomized SAT solver • Satz-Rand, a randomized version of Satz (Li 1997) • DPLL with 1-step lookahead • Randomization with a noise parameter over the variable choices • Randomized CSP solver • Specialized CSP solver for QCP • ILOG constraint programming library • Variable choice: a variant of the Brelaz heuristic

  29. Domains • Quasigroup With Holes • Graph Coloring • Logistics Planning (SATPLAN)

  30. Dynamic Restart Policies • Binary dynamic policies • Runs are classified as coming from either the short or the long run-time distribution • N-ary dynamic policies • Each leaf in the decision tree is considered as defining a distinct distribution
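
Putting the pieces together, a sketch of the dynamic restart control loop. It assumes hypothetical solver hooks solve_k(instance, seed, k) -> (result or None, state) and continue_run(state, extra_steps) -> result or None, where state.trace holds the per-choice-point features, plus summarize_trace and the trained model from the earlier sketches; it is an illustration of the policy, not the experimental harness used in the talk.

    import random

    def dynamic_restart_solve(instance, solve_k, continue_run,
                              model, feature_names, cutoffs, k):
        """Dynamic restart policy: run each attempt for k observation steps,
        classify it with the decision tree, then either restart immediately
        (if its class is pruned, i.e. its cutoff equals k) or continue up to
        the cutoff assigned to its class. cutoffs maps tree leaf -> total cutoff >= k."""
        while True:
            seed = random.randrange(2**31)
            result, state = solve_k(instance, seed, k)     # observation horizon
            if result is not None:
                return result
            feats = summarize_trace(state.trace)           # summary features of the partial run
            x = [[feats[name] for name in feature_names]]
            leaf = int(model.apply(x)[0])                  # evidential feature F
            budget = cutoffs[leaf] - k
            if budget <= 0:
                continue                                   # pruned class: restart now
            result = continue_run(state, budget)           # run on to this class's cutoff
            if result is not None:
                return result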

  31. Policies for Comparison • Luby optimal fixed cutoff • For original combined distribution • Luby universal policy • Binary naïve policy • Select distinct, separately optimal fixed cutoffs for the long and for the short distributions

  32. Illustration of Cutoffs [figure: P(t) for the combined RTD D and components D1, D2, with the per-component naive cutoffs T1*, T2*, the static optimal cutoff T*, the dynamic-policy cutoffs T1**, T2**, and the "Make Observation" point marked on the t axis]

  33. Comparative Results • Improvement of dynamic policies over the Luby fixed optimal cutoff policy is 40-65%

  34. Cutoffs: Graph Coloring (Satz) • Dynamic n-ary: 10, 430, 10, 345, 10, 10 • Dynamic binary: 455, 10 • Binary naive: 342, 500 • Fixed optimal: 363

  35. Discussion • Most optimal policies turned out to prune runs • Policy construction is independent of run classification, so other learning techniques may be used • Does not require highly accurate prediction! • Widely applicable

  36. Limitations • Analysis does not apply in cases where runs are statistically dependent • Example: • We begin with 2 or more RTD’s • E.g.: of SAT and UNSAT formulas • Environment flips a coin to choose an RTD, and then always samples that RTD • We do not get to see the coin flip! • Now each unsuccessful run gives us information about that coin flip!

  37. The Dependent Case • Dependent case much harder to solve • Ruan et al. CP-2002: • “Restart Policies with Dependence among Runs: A Dynamic Programming Approach” • Future work • Using RTD’s of ensembles to reason about RTD’s of individual problem instances • Learning RTD’s on the fly (reinforcement learning)

  38. Big Picture [diagram connecting Problem Instances, Solver, Learning / Analysis, and Predictive Model via static features, dynamic features, runtime, and control / policy]
