This module discusses the use of randomization in complete tree search algorithms: the success of randomized strategies in local search, how to introduce randomness into variable and value selection without losing completeness, the limitations of local search methods, and heavy-tailed distributions and their use in modeling real-world phenomena.
CS 4700: Foundations of Artificial Intelligence. Carla P. Gomes, gomes@cs.cornell.edu. Module: Randomization in Complete Tree Search Algorithms. Wrap-up of Search!
Randomization in Local Search • Randomized strategies are very successful in the area of local search: • Randomized hill climbing • Simulated annealing • Genetic algorithms • Tabu search • GSAT and variants. • Key limitation? The inherently incomplete nature of local search methods.
Randomization in Tree Search • Introduce randomness in a tree search method, e.g., by randomly breaking ties in variable and/or value selection. • Why would we do that? Can we also add a stochastic element to a systematic (tree search) procedure without losing completeness? (See the sketch below.)
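To make this concrete, here is a minimal sketch (not the lecture's code) of a complete backtrack search over a toy binary CSP in which both variable selection and value ordering break ties at random; the `domains`/`constraints` representation is an assumption for illustration only:

```python
import random

def consistent(u, v, allowed, assignment):
    """A pair constraint is satisfied until both endpoints are assigned."""
    if u in assignment and v in assignment:
        return (assignment[u], assignment[v]) in allowed
    return True

def solve(assignment, domains, constraints):
    """Complete backtrack search with randomized tie-breaking.

    `domains` maps variables to lists of values; `constraints` is a list
    of (u, v, allowed) triples, where `allowed` is the set of value
    pairs the two variables may jointly take (a toy binary CSP)."""
    if len(assignment) == len(domains):
        return dict(assignment)                       # full solution found
    unassigned = [v for v in domains if v not in assignment]
    fewest = min(len(domains[v]) for v in unassigned)
    # Tie-breaking in variable selection is random ...
    var = random.choice([v for v in unassigned if len(domains[v]) == fewest])
    # ... and so is the value ordering.
    values = list(domains[var])
    random.shuffle(values)
    for val in values:
        assignment[var] = val
        if all(consistent(u, v, allowed, assignment)
               for (u, v, allowed) in constraints):
            result = solve(assignment, domains, constraints)
            if result is not None:
                return result
        del assignment[var]                           # backtrack
    return None                                       # subtree exhausted
```

Because backtracking still exhausts every subtree before giving up, randomizing the tie-breaks changes which execution we get, never whether the search is complete.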
Backtrack Search (a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
Backtrack Search: Two Different Executions (a OR NOT b OR NOT c) AND (b OR NOT c) AND (a OR c)
The fringe of the search space
Latin Square Completion: Randomized Backtrack Search — easy instance, 15% pre-assigned cells (Gomes et al. 97). Times of six runs on the same instance: 7, 11, 30, (*), (*), (*) — (*) no solution found, run reached the cutoff of 2000.
Erratic Mean Behavior — the sample mean of the run time keeps fluctuating (500, 2000, even 3500!) as the number of runs on the same instance grows, while the median is 1!
Runtime distribution — proportion of cases solved, F(x), against the number of backtracks: 75% of the runs need ≤ 30 backtracks, yet 5% need > 100,000.
Run Time Distributions • The runtime distributions of some of the instances reveal interesting properties: • (I) Erratic behavior of the mean. • (II) The distributions have "heavy tails".
Heavy-Tailed Distributions • … infinite variance, … infinite mean. • Introduced by Pareto in the 1920s as a "probabilistic curiosity." • Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena. • Examples: stock market, earthquakes, weather, ...
Decay of Distributions • Standard — exponential decay, e.g. Normal: Pr[X > x] ≤ C·e^(−λx) for some constants C, λ > 0. • Heavy-tailed — power-law decay, e.g. Pareto–Lévy: Pr[X > x] ≈ C·x^(−α), with 0 < α < 2.
Figure: Normal, Cauchy, and Lévy densities — the Normal shows exponential decay, while the Cauchy and Lévy show power-law decay.
Fat-Tailed Distributions • Kurtosis = (fourth central moment) / (second central moment)², i.e. κ = μ₄ / μ₂², where μ₂ is the variance. • The Normal distribution has kurtosis 3; a distribution is fat-tailed when its kurtosis is > 3 (e.g., exponential, lognormal).
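A quick numerical illustration of this definition (a sketch; the sample size and seed are arbitrary choices): the normal sample's kurtosis lands near 3, while the exponential and lognormal samples come out well above it.

```python
import numpy as np

def kurtosis(x):
    """Kurtosis = fourth central moment / (second central moment)^2."""
    x = np.asarray(x, dtype=float)
    m2 = np.mean((x - x.mean()) ** 2)    # second central moment (variance)
    m4 = np.mean((x - x.mean()) ** 4)    # fourth central moment
    return m4 / m2 ** 2

rng = np.random.default_rng(0)
n = 1_000_000
print("normal     :", kurtosis(rng.normal(size=n)))       # ~3
print("exponential:", kurtosis(rng.exponential(size=n)))  # ~9: fat-tailed
print("lognormal  :", kurtosis(rng.lognormal(size=n)))    # >> 3: fat-tailed
```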
Fat- and Heavy-Tailed Distributions • Exponential decay for standard distributions, e.g. Normal, lognormal, exponential: Pr[X > x] ≤ C·e^(−λx). • Heavy-tailed — power-law decay, e.g. Pareto–Lévy: Pr[X > x] ≈ C·x^(−α), 0 < α < 2.
Pareto Distribution • with shape parameter α > 0: • Density function: f(x) = α / x^(α+1) for x ≥ 1. • Distribution function: F(x) = P[X ≤ x] = 1 − 1/x^α for x ≥ 1. • Survival function (tail probability): S(x) = 1 − F(x) = P[X > x] = 1/x^α for x ≥ 1.
Pareto Distribution • Moments: E(Xⁿ) = α / (α − n) if n < α; E(Xⁿ) = ∞ if n ≥ α. • Mean: E(X) = α / (α − 1) if α > 1; E(X) = ∞ if α ≤ 1. • Variance: var(X) = α / [(α − 1)²(α − 2)] if α > 2; var(X) = ∞ if α ≤ 2.
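As a worked instance of these formulas (the value of α is my own example):

```latex
E(X) \;=\; \frac{\alpha}{\alpha - 1} \;=\; \frac{3/2}{3/2 - 1} \;=\; 3
\quad \text{for } \alpha = \tfrac{3}{2},
\qquad
\operatorname{var}(X) \;=\; \infty \quad \text{since } \alpha \le 2 .
```

So a Pareto with α = 3/2 has a perfectly ordinary mean but no finite variance.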
How to Check for "Heavy Tails"? • Power-law decay of the tail: a log-log plot of the tail of the distribution (the survival function 1 − F(x); e.g., for the Pareto, S(x) = 1/x^α for x ≥ 1) should be approximately linear. • The slope gives the value of α: • 0 < α ≤ 1: infinite mean and infinite variance. • 1 < α ≤ 2: infinite variance.
Figure: densities f(x) of Lognormal(1,1) and Pareto(α = 1) — the Pareto(1) has infinite mean and infinite variance.
How to Visually Check for Heavy-Tailed Behavior Log-log plot of tail of distribution exhibits linear behavior.
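A sketch of this visual check (assuming a synthetic Pareto sample drawn by inverse-transform sampling; the plot labels are mine):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
alpha = 1.0                      # shape; alpha <= 1 means infinite mean
n = 100_000
# Inverse transform: if U ~ Uniform(0,1], then U**(-1/alpha) ~ Pareto(alpha).
u = 1.0 - rng.uniform(size=n)    # shift to (0, 1] to avoid division by zero
x = u ** (-1.0 / alpha)

xs = np.sort(x)
survival = 1.0 - np.arange(1, n + 1) / n   # empirical 1 - F(x)

plt.loglog(xs[:-1], survival[:-1])         # drop the final zero
plt.xlabel("x (log scale)")
plt.ylabel("1 - F(x) (log scale)")
plt.title("Pareto tail: approximately linear, slope -alpha")
plt.show()
```

Repeating this with, say, a normal sample makes the contrast obvious: its tail bends sharply downward instead of staying straight.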
Example of a Heavy-Tailed Model • Random walk: • Start at position 0. • Toss a fair coin: with each head take a step up (+1); with each tail take a step down (−1). • X = the number of steps the random walk takes to return to position 0. (A simulation sketch follows below.)
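A simulation sketch of this model (the run count and step cap are arbitrary choices): the return time has tail P[X > x] on the order of x^(−1/2), so the median is tiny while the mean is infinite.

```python
import random

def return_time(max_steps=10**6):
    """Steps a fair +/-1 walk takes to first return to 0 (None if capped)."""
    pos, steps = 0, 0
    while True:
        pos += random.choice((-1, 1))
        steps += 1
        if pos == 0:
            return steps
        if steps >= max_steps:
            return None            # censored: an extremely long excursion

random.seed(0)
times = [return_time() for _ in range(10_000)]
finite = sorted(t for t in times if t is not None)
print("median return time  :", finite[len(finite) // 2])   # typically 2
print("longest observed    :", finite[-1])
print("runs hitting the cap:", times.count(None))
```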
Figure: the record of 10,000 tosses of an ideal coin (Feller) — note the long periods without a zero crossing.
Heavy Tails vs. Non-Heavy Tails — plot of the unsolved fraction 1 − F(x) against X, the number of steps the walk takes to return to zero (log scale): the random walk has median 2, yet its tail decays so slowly that even Normal(2, 1000000) — let alone Normal(2, 1) — drops off far faster, with only 0.1% of its mass beyond 200,000.
Heavy-Tailed Behavior in the Latin Square Completion Problem — log-log plot of the unsolved fraction 1 − F(x) against the number of backtracks: the tails are approximately linear, from instances with 18% unsolved down to 0.002% unsolved, which implies an infinite mean.
How Toby Walsh Fried His PC (Graph Coloring) — Walsh 99
To Be or Not To Be Heavy-Tailed
Random Binary CSP Models • Model E ⟨N, D, p⟩: N – number of variables; D – size of the domains; p – proportion of forbidden pairs (out of D²·N(N−1)/2); N from 15 to 50 (Achlioptas et al. 2000). A generator sketch follows below.
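A generator sketch for Model E as stated above (the function and parameter names are mine; for simplicity the conflicts are drawn without repetition, a small deviation from sampling with repetition):

```python
import itertools
import random

def model_e(N, D, p, seed=None):
    """Random binary CSP, Model E <N, D, p>: forbid a proportion p of
    the D^2 * N*(N-1)/2 possible (variable pair, value pair) combos,
    chosen uniformly at random.

    Returns a dict mapping each variable pair (i, j), i < j, to the
    set of forbidden value pairs (a, b)."""
    rng = random.Random(seed)
    all_conflicts = [((i, j), (a, b))
                     for i, j in itertools.combinations(range(N), 2)
                     for a in range(D) for b in range(D)]
    k = round(p * len(all_conflicts))
    forbidden = {}
    for pair, vals in rng.sample(all_conflicts, k):
        forbidden.setdefault(pair, set()).add(vals)
    return forbidden

# Example: 15 variables, domain size 3, 5% of the pairs forbidden.
instance = model_e(15, 3, 0.05, seed=0)
print(sum(len(s) for s in instance.values()), "forbidden value pairs")
```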
Typical-Case Analysis: Model E — phase transition phenomenon discriminating "easy" vs. "hard" instances: the % of solvable instances and the mean computational cost, plotted against constrainedness (Hogg et al. 96).
Formal Models of Heavy and Fat Tails in Combinatorial Search • Heavy/fat tails mean a wide range of solution times: very short and very long runtimes. How to explain the short runs? • Backdoors: hidden tractable substructure in real-world problems — a subset of the "critical" variables such that, once they are assigned values, the instance simplifies to a tractable class. This has practical consequences.
Logistics Planning – instances with O(log(n)) backdoors. Figure: constraint graph of a logistics planning formula (843 vars, 7,301 constraints, 16 backdoor variables) — the initial constraint graph, after setting 5 backdoor vars, and after setting 12 backdoor vars (visualization by Anand Kapur, 4701 project).
Algorithms • Three kinds of strategies for dealing with backdoors: • A complete deterministic backtrack-search algorithm. • A complete randomized backtrack-search algorithm — provably better performance than the deterministic one. • A heuristically guided complete randomized backtrack-search algorithm — assumes the existence of a good heuristic for choosing variables to branch on; we believe this is close to what happens in practice. (Williams, Gomes, Selman 03/04)
Generalized Iterative Deepening — Level 1: all possible trees of depth 1 (branching on x1 = 0/1, x2 = 0/1, …, xn = 0/1).
Generalized Iterative Deepening — Level 2: all possible trees of depth 2 (e.g., branch on x1 = 0/1, then x2 = 0/1).
Generalized Iterative Deepening — Level 2 continued: all possible trees of depth 2 (…, branch on xn−1 = 0/1, then xn = 0/1); then level 3, level 4, and so on …
Randomized Generalized Iterative Deepening • Assumption: there exists a backdoor whose size is bounded by a function of n (call it B(n)). • Idea: repeatedly choose random subsets of variables slightly larger than B(n), searching these subsets for the backdoor (see the sketch below).
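A schematic sketch of that idea (the `subsolver` interface, the bound `B`, and the retry budget are all assumptions, not the paper's code):

```python
import itertools
import random

def randomized_backdoor_search(variables, B, subsolver, tries=1000, seed=None):
    """Repeatedly guess a candidate backdoor set and brute-force it.

    `subsolver(assignment)` is assumed to decide the simplified
    instance in polynomial time, returning a full solution or None."""
    rng = random.Random(seed)
    k = B + 1                                 # slightly larger than the bound
    for _ in range(tries):
        subset = rng.sample(variables, k)     # random candidate backdoor
        for values in itertools.product((0, 1), repeat=k):  # all 2^k settings
            solution = subsolver(dict(zip(subset, values)))
            if solution is not None:
                return solution
    return None                               # budget exhausted
```

If the random subset happens to contain the true backdoor, one of the 2^k settings simplifies the instance enough for the sub-solver to finish — which is where the provable speedup over the deterministic strategy comes from.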
Deterministic Versus Randomized — suppose variables have 2 possible values (e.g., SAT); for B(n) = n/k, the algorithm's runtime is c^n, and the figure plots the base c against k for the deterministic and the randomized strategy. The deterministic algorithm outperforms brute-force search for k > 4.2.
Complete Randomized Depth-First Search with a Heuristic • Assume we have the following: • DFS, a generic randomized depth-first backtrack-search solver with: • a (polytime) sub-solver A; • a heuristic H that (randomly) chooses variables to branch on, in polynomial time; • H has probability 1/h of choosing a backdoor variable (h is a fixed constant). • Call this ensemble (DFS, H, A).
Polytime Restart Strategy for (DFS, H, A) • Essentially: if there is a small backdoor, then (DFS, H, A) has a restart strategy that runs in polytime.
Runtime Table for the (DFS, H, A) Algorithms • B(n) = upper bound on the size of a backdoor, given n variables. • When the backdoor is a constant fraction of n, there is an exponential improvement of the randomized algorithm over the deterministic one. (Williams, Gomes, Selman 03/04)
How to avoid the long runs in practice? Use restarts or parallel/interleaved runs to exploit the extreme variance in performance. Restarts provably eliminate heavy-tailed behavior. (A restart-wrapper sketch follows below.)
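A restart-wrapper sketch (the cutoff value and the `randomized_solver` interface are assumptions): cap every randomized run at a fixed number of backtracks and retry until one run succeeds.

```python
def solve_with_restarts(randomized_solver, cutoff, max_restarts=10_000):
    """Run a randomized backtrack solver under a fixed backtrack cutoff.

    `randomized_solver(cutoff)` is assumed to return a solution, or
    None once it has spent `cutoff` backtracks. Fresh random
    tie-breaking makes the runs independent trials."""
    for restart in range(max_restarts):
        solution = randomized_solver(cutoff)
        if solution is not None:
            return solution, restart          # solved after `restart` restarts
    return None, max_restarts
```

If a single run solves the instance within the cutoff with probability q, the number of restarts is geometric with mean 1/q, so the tail of the total runtime decays exponentially rather than as a power law.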
Restarts — log plot of the unsolved fraction 1 − F(x) against the number of backtracks: with no restarts, 70% of runs are unsolved; restarting every 4 backtracks leaves only 0.001% unsolved, with solutions found within about 250 backtracks (62 restarts).
Example of Rapid Restart Speedup (planning) — log-log plot of the total number of backtracks against the restart cutoff: with a large cutoff (~10 restarts) the search takes about 100,000 backtracks, while a cutoff of 20 (~100 restarts) solves the instance in about 2,000.