Computer Science CPSC 502 Lecture 4 Local Search (4.8)
Lecture Overview • Local Search • Stochastic Local Search (SLS) • Comparing SLS • SLS variants
Local Search: Motivation • Solving CSPs is NP-hard • Search space for many CSPs is huge • Exponential in the number of variables • Even arc consistency with domain splitting is often not enough • Alternative: local search • use algorithms that search the space locally, rather than systematically • Often find a solution quickly, but are not guaranteed to find one even if it exists (thus, they cannot prove that there is no solution)
Local Search • Idea: • Consider the space of complete assignments of values to variables (all possible worlds) • Neighbours of a current node are similar variable assignments • Move from one node to another according to a function that scores how good each assignment is • Useful method in practice • Best available method for many constraint satisfaction and constraint optimization problems
Local Search Problem: Definition • Definition: A local search problem consists of: • CSP: a set of variables, domains for these variables, and constraints on their joint values. A node in the search space is a complete assignment to all of the variables. • Neighbour relation: an edge in the search space exists when the neighbour relation holds between a pair of nodes. • Scoring function: h(n), which judges the cost of a node (to be minimized), e.g., - the number of constraints violated in node n - the cost of a state in an optimization context.
Example • Given the set of variables {V1,…,Vn}, each with domain Dom(Vi) • The start node is any assignment {V1/v1,…,Vn/vn} • The neighbours of the node with assignment A = {V1/v1,…,Vn/vn} are the nodes whose assignments differ from A in exactly one value
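For concreteness, here is a minimal Python sketch of this neighbour relation, assuming assignments are stored as dictionaries mapping variables to values (the function name and representation are illustrative, not from the slides):

```python
def neighbours(assignment, domains):
    """Generate all assignments that differ from `assignment` in exactly one variable."""
    for var, current_value in assignment.items():
        for value in domains[var]:
            if value != current_value:
                neighbour = dict(assignment)   # copy the current complete assignment
                neighbour[var] = value         # change a single variable's value
                yield neighbour

# Example: variables V1, V2 with domains {1, 2, 3}
domains = {"V1": {1, 2, 3}, "V2": {1, 2, 3}}
start = {"V1": 1, "V2": 1}
print(list(neighbours(start, domains)))  # 4 neighbours, each differing in one value
```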
Search Space for Local Search [Figure: a fragment of the search space. Nodes are complete assignments such as V1 = v1, V2 = v1, .., Vn = v1; their neighbours differ in a single value, e.g. V1 = v2, V2 = v1, .., Vn = v1 or V1 = v1, V2 = vn, .., Vn = v1] • Only the current node is kept in memory at each step. • Very different from the systematic tree search approaches we have seen so far! • Local search does NOT backtrack!
Iterative Best Improvement • How to determine the neighbour node to be selected? • Iterative Best Improvement: • select the neighbour that optimizes some evaluation function • Evaluation function: h(n), the number of constraint violations in state n • Greedy descent: evaluate h(n) for each neighbour, pick the neighbour n with minimal h(n) • Hill climbing: equivalent algorithm for maximization problems • Here: maximize the number of constraints satisfied
Hill Climbing NOTE: Everything that will be said for Hill Climbing is also true for Greedy Descent [Figure: evaluation/scoring function plotted over the current state/possible world, for two variables X1 and X2 (assume both have integer domains)]
Example: N-queen as a local search problem CSP: N-queen CSP • One variable per column; domains {1,…,N} => the row where the queen in the ith column sits • Constraints: no two queens in the same row, column or diagonal Neighbour relation: the value of a single column differs Scoring function: number of constraint violations (i.e., number of attacks)
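As an illustration, a minimal Python sketch of this scoring function; the representation (a list `rows` where `rows[i]` is the row of the queen in column i) is an assumption for the example:

```python
def num_attacks(rows):
    """Scoring function h(n): number of pairs of queens that attack each other
    (same row or same diagonal; columns are distinct by construction)."""
    n = len(rows)
    attacks = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_row = rows[i] == rows[j]
            same_diagonal = abs(rows[i] - rows[j]) == abs(i - j)
            if same_row or same_diagonal:
                attacks += 1
    return attacks

print(num_attacks([1, 3, 0, 2]))  # 0: a solution to the 4-queens problem
```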
[Figure: example boards with their scores h = 2, h = 5 and h = 0 (i.e. the number of attacking pairs of queens)]
Example: Greedy descent for N-Queen For each column, randomly assign its queen to a row (a number between 1 and N) Repeat • For each column & each number: evaluate how many constraint violations changing the assignment would yield • Choose the column and number that lead to the fewest violated constraints; change it Until (solved or some stopping criterion is met) Each cell lists h (i.e. #constraints unsatisfied) if you move the queen from that column into the cell
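A minimal Python sketch of this greedy descent loop for N-queens (the function names are illustrative; this sketch stops either at a solution or at a local minimum where no move strictly improves h):

```python
import random

def num_attacks(rows):
    """h(n): number of attacking pairs (same row or same diagonal)."""
    return sum(1 for i in range(len(rows)) for j in range(i + 1, len(rows))
               if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i)

def greedy_descent_nqueens(n, max_steps=1000):
    """Greedy descent: repeatedly make the single-queen move that most reduces h."""
    rows = [random.randrange(n) for _ in range(n)]     # random initial assignment
    for _ in range(max_steps):
        h = num_attacks(rows)
        if h == 0:
            return rows                                # solution found
        best_move, best_h = None, h
        for col in range(n):                           # evaluate every neighbour
            original = rows[col]
            for row in range(n):
                if row != original:
                    rows[col] = row
                    if num_attacks(rows) < best_h:
                        best_move, best_h = (col, row), num_attacks(rows)
            rows[col] = original
        if best_move is None:
            return None                                # stuck in a local minimum
        rows[best_move[0]] = best_move[1]              # take the best improving move
    return None

print(greedy_descent_nqueens(8))  # may print None if greedy descent gets stuck
```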
General Local Search Algorithm

Procedure Local-Search(V, dom, C)
  Inputs:
    V: a set of variables
    dom: a function such that dom(X) is the domain of variable X
    C: set of constraints to be satisfied
  Output: complete assignment that satisfies the constraints
  Local:
    A[V]: an array of values indexed by V
  repeat
    for each variable X do
      A[X] ← a random value in dom(X)                // random initialization
    while (stopping criterion not met & A is not a satisfying assignment) do
      Select a variable Y and a value V ∈ dom(Y)     // local search step
      Set A[Y] ← V
    if (A is a satisfying assignment) then
      return A
  until termination

Local search step: based on local information, e.g., for each neighbour evaluate how many constraints are unsatisfied. Greedy descent: select Y and V to minimize #unsatisfied constraints at each step.
Example: N-Queens [Figure: an N-Queens run solved in 5 steps; boards with h = 17 and h = 1 are shown]
The problem of local minima • Which move should we pick in this situation? • Current cost: h = 1 • No single move can improve on this • In fact, every single move only makes things worse (h ≥ 2) • Locally optimal solution • Since we are minimizing: local minimum
Local minima • Most research in local search concerns effective mechanisms for escaping from local minima • Want to quickly explore many local minima: the global minimum is a local minimum, too [Figure: two evaluation-function plots over a one-variable state space (one labelled B), with local minima marked]
Lecture Overview • Local Search • Stochastic Local Search (SLS) • Comparing SLS • SLS variants
Stochastic Local Search • We will use greedy steps to find local minima • Move to neighbour with best evaluation function value • We will use randomness to avoid getting trapped in local minima
Stochastic Local Search for CSPs • Start node: random assignment • Goal: assignment with zero unsatisfied constraints • Heuristic function h: number of unsatisfied constraints • Lower values of the function are better • Stochastic local search is a mix of: • Greedy descent: move to neighbor with lowest h • Random walk: take some random steps • Random restart: reassigning values to all variables
General Local Search Algorithm (same procedure as on the previous slide) • Extreme case 1: random sampling. Restart at every step: the stopping criterion is "true", so only random restarts are performed • Extreme case 2: greedy descent. Only restart in local minima: the stopping criterion is "no more improvement in the evaluation function h", and the variable/value are selected greedily
Tracing SLS algorithms in AIspace • Let’s look at these “extreme cases” in AIspace: • Greedy Descent • Random Sampling (always selects the assignment at random) • Simple scheduling problem 2 in AIspace:
Greedy descent vs. Random sampling • Greedy descent is • good for finding local minima • bad for exploring new parts of the search space • Random sampling is • good for exploring new parts of the search space • bad for finding local minima • A mix of the two can work very well
Greedy Descent + Randomness • Greedy steps • Move to neighbour with best evaluation function value • In addition to greedy steps, we can allow for: • Random restart: reassign random values to all variables (i.e. start fresh) • Random steps: move to a random neighbour • Taking only random steps is called "random walk"
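As a sketch of how these ingredients combine, here is an illustrative Python loop; the probabilities, restart interval and function names are assumptions for the example, not values from the slides:

```python
import random

def stochastic_local_search(variables, domains, h, max_steps=10000,
                            p_random_step=0.2, restart_every=500):
    """A generic SLS loop mixing greedy steps, random steps and random restarts.
    `variables` is a list of variable names; `h(assignment)` returns the number
    of unsatisfied constraints (0 means a solution)."""
    def random_assignment():
        return {v: random.choice(list(domains[v])) for v in variables}

    A = random_assignment()
    for step in range(max_steps):
        if h(A) == 0:
            return A                                   # satisfying assignment found
        if step % restart_every == restart_every - 1:
            A = random_assignment()                    # random restart
        elif random.random() < p_random_step:
            var = random.choice(variables)             # random walk step
            A[var] = random.choice(list(domains[var]))
        else:                                          # greedy step: best neighbour
            best = min(((var, val) for var in variables for val in domains[var]),
                       key=lambda vv: h({**A, vv[0]: vv[1]}))
            A[best[0]] = best[1]
    return None
```

For a CSP such as N-queens, h can simply be the constraint-violation count shown earlier.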
Which randomized method would work best in each of these two search spaces? [Figure: two evaluation-function plots over a one-variable state space, labelled A and B] • Greedy descent with random restart works best on A; greedy descent with random steps works best on B • But these examples are simplified extreme cases for illustration • in practice, you don't know what your search space looks like • Usually integrating both kinds of randomization works best
Stochastic Local Search for CSPs: details • Examples of ways to add randomness to local search for a CSP • one-stage selection: normally choose the best variable-value pair; sometimes instead choose a random variable-value pair • two-stage selection: first choose a variable (e.g. the one in the most conflicts), then the best value • Selecting variables: • Sometimes choose the variable which participates in the largest number of conflicts • Sometimes choose a random variable that participates in some conflict • Sometimes choose a random variable • Selecting values: • Sometimes choose the best value for the chosen variable • Sometimes choose a random value for the chosen variable
Random Walk (one-stage) One way to add randomness to local search for a CSP: sometimes choose the variable-value pair according to the scoring function, sometimes choose one at random • E.g. in 8-queens, given the assignment here [board figure], how many neighbours are there? 56 (8 columns × 7 alternative rows) • Greedy step: choose the neighbour with the best (lowest) h • Random step: choose any of the 56 neighbours
Random Walk (two-stage selection) [Figure: board with per-cell conflict counts] • Select a variable first, then a value • Selecting the variable: alternate among selecting one • that participates in the largest number of conflicts (here: V5) • a random variable that participates in some conflict (here: one of {V4, V5, V8}) • a random variable • Selecting the value: alternate among selecting one • that minimizes the # of conflicts (here: V5 = 1) • at random AIspace 2a: Greedy Descent with Min-Conflict Heuristic
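A minimal sketch of such a two-stage step in Python, assuming a `conflicts(assignment, var)` helper that counts the constraints variable `var` violates; the helper and the probabilities are illustrative assumptions:

```python
import random

def two_stage_step(assignment, domains, conflicts, p_random_var=0.3, p_random_val=0.1):
    """One two-stage SLS step: pick a conflicted variable, then a value for it."""
    conflicted = [v for v in assignment if conflicts(assignment, v) > 0]
    if not conflicted:
        return assignment                              # already a solution

    # Stage 1: select a variable
    if random.random() < p_random_var:
        var = random.choice(conflicted)                # random conflicted variable
    else:                                              # variable in the most conflicts
        var = max(conflicted, key=lambda v: conflicts(assignment, v))

    # Stage 2: select a value for that variable
    if random.random() < p_random_val:
        val = random.choice(list(domains[var]))        # random value
    else:                                              # min-conflicts value
        val = min(domains[var],
                  key=lambda x: conflicts({**assignment, var: x}, var))

    assignment[var] = val
    return assignment
```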
Lecture Overview • Local Search • Stochastic Local Search (SLS) • Comparing SLS • SLS variants
Evaluating SLS algorithms • SLS algorithms are randomized • The time taken until they solve a problem is a random variable • It is entirely normal to have runtime variations of 2 orders of magnitude in repeated runs! • E.g. 0.1 seconds in one run, 10 seconds in the next one • On the same problem instance (only difference: random seed) • Sometimes an SLS algorithm doesn't terminate at all: stagnation • If an SLS algorithm sometimes stagnates, what is its mean runtime (across many runs)? • Infinity! • In practice, one often counts timeouts as some fixed large value X • Still, summary statistics, such as mean or median run time, don't tell the whole story • E.g. they would penalize an algorithm that often finds a solution quickly but sometimes stagnates
Comparing SLS algorithms • A better way to evaluate empirical performance • Runtime distributions • Perform many runs (e.g. below left: 1000 runs) • Consider the empirical distribution of the runtimes • Sort the empirical runtimes • Compute fraction (probability) of solved runs for each runtime • Rotate graph 90 degrees. E.g. below: longest run took 12 seconds
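For instance, a short Python sketch of how such an empirical runtime distribution can be computed and plotted, assuming matplotlib is available; the runtimes here are synthetic, for illustration only:

```python
import random
import matplotlib.pyplot as plt

# Synthetic runtimes for many independent SLS runs (seconds); real data would
# come from repeated runs of the solver with different random seeds.
runtimes = sorted(random.expovariate(1.0) for _ in range(1000))

# Empirical runtime distribution: fraction of runs solved within each runtime.
solved_fraction = [(i + 1) / len(runtimes) for i in range(len(runtimes))]

plt.plot(runtimes, solved_fraction)
plt.xscale("log")                      # runtime distributions are usually shown on a log x axis
plt.xlabel("runtime (s)")
plt.ylabel("fraction of runs solved")
plt.show()
```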
Comparing runtime distributions x axis: runtime (or number of steps); y axis: fraction (or number) of runs solved within that runtime, i.e. P(solved within this # of steps/time) • Typically use a log scale on the x axis [Figure: runtime distributions of three algorithms. One is slow but does not stagnate; one solves 57% of runs within 80 steps and then stagnates; one solves 28% of runs within 10 steps and then stagnates. Crossover points: if we run for fewer than 10 steps, red is the best algorithm; if we run for more than 80 steps, green is the best algorithm.]
Comparing runtime distributions • Which algorithm has the best median performance? • I.e., which algorithm takes the fewest steps to be successful in 50% of the runs? [Figure: the same runtime distributions; fraction of solved runs, i.e. P(solved by this # of steps/time), vs. # of steps]
Runtime distributions in AIspace • Can look at some of the algorithms and their runtime distributions in AIspace • There is a related question in Assign 1
SLS limitations • Typically no guarantee to find a solution even if one exists • SLS algorithms can sometimes stagnate • Get caught in one region of the search space and never terminate • Very hard to analyze theoretically • Not able to show that no solution exists • SLS simply won’t terminate • You don’t know whether the problem is infeasible or the algorithm has stagnated
SLS pros: anytime algorithms • When do you stop? • When you know the solution found is optimal (e.g. no constraint violations) • Or when you’re out of time: you have to act NOW • Anytime algorithm: • maintain the node with best h found so far (the “incumbent”) • given more time, can improve its incumbent
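A tiny sketch of the anytime idea in Python, assuming the same kind of assignment/heuristic setup as in the earlier sketches (names are illustrative):

```python
import time

def anytime_sls(initial, neighbours, h, time_budget_seconds=1.0):
    """Keep the best assignment found so far (the 'incumbent') and return it
    when time runs out, even if it is not a full solution."""
    incumbent, best_h = initial, h(initial)
    current = initial
    deadline = time.time() + time_budget_seconds
    while time.time() < deadline and best_h > 0:
        current = min(neighbours(current), key=h)    # one greedy step (for illustration)
        if h(current) < best_h:
            incumbent, best_h = current, h(current)  # improve the incumbent
    return incumbent, best_h
```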
SLS pros: dynamically changing problems • The problem may change over time • Particularly important in scheduling • E.g., schedule for an airline: • Thousands of flights and thousands of personnel assignments • A storm can render the schedule infeasible • Goal: repair the schedule with a minimum number of changes • Often easy for SLS starting from the current schedule • Other techniques usually: • require more time • might find a solution requiring many more changes
Successful application of SLS • Scheduling of Hubble Space Telescope: reducing time to schedule 3 weeks of observations: • from one week to around 10 sec.
Lecture Overview • Local Search • Stochastic Local Search (SLS) • Comparing SLS • SLS variants
Simulated Annealing • Key idea: change the degree of randomness… • Annealing: a metallurgical process where metals are hardened by being slowly cooled • Analogy: start with a high "temperature", i.e., a high tendency to take random steps • Over time, cool down: more likely to follow the scoring function • The temperature reduces over time, according to an annealing schedule
Simulated Annealing: algorithm • Here's how it works • You are at a node n. Pick a variable at random and a new value at random. You generate n' • If it is an improvement, i.e., h(n) – h(n') > 0, adopt it • If it isn't an improvement, adopt it probabilistically depending on • the difference h(n) – h(n') and • a temperature parameter T: move to n' with probability e^( (h(n) – h(n')) / T )
Simulated Annealing: algorithm • If there is no improvement, move to n' with probability e^( (h(n) – h(n')) / T ) • The higher T, the higher the probability of selecting a non-improving node, for the same h(n) – h(n') • The higher |h(n) – h(n')| (i.e., the more negative the exponent (h(n) – h(n'))/T), the lower the probability of selecting a non-improving node, for the same T
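A minimal simulated-annealing sketch in Python, with an illustrative geometric cooling schedule; the schedule parameters and function names are assumptions, not values from the slides:

```python
import math
import random

def simulated_annealing(initial, random_neighbour, h,
                        T0=10.0, cooling=0.995, T_min=0.01, max_steps=100000):
    """Simulated annealing for minimizing h.
    `random_neighbour(n)` returns a random neighbour of node n."""
    current = initial
    T = T0
    for _ in range(max_steps):
        if h(current) == 0:
            return current                       # all constraints satisfied
        candidate = random_neighbour(current)
        delta = h(current) - h(candidate)        # > 0 means the candidate is an improvement
        if delta > 0 or random.random() < math.exp(delta / T):
            current = candidate                  # accept improving moves always,
                                                 # worsening moves with probability e^(delta/T)
        T = max(T * cooling, T_min)              # geometric annealing schedule
    return current
```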
Properties of simulated annealing search • If T decreases slowly enough, then simulated annealing search will find a global optimum with probability approaching 1 • But "slowly enough" usually means longer than the time required by an exhaustive search of the assignment space • Finding a good annealing schedule is "an art" and very much problem dependent • Widely used in VLSI layout, airline scheduling, etc.