CS 416 Artificial Intelligence Lecture 6 Informed Searches
A sad day • No more Western Union telegrams • 1844 - First telegram (Morse) “What hath God wrought” • 1858 - First transatlantic telegram from Queen Victoria to President Buchanan • Break, break, break… 1866 working again • A few words per minute • Punctuation cost extra. “stop” was cheaper.
Assignment 1 • Getting Visual Studio • Signing up for Thursday (3-5) or Friday (2-3:30) • Explanation of IsNormal()
A* without admissibility • [Diagram: a four-node graph (A, B, C, D) with step costs and heuristic values; C is assigned the wild overestimate h(C) = 200, so the path through C is never explored]
A* with admissibility • [Diagram: the same graph with the admissible value h(C) = 1, so A* does consider the path through C]
Another A* without admissibility • [Diagram: a similar graph where an inadmissible value, h(C) = 6, again causes the path through C never to be explored]
Admissible w/o Consistency • [Diagram: a small graph whose heuristic values are admissible but not consistent, so f is not nondecreasing along a path]
Meta-foo • What does “meta” mean in AI? • Frequently it means stepping back a level from foo • Metareasoning = reasoning about reasoning • These informed search algorithms have pros and cons regarding how they choose to explore new levels • a metalevel learning algorithm may learn how to combine search techniques to suit the application domain
Heuristic Functions • 8-puzzle problem • Average solution depth = 22 • Branching factor ≈ 3 • 3^22 states in an exhaustive search to that depth • roughly 170,000 distinct states once repeated states are removed
Heuristics • The number of misplaced tiles • Admissible because at least n moves required to solve n misplaced tiles • The distance from each tile to its goal position • No diagonals, so use Manhattan Distance • As if walking around rectilinear city blocks • also admissible
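As a quick sketch, both heuristics fit in a few lines. The tuple encoding (row-major, 0 for the blank) and the goal ordering below are our own assumptions, not from the slides:

```python
# h1 = number of misplaced tiles, h2 = total Manhattan distance.
# A state is a 9-tuple read row by row; 0 marks the blank.

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

def h1_misplaced(state, goal=GOAL):
    """Count tiles (not the blank) sitting on the wrong square."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2_manhattan(state, goal=GOAL):
    """Sum each tile's rectilinear distance to its goal square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        gidx = goal.index(tile)
        total += abs(idx // 3 - gidx // 3) + abs(idx % 3 - gidx % 3)
    return total

state = (7, 2, 4, 5, 0, 6, 8, 3, 1)
print(h1_misplaced(state), h2_manhattan(state))   # -> 8 18
```

Note that h2(n) >= h1(n) on every state: each misplaced tile contributes 1 to h1 but at least 1 to h2. That is the domination property compared below.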
Compare these two heuristics • Effective Branching Factor, b* • If A* generates N nodes to find the goal at depth d • b* = branching factor such that a uniform tree of depth d contains N+1 nodes (we add one for the root node that wasn’t included in N) • N+1 = 1 + b* + (b*)^2 + … + (b*)^d
Compare these two heuristics • Effective Branching Factor, b* • b* close to 1 is ideal • because this means the heuristic guided the A* search linearly • If b* were 100, on average, the heuristic had to consider 100 children for each node • Compare heuristics based on their b*
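There is no closed form for b*, but the node count grows monotonically in b, so a short bisection recovers it. The sample calls below are illustrative; in practice N and d would come from an actual A* run:

```python
# Solve N + 1 = 1 + b + b^2 + ... + b^d for the effective
# branching factor b* by bisection (the sum is increasing in b).

def effective_branching_factor(n_generated, depth, tol=1e-6):
    def tree_size(b):
        # nodes in a uniform tree of branching b and the given depth
        return sum(b ** i for i in range(depth + 1))
    lo, hi = 1.0 + 1e-9, float(n_generated)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tree_size(mid) < n_generated + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Perfect guidance: 5 nodes to reach depth 5 gives b* = 1.
print(round(effective_branching_factor(5, 5), 3))   # -> 1.0
# A full binary tree: 6 nodes to depth 2 gives b* = 2.
print(round(effective_branching_factor(6, 2), 3))   # -> 2.0
```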
Compare these two heuristics • h2 is always better than h1 • for any node n, h2(n) >= h1(n) • h2 dominates h1 • Recall that all nodes with f(n) < C* will be expanded • This means all nodes with h(n) + g(n) < C* will be expanded • i.e., all nodes with h(n) < C* - g(n) will be expanded • Every node h2 expands will also be expanded by h1, and because h1 is smaller, h1 will expand others as well
Inventing admissible heuristic funcs • How can you create h(n)? • Simplify the problem by reducing restrictions on actions • Allow 8-puzzle pieces to sit atop one another • Call this a relaxed problem • The cost of the optimal solution to the relaxed problem is an admissible heuristic for the original problem • because the original problem is at least as expensive to solve
Examples of relaxed problems • A tile can move from square A to square B if • A is horizontally or vertically adjacent to B • and B is blank • A tile can move from A to B if A is adjacent to B (overlap) • A tile can move from A to B if B is blank (teleport) • A tile can move from A to B (teleport and overlap) • Solutions to these relaxed problems can be computed without search and therefore heuristic is easy to compute
Multiple Heuristics • If multiple heuristics available: • h(n) = max {h1(n), h2(n), …, hm(n)}
Use solution to subproblem as heuristic • What is the optimal cost of solving some portion of the original problem? • the subproblem’s solution cost is an admissible heuristic for the original problem
Pattern Databases • Store optimal solutions to subproblems in database • We use an exhaustive search to solve every permutation of the 1,2,3,4-piece subproblem of the 8-puzzle • During solution of 8-puzzle, look up optimal cost to solve the 1,2,3,4-piece subproblem and use as heuristic
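A sketch of how such a database might be built: search backward from the goal over "masked" states in which tiles outside the 1-4 pattern become indistinguishable wildcards, and only pattern-tile moves cost anything. The encoding and the label-correcting queue are our own choices for illustration:

```python
# Build a pattern database for the 1-2-3-4 subproblem of the
# 8-puzzle. Non-pattern tiles are masked to -1, so moving them is
# free -- exactly the subproblem relaxation described above.

from collections import deque

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)   # 0 is the blank
PATTERN = {1, 2, 3, 4}

def mask(state):
    """Keep the blank and pattern tiles; everything else becomes -1."""
    return tuple(t if t == 0 or t in PATTERN else -1 for t in state)

def neighbors(state):
    b = state.index(0)
    r, c = divmod(b, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            s = list(state)
            s[b], s[nr * 3 + nc] = s[nr * 3 + nc], s[b]
            yield tuple(s), s[b]      # s[b] is the tile that just moved

def build_pattern_db():
    start = mask(GOAL)
    db = {start: 0}
    queue = deque([(start, 0)])
    while queue:                      # label-correcting search
        state, cost = queue.popleft()
        for nxt, moved in neighbors(state):
            step = 1 if moved in PATTERN else 0   # wildcard moves are free
            if nxt not in db or cost + step < db[nxt]:
                db[nxt] = cost + step
                queue.append((nxt, cost + step))
    return db

db = build_pattern_db()
print(db[mask((1, 0, 2, 3, 4, 5, 6, 7, 8))])   # one pattern-tile move -> 1
```

During the real solve, `db[mask(current_state)]` would then serve as the heuristic lookup.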
Learning • Could also build pattern database while solving cases of the 8-puzzle • Must keep track of intermediate states and true final cost of solution • Inductive learning builds mapping of state -> cost • Because too many permutations of actual states • Construct important features to reduce size of space
Characterize Techniques • Uninformed Search • Looking for a solution where solution is a path from start to goal • At each intermediate point along a path, we have no prediction of the future value of the path • Informed Search • Again, looking for a path from start to goal • This time we have more insight regarding the value of intermediate solutions
Now change things a bit • What if the path isn’t important, just the goal? • The goal state itself is what’s unknown • The path to the goal need not be solved • Examples • What quantities of quarters, nickels, and dimes add up to $17.45 while minimizing the total number of coins? • Is the price of Microsoft stock going up tomorrow?
Local Search • Local search does not keep track of previous solutions • Instead it keeps track of the current solution (current state) • Uses a method of generating alternative solution candidates • Advantages • Uses a small amount of memory (usually a constant amount) • Can find reasonable (note we aren’t saying optimal) solutions in infinite search spaces
Optimization Problems • Objective Function • A function with vector inputs and scalar output • goal is to search through candidate input vectors in order to minimize or maximize the objective function • Example • f(q, d, n) = 1,000,000 if 0.25q + 0.10d + 0.05n ≠ 17.45; otherwise f(q, d, n) = q + d + n • minimize f
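The piecewise objective above, written out directly. Working in cents sidesteps floating-point equality trouble with 17.45:

```python
# Piecewise objective for the change-making example.

def f(q, d, n):
    """Huge penalty unless the coins total $17.45; else count coins."""
    if 25 * q + 10 * d + 5 * n != 1745:   # cents, to avoid float equality
        return 1_000_000
    return q + d + n

print(f(69, 1, 2))   # 69 quarters + 1 dime + 2 nickels = $17.45 -> 72
print(f(1, 1, 1))    # wrong total -> 1000000
```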
Search Space • The realm of feasible input vectors • Also called state-space landscape • Usually described by • number of dimensions (3 for our change example) • domain of each dimension (#quarters is discrete from 0 to 69…) • functional relationship between input vector and objective function output • no relationship (chaos or seemingly random) • smoothly varying • discontinuities
Search Space • Looking for global maximum (or minimum)
Hill Climbing • Also called Greedy Search • Select a starting point and set current • evaluate (current) • loop do • neighbor = highest value successor of current • if evaluate (neighbor) <= evaluate (current) • return current • else current = neighbor
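The loop above made concrete on the coin example. A naive ±1 neighborhood would break the $17.45 total immediately, so this sketch uses value-preserving coin exchanges instead; that neighborhood and the all-nickels start state are our own assumptions:

```python
# Steepest-descent hill climbing on the coin problem.
# Neighbors swap coins of equal value (e.g. one quarter <-> five
# nickels), so every neighbor of a feasible state stays feasible.

def f(q, d, n):
    """Objective from the optimization slide, computed in cents."""
    if 25 * q + 10 * d + 5 * n != 1745:
        return 1_000_000
    return q + d + n

# value-preserving exchanges: (dq, dd, dn)
MOVES = [(1, -2, -1), (-1, 2, 1),    # quarter <-> two dimes + one nickel
         (0, 1, -2),  (0, -1, 2),    # dime <-> two nickels
         (1, 0, -5),  (-1, 0, 5)]    # quarter <-> five nickels

def neighbors(state):
    q, d, n = state
    for dq, dd, dn in MOVES:
        cand = (q + dq, d + dd, n + dn)
        if min(cand) >= 0:
            yield cand

def hill_climb(start):
    current = start
    while True:
        best = min(neighbors(current), key=lambda s: f(*s))
        if f(*best) >= f(*current):
            return current           # no neighbor improves: local minimum
        current = best

print(hill_climb((0, 0, 349)))       # -> (69, 2, 0), i.e. 71 coins
```

With this move set the greedy climb happens to reach the optimum; a different neighborhood can easily strand it, which is exactly the failure mode discussed next.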
Hill climbing gets stuck • Hiking metaphor (you are wearing glasses that limit your vision to 10 feet) • Local maxima • Ridges (in cases when you can’t walk along the ridge) • Plateau • why is this a problem?
Hill Climbing Gadgets • Variants on hill climbing play special roles • stochastic hill climbing • don’t always choose the best successor • first-choice hill climbing • pick the first good successor you find • useful if number of successors is large • random restart • follow steepest ascent from multiple starting states • probability of finding global max increases with number of starts
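Random restart is just a loop over independent climbs. The bumpy 1-D landscape below is a stand-in objective; the function, step size, and domain are all illustrative assumptions:

```python
# Random-restart hill climbing on a toy 1-D landscape with many
# local maxima; the best local optimum over all starts is kept.

import random
from math import sin

def landscape(x):
    return sin(x) + sin(3 * x) / 3    # many local maxima

def hill_climb_1d(x, step=0.01):
    while True:
        best = max((x, x + step, x - step), key=landscape)
        if best == x:
            return x                  # local maximum (to step resolution)
        x = best

def random_restart(n_starts, seed=0):
    rng = random.Random(seed)
    starts = [rng.uniform(0, 20) for _ in range(n_starts)]
    return max((hill_climb_1d(s) for s in starts), key=landscape)

x = random_restart(20)
print(round(landscape(x), 3))   # near the global maximum (~0.943)
```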
Hill Climbing Usefulness • It Depends • Shape of state space greatly influences hill climbing • local maxima are the Achilles heel • what is cost of evaluation? • what is cost of finding a random starting location?
Simulated Annealing • A term borrowed from metalworking • We want metal molecules to find a stable location relative to neighbors • heating causes metal molecules to jump around and to take on undesirable (high energy) locations • during cooling, molecules reduce their movement and settle into a more stable (low energy) position • annealing is process of heating metal and letting it cool slowly to lock in the stable locations of the molecules
Simulated Annealing • “Be the Ball” • You have a wrinkled sheet of metal • Place a BB on the sheet and what happens? • BB rolls downhill • BB stops at the bottom of a hill (local or global min?) • BB momentum may carry it out of one valley into another (local or global) • By shaking the metal sheet, you are adding energy (heat) • How hard do you shake?
Our Simulated Annealing Algorithm • “You’re not being the ball, Danny” (Caddy Shack) • Gravity is great because it tells the ball which way is downhill at all times • We don’t have gravity, so how do we find a successor state? • Randomness • AKA Monte Carlo • AKA Stochastic
Algorithm Outline • Select some initial guess of evaluation function parameters, x • Evaluate the evaluation function, v = f(x) • Compute a random displacement, Δx • The Monte Carlo event • Evaluate v′ = f(x + Δx) • If v′ < v, set the new state, x ← x + Δx • Else set x ← x + Δx with Prob(E, T) • This is the Metropolis step • Repeat with updated state and temperature
Metropolis Step • We approximate nature’s alignment of molecules by allowing uphill transitions with some probability • Prob(in energy state E) ∝ e^(-E/kT) • Boltzmann Probability Distribution • Even when T is small, there is still a chance of being in a high-energy state • Prob(transferring from E1 to E2) = e^(-(E2 - E1)/kT) • Metropolis Step • if E2 < E1, this expression exceeds 1, so we always transfer • if E2 > E1, we may still transfer to the higher-energy state • The rate at which T is decreased and the amount it is decreased is prescribed by an annealing schedule
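Putting the outline and the Metropolis step together on the coin problem from the optimization slides. The starting temperature, cooling rate, and exchange-move set are all assumptions of this sketch (the Boltzmann constant k is folded into T):

```python
# Simulated annealing with Metropolis acceptance and a geometric
# cooling schedule, on the change-making objective.

import math
import random

def f(q, d, n):
    """Coin objective in cents: penalty unless the total is $17.45."""
    if 25 * q + 10 * d + 5 * n != 1745:
        return 1_000_000
    return q + d + n

# value-preserving coin exchanges keep every candidate feasible
MOVES = [(1, -2, -1), (-1, 2, 1), (0, 1, -2), (0, -1, 2), (1, 0, -5), (-1, 0, 5)]

def anneal(start, t0=50.0, cooling=0.995, steps=5000, seed=1):
    rng = random.Random(seed)
    state, value, t = start, f(*start), t0
    for _ in range(steps):
        dq, dd, dn = rng.choice(MOVES)                    # Monte Carlo event
        cand = (state[0] + dq, state[1] + dd, state[2] + dn)
        if min(cand) < 0:
            t *= cooling
            continue
        v2 = f(*cand)
        # Metropolis step: downhill always; uphill with prob e^(-(v'-v)/T)
        if v2 < value or rng.random() < math.exp(-(v2 - value) / t):
            state, value = cand, v2
        t *= cooling                                      # annealing schedule
    return state, value

print(anneal((0, 0, 349)))   # tends to land at or near the 71-coin optimum
```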
What have we got? • Always move downhill if possible • Sometimes go uphill • More likely at the start, when T is high • Optimality guaranteed with a sufficiently slow annealing schedule • No need for a smooth search space • We do not need to know which nearby successor is best (no gradient information) • Can be a discrete search space • Traveling salesman problem • More info: Numerical Recipes in C (online), Chapter 10.9
Local Beam Search • Keep more previous states in memory • Simulated Annealing just kept one previous state in memory • This search keeps k states in memory • Generate k initial states • if any state is a goal, terminate • else, generate all successors and select best k • repeat
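The loop above, sketched on a toy problem: reach a target integer from random starts using -1, +1, and *2 as actions. The target, operators, and scoring function are invented for illustration:

```python
# Local beam search: k states, one shared successor pool,
# keep the best k successors overall each iteration.

import random

TARGET = 97

def successors(x):
    return [x - 1, x + 1, 2 * x]

def score(x):
    return -abs(TARGET - x)      # higher is better; 0 only at the target

def local_beam_search(k=4, seed=0, max_iters=200):
    rng = random.Random(seed)
    states = [rng.randint(0, 50) for _ in range(k)]        # k initial states
    for _ in range(max_iters):
        if TARGET in states:
            return TARGET
        pool = {s for x in states for s in successors(x)}  # shared pool
        states = sorted(pool, key=score, reverse=True)[:k] # keep best k
    return max(states, key=score)

print(local_beam_search())   # -> 97
```

Because the k states share one successor pool, a single promising state can supply all k survivors, which is the "grass is greener" effect described on the next slide.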
Isn’t this steepest ascent in parallel? • No, information is shared between the k search points • Each of the k states generates successors • The best k successors are selected from the pooled set • Some search points may contribute none of the best successors • One search point may contribute all k successors • “Come over here, the grass is greener” (Russell and Norvig) • If the searches were executed independently in parallel, no search point would be terminated like this
Beam Search • Premature termination of search paths? • Stochastic beam search • Instead of choosing the best k successors • Choose k successors at random, with probability increasing with their value