Lecture of 9/30. Claude Shannon (finite look-ahead) • Chaturanga, India (~550 AD) (Proto-Chess) • Von Neumann (Min-Max theorem) • Donald Knuth (alpha-beta analysis) • John McCarthy (alpha-beta pruning)
Announcements etc. • Homework 2 returned • (!! Our TA doesn’t sleep) • Average 33/60 • Max 56/60 • Solutions online • Homework 3 socket opened • Project 1 due today • Extra credit portion will be accepted until Thursday with late penalty • Any steam to be let off? • Today’s class • It’s all fun and GAMES Steaming in Tempe
The expected value computation is fine if you are maximizing "expected" return. But what if you are risk-averse (and think "nature" is out to get you)? Then V2 = min(V3, V4). What if you see this as a game? If you are a perpetual optimist, then V2 = max(V3, V4). Review Min-Max!
Game Playing (Adversarial Search) • Perfect play • Do minmax on the complete game tree • Resource limits • Do limited depth lookahead • Apply evaluation functions at the leaf nodes • Do minmax • Alpha-Beta pruning (a neat idea that is the bane of many a CSE471 student) • Miscellaneous • Games of Chance • Status of computer games..
Fun to try and find analogies between this and environment properties…
Evaluation Functions: TicTacToe • If win for Max: +infinity • If loss for Max: -infinity • If draw for Max: 0 • Else: # rows/cols/diags open for Max - # rows/cols/diags open for Min
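Below is a minimal Python sketch of this evaluation function, assuming an illustrative 3x3 board encoding ('X' for Max, 'O' for Min, None for empty); the representation and helper names are not from the slides.

```python
import math

def lines(board):
    """Yield all rows, columns, and diagonals of a 3x3 board."""
    for i in range(3):
        yield [board[i][j] for j in range(3)]      # row i
        yield [board[j][i] for j in range(3)]      # column i
    yield [board[i][i] for i in range(3)]          # main diagonal
    yield [board[i][2 - i] for i in range(3)]      # anti-diagonal

def evaluate(board):
    """+inf if Max has won, -inf if Min has won, 0 on a draw,
    else (# lines still open for Max) - (# lines still open for Min)."""
    for line in lines(board):
        if line == ['X'] * 3:
            return math.inf
        if line == ['O'] * 3:
            return -math.inf
    if all(cell is not None for row in board for cell in row):
        return 0                                   # full board, no winner: draw
    open_for_max = sum(1 for line in lines(board) if 'O' not in line)
    open_for_min = sum(1 for line in lines(board) if 'X' not in line)
    return open_for_max - open_for_min
```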
Why is “deeper” better? • Possible reasons • Taking mins/maxes of the evaluation values of the leaf nodes improves their collective accuracy • Going deeper makes the agent notice “traps” thus significantly improving the evaluation accuracy • All evaluation functions first check for termination states before computing the non-terminal evaluation
[Figure: alpha-beta example tree with leaf values 2, 14, 5, 2 and propagated bounds <=2, <=5, <=14; "Cut" marks where pruning occurs] • Whenever a node gets its "true" value, its parent's bound gets updated • When all children of a node have been evaluated (or a cutoff occurs below that node), the current bound of that node is its true value • Two types of cutoffs: • If a min node n has bound <=k, and a max ancestor of n, say m, has a bound >=j, then a cutoff occurs as long as j >= k • If a max node n has bound >=k, and a min ancestor of n, say m, has a bound <=j, then a cutoff occurs as long as j <= k
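The cutoff rules above translate into the familiar alpha-beta recursion. The sketch below is a generic, hedged illustration: the GameNode interface (is_terminal, evaluate, children) is a hypothetical stand-in for whatever game representation is being used, not anything defined in the slides.

```python
import math

def alpha_beta(node, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return the minimax value of `node`, pruning with the alpha/beta bounds."""
    if depth == 0 or node.is_terminal():
        return node.evaluate()
    if maximizing:
        value = -math.inf
        for child in node.children():
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)      # max node's lower bound
            if alpha >= beta:              # a min ancestor already has bound <= beta
                break                      # cutoff
        return value
    else:
        value = math.inf
        for child in node.children():
            value = min(value, alpha_beta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)        # min node's upper bound
            if beta <= alpha:              # a max ancestor already has bound >= alpha
                break                      # cutoff
        return value
```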
An eye for an eye only ends up making the whole world blind. -Mohandas Karamchand Gandhi, born October 2nd, 1869. Lecture of October 2nd, 2003
Another alpha-beta example Project 2 assigned
(order nodes in terms of their static eval values) Click for an animation of Alpha-beta search in action on Tic-Tac-Toe
Multi-player Games Everyone maximizes their utility --How does this compare to 2-player games? (Max’s utility is negative of Min’s)
(just as human weight lifters refuse to compete against cranes)
RTA* [Figure: lookahead tree rooted at S with g, h, and f = g + h values at the nodes (e.g., g=1, h=2, f=3; g=2, h=3, f=5) and an infinite-cost node] RTA* is a special case of RTDP --It is useful for acting in deterministic, dynamic worlds --while RTDP is useful for acting in stochastic, dynamic worlds --Grow the tree to depth d --Apply f-evaluation for the leaf nodes --Propagate f-values up to the parent nodes: f(parent) = min(f(children))
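Here is a small sketch of the lookahead-and-backup step described above (grow to depth d, apply f = g + h at the leaves, back values up by min). The successors(s) and h(s) functions are hypothetical problem hooks, and the bookkeeping that full RTA* does to update stored heuristic values is omitted.

```python
def lookahead_f(state, depth, g, successors, h):
    """Backed-up f-value of `state`: f = g + h at the frontier,
    f(parent) = min(f(children)) on the way back up."""
    children = successors(state)           # list of (child_state, step_cost) pairs
    if depth == 0 or not children:
        return g + h(state)                # f-evaluation at a leaf
    return min(lookahead_f(child, depth - 1, g + cost, successors, h)
               for child, cost in children)

def rta_star_step(state, depth, successors, h):
    """Pick the successor whose backed-up f-value is smallest (one move)."""
    children = successors(state)
    best = min(children,
               key=lambda sc: lookahead_f(sc[0], depth - 1, sc[1], successors, h))
    return best[0]
```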
MDPs and Deterministic Search • Problem solving agent search corresponds to what special case of MDP? • Actions are deterministic; goal states are all equally valued, and are all sink states. • Is it worth solving the problem using MDPs? • The construction of an optimal policy is overkill • The policy, in effect, gives us the optimal path from every state to the goal state(s) • The value function, or its approximations, on the other hand, is useful. How? • As heuristics for the problem solving agent's search • This shows an interesting connection between dynamic programming and "state search" paradigms • DP solves many related problems on the way to solving the one problem we want • State search tries to solve just the problem we want • We can use DP to find heuristics to run state search..
Incomplete observability (the dreaded POMDPs) • To model partial observability, all we need to do is to look at the MDP in the space of belief states (belief states are fully observable even when world states are not) • Policy maps belief states to actions • In practice, this causes (humongous) problems • The space of belief states is "continuous" (even if the underlying world is discrete and finite). {GET IT? GET IT??} • Even approximate policies are hard to find (PSPACE-hard). • Problems with a few dozen world states are hard to solve currently • "Depth-limited" exploration (such as that done in adversarial games) is the only option… Belief state = { s1:0.3, s2:0.4, s4:0.3 } [Figure: the belief state changing as actions are taken, e.g., 5 LEFTs then 5 UPs]
Planning Where states are transparent and actions have preconditions and effects
Blocks world
Init: Ontable(A), Ontable(B), Clear(A), Clear(B), hand-empty
Goal: ~clear(B), hand-empty
State variables: Ontable(x), On(x,y), Clear(x), hand-empty, holding(x)
Initial state: complete specification of T/F values to state variables --by convention, variables with F values are omitted
Goal state: a partial specification of the desired state variable/value combinations --desired values can be both positive and negative
Pickup(x) Prec: hand-empty, clear(x), ontable(x) Eff: holding(x), ~ontable(x), ~hand-empty, ~clear(x)
Putdown(x) Prec: holding(x) Eff: ontable(x), hand-empty, clear(x), ~holding(x)
Unstack(x,y) Prec: on(x,y), hand-empty, clear(x) Eff: holding(x), ~clear(x), clear(y), ~hand-empty
Stack(x,y) Prec: holding(x), clear(y) Eff: on(x,y), ~clear(y), ~holding(x), hand-empty
All the actions here have only positive preconditions, but this is not necessary.
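One possible encoding of these actions as STRIPS-style records (positive preconditions, add effects, delete effects) is sketched below; the set-of-true-literals state representation and the Action record are illustrative assumptions that the later snippets reuse, not part of the lecture.

```python
from collections import namedtuple

# An action carries positive preconditions, add effects, and delete effects.
Action = namedtuple('Action', ['name', 'prec', 'add', 'delete'])

def pickup(x):
    return Action(f'Pickup({x})',
                  prec={'hand-empty', f'clear({x})', f'ontable({x})'},
                  add={f'holding({x})'},
                  delete={f'ontable({x})', 'hand-empty', f'clear({x})'})

def stack(x, y):
    return Action(f'Stack({x},{y})',
                  prec={f'holding({x})', f'clear({y})'},
                  add={f'on({x},{y})', 'hand-empty'},
                  delete={f'clear({y})', f'holding({x})'})

# Putdown and Unstack follow the same pattern as the slide's definitions.
# Initial state from the slide: only the variables that are true are listed.
init = frozenset({'ontable(A)', 'ontable(B)', 'clear(A)', 'clear(B)', 'hand-empty'})
```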
Progression: An action A can be applied to state S iff the preconditions are satisfied in the current state. The resulting state S' is computed as follows: --every variable that occurs in the action's effects gets the value that the action said it should have --every other variable gets the value it had in the state S where the action is applied
[Example: from the initial state {Ontable(A), Ontable(B), Clear(A), Clear(B), hand-empty}, Pickup(A) gives {holding(A), ~Clear(A), ~Ontable(A), Ontable(B), Clear(B), ~hand-empty}, and Pickup(B) gives the symmetric state.]
Generic (progression) planner • Goal test(S,G): check if every state variable in S that is mentioned in G has the value that G gives it. • Child generator(S,A): • For each action a in A do • If every variable mentioned in Prec(a) has the same value in Prec(a) and in S • Then return Progress(S,a) as one of the children of S • Progress(S,a) is a state S' where each state variable v has the value v[Eff(a)] if it is mentioned in Eff(a), and has the value v[S] otherwise • Search starts from the initial state
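A minimal sketch of this progression machinery, continuing the set-of-true-literals encoding assumed earlier (a variable is false exactly when its literal is absent from the state set); goals are split into positive and negative literal sets since the slide's goal mentions ~clear(B).

```python
def applicable(state, action):
    """Preconditions here are all positive, so they must all be present in `state`."""
    return action.prec <= state

def progress(state, action):
    """Apply the effects; every variable not touched keeps its value from `state`."""
    return frozenset((state - action.delete) | action.add)

def goal_test(state, goal_pos, goal_neg):
    """Every positive goal literal holds and no negative goal literal holds."""
    return goal_pos <= state and not (goal_neg & state)

def children(state, actions):
    """Successor generator for forward (progression) search."""
    return [(a, progress(state, a)) for a in actions if applicable(state, a)]
```

For the slide's goal, goal_pos would be {'hand-empty'} and goal_neg {'clear(B)'}.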
Regression: A state S can be regressed over an action A (or A is applied in the backward direction to S) iff: --there is no variable v such that v is given different values by the effects of A and the state S --there is at least one variable v' such that v' is given the same value by the effects of A as well as the state S
The resulting state S' is computed as follows: --every variable that occurs in S, and does not occur in the effects of A, will be copied over to S' with its value as in S --every variable that occurs in the precondition list of A will be copied over to S' with the value it has in the precondition list
[Example: regressing {~clear(B), hand-empty} over Stack(A,B) gives {holding(A), clear(B)}; over Putdown(A) it gives {~clear(B), holding(A)}; Putdown(B)?? is not regressable, since it gives clear(B), conflicting with ~clear(B).]
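A corresponding sketch of regression, representing the partial state S as a pair of positive and negative literal sets; this encoding (and the assumption that preconditions are positive) is illustrative, not the lecture's notation.

```python
def regressable(pos, neg, action):
    """Consistency: no effect of A conflicts with S.
    Relevance: A gives at least one value that S asks for."""
    consistent = not (action.delete & pos) and not (action.add & neg)
    relevant = bool(action.add & pos) or bool(action.delete & neg)
    return consistent and relevant

def regress(pos, neg, action):
    """Copy over the literals of S not touched by A's effects, then add A's preconditions."""
    new_pos = (pos - action.add) | action.prec   # preconditions assumed positive here
    new_neg = neg - action.delete
    return new_pos, new_neg
```

Running regress({'hand-empty'}, {'clear(B)'}, stack('A', 'B')) reproduces the {holding(A), clear(B)} state from the example above.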
Heuristics to guide Progression/Regression • Set difference heuristic • Intuition: the cost of a state is the number of goals that are not yet present in it. • Progression: the cost of a state S is |G \ S|, the number of state-variable value pairs in G which are not present in S • Regression: the cost of a state S is |S \ I|, the number of state-variable value pairs in S that are not present in the initial state • Problems with the set difference heuristic: • 1. Every literal is given the same cost. Some literals are harder to achieve than others! • 2. It is assumed that the cost of achieving n literals together is n. This ignores the interactions between literals ("subgoals"). -- It may be easier to achieve a set of literals together than to achieve each of them separately (+ve interactions) -- It may be harder to achieve a set of literals together than to achieve them separately (-ve interactions)
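For the progression direction, the set-difference heuristic is a one-liner under the same set-of-literals encoding assumed earlier; it is shown only to make the |G \ S| notation concrete.

```python
def h_set_difference(state, goal_pos):
    """Number of (positive) goal literals not yet true in the state: |G \\ S|."""
    return len(goal_pos - state)
```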
Subgoal interactions: Suppose we have a set of subgoals G1,…,Gn, and the length of the shortest plan for achieving each subgoal in isolation is l1,…,ln. We want to know the length of the shortest plan for achieving the n subgoals together, l1…n. If subgoals are independent: l1…n = l1+l2+…+ln. If subgoals have +ve interactions alone: l1…n < l1+l2+…+ln. If subgoals have -ve interactions alone: l1…n > l1+l2+…+ln.
Estimating the cost of achieving individual literals (subgoals). Idea: unfold a data structure called a "planning graph" as follows:
1. Start with the initial state. This is called the zeroth-level proposition list.
2. In the next level, called the first-level action list, put all the actions whose preconditions are true in the initial state. -- Have links between actions and their preconditions.
3. In the next level, called the first-level proposition list, put:
3.1. All the effects of all the actions in the previous level. Link the effects to the respective actions. (If multiple actions give a particular effect, have multiple links to that effect from all those actions.)
3.2. All the conditions in the previous proposition list (in this case the zeroth proposition list). Put persistence links between the corresponding literals in the previous proposition list and the current proposition list.
Note: A literal appears at most once in a proposition list.
4. Repeat steps 2 and 3 until there is no difference between two consecutive proposition lists. At that point the graph is said to have "leveled off".
The next 2 slides show this expansion up to two levels.
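A rough sketch of this expansion loop, under the same Action encoding assumed earlier: negative effect literals are written with a '~' prefix, persistence is implicit in carrying the previous propositions forward, and precondition/persistence links and mutexes (which come later) are not tracked.

```python
def expand_planning_graph(init, actions):
    """Return the list of proposition levels, grown until the graph levels off."""
    prop_levels = [frozenset(init)]                 # zeroth-level proposition list
    while True:
        props = prop_levels[-1]
        # Action level: all ground actions whose (positive) preconditions hold here.
        applicable_actions = [a for a in actions if a.prec <= props]
        # Effects of those actions; '~p' encodes the negative literal for p.
        effects = {e for a in applicable_actions for e in a.add} | \
                  {'~' + d for a in applicable_actions for d in a.delete}
        # Next proposition level: new effects plus everything persisted from before.
        next_props = frozenset(props | effects)
        if next_props == props:                     # no change: the graph has leveled off
            return prop_levels
        prop_levels.append(next_props)
```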
Lecture of 23 Oct (the first four slides are reviewed from the previous class)
[Figure: planning graph expanded to level 1. Action level 1: Pick-A, Pick-B (plus persistence no-ops). Proposition level 1: h-A, h-B, ~cl-A, ~cl-B, ~he, together with the persisted onT-A, onT-B, cl-A, cl-B, he from level 0.]
[Figure: planning graph expanded to level 2. Action level 2 adds St-A-B, St-B-A, Ptdn-A, Ptdn-B (besides Pick-A, Pick-B and the no-ops). Proposition level 2 adds on-A-B and on-B-A to the level-1 propositions.]
Using the planning graph to estimate the cost of single literals:
1. We can say that the cost of a single literal is the index of the first proposition level in which it appears. --If the literal does not appear in any of the levels in the currently expanded planning graph, then the cost of that literal is: -- l+1, if the graph has been expanded to l levels but has not yet leveled off -- infinity, if the graph has been expanded until it leveled off (basically, the literal cannot be achieved from the current initial state)
Examples: h({~he}) = 1; h({On(A,B)}) = 2; h({he}) = 0
Estimating the cost of a set of literals (e.g. a state in regression search)
Idea 0. [Max Heuristic] hmax({p,q,r,…}) = max{h(p), h(q), …} Admissible, but very weak in practice.
Idea 2. [Sum Heuristic] Make the subgoal independence assumption: hind({p,q,r,…}) = h(p)+h(q)+h(r)+… Much better than the set-difference heuristic in practice.
--Ignores +ve interactions: h({~he, h-A}) = h(~he) + h(h-A) = 1+1 = 2. But we can achieve both literals with just a single action, Pickup(A), so the real cost is 1.
--Ignores -ve interactions: h({~cl(B), he}) = 1+0 = 1. But there is really no plan that can achieve these two literals in this problem, so the real cost is infinity!
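Given per-literal level costs from the planning graph, the max and sum heuristics are easy to sketch. literal_cost below follows the convention from the previous slide (l+1 if the graph has not yet leveled off, infinity otherwise); prop_levels is assumed to come from a construction like the one sketched earlier.

```python
import math

def literal_cost(literal, prop_levels, leveled_off=True):
    """Index of the first proposition level containing the literal."""
    for index, props in enumerate(prop_levels):
        if literal in props:
            return index
    return math.inf if leveled_off else len(prop_levels)   # len == l + 1

def h_max(literals, prop_levels):
    """Admissible but weak: cost of the hardest single literal."""
    return max((literal_cost(p, prop_levels) for p in literals), default=0)

def h_sum(literals, prop_levels):
    """Subgoal-independence assumption: ignores +ve and -ve interactions."""
    return sum(literal_cost(p, prop_levels) for p in literals)
```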
We can do a better job of accounting for +ve interactions if we define the cost of a set of literals in terms of the level:
hlev({p,q,r}) = the index of the first level of the PG where p, q, r appear together. So, h({~he, h-A}) = 1.
Interestingly, hlev is an admissible heuristic, even though hind is not! (Prove.)
To better account for -ve interactions, we need to start looking into the feasibility of subsets of literals actually being true together in a proposition level. Specifically, in each proposition level, we want to mark not just which individual literals are feasible, but also which pairs, which triples, which quadruples, and which n-tuples are feasible. (It is quite possible that two literals are independently feasible in level k, but not feasible together in that level.)
--The idea then is to say that the cost of a set S of literals is the index of the first level of the planning graph where no subset of S is marked infeasible.
--The full-scale mark-up is very costly, and makes the cost of planning graph construction equal to the cost of enumerating the full progression search tree.
--Since we only want estimates, it is okay if we talk of feasibility of up to k-tuples.
--For the special case of feasibility of k=2 (2-sized subsets), there are some very efficient marking and propagation procedures. This is the idea of marking and propagating mutual exclusion (mutex) relations.
Rule 1. Two actions a1 and a2 are mutex if • both of the actions are non-noop actions, or • a1 is a noop action supporting P, and a2 either needs ~P or gives ~P, or • some precondition of a1 is marked mutex with some precondition of a2. Rule 2. Two propositions P1 and P2 are marked mutex if all actions supporting P1 are pair-wise mutex with all actions supporting P2.
[Figure: the level-1 planning graph again, now with mutex arcs marked: Pick-A is mutex with Pick-B, and h-A is mutex with h-B at proposition level 1.]
[Figure: the level-2 planning graph with mutex arcs; ~cl-B and he are mutex at proposition level 1 but not at level 2, since St-A-B supports both.]
Here is how it goes. We know that at every time step we are really only going to do one non-noop action. So, at the first level either Pick-A or Pick-B is done; if one of them is done, the other can't be. So, we put red arrows to signify that each pair of non-noop actions is mutually exclusive. • Now, we can PROPAGATE the mutex relations to the proposition levels. • Rule 1. Two actions a1 and a2 are mutex if • both of the actions are non-noop actions, or • a1 is a noop action supporting P, and a2 either needs ~P or gives ~P, or • some precondition of a1 is marked mutex with some precondition of a2. • By this rule Pick-A is mutex with Pick-B. Similarly, the noop action for he is mutex with Pick-A. • Rule 2. Two propositions P1 and P2 are marked mutex if all actions supporting P1 are pair-wise mutex with all actions supporting P2. • By this rule, h-A and h-B are mutex in level 1, since the only action giving h-A is mutex with the only action giving h-B. • ~cl(B) and he are mutex in the first level, but are not mutex in the second level. (Note that ~cl(B) is supported by a noop and Stack-A-B (among others) in level 2, and he is supported by Stack-A-B and a noop (among others). At least one action supporting the first, Stack-A-B, is non-mutex with one action supporting the second, namely Stack-A-B itself.)
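The two rules can be sketched as set operations over one level. Everything below is an assumed encoding, not the lecture's: literals are strings with a '~' prefix for negation, no-ops are actions named 'noop(...)' whose single add effect is the literal they carry forward, actions carry .name/.prec/.add/.delete as in the earlier Action record, supporters maps each proposition at a level to the actions that give it, and mutex pairs are stored as frozensets.

```python
from itertools import combinations

def negated(literal):
    """'p' <-> '~p' in the string encoding of literals."""
    return literal[1:] if literal.startswith('~') else '~' + literal

def is_noop(action):
    return action.name.startswith('noop')   # assumption: no-ops named 'noop(<literal>)'

def actions_mutex(a1, a2, prop_mutex_prev):
    """Rule 1, applied to one pair of actions at an action level."""
    # (a) any two non-noop actions are mutex (one real action per time step)
    if not is_noop(a1) and not is_noop(a2):
        return True
    # (b) a noop carrying P forward is mutex with an action that needs ~P or gives ~P
    for noop, other in ((a1, a2), (a2, a1)):
        if is_noop(noop):
            p = next(iter(noop.add))                         # the literal the noop supports
            other_gives = other.add | {negated(d) for d in other.delete}
            if negated(p) in other.prec or negated(p) in other_gives:
                return True
    # (c) competing needs: some pair of preconditions was mutex at the previous level
    return any(frozenset({p, q}) in prop_mutex_prev
               for p in a1.prec for q in a2.prec)

def propositions_mutex(props, supporters, action_mutex_pairs):
    """Rule 2: p, q are mutex if every supporter of p is mutex with every supporter of q."""
    mutexes = set()
    for p, q in combinations(props, 2):
        if all(frozenset({s1.name, s2.name}) in action_mutex_pairs
               for s1 in supporters[p] for s2 in supporters[q]):
            mutexes.add(frozenset({p, q}))
    return mutexes
```

Note that an action shared by both propositions (like St-A-B supporting ~cl-B and he) is never mutex with itself, which is exactly why those two literals stop being mutex at level 2.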
Some observations about the structure of the PG 1. If an action a is present in level l, it will be present in all subsequent levels. 2. If a literal p is present in level l, it will be present in all subsequent levels. 3. If two literals p,q are not mutex in level l, they will never be mutex in subsequent levels --Mutex relations relax monotonically as we grow PG 1,2,3 imply that a PG can be represented efficiently in a bi-level structure: One level for propositions and one level for actions. For each proposition/action, we just track the first time instant they got into the PG. For mutex relations we track the first time instant they went away. A PG is said to have leveled off if there are no differences between two consecutive proposition levels in terms of propositions or mutex relations. --Even if you grow it further, no more changes can occur..
Level-based heuristics on the planning graph with mutex relations. We now modify the hlev heuristic as follows: hlev({p1, …, pn}) = the index of the first level of the PG where p1, …, pn appear together and no pair of them is marked mutex. (If there is no such level, then hlev is set to l+1 if the PG is expanded to l levels, and to infinity if it has been expanded until it leveled off.) This heuristic is admissible. With this heuristic, we have a much better handle on both +ve and -ve interactions. In our example, this heuristic gives the following reasonable costs: h({~he, cl-A}) = 1; h({~cl-B, he}) = 2; h({he, h-A}) = infinity (because they will be marked mutex even in the final level of the leveled PG). Works very well in practice.
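A sketch of this mutex-aware level heuristic, assuming prop_levels and a parallel list mutex_levels (one set of mutex pairs, stored as frozensets, per proposition level) produced by a planning-graph construction like the one sketched earlier.

```python
import math
from itertools import combinations

def h_lev(literals, prop_levels, mutex_levels, leveled_off=True):
    """First level where all the literals appear together with no pair marked mutex."""
    goal = set(literals)
    for index, (props, mutexes) in enumerate(zip(prop_levels, mutex_levels)):
        present = goal <= props
        pair_ok = not any(frozenset({p, q}) in mutexes
                          for p, q in combinations(goal, 2))
        if present and pair_ok:
            return index
    return math.inf if leveled_off else len(prop_levels)   # l + 1 if not yet leveled off
```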
Lecture of Oct 25th Agenda: Demo of AltAlt--A planner that uses PG as a heuristic Qns on PG Use of PGs in Progression vs. Regression Other uses of PG (action selection) PGs as basis for planning as CSP
AltAlt • Uses a hybrid of level and sum heuristics --sacrifices admissibility --uses a partial PG to keep heuristic cost down
Empirical Evaluation: Logistics domain, HadjSum2M heuristic. Problems and domains from the AIPS-2000 Planning Competition. (AltAlt was approximately in the top four.)
Comparing Solution Qualities [Chart: solution quality vs. cost on the AIPS-00 Schedule domain, with "Lev(S)" and "Levels off" marked] Do PG expansion only up to the level l where all top-level goals come in without being mutex.
Qns on PG? • Consider a set of subgoals {p,q,r,s} • If the set appears at level 12 without any pair being mutex, is there guaranteed to be a plan to achieve {p,q,r,s} using a 12-step plan? • If {p,q} appear in level 12 without being mutex, is there a guaranteed 12-step plan to achieve {p,q}? • If {p} appears in level 12, is there a guaranteed 12-step plan to achieve {p}? PG does approximate reachability analysis