Forward-Chaining Planning in Nondeterministic Domains

Ugur Kuter and Dana Nau
Department of Computer Science and Institute for Systems Research
University of Maryland, College Park, Maryland

Generating Plans of Action
• Programs to aid human planners
  • Project management (consumer software)
  • Plan storage and retrieval (e.g., variant process planning)
  • Automatic schedule generation (various OR and AI techniques)
• For some problems, we really want to generate plans automatically
  • Much more difficult
• One source of difficulty: nondeterministic outcomes
  • If I plan to perform some action a, I cannot be sure in advance which outcome a will have

Planning with Nondeterminism
[Figure: the action "grasp block c", with an intended outcome and an unintended outcome]
• Actions with multiple possible outcomes
  • Action failures, e.g., the gripper drops its load
  • Exogenous events, e.g., a road is closed
• Like Markov Decision Processes (MDPs), but without probabilities attached to the outcomes
  • Useful if accurate probabilities aren't available, or if probability calculations would introduce inaccuracies
• Existing approaches
  • Conditional Planning (e.g., Penberthy & Weld, 1992)
  • Conformant Planning (e.g., Smith & Weld, 1998)
  • Symbolic Model Checking (e.g., Cimatti et al., 1998, 2003)

Research Motivation
• Algorithms for planning with nondeterminism have very high computational complexity
  • The search space usually is huge
  • Existing algorithms search most of the space
• Classical planning
  • Lots of work on generating plans quickly
  • Techniques for pruning large parts of the search space
• Can we generalize any of these techniques for use in nondeterministic domains?

Our Results
• A way to nondeterminize any forward-chaining planner for deterministic planning domains
  • i.e., rewrite it so that it works in nondeterministic domains
• Theoretical analysis
  • Under the appropriate conditions, some nondeterminized planners can run exponentially faster than the best previous planners for nondeterministic domains
• Experimental verification of the theoretical results

Forward-Chaining Planners
• Some of the most capable existing planners use forward chaining
  • Backtracking state-space search starting at the initial state
  • e.g., HSP, TLPlan, TALplanner, SHOP2
• FCP: an abstract model of forward-chaining planners
  • Among different forward-chaining planners, the main difference is the action-generation function, which maps a state s to a subset of {actions applicable to s}
  • We can classify planners based on this function:
    • Domain-specific
    • Domain-independent
    • Domain-configurable

Procedure FCP(s0, g)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in ancestors(s) then
      A := the actions generated for s
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := the result of applying a in s
    else return failure
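FCP's "nondeterministically choose" is normally realized by backtracking. Below is a minimal Python sketch under that assumption. The planner is parameterized by three hypothetical callables: a goal test, the action generator, and a deterministic state-transition function. The counter domain is a made-up toy, not from the slides.

```python
def fcp(s0, satisfies_goal, generate_actions, apply_action):
    """Sketch of FCP as depth-first search with backtracking.

    Returns a list of actions (the plan), or None for failure.
    """
    def search(s, ancestors):
        if satisfies_goal(s):
            return []
        if s in ancestors:              # s repeats a state earlier on this path
            return None
        for a in generate_actions(s):   # an empty list means failure here
            rest = search(apply_action(s, a), ancestors | {s})
            if rest is not None:
                return [a] + rest       # pi.a followed by the rest of the plan
        return None                     # every choice failed: backtrack

    return search(s0, frozenset())


# Hypothetical toy domain: a counter in 0..3 that can be incremented or decremented.
def counter_actions(s):
    return [a for a, ok in (("dec", s > 0), ("inc", s < 3)) if ok]

def counter_apply(s, a):
    return s + 1 if a == "inc" else s - 1

plan = fcp(0, lambda s: s == 3, counter_actions, counter_apply)
```

Swapping in a different `generate_actions` is all that distinguishes one instance of FCP from another in this sketch.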

Classification of Forward-Chaining Planners
• Domain-specific: designed or tuned for one specific domain
  • Several application-oriented planners work this way, e.g., EDAPS (process planning), Tignum 2 (used in Bridge Baron)
  • Good performance in the given domain, but hard to generalize
• Domain-independent: works in any domain within some class
  • Usually, the action generator works in any classical planning domain
  • Focus of most research on AI planning
  • So far, not practical for real-world planning
• Domain-configurable: …

Classification (continued)
• Domain-configurable
  • The action generator has a domain-independent computational engine
  • Domain-specific knowledge K, given to it as part of the domain description, tells it how to prune some of the generated actions:
    1. Control rules written in temporal logic, used for pruning
    2. Hierarchical Task Networks (HTNs) and ordered decomposition

Procedure FCP(s0, g, K)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in ancestors(s) then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := the result of applying a in s
    else return failure

1. Control Rules in Temporal Logic
• Depth-first forward search, with control rules written in temporal logic
  • For each state s, there is a control rule f; prune s if it doesn't satisfy f
  • Control rules for the successors of s are computed via logical progression
• TLPlan (Bacchus & Kabanza, Artificial Intelligence 2000)
• TALplanner (Doherty & Kvarnström, AMAI 2001)
  • Both work the same way, but they use different temporal logics
• Example (next slide):
  • A trivial blocks-world planning problem
  • LTL (the logic used in TLPlan)
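Progression rewrites a temporal formula f, evaluated at state s, into the formula that every child of s must satisfy. If the result simplifies to FALSE, s is pruned.

Below is a minimal Python sketch of the standard progression rules for a propositional fragment of LTL, with these assumptions:

- States are sets of ground atoms.
- Formulas use a hypothetical tuple encoding.
- TLPlan itself uses first-order LTL with bounded quantifiers, which this sketch does not cover.

```python
TRUE, FALSE = ("true",), ("false",)

# Constructors that simplify away the constants TRUE and FALSE.
def mk_not(f):
    if f == TRUE:
        return FALSE
    if f == FALSE:
        return TRUE
    return ("not", f)

def mk_and(f, g):
    if f == FALSE or g == FALSE:
        return FALSE
    if f == TRUE:
        return g
    if g == TRUE:
        return f
    return ("and", f, g)

def mk_or(f, g):
    if f == TRUE or g == TRUE:
        return TRUE
    if f == FALSE:
        return g
    if g == FALSE:
        return f
    return ("or", f, g)

def progress(f, state):
    """Return the formula that the successors of `state` must satisfy."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "atom":                      # closed world: true iff the atom is in the state
        return TRUE if f[1] in state else FALSE
    if op == "not":
        return mk_not(progress(f[1], state))
    if op == "and":
        return mk_and(progress(f[1], state), progress(f[2], state))
    if op == "or":
        return mk_or(progress(f[1], state), progress(f[2], state))
    if op == "next":                      # next f: f must hold in the successor
        return f[1]
    if op == "always":                    # always f = f now and always f afterwards
        return mk_and(progress(f[1], state), f)
    if op == "until":                     # f until g = g now, or f now and (f until g) later
        return mk_or(progress(f[2], state), mk_and(progress(f[1], state), f))
    raise ValueError(f"unknown operator {op!r}")

def prune(f, state):
    return progress(f, state) == FALSE
```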

Example
• State s: [figure: blocks a and b]; Goal: {on(b,a)}
• Control rule f: never pick up block x from the table unless x needs to be on top of another block
• Progressed formula f⁺ (must be true in all children of s)
  • If we pick up a, f⁺ will not be satisfied: prune this state
  • If we pick up b, f⁺ will be satisfied: keep searching below this state
• Can write rules to prune huge parts of the search space
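The example rule can also be checked directly on each candidate pickup action. The following is a minimal Python sketch of that shortcut, not the LTL progression TLPlan performs. It assumes that state s has both a and b clear on the table. The atom encoding and function names are hypothetical.

```python
def needs_to_be_on_a_block(x, goal):
    """True if the goal contains some atom on(x, y)."""
    return any(atom[0] == "on" and atom[1] == x for atom in goal)

def violates_rule(state, action, goal):
    """Rule f: never pick up x from the table unless x needs to be on another block."""
    name, x = action[0], action[1]
    return (name == "pickup" and ("ontable", x) in state
            and not needs_to_be_on_a_block(x, goal))

def allowed_actions(state, actions, goal):
    # keep only the actions whose resulting child would not be pruned
    return [a for a in actions if not violates_rule(state, a, goal)]

# Assumed state s: a and b both on the table, hand empty.
state = {("ontable", "a"), ("ontable", "b"), ("clear", "a"), ("clear", "b"), ("handempty",)}
goal = {("on", "b", "a")}
kept = allowed_actions(state, [("pickup", "a"), ("pickup", "b")], goal)
```

Only picking up b survives, matching the slide: picking up a is pruned because a never needs to be on another block.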

2. HTN Planning
• Decompose tasks into subtasks
  • Handle constraints (e.g., taxi not good for long distances)
  • Resolve interactions (e.g., take taxi early enough to catch plane)
  • If necessary, backtrack and try other decompositions
[Figure: methods for the task travel(x,y)
  taxi-travel(x,y): get-taxi, ride-taxi(x,y), pay-driver
  air-travel(x,y): get-ticket(a(x),a(y)), travel(x,a(x)), fly(a(x),a(y)), travel(a(y),y)]
Example decomposition:
travel(UMD, U-of-Alberta)
  get-ticket(DCA, YEG)
    go to Orbitz
    find-flights(DCA, YEG)
    buy-ticket(DCA, YEG)
  travel(UMD, DCA)
    get-taxi
    ride-taxi(UMD, DCA)
    pay-driver
  fly(DCA, YEG)
  travel(YEG, U-of-Alberta)
    get-taxi
    ride-taxi(YEG, U-of-Alberta)
    pay-driver

Ordered Decomposition
• Decompose tasks in the same order in which they'll be executed
• Whenever we want to plan the next task, we've already planned everything that comes before it
  • Thus, we know the current state of the world
• SHOP2 (Nau et al., IJCAI 2001, JAIR 2003)
[Figure: task t0 decomposed into subtasks …, tm, …, tn, …; the primitive operators op1, op2, …, opi are planned in execution order starting from s0, so the intermediate states s1, s2, …, si−1 are known]
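The travel example can be planned by ordered decomposition in a few dozen lines. Below is a minimal Python sketch in the spirit of SHOP2, not its actual syntax or API. It always expands the first remaining task, so the current state is always available to operator preconditions and method choices.

The following are hypothetical stand-ins for the slide's a(x) and the "taxi not good for long distances" constraint:

- `AIRPORT`
- `SHORT_HOPS`
- the state fields `loc` and `ticket`

For brevity, get-ticket is treated as a primitive.

```python
AIRPORT = {"UMD": "DCA", "U-of-Alberta": "YEG"}                           # hypothetical a(x)
SHORT_HOPS = {frozenset({"UMD", "DCA"}), frozenset({"YEG", "U-of-Alberta"})}  # taxi-able

# Primitive operators: return the new state, or None if the precondition fails.
def get_taxi(state):
    return state

def ride_taxi(state, x, y):
    return {**state, "loc": y} if state["loc"] == x else None

def pay_driver(state):
    return state

def get_ticket(state, a, b):
    return {**state, "ticket": (a, b)}

def fly(state, a, b):
    if state["loc"] == a and state.get("ticket") == (a, b):
        return {**state, "loc": b, "ticket": None}
    return None

OPERATORS = {"get-taxi": get_taxi, "ride-taxi": ride_taxi, "pay-driver": pay_driver,
             "get-ticket": get_ticket, "fly": fly}

# Methods for travel(x, y): return a subtask list, or None if not applicable.
def taxi_travel(state, x, y):
    if frozenset({x, y}) in SHORT_HOPS:
        return [("get-taxi",), ("ride-taxi", x, y), ("pay-driver",)]
    return None

def air_travel(state, x, y):
    ax, ay = AIRPORT.get(x), AIRPORT.get(y)
    if ax is None or ay is None:
        return None
    return [("get-ticket", ax, ay), ("travel", x, ax), ("fly", ax, ay), ("travel", ay, y)]

METHODS = {"travel": [taxi_travel, air_travel]}

def seek_plan(state, tasks):
    """Ordered decomposition: work on the first task, so `state` is always current."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    name, args = head[0], head[1:]
    if name in OPERATORS:
        new_state = OPERATORS[name](state, *args)
        if new_state is None:
            return None
        tail = seek_plan(new_state, rest)
        return None if tail is None else [head] + tail
    for method in METHODS.get(name, []):
        subtasks = method(state, *args)
        if subtasks is not None:
            plan = seek_plan(state, subtasks + rest)
            if plan is not None:
                return plan
    return None                                  # no method works: backtrack

plan = seek_plan({"loc": "UMD"}, [("travel", "UMD", "U-of-Alberta")])
```

The taxi method is tried first and rejected for the long trip, so the planner falls back to air-travel. That reproduces the decomposition on the HTN slide.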

Performance
• Using control rules and HTNs, a planner
  • can encode domain-specific problem-solving knowledge
  • does a highly focused search
  • goes almost directly toward a near-optimal solution, with very little backtracking
• TLPlan, TALplanner, and SHOP2 have been the best performers in the International Planning Competitions
  • Several orders of magnitude faster than the domain-independent planners
  • Solved many more problems

Expressivity
• Forward-chaining planners always know the current state
  • This makes it easy to do things that would be difficult otherwise
  • States can be arbitrary data structures
  • Preconditions and effects can include logical inference, complex numeric computations, and interactions with other software packages
• Applications:
  • SHOP2 is open-source freeware and has been used in dozens of applications (Nau et al., 2004)
  • Bacchus and Kabanza are attempting to commercialize TLPlan
[Figure: bridge example. Us: East declarer, West dummy. Opponents: defenders, South & North. Contract: East – 3NT. On lead: West at trick 3. East: KJ74; West: A2; Out: QT98653]

How to Nondeterminize Forward-Chaining Planners
• Two steps:
  1. Modify FCP to generate policies rather than plans
  2. Modify FCP to solve problems in which actions have multiple outcomes
• We want to do this in a way that works for all instances of FCP
  • Nondeterminized versions of HSP, TLPlan, TALplanner, SHOP2, etc.

Plans Versus Policies
[Figure: the plan π = (a0, a1, a2) leading from the initial state s0 through s1 and s2 to the goal state s3, contrasted with a policy in a domain where actions have several possible outcomes]
• In classical domains, a solution is a plan (a sequence of actions)
• For nondeterministic domains, that's not sufficient
  • An action may lead to more than one possible state
  • What to do next depends on which state we're in
• Instead of a plan, use a policy: a partial function from states to actions

Execution Graphs
• An action a has more than one possible outcome, so a policy π has more than one possible execution path
• Execution graph E(π) = the graph of all of π's possible execution paths
• Sπ = {all states in E(π)}
[Figure: execution graph of π = {(s0, a0), (s1, a1), (s2, a1), (s3, a2)} over states s0 to s5, with the initial states and goal states marked]
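E(π) can be built by a forward sweep that follows, from each state where π is defined, every possible outcome of the chosen action. Below is a minimal Python sketch, assuming the policy is a dict and the nondeterministic transitions are a dict from (state, action) to a list of outcomes. The transition table is hypothetical and is not read off the slide's figure.

```python
from collections import deque

def execution_graph(policy, transitions, initial_states):
    """Return (edges, S_pi): the edges (s, a, s') of E(pi) and the set of its states."""
    states = set(initial_states)
    edges = set()
    frontier = deque(initial_states)
    while frontier:
        s = frontier.popleft()
        if s not in policy:                # pi is undefined here: execution stops
            continue
        a = policy[s]
        for t in transitions[(s, a)]:      # every possible outcome of a in s
            edges.add((s, a, t))
            if t not in states:
                states.add(t)
                frontier.append(t)
    return edges, states

# Policy from the slide; transitions are a hypothetical example.
pi = {"s0": "a0", "s1": "a1", "s2": "a1", "s3": "a2"}
gamma = {("s0", "a0"): ["s1", "s2"], ("s1", "a1"): ["s3", "s4"],
         ("s2", "a1"): ["s3"], ("s3", "a2"): ["s5"]}
edges, S_pi = execution_graph(pi, gamma, ["s0"])
```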

Nondeterminization (Step 1)
• Rewrite FCP so that it generates solution policies rather than solution plans

Procedure FCP(s0, g, K)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in ancestors(s) then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := the result of applying a in s
    else return failure

Procedure Policy-FCP(s0, g, K)
  π := ∅; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in Sπ then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; s := the result of applying a in s
    else return failure
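Policy-FCP can be sketched in Python by growing a dict instead of a list, using backtracking for the nondeterministic choice.

This sketch reads the test "s isn't in Sπ" as "π does not yet assign an action to s". That reading is an interpretive assumption: if Sπ were taken literally as every state of E(π), the successor just reached would always be in Sπ. The counter domain is again a hypothetical toy.

```python
def policy_fcp(s0, satisfies_goal, generate_actions, apply_action):
    """Sketch of Policy-FCP: returns a dict {state: action}, or None for failure."""
    def search(s, policy):
        if satisfies_goal(s):
            return policy
        if s in policy:                       # assumed reading of "s is in S_pi": a loop
            return None
        for a in generate_actions(s):         # backtracking replaces "nondeterministically choose"
            result = search(apply_action(s, a), {**policy, s: a})   # pi U {(s, a)}
            if result is not None:
                return result
        return None

    return search(s0, {})


# Hypothetical toy domain: a counter in 0..3.
def counter_actions(s):
    return [a for a, ok in (("dec", s > 0), ("inc", s < 3)) if ok]

def counter_apply(s, a):
    return s + 1 if a == "inc" else s - 1

policy = policy_fcp(0, lambda s: s == 3, counter_actions, counter_apply)
```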

Types of Solutions (Cimatti et al., Artificial Intelligence, 2003)
• Weak solution: at least one execution path reaches a goal
• Strong solution: every execution path reaches a goal
• Strong-cyclic solution: every fair execution path reaches a goal
  • Fairness: don't stay in a cycle forever if there's a state transition out of it
[Figure: three example policies over states s0 to s3 with actions a0 to a3, illustrating a weak solution, a strong solution, and a strong-cyclic solution]
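The three solution types can be told apart from the reachable part of the execution graph. The classification works as follows:

- A policy is weak if a goal is reachable from s0.
- It is strong-cyclic if a goal is reachable from every reachable state, which is what fairness needs.
- It is strong if, in addition, the graph has no cycles.

Below is a minimal Python sketch with a single initial state. Goal states are treated as terminal, and the example transitions are hypothetical.

```python
def classify(policy, transitions, s0, goals):
    """Return 'strong', 'strong-cyclic', 'weak', or 'none' for a policy from s0."""
    def succ(s):
        # goal states are terminal; non-goal states without an action are dead ends
        if s in goals or s not in policy:
            return []
        return transitions[(s, policy[s])]

    # states reachable from s0 under the policy, i.e., the states of E(pi)
    reachable, stack = {s0}, [s0]
    while stack:
        for t in succ(stack.pop()):
            if t not in reachable:
                reachable.add(t)
                stack.append(t)

    # backward fixpoint: reachable states from which some goal can still be reached
    can_reach_goal = reachable & set(goals)
    changed = True
    while changed:
        changed = False
        for s in reachable - can_reach_goal:
            if any(t in can_reach_goal for t in succ(s)):
                can_reach_goal.add(s)
                changed = True

    def has_cycle():
        color = {s: "white" for s in reachable}
        def visit(s):
            color[s] = "grey"                 # grey = on the current DFS path
            for t in succ(s):
                if color[t] == "grey" or (color[t] == "white" and visit(t)):
                    return True
            color[s] = "black"
            return False
        return any(color[s] == "white" and visit(s) for s in reachable)

    if reachable <= can_reach_goal:
        return "strong-cyclic" if has_cycle() else "strong"
    return "weak" if s0 in can_reach_goal else "none"

# Hypothetical examples.
T = {("s0", "a0"): ["s1", "s2"], ("s1", "a1"): ["g"], ("s2", "a2"): ["g"]}
weak = classify({"s0": "a0", "s1": "a1"}, T, "s0", {"g"})             # s2 is a dead end
strong = classify({"s0": "a0", "s1": "a1", "s2": "a2"}, T, "s0", {"g"})
T2 = {("s0", "a0"): ["s1"], ("s1", "a1"): ["g", "s0"]}                  # a1 may loop back
cyclic = classify({"s0": "a0", "s1": "a1"}, T2, "s0", {"g"})
```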

Nondeterminization (Step 2)
• Modify Policy-FCP to generate strong-cyclic solutions
  • Can also modify it to generate strong and weak solutions (won't discuss details)

Procedure Policy-FCP(s0, g, K)
  π := ∅; s := s0
  loop
    if s satisfies g then return π
    else if s isn't in Sπ then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; s := the result of applying a in s
    else return failure

Procedure ND-FCP(S0, g, K)
  π := ∅; S := S0; solved := ∅
  loop
    if S = ∅ then return π
    select s in S and remove it from S
    if s satisfies g then put s into solved
    else if s isn't in Sπ then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; S := S ∪ {all possible results of applying a in s}
    else if s has no descendants in (S ∪ solved) − Sπ
      then return failure
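Below is a minimal Python sketch of ND-FCP. It rests on three assumptions:

- "Nondeterministically choose" is implemented by backtracking.
- "Select s in S" is FIFO, which is an arbitrary choice.
- As in the Policy-FCP sketch, "s isn't in Sπ" is read as "π has no action for s yet". This is an interpretive assumption, since otherwise every newly generated outcome would immediately trigger the failure test.

The door domain, in which a kid may close the door while the robot goes through, is a hypothetical toy.

```python
def nd_fcp(initial_states, satisfies_goal, generate_actions, outcomes):
    """Sketch of ND-FCP for strong-cyclic solutions.

    outcomes(s, a) returns every possible successor of applying a in s.
    Returns a policy dict {state: action}, or None for failure.
    """
    def descendants(s, policy):
        # states reachable from s by following the policy (s itself only via a cycle)
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in policy:
                for v in outcomes(u, policy[u]):
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
        return seen

    def search(open_states, solved, policy):
        while open_states:
            s, open_states = open_states[0], open_states[1:]
            if satisfies_goal(s):
                solved = solved | {s}
            elif s not in policy:
                for a in generate_actions(s):          # no actions -> failure
                    result = search(open_states + list(outcomes(s, a)),
                                    solved, {**policy, s: a})
                    if result is not None:
                        return result
                return None
            else:
                escape = (set(open_states) | solved) - set(policy)
                if not (descendants(s, policy) & escape):
                    return None                        # cycle with no way out
        return policy

    return search(list(initial_states), frozenset(), {})

# Hypothetical door domain: "go" may fail because a kid closes the door.
def door_actions(s):
    return {"closed": ["open-door"], "open": ["go"], "through": []}[s]

OUT = {("closed", "open-door"): ("open",), ("open", "go"): ("through", "closed")}
policy = nd_fcp(["closed"], lambda s: s == "through", door_actions, lambda s, a: OUT[(s, a)])

# Variant in which "go" can never succeed: the closed/open cycle has no escape.
STUCK = {("closed", "open-door"): ("open",), ("open", "go"): ("closed",)}
stuck = nd_fcp(["closed"], lambda s: s == "through", door_actions, lambda s, a: STUCK[(s, a)])
```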

Bookkeeping
• Bookkeeping to generate graphs rather than paths
  • S = {nodes that have been generated but not yet explored}
  • solved = {nodes from which we know we can get to a solution}

Failure Detection
• A node s is unsolvable in the following cases:
  • s is a dead end
  • s is part of a cycle from which there is no escape
  • every descendant of s is unsolvable
• This happens if s has no descendants in (S ∪ solved) − Sπ

Formal Properties
• Several planning algorithms are instances of FCP
  • TLPlan, TALplanner, SHOP2, etc.
  • Only difference: the action-generation function
• Nondeterminizing FCP preserves the action-generation function, so it works on any instance of FCP
  • ND-TLPlan, ND-TALplanner, ND-SHOP2, etc.
• Nondeterminizing them preserves soundness, completeness, and time complexity
  • Details on the next few slides

Nondeterministic Versions of Operators and Domains
• Nondeterministic version of an operator o
  • Same as o except that it may have additional possible outcomes (failures, exogenous events, etc.)
• Nondeterministic version of a domain D
  • The operators are nondeterministic versions of the ones in D
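"Same as o except that it may have additional possible outcomes" translates directly into a wrapper. The wrapper keeps o's preconditions and intended effect and appends extra outcome states.

Below is a minimal Python sketch using the blocks-world unstack operator, with "the gripper drops the block on the table" as the added outcome. The atom encoding, the initial state, and the helper names are hypothetical.

```python
def unstack(state, x, y):
    """Classical unstack(x, y): the single successor state, or None if inapplicable."""
    pre = {("on", x, y), ("clear", x), ("handempty",)}
    if not pre <= state:
        return None
    return (state - pre) | {("holding", x), ("clear", y)}

def drop_on_table(state, x, y):
    """Unintended outcome (hypothetical): x falls onto the table instead of being held."""
    return [(state - {("on", x, y)}) | {("ontable", x), ("clear", y)}]

def nondeterministic_version(op, extra_outcomes):
    def nd_op(state, *args):
        intended = op(state, *args)
        if intended is None:                 # same preconditions as o
            return []
        extras = [o for o in extra_outcomes(state, *args) if o != intended]
        return [intended] + extras           # o's outcome plus the additional ones
    return nd_op

s = frozenset({("on", "c", "a"), ("on", "a", "b"), ("ontable", "b"),
               ("clear", "c"), ("handempty",)})
nd_unstack = nondeterministic_version(unstack, drop_on_table)
outs = nd_unstack(s, "c", "a")
```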

Formal Properties
• Nondeterminizing an algorithm preserves its soundness and completeness
  • Let P be any planning algorithm that's an instance of FCP
  • Let ND-P be the nondeterminization of P
  • Let D be any classical planning domain
  • Let D′ be any nondeterministic version of D
  • If P is sound/complete on D, then ND-P is sound/complete on D′
• Nondeterminizing an algorithm preserves its time complexity (as a function of its output)
  • Let T_P(n) and T_ND-P(n) be the running times of P and ND-P, where n = size of the solution found
  • Then T_ND-P(n) is polynomially bounded by T_P(n)
  • (Details on next slide)

Time-Complexity Theorem
• P = an instance of FCP; D = a classical domain
  • Suppose P's time complexity is O(f(|π|)), where π is the solution P finds and f is monotonic
• D′ = a nondeterministic version of D
  • ND-P's time complexity is O(p(f(|π′|))), where π′ is the solution ND-P finds and p is a polynomial
  • Caveat: π′ may be exponentially larger than π
[Figure: a plan from the initial state s0 to the goal state s3, contrasted with a policy's execution graph over states s0 to s5]

Special Case
• Suppose that P runs in polynomial time and ND-P produces solutions of polynomial size
  • Then ND-P runs in polynomial time
• Example: Blocks World
  • Given the appropriate domain knowledge, TALplanner, TLPlan, and SHOP2 solve Blocks-World problems in polynomial time
  • ND-TALplanner, ND-TLPlan, and ND-SHOP2 produce solutions of polynomial size
  • With this domain knowledge, ND-TALplanner, ND-TLPlan, and ND-SHOP2 solve nondeterministic Blocks-World problems in polynomial time

Experimental Verification
• Implementation of ND-SHOP2
• Compare with MBP (Bertoli et al., 2001)
  • The best-known planner for nondeterministic domains
  • Based on symbolic model checking
• Two experimental domains
  • Robot Navigation (Kabanza et al., 1997): the E. coli of research on planning with nondeterminism
  • Nondeterministic Blocks World

Robot Navigation Domain
• Adapted from (Kabanza et al., 1997)
  • Rooms, doors, hallway
  • Robot can open/close doors and move packages to other rooms
  • Objective: move packages to their destinations
  • A kid runs around and randomly opens/closes doors, so the robot may need to re-open a door repeatedly to go through
• Experimental setup
  • Kid doors: k = 1, …, 7
  • Packages: n = 1, …, 5
  • 20 randomly generated problems for each combination of n and k

[Figure: Robot Navigation results, varying the problem size]

[Figure: Robot Navigation results, varying the amount of nondeterminism]

Nondeterministic Blocks World
• Traditional Blocks-World operators: pickup, putdown, stack, unstack
• Actions may have unintended outcomes, e.g., dropping a block on the table
• Experimental setup
  • Vary the number of blocks from 3 to 10
  • 20 randomly generated problems for each case

[Figure: Nondeterministic Blocks World results, varying the problem size]

Complexity Analysis
• Complexity analysis shows MBP running in exponential time and ND-SHOP2 running in time O(n^5)
  • To see why, we need to understand how MBP and ND-SHOP2 work

Representing Policies
• A policy π is a partial function from states into actions: π(s0) = a0, π(s1) = a1, π(s2) = a1, π(s3) = a2
• Can use a symbolic representation, roughly like this:
    if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
    if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
  • Each state description ignores all doors other than the door of r4, so it covers an exponential number of states
• Both MBP and ND-SHOP2 use symbolic representations of policies
  • Can write polynomial-size policies for exponentially large state spaces
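Below is a minimal Python sketch of why a two-rule symbolic policy can cover exponentially many states. Each rule tests only a few state variables, so every combination of the other doors' positions matches it.

The assumptions are:

- States are flat dicts.
- Door variables are named `door_r1`, `door_r2`, and so on.
- The helper names are hypothetical.

```python
from itertools import product

# The two rules from the slide, as (partial condition, action) pairs.
RULES = [
    ({"in": "r4", "holding": "b", "door_r4": "closed"}, ("open-door", "r4")),
    ({"in": "r4", "holding": "b", "door_r4": "open"}, ("go", "r4", "hall")),
]

def symbolic_policy(state):
    """Return the action of the first rule whose condition the state satisfies."""
    for condition, action in RULES:
        if all(state.get(var) == val for var, val in condition.items()):
            return action
    return None                                   # partial function: undefined elsewhere

def covered_states(num_rooms):
    """Count explicit states (robot in r4 holding b) to which the policy assigns an action."""
    rooms = [f"r{i}" for i in range(1, num_rooms + 1)]   # needs num_rooms >= 4
    count = 0
    for doors in product(["open", "closed"], repeat=num_rooms):
        state = {"in": "r4", "holding": "b",
                 **{f"door_{r}": d for r, d in zip(rooms, doors)}}
        if symbolic_policy(state) is not None:
            count += 1
    return count
```

With k rooms, the same two rules cover 2^k explicit states.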

How MBP Generates Policies
• MBP uses model-checking techniques, e.g., computing pre-images of sets of states
  • Roughly like a breadth-first backward search
• MBP may need to explore exponentially many states that are unreachable from the initial state
  • Exponentially many states => exponential time
  • That's what happens in the Robot Navigation and nondeterministic Blocks-World domains
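Below is an explicit-state Python sketch of backward search by strong pre-images. At each step it adds every state that has some action all of whose outcomes are already covered. MBP itself works symbolically with BDDs and also handles strong-cyclic planning, so this is only an illustration of the backward style.

The point to notice is that the hypothetical states u1 and u2 are unreachable from the initial state 0, yet they are still processed and given actions.

```python
def strong_backward_plan(states, actions, outcomes, initial, goals):
    """Explicit-state sketch of backward strong planning via strong pre-images."""
    policy, covered = {}, set(goals)
    while not set(initial) <= covered:
        # strong pre-image: states with an action whose every outcome is already covered
        new = {}
        for s in states - covered:
            for a in actions(s):
                outs = outcomes(s, a)
                if outs and set(outs) <= covered:
                    new[s] = a
                    break
        if not new:
            return None                  # fixpoint without covering the initial states
        policy.update(new)
        covered.update(new)              # adds the newly solved states (the dict's keys)
    return policy

# Hypothetical domain: a chain 0 -> 1 -> 2 -> 3, plus states u1, u2 unreachable from 0.
N = 3
states = set(range(N + 1)) | {"u1", "u2"}

def acts(s):
    if isinstance(s, int):
        return ["inc"] if s < N else []
    return ["jump"]                      # u1 and u2 can jump straight to the goal

def outs(s, a):
    return [s + 1] if a == "inc" else [N]

policy = strong_backward_plan(states, acts, outs, {0}, {N})
```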

How ND-SHOP2 Generates Policies
• ND-SHOP2 takes domain knowledge in the form of HTN methods
    Method m1
      Task: take-package(p, r, hall)
      Precond: in(r), holding(p), door-open(r)
      Subtasks: go(r, hall)
    Method m2
      Task: take-package(p, r, hall)
      Precond: in(r), holding(p), door-closed(r)
      Subtasks: open-door(r), go(r, hall)
• Consider the task take-package(b, r4, hall)
  • ND-SHOP2 can very quickly develop the policy
      if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
      if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
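Below is a minimal Python sketch of how methods m1 and m2 yield the two policy rules. For each abstract state of r4's door, it finds the applicable method and takes the first action of its decomposition.

This is a simplification: ND-SHOP2 also follows every outcome of each action and continues decomposing. The dict-based state and the method encoding are hypothetical.

```python
# Methods for take-package(p, r, hall): (name, precondition, subtask list).
METHODS = [
    ("m1", lambda s, p, r: s["in"] == r and s["holding"] == p and s["door"] == "open",
           lambda p, r: [("go", r, "hall")]),
    ("m2", lambda s, p, r: s["in"] == r and s["holding"] == p and s["door"] == "closed",
           lambda p, r: [("open-door", r), ("go", r, "hall")]),
]

def first_action(state, p, r):
    """Action the policy assigns in `state`: the first subtask of the applicable method."""
    for _name, precond, subtasks in METHODS:
        if precond(state, p, r):
            return subtasks(p, r)[0]
    return None

# Abstract states mention only r4's door, so each rule covers all settings of the other doors.
rules = {door: first_action({"in": "r4", "holding": "b", "door": door}, "b", "r4")
         for door in ("closed", "open")}
```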

Conclusions
• A technique for "nondeterminizing" forward-chaining classical planners
• Theoretical analysis
  • Nondeterminization preserves soundness/completeness
  • The time complexity of the generalized planners is polynomially bounded by the time complexity of the original ones
• Experimental verification of the results

Future Work
• Nondeterministic planning domains are just like MDPs except that there are no probabilities
• We are quite confident that
  • we can generalize our approach to work in MDPs too
  • our "MDP-ized" algorithms will be able to run exponentially faster than traditional MDP algorithms
• Preliminary implementation and experiments
  • So far, very encouraging

Related Work
• M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory and Practice (Morgan Kaufmann, May 2004)
  • First comprehensive textbook on automated planning
  • Models, techniques, algorithms
  • Case studies of applications
  • Web site: http://www.laas.fr/planning
  • Lecture slides available online
