Forward-Chaining Planning in Nondeterministic Domains • Ugur Kuter and Dana Nau • Department of Computer Science and Institute for Systems Research, University of Maryland, College Park, Maryland
Generating Plans of Action • Programs to aid human planners • Project management (consumer software) • Plan storage and retrieval (e.g., variant process planning) • Automatic schedule generation (various OR and AI techniques) • For some problems, we really want to generate plans automatically • Much more difficult • One source of difficulty: nondeterministic outcomes • If I plan to perform some action a, I cannot be sure in advance what outcome a will have
Planning with Nondeterminism • Actions with multiple possible outcomes • Action failures (e.g., gripper drops its load) • Exogenous events (e.g., road closed) • Like Markov Decision Processes (MDPs), but without probabilities attached to the outcomes • Useful if accurate probabilities aren’t available, or if probability calculations would introduce inaccuracies • Existing approaches • Conditional Planning (e.g., Penberthy & Weld, 1992) • Conformant Planning (e.g., Smith & Weld, 1998) • Symbolic Model Checking (e.g., Cimatti et al., 1998, 2003)
[Figure: "grasp block c" applied to a stack of blocks a, b, c, with an intended outcome and an unintended outcome]
Research Motivation • Algorithms for planning with nondeterminism have very high computational complexity • Search space usually is huge • Existing algorithms search most of the space • Classical planning • Lots of work on generating plans quickly • Techniques for pruning large parts of the entire space • Can we generalize any of these techniques for use in nondeterministic domains?
Our Results • A way to nondeterminize any forward-chaining planner for deterministic planning domains • Rewrite it so that it works in nondeterministic domains • Theoretical analysis • Under the appropriate conditions, some nondeterminized planners can run exponentially faster than the best previous planners for nondeterministic domains • Experimental verification of the theoretical results
Forward-Chaining Planners • Some of the most capable existing planners use forward chaining • Backtracking state-space search starting at the initial state • e.g., HSP, TLPlan, TALplanner, SHOP2 • FCP: abstract model of forward-chaining planners • Among different forward-chaining planners, the main difference is the action-generation function, which maps a state s to a set of actions applicable to s • Can classify planners by how this function is defined • Domain-specific • Domain-independent • Domain-configurable
Procedure FCP(s0, g)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn’t in ancestors(s) then
      A := the actions generated for s
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := the state that results from applying a in s
    else return failure
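The FCP loop can be sketched in Python as follows. This is only an illustration, not the authors' code: the nondeterministic choice is rendered as depth-first backtracking, and the names `fcp`, `gen_actions`, `apply_action`, and the toy counter domain are assumptions made for the example.

```python
def fcp(s0, is_goal, gen_actions, apply_action):
    """Depth-first rendering of FCP. Returns a plan (list of actions) or None."""
    def search(s, plan, ancestors):
        if is_goal(s):
            return plan
        if s in ancestors:              # s repeats an ancestor: this branch fails
            return None
        for a in gen_actions(s):        # backtracking stands in for the nondeterministic choice
            result = search(apply_action(s, a), plan + [a], ancestors | {s})
            if result is not None:
                return result
        return None                     # also covers the "A is empty" case
    return search(s0, [], frozenset())


# Hypothetical toy domain: states are integers, the goal is to reach 10.
def toy_apply(s, a):
    return s * 2 if a == "double" else s + 1

def toy_actions(s):
    # The action-generation function: here it prunes actions that overshoot 10.
    return [a for a in ("double", "inc") if toy_apply(s, a) <= 10]

plan = fcp(1, lambda s: s == 10, toy_actions, toy_apply)
print(plan)
```

Swapping in a different `gen_actions` is the only change needed to model a different planner, which mirrors the slide's point that the action-generation function is the main difference between forward-chaining planners.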
Classification of Forward-Chaining Planners • Domain-specific: the action-generation function is designed or tuned for one specific domain • Several application-oriented planners work this way • e.g., EDAPS (process planning), Tignum 2 (used in Bridge Baron) • Good performance in the given domain, but hard to generalize • Domain-independent: the action-generation function works in any domain within some class • Usually, works in any classical planning domain • Focus of most research on AI planning • So far, not practical for real-world planning • Domain-configurable: …
Classification (continued) • Domain-configurable • The action-generation function has a domain-independent computational engine • Domain-specific information K is given to it as part of the domain description • K tells it how to prune some of the actions it would otherwise generate • Two ways to express K: 1. Control rules written in temporal logic, used for pruning 2. Hierarchical Task Networks (HTNs) and ordered decomposition
Procedure FCP(s0, g, K)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn’t in ancestors(s) then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := the state that results from applying a in s
    else return failure
1. Control Rules in Temporal Logic • Depth-first forward search, with control rules written in temporal logic • For each state s, a control rule, f • prune s if it doesn’t satisfy f • Control rules for successors of s are computed via logical progression • TLPlan (Bacchus & Kabanza, Artificial Intelligence 2000) • TALplanner (Doherty & Kvarnstrom, AMAI 2001) • Both work the same way, but they use different temporal logics • Example (next slide): • A trivial blocks-world planning problem • LTL (the logic used in TLPlan)
Example • State s and goal {on(b,a)} • Control rule f: never pick up block x from the table unless x needs to be on top of another block • Progressed formula f⁺ (must be true in all children of s) • If we pick up a, f⁺ will not be satisfied: prune this state • If we pick up b, f⁺ will be satisfied: keep searching below this state • Can write rules to prune huge parts of the search space
[Figure: blocks-world configurations of blocks a and b for the state s and the goal]
2. HTN Planning • Decompose tasks into subtasks • Handle constraints (e.g., taxi not good for long distances) • Resolve interactions (e.g., take taxi early enough to catch plane) • If necessary, backtrack and try other decompositions
Task: travel(x,y)
  Method taxi-travel(x,y): get-taxi, ride-taxi(x,y), pay-driver
  Method air-travel(x,y): get-ticket(a(x), a(y)), travel(x, a(x)), fly(a(x), a(y)), travel(a(y), y)
Example decomposition:
  travel(UMD, U-of-Alberta)
    get-ticket(DCA, YEG)
      go to Orbitz
      find-flights(DCA, YEG)
      buy-ticket(DCA, YEG)
    travel(UMD, DCA)
      get-taxi
      ride-taxi(UMD, DCA)
      pay-driver
    fly(DCA, YEG)
    travel(YEG, U-of-Alberta)
      get-taxi
      ride-taxi(YEG, U-of-Alberta)
      pay-driver
Ordered Decomposition • Decompose tasks in the same order in which they’ll be executed • Whenever we want to plan the next task • we’ve already planned everything that comes before it • Thus, we know the current state of the world • SHOP2 (Nau et al., IJCAI 2001, JAIR 2003)
[Figure: task t0 decomposed into subtasks …, tm, tn, …; the operators op1, op2, …, opi already planned lead from s0 through s1, s2, … to s(i–1)]
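Ordered decomposition can be sketched as a small recursive procedure. This is a simplified illustration and not SHOP2 itself: the function `htn_plan`, the distance table, and the 50 km taxi limit are assumptions, and the travel domain from the HTN slide is cut down to `ride-taxi` and `fly` (no tickets or payment).

```python
def htn_plan(state, tasks, operators, methods):
    """Plan the first remaining task against the current state, then the rest."""
    if not tasks:
        return []
    task, rest = tasks[0], tasks[1:]
    name, args = task[0], task[1:]
    if name in operators:                    # primitive task: apply it right away
        new_state = operators[name](state, *args)
        if new_state is None:                # precondition failed
            return None
        tail = htn_plan(new_state, rest, operators, methods)
        return None if tail is None else [task] + tail
    for method in methods.get(name, []):     # compound task: try each method in turn
        subtasks = method(state, *args)
        if subtasks is None:                 # method not applicable in this state
            continue
        plan = htn_plan(state, subtasks + rest, operators, methods)
        if plan is not None:
            return plan
    return None                              # backtrack


# Hypothetical data: distances in km and the nearest airport a(x).
DIST = {("UMD", "DCA"): 20, ("YEG", "U-of-Alberta"): 10, ("UMD", "U-of-Alberta"): 3000}
AIRPORT = {"UMD": "DCA", "U-of-Alberta": "YEG"}

def move(state, x, y):
    # Shared operator body: applicable only if we are currently at x.
    return {**state, "loc": y} if state["loc"] == x else None

OPERATORS = {"ride-taxi": move, "fly": move}

def taxi_travel(state, x, y):
    # Constraint from the slide: a taxi is not good for long distances.
    return [("ride-taxi", x, y)] if DIST.get((x, y), float("inf")) <= 50 else None

def air_travel(state, x, y):
    if x in AIRPORT and y in AIRPORT:
        ax, ay = AIRPORT[x], AIRPORT[y]
        return [("travel", x, ax), ("fly", ax, ay), ("travel", ay, y)]
    return None

METHODS = {"travel": [taxi_travel, air_travel]}

plan = htn_plan({"loc": "UMD"}, [("travel", "UMD", "U-of-Alberta")], OPERATORS, METHODS)
print(plan)
```

Because tasks are decomposed in execution order, every operator and method receives the actual current `state`, which is what lets preconditions be checked directly.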
Performance • Using control rules and HTNs • can encode domain-specific problem-solving knowledge • highly focused search • go almost directly toward a near-optimal solution, with very little backtracking • TLPlan, TALplanner, and SHOP2 have been the best performers in the International Planning Competitions • Several orders of magnitude faster than the domain-independent planners • Solved many more problems
Expressivity • Forward-chaining planners always know the current state • This makes it easy to do things that would be difficult otherwise • States can be arbitrary data structures • Preconditions and effects can include • logical inference • complex numeric computations • interactions with other software packages • Applications: • SHOP2 is open-source freeware and has been used in dozens of applications (Nau et al., 2004) • Bacchus and Kabanza are attempting to commercialize TLPlan
[Figure: bridge example. Us: East declarer, West dummy. Opponents: defenders, South and North. Contract: 3NT by East. On lead: West at trick 3. East: KJ74. West: A2. Out: QT98653]
How to Nondeterminize Forward-Chaining Planners • Two steps: 1. Modify FCP to generate policies rather than plans 2. Modify FCP to solve problems in which actions have multiple outcomes • Want to do this in such a way that it will work for all instances of FCP • Nondeterminized versions of HSP, TLPlan, TALplanner, SHOP2, etc.
Plans Versus Policies • In classical domains, a solution is a plan (sequence of actions) • Example plan: π = (a0, a1, a2) • For nondeterministic domains, that’s not sufficient • An action may lead to more than one possible state • What to do next depends on what state we’re in • Instead of a plan, use a policy: a partial function from states to actions • Example policy: π = {(s1,a0), (s1,a1), (s2,a3)}
[Figure: a plan a0, a1, a2 leading from the initial state s0 through s1 and s2 to the goal state s3, next to a policy whose actions branch to several states]
Execution Graphs • An action a has more than one possible outcome … • … so a policy π has more than one possible execution path • Execution graph E(π) = the graph of all of π’s possible execution paths • Sπ = {all states in E(π)} • Example: π = {(s0, a0), (s1, a1), (s2, a1), (s3, a2)}
[Figure: execution graph over states s0 to s5, from the initial states to the goal states]
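A minimal sketch of how E(π) can be computed by following the policy through every possible outcome. The transition table `T` below is hypothetical (the slide's figure does not survive in this transcript); only the policy is taken from the slide, and `execution_graph` is an illustrative name.

```python
def execution_graph(policy, initial_states, outcomes):
    """Return (states, edges) of the execution graph of a policy.

    policy:   dict mapping state -> action (a partial function)
    outcomes: function (state, action) -> iterable of possible next states
    """
    seen, edges, frontier = set(initial_states), set(), list(initial_states)
    while frontier:
        s = frontier.pop()
        if s not in policy:          # policy undefined here, so execution stops
            continue
        for t in outcomes(s, policy[s]):
            edges.add((s, policy[s], t))
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen, edges


# Policy from the slide; the transitions are made up for illustration.
policy = {"s0": "a0", "s1": "a1", "s2": "a1", "s3": "a2"}
T = {
    ("s0", "a0"): {"s1", "s3"},      # a0 has two possible outcomes
    ("s1", "a1"): {"s2"},
    ("s2", "a1"): {"s5"},
    ("s3", "a2"): {"s4"},
}
states, edges = execution_graph(policy, ["s0"], lambda s, a: T[(s, a)])
print(sorted(states))
```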
Nondeterminization (Step 1) • Rewrite FCP so that it generates solution policies rather than solution plans
Procedure FCP(s0, g, K)
  π := the empty plan; s := s0
  loop
    if s satisfies g then return π
    else if s isn’t in ancestors(s) then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π.a; s := the state that results from applying a in s
    else return failure
Procedure Policy-FCP(s0, g, K)
  π := ∅; s := s0
  loop
    if s satisfies g then return π
    else if s isn’t in Sπ then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; s := the state that results from applying a in s
    else return failure
Types of Solutions (Cimatti et al., Artificial Intelligence, 2003) • Weak solution: at least one execution path reaches a goal • Strong solution: every execution path reaches a goal • Strong-cyclic solution: every fair execution path reaches a goal • Don’t stay in a cycle forever if there’s a state-transition out of it
[Figure: three execution graphs over states s0 to s3 and actions a0 to a3, one for each type of solution]
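The three definitions can be checked mechanically on a finite execution graph. The sketch below is one way to do it under stated assumptions: execution stops at goal states, "fair" is read as "from every reachable state some goal is still reachable", and the function name `classify` and the three small example domains are hypothetical.

```python
def classify(policy, initial_states, outcomes, is_goal):
    """Classify a policy as 'strong', 'strong-cyclic', 'weak', or 'not a solution'."""
    # Successor map of the execution graph, cut off at goal states.
    succ, frontier = {}, list(initial_states)
    while frontier:
        s = frontier.pop()
        if s in succ:
            continue
        stop = is_goal(s) or s not in policy
        succ[s] = set() if stop else set(outcomes(s, policy[s]))
        frontier.extend(succ[s])
    goals = {s for s in succ if is_goal(s)}

    def fixpoint(ok):
        # Grow the set of goal states backwards until nothing changes.
        result, changed = set(goals), True
        while changed:
            changed = False
            for s, ts in succ.items():
                if s not in result and ts and ok(ts, result):
                    result.add(s)
                    changed = True
        return result

    can_reach = fixpoint(lambda ts, r: bool(ts & r))    # some path reaches a goal
    must_reach = fixpoint(lambda ts, r: ts <= r)        # every path reaches a goal
    init = set(initial_states)
    if init <= must_reach:
        return "strong"
    if set(succ) <= can_reach:
        return "strong-cyclic"
    if init <= can_reach:
        return "weak"
    return "not a solution"


def table(T):
    return lambda s, a: T[(s, a)]

is_goal = lambda s: s == "goal"
weak_T = {("s0", "a0"): {"s1", "s3"}, ("s1", "a1"): {"goal"}}            # s3 is a dead end
strong_T = {**weak_T, ("s3", "a2"): {"goal"}}
cyclic_T = {**weak_T, ("s3", "a2"): {"s0", "goal"}}                      # a2 may loop back
pi_weak = {"s0": "a0", "s1": "a1"}
pi_full = {"s0": "a0", "s1": "a1", "s3": "a2"}
print(classify(pi_weak, ["s0"], table(weak_T), is_goal))
print(classify(pi_full, ["s0"], table(strong_T), is_goal))
print(classify(pi_full, ["s0"], table(cyclic_T), is_goal))
```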
Nondeterminization (Step 2) • Modify Policy-FCP to generate strong-cyclic solutions • Can also modify it to generate strong and weak solutions (won’t discuss details)
Procedure ND-FCP(S0, g, K)
  π := ∅; S := S0; solved := ∅
  loop
    if S = ∅ then return π
    select s in S and remove it from S
    if s satisfies g then put s into solved
    else if s isn’t in Sπ then
      A := the actions generated for s using K
      if A is empty then return failure
      nondeterministically choose a ∈ A
      π := π ∪ {(s,a)}; S := S ∪ {the states that a can produce from s}
    else if s has no descendants in (S ∪ solved) − Sπ
      then return failure
Bookkeeping • Bookkeeping to generate graphs rather than paths • S = {nodes that have been generated but not yet explored} • solved = {nodes from which we know we can get to a solution}
Failure Detection • A node s is unsolvable in the following cases: • s is a dead end, • s is part of a cycle from which there is no escape, • every descendant of s is unsolvable • This happens if s has no descendants in (S ∪ solved) − Sπ
[Figure: execution graph over states s0 to s6 and actions a0 to a3 illustrating a dead end and an inescapable cycle]
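The ND-FCP loop, with its bookkeeping and failure test, can be sketched as below. This is a hedged illustration rather than the authors' implementation: the nondeterministic choice becomes backtracking, S is kept as a tuple, the test "s isn’t in Sπ" is read as "π does not yet prescribe an action for s", and the door domain (a robot in a room whose door may be closed again by someone else) is a made-up miniature of the robot-navigation setting.

```python
def nd_fcp(S0, is_goal, gen_actions, outcomes):
    """Backtracking sketch of ND-FCP. Returns a policy dict or None on failure."""
    def descendants(s, policy):
        # All states reachable from s by following the current policy.
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in policy:
                for t in outcomes(u, policy[u]):
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
        return seen

    def search(policy, open_states, solved):
        if not open_states:                      # S is empty: the policy is complete
            return policy
        s, rest = open_states[0], open_states[1:]
        if is_goal(s):
            return search(policy, rest, solved | {s})
        if s not in policy:                      # s has no action yet: choose one
            for a in gen_actions(s):
                result = search({**policy, s: a}, rest + tuple(outcomes(s, a)), solved)
                if result is not None:
                    return result
            return None                          # includes the "A is empty" case
        # s was expanded earlier: it must still have a way out of any cycle
        escape = (set(rest) | solved) - set(policy)
        if descendants(s, policy) & escape:
            return search(policy, rest, solved)
        return None

    return search({}, tuple(S0), frozenset())


# Hypothetical domain: the robot must reach the hall; the door may close again.
RC, RO, HALL = ("room", "closed"), ("room", "open"), "hall"
OUT = {
    (RC, "wait"): (RC,),
    (RC, "open-door"): (RO, RC),     # the door may be closed again at once
    (RO, "wait"): (RO,),
    (RO, "go"): (HALL, RC),          # the door may close before the robot gets through
}
ACTIONS = {RC: ["wait", "open-door"], RO: ["wait", "go"]}
policy = nd_fcp([RC], lambda s: s == HALL,
                lambda s: ACTIONS.get(s, []), lambda s, a: OUT[(s, a)])
print(policy)
```

In this toy run the "wait" choices are rejected by the failure test because they leave the robot in a cycle with no exit, while the returned policy may loop (re-opening the door) but always keeps a path to the hall.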
Formal Properties • Several planning algorithms are instances of FCP • TLPlan, TALplanner, SHOP2, etc. • Only difference: what the action-generation function is • Nondeterminizing FCP preserves the action-generation function, so it works on any instance of FCP • ND-TLPlan, ND-TALplanner, ND-SHOP2, etc. • Nondeterminizing them preserves soundness, completeness, and time complexity • Details on the next few slides
Nondeterministic Versions of Operators and Domains • Nondeterministic version of an operator o • Same as o except that it may have additional possible outcomes • Failures, exogenous events, etc. • Nondeterministic version of a domain D • The operators are nondeterministic versions of the ones in D
[Figure: "grasp block c" applied to a stack of blocks a, b, c, with an intended outcome and an unintended outcome]
Formal Properties • Nondeterminizing an algorithm preserves its soundness and completeness • Let P be any planning algorithm that’s an instance of FCP • Let ND-P be the nondeterminization of P • Let D be any classical planning domain • Let D’ be any nondeterministic version of D • If P is sound/complete on D, then ND-P is sound/complete on D’ • Nondeterminizing an algorithm preserves its time complexity (as a function of its output) • Let T_P(n) and T_ND-P(n) be the running times of P and ND-P, where n = size of the solution found • Then T_ND-P(n) is polynomially bounded by T_P(n) • (Details on next slide)
Time-Complexity Theorem • P = an instance of FCP; D = a classical domain • Suppose P’s time complexity is O(f(|π|)), where π is the plan that P finds and f is monotonic • D’ = a nondeterministic version of D • ND-P’s time complexity is O(p(f(|π’|))), where π’ is the policy that ND-P finds and p is a polynomial • Caveat: π’ may be exponentially larger than π
[Figure: a plan a0, a1, a2 from the initial state s0 to the goal state s3, next to a policy’s execution graph over states s0 to s5]
Special Case • Suppose that P runs in polynomial time and ND-P produces solutions of polynomial size • Then ND-P runs in polynomial time • Example: Blocks World • Given the appropriate domain knowledge • TALplanner, TLPlan, and SHOP2 solve Blocks-World problems in polynomial time • ND-TALplanner, ND-TLPlan, and ND-SHOP2 produce solutions of polynomial size • With this domain knowledge, • ND-TALplanner, ND-TLPlan, and ND-SHOP2 solve nondeterministic Blocks-World problems in polynomial time
Experimental Verification • Implementation of ND-SHOP2 • Compare with MBP (Bertoli et al., 2001) • The best-known planner for nondeterministic domains • Based on symbolic model checking • Two experimental domains • Robot-Navigation (Kabanza et al., 1997) • The E. coli of research on planning with nondeterminism • Nondeterministic Blocks-World
Robot Navigation Domain • Adapted from (Kabanza et al., 1997) • Rooms, doors, hallway • Robot can open/close doors, move packages to other rooms • Objective: move packages to their destinations • A kid runs around and randomly opens/closes doors • Robot may need to re-open a door repeatedly to go through • Experimental Setup • Kid doors: k = 1, …, 7 • Packages: n = 1, …, 5 • 20 randomly generated problems for each combination of n, k
Nondeterministic Blocks World • Traditional Blocks-World operators: • pickup, putdown, stack, unstack • Actions may have unintended outcomes • e.g., drop a block on the table • Experimental Setup • vary number of blocks from 3 to 10 • 20 randomly generated problems for each case
[Figure: "grasp block c" applied to a stack of blocks a, b, c, with an intended outcome and an unintended outcome]
Complexity Analysis • Complexity analysis shows MBP running in exponential time and ND-SHOP2 running in time O(n^5) • To see why, need to understand how MBP and ND-SHOP2 work
Representing Policies • A policy π is a partial function from states into actions • π(s0) = a0, π(s1) = a1, π(s2) = a1, π(s3) = a2 • Can use a symbolic representation roughly like this:
  if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
  if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
• Each state description ignores all doors other than d4 • So each one includes an exponential number of states • Both MBP and ND-SHOP2 use symbolic representations of policies • Can write polynomial-size policies for exponentially large state spaces
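The two rules above can be mimicked with partial state descriptions. The sketch below assumes states are sets of ground facts written as strings and that a rule fires when its condition is a subset of the state; the names `RULES`, `symbolic_policy`, `states_matching`, and the extra rooms r1 to r3 are hypothetical.

```python
from itertools import product

# Each rule is (partial state description, action), following the slide's two rules.
RULES = [
    ({"in(r4)", "holding(b)", "door-closed(r4)"}, "open-door(r4)"),
    ({"in(r4)", "holding(b)", "door-open(r4)"}, "go(r4, hall)"),
]

def symbolic_policy(state):
    """Return the action of the first rule whose condition holds in state, else None."""
    for condition, action in RULES:
        if condition <= state:           # every fact in the condition is in the state
            return action
    return None

OTHER_DOORS = ["r1", "r2", "r3"]         # doors the rules do not mention

def states_matching(base_facts):
    """Enumerate every complete state that agrees with base_facts (2**3 of them here)."""
    for combo in product(("open", "closed"), repeat=len(OTHER_DOORS)):
        extra = {f"door-{status}({door})" for door, status in zip(OTHER_DOORS, combo)}
        yield frozenset(base_facts | extra)

covered = list(states_matching({"in(r4)", "holding(b)", "door-closed(r4)"}))
print(len(covered), {symbolic_policy(s) for s in covered})
```

With n unmentioned doors a single rule covers 2^n states, which is how a polynomial-size rule list can stand for a policy over an exponentially large state space.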
How MBP Generates Policies • MBP uses model-checking techniques • e.g., computing pre-images of sets of states • Roughly like a breadth-first backward search • MBP may need to explore exponentially many states that are unreachable from the initial state • Exponentially many states => exponential time • That’s what happens in the robot navigation and nondeterminized blocks world domains
How ND-SHOP2 Generates Policies • ND-SHOP2 takes domain knowledge in the form of HTN methods
  Method m1
    Task: take-package(p, r, hall)
    Precond: in(r), holding(p), door-open(r)
    Subtasks: go(r, hall)
  Method m2
    Task: take-package(p, r, hall)
    Precond: in(r), holding(p), door-closed(r)
    Subtasks: open-door(r), go(r, hall)
• Consider the task take-package(b, r4, hall) • ND-SHOP2 can very quickly develop the policy
  if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
  if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
Conclusions • A technique for “nondeterminization” of forward-chaining classical planners • Theoretical analysis • Nondeterminization preserves soundness/completeness • Time complexity of the generalized planners is polynomially bounded by the time complexity of the original ones • Experimental verification of the results
Future Work • Nondeterministic planning domains are just like MDPs except that there are no probabilities • We are quite confident that • We can generalize our approach to work in MDPs too • Our “MDP-ized” algorithms will be able to run exponentially faster than traditional MDP algorithms • Preliminary implementation and experiments • So far, very encouraging
Related Work • M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory and Practice (Morgan Kaufmann, May 2004) • First comprehensive textbook on automated planning • models, techniques, algorithms • case studies of applications • Web site: http://www.laas.fr/planning • Lecture slides available online