Concurrent Probabilistic Temporal Planning (CPTP)



Presentation Transcript


  1. Concurrent Probabilistic Temporal Planning (CPTP) Mausam Joint work with Daniel S. Weld University of Washington Seattle

  2. Motivation • Three features of real-world planning domains: • Durative actions • All actions (navigation between sites, placing instruments, etc.) take time. • Concurrency • Some instruments may warm up • Others may perform their tasks • Others may shut down to save power. • Uncertainty • All actions (pick up the rock, send data, etc.) have a probability of failure.

  3. Motivation (contd.) • Concurrent Temporal Planning (widely studied with deterministic effects) • Extends classical planning • Doesn’t easily extend to probabilistic outcomes. • Concurrent planning with uncertainty (Concurrent MDPs – AAAI’04) • Handle combinations of actions over an MDP • Actions take unit time. • Few planners handle the three in concert!

  4. Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work

  5. Markov Decision Process (with unit-duration actions) • S: a set of states, factored into Boolean variables • A: a set of actions • Pr: S × A × S → [0,1], the transition model • C: A → ℝ, the cost model • s0: the start state • G: a set of absorbing goals
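
A minimal sketch of this tuple as a Python data structure; the field names (`transition`, `cost`) and the encoding of a factored state as the set of true Boolean variables are illustrative assumptions, not from the talk:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set

State = FrozenSet[str]   # factored state: the set of Boolean variables that are true
Action = str

@dataclass
class MDP:
    states: Set[State]                                         # S
    actions: Set[Action]                                       # A
    transition: Callable[[State, Action], Dict[State, float]]  # Pr(s' | s, a)
    cost: Callable[[Action], float]                            # C: A -> R
    s0: State                                                  # start state
    goals: Set[State]                                          # absorbing goals G
```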

  6. GOAL of an MDP • Find a policy π: S → A which • minimises the expected cost of reaching a goal • for a fully observable Markov decision process • when the agent executes over an indefinite horizon.

  7. Equations: optimal policy • Define J*(s) (the optimal cost) as the minimum expected cost to reach a goal from s. • J* should satisfy the Bellman equations: J*(s) = 0 if s ∈ G, and otherwise J*(s) = min over a ∈ Ap(s) of [ C(a) + Σs' Pr(s'|s,a) · J*(s') ]

  8. Bellman Backup [diagram: at state s, each applicable action a1, a2, a3 ∈ Ap(s) gets a value Qn+1(s,a) computed from the Jn values of its possible successors; Jn+1(s) is the min of Qn+1(s,a) over Ap(s)]
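
A sketch of this backup over the illustrative `MDP` structure above; `J` is a dict of current cost estimates (missing entries default to the admissible value 0), and `applicable(s)` stands in for Ap(s):

```python
def bellman_backup(mdp, J, s, applicable):
    """One min-backup at s: Jn+1(s) = min over a in Ap(s) of Qn+1(s, a)."""
    def q(a):
        # Qn+1(s, a) = C(a) + sum over s' of Pr(s'|s,a) * Jn(s')
        return mdp.cost(a) + sum(p * J.get(s2, 0.0)
                                 for s2, p in mdp.transition(s, a).items())
    best = min(applicable(s), key=q)   # greedy (minimising) action
    return q(best), best
```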

  9. RTDP Trial [diagram: at s, the greedy action amin = a2 (the minimiser of Qn+1(s,a)) is chosen after the backup; a successor is sampled and the trial continues until the goal]

  10. Real Time Dynamic Programming (Barto, Bradtke and Singh '95) • Trial: simulate the greedy policy; perform Bellman backups on visited states • Repeat RTDP trials until the cost function converges • Anytime behaviour • Only expands the reachable state space • Complete convergence is slow • Labeled RTDP (Bonet & Geffner '03) • Admissible, if started with an admissible (optimistic, lower-bound) cost function • Monotonic; converges quickly
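
A minimal sketch of one RTDP trial under the same illustrative interfaces (the labeling and convergence machinery of Labeled RTDP is omitted):

```python
import random

def rtdp_trial(mdp, J, applicable, max_steps=1000):
    """Simulate the greedy policy from s0, backing up every visited state."""
    s, steps = mdp.s0, 0
    while s not in mdp.goals and steps < max_steps:
        J[s], a = bellman_backup(mdp, J, s, applicable)  # backup + greedy action
        dist = mdp.transition(s, a)                      # Pr(. | s, a)
        s = random.choices(list(dist), weights=list(dist.values()))[0]
        steps += 1
```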

  11. Concurrent MDP (CoMDP) (Mausam & Weld '04) • Allows concurrent combinations of actions • Safe execution: inherit mutex definitions from classical planning: • conflicting preconditions • conflicting effects • interfering preconditions and effects
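
A sketch of the mutex test; the `pre`/`eff` literal sets on actions (with negation written as a leading "-") are hypothetical fields for illustration:

```python
def mutex(a1, a2):
    """Actions conflict on preconditions, on effects, or when one action's
    effects interfere with the other's preconditions."""
    def clash(lits1, lits2):
        # does some literal in lits1 have its negation in lits2?
        negate = lambda l: l[1:] if l.startswith("-") else "-" + l
        return any(negate(l) in lits2 for l in lits1)
    return (clash(a1.pre, a2.pre) or clash(a1.eff, a2.eff)
            or clash(a1.pre, a2.eff) or clash(a1.eff, a2.pre))
```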

  12. Bellman Backup (CoMDP) [diagram: at s, the backup now minimises over every applicable action combination ({a1}, {a2}, {a3}, {a1,a2}, {a1,a3}, {a2,a3}, {a1,a2,a3}), so Ap(s) is exponential. Exponential blowup to calculate a Bellman backup!]

  13. Sampled RTDP • RTDP with Stochastic (partial) backups: • Approximate • Always try the last best combination • Randomly sample a few other combinations • In practice • Close to optimal solutions • Converges very fast
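
A sketch of the sampled (partial) backup; it assumes the transition model has been lifted to action combinations (`combos` is a list of frozensets of actions) and that a combination's cost is the sum of its action costs, both assumptions for illustration:

```python
import random

def sampled_backup(mdp, J, s, combos, last_best, n_samples=10):
    """Back up over the previous best combination plus a few random samples,
    instead of all exponentially many non-mutex combinations."""
    pool = set(random.sample(combos, min(n_samples, len(combos))))
    pool.add(last_best)                       # always retry the last best combo
    def q(combo):
        return (sum(mdp.cost(a) for a in combo)
                + sum(p * J.get(s2, 0.0)
                      for s2, p in mdp.transition(s, combo).items()))
    best = min(pool, key=q)
    return q(best), best
```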

  14. Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work

  15. Modelling CPTP as a CoMDP • Model explicit action durations • Minimise expected make-span • If we initialise C(a) as its duration Δ(a): • a CoMDP gives aligned epochs (decide only when all running actions have finished) • CPTP needs interwoven epochs (decide whenever some action finishes)

  16. Augmented state space [timeline diagram: actions a through h executing over times 0, 3, 6, 9, starting from world state X] • ⟨X, ∅⟩: no actions currently executing • ⟨X1, {(a,1), (c,3)}⟩: X1 is the application of b on X (a and c still executing, with 1 and 3 time units remaining) • ⟨X2, {(h,1)}⟩: X2 is the application of a, b, c, d and e on X

  17. Simplifying assumptions • All actions have deterministic durations. • All action durations are integers. • Action model: • preconditions must hold until the end of the action • effects become usable only at the end of the action • Properties: • mutex rules are still required • it is sufficient to consider only the epochs at which some action ends

  18. Completing the CoMDP • Redefine: • the applicability set • the transition function • the start and goal states • Example: the transition function is redefined so that the agent moves forward in time to an epoch where some action completes • Start state: ⟨s0, ∅⟩ • etc.
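
A sketch of the redefined transition's time component, assuming (as on slide 16) that an interwoven state pairs a world state with a set of (action, remaining-duration) pairs:

```python
def advance(world, executing):
    """Move forward to the next epoch at which some executing action completes.
    `executing` is a frozenset of (action, remaining_duration) pairs."""
    if not executing:
        return world, executing, 0
    dt = min(rem for _, rem in executing)                 # next completion time
    done = [a for a, rem in executing if rem == dt]       # actions finishing now
    still = frozenset((a, rem - dt) for a, rem in executing if rem > dt)
    # applying the (stochastic) effects of the `done` actions to `world`
    # yields a distribution over successor world states; omitted here
    return world, still, dt
```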

  19. Solution • CPTP = CoMDP in interwoven state space. • Thus one may use our sampled RTDP (etc) • PROBLEM: Exponential blowup in the size of the state space.

  20. Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Solution 1 : Two heuristics to guide the search • Solution 2 : Hybridisation • Experiments & Conclusions • Related & Future Work

  21. Max Concurrency Heuristic (MC) [diagram: serialising concurrent actions a, b, c on the way from X to G; J*(⟨X,∅⟩) = 10 becomes J*(X) ≤ 20 after serialisation] • Define c: the maximum number of actions executable concurrently in the domain (c = 2 in the example) • Serialisation stretches the make-span by at most a factor of c: J*(X) ≤ c × J*(⟨X,∅⟩) • Hence J*(⟨X,∅⟩) ≥ J*(X)/c: an admissible heuristic
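
A one-line sketch of MC: given any lower bound `J_serial` on the serial problem's cost and the domain's max concurrency `c`, dividing by `c` stays admissible for the interwoven state (interfaces illustrative):

```python
def mc_heuristic(J_serial, c):
    """Max Concurrency heuristic: J*(<X, empty>) >= J*(X) / c."""
    return lambda world_state: J_serial(world_state) / c
```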

  22. Eager Effects Heuristic: solving a relaxed problem • Relaxed state space: S × ℤ • Let (X, d) be a state where: • X is the world state • d is the time remaining for all actions (started anytime in the history) to complete execution • Start state: (s0, 0) • Goal states: { (X, 0) | X ∈ G }

  23. Eager Effects Heuristic (contd.) [diagram: from X, actions a, b and c with durations 8, 2 and 4 start together; after 2 units the relaxed state is (V, 6)] • Allow all actions, even when mutex with a or c (hence the name Eager Effects!) • Allowing inapplicable actions to execute: optimistic! • Assuming knowledge of action effects ahead of time: optimistic! • Therefore an admissible heuristic

  24. Solution 2: Hybridisation • Observations: • the aligned-epoch policy is sub-optimal • but fast to compute • the interwoven-epoch policy is optimal • but slow to compute • Solution: produce a hybrid policy, i.e.: • output the interwoven policy for probable states • output the aligned policy for improbable states

  25. Path to goals [diagram: from s, high-probability paths lead to a goal G, while a low-probability branch reaches another goal G through rarely visited states]

  26. Hybrid algorithm (contd.) • Observation: RTDP explores probable branches much more than others. • Algorithm(m, k, r): loop • do m RTDP trials; let the current value of the start state be J(s0) • output a hybrid policy π: • the interwoven policy for states visited > k times • the aligned policy for other states • evaluate the policy π: Jπ(s0) • stop if Jπ(s0) − J(s0) < r · J(s0) • (J(s0) is less than optimal; Jπ(s0) is greater than optimal)
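
A sketch of that loop; `run_trials`, `make_hybrid` and `eval_policy` are illustrative callables standing in for the RTDP trials, the visit-count split and the policy evaluation:

```python
def hybrid_plan(run_trials, make_hybrid, eval_policy, m, k, r, max_rounds=100):
    """Loop until the gap between upper bound Jpi(s0) and lower bound J(s0)
    falls within the optimality ratio r."""
    policy = None
    for _ in range(max_rounds):
        J_s0, visit_counts = run_trials(m)      # m RTDP trials; J(s0) <= optimal
        policy = make_hybrid(visit_counts, k)   # interwoven if visits > k, else aligned
        J_pi_s0 = eval_policy(policy)           # Jpi(s0) >= optimal
        if J_pi_s0 - J_s0 < r * J_s0:           # stopping rule from the slide
            break
    return policy
```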

  27. Hybridisation • Outputs a proper policy: • the policy is defined at all states reachable under it • the policy is guaranteed to take the agent to a goal • Has an optimality-ratio parameter (r): • controls the balance between optimality and running time • Can be used as an anytime algorithm • Is general: • we can hybridise two algorithms in other cases • e.g. in solving the original concurrent MDP

  28. Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work

  29. Experiments • Domains • Rover • MachineShop • Artificial • State Variables: 14-26 • Durations: 1-20

  30. Speedups in the Rover domain [graph]

  31. Quality of solutions [graph]

  32. Experiments : Summary • Max Concurrency heuristic • Fast to compute • Speeds up the search. • Eager Effects heuristic • High quality • Can be expensive in some domains. • Hybrid algorithm • Very fast • Produces good quality solutions. • Aligned epoch model • Superfast • Outputs poor quality solutions at times.

  33. Related Work • Prottle (Little, Aberdeen, Thiebaux’05) • Generate, test and debug paradigm (Younes & Simmons’04) • Concurrent options (Rohanimanesh & Mahadevan’04)

  34. Future Work • Other applications of hybridisation • CoMDP • MDP • OverSubscription Planning • Relaxing the assumptions • Handling mixed costs • Extending to PDDL2.1 • Stochastic action durations • Extensions to metric resources • State space compression/aggregation
