Concurrent Probabilistic Temporal Planning (CPTP) Mausam Joint work with Daniel S. Weld University of Washington Seattle
Motivation • Three features of real-world planning domains: • Durative actions • All actions (navigation between sites, placing instruments, etc.) take time. • Concurrency • Some instruments may warm up • while others perform their tasks • and others shut down to save power. • Uncertainty • All actions (pick up the rock, send data, etc.) have a probability of failure.
Motivation (contd.) • Concurrent temporal planning (widely studied with deterministic effects) • Extends classical planning • Doesn’t easily extend to probabilistic outcomes. • Concurrent planning with uncertainty (Concurrent MDPs – AAAI’04) • Handles combinations of actions over an MDP • Actions take unit time. • Few planners handle all three in concert!
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work
Markov Decision Process (all actions have unit duration) • S : a set of states, factored into Boolean variables • A : a set of actions • Pr : S × A × S → [0,1] : the transition model • C : A → ℝ : the cost model • s0 : the start state • G : a set of absorbing goals
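A minimal, illustrative Python container for these MDP components (not the authors' code); states are factored into Boolean variables and are represented here as frozensets of the variables that are true.

    from dataclasses import dataclass
    from typing import Callable, Dict, FrozenSet, Set

    State = FrozenSet[str]

    @dataclass
    class MDP:
        states: Set[State]
        actions: Set[str]
        transition: Callable[[State, str], Dict[State, float]]  # Pr(s' | s, a)
        cost: Callable[[str], float]                             # C(a)
        s0: State                                                # start state
        goals: Set[State]                                        # absorbing goals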
Goal of an MDP • Find a policy π : S → A which: • minimises the expected cost of reaching a goal • for a fully observable Markov decision process • when the agent executes over an indefinite horizon.
Equations: optimal policy • Define J*(s) (the optimal cost) as the minimum expected cost to reach a goal from s. • J* should satisfy the Bellman equation: J*(s) = min over a ∈ Ap(s) of [ C(a) + Σ_{s'} Pr(s' | s, a) · J*(s') ]
Bellman Backup (diagram): for each applicable action a ∈ Ap(s), Q_{n+1}(s, a) is computed from the current values J_n of the successor states; J_{n+1}(s) is the minimum of Q_{n+1}(s, a) over a ∈ Ap(s).
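A minimal Python sketch of one such backup for a cost-minimising MDP; applicable(s), transition(s, a) and cost(a) are hypothetical helpers, with transition returning (successor, probability) pairs.

    def bellman_backup(s, J, applicable, transition, cost):
        """Return (J_{n+1}(s), greedy action) given the current value table J."""
        best_q, best_a = float("inf"), None
        for a in applicable(s):                                   # a in Ap(s)
            # Q_{n+1}(s, a) = C(a) + sum_{s'} Pr(s' | s, a) * J_n(s')
            q = cost(a) + sum(p * J[s2] for s2, p in transition(s, a))
            if q < best_q:
                best_q, best_a = q, a
        return best_q, best_a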
RTDP Trial (diagram): from state s, a Bellman backup sets J_{n+1}(s) and identifies the greedy action a_min (here a2); the trial simulates a_min, moves to a sampled successor state, and repeats until the goal is reached.
Real Time Dynamic Programming (Barto, Bradtke and Singh ’95) • Trial: simulate the greedy policy; perform a Bellman backup on each visited state • Repeat RTDP trials until the cost function converges • Anytime behaviour • Only expands the reachable state space • Complete convergence is slow • Labeled RTDP (Bonet & Geffner ’03) • Admissible, if started with an admissible cost function (an optimistic lower bound) • Monotonic; converges quickly
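A compact Python sketch of RTDP trials, assuming a backup(s, J) helper that returns (updated value, greedy action) as in the bellman_backup sketch above and a sample_outcome(s, a) helper that simulates one stochastic outcome; these names are illustrative, not the authors' implementation.

    def rtdp_trial(s0, goals, J, backup, sample_outcome, max_depth=1000):
        s = s0
        for _ in range(max_depth):
            if s in goals:
                break
            J[s], a = backup(s, J)         # Bellman backup on the visited state
            s = sample_outcome(s, a)       # simulate the greedy action

    def rtdp(s0, goals, J, backup, sample_outcome, num_trials=1000):
        # Repeat trials until the cost function (approximately) converges;
        # start J from an admissible (optimistic) lower bound.
        for _ in range(num_trials):
            rtdp_trial(s0, goals, J, backup, sample_outcome)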
Concurrent MDP (CoMDP)(Mausam & Weld’04) • Allows concurrent combinations of actions • Safe execution: Inherit mutex definitions from classical planning: • Conflicting preconditions • Conflicting effects • Interfering preconditions and effects
Bellman Backup in a CoMDP (diagram): the backup at s now minimises over all applicable action combinations ({a1}, {a2}, {a3}, {a1,a2}, {a1,a3}, {a2,a3}, {a1,a2,a3}), so there is an exponential blowup in the cost of a single Bellman backup!
Sampled RTDP • RTDP with Stochastic (partial) backups: • Approximate • Always try the last best combination • Randomly sample a few other combinations • In practice • Close to optimal solutions • Converges very fast
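A rough Python sketch of such a sampled (partial) backup: instead of evaluating every applicable combination, it evaluates the previously best combination plus a few random samples. applicable_combos(s) and q_value(s, A, J) are hypothetical helpers, and each combination A is assumed to be a hashable tuple of actions.

    import random

    def sampled_backup(s, J, applicable_combos, q_value, last_best, num_samples=10):
        combos = list(applicable_combos(s))
        candidates = set(random.sample(combos, min(num_samples, len(combos))))
        if s in last_best:
            candidates.add(last_best[s])               # always retry the last best
        best = min(candidates, key=lambda A: q_value(s, A, J))
        last_best[s] = best
        return q_value(s, best, J), best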
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work
Modelling CPTP as a CoMDP • Model explicit action durations • Minimise expected make-span • Initialise the cost C(a) of each action as its duration Δ(a) : • CoMDP : aligned epochs (decide only when all executing actions have completed) • CPTP : interwoven epochs (decide whenever some executing action completes)
Augmented state space (timeline figure: actions a–h executed over times 0–9, starting from world state X) • ⟨X, ∅⟩ • ⟨X1, {(a,1), (c,3)}⟩, where X1 is the result of applying b to X • ⟨X2, {(h,1)}⟩, where X2 is the result of applying a, b, c, d and e to X
Simplifying assumptions • All actions have deterministic durations. • All action durations are integers. • Action model • Preconditions must hold until the end of the action. • Effects are usable only at the end of the action. • Properties: • Mutex rules are still required. • Sufficient to consider only the epochs at which some action ends.
Completing the CoMDP • Redefine • the applicability set • the transition function • the start and goal states • Example: • The transition function is redefined: the agent moves forward in time to the next epoch at which some action completes • Start state: ⟨s0, ∅⟩ • etc.
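A small Python sketch of the interwoven-epoch time advance described above: after choosing actions, the agent jumps forward to the next epoch at which some executing action completes. Stochastic effects are hidden behind a hypothetical apply_effects helper; none of these names are from the paper.

    def advance_to_next_epoch(world_state, executing, apply_effects):
        """executing maps each running action to its remaining duration."""
        dt = min(executing.values())                 # time until the next completion
        finished = [a for a, rem in executing.items() if rem == dt]
        for a in finished:                           # apply their (stochastic) effects
            world_state = apply_effects(world_state, a)
            del executing[a]
        for a in executing:                          # the others keep running
            executing[a] -= dt
        return world_state, executing, dt            # dt contributes to the make-span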
Solution • CPTP = CoMDP in interwoven state space. • Thus one may use our sampled RTDP (etc) • PROBLEM: Exponential blowup in the size of the state space.
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Solution 1 : Two heuristics to guide the search • Solution 2 : Hybridisation • Experiments & Conclusions • Related & Future Work
Max Concurrency Heuristic (MC) • Define c : the maximum number of actions executable concurrently in the domain • Serialisation (figure): a concurrent plan from X to G running a, b, c in parallel can be serialised into a sequential plan; here J*(⟨X, ∅⟩) = 10 implies J*(X) ≤ 20 (c = 2) • In general, J*(X) ≤ c × J*(⟨X, ∅⟩) • Hence J*(⟨X, ∅⟩) ≥ J*(X) / c : an admissible heuristic
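A one-line Python sketch of using this bound as a heuristic; J_mdp is assumed to be a (heuristic or exact) value function for the underlying serial, unit-concurrency MDP.

    def mc_heuristic(X, J_mdp, c):
        """Admissible lower bound on J*(<X, {}>) in the interwoven CoMDP."""
        return J_mdp(X) / c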
Eager Effects Heuristic: solving a relaxed problem • Relaxed state space: S × ℤ • Let (X, d) be a state where • X is the world state • d is the time remaining for all actions (started anytime in the history) to complete execution • Start state: (s0, 0) • Goal states: { (X, 0) | X ∈ G }
Eager Effects Heuristic (contd.) • Figure: from state X, actions a (duration 8), b (duration 2) and c (duration 4) are started; after 2 units the relaxed state is (V, 6) • Allow all actions, even when mutex with a or c! • Allowing inapplicable actions to execute: optimistic • Assuming knowledge of action effects ahead of time: optimistic (hence the name Eager Effects!) • Admissible heuristic
Solution 2: Hybridisation • Observations • The aligned-epoch policy is sub-optimal • but fast to compute. • The interwoven-epoch policy is optimal • but slow to compute. • Solution: produce a hybrid policy, i.e.: • output the interwoven policy for probable states • output the aligned policy for improbable states.
Path to goals (figure): trajectories from the start state s to goal states G, including low-probability branches that also reach G.
Hybrid algorithm (contd.) • Observation: RTDP explores probable branches much more than others. • Algorithm(m, k, r): Loop • Do m RTDP trials; let the current value of the start state be J(s0) (a lower bound: less than optimal). • Output a hybrid policy π : • interwoven policy for states visited > k times • aligned policy for other states. • Evaluate the policy: Jπ(s0) (an upper bound: greater than optimal). • Stop if Jπ(s0) − J(s0) < r · J(s0)
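A Python sketch of this hybridisation loop with parameters m (trials per round), k (visit-count threshold) and r (optimality ratio). run_rtdp_trials, build_hybrid_policy and evaluate_policy are hypothetical helpers standing in for the steps named on the slide; they are not the authors' code.

    def hybrid_cptp(s0, m, k, r, run_rtdp_trials, build_hybrid_policy, evaluate_policy):
        visits = {}                                    # state -> RTDP visit count
        while True:
            J_lower = run_rtdp_trials(s0, m, visits)   # J(s0): lower bound on optimal
            # Interwoven policy where visits > k, aligned-epoch policy elsewhere.
            policy = build_hybrid_policy(visits, k)
            J_pi = evaluate_policy(policy, s0)         # J_pi(s0): an upper bound
            if J_pi - J_lower < r * J_lower:           # stop when the gap is small
                return policy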
Hybridisation • Outputs a proper policy: • the policy is defined at all states reachable under it • the policy is guaranteed to take the agent to a goal. • Has an optimality-ratio parameter (r) • controls the balance between optimality and running time. • Can be used as an anytime algorithm. • Is general: • we can hybridise two algorithms in other cases • e.g. in solving the original concurrent MDP.
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work
Experiments • Domains • Rover • MachineShop • Artificial • State Variables: 14-26 • Durations: 1-20
Experiments : Summary • Max Concurrency heuristic • Fast to compute • Speeds up the search. • Eager Effects heuristic • High quality • Can be expensive in some domains. • Hybrid algorithm • Very fast • Produces good quality solutions. • Aligned epoch model • Superfast • Outputs poor quality solutions at times.
Related Work • Prottle (Little, Aberdeen, Thiebaux’05) • Generate, test and debug paradigm (Younes & Simmons’04) • Concurrent options (Rohanimanesh & Mahadevan’04)
Future Work • Other applications of hybridisation • CoMDP • MDP • OverSubscription Planning • Relaxing the assumptions • Handling mixed costs • Extending to PDDL2.1 • Stochastic action durations • Extensions to metric resources • State space compression/aggregation