Concurrent Probabilistic Temporal Planning (CPTP) • Mausam • Joint work with Daniel S. Weld • University of Washington, Seattle
Motivation • Three features of real-world planning domains: • Durative actions • All actions (navigation between sites, placing instruments, etc.) take time. • Concurrency • Some instruments may warm up • Others may perform their tasks • Others may shut down to save power. • Uncertainty • All actions (pick up the rock, send data, etc.) have a probability of failure.
Motivation (contd.) • Concurrent Temporal Planning (widely studied with deterministic effects) • Extends classical planning • Doesn’t easily extend to probabilistic outcomes. • Concurrent planning with uncertainty (Concurrent MDPs – AAAI’04) • Handle combinations of actions over an MDP • Actions take unit time. • Few planners handle the three in concert!
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work
Markov Decision Process (unit-duration actions) • S : a set of states, factored into Boolean variables. • A : a set of actions • Pr : S × A × S → [0, 1] : the transition model • C : A → ℝ : the cost model • s0 : the start state • G : a set of absorbing goals
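As a rough illustration (not part of the talk), the MDP tuple above might be represented as follows in Python; the field names and the explicit `applicable` function for Ap(s) are assumptions reused in the later sketches:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set

State = FrozenSet[str]   # a factored state: the set of Boolean variables that are true
Action = str

@dataclass
class MDP:
    actions: Set[Action]                                        # A
    applicable: Callable[[State], Set[Action]]                  # Ap(s), the actions usable in s
    transition: Callable[[State, Action], Dict[State, float]]   # Pr(s' | s, a)
    cost: Callable[[Action], float]                             # C(a)
    start: State                                                # s0
    goals: Set[State]                                           # G, absorbing
    # S itself is left implicit in the factored representation
```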
GOAL of an MDP • Find a policy π : S → A which: • minimises expected cost of reaching a goal • for a fully observable Markov decision process • if the agent executes for an indefinite horizon.
Equations: optimal policy • Define J*(s) (the optimal cost) as the minimum expected cost to reach a goal from s. • J* should satisfy the equation below:
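The equation itself is not reproduced in this transcript; the standard Bellman optimality condition for this minimum-expected-cost (stochastic shortest path) setting, consistent with the definitions above, is:

```latex
J^*(s) = \begin{cases}
  0 & \text{if } s \in \mathcal{G},\\[2pt]
  \min\limits_{a \in Ap(s)} \Big[ C(a) + \sum_{s'} \Pr(s' \mid s, a)\, J^*(s') \Big] & \text{otherwise.}
\end{cases}
```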
Bellman Backup (diagram): for each applicable action a ∈ Ap(s) (here a1, a2, a3), compute Qn+1(s,a) from the successors' Jn values; Jn+1(s) is the minimum of these Q-values.
RTDP Trial (diagram): at the current state s, back up Jn+1(s) = min over a of Qn+1(s,a), take the greedy action amin (here a2), sample a successor, and repeat until the goal is reached.
Real Time Dynamic Programming (Barto, Bradtke and Singh '95) • Trial: simulate the greedy policy, performing a Bellman backup on each visited state • Repeat RTDP trials until the cost function converges • Anytime behaviour • Only expands the reachable state space • Complete convergence is slow • Labeled RTDP (Bonet & Geffner '03) • Admissible, if started with an admissible (optimistic, lower-bound) cost function • Monotonic; converges quickly
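A minimal Python sketch of an RTDP trial as described above, reusing the assumed MDP fields from the earlier sketch; initialising unseen states to 0.0 stands in for an admissible (optimistic) heuristic:

```python
import random

def bellman_backup(J, mdp, s):
    """Back up J[s] over the applicable actions and return the greedy action."""
    best_a, best_q = None, float("inf")
    for a in mdp.applicable(s):
        q = mdp.cost(a) + sum(p * J.get(s2, 0.0)   # 0.0: an admissible (optimistic) default
                              for s2, p in mdp.transition(s, a).items())
        if q < best_q:
            best_a, best_q = a, q
    J[s] = best_q
    return best_a

def rtdp_trial(J, mdp):
    """One trial: simulate the greedy policy from s0, backing up every visited state."""
    s = mdp.start
    while s not in mdp.goals:
        a = bellman_backup(J, mdp, s)
        succs, probs = zip(*mdp.transition(s, a).items())
        s = random.choices(succs, weights=probs)[0]   # sample the next state
```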
Concurrent MDP (CoMDP)(Mausam & Weld’04) • Allows concurrent combinations of actions • Safe execution: Inherit mutex definitions from classical planning: • Conflicting preconditions • Conflicting effects • Interfering preconditions and effects
Bellman Backup (CoMDP) (diagram): the backup now minimises over all applicable combinations of a1, a2, a3 ({a1}, {a2}, {a3}, {a1,a2}, {a1,a3}, {a2,a3}, {a1,a2,a3}): an exponential blowup to calculate a single Bellman backup!
Sampled RTDP • RTDP with Stochastic (partial) backups: • Approximate • Always try the last best combination • Randomly sample a few other combinations • In practice • Close to optimal solutions • Converges very fast
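A possible sketch of such a stochastic partial backup; the `comdp` object, its combination-level `transition`, its `mutex` test, and the additive combination cost are all assumptions for illustration rather than the paper's exact sampling scheme:

```python
import random

def q_value(J, comdp, s, combo):
    """Q(s, combo): cost of the combination plus expected cost-to-go (one plausible cost model)."""
    cost = sum(comdp.cost(a) for a in combo)
    return cost + sum(p * J.get(s2, 0.0)
                      for s2, p in comdp.transition(s, combo).items())

def sampled_backup(J, comdp, s, last_best=None, num_samples=20, max_tries=200):
    """Partial backup: evaluate singletons, the last best combination, and a few random subsets."""
    acts = list(comdp.applicable(s))
    candidates = {(a,) for a in acts}            # always include the single actions
    if last_best:
        candidates.add(last_best)                # always try the last best combination
    tries = 0
    while len(candidates) < num_samples and tries < max_tries:
        tries += 1
        combo = tuple(sorted(a for a in acts if random.random() < 0.5))   # random subset
        if combo and not any(comdp.mutex(a, b)
                             for i, a in enumerate(combo) for b in combo[i + 1:]):
            candidates.add(combo)                # keep only pairwise non-mutex combinations
    best = min(candidates, key=lambda c: q_value(J, comdp, s, c))
    J[s] = q_value(J, comdp, s, best)
    return best
```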
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work
Modelling CPTP as a CoMDP • Model explicit action durations • Minimise expected make-span • If we initialise C(a) as the action's duration ∆(a), two formulations follow: an aligned-epochs CoMDP and an interwoven-epochs CPTP.
Augmented state space (timeline diagram, epochs 0, 3, 6, 9 with actions a–h): a state pairs the world state with the executing actions and their remaining durations, e.g. ⟨X, ∅⟩; ⟨X1, {(a,1), (c,3)}⟩ where X1 is the application of b on X; ⟨X2, {(h,1)}⟩ where X2 is the application of a, b, c, d and e on X.
Simplifying assumptions • All actions have deterministic durations. • All action durations are integers. • Action model • Preconditions must hold until the end of the action. • Effects are usable only at the end of the action. • Properties: • Mutex rules are still required. • Sufficient to consider only epochs at which some action ends.
Completing the CoMDP • Redefine • Applicability set • Transition function • Start and goal states. • Example: the transition function is redefined so that the agent moves forward in time to an epoch where some executing action completes. • Start state: ⟨s0, ∅⟩ • etc.
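A sketch of the interwoven state and of the deterministic bookkeeping part of the redefined transition (advancing time to the epoch at which the earliest executing action completes); the representation is illustrative, not the paper's exact encoding:

```python
from typing import FrozenSet, Tuple

WorldState = FrozenSet[str]
Executing = FrozenSet[Tuple[str, int]]          # pairs (action, time remaining until completion)
InterwovenState = Tuple[WorldState, Executing]  # the start state would be (s0, frozenset())

def advance_to_next_epoch(state: InterwovenState):
    """Advance time to the next epoch at which some executing action completes.

    Returns the interwoven state with the completed actions removed, the actions
    that complete at that epoch (whose probabilistic effects must then be applied
    to the world state), and the elapsed time (which adds to the make-span).
    """
    world, executing = state
    if not executing:
        raise ValueError("no executing actions to advance over")
    dt = min(remaining for _, remaining in executing)
    completed = tuple(a for a, remaining in executing if remaining == dt)
    still_running = frozenset((a, remaining - dt)
                              for a, remaining in executing if remaining > dt)
    return (world, still_running), completed, dt
```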
Solution • CPTP = CoMDP in the interwoven state space. • Thus one may use our sampled RTDP (etc.) • PROBLEM: Exponential blowup in the size of the state space.
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Solution 1 : Two heuristics to guide the search • Solution 2 : Hybridisation • Experiments & Conclusions • Related & Future Work
Max Concurrency Heuristic (MC) • Define c: the maximum number of actions executable concurrently in the domain. • Serialisation (diagram, c = 2): an interwoven plan from X to G with J*(⟨X,∅⟩) = 10 serialises into a plan of cost at most 20, so J*(X) ≤ 2 · J*(⟨X,∅⟩) = 20. • In general, J*(X) ≤ c · J*(⟨X,∅⟩), i.e. J*(⟨X,∅⟩) ≥ J*(X)/c: an admissible heuristic.
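A tiny sketch of how the MC heuristic could be computed from a value function `J_serial` for the serialised, unit-concurrency problem; the talk states the bound for states ⟨X, ∅⟩, and applying it via the world-state component of other states is shown here only as an illustration:

```python
def max_concurrency_heuristic(J_serial, c):
    """h_MC(<X, Y>) = J_serial(X) / c, where J_serial is an (admissible) cost estimate
    for the serial problem and c is the maximum number of concurrently executable actions."""
    def h(interwoven_state):
        world_state, _executing = interwoven_state
        return J_serial[world_state] / c
    return h
```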
Eager Effects Heuristic: solving a relaxed problem • Relaxed state space: S × ℤ • Let (X, d) be a state where • X is the world state. • d : time remaining for all actions (started at any point in the history) to complete execution. • Start state: (s0, 0) • Goal states: { (X, 0) | X ∈ G }
Eager Effects Heuristic (contd.) (diagram): actions a, b, c with durations 8, 2 and 4 are executing in X; after 2 units the relaxed state is (V, 6). • All actions are allowed to start, even when mutex with a or c; allowing inapplicable actions to execute is optimistic. • Information about action effects is assumed ahead of time, which is also optimistic; hence the name Eager Effects. • Admissible heuristic.
Solution 2: Hybridisation • Observations • The aligned-epoch policy is sub-optimal • but fast to compute. • The interwoven-epoch policy is optimal • but slow to compute. • Solution: produce a hybrid policy, i.e.: • Output the interwoven policy for probable states. • Output the aligned policy for improbable states.
(Diagram) Paths to goals: branches from the start state s to goals G, some with low probability.
Hybrid algorithm (contd.) • Observation: RTDP explores probable branches much more than others. • Algorithm(m, k, r): loop • Do m RTDP trials; let the current value of the start state be J(s0) (less than optimal). • Output a hybrid policy π: • Interwoven policy for states visited > k times • Aligned policy for other states. • Evaluate the policy: Jπ(s0) (greater than optimal). • Stop if Jπ(s0) − J(s0) < r · J(s0).
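A schematic rendering of this loop; `aligned_policy` and `evaluate_policy` are assumed helpers, and `sampled_backup` refers to the earlier sketch:

```python
import random

def hybrid_plan(comdp, aligned_policy, evaluate_policy, m, k, r):
    """Hybridisation loop: run RTDP trials, build a hybrid policy, evaluate it,
    and stop once the policy value is within the optimality ratio r.

    aligned_policy(s)       -> combination from the fast, sub-optimal aligned-epoch policy
    evaluate_policy(policy) -> expected make-span of `policy` from the start state
    """
    J, visits = {}, {}
    while True:
        for _ in range(m):                                   # m (sampled) RTDP trials
            s = comdp.start
            while s not in comdp.goals:
                visits[s] = visits.get(s, 0) + 1
                combo = sampled_backup(J, comdp, s)          # greedy combination (earlier sketch)
                succs, probs = zip(*comdp.transition(s, combo).items())
                s = random.choices(succs, weights=probs)[0]
        lower = J[comdp.start]                               # RTDP value: optimistic (<= optimal)

        def policy(s):
            # interwoven policy where RTDP visited the state more than k times, aligned elsewhere
            if visits.get(s, 0) > k:
                return sampled_backup(J, comdp, s)
            return aligned_policy(s)

        upper = evaluate_policy(policy)                      # J_pi(s0): pessimistic (>= optimal)
        if upper - lower < r * lower:                        # stopping rule from the slide
            return policy
```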
Hybridisation • Outputs a proper policy: • Policy defined at all states reachable under the policy • Policy guaranteed to take the agent to a goal. • Has an optimality ratio (r) parameter • Controls the balance between optimality & running times. • Can be used as an anytime algorithm. • Is general: • we can hybridise two algorithms in other cases • e.g. in solving the original concurrent MDP.
Outline of the talk • MDP and CoMDP • Concurrent Probabilistic Temporal Planning • Concurrent MDP in augmented state space. • Solution Methods for CPTP • Two heuristics to guide the search • Hybridisation • Experiments & Conclusions • Related & Future Work
Experiments • Domains • Rover • MachineShop • Artificial • State Variables: 14-26 • Durations: 1-20
Experiments : Summary • Max Concurrency heuristic • Fast to compute • Speeds up the search. • Eager Effects heuristic • High quality • Can be expensive in some domains. • Hybrid algorithm • Very fast • Produces good quality solutions. • Aligned epoch model • Superfast • Outputs poor quality solutions at times.
Related Work • Prottle (Little, Aberdeen, Thiebaux’05) • Generate, test and debug paradigm (Younes & Simmons’04) • Concurrent options (Rohanimanesh & Mahadevan’04)
Future Work • Other applications of hybridisation • CoMDP • MDP • OverSubscription Planning • Relaxing the assumptions • Handling mixed costs • Extending to PDDL2.1 • Stochastic action durations • Extensions to metric resources • State space compression/aggregation