A Hybridized Planner for Stochastic Domains Mausam and Daniel S. Weld University of Washington, Seattle Piergiorgio Bertoli ITC-IRST, Trento
Planning under Uncertainty (ICAPS’03 Workshop) • Qualitative (disjunctive) uncertainty • Which real problem can you solve? • Quantitative (probabilistic) uncertainty • Which real problem can you model?
The Quantitative View • Markov Decision Process • models uncertainty with probabilistic outcomes • general decision-theoretic framework • algorithms are slow • do we need the full power of decision theory? • is an unconverged partial policy any good?
The Qualitative View • Conditional Planning • models uncertainty as a logical disjunction of outcomes • exploits classical planning techniques ⇒ FAST • ignores probabilities ⇒ poor solutions • how bad are pure qualitative solutions? • can we improve the qualitative policies?
HybPlan: A Hybridized Planner • combine probabilistic + disjunctive planners • produces good solutions in intermediate times • anytime: makes effective use of resources • bounds termination with quality guarantee • Quantitative View • completes partial probabilistic policy by using qualitative policies in some states • Qualitative View • improves qualitative policies in more important regions
Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Conclusions and Future Work
Markov Decision Process • < S, A, Pr, C, s0, G > • S : a set of states • A : a set of actions • Pr : probabilistic transition model • C : cost model • s0 : start state • G : a set of goals • Find a policy π : S → A that • minimizes the expected cost to reach a goal • for an indefinite horizon • for a fully observable Markov decision process • Optimal cost function J* ⇒ optimal policy
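To make the tuple concrete, here is a minimal Python sketch (not from the talk) of one way to hold ⟨S, A, Pr, C, s0, G⟩; the class name, field layout, and q_value helper are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

State = Hashable
Action = Hashable

@dataclass
class MDP:
    S: List[State]                                             # set of states
    A: Callable[[State], List[Action]]                         # applicable actions in a state
    Pr: Callable[[State, Action], List[Tuple[State, float]]]   # (successor, probability) pairs
    C: Callable[[State, Action], float]                        # cost model
    s0: State                                                  # start state
    G: FrozenSet[State]                                        # set of goals

    def q_value(self, J: Dict[State, float], s: State, a: Action) -> float:
        # Q(s, a) = C(s, a) + sum over s' of Pr(s' | s, a) * J(s')
        return self.C(s, a) + sum(p * J.get(s2, 0.0) for s2, p in self.Pr(s, a))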
Example • [grid figure: start state s0 and Goal; one region offers a longer path, one region consists entirely of dead-ends, one region heads in the wrong direction but the goal is still reachable]
Optimal State Costs • [grid figure: optimal cost J* of each state, 0 at the Goal and increasing with distance from it]
Optimal Policy • [grid figure: optimal action for each state, pointing toward the Goal]
Real Time Dynamic Programming (Barto et al. ’95; Bonet & Geffner ’03) • Bellman Backup: create a better approximation to the cost function at s • Trial = simulate the greedy policy & update the visited states • Repeat trials until the cost function converges
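A hedged sketch of the trial loop just described, reusing the illustrative MDP class from the earlier sketch; the zero-initialised cost table and the trial/step limits are assumptions, not details from the talk.

import random

def bellman_backup(mdp, J, s):
    # Create a better approximation to the cost function at s:
    #   J(s) <- min over a of [ C(s, a) + sum over s' of Pr(s' | s, a) * J(s') ]
    q = {a: mdp.q_value(J, s, a) for a in mdp.A(s)}
    greedy = min(q, key=q.get)
    J[s] = q[greedy]
    return greedy

def rtdp_trial(mdp, J, max_steps=1000):
    # Trial = simulate the greedy policy and update the visited states.
    s = mdp.s0
    for _ in range(max_steps):
        if s in mdp.G:
            break
        a = bellman_backup(mdp, J, s)                 # back up, then act greedily
        succs, probs = zip(*mdp.Pr(s, a))
        s = random.choices(succs, weights=probs)[0]   # sample one outcome

def rtdp(mdp, n_trials=1000):
    # Repeat trials until the cost function converges (here: a fixed trial budget).
    J = {}                                            # unseen states default to cost 0
    for _ in range(n_trials):
        rtdp_trial(mdp, J)
    return J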
Planning with Disjunctive Uncertainty • < S, A, T, s0, G > • S : a set of states • A : a set of actions • T : disjunctive transition model • s0 : the start state • G : a set of goals • Find a strong-cyclic policy π : S → A • that guarantees reaching a goal • for an indefinite horizon • for a fully observable planning problem
Model Based Planner (Bertoli et al.) • states, transitions, etc. represented logically • uncertainty ⇒ multiple possible successor states • Planning algorithm • iteratively removes “bad” states • bad = states that reach nowhere or only reach other bad states
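MBP itself works symbolically (states and transitions encoded logically), so the following is only an explicit-state sketch of the same prune-bad-states fixpoint: repeatedly drop state-action pairs whose outcomes can leave the candidate set, and states from which the goal is no longer reachable. The function name and the assumption that T(s, a) returns a Python set are illustrative.

def strong_cyclic_policy(S, A, T, G):
    # T(s, a) is the SET of possible successors (disjunctive uncertainty).
    sa = {(s, a) for s in S if s not in G for a in A(s)}   # candidate state-action pairs
    while True:
        states = set(G) | {s for s, _ in sa}
        # Drop pairs whose outcomes may leave the candidate set ("reach other bad states").
        pruned = {(s, a) for (s, a) in sa if T(s, a) <= states}
        # Drop states from which the goal is unreachable ("reach nowhere").
        reach, changed = set(G), True
        while changed:
            changed = False
            for (s, a) in pruned:
                if s not in reach and T(s, a) & reach:
                    reach.add(s)
                    changed = True
        pruned = {(s, a) for (s, a) in pruned if s in reach}
        if pruned == sa:
            break
        sa = pruned
    return {s: a for (s, a) in sa}        # keep any surviving action per state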
MBP Policy • [grid figure: the MBP policy reaches the Goal but along a sub-optimal path]
Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Conclusions and Future Work
HybPlan Top Level Code
0. run MBP to find a solution to the goal
1. run RTDP for some time
2. compute the partial greedy policy (πrtdp)
3. compute the hybridized policy (πhyb) by • πhyb(s) = πrtdp(s) if visited(s) > threshold • πhyb(s) = πmbp(s) otherwise
4. clean πhyb by removing • dead-ends • probability-1 cycles
5. evaluate πhyb
6. save the best policy obtained so far
repeat steps 1–6 until 1) resources are exhausted or 2) a satisfactory policy is found
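An illustrative Python rendering of this loop. rtdp_trial (assumed here to also record visit counts), greedy_policy, evaluate, and remove_dead_ends_and_cycles stand in for the components the slide names; the time budget and target cost are assumed resource checks, not the authors' implementation.

import time

def hybplan(mdp, pi_mbp, threshold=0, time_budget=60.0, target_cost=float("inf")):
    # 0. MBP has already been run to obtain pi_mbp, a policy that reaches the goal.
    J, visits = {}, {}                                    # RTDP cost table and visit counts
    best_pi, best_cost = dict(pi_mbp), evaluate(mdp, pi_mbp)
    start = time.time()
    while time.time() - start < time_budget and best_cost > target_cost:
        rtdp_trial(mdp, J, visits)                        # 1. run RTDP for some time
        pi_rtdp = greedy_policy(mdp, J, visits)           # 2. partial greedy policy
        pi_hyb = dict(pi_mbp)                             # 3. hybridize: MBP action by default,
        for s, a in pi_rtdp.items():                      #    RTDP's greedy action where s was
            if visits.get(s, 0) > threshold:              #    visited more than `threshold` times
                pi_hyb[s] = a
        remove_dead_ends_and_cycles(mdp, pi_hyb, pi_mbp)  # 4. clean the hybridized policy
        cost = evaluate(mdp, pi_hyb)                      # 5. evaluate it
        if cost < best_cost:                              # 6. save the best policy so far
            best_pi, best_cost = pi_hyb, cost
    return best_pi, best_cost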
First RTDP Trial • 1. run RTDP for some time • [grid figure: all state costs initialized to 0]
Bellman Backup • 1. run RTDP for some time • Q1(s,N) = 1 + 0.5 × 0 + 0.5 × 0 = 1; Q1(s,S) = Q1(s,W) = Q1(s,E) = 1; so J1(s) = 1 • let the greedy action be North • [grid figure]
Simulation of Greedy Action • 1. run RTDP for some time • [grid figure: the greedy action North is simulated; the backed-up state now has cost 1]
Continuing First Trial • 1. run RTDP for some time • [grid figures: the trial continues, updating the costs of visited states to 1]
Finishing First Trial • [grid figure: the trial reaches the Goal; costs along the simulated path are now 1]
Cost Function after First Trial • [grid figure: the cost function after the first trial]
Partial Greedy Policy • 2. compute the partial greedy policy (πrtdp) • [grid figure: greedy actions for the states visited so far]
Construct Hybridized Policy w/ MBP • 3. compute the hybridized policy (πhyb), with threshold = 0 • [grid figure: RTDP’s greedy actions in the visited states, MBP’s actions elsewhere]
Evaluate Hybridized Policy • 5. evaluate πhyb • 6. store πhyb • after the first trial, J(πhyb) = 5 • [grid figure]
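Step 5 can be done with iterative policy evaluation of the fixed policy; a minimal sketch (not necessarily the authors' method), assuming the illustrative MDP class from before.

def evaluate(mdp, policy, iters=1000, tol=1e-6):
    # Approximate J_pi(s) = C(s, pi(s)) + sum over s' of Pr(s' | s, pi(s)) * J_pi(s')
    # by sweeping the states the policy covers until the values stop changing.
    J = {s: 0.0 for s in policy}
    for _ in range(iters):
        delta = 0.0
        for s, a in policy.items():
            if s in mdp.G:
                continue
            new = mdp.C(s, a) + sum(p * J.get(s2, 0.0) for s2, p in mdp.Pr(s, a))
            delta = max(delta, abs(new - J[s]))
            J[s] = new
        if delta < tol:
            break
    return J.get(mdp.s0, 0.0)             # expected cost of following the policy from s0

On the example above this evaluation yields J(πhyb) = 5 after the first trial, as on the slide.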
Second Trial • [grid figure: a second RTDP trial updates the cost function]
Partial Greedy Policy • [grid figure: greedy actions after the second trial]
Absence of MBP Policy • the MBP policy doesn’t exist here: no path to the goal (state marked ×) • [grid figure]
Third Trial • [grid figure: the cost function after a third RTDP trial]
Partial Greedy Policy • [grid figure: greedy actions after the third trial]
Probability 1 Cycles • repeat: find a state s in the cycle, set πhyb(s) = πmbp(s), until the cycle is broken • [grid figures: the greedy policy contains a probability-1 cycle, which is broken by switching states in the cycle to the MBP policy]
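A hedged sketch of the loop shown in these slides: walk the probability-1 edges the hybrid policy follows and, whenever a cycle appears, switch one of its states to the MBP action. The detector below only follows deterministic edges from the start state, a simplification; the function names are illustrative (this would be the cycle-handling half of the clean-up step 4).

def remove_prob1_cycles(mdp, pi_hyb, pi_mbp):
    # repeat: find a state s in a probability-1 cycle, set pi_hyb(s) = pi_mbp(s),
    # until no such cycle remains.
    while True:
        cycle = find_prob1_cycle(mdp, pi_hyb)
        if cycle is None:
            return
        s = cycle[0]
        pi_hyb[s] = pi_mbp[s]

def find_prob1_cycle(mdp, pi):
    # Follow edges taken with probability 1; report the cycle of states if one is hit.
    def prob1_succ(s):
        succs = mdp.Pr(s, pi[s])
        return succs[0][0] if len(succs) == 1 and succs[0][1] == 1.0 else None
    seen, path, s = {}, [], mdp.s0
    while s in pi and s not in mdp.G:
        if s in seen:
            return path[seen[s]:]             # the states forming the cycle
        seen[s] = len(path)
        path.append(s)
        s = prob1_succ(s)
        if s is None:
            return None
    return None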
Error Bound • J*(s0) ≤ 5 and J*(s0) ≥ 1 ⇒ Error(πhyb) = 5 − 1 = 4 • after the 1st trial, J(πhyb) = 5 • [grid figure]
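The arithmetic behind the bound: the evaluated hybrid policy gives an upper bound on J*(s0), while the lower bound of 1 is presumably RTDP's current estimate at s0 (which stays a lower bound when initialised with an admissible heuristic). In LaTeX:

J_{\mathrm{rtdp}}(s_0) \le J^{*}(s_0) \le J^{\pi_{\mathrm{hyb}}}(s_0)
\;\Longrightarrow\;
\mathrm{Error}(\pi_{\mathrm{hyb}}) = J^{\pi_{\mathrm{hyb}}}(s_0) - J_{\mathrm{rtdp}}(s_0) = 5 - 1 = 4 .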
Termination • when a policy with the required error bound is found • when the planning time is exhausted • when the available memory is exhausted • Properties • outputs a proper policy • anytime algorithm (once MBP terminates) • HybPlan = RTDP, if infinite resources are available • HybPlan = MBP, if resources are extremely limited • HybPlan = better than both, otherwise
Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Anytime Properties • Scalability • Conclusions and Future Work
Domains • NASA Rover Domain • Factory Domain • Elevator Domain
Anytime Properties • [plots: anytime behavior compared against RTDP]
Conclusions • First algorithm that integrates disjunctive and probabilistic planners • Experiments show that HybPlan • is anytime • scales better than RTDP • produces better-quality solutions than MBP • can interleave planning and execution
Hybridized Planning: A General Notion • Hybridize other pairs of planners: • an optimal or close-to-optimal planner • a sub-optimal but fast planner • to yield a planner that produces good-quality solutions in intermediate running times • Examples • POMDP: RTDP/PBVI with POND/MBP/BBSP • Oversubscription Planning: A* with greedy solutions • Concurrent MDP: Sampled RTDP with single-action RTDP