Tractable Planning for Real-World Robotics: The promises and challenges of dealing with uncertainty Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004
Robots in unstructured environments
A vision for robotic-assisted health-care • Providing information • Monitoring Rx adherence & safety • Calling for help in emergencies • Providing physical assistance • Reminding to eat, drink, & take meds • Moving things around • Supporting inter-personal communication
What is hard about planning for real-world robotics? * Highlights from a proposed robot grand challenge • Generating a plan that has high expected utility. • Switching between different representations of the world. • Resolving conflicts between interfering jobs. • Accomplishing jobs in changing, partly unknown environments. • Handling percepts which are incomplete, ambiguous, outdated, or incorrect.
Talk outline • Uncertainty in plan-based robotics • Partially Observable Markov Decision Processes (POMDPs) • POMDP solver #1: Point-based value iteration (PBVI) • POMDP solver #2: Policy-contingent abstraction (PolCA+)
Why use a POMDP? • POMDPs provide a rich framework for sequential decision-making, which can model: • Effect uncertainty • State uncertainty • Varying rewards across actions and goals
POMDP applications in the last decade Robotics: • Robust mobile robot navigation [Simmons & Koenig, 1995; + many more] • Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003] • Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996] • High-level robot control [Pineau et al., 2003] • Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000] Others: • Machine maintenance [Puterman, 1994] • Network troubleshooting [Thiebeaux et al., 1996] • Circuits testing [correspondence, 2004] • Preference elicitation [Boutilier, 2002] • Medical diagnosis [Hauskrecht, 1997]
POMDP model A POMDP is an n-tuple { S, A, Z, T, O, R }: S = state set A = action set Z = observation set T: Pr(s'|s,a) = state-to-state transition probabilities O: Pr(z|s,a) = observation generation probabilities R(s,a) = reward function [Figure: what goes on: hidden states s_t-1, s_t and rewards r_t-1, r_t; what we see: actions a_t-1, a_t and observations z_t-1, z_t; what we infer: beliefs b_t-1, b_t.]
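The belief b_t is maintained by Bayes filtering over the model components above. A minimal sketch in Python; the 2-state transition and observation numbers are made up purely for illustration:

```python
import numpy as np

# A toy 2-state POMDP (hypothetical numbers, for illustration only).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[a][s, s'] = Pr(s' | s, a)
              [[0.7, 0.3], [0.4, 0.6]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],    # O[a][s', z] = Pr(z | s', a)
              [[0.6, 0.4], [0.1, 0.9]]])

def belief_update(b, a, z):
    """Bayes filter: b'(s') ∝ O(z|s',a) * sum_s T(s'|s,a) b(s)."""
    predicted = T[a].T @ b                 # sum over s of T(s'|s,a) b(s)
    unnormalized = O[a][:, z] * predicted
    return unnormalized / unnormalized.sum()

b0 = np.array([0.5, 0.5])                  # uniform initial belief
b1 = belief_update(b0, a=0, z=1)           # belief after acting and observing
```

Each update stays on the belief simplex: the returned vector is non-negative and sums to one.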
Examples of robot beliefs [Figure: particle clouds over a map of the environment, showing a uniform belief and a bi-modal belief about the robot's position.]
Understanding the belief state • A belief is a probability distribution over states, so the belief space B is a simplex with Dim(B) = |S|-1. • E.g. for S={s1, s2} beliefs lie on the line segment P(s1) ∈ [0,1]; for S={s1, s2, s3} on a triangle over (P(s1), P(s2)); for S={s1, s2, s3, s4} on a tetrahedron.
POMDP solving Objective: Find the sequence of actions that maximizes the expected sum of rewards. Value function (immediate reward plus discounted future reward): V(b) = max_a [ R(b,a) + γ Σ_z Pr(z|b,a) V(b^a_z) ], where b^a_z is the belief reached from b after taking action a and observing z.
POMDP value function • Represent V(b) as the upper surface of a set of vectors. • Each vector is a piece of the control policy (= action sequences). • Dim(vector) = number of states. • Modify / add vectors to update the value function (i.e. refine the policy). [Figure: a 2-state example, V(b) plotted over P(s1).]
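In code, this piecewise-linear convex representation is just a max over dot products. A sketch with made-up alpha-vectors; the numbers and action names are illustrative only, echoing the break-in example on the following slides:

```python
import numpy as np

# Hypothetical alpha-vectors for a 2-state problem; each row is one
# vector, tagged with the action at the root of its policy tree.
alphas = np.array([[10.0, -5.0],    # e.g. "go-to-bed"
                   [-2.0,  8.0],    # e.g. "call-911"
                   [ 4.0,  4.5]])   # e.g. "investigate"
actions = ["go-to-bed", "call-911", "investigate"]

def value(b):
    """V(b) is the upper surface: max over vectors of alpha · b."""
    return np.max(alphas @ b)

def policy(b):
    """The maximizing vector also names the action to execute."""
    return actions[int(np.argmax(alphas @ b))]

b = np.array([0.9, 0.1])            # fairly confident there is no break-in
```

The same argmax that gives the value also gives the action, which is why the vectors encode the policy and not just the value.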
Optimal POMDP solving • Simple problem: 2 states, 3 actions, 3 observations. Plan length → # vectors: 0 → 1, 1 → 3, 2 → 27, 3 → 2187, 4 → 14,348,907. [Figure: V_n(b) plotted over P(break-in), with vectors labelled Go-to-bed, Call-911, and Investigate.]
The curse of history Policy size grows exponentially with the planning horizon: |Γ_n| = |A|^((|Z|^n - 1)/(|Z| - 1)) where Γ_n = policy size (# vectors) at horizon n, A = # actions, Z = # observations.
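This blow-up follows from the recurrence |Γ_n| = |A| · |Γ_{n-1}|^|Z|: each new vector combines one root action with one continuation vector per observation. A quick sketch; note that with 3 actions and 2 branching observations it reproduces the sequence 3, 27, 2187, 14,348,907 shown earlier:

```python
def policy_size(n, num_actions, num_obs):
    """Number of alpha-vectors after n exact backups, before pruning:
    Gamma_n = |A| * Gamma_{n-1} ** |Z|, with Gamma_0 = 1."""
    size = 1
    for _ in range(n):
        size = num_actions * size ** num_obs
    return size
```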
How many vectors for this problem? 10^4 (navigation) x 10^3 (dialogue) states 1000+ observations 100+ actions
Talk outline • Uncertainty in plan-based robotics • Partially Observable Markov Decision Processes (POMDPs) • POMDP solver #1: Point-based value iteration (PBVI) • POMDP solver #2: Policy-contingent abstraction (PolCA+)
Exact solving assumes all beliefs are equally likely [Figure: particle clouds showing a uniform belief, a bi-modal belief, and an N-modal belief over robot position.] INSIGHT: No sequence of actions and observations can produce this N-modal belief.
A new algorithm: Point-based value iteration Approach: • Select a small set of belief points • Use well-separated, reachable beliefs • Plan for those belief points only • Learn the value and its gradient at each point • Pick the action that maximizes value: execute the action attached to argmax_α α · b. [Figure: V(b) over P(s1), with belief points b0, b1, b2.]
The anytime PBVI algorithm • Alternate between: 1. Growing the set of belief points 2. Planning for those belief points • Terminate when you run out of time or have a good policy.
Complexity of value update, where S = # states, A = # actions, Z = # observations, Γ_n = # vectors at iteration n, B = # belief points: Time (projection): exact update S²·A·Z·|Γ_n|, PBVI S²·A·Z·|Γ_n|. Time (sum): exact update S·A·|Γ_n|^Z, PBVI S·A·Z·|Γ_n|·B. Size (# vectors): exact update A·|Γ_n|^Z, PBVI B.
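A single point-based backup can be sketched as follows. This is a simplified rendering, not the paper's implementation (array layouts and names are mine): for each action, back-project every current alpha-vector through each observation, and keep the combination that looks best at the given belief.

```python
import numpy as np

def pbvi_backup(b, alphas, T, O, R, gamma):
    """One point-based value backup at a single belief b.

    alphas: (k, S) array of current alpha-vectors.
    T[a]:   (S, S) with T[a][s, s'] = Pr(s'|s, a).
    O[a]:   (S, Z) with O[a][s', z] = Pr(z|s', a).
    R:      (A, S) reward table; gamma: discount factor.
    Returns the single new alpha-vector that is optimal at b.
    """
    num_actions, num_obs = len(T), O[0].shape[1]
    best, best_val = None, -np.inf
    for a in range(num_actions):
        alpha_a = R[a].astype(float).copy()
        for z in range(num_obs):
            # Back-project every alpha-vector through (a, z) ...
            g = gamma * (T[a] * O[a][:, z]) @ alphas.T   # shape (S, k)
            # ... and keep the one that is best as seen from b.
            alpha_a += g[:, np.argmax(b @ g)]
        if b @ alpha_a > best_val:
            best, best_val = alpha_a, b @ alpha_a
    return best
```

Because only the maximizing vector per observation is kept, the update costs S·A·Z·|Γ_n| per belief point instead of building all A·|Γ_n|^Z cross-sums.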
Theoretical properties of PBVI Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by: ||V_n^B - V_n*||_∞ ≤ (R_max - R_min) · ε_B / (1 - γ)², where ε_B = max_{b' ∈ Δ̄} min_{b ∈ B} ||b - b'||_1, Δ̄ is the set of reachable beliefs, and Δ is the set of all beliefs. [Figure: the error Err between the value at a sampled belief point and at a nearby unsampled belief.]
PBVI's belief selection heuristic 1. Leverage insight from policy search methods: • Focus on reachable beliefs. 2. Focus on high probability beliefs: • Consider all actions, but stochastic observation choice. 3. Use the error bound on point-based value updates: • Select well-separated beliefs, rather than near-by beliefs. [Figure: from belief b, candidate successors such as b^{a1,z2} and b^{a2,z1}; the farther one is kept.]
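One expansion step under these three heuristics can be sketched as below. This is a simplification of the published procedure, with helper names of my own choosing: for each belief in the set, simulate every action forward with a sampled observation, then keep the successor farthest (in L1 distance) from the current set.

```python
import numpy as np

def expand_beliefs(B, T, O, rng):
    """One PBVI-style expansion step (simplified sketch).

    B: list of belief vectors; T[a]: (S, S) transitions; O[a]: (S, Z)
    observation probabilities; rng: numpy random Generator.
    """
    new_points = []
    for b in B:
        candidates = []
        for a in range(len(T)):
            # Simulate: sample a state, a successor, and an observation.
            s = rng.choice(len(b), p=b)
            s2 = rng.choice(len(b), p=T[a][s])
            z = rng.choice(O[a].shape[1], p=O[a][s2])
            # Bayes update of b given (a, z) -- a reachable belief.
            u = O[a][:, z] * (T[a].T @ b)
            if u.sum() > 0:
                candidates.append(u / u.sum())
        if candidates:
            # Keep the candidate farthest from the current point set.
            dist = lambda c: min(np.abs(c - p).sum() for p in B)
            new_points.append(max(candidates, key=dist))
    return B + new_points
```

Each pass at most doubles the point set, which is what gives the anytime loop its alternating grow/plan structure.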
Classes of value function approximations 1. No belief [Littman et al., 1995] 2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001] 3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002] 4. Sample belief points [Poon, 2001; Pineau et al., 2003]
Performance on well-known POMDPs (Maze1: 36 states, Maze2: 92 states, Maze3: 60 states)
Method | Reward (Maze1 / Maze2 / Maze3) | Time, s (Maze1 / Maze2 / Maze3) | # Belief points (Maze1 / Maze2 / Maze3)
No belief [Littman et al., 1995] | 0.20 / 0.11 / 0.26 | 0.19 / 1.44 / 0.51 | - / - / -
Grid [Brafman, 1997] | 0.94 / - / - | - / - / - | 174 / 337 / -
Compressed [Poupart et al., 2003] | 0.00 / 0.07 / 0.11 | 24hrs / 24hrs / 24hrs | - / - / -
Sample [Poon, 2001] | 2.30 / 0.35 / 0.53 | 12166 / 27898 / 450 | 660 / 1840 / 300
PBVI [Pineau et al., 2003] | 2.25 / 0.34 / 0.53 | 3448 / 360 / 288 | 470 / 95 / 86
PBVI in the Nursebot domain Objective: Find the patient! State space = RobotPosition × PatientPosition Observation space = RobotPosition + PatientFound Action space = {North, South, East, West, Declare}
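The cross-product structure of this state space can be made concrete with a toy enumeration. The position names below are hypothetical, not the actual Nursebot map, and "|Z| = |RobotPosition| + 1" is one reading of the slide's observation space:

```python
from itertools import product

# Hypothetical grid cells standing in for the real Nursebot map.
robot_positions = ["room1", "room2", "hallway"]
patient_positions = ["room1", "room2", "hallway"]

# State space = RobotPosition x PatientPosition (cross product),
# so |S| grows multiplicatively with each factored component.
states = list(product(robot_positions, patient_positions))
actions = ["North", "South", "East", "West", "Declare"]

# Observations: the robot's own position, plus a PatientFound flag.
observations = robot_positions + ["PatientFound"]
```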
PBVI performance on find-the-patient domain [Figure: example runs. With no belief tracking, the patient is found on 17% of trials; with PBVI, the patient is found on 90% of trials.]
Validation of PBVI's belief expansion heuristic Find-the-patient domain: 870 states, 5 actions, 30 observations [Figure: performance comparison of the PBVI, Greedy, Random, and No Belief expansion heuristics.]
Policy assuming full observability You find some: (25%) You lose some: (75%)
PBVI Policy with 3141 belief points You find some: (81%) You lose some: (19%)
PBVI Policy with 643 belief points You find some: (22%) You lose some: (78%)
Highlights of the PBVI algorithm [Pineau, Gordon & Thrun, IJCAI 2003; Pineau, Gordon & Thrun, NIPS 2003] • Algorithmic: • New belief sampling algorithm. • Efficient heuristic for belief point selection. • Anytime performance. • Experimental: • Outperforms previous value approximation algorithms on known problems. • Solves a new, larger problem (one order of magnitude increase in problem size). • Theoretical: • Bounded approximation error.
Back to the grand challenge How can we go from 10^3 states to real-world problems?
Talk outline • Uncertainty in plan-based robotics • Partially Observable Markov Decision Processes (POMDPs) • POMDP solver #1: Point-based value iteration (PBVI) • POMDP solver #2: Policy-contingent abstraction (PolCA+)
Structured POMDPs Many real-world decision-making problems exhibit structure inherent to the problem domain. [Figure: an example task hierarchy: a high-level controller over Navigation, Cognitive support, and Social interaction, with subtasks such as Move and AskWhere and primitive actions Left, Right, Forward, Backward.]