
Presentation Transcript


  1. Tractable Planning for Real-World Robotics: The promises and challenges of dealing with uncertainty. Joelle Pineau, Robotics Institute, Carnegie Mellon University. Stanford University, May 3, 2004

  2. Robots in unstructured environments

  3. A vision for robotic-assisted health-care
  • Providing information
  • Monitoring Rx adherence & safety
  • Calling for help in emergencies
  • Providing physical assistance
  • Reminding to eat, drink, & take meds
  • Moving things around
  • Supporting inter-personal communication

  4. What is hard about planning for real-world robotics? * Highlights from a proposed robot grand challenge
  • Generating a plan that has high expected utility.
  • Switching between different representations of the world.
  • Resolving conflicts between interfering jobs.
  • Accomplishing jobs in changing, partly unknown environments.
  • Handling percepts which are incomplete, ambiguous, outdated, or incorrect.

  5. Talk outline
  • Uncertainty in plan-based robotics
  • Partially Observable Markov Decision Processes (POMDPs)
  • POMDP solver #1: Point-based value iteration (PBVI)
  • POMDP solver #2: Policy-contingent abstraction (PolCA+)

  6. Why use a POMDP? POMDPs provide a rich framework for sequential decision-making, which can model:
  • Effect uncertainty
  • State uncertainty
  • Varying rewards across actions and goals

  7. POMDP applications in the last decade
  Robotics:
  • Robust mobile robot navigation [Simmons & Koenig, 1995; + many more]
  • Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003]
  • Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996]
  • High-level robot control [Pineau et al., 2003]
  • Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]
  Others:
  • Machine maintenance [Puterman, 1994]
  • Network troubleshooting [Thiébaux et al., 1996]
  • Circuit testing [correspondence, 2004]
  • Preference elicitation [Boutilier, 2002]
  • Medical diagnosis [Hauskrecht, 1997]

  8-12. POMDP model (built up over five slides). A POMDP is the n-tuple { S, A, Z, T, O, R }:
  S = state set
  A = action set
  Z = observation set
  T: Pr(s'|s,a) = state-to-state transition probabilities
  O: Pr(z|s,a) = observation generation probabilities
  R(s,a) = reward function
  [Diagram: two-slice dynamic Bayes net. What goes on: hidden states s_{t-1}, s_t and rewards r_{t-1}, r_t. What we see: observations z_{t-1}, z_t and actions a_{t-1}, a_t. What we infer: beliefs b_{t-1}, b_t.]
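To make the "what we infer" step concrete, here is a minimal sketch (mine, not from the talk) of the Bayes-filter belief update an agent runs after taking action a and observing z; the array names b, T, and O are assumptions matching the notation above.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayes filter: b'(s') is proportional to O[a,s',z] * sum_s T[a,s,s'] * b[s].

    b: belief over states, shape (|S|,)
    T: T[a, s, s'] = Pr(s'|s,a), shape (|A|, |S|, |S|)
    O: O[a, s', z] = Pr(z|s',a), shape (|A|, |S|, |Z|)
    """
    predicted = T[a].T @ b              # sum over s of Pr(s'|s,a) * b(s)
    unnormalized = O[a, :, z] * predicted
    return unnormalized / unnormalized.sum()

# Tiny illustrative example: 2 states, 1 action, 2 observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.7, 0.3], [0.4, 0.6]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, z=1, T=T, O=O)
```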

  13. Examples of robot beliefs. [Figure: particle-based beliefs for a mobile robot; the particles depict a uniform belief and a bi-modal belief over the robot's position.]

  14-16. Understanding the belief state (built up over three slides). A belief is a probability distribution over states: b(s) ≥ 0 and Σ_s b(s) = 1, so Dim(B) = |S|−1. E.g., for S = {s1, s2} the belief space is the line segment 0 ≤ P(s1) ≤ 1; for S = {s1, s2, s3} it is a triangle over P(s1) and P(s2); for S = {s1, s2, s3, s4} it is a tetrahedron over P(s1), P(s2), and P(s3).
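A quick numeric illustration of the dimensionality claim (a sketch of my own, not from the slides): a belief over |S| states is pinned down by |S|−1 numbers because the entries must sum to one.

```python
import numpy as np

# Belief over S = {s1, s2, s3}: choosing P(s1) and P(s2) fixes P(s3),
# so the belief simplex has dimension |S| - 1 = 2.
p1, p2 = 0.2, 0.5
b = np.array([p1, p2, 1.0 - p1 - p2])
assert np.isclose(b.sum(), 1.0) and (b >= 0).all()
```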

  17. POMDP solving. Objective: find the sequence of actions that maximizes the expected sum of rewards, captured by the value function

  V_n(b) = max_a [ R(b,a) + γ Σ_z Pr(z|b,a) V_{n−1}(τ(b,a,z)) ]

  where R(b,a) is the immediate reward, the summation is the expected future reward, and τ(b,a,z) is the updated belief.

  18. POMDP value function
  • Represent V(b) as the upper surface of a set of vectors.
  • Each vector is a piece of the control policy (= an action sequence).
  • Dim(vector) = number of states.
  • Modify / add vectors to update the value function (i.e., refine the policy).
  [Figure: 2-state example; V(b) plotted as the upper surface of linear vectors over b = P(s1).]
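As a concrete reading of this slide, a small sketch (my own; Gamma and actions are assumed names): evaluating the upper surface is a max of dot products, and the maximizing vector also supplies the action to execute.

```python
import numpy as np

def value_and_action(b, Gamma, actions):
    """V(b) = max_i Gamma[i] . b; act with the maximizing vector's action.

    Gamma:   alpha-vectors, shape (num_vectors, |S|)
    actions: actions[i] is the action attached to Gamma[i]
    """
    scores = Gamma @ b                 # one dot product per vector
    best = int(np.argmax(scores))
    return scores[best], actions[best]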

  19-23. Optimal POMDP solving (built up over five slides). Simple problem: 2 states, 3 actions, 3 observations; the actions are Go-to-bed, Call-911, and Investigate, and the hidden state is whether there is a break-in.

  Plan length   # vectors
  0             1
  1             3
  2             27
  3             2,187
  4             14,348,907

  [Figure: V_n(b) over b = P(break-in), with vectors labelled Go-to-bed, Call-911, and Investigate.]

  24. The curse of history. Policy size grows exponentially with the planning horizon:

  |Γ_n| = |A| · |Γ_{n−1}|^{|Z|}

  where Γ_n = policy size (number of vectors) at horizon n, A = # actions, Z = # observations.
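A few lines of Python (mine, not the talk's) iterate this worst-case recurrence. Note that with |A| = 3 and |Z| = 2 observation branches it reproduces the vector counts 1, 3, 27, 2187, 14,348,907 shown above; larger |Z| grows faster still.

```python
def policy_size(num_actions, num_observations, horizon):
    """Worst-case number of alpha-vectors after `horizon` exact backups:
    |Gamma_n| = |A| * |Gamma_{n-1}| ** |Z|, starting from |Gamma_0| = 1."""
    size = 1
    sizes = [size]
    for _ in range(horizon):
        size = num_actions * size ** num_observations
        sizes.append(size)
    return sizes

# [1, 3, 27, 2187, 14348907] with |A| = 3 and |Z| = 2 branches.
print(policy_size(3, 2, 4))
```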

  25. How many vectors for this problem? 10^4 (navigation) × 10^3 (dialogue) states, 1000+ observations, 100+ actions.

  26. Talk outline
  • Uncertainty in plan-based robotics
  • Partially Observable Markov Decision Processes (POMDPs)
  • POMDP solver #1: Point-based value iteration (PBVI)
  • POMDP solver #2: Policy-contingent abstraction (PolCA+)

  27. Exact solving assumes all beliefs are equally likely. [Figure: particle-based beliefs: uniform, bi-modal, and N-modal.] INSIGHT: No sequence of actions and observations can produce this N-modal belief.

  28-31. A new algorithm: point-based value iteration (built up over four slides). Approach:
  • Select a small set of belief points → use well-separated, reachable beliefs.
  • Plan for those belief points only → learn the value and its gradient at each point.
  • Pick the action that maximizes value: at belief b, execute the action attached to argmax_{α ∈ Γ} (α · b).
  [Figure: V(b) over b = P(s1); belief points b0, b1, b2 reached via (a,z) transitions, each carrying its own value vector.]
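For readers who want the mechanics, here is a compact sketch of one point-based value backup at a single belief point, following the standard PBVI construction; the variable names and array shapes are my assumptions, consistent with the model slides above.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One PBVI backup at belief b: returns the new alpha-vector and action.

    Gamma: previous alpha-vectors, shape (k, |S|)
    T[a,s,s'] = Pr(s'|s,a);  O[a,s',z] = Pr(z|s',a);  R[a,s] = reward
    gamma: discount factor
    """
    num_actions = T.shape[0]
    num_obs = O.shape[2]
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(num_actions):
        alpha = R[a].astype(float)
        for z in range(num_obs):
            # Project every old vector through (a, z), then keep the one
            # that scores highest at this particular belief point.
            proj = gamma * (Gamma * O[a, :, z]) @ T[a].T   # shape (k, |S|)
            alpha = alpha + proj[np.argmax(proj @ b)]
        value = alpha @ b
        if value > best_value:
            best_alpha, best_action, best_value = alpha, a, value
    return best_alpha, best_action
```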

  32. The anytime PBVI algorithm. Alternate between: 1. Growing the set of belief points; 2. Planning for those belief points. Terminate when you run out of time or have a good policy.
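The anytime loop itself is simple; a hedged sketch, with hypothetical helpers expand_beliefs and backup_all standing in for steps 1 and 2:

```python
import time

def anytime_pbvi(pomdp, time_budget_s, expand_beliefs, backup_all):
    """Alternate planning and belief-set growth until the budget runs out.

    expand_beliefs(pomdp, B) -> larger belief set   (hypothetical helper)
    backup_all(pomdp, B, Gamma) -> updated vectors  (hypothetical helper)
    pomdp.initial_belief / pomdp.initial_vectors are assumed attributes.
    """
    B = [pomdp.initial_belief]
    Gamma = pomdp.initial_vectors
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        Gamma = backup_all(pomdp, B, Gamma)   # plan for current points
        B = expand_beliefs(pomdp, B)          # then grow the point set
    return Gamma
```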

  33. Complexity of value update

                        Exact update    PBVI
  Time: Projection      S² A Z n        S² A Z n
  Time: Sum             S A n^Z         S A Z n B
  Size (# vectors)      A n^Z           B

  where: S = # states, A = # actions, Z = # observations, n = # vectors at iteration n, B = # belief points.

  34. Theoretical properties of PBVI. Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by

  ||V_n^B − V_n^*||_∞ ≤ (R_max − R_min) ε_B / (1 − γ)²

  where ε_B = max_{b' ∈ Δ̄} min_{b ∈ B} ||b − b'||_1 is the density of B in Δ̄, the set of reachable beliefs (a subset of Δ, the set of all beliefs). [Figure: the error Err at an unsampled belief grows with its distance from the nearest point in B.]

  35. The anytime PBVI algorithm. Alternate between: 1. Growing the set of belief points; 2. Planning for those belief points. Terminate when you run out of time or have a good policy.

  36-38. PBVI's belief selection heuristic (built up over three slides; see the sketch below):
  1. Leverage insight from policy search methods: focus on reachable beliefs.
  2. Focus on high-probability beliefs: consider all actions, but make a stochastic observation choice.
  3. Use the error bound on point-based value updates: select well-separated beliefs, rather than nearby beliefs.
  [Figure: from a point b on the P(s1) axis, candidate successors b_{a1,z1}, b_{a1,z2}, b_{a2,z1}, b_{a2,z2}; sampling keeps one candidate per action, and the farthest surviving candidate is retained.]
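Here is the sketch promised above: my own rendering of the expansion step these three criteria suggest. For each current point, simulate every action with one sampled observation, and keep the candidate successor farthest (in L1 distance) from the current set; array shapes follow the model slides.

```python
import numpy as np

def expand_beliefs(B, T, O, rng):
    """Grow belief set B: for each b, try all actions with one sampled
    observation each, keeping the successor farthest (L1) from B."""
    new_points = []
    for b in B:
        best_candidate, best_dist = None, 0.0
        for a in range(T.shape[0]):
            predicted = T[a].T @ b                 # Pr(s' | b, a)
            pz = O[a].T @ predicted                # Pr(z | b, a)
            pz = pz / pz.sum()
            z = rng.choice(len(pz), p=pz)          # stochastic observation
            successor = O[a, :, z] * predicted
            successor = successor / successor.sum()
            dist = min(np.abs(successor - bb).sum() for bb in B + new_points)
            if dist > best_dist:
                best_candidate, best_dist = successor, dist
        if best_candidate is not None:
            new_points.append(best_candidate)
    return B + new_points
```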

  39. Classes of value function approximations
  1. No belief [Littman et al., 1995]
  2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]
  3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
  4. Sample belief points [Poon, 2001; Pineau et al., 2003]

  40. Performance on well-known POMDPs (Maze1: 36 states, Maze2: 92 states, Maze3: 60 states)

  REWARD                               Maze1   Maze2   Maze3
  No belief [Littman et al., 1995]     0.20    0.11    0.26
  Grid [Brafman, 1997]                 0.94    -       -
  Compressed [Poupart et al., 2003]    0.00    0.07    0.11
  Sample [Poon, 2001]                  2.30    0.35    0.53
  PBVI [Pineau et al., 2003]           2.25    0.34    0.53

  TIME (s)                             Maze1   Maze2   Maze3
  No belief                            0.19    1.44    0.51
  Grid                                 -       -       -
  Compressed                           24hrs   24hrs   24hrs
  Sample                               12166   27898   450
  PBVI                                 3448    360     288

  # BELIEF POINTS                      Maze1   Maze2   Maze3
  No belief                            -       -       -
  Grid                                 174     337     -
  Compressed                           -       -       -
  Sample                               660     1840    300
  PBVI                                 470     95      86

  41. PBVI in the Nursebot domain. Objective: find the patient! State space = RobotPosition × PatientPosition. Observation space = RobotPosition + PatientFound. Action space = {North, South, East, West, Declare}.

  42. PBVI performance on the find-the-patient domain. [Figure: sample trajectories. No Belief: patient found on 17% of trials; PBVI: patient found on 90% of trials.]

  43. Validation of PBVI's belief expansion heuristic. Find-the-patient domain: 870 states, 5 actions, 30 observations. [Figure: performance curves comparing the PBVI, Greedy, Random, and No Belief expansion strategies.]

  44. Policy assuming full observability. You find some: (25%). You lose some: (75%).

  45. PBVI policy with 3141 belief points. You find some: (81%). You lose some: (19%).

  46. PBVI policy with 643 belief points. You find some: (22%). You lose some: (78%).

  47. Highlights of the PBVI algorithm [Pineau, Gordon & Thrun, IJCAI 2003; Pineau, Gordon & Thrun, NIPS 2003]
  • Algorithmic: new belief sampling algorithm; efficient heuristic for belief point selection; anytime performance.
  • Experimental: outperforms previous value approximation algorithms on known problems; solves a new, larger problem (1 order of magnitude increase in problem size).
  • Theoretical: bounded approximation error.

  48. Back to the grand challenge. How can we go from 10^3 states to real-world problems?

  49. Talk outline
  • Uncertainty in plan-based robotics
  • Partially Observable Markov Decision Processes (POMDPs)
  • POMDP solver #1: Point-based value iteration (PBVI)
  • POMDP solver #2: Policy-contingent abstraction (PolCA+)

  50. Structured POMDPs. Many real-world decision-making problems exhibit structure inherent to the problem domain. [Diagram: task hierarchy; a high-level controller spans Navigation, Cognitive support, and Social interaction subtasks, with actions such as Move and AskWhere, and Move grounding out in Left, Right, Forward, Backward.]
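One way to picture the structure this slide gestures at (labels taken from the diagram residue; the exact decomposition is hypothetical) is as a task hierarchy, the kind of object the PolCA+ approach exploits:

```python
# Hypothetical rendering of the slide's task hierarchy: a high-level
# controller chooses among subtasks, each grounding out in primitive actions.
task_hierarchy = {
    "HighLevelController": ["Navigation", "CognitiveSupport", "SocialInteraction"],
    "Navigation": ["Move", "AskWhere"],
    "Move": ["Left", "Right", "Forward", "Backward"],  # primitive actions
}
```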
