
Presentation Transcript


  1. Tractable Planning for Real-World Robotics: The promises and challenges of dealing with uncertainty. Joelle Pineau, Robotics Institute, Carnegie Mellon University. Stanford University, May 3, 2004

  2. Robots in unstructured environments

  3. A vision for robotic-assisted health-care
  • Providing information
  • Monitoring Rx adherence & safety
  • Calling for help in emergencies
  • Providing physical assistance
  • Reminding to eat, drink, & take meds
  • Moving things around
  • Supporting inter-personal communication

  4. What is hard about planning for real-world robotics? * Highlights from a proposed robot grand challenge
  • Generating a plan that has high expected utility.
  • Switching between different representations of the world.
  • Resolving conflicts between interfering jobs.
  • Accomplishing jobs in changing, partly unknown environments.
  • Handling percepts which are incomplete, ambiguous, outdated, or incorrect.

  5. Talk outline
  • Uncertainty in plan-based robotics
  • Partially Observable Markov Decision Processes (POMDPs)
  • POMDP solver #1: Point-based value iteration (PBVI)
  • POMDP solver #2: Policy-contingent abstraction (PolCA+)

  6. Why use a POMDP? POMDPs provide a rich framework for sequential decision-making, which can model:
  • Effect uncertainty
  • State uncertainty
  • Varying rewards across actions and goals

  7. POMDP applications in the last decade
  Robotics:
  • Robust mobile robot navigation [Simmons & Koenig, 1995; + many more]
  • Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003]
  • Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996]
  • High-level robot control [Pineau et al., 2003]
  • Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]
  Others:
  • Machine maintenance [Puterman, 1994]
  • Network troubleshooting [Thiébaux et al., 1996]
  • Circuit testing [correspondence, 2004]
  • Preference elicitation [Boutilier, 2002]
  • Medical diagnosis [Hauskrecht, 1997]

  8-12. POMDP model (built up over five slides). A POMDP is the n-tuple { S, A, Z, T, O, R }:
  S = state set
  A = action set
  Z = observation set
  T: Pr(s'|s,a) = state-to-state transition probabilities
  O: Pr(z|s,a) = observation generation probabilities
  R(s,a) = reward function
  [Diagram: two-slice dynamic Bayes net. What goes on: hidden states s_{t-1}, s_t and rewards r_{t-1}, r_t. What we see: observations z_{t-1}, z_t and actions a_{t-1}, a_t. What we infer: beliefs b_{t-1}, b_t.]
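To make the "what we infer" step concrete, here is a minimal sketch (mine, not from the talk) of the Bayes-filter belief update an agent runs after taking action a and observing z; the array names b, T, and O are assumptions matching the notation above.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayes filter: b'(s') is proportional to O[a,s',z] * sum_s T[a,s,s'] * b[s].

    b: belief over states, shape (|S|,)
    T: T[a, s, s'] = Pr(s'|s,a), shape (|A|, |S|, |S|)
    O: O[a, s', z] = Pr(z|s',a), shape (|A|, |S|, |Z|)
    """
    predicted = T[a].T @ b              # sum over s of Pr(s'|s,a) * b(s)
    unnormalized = O[a, :, z] * predicted
    return unnormalized / unnormalized.sum()

# Tiny illustrative example: 2 states, 1 action, 2 observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.7, 0.3], [0.4, 0.6]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, z=1, T=T, O=O)
```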

  13. Examples of robot beliefs. [Figure: particle-based beliefs for a mobile robot; the particles depict a uniform belief and a bi-modal belief over the robot's position.]

  14-16. Understanding the belief state (built up over three slides). A belief is a probability distribution over states: b(s) ≥ 0 and Σ_s b(s) = 1, so Dim(B) = |S|−1. E.g., for S = {s1, s2} the belief space is the line segment 0 ≤ P(s1) ≤ 1; for S = {s1, s2, s3} it is a triangle over P(s1) and P(s2); for S = {s1, s2, s3, s4} it is a tetrahedron over P(s1), P(s2), and P(s3).
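A quick numeric illustration of the dimensionality claim (a sketch of my own, not from the slides): a belief over |S| states is pinned down by |S|−1 numbers because the entries must sum to one.

```python
import numpy as np

# Belief over S = {s1, s2, s3}: choosing P(s1) and P(s2) fixes P(s3),
# so the belief simplex has dimension |S| - 1 = 2.
p1, p2 = 0.2, 0.5
b = np.array([p1, p2, 1.0 - p1 - p2])
assert np.isclose(b.sum(), 1.0) and (b >= 0).all()
```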

  17. POMDP solving. Objective: find the sequence of actions that maximizes the expected sum of rewards, captured by the value function

  V_n(b) = max_a [ R(b,a) + γ Σ_z Pr(z|b,a) V_{n−1}(τ(b,a,z)) ]

  where R(b,a) is the immediate reward, the summation is the expected future reward, and τ(b,a,z) is the updated belief.

  18. POMDP value function
  • Represent V(b) as the upper surface of a set of vectors.
  • Each vector is a piece of the control policy (= an action sequence).
  • Dim(vector) = number of states.
  • Modify / add vectors to update the value function (i.e., refine the policy).
  [Figure: 2-state example; V(b) plotted as the upper surface of linear vectors over b = P(s1).]
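As a concrete reading of this slide, a small sketch (my own; Gamma and actions are assumed names): evaluating the upper surface is a max of dot products, and the maximizing vector also supplies the action to execute.

```python
import numpy as np

def value_and_action(b, Gamma, actions):
    """V(b) = max_i Gamma[i] . b; act with the maximizing vector's action.

    Gamma:   alpha-vectors, shape (num_vectors, |S|)
    actions: actions[i] is the action attached to Gamma[i]
    """
    scores = Gamma @ b                 # one dot product per vector
    best = int(np.argmax(scores))
    return scores[best], actions[best]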

  19-23. Optimal POMDP solving (built up over five slides). Simple problem: 2 states, 3 actions, 3 observations; the actions are Go-to-bed, Call-911, and Investigate, and the hidden state is whether there is a break-in.

  Plan length   # vectors
  0             1
  1             3
  2             27
  3             2,187
  4             14,348,907

  [Figure: V_n(b) over b = P(break-in), with vectors labelled Go-to-bed, Call-911, and Investigate.]

  24. The curse of history. Policy size grows exponentially with the planning horizon:

  |Γ_n| = |A| · |Γ_{n−1}|^{|Z|}

  where Γ_n = policy size (number of vectors) at horizon n, A = # actions, Z = # observations.
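A few lines of Python (mine, not the talk's) iterate this worst-case recurrence. Note that with |A| = 3 and |Z| = 2 observation branches it reproduces the vector counts 1, 3, 27, 2187, 14,348,907 shown above; larger |Z| grows faster still.

```python
def policy_size(num_actions, num_observations, horizon):
    """Worst-case number of alpha-vectors after `horizon` exact backups:
    |Gamma_n| = |A| * |Gamma_{n-1}| ** |Z|, starting from |Gamma_0| = 1."""
    size = 1
    sizes = [size]
    for _ in range(horizon):
        size = num_actions * size ** num_observations
        sizes.append(size)
    return sizes

# [1, 3, 27, 2187, 14348907] with |A| = 3 and |Z| = 2 branches.
print(policy_size(3, 2, 4))
```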

  25. How many vectors for this problem? 10^4 (navigation) × 10^3 (dialogue) states, 1000+ observations, 100+ actions.

  26. Talk outline
  • Uncertainty in plan-based robotics
  • Partially Observable Markov Decision Processes (POMDPs)
  • POMDP solver #1: Point-based value iteration (PBVI)
  • POMDP solver #2: Policy-contingent abstraction (PolCA+)

  27. Exact solving assumes all beliefs are equally likely. [Figure: particle-based beliefs: uniform, bi-modal, and N-modal.] INSIGHT: No sequence of actions and observations can produce this N-modal belief.

  28-31. A new algorithm: point-based value iteration (built up over four slides). Approach:
  • Select a small set of belief points → use well-separated, reachable beliefs.
  • Plan for those belief points only → learn the value and its gradient at each point.
  • Pick the action that maximizes value: at belief b, execute the action attached to argmax_{α ∈ Γ} (α · b).
  [Figure: V(b) over b = P(s1); belief points b0, b1, b2 reached via (a,z) transitions, each carrying its own value vector.]
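For readers who want the mechanics, here is a compact sketch of one point-based value backup at a single belief point, following the standard PBVI construction; the variable names and array shapes are my assumptions, consistent with the model slides above.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One PBVI backup at belief b: returns the new alpha-vector and action.

    Gamma: previous alpha-vectors, shape (k, |S|)
    T[a,s,s'] = Pr(s'|s,a);  O[a,s',z] = Pr(z|s',a);  R[a,s] = reward
    gamma: discount factor
    """
    num_actions = T.shape[0]
    num_obs = O.shape[2]
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(num_actions):
        alpha = R[a].astype(float)
        for z in range(num_obs):
            # Project every old vector through (a, z), then keep the one
            # that scores highest at this particular belief point.
            proj = gamma * (Gamma * O[a, :, z]) @ T[a].T   # shape (k, |S|)
            alpha = alpha + proj[np.argmax(proj @ b)]
        value = alpha @ b
        if value > best_value:
            best_alpha, best_action, best_value = alpha, a, value
    return best_alpha, best_action
```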

  32. The anytime PBVI algorithm. Alternate between: 1. Growing the set of belief points; 2. Planning for those belief points. Terminate when you run out of time or have a good policy.
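The anytime loop itself is simple; a hedged sketch, with hypothetical helpers expand_beliefs and backup_all standing in for steps 1 and 2:

```python
import time

def anytime_pbvi(pomdp, time_budget_s, expand_beliefs, backup_all):
    """Alternate planning and belief-set growth until the budget runs out.

    expand_beliefs(pomdp, B) -> larger belief set   (hypothetical helper)
    backup_all(pomdp, B, Gamma) -> updated vectors  (hypothetical helper)
    pomdp.initial_belief / pomdp.initial_vectors are assumed attributes.
    """
    B = [pomdp.initial_belief]
    Gamma = pomdp.initial_vectors
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        Gamma = backup_all(pomdp, B, Gamma)   # plan for current points
        B = expand_beliefs(pomdp, B)          # then grow the point set
    return Gamma
```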

  33. Complexity of value update

                        Exact update    PBVI
  Time: Projection      S² A Z n        S² A Z n
  Time: Sum             S A n^Z         S A Z n B
  Size (# vectors)      A n^Z           B

  where: S = # states, A = # actions, Z = # observations, n = # vectors at iteration n, B = # belief points.

  34. Theoretical properties of PBVI. Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by

  ||V_n^B − V_n^*||_∞ ≤ (R_max − R_min) ε_B / (1 − γ)²

  where ε_B = max_{b' ∈ Δ̄} min_{b ∈ B} ||b − b'||_1 is the density of B in Δ̄, the set of reachable beliefs (a subset of Δ, the set of all beliefs). [Figure: the error Err at an unsampled belief grows with its distance from the nearest point in B.]

  35. The anytime PBVI algorithm. Alternate between: 1. Growing the set of belief points; 2. Planning for those belief points. Terminate when you run out of time or have a good policy.

  36-38. PBVI's belief selection heuristic (built up over three slides; see the sketch below):
  1. Leverage insight from policy search methods: focus on reachable beliefs.
  2. Focus on high-probability beliefs: consider all actions, but make a stochastic observation choice.
  3. Use the error bound on point-based value updates: select well-separated beliefs, rather than nearby beliefs.
  [Figure: from a point b on the P(s1) axis, candidate successors b_{a1,z1}, b_{a1,z2}, b_{a2,z1}, b_{a2,z2}; sampling keeps one candidate per action, and the farthest surviving candidate is retained.]
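Here is the sketch promised above: my own rendering of the expansion step these three criteria suggest. For each current point, simulate every action with one sampled observation, and keep the candidate successor farthest (in L1 distance) from the current set; array shapes follow the model slides.

```python
import numpy as np

def expand_beliefs(B, T, O, rng):
    """Grow belief set B: for each b, try all actions with one sampled
    observation each, keeping the successor farthest (L1) from B."""
    new_points = []
    for b in B:
        best_candidate, best_dist = None, 0.0
        for a in range(T.shape[0]):
            predicted = T[a].T @ b                 # Pr(s' | b, a)
            pz = O[a].T @ predicted                # Pr(z | b, a)
            pz = pz / pz.sum()
            z = rng.choice(len(pz), p=pz)          # stochastic observation
            successor = O[a, :, z] * predicted
            successor = successor / successor.sum()
            dist = min(np.abs(successor - bb).sum() for bb in B + new_points)
            if dist > best_dist:
                best_candidate, best_dist = successor, dist
        if best_candidate is not None:
            new_points.append(best_candidate)
    return B + new_points
```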

  39. Classes of value function approximations
  1. No belief [Littman et al., 1995]
  2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]
  3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
  4. Sample belief points [Poon, 2001; Pineau et al., 2003]

  40. Performance on well-known POMDPs (Maze1: 36 states, Maze2: 92 states, Maze3: 60 states)

  REWARD                               Maze1   Maze2   Maze3
  No belief [Littman et al., 1995]     0.20    0.11    0.26
  Grid [Brafman, 1997]                 0.94    -       -
  Compressed [Poupart et al., 2003]    0.00    0.07    0.11
  Sample [Poon, 2001]                  2.30    0.35    0.53
  PBVI [Pineau et al., 2003]           2.25    0.34    0.53

  TIME (s)                             Maze1   Maze2   Maze3
  No belief                            0.19    1.44    0.51
  Grid                                 -       -       -
  Compressed                           24hrs   24hrs   24hrs
  Sample                               12166   27898   450
  PBVI                                 3448    360     288

  # BELIEF POINTS                      Maze1   Maze2   Maze3
  No belief                            -       -       -
  Grid                                 174     337     -
  Compressed                           -       -       -
  Sample                               660     1840    300
  PBVI                                 470     95      86

  41. PBVI in the Nursebot domain. Objective: find the patient! State space = RobotPosition × PatientPosition. Observation space = RobotPosition + PatientFound. Action space = {North, South, East, West, Declare}.

  42. PBVI performance on the find-the-patient domain. [Figure: sample trajectories. No Belief: patient found on 17% of trials; PBVI: patient found on 90% of trials.]

  43. Validation of PBVI's belief expansion heuristic. Find-the-patient domain: 870 states, 5 actions, 30 observations. [Figure: performance curves comparing the PBVI, Greedy, Random, and No Belief expansion strategies.]

  44. Policy assuming full observability. You find some: (25%). You lose some: (75%).

  45. PBVI policy with 3141 belief points. You find some: (81%). You lose some: (19%).

  46. PBVI policy with 643 belief points. You find some: (22%). You lose some: (78%).

  47. Highlights of the PBVI algorithm [Pineau, Gordon & Thrun, IJCAI 2003; Pineau, Gordon & Thrun, NIPS 2003]
  • Algorithmic: new belief sampling algorithm; efficient heuristic for belief point selection; anytime performance.
  • Experimental: outperforms previous value approximation algorithms on known problems; solves a new, larger problem (1 order of magnitude increase in problem size).
  • Theoretical: bounded approximation error.

  48. Back to the grand challenge. How can we go from 10^3 states to real-world problems?

  49. Talk outline
  • Uncertainty in plan-based robotics
  • Partially Observable Markov Decision Processes (POMDPs)
  • POMDP solver #1: Point-based value iteration (PBVI)
  • POMDP solver #2: Policy-contingent abstraction (PolCA+)

  50. Structured POMDPs. Many real-world decision-making problems exhibit structure inherent to the problem domain. [Diagram: task hierarchy; a high-level controller spans Navigation, Cognitive support, and Social interaction subtasks, with actions such as Move and AskWhere, and Move grounding out in Left, Right, Forward, Backward.]
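One way to picture the structure this slide gestures at (labels taken from the diagram residue; the exact decomposition is hypothetical) is as a task hierarchy, the kind of object the PolCA+ approach exploits:

```python
# Hypothetical rendering of the slide's task hierarchy: a high-level
# controller chooses among subtasks, each grounding out in primitive actions.
task_hierarchy = {
    "HighLevelController": ["Navigation", "CognitiveSupport", "SocialInteraction"],
    "Navigation": ["Move", "AskWhere"],
    "Move": ["Left", "Right", "Forward", "Backward"],  # primitive actions
}
```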
