
Hierarchical Methods for Planning under Uncertainty


Presentation Transcript


  1. Hierarchical Methods for Planning under Uncertainty Thesis Proposal Joelle Pineau Thesis Committee: Sebastian Thrun, Chair Matthew Mason Andrew Moore Craig Boutilier, U. of Toronto

  2. Integrating robots in living environments The robot’s role: - Social interaction - Mobile manipulation - Intelligent reminding - Remote-operation - Data collection / monitoring Thesis Proposal: Hierarchical Methods for Planning under Uncertainty

  3. A broad perspective [Figure: the state (user + world + robot) emits observations, from which the robot maintains a belief state and selects actions.] GOAL = Selecting appropriate actions

  4. Why is this a difficult problem? UNCERTAINTY Cause #1: Non-deterministic effects of actions Cause #2: Partial and noisy sensor information Cause #3: Inaccurate model of the world and the user

  5. Why is this a difficult problem? UNCERTAINTY Cause #1: Non-deterministic effects of actions Cause #2: Partial and noisy sensor information Cause #3: Inaccurate model of the world and the user A solution: Partially Observable Markov Decision Processes (POMDPs) [Figure: a three-state POMDP in which actions a1, a2 move between states S1, S2, S3, each of which emits observations o1, o2.]

  6. The truth about POMDPs • Bad news: • Finding an optimal POMDP action selection policy is computationally intractable for complex problems.

  7. The truth about POMDPs • Bad news: • Finding an optimal POMDP action selection policy is computationally intractable for complex problems. • Good news: • Many real-world decision-making problems exhibit structure inherent to the problem domain. • By leveraging structure in the problem domain, I propose an algorithm that makes POMDPs tractable, even for large domains.

  8. How is it done? • Use a “Divide-and-conquer” approach: • We decompose a large monolithic problem into a collection of loosely-related smaller problems. [Figure: the monolithic problem split into dialogue, health, reminding, and social managers.]

  9. Thesis statement Decision-making under uncertainty can be made tractable for complex problems by exploiting hierarchical structure in the problem domain.

  10. Outline • Problem motivation • Partially observable Markov decision processes • The hierarchical POMDP algorithm • Proposed research

  11. POMDPs within the family of Markov models • No uncertainty in sensor input, no control problem: Markov Chain • Uncertainty in sensor input, no control problem: Hidden Markov Model (HMM) • No sensor uncertainty, control problem: Markov Decision Process (MDP) • Sensor uncertainty and control problem: Partially Observable MDP (POMDP)

  12. What are POMDPs? Components: Set of states: s ∈ S Set of actions: a ∈ A Set of observations: o ∈ O POMDP parameters: Initial belief: b0(s) = Pr(s0 = s) Observation probabilities: O(s,a,o) = Pr(o|s,a) Transition probabilities: T(s,a,s’) = Pr(s’|s,a) Rewards: R(s,a) [Figure: a three-state example: from S1, action a1 reaches S2 or S3 with probability 0.5 each, action a2 reaches S3 with probability 1; each state emits o1/o2 with its own probabilities, e.g. Pr(o1)=0.9, Pr(o2)=0.1 in S2. The observation model is the HMM part; transitions and rewards are the MDP part.]

  13. A POMDP example: The tiger problem States: S1 “tiger-left” with Pr(o=growl-left)=0.85, Pr(o=growl-right)=0.15; S2 “tiger-right” with Pr(o=growl-left)=0.15, Pr(o=growl-right)=0.85 Actions = {listen, open-left, open-right} Reward Function: R(a=listen) = -1 R(a=open-right, s=tiger-left) = 10 R(a=open-left, s=tiger-left) = -100
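For reference, the tiger model above can be written out directly; a minimal sketch in plain Python (the dictionary encoding is my own, and the rewards for the tiger-right cases are filled in by the problem's symmetry):

```python
# The tiger POMDP from slide 13, written out as plain Python data.
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBSERVATIONS = ["growl-left", "growl-right"]

# Observation model for the 'listen' action: Pr(o | s).
OBS = {
    "tiger-left":  {"growl-left": 0.85, "growl-right": 0.15},
    "tiger-right": {"growl-left": 0.15, "growl-right": 0.85},
}

# Immediate rewards R(s, a): listening costs -1, opening the safe
# door pays 10, opening the tiger's door costs -100 (tiger-right
# entries are by symmetry, not stated on the slide).
REWARD = {
    ("tiger-left",  "listen"):     -1,
    ("tiger-right", "listen"):     -1,
    ("tiger-left",  "open-left"):  -100,
    ("tiger-left",  "open-right"):  10,
    ("tiger-right", "open-left"):   10,
    ("tiger-right", "open-right"): -100,
}
```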

  14. What can we do with POMDPs? 1) State tracking: • After an action, what is the state of the world, st? Not so hard. 2) Computing a policy: • Which action, aj, should the controller apply next? Very hard! [Figure: the world evolves s(t-1) → s(t) under action a(t-1); the robot’s control layer receives observation o(t) and updates its belief b(t-1) → b(t).]

  15. The tiger problem: State tracking [Figure: the initial belief b0 is uniform over S1 “tiger-left” and S2 “tiger-right”.]

  16. The tiger problem: State tracking [Figure: from belief b0, the agent executes action=listen and receives obs=growl-left.]

  17. The tiger problem: State tracking [Figure: after action=listen and obs=growl-left, the updated belief b1 places more probability on S1 “tiger-left”.]
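The tracking step behind slides 15-17 is a Bayes filter. A minimal sketch in Python, assuming the standard update b’(s’) ∝ O(s’,a,o) Σs T(s,a,s’) b(s) and that listen leaves the tiger where it is (variable names are mine, not the thesis’s):

```python
# Bayes-filter belief update for the tiger problem (slides 15-17).
STATES = ["tiger-left", "tiger-right"]

# Observation model for 'listen' (slide 13): Pr(o | s).
OBS = {
    "tiger-left":  {"growl-left": 0.85, "growl-right": 0.15},
    "tiger-right": {"growl-left": 0.15, "growl-right": 0.85},
}

def T(s, a, s2):
    # 'listen' does not move the tiger: identity transition.
    return 1.0 if s == s2 else 0.0

def belief_update(b, a, o):
    """One tracking step: b'(s') ∝ O(s',a,o) * sum_s T(s,a,s') b(s)."""
    unnorm = {s2: OBS[s2][o] * sum(T(s, a, s2) * b[s] for s in STATES)
              for s2 in STATES}
    z = sum(unnorm.values())  # Pr(o | b, a), the normalizer
    return {s2: p / z for s2, p in unnorm.items()}

b0 = {"tiger-left": 0.5, "tiger-right": 0.5}   # uniform initial belief
b1 = belief_update(b0, "listen", "growl-left")
# b1 is approximately {"tiger-left": 0.85, "tiger-right": 0.15},
# matching the shift toward "tiger-left" shown on slide 17.
```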

  18. Policy Optimization • Which action, aj, should the controller apply next? • In MDPs: • Policy is a mapping from state to action, π: si → aj • In POMDPs: • Policy is a mapping from belief to action, π: b → aj • Recursively calculate the expected long-term reward for each state/belief. • Find the action that maximizes the expected reward.
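Slide 18’s last two bullets refer to equations not captured in the transcript; the standard belief-MDP backup they describe can be written as follows (a sketch in the slide’s T, O, R notation, with discount factor γ):

```latex
V_t(b) \;=\; \max_{a \in A} \Big[ \sum_{s \in S} b(s)\,R(s,a)
      \;+\; \gamma \sum_{o \in O} \Pr(o \mid b, a)\, V_{t-1}\!\big(b^{a,o}\big) \Big],
\qquad
\pi_t(b) \;=\; \operatorname*{argmax}_{a \in A} \Big[ \sum_{s \in S} b(s)\,R(s,a)
      \;+\; \gamma \sum_{o \in O} \Pr(o \mid b, a)\, V_{t-1}\!\big(b^{a,o}\big) \Big],
```

where Pr(o | b, a) = Σ_{s’} O(s’,a,o) Σ_s T(s,a,s’) b(s) and b^{a,o} is the Bayes-updated belief after taking action a and observing o.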

  19. The tiger problem: Optimal policy [Figure: over the belief vector from S1 “tiger-left” to S2 “tiger-right”, the optimal policy is open-right near certainty of tiger-left, listen in the uncertain middle, and open-left near certainty of tiger-right.]

  20. Complexity of policy optimization • Finite-horizon POMDPs are in the worst case doubly exponential in the planning horizon. • Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard, and may be undecidable.

  21. The essence of the problem • How can we find good policies for complex POMDPs? • Is there a principled way to provide near-optimal policies in reasonable time?

  22. Outline • Problem motivation • Partially observable Markov decision processes • The hierarchical POMDP algorithm • Proposed research

  23. A hierarchical approach to POMDP planning • Key Idea: Exploit hierarchical structure in the problem domain to break a problem into many “related” POMDPs. • What type of structure? Action set partitioning. [Figure: an action hierarchy with Act at the root and subtasks such as InvestigateHealth (CheckPulse, CheckMeds), Move (Navigate: Left, Right, Forward, Backward), and AskWhere; internal nodes are subtasks/abstract actions, leaves are primitive actions.]

  24. Assumptions • Each POMDP controller has a subset of A0. • Each POMDP controller has the full state set S0 and observation set O0. • Each controller includes discriminative reward information. • We are given the action set partitioning graph. • We are given a full POMDP model of the problem: {S0, A0, O0, M0}.

  25. The tiger problem: An action hierarchy [Figure: act at the root, with children open-left and investigate; investigate has children listen and open-right.] Pinvestigate = {S0, Ainvestigate, O0, Minvestigate} Ainvestigate = {listen, open-right}

  26. Optimizing the “investigate” controller [Figure: the locally optimal policy over the belief vector from S1 “tiger-left” to S2 “tiger-right”: open-right near certainty of tiger-left, listen elsewhere.]

  27. The tiger problem: An action hierarchy Pact = {S0, Aact, O0, Mact} Aact = {open-left, investigate} But... R(s, a=investigate) is not defined! [Figure: the same hierarchy, with the abstract action investigate highlighted.]

  28. Modeling abstract actions Insight: Use the local policy of the corresponding low-level controller. General form: R(si, ak) = R(si, Policy(controllerk, si)) Example: Policy(investigate, s=tiger-left) = open-right, so R(s=tiger-left, ak=investigate) = R(tiger-left, open-right) = 10 Reward table: R(tiger-left, open-right)=10, R(tiger-left, listen)=-1, R(tiger-left, open-left)=-100; R(tiger-right, open-right)=-100, R(tiger-right, listen)=-1, R(tiger-right, open-left)=10
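Slide 28’s rule can be sketched in a few lines of Python (the dictionaries are illustrative stand-ins: REWARD is the tiger reward model, and investigate_policy is the locally optimal “investigate” controller evaluated at the corner beliefs, where the state is known with certainty):

```python
# Slide 28's rule, R(s, a_k) = R(s, Policy(controller_k, s)), sketched
# in Python with illustrative data structures.
REWARD = {
    ("tiger-left",  "listen"):     -1,
    ("tiger-right", "listen"):     -1,
    ("tiger-left",  "open-left"):  -100,
    ("tiger-left",  "open-right"):  10,
    ("tiger-right", "open-left"):   10,
    ("tiger-right", "open-right"): -100,
}

# What the 'investigate' controller does when the state is certain:
investigate_policy = {
    "tiger-left":  "open-right",  # tiger on the left: open the right door
    "tiger-right": "open-left",
}

def abstract_reward(s, local_policy):
    """Reward of an abstract action: the reward of whatever primitive
    action the low-level controller's policy chooses in state s."""
    return REWARD[(s, local_policy[s])]

r = abstract_reward("tiger-left", investigate_policy)
# r == 10, matching the slide's example:
# R(tiger-left, investigate) -> open-right -> 10
```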

  29. Optimizing the “act” controller [Figure: the locally optimal policy over the belief vector from S1 “tiger-left” to S2 “tiger-right”: investigate except near certainty of tiger-right, where it is open-left.]

  30. The complete hierarchical policy [Figure: over the belief vector from S1 “tiger-left” to S2 “tiger-right”, the hierarchical policy is open-right, then listen, then open-left.]

  31. The complete hierarchical policy [Figure: the hierarchical policy shown alongside the optimal policy; both select open-right, listen, and open-left over the belief vector.]

  32. Results for larger simulation domains

  33. Related work on hierarchical methods • Hierarchical HMMs • Fine et al., 1998. • Hierarchical MDPs • Dayan & Hinton, 1993; Dietterich, 1998; McGovern et al., 1998; Parr & Russell, 1998; Singh, 1992. • Loosely-coupled MDPs • Boutilier et al., 1997; Dean & Lin, 1995; Meuleau et al., 1998; Singh & Cohn, 1998; Wang & Mahadevan, 1999. • Factored state POMDPs • Boutilier et al., 1999; Boutilier & Poole, 1996; Hansen & Feng, 2000. • Hierarchical POMDPs • Castanon, 1997; Hernandez-Gardiol & Mahadevan, 2001; Theocharous et al., 2001; Wiering & Schmidhuber, 1997.

  34. Outline • Problem motivation • Partially observable Markov decision processes • The hierarchical POMDP algorithm • Proposed research

  35. Proposed research 1) Algorithmic design 2) Algorithmic analysis 3) Model learning 4) System development and application

  36. Research block #1: Algorithmic design • Goal 1.1: Developing/implementing hierarchical POMDP algorithm. • Goal 1.2: Extending H-POMDP for factorized state representation. • Goal 1.3: Using state/observation abstraction. • Goal 1.4: Planning for controllers with no local reward information.

  37. Goal 1.3: State/observation abstraction • Assumption #2: “Each POMDP controller has full state set S0, and observation set O0.” • Can we reduce the number of states/observations, |S| and |O|?

  38. Goal 1.3: State/observation abstraction • Assumption #2: “Each POMDP controller has full state set S0, and observation set O0.” • Can we reduce the number of states/observations, |S| and |O|? Yes! Each controller only needs a subset of the state/observation features. • What is the computational speed-up? [Figure: the action hierarchy (InvestigateHealth: CheckPulse, CheckMeds; Navigate: Left, Right, Forward, Backward), annotated with a recursive upper bound on per-controller POMDP time complexity.]

  39. Goal 1.4: Local controller reward information • Assumption #3: “Each controller includes some amount of discriminative reward information.” • Can we relax this assumption?

  40. Goal 1.4: Local controller reward information • Assumption #3: “Each controller includes some amount of discriminative reward information.” • Can we relax this assumption? Possibly. Use reward shaping to select a policy-invariant reward function. • What is the benefit? • H-POMDP could solve problems with sparse reward functions.

  41. Research block #2: Algorithmic analysis • Goal 2.1: Evaluating performance of the H-POMDP algorithm. • Goal 2.2: Quantifying the loss due to the hierarchy. • Goal 2.3: Comparing different possible decompositions of a problem.

  42. Goal 2.1: Performance evaluation • How does the hierarchical POMDP algorithm compare to: • Exact value function methods • Sondik, 1971; Monahan, 1982; Littman, 1996; Cassandra et al., 1997. • Policy search methods • Hansen, 1998; Kearns et al., 1999; Ng & Jordan, 2000; Baxter & Bartlett, 2000. • Value approximation methods • Parr & Russell, 1995; Thrun, 2000. • Belief approximation methods • Nourbakhsh, 1995; Koenig & Simmons, 1996; Hauskrecht, 2000; Roy & Thrun, 2000. • Memory-based methods • McCallum, 1996. • Consider problems from the POMDP literature and the dialogue management domain.

  43. Goal 2.2: Quantifying the loss • The hierarchical POMDP planning algorithm provides an approximately-optimal policy. • How “near-optimal” is the policy? • Subject to some (very restrictive) conditions: “The value function of the top-level controller is an upper bound on the value of the approximation”: Vtop(b) ≥ Vactual(b). • Can we loosen the restrictions? Tighten the bound? Find a lower bound?

  44. Goal 2.3: Comparing different decompositions • Assumption #4: “We are given an action set partitioning graph.” • What makes a good hierarchical action decomposition? • Comparing decompositions is the first step towards automatic decomposition. [Figure: two alternative hierarchies over the same primitive actions {Replace, Manufacture, Examine, Inspect}, grouping them under different abstract actions a1, a2, a3.]

  45. Research block #3: Model learning • Goal 3.1: Automatically generating good action hierarchies. • Assumption #4: “We are given an action set partitioning graph.” • Can we automatically generate a good hierarchical decomposition? • Maybe. It is being done for hierarchical MDPs. • Goal 3.2: Including parameter learning. • Assumption #5: “We are given a full POMDP model of the problem.” • Can we introduce parameter learning? • Yes! Maximum-likelihood parameter optimization (Baum-Welch) can be used for POMDPs.

  46. Research block #4: System development and application • Goal 4.1: Building an extensive dialogue manager [Figure: the Dialogue Manager exchanges remote-control commands and Facemail operations with a teleoperation module; reminder messages and status information with a reminding module; touchscreen input/messages and speech utterances with the user; and sensor readings and motion commands with the robot module.]

  47. An implemented scenario Problem size: |S|=288, |A|=14, |O|=15 State Features: {RobotLocation, UserLocation, UserStatus, ReminderGoal, UserMotionGoal, UserSpeechGoal} Test subjects: 3 elderly residents in an assisted living facility [Figure: facility map showing the patient room, robot home, and physiotherapy locations.]

  48. Contributions • Algorithmic contribution: A novel POMDP algorithm based on hierarchical structure. • Enables use of POMDPs for much larger problems. • Application contribution: Application of POMDPs to dialogue management is novel. • Allows design of robust robot behavioural managers.

  49. Research schedule 1) Algorithmic design/implementation: fall 01 2) Algorithmic analysis: spring/summer 02 3) Model learning: spring/summer/fall 02 4) System development and application: ongoing 5) Thesis writing: fall 02 / spring 03

  50. Questions?
