Hierarchical Methods for Planning under Uncertainty Thesis Proposal Joelle Pineau Thesis Committee: Sebastian Thrun, Chair Matthew Mason Andrew Moore Craig Boutilier, U. of Toronto
Integrating robots in living environments The robot’s role: - Social interaction - Mobile manipulation - Intelligent reminding - Remote-operation - Data collection / monitoring Thesis Proposal: Hierarchical Methods for Planning under Uncertainty
A broad perspective
[Diagram: the robot maintains a belief state over the combined state of the user, the world, and the robot; observations flow in, actions flow out.]
GOAL = Selecting appropriate actions
Why is this a difficult problem? UNCERTAINTY Cause #1: Non-deterministic effects of actions Cause #2: Partial and noisy sensor information Cause #3: Inaccurate model of the world and the user
Why is this a difficult problem? A solution: Partially Observable Markov Decision Processes (POMDPs)
[Diagram: three states S1, S2, S3 connected by actions a1, a2, each state emitting observations o1, o2.]
The truth about POMDPs • Bad news: Finding an optimal POMDP action-selection policy is computationally intractable for complex problems. • Good news: Many real-world decision-making problems exhibit structure inherent to the problem domain. By leveraging this structure, I propose an algorithm that makes POMDPs tractable, even for large domains.
How is it done?
• Use a “divide-and-conquer” approach: we decompose a large monolithic problem into a collection of loosely-related smaller problems.
[Diagram: separate managers for dialogue, health, reminding, and social interaction.]
Thesis statement Decision-making under uncertainty can be made tractable for complex problems by exploiting hierarchical structure in the problem domain.
Outline • Problem motivation • Partially observable Markov decision processes • The hierarchical POMDP algorithm • Proposed research
POMDPs within the family of Markov models

                                 No control problem           Control problem
Fully observed state             Markov Chain                 Markov Decision Process (MDP)
Uncertainty in sensor input      Hidden Markov Model (HMM)    Partially Observable MDP (POMDP)
What are POMDPs?
Components: set of states s ∈ S; set of actions a ∈ A; set of observations o ∈ O.
POMDP parameters: initial belief b0(s) = Pr(s0 = s); observation probabilities O(s,a,o) = Pr(o|s,a); transition probabilities T(s,a,s’) = Pr(s’|s,a); rewards R(s,a).
[Diagram: a three-state example (S1, S2, S3) with actions a1, a2, transition probabilities, and per-state observation distributions over o1, o2; the observation model comes from HMMs, the action/reward model from MDPs.]
A POMDP example: The tiger problem
States: S1 “tiger-left” with Pr(o=growl-left)=0.85, Pr(o=growl-right)=0.15; S2 “tiger-right” with Pr(o=growl-left)=0.15, Pr(o=growl-right)=0.85
Actions = {listen, open-left, open-right}
Reward function: R(a=listen) = -1; R(a=open-right, s=tiger-left) = 10; R(a=open-left, s=tiger-left) = -100
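The tiger model above can be written down directly as data. A minimal sketch in Python; note that the slide gives only three reward entries, so the symmetric values for s=tiger-right (+10 for open-left, -100 for open-right) are filled in from the standard formulation of the problem and should be treated as an assumption here.

```python
# The tiger POMDP from the slide as plain data.
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBSERVATIONS = ["growl-left", "growl-right"]

# O(s, a=listen, o): listening is informative but noisy.
P_OBS = {"tiger-left":  {"growl-left": 0.85, "growl-right": 0.15},
         "tiger-right": {"growl-left": 0.15, "growl-right": 0.85}}

# T(s, a, s'): listening leaves the state unchanged; opening a door
# ends the round (modeled here as a uniform reset -- an assumption,
# since the slide does not specify the post-door dynamics).
def transition(s, a):
    if a == "listen":
        return {s: 1.0}
    return {s2: 0.5 for s2 in STATES}

# R(s, a): the three values from the slide plus symmetric counterparts
# for s=tiger-right (assumed).
R = {"listen":     {"tiger-left": -1,   "tiger-right": -1},
     "open-left":  {"tiger-left": -100, "tiger-right": 10},
     "open-right": {"tiger-left": 10,   "tiger-right": -100}}
```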
What can we do with POMDPs?
1) State tracking: after an action, what is the state of the world, st? Not so hard.
2) Computing a policy: which action, aj, should the controller apply next? Very hard!
[Diagram: the world evolves from st-1 to st under action at-1; the robot sees only observation ot and updates its belief from bt-1.]
The tiger problem: State tracking
Starting from a uniform belief b0 over S1 “tiger-left” and S2 “tiger-right”, the controller executes action=listen, receives obs=growl-left, and updates its belief to b1, which now favors tiger-left.
[Bar charts of the belief vector before and after the update.]
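The state tracking step above is an application of Bayes' rule. A minimal sketch using the observation probabilities from the tiger slide (`belief_update` is an illustrative helper name, not from the slides):

```python
# Bayes-filter belief update for the tiger problem after action=listen.
# Listening does not change the state, so only the observation
# probabilities matter.
P_OBS = {"tiger-left":  {"growl-left": 0.85, "growl-right": 0.15},
         "tiger-right": {"growl-left": 0.15, "growl-right": 0.85}}

def belief_update(b, obs):
    """b maps state -> probability; returns the posterior after obs."""
    unnorm = {s: p * P_OBS[s][obs] for s, p in b.items()}
    z = sum(unnorm.values())          # Pr(obs | b), the normalizer
    return {s: p / z for s, p in unnorm.items()}

b0 = {"tiger-left": 0.5, "tiger-right": 0.5}
b1 = belief_update(b0, "growl-left")  # belief shifts toward tiger-left
```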
Policy Optimization
• Which action, aj, should the controller apply next?
• In MDPs: policy is a mapping from state to action, π: si → aj
• In POMDPs: policy is a mapping from belief to action, π: b → aj
• Recursively calculate the expected long-term reward for each state/belief:
  V(b) = max_a [ R(b,a) + γ Σ_o Pr(o|b,a) V(b’a,o) ]
• Find the action that maximizes the expected reward:
  π(b) = argmax_a [ R(b,a) + γ Σ_o Pr(o|b,a) V(b’a,o) ]
  (where b’a,o is the belief obtained from b after taking action a and observing o)
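The belief update and the recursive value computation can be exercised on the tiger problem with a small exhaustive lookahead. This is only a sketch, not one of the exact-solution algorithms compared later: it assumes the episode ends once a door is opened, and the symmetric reward entries for s=tiger-right are filled in from the standard formulation.

```python
# Exact finite-horizon lookahead for the tiger POMDP.
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBS = ["growl-left", "growl-right"]

# Pr(o | s, a=listen) from the slides.
O_LISTEN = {"tiger-left":  {"growl-left": 0.85, "growl-right": 0.15},
            "tiger-right": {"growl-left": 0.15, "growl-right": 0.85}}

# Rewards: three values from the slides; the s=tiger-right entries
# are the standard symmetric values (an assumption here).
R = {"listen":     {"tiger-left": -1,   "tiger-right": -1},
     "open-left":  {"tiger-left": -100, "tiger-right": 10},
     "open-right": {"tiger-left": 10,   "tiger-right": -100}}

def belief_update(b, o):
    """Bayes rule after action=listen and observation o."""
    unnorm = [b[i] * O_LISTEN[s][o] for i, s in enumerate(STATES)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

def value(b, horizon):
    """V(b): expected long-term reward of the best action from b."""
    if horizon == 0:
        return 0.0
    return max(q_value(b, a, horizon) for a in ACTIONS)

def q_value(b, a, horizon):
    r = sum(b[i] * R[a][s] for i, s in enumerate(STATES))
    if a != "listen":
        return r  # opening a door ends the episode in this sketch
    # Expected future value over observations after listening.
    future = 0.0
    for o in OBS:
        p_o = sum(b[i] * O_LISTEN[s][o] for i, s in enumerate(STATES))
        future += p_o * value(belief_update(b, o), horizon - 1)
    return r + future

def policy(b, horizon):
    """pi(b) = argmax_a Q(b, a)."""
    return max(ACTIONS, key=lambda a: q_value(b, a, horizon))
```

With a uniform belief it is worth paying the -1 listening cost rather than gambling on a door; once the belief is sharp enough, opening the far door is best.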
The tiger problem: Optimal policy
[Figure: the optimal policy partitions the belief vector from S1 “tiger-left” to S2 “tiger-right” into three regions: open-right when the belief strongly favors tiger-left, listen in the middle, open-left when it strongly favors tiger-right.]
Complexity of policy optimization
• Finite-horizon POMDPs are in the worst case doubly exponential: the number of candidate t-step policy trees is |A|^((|O|^t − 1)/(|O| − 1)).
• Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard, and may not even be decidable.
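The doubly exponential growth can be made concrete by counting policy trees: a t-step tree over |O| observations has (|O|^t − 1)/(|O| − 1) nodes, and each node is labeled with one of |A| actions.

```python
# Counting distinct t-step policy trees for a POMDP with
# n_actions actions and n_obs observations.
def num_policy_trees(n_actions, n_obs, t):
    nodes = (n_obs ** t - 1) // (n_obs - 1)  # nodes in a depth-t tree
    return n_actions ** nodes                # one action label per node

# Tiger problem: |A| = 3, |O| = 2.
counts = [num_policy_trees(3, 2, t) for t in (1, 2, 3, 4)]
# Even at horizon 4 there are already over 14 million candidate trees.
```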
The essence of the problem • How can we find good policies for complex POMDPs? • Is there a principled way to provide near-optimal policies in reasonable time?
Outline • Problem motivation • Partially observable Markov decision processes • The hierarchical POMDP algorithm • Proposed research
A hierarchical approach to POMDP planning
• Key idea: Exploit hierarchical structure in the problem domain to break a problem into many “related” POMDPs.
• What type of structure? Action set partitioning: each subtask pairs an abstract action with the actions beneath it.
[Diagram: action hierarchy with abstract actions (Act, InvestigateHealth, Move, Navigate) and primitive actions (AskWhere, CheckPulse, CheckMeds, Left, Right, Forward, Backward).]
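The action-set partitioning can be represented as a plain tree. The edges below are one plausible reading of the diagram (the exact structure is not fully recoverable from the slide, so treat it as hypothetical), and `controllers` is an illustrative helper that recovers each subtask's local action set:

```python
# Hypothetical action hierarchy: internal nodes are abstract
# actions / subtasks, leaves are primitive actions.
HIERARCHY = {
    "Act": {
        "InvestigateHealth": {"AskWhere": {}, "CheckPulse": {}, "CheckMeds": {}},
        "Move": {"Navigate": {"Left": {}, "Right": {}, "Forward": {}, "Backward": {}}},
    }
}

def controllers(tree):
    """Map each abstract action to the action set of its local POMDP
    controller, i.e. its immediate children (abstract or primitive)."""
    out = {}
    for name, children in tree.items():
        if children:  # leaves (primitive actions) get no controller
            out[name] = sorted(children)
            out.update(controllers(children))
    return out
```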
Assumptions
• Each POMDP controller has a subset of A0.
• Each POMDP controller has the full state set S0 and observation set O0.
• Each controller includes discriminative reward information.
• We are given the action set partitioning graph.
• We are given a full POMDP model of the problem: {S0, A0, O0, M0}.
The tiger problem: An action hierarchy
[Tree: act → {open-left, investigate}; investigate → {listen, open-right}]
Pinvestigate = {S0, Ainvestigate, O0, Minvestigate}
Ainvestigate = {listen, open-right}
Optimizing the “investigate” controller
[Figure: the locally optimal policy over the belief vector from S1 “tiger-left” to S2 “tiger-right”: open-right when the belief strongly favors tiger-left, listen otherwise.]
The tiger problem: An action hierarchy
Pact = {S0, Aact, O0, Mact}
Aact = {open-left, investigate}
But... R(s, a=investigate) is not defined!
Modeling abstract actions
Insight: Use the local policy of the corresponding low-level controller.
General form: R(si, ak) = R(si, Policy(controllerk, si))
Example: R(s=tiger-left, ak=investigate) = R(s=tiger-left, Policy(investigate, s=tiger-left)) = R(s=tiger-left, open-right) = 10

              open-right   listen   open-left
tiger-left        10         -1       -100
tiger-right     -100         -1         10
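The construction R(si, ak) = R(si, Policy(controllerk, si)) can be sketched directly from the reward table. At a corner belief (state fully known), the locally optimal choice among the controller's actions is simply the action with the highest immediate reward; the s=tiger-right reward entries are the standard symmetric values and an assumption here.

```python
# Abstract-action reward via the low-level controller's local policy.
R = {"listen":     {"tiger-left": -1,   "tiger-right": -1},
     "open-left":  {"tiger-left": -100, "tiger-right": 10},
     "open-right": {"tiger-left": 10,   "tiger-right": -100}}

def local_policy(controller_actions, s):
    """Locally optimal action at a corner belief (state s known):
    the controller's action with the highest immediate reward."""
    return max(controller_actions, key=lambda a: R[a][s])

def abstract_reward(s, controller_actions):
    """R(s, a_k) = R(s, Policy(controller_k, s))."""
    return R[local_policy(controller_actions, s)][s]

# The 'act' controller then plans over {open-left, investigate},
# with R(s, investigate) defined by the 'investigate' controller:
A_investigate = ["listen", "open-right"]
```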
Optimizing the “act” controller
[Figure: the locally optimal policy over the belief vector from S1 “tiger-left” to S2 “tiger-right”: investigate unless the belief strongly favors tiger-right, then open-left.]
The complete hierarchical policy
[Figure: composing the two controllers yields the hierarchical policy open-right / listen / open-left over the belief vector from S1 “tiger-left” to S2 “tiger-right”, shown alongside the optimal policy, which it matches on this problem.]
Results for larger simulation domains
Related work on hierarchical methods • Hierarchical HMMs • Fine et al., 1998 • Hierarchical MDPs • Dayan&Hinton, 1993; Dietterich, 1998; McGovern et al., 1998; Parr&Russell, 1998; Singh, 1992. • Loosely-coupled MDPs • Boutilier et al., 1997; Dean&Lin, 1995; Meuleau et al., 1998; Singh&Cohn, 1998; Wang&Mahadevan, 1999. • Factored state POMDPs • Boutilier et al., 1999; Boutilier&Poole, 1996; Hansen&Feng, 2000. • Hierarchical POMDPs • Castanon, 1997; Hernandez-Gardiol&Mahadevan, 2001; Theocharous et al., 2001; Wiering&Schmidhuber, 1997.
Outline • Problem motivation • Partially observable Markov decision processes • The hierarchical POMDP algorithm • Proposed research
Proposed research 1) Algorithmic design 2) Algorithmic analysis 3) Model learning 4) System development and application
Research block #1: Algorithmic design • Goal 1.1: Developing/implementing hierarchical POMDP algorithm. • Goal 1.2: Extending H-POMDP for factorized state representation. • Goal 1.3: Using state/observation abstraction. • Goal 1.4: Planning for controllers with no local reward information.
Goal 1.3: State/observation abstraction
• Assumption #2: “Each POMDP controller has full state set S0, and observation set O0.”
• Can we reduce the number of states/observations, |S| and |O|? Yes! Each controller only needs a subset of the state/observation features.
• What is the computational speed-up?
[Diagram: per-controller feature subsets in the InvestigateHealth/Navigate hierarchy, with a recursive upper bound on the POMDP time complexity.]
Goal 1.4: Local controller reward information
• Assumption #3: “Each controller includes some amount of discriminative reward information.”
• Can we relax this assumption? Possibly. Use reward shaping to select a policy-invariant reward function.
• What is the benefit? H-POMDP could solve problems with sparse reward functions.
Research block #2: Algorithmic analysis • Goal 2.1: Evaluating performance of the H-POMDP algorithm. • Goal 2.2: Quantifying the loss due to the hierarchy. • Goal 2.3: Comparing different possible decompositions of a problem.
Goal 2.1: Performance evaluation • How does the hierarchical POMDP algorithm compare to: • Exact value function methods • Sondik, 1971; Monahan, 1982; Littman, 1996; Cassandra et al., 1997. • Policy search methods • Hansen, 1998; Kearns et al., 1999; Ng&Jordan, 2000; Baxter&Bartlett, 2000. • Value approximation methods • Parr&Russell, 1995; Thrun, 2000. • Belief approximation methods • Nourbakhsh, 1995; Koenig&Simmons, 1996; Hauskrecht, 2000; Roy&Thrun, 2000. • Memory-based methods • McCallum, 1996. • Consider problems from the POMDP literature and the dialogue management domain.
Goal 2.2: Quantifying the loss
• The hierarchical POMDP planning algorithm provides an approximately-optimal policy.
• How “near-optimal” is the policy?
• Subject to some (very restrictive) conditions: “The value function of the top-level controller is an upper bound on the value of the approximation”: Vtop(b) ≥ Vactual(b).
• Can we loosen the restrictions? Tighten the bound? Find a lower bound?
Goal 2.3: Comparing different decompositions
• Assumption #4: “We are given an action set partitioning graph.”
• What makes a good hierarchical action decomposition?
• Comparing decompositions is the first step towards automatic decomposition.
[Diagram: two alternative action hierarchies over the same primitive actions (Replace, Manufacture, Examine, Inspect), grouped differently under abstract actions a1, a2, a3.]
Research block #3: Model learning • Goal 3.1: Automatically generating good action hierarchies. • Assumption #4: “We are given an action set partitioning graph.” • Can we automatically generate a good hierarchical decomposition? • Maybe. It is being done for hierarchical MDPs. • Goal 3.2: Including parameter learning. • Assumption #5: “We are given a full POMDP model of the problem.” • Can we introduce parameter learning? • Yes! Maximum-likelihood parameter optimization (Baum-Welch) can be used for POMDPs.
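The Baum-Welch procedure mentioned above can be sketched for the simplest case, a 2-state HMM (the action-free special case of a POMDP model); extending it to POMDPs means conditioning the transition and observation estimates on actions. All parameter values used in testing are illustrative, and EM guarantees the likelihood never decreases across an iteration.

```python
# One Baum-Welch (EM) iteration for a 2-state HMM with integer
# observation symbols.  pi is the initial distribution, T the 2x2
# transition matrix, O the per-state observation distributions.

def forward(pi, T, O, obs):
    """Forward algorithm; returns unnormalized alpha vectors."""
    alpha = [[pi[s] * O[s][obs[0]] for s in range(2)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[u] * T[u][s] for u in range(2)) * O[s][o]
                      for s in range(2)])
    return alpha

def backward(T, O, obs):
    """Backward algorithm; returns beta vectors."""
    beta = [[1.0, 1.0]]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(T[s][u] * O[u][o] * nxt[u] for u in range(2))
                        for s in range(2)])
    return beta

def likelihood(pi, T, O, obs):
    return sum(forward(pi, T, O, obs)[-1])

def baum_welch_step(pi, T, O, obs, n_obs=2):
    """Single EM re-estimation of (pi, T, O)."""
    alpha, beta = forward(pi, T, O, obs), backward(T, O, obs)
    L = sum(alpha[-1])
    # gamma[t][s]: Pr(state s at time t | obs); xi[s][u]: expected
    # number of s->u transitions over the whole sequence.
    gamma = [[alpha[t][s] * beta[t][s] / L for s in range(2)]
             for t in range(len(obs))]
    xi = [[sum(alpha[t][s] * T[s][u] * O[u][obs[t + 1]] * beta[t + 1][u]
               for t in range(len(obs) - 1)) / L
           for u in range(2)] for s in range(2)]
    new_pi = gamma[0][:]
    new_T = [[xi[s][u] / sum(xi[s]) for u in range(2)] for s in range(2)]
    new_O = [[sum(g[s] for t, g in enumerate(gamma) if obs[t] == o) /
              sum(g[s] for g in gamma) for o in range(n_obs)]
             for s in range(2)]
    return new_pi, new_T, new_O
```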
Research block #4: System development and application
• Goal 4.1: Building an extensive dialogue manager
[Architecture diagram: the Dialogue Manager mediates between the user (touchscreen input and messages, speech utterances) and three modules: a teleoperation module (remote-control commands, Facemail operations), a reminding module (reminder messages, status information), and a robot module (sensor readings, motion commands).]
An implemented scenario
Problem size: |S|=288, |A|=14, |O|=15
State features: {RobotLocation, UserLocation, UserStatus, ReminderGoal, UserMotionGoal, UserSpeechGoal}
[Map: robot home, patient room, physiotherapy]
Test subjects: 3 elderly residents in an assisted living facility
Contributions • Algorithmic contribution: A novel POMDP algorithm based on hierarchical structure. • Enables use of POMDPs for much larger problems. • Application contribution: Application of POMDPs to dialogue management is novel. • Allows design of robust robot behavioural managers.
Research schedule
1) Algorithmic design/implementation: fall 01
2) Algorithmic analysis: spring/summer 02
3) Model learning: spring/summer/fall 02
4) System development and application: ongoing
5) Thesis writing: fall 02 / spring 03
Questions?