Partially Observable Markov Decision Process (Chapter 15 & 16) José Luis Peralta
Contents • POMDP • Example POMDP • Finite World POMDP algorithm • Practical Considerations • Approximate POMDP Techniques
Partially Observable Markov Decision Processes (POMDP) • POMDP: • Uncertainty in measurements of the state • Uncertainty in control effects • Adapt the previous Value Iteration Algorithm (VIA)
Partially Observable Markov Decision Processes (POMDP) • POMDP: • The world can't be sensed directly • Measurements are incomplete, noisy, etc. • Partial observability • The robot has to estimate a posterior distribution over possible world states.
Partially Observable Markov Decision Processes (POMDP) • POMDP: • An algorithm to find the optimal control policy exists for a FINITE WORLD: • state space • action space • space of observations • planning horizon (all finite) • The computation is complex • For the continuous case there are approximations
Partially Observable Markov Decision Processes (POMDP) • The algorithms we are going to study are all based on Value Iteration (VI): V_T(x) = γ max_u [ r(x,u) + ∫ V_{T-1}(x') p(x'|u,x) dx' ], with V_1(x) = γ max_u r(x,u) • The same as before, but the state x is not observable • The robot has to make decisions in the BELIEF STATE • The belief is the robot's internal knowledge about the state of the environment • It lives in the space of posterior distributions over states
Partially Observable Markov Decision Processes (POMDP) • So in belief space: V_T(b) = γ max_u [ r(b,u) + ∫ V_{T-1}(b') p(b'|u,b) db' ], with V_1(b) = γ max_u r(b,u) • Control policy: π_T(b) = argmax_u [ r(b,u) + ∫ V_{T-1}(b') p(b'|u,b) db' ]
Partially Observable Markov Decision Processes (POMDP) • Belief bel(x) • Each value in a POMDP is a function of an entire probability distribution • Problems: • a finite state space gives a continuous belief space • a continuous state space gives an infinitely-dimensional belief continuum • There is also complexity in calculating the value function, because of the integral over all possible distributions
Partially Observable Markov Decision Processes (POMDP) • In the end an optimal solution exists for an interesting special case, the finite world: • state space, action space, space of observations, and planning horizon all finite • The value function solutions are piecewise linear functions over the belief space • This arises because: • expectation is a linear operation • the robot can select different controls in different parts of the belief space
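As a concrete picture of this piecewise-linear structure (standard finite-POMDP notation, not something recovered from the slides), each finite-horizon value function can be written as a maximum over a finite set of linear functions of the belief, often called α-vectors:

V_T(b) = max_k Σ_i α_k,i · b(x_i)

where each coefficient vector (α_k,1, …, α_k,N) describes one linear facet; convexity follows because a maximum of linear functions is convex.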
Example POMDP • 2 states (x1, x2 in the notation of [1]) • 3 control actions (u1, u2, u3)
Example POMDP • When the robot executes a terminal action it receives a payoff r(x, u) • Dilemma: each terminal action has opposite payoffs in the two states, so knowledge of the state translates directly into payoff
Example POMDP • To acquire knowledge the robot has a third control (with a cost: the cost of waiting, the cost of sensing, etc.), and this control affects the state of the world in a non-deterministic manner:
Example POMDP • Benefit: before each control decision the robot can sense • By sensing, the robot gains knowledge about the state • It can make better control decisions • It obtains a higher payoff in expectation • In the case of the third control action, the robot senses without executing a terminal action
Example POMDP • The measurement model is governed by the following probability distribution:
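The actual numbers were shown graphically on the slide and did not survive extraction; a generic two-state measurement model of this kind has the form (q is an illustrative parameter, not a value recovered from the slide):

p(z1 | x1) = q, p(z2 | x1) = 1 − q
p(z1 | x2) = 1 − q, p(z2 | x2) = q

so that z1 is evidence for x1 and z2 is evidence for x2 whenever q > 0.5.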
Example POMDP • This example is easy to graph over the belief space (2 states) • The belief state is summarized by a single number p1 = b(x1), with b(x2) = 1 − p1
Example POMDP • Control policy • A function that maps the unit interval [0, 1] (the belief parameter p1) to the space of all actions
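Because the belief is fully described by p1, a policy is literally a function on [0, 1]. A minimal Python sketch with purely illustrative thresholds (the real switching points come out of the value iteration developed below):

def policy(p1: float) -> str:
    """Map the belief parameter p1 = b(x1) to a control action.

    The thresholds 0.3 and 0.7 are placeholders for illustration;
    the actual switching points are determined by value iteration.
    """
    if p1 < 0.3:
        return "u1"   # terminal action favored when x2 is likely (illustrative)
    elif p1 > 0.7:
        return "u2"   # terminal action favored when x1 is likely (illustrative)
    else:
        return "u3"   # too uncertain: sense/wait instead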
Example POMDP – Control Choice • Control choice (when to execute what control?) • First consider the immediate payoff at horizon T = 1 • The payoff is now a function of the belief state: r(b, u) = E_x[r(x, u)] = Σ_x b(x) r(x, u) • So for each action u, the expected payoff is a linear function of the belief; this is the payoff in POMDPs
Example POMDP – Control Choice • First we calculate r(b, u) for every action • At horizon 1 the robot simply selects the action with the highest expected payoff: V_1(b) = max_u r(b, u) • This is a piecewise linear, convex function: the maximum of the individual payoff functions
Example POMDP – Control Choice • From V_1(b) we read off the optimal policy: the robot selects the action whose payoff line is highest at the current belief • The transition in the optimal policy occurs at the belief where the payoff lines intersect
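A minimal numeric sketch of this horizon-1 step; the payoff values below are illustrative assumptions chosen so that the two terminal actions have opposite payoffs in the two states (they are not recovered from the slides):

import numpy as np

# Illustrative payoffs r(x, u); rows are states (x1, x2), columns are actions (u1, u2, u3).
r = np.array([[-100.0,  100.0, -1.0],
              [ 100.0,  -50.0, -1.0]])

p1 = np.linspace(0.0, 1.0, 101)          # belief parameter b(x1)
belief = np.stack([p1, 1.0 - p1])        # shape (2, 101)

# Expected payoff r(b, u) = sum_x b(x) r(x, u): one line per action.
r_b = r.T @ belief                       # shape (3, 101)

V1 = r_b.max(axis=0)                     # horizon-1 value function: max of the payoff lines
best_action = r_b.argmax(axis=0)         # optimal action index at each belief
# The points where best_action changes are where the payoff lines intersect.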
Example POMDP - Sensing • Now we add perception: what if the robot can sense before it chooses a control? • How does this affect the optimal value function? • Sensing provides information about the state and enables the robot to choose a better control action • In the previous example we computed the expected payoff V_1(b); how much better will it be after sensing?
Example POMDP – Control Choice • The belief after sensing, as a function of the belief before sensing, is given by Bayes rule
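In the two-state parametrization, the Bayes update after observing z1 has a simple closed form (writing p1 for the belief before sensing and p1' for the belief after):

p1' = p(z1 | x1) p1 / ( p(z1 | x1) p1 + p(z1 | x2) (1 − p1) )

and similarly for z2; the denominator is the probability p(z1) of the measurement under the current belief.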
Example POMDP – Control Choice • How does this affect the value function?
Example POMDP – Control Choice • Mathematically, this is just replacing the belief b by the Bayes-updated belief in the value function V_1
Example POMDP – Control Choice • However, our interest is the complete expected value function after sensing, which also accounts for the probability of observing each possible measurement. This is given by:
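Written out, the expected value after sensing averages the updated value over the possible measurements:

V̄_1(b) = Σ_z p(z | b) · V_1( B(b, z) )

where B(b, z) is the Bayes-updated belief. Because V_1 is a maximum of linear functions and B(b, z) carries p(z | b) in its denominator, those factors cancel and V̄_1 is again piecewise linear and convex in b.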
Example POMDP – Control Choice • And this results in:
Example POMDP – Control Choice Mathematically
Example POMDP - Prediction • To plan at a horizon larger than 1 we have to take the state transition into account and project our value function accordingly, using the transition probability model of the sensing/waiting control • If the robot is certain it is in x1, the projected belief is the transition distribution from x1; if it is certain it is in x2, the projected belief is the transition distribution from x2 • In between, the expectation is linear in the belief
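Concretely, the prediction step maps the belief through the transition model. With generic transition probabilities (the slide's actual numbers did not survive extraction), the projected belief parameter is:

p1' = p(x1 | x1, u3) · p1 + p(x1 | x2, u3) · (1 − p1)

which is affine in p1, so projecting a piecewise-linear value function through this map keeps it piecewise linear.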
Example POMDP – Prediction • And this results in:
Example POMDP – Prediction • And adding back the payoff lines of the terminal actions u1 and u2, we have:
Example POMDP – Prediction • Mathematically, the projected value function also includes the cost of executing u3
Example POMDP – Pruning • A full backup quickly produces a large number of linear constraints, most of which are dominated everywhere and can be pruned • Without pruning, the full backup is impractical: efficient approximate POMDP techniques are needed
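A minimal sketch of the simplest pruning idea for this one-dimensional belief space: keep a linear function only if it attains the maximum somewhere on [0, 1]. The grid test below is an illustrative stand-in; exact pruning is usually formulated with linear programs.

import numpy as np

def prune(lines, grid_size=1001):
    """Keep only the linear functions a*p1 + b that attain the max somewhere on [0, 1].

    `lines` is an array of shape (K, 2): each row holds (slope, intercept).
    This grid-based test is an illustrative stand-in for exact LP-based pruning.
    """
    lines = np.asarray(lines, dtype=float)
    p1 = np.linspace(0.0, 1.0, grid_size)
    values = lines[:, 0][:, None] * p1[None, :] + lines[:, 1][:, None]  # (K, grid)
    winners = np.unique(values.argmax(axis=0))  # indices that are maximal somewhere
    return lines[winners]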
Finite World POMDP algorithm • To understand this, read the mathematical derivation of POMDPs, pp. 531-536 in [1]
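For orientation only, here is a compact Python sketch of one exact value-iteration backup over α-vectors for a finite POMDP, in the spirit of that derivation; the variable names and exact structure are my own, and no pruning is included:

import numpy as np
from itertools import product

def pomdp_backup(alphas, R, T, O, gamma=1.0):
    """One exact backup of a finite POMDP value function.

    alphas : list of alpha-vectors, each of shape (n_states,), representing V_{t-1}
    R      : reward array, shape (n_states, n_actions)
    T      : transition probabilities, T[u, x, x'] = p(x' | x, u)
    O      : measurement probabilities, O[z, x'] = p(z | x')
    Returns the (unpruned) list of alpha-vectors representing V_t.
    """
    n_actions = R.shape[1]
    n_obs = O.shape[0]
    new_alphas = []
    for u in range(n_actions):
        # g[z][k][x] = sum_x' alpha_k[x'] * p(z | x') * p(x' | x, u)
        g = [[T[u] @ (O[z] * a) for a in alphas] for z in range(n_obs)]
        # One new alpha-vector per choice of an old alpha for every observation.
        for choice in product(range(len(alphas)), repeat=n_obs):
            alpha_new = R[:, u] + gamma * sum(g[z][choice[z]] for z in range(n_obs))
            new_alphas.append(alpha_new)
    return new_alphas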
Example POMDP – Practical Considerations • It looks easy, so let's try something more “real”: the probabilistic robot “RoboProb”
Example POMDP – Practical Considerations • Probabilistic robot “RoboProb” • 11 states • 5 control actions, one of which is sensing without moving • Transition model: the intended motion succeeds with probability 0.8, and the robot drifts to either side with probability 0.1 each
Example POMDP – Practical Considerations • Probabilistic robot “RoboProb” • “Reward” payoff: the same payoff set is used for all control actions (an example is shown on the slide)
Example POMDP – Practical Considerations • It’s getting kind of hard :S… • Probabilistic robot “RoboProb” • Transition probability example: the intended move has probability 0.8, each lateral drift has probability 0.1
Example POMDP – Practical Considerations It’s getting kind of hard :S… Probabilistic Robot “RoboProb” Transition Probability Example
Example POMDP – Practical Considerations It’s getting kind of hard :S… Probabilistic Robot “RoboProb” Measurement Probability
Example POMDP – Practical Considerations It’s getting kind of hard :S… Probabilistic Robot “RoboProb” Belief States Impossible to graph!!
Example POMDP – Practical Considerations • It’s getting kind of hard :S… • Probabilistic robot “RoboProb” • Each linear function results from executing a control, followed by observing a measurement, and then executing another control.
Example POMDP – Practical Considerations • It’s getting kind of hard :S… • Probabilistic robot “RoboProb” • Defining the measurement probability • Defining the “reward” payoff • Defining the transition probability • Merging the transition (control) probabilities
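The slide shows these definitions as screenshots that did not survive extraction; a hypothetical Python sketch of what the corresponding data structures could look like (the state and action counts follow the slides, while every numeric value and the observation count are assumptions):

import numpy as np

n_states, n_actions, n_obs = 11, 5, 11   # 11 cells, 4 moves + "sense without moving"; n_obs assumed

# Measurement probability O[z, x] = p(z | x): a noisy identity used purely as a placeholder.
O = np.full((n_obs, n_states), 0.02)
np.fill_diagonal(O, 1.0 - 0.02 * (n_states - 1))

# "Reward" payoff R[x, u]: the same payoff profile reused for every control action.
state_payoff = np.zeros(n_states)        # placeholder values
R = np.tile(state_payoff[:, None], (1, n_actions))

# Transition probability T[u, x, x'] = p(x' | x, u): intended move 0.8, drift 0.1 + 0.1.
T = np.zeros((n_actions, n_states, n_states))
# ... fill in per the map geometry; the sensing action leaves the state unchanged:
T[-1] = np.eye(n_states)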
Example POMDP – Practical Considerations • It’s getting kind of hard :S… • Probabilistic robot “RoboProb” • Setting the beliefs • Executing a control • Sensing • Executing the next control
Example POMDP – Practical Considerations • Now what…? • Probabilistic robot “RoboProb” • Calculating the value function over beliefs: the real problem is to compute the belief transition distribution p(b' | u, b)
Example POMDP – Practical Considerations • The real problem is to compute p(b' | u, b); the key factor in this update is this conditional probability • It specifies a distribution over probability distributions: given a belief b and a control action u, the outcome b' is itself a distribution over states • Because the next belief also depends on the next measurement, and the measurement is generated stochastically, the next belief is a random quantity
Example POMDP – Practical Considerations The real problem is to compute So we make Contain only on non-zero term =