
What Are Partially Observable Markov Decision Processes

What Are Partially Observable Markov Decision Processes (POMDPs), and Why Might You Care? Bob Wall, CS 536.



  1. What Are Partially Observable Markov Decision Processes and Why Might You Care? Bob Wall CS 536

  2. POMDPs • A generalization of the Markov Decision Process (MDP). In an MDP, the environment is fully observable, and with the Markov assumption for the transition model, the optimal policy depends only on the current state. • For POMDPs, the environment is only partially observable.

  3. POMDP Implications • Since the current state is not necessarily known, the agent cannot simply execute the optimal policy for that state. • A POMDP is defined by the following: • Set of states S, set of actions A, set of observations O • Transition model T(s, a, s’) – probability of reaching state s’ by doing action a in state s • Reward model R(s) • Observation model O(s, o) – probability of observing observation o in state s.
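The definition above can be written down as a small data structure. The following is a minimal sketch, assuming finite sets and dictionary-based tables; the class name `POMDP`, the method `is_well_formed`, and the two-state `toy` model with all of its numbers are hypothetical illustrations, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class POMDP:
    states: List[str]
    actions: List[str]
    observations: List[str]
    T: Dict[Tuple[str, str, str], float]  # T[(s, a, s2)] = P(s2 | s, a)
    R: Dict[str, float]                   # R[s] = reward for being in state s
    O: Dict[Tuple[str, str], float]       # O[(s, o)] = P(o | s)

    def is_well_formed(self, tol: float = 1e-9) -> bool:
        """Check that every row of T and O is a probability distribution."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.T.get((s, a, s2), 0.0) for s2 in self.states)
                if abs(total - 1.0) > tol:
                    return False
            total = sum(self.O.get((s, o), 0.0) for o in self.observations)
            if abs(total - 1.0) > tol:
                return False
        return True


# Hypothetical two-state model: all numbers are made up for illustration.
toy = POMDP(
    states=["good", "bad"],
    actions=["stay", "switch"],
    observations=["ok", "alarm"],
    T={
        ("good", "stay", "good"): 0.9, ("good", "stay", "bad"): 0.1,
        ("bad", "stay", "good"): 0.1, ("bad", "stay", "bad"): 0.9,
        ("good", "switch", "good"): 0.1, ("good", "switch", "bad"): 0.9,
        ("bad", "switch", "good"): 0.9, ("bad", "switch", "bad"): 0.1,
    },
    R={"good": 1.0, "bad": -1.0},
    O={
        ("good", "ok"): 0.8, ("good", "alarm"): 0.2,
        ("bad", "ok"): 0.3, ("bad", "alarm"): 0.7,
    },
)
print(toy.is_well_formed())
```

The agent never reads `states` directly at run time; it only sees draws from `O`, which is what separates this from a plain MDP.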

  4. POMDP Implications (cont.) • Optimal action depends not on the current state but on the agent’s current belief state. • A belief state b is a probability distribution over all possible states. • Given a belief state b, if the agent does action a and perceives observation o, the new belief state is • b’(s’) = α O(s’, o) Σ_s T(s, a, s’) b(s), where the sum runs over all states s and α is a normalizing constant that makes b’ sum to 1 • Optimal policy π*(b) maps from belief states to actions
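The belief update can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `update_belief`, the dictionary layout of `T` and `O`, and the two-state model with its numbers are all hypothetical. The sum is taken over the previous state s, and dividing by the total plays the role of the normalizing constant α.

```python
def update_belief(b, a, o, states, T, O):
    """Return b'(s') proportional to O(s', o) * sum_s T(s, a, s') * b(s)."""
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of landing in s2 after action a.
        predicted = sum(T[(s, a, s2)] * b[s] for s in states)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s2] = O[(s2, o)] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Dividing by the total is the normalizing constant alpha.
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical two-state model: numbers are made up for illustration.
states = ["good", "bad"]
T = {
    ("good", "stay", "good"): 0.9, ("good", "stay", "bad"): 0.1,
    ("bad", "stay", "good"): 0.1, ("bad", "stay", "bad"): 0.9,
}
O = {
    ("good", "ok"): 0.8, ("good", "alarm"): 0.2,
    ("bad", "ok"): 0.3, ("bad", "alarm"): 0.7,
}

b = {"good": 0.5, "bad": 0.5}
b_next = update_belief(b, "stay", "alarm", states, T, O)
print(b_next)  # belief shifts toward "bad" after an alarm
```

Starting from a uniform belief, an "alarm" observation moves the belief to roughly 2/9 for "good" and 7/9 for "bad", because "alarm" is more likely in the "bad" state.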

  5. POMDP Solutions • Solving a POMDP on a physical state space is equivalent to solving an MDP on the belief state space. • However, the belief state space is continuous, and its dimension grows with the number of physical states, so solutions are difficult to compute. • Even finding approximately optimal solutions is PSPACE-hard (i.e. really hard)
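To make the "MDP on the belief state space" view concrete, the sketch below computes two ingredients of that MDP for a given belief: the expected reward of the belief and the probability of each observation after an action. It assumes the observation depends on the state reached after the action; the function names, table layout, and the two-state numbers are hypothetical.

```python
def expected_reward(b, R):
    """Reward of a belief state: the average of R(s) weighted by b(s)."""
    return sum(b[s] * R[s] for s in b)


def observation_probabilities(b, a, states, observations, T, O):
    """P(o | b, a) = sum_s' O(s', o) * sum_s T(s, a, s') * b(s)."""
    probs = {}
    for o in observations:
        probs[o] = sum(
            O[(s2, o)] * sum(T[(s, a, s2)] * b[s] for s in states)
            for s2 in states
        )
    return probs


# Hypothetical two-state model: numbers are made up for illustration.
states = ["good", "bad"]
observations = ["ok", "alarm"]
T = {
    ("good", "stay", "good"): 0.9, ("good", "stay", "bad"): 0.1,
    ("bad", "stay", "good"): 0.1, ("bad", "stay", "bad"): 0.9,
}
O = {
    ("good", "ok"): 0.8, ("good", "alarm"): 0.2,
    ("bad", "ok"): 0.3, ("bad", "alarm"): 0.7,
}
R = {"good": 1.0, "bad": -1.0}

b = {"good": 0.5, "bad": 0.5}
rho = expected_reward(b, R)
p_obs = observation_probabilities(b, "stay", states, observations, T, O)
print(rho, p_obs)
```

Each observation leads to exactly one successor belief, so these probabilities act as the transition model of the belief-space MDP. The difficulty is that the belief `b` ranges over a continuum rather than a finite set.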

  6. Why Study POMDPs? • In spite of the difficulties, POMDPs are still very important. • Many real-world problems and situations are not fully observable, but the Markov assumption is often valid. • Active area of research • Google search on “POMDP” returns ~5000 results • A number of current papers on the topic

  7. Some Solution Techniques • Most exact solution algorithms (value iteration, policy iteration) use dynamic programming techniques • Each dynamic programming step transforms one value function over belief states, which is piecewise linear and convex (PWLC) for a finite horizon, into the next, much as value iteration does for an MDP • Dynamic programming algorithms: one-pass (1971), exhaustive (1982), linear support (1988), witness (1996) • Better method – incremental pruning (1996)
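A PWLC value function over beliefs can be stored as a finite set of vectors (often called alpha vectors), with the value of a belief being the largest dot product. The sketch below shows that representation plus a simple pointwise-domination filter. Note that this filter is a much weaker stand-in for the linear-programming-based pruning that exact algorithms such as incremental pruning rely on; the function names and the example vectors are hypothetical.

```python
def value(belief, alpha_vectors):
    """PWLC value: the maximum over alpha vectors of the dot product with b."""
    return max(
        sum(a_i * b_i for a_i, b_i in zip(alpha, belief))
        for alpha in alpha_vectors
    )


def prune_pointwise_dominated(alpha_vectors):
    """Drop vectors that another vector matches or beats in every component.

    Only pointwise domination is checked; a vector that is never the
    maximizer but is not dominated by any single vector will survive.
    """
    kept = []
    for i, v in enumerate(alpha_vectors):
        dominated = False
        for j, w in enumerate(alpha_vectors):
            if i == j:
                continue
            beats_everywhere = all(wk >= vk for wk, vk in zip(w, v))
            # For exact duplicates, keep only the earliest copy.
            if beats_everywhere and (w != v or j < i):
                dominated = True
                break
        if not dominated:
            kept.append(v)
    return kept


# Hypothetical alpha vectors for a two-state problem.
alphas = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.2, -0.1)]
pruned = prune_pointwise_dominated(alphas)
print(pruned)
print(value((0.5, 0.5), pruned))
```

In this example `(0.2, -0.1)` is removed because `(1.0, 0.0)` is at least as good in both components. The vector `(0.4, 0.4)` survives even though it is never the maximizer for any belief, which is the kind of case the stronger pruning methods are designed to catch.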

  8. POMDPs at Work • Pattern recognition tasks • SA-POMDP (Single-action POMDP) – the only decision is whether to change state or not • Model constructed to recognize words within text to which noise had been added at the level of individual letters • SA-POMDP outperformed a pattern recognizer based on Hidden Markov Models, and exhibited better immunity to noise

  9. POMDPs at Work (cont.) • Robotics • Mission planning • Robot Navigation • POMDP used to control the movement of an autonomous robot within a crowded environment • Used to predict the motion of other objects within the robot’s environment • Decompose state space into hierarchy, so individual POMDPs have a computationally tractable task

  10. POMDPs at Work (cont.) • BATmobile – the Bayesian Autonomous Taxi • Its many different tasks make use of a number of AI techniques • POMDPs used for the actual driving control (as opposed to higher level trip planning) • Uses approximation techniques to keep the computation efficient

  11. BAT (cont.) • Several different techniques combined: • Dynamic Probabilistic Network (DPN) to maintain current belief state • Dynamic Decision Network (DDN) to perform bounded lookahead • Hand-coded explicit policy representations – i.e. decision trees • Supervised / reinforcement learning techniques to learn policy decisions

  12. BAT (cont.) • The BAT has been constructed in a simulation environment and has been demonstrated to successfully handle a variety of driving problems, such as passing slower vehicles, reacting to unsafe drivers, avoiding stalled vehicles, and merging into traffic.

  13. Resources • Tutorial on POMDPs: • http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html • Additional pointers to articles on my web site: • http://www.cs.montana.edu/~bwall/cs536
