
Classical Situation






Presentation Transcript


  1. Classical Situation • World deterministic • State observable [figure: world containing a heaven state and a hell state]

  2. MDP-Style Planning • Policy • Universal Plan • Navigation function [Koditschek 87, Barto et al. 89] • World stochastic • State observable [figure: world containing a heaven state and a hell state]

  3. Stochastic, Partially Observable [figure: the locations of heaven and hell are unknown (heaven? hell?); a sign is present] [Sondik 72] [Littman/Cassandra/Kaelbling 97]

  4. Stochastic, Partially Observable [figure: the two possible worlds (heaven and hell with their positions swapped), each containing a sign]

  5. Stochastic, Partially Observable [figure: the robot starts with a 50%/50% belief over the two possible worlds (heaven and hell swapped, marked ?); a sign identifies which world it is in]

  6. Robot Planning Frameworks

  7. MDP-Style Planning • Policy • Universal Plan • Navigation function [Koditschek 87, Barto et al. 89] • World stochastic • State observable [figure: world containing a heaven state and a hell state]

  8. Markov Decision Process (discrete) [figure: five-state MDP (s1..s5) with stochastic transition probabilities and per-state rewards r=1, r=0, r=20, r=0, r=-10] [Bellman 57] [Howard 60] [Sutton/Barto 98]
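For concreteness, below is a minimal sketch of the ingredients named on this slide: a finite state set, actions, a stochastic transition model, per-state rewards, and a discount factor. The numbers are illustrative placeholders (the exact probabilities in the slide's figure are not recoverable from this transcript), and a 3-state model is used just to keep the arrays small.

```python
import numpy as np

n_states, n_actions = 3, 2
gamma = 0.95                          # discount factor (assumed value)

# T[a, s, s'] = p(s' | s, a); every row sums to 1.
T = np.array([
    [[0.9, 0.1, 0.0],                 # action 0
     [0.0, 0.7, 0.3],
     [0.3, 0.3, 0.4]],
    [[0.1, 0.8, 0.1],                 # action 1
     [0.2, 0.0, 0.8],
     [0.0, 0.1, 0.9]],
])

# R[s] = immediate reward for reaching state s (placeholder values).
R = np.array([1.0, 0.0, 20.0])

assert np.allclose(T.sum(axis=2), 1.0)
```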

  9. Value Iteration • Value function of policy π • Bellman equation for the optimal value function • Value iteration: recursively estimating the value function • Greedy policy [Bellman 57] [Howard 60] [Sutton/Barto 98]
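The equations on this slide are images and did not survive in the transcript. In the standard notation of [Bellman 57] and [Sutton/Barto 98] they read as follows (a reconstruction, not the slide's exact typography):

```latex
% Value of a fixed policy \pi (expected discounted reward)
V^{\pi}(s) = E\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0}=s,\ \pi\right]

% Bellman equation for the optimal value function
V^{*}(s) = \max_{a}\left[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^{*}(s') \right]

% Value iteration: apply the right-hand side repeatedly until convergence
\hat{V}(s) \leftarrow \max_{a}\left[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, \hat{V}(s') \right]

% Greedy policy with respect to \hat{V}
\pi(s) = \arg\max_{a}\left[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, \hat{V}(s') \right]
```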

  10. Value Iteration for Motion Planning (assumes knowledge of the robot’s location)
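A sketch of value iteration used as a motion planner, in the spirit of this slide: on a small occupancy grid with a known robot location, unit step costs and a goal cell, value iteration converges to the shortest-path cost-to-go, i.e. a navigation function. The grid layout, goal position, and costs are made-up assumptions.

```python
import numpy as np

# Occupancy grid: 0 = free, 1 = obstacle. Layout and costs are made up.
grid = np.array([
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
])
goal = (0, 4)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

# V = negative cost-to-go. On a deterministic grid, value iteration
# converges to the shortest-path "navigation function".
V = np.full(grid.shape, -np.inf)
V[goal] = 0.0
for _ in range(grid.size):                     # enough sweeps to converge
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            if grid[r, c] == 1 or (r, c) == goal:
                continue
            best = -np.inf
            for dr, dc in moves:
                nr, nc = r + dr, c + dc
                if (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
                        and grid[nr, nc] == 0):
                    best = max(best, -1.0 + V[nr, nc])   # unit step cost
            V[r, c] = best

# Greedy policy: from any free cell, step to the neighbor with the largest V.
print(V)
```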

  11. Continuous Environments From: A. Moore & C. G. Atkeson, “The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces,” Machine Learning, 1995

  12. Approximate Cell Decomposition [Latombe 91] From: A. Moore & C. G. Atkeson, “The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces,” Machine Learning, 1995

  13. Parti-Game [Moore 96] From: A. Moore & C. G. Atkeson, “The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces,” Machine Learning, 1995

  14. Robot Planning Frameworks

  15. Stochastic, Partially Observable [figure: the same 50%/50% two-world problem (heaven and hell swapped, marked ?), shown for both possible start configurations; each world contains a sign]

  16. A Quiz: # states / sensors / actions → size of the belief space?
  • 3 states, perfect sensors, deterministic actions → 3: s1, s2, s3
  • 3 states, perfect sensors, stochastic actions → 3: s1, s2, s3
  • 3 states, abstract-state sensors, deterministic actions → 2^3-1: s1, s2, s3, s12, s13, s23, s123
  • 3 states, stochastic sensors, deterministic actions → 2-dim continuous*: p(S=s1), p(S=s2)
  • 3 states, no sensors, stochastic actions → 2-dim continuous*: p(S=s1), p(S=s2)
  • 1-dim continuous states, stochastic sensors, deterministic actions → ∞-dim continuous*
  • 1-dim continuous states, stochastic sensors, stochastic actions → ∞-dim continuous*
  • ∞-dim continuous states, stochastic sensors, stochastic actions → aargh!
  *) countable, but for all practical purposes ∞

  17. Introduction to POMDPs (1 of 3) [figure: payoffs of actions a and b in states s1 and s2 (values such as 100, -100, -40, 80, 0 in the original), and the resulting expected payoff of each action plotted over the belief p(s1)] [Sondik 72] [Littman/Cassandra/Kaelbling 97]
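The point of this slide is that the expected payoff of each action is linear in the belief, so the optimal one-step value is the piecewise-linear upper envelope of those lines. A minimal sketch for two states and two actions; the payoff values reuse numbers visible in the figure residue, but their exact assignment to actions and states is an assumption.

```python
# Expected payoff of each action as a function of the belief p(s1).
# Payoff values are illustrative, not necessarily those in the figure.
payoff = {"a": {"s1": 100.0, "s2": -100.0},
          "b": {"s1": -40.0, "s2": 80.0}}

def expected_payoff(action, p_s1):
    r = payoff[action]
    return p_s1 * r["s1"] + (1.0 - p_s1) * r["s2"]

def value(p_s1):
    # Optimal one-step value: the upper envelope of the linear functions.
    return max(expected_payoff(a, p_s1) for a in payoff)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, value(p))
```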

  18. Introduction to POMDPs (2 of 3) [figure: a sensing action c with an 80%/20% observation model maps the belief p(s1) to a posterior belief p(s1′)] [Sondik 72] [Littman/Cassandra/Kaelbling 97]
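The sensing action updates the belief with Bayes' rule. Below is a minimal sketch for a two-state world with a sensor that reports the true state with probability 0.8, following the 80%/20% labels in the figure (the exact likelihood assignment is an assumption).

```python
def bayes_update(p_s1, obs, p_correct=0.8):
    """Posterior p(s1 | obs) for a binary sensor that reports the true
    state with probability p_correct (here 80%, wrong 20% of the time)."""
    if obs == "s1":
        like_s1, like_s2 = p_correct, 1.0 - p_correct
    else:
        like_s1, like_s2 = 1.0 - p_correct, p_correct
    num = like_s1 * p_s1
    den = num + like_s2 * (1.0 - p_s1)
    return num / den

print(bayes_update(0.5, "s1"))   # 0.8: a 50/50 prior becomes 80/20
```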

  19. Introduction to POMDPs (3 of 3) [figure: state-transition actions A (70%/30%) and B (50%/50%) map the belief p(s1) to predicted beliefs p(s1′|A) and p(s1′|B)] [Sondik 72] [Littman/Cassandra/Kaelbling 97]
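This slide shows the prediction step: applying an action with stochastic outcomes maps the current belief to a predicted belief. A sketch using the 70%/30% and 50%/50% figures suggested by the labels; the assignment of those numbers to actions A and B is an assumption.

```python
# trans_to_s1[action][s] = p(s1' | s, action): probability of landing in s1.
trans_to_s1 = {"A": {"s1": 0.7, "s2": 0.3},
               "B": {"s1": 0.5, "s2": 0.5}}

def predict(p_s1, action):
    t = trans_to_s1[action]
    return p_s1 * t["s1"] + (1.0 - p_s1) * t["s2"]

print(predict(0.9, "A"))   # 0.66
print(predict(0.9, "B"))   # 0.5: action B erases the information
```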

  20. Value Iteration in POMDPs: substitute the belief b for the state s • Value function of policy π • Bellman equation for the optimal value function • Value iteration: recursively estimating the value function • Greedy policy
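Substituting the belief b for the state s turns the MDP equations above into their belief-space counterparts. The slide's own equations are not in the transcript; a reconstruction in standard POMDP notation, with τ(b, a, o) denoting the Bayes-filter belief update:

```latex
% Bellman equation over beliefs; b' = \tau(b, a, o) is the updated belief
V^{*}(b) = \max_{a}\left[ r(b,a) + \gamma \sum_{o} p(o \mid a, b)\, V^{*}\!\big(\tau(b,a,o)\big) \right]

% Greedy policy over beliefs
\pi(b) = \arg\max_{a}\left[ r(b,a) + \gamma \sum_{o} p(o \mid a, b)\, V^{*}\!\big(\tau(b,a,o)\big) \right]
```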

  21. Missing Terms in Belief Space • Expected reward r(b, a) • Next-state density p(b′ | a, b): computed with Bayes filters! (a Dirac distribution: given the action and the observation, the next belief is a deterministic function of the current one)
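The two missing quantities can be written out as follows (a reconstruction consistent with the slide's bullets; sums become integrals for continuous state spaces):

```latex
% Expected reward under the belief b
r(b, a) = \sum_{s} r(s, a)\, b(s)

% Next-belief density: the observation o is random, but given (b, a, o) the
% next belief \tau(b, a, o) is computed deterministically by a Bayes filter,
% hence the Dirac distribution \delta
p(b' \mid a, b) = \sum_{o} p(o \mid a, b)\; \delta\!\big(b' - \tau(b, a, o)\big)
```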

  22. Value Iteration in Belief Space [figure: backup diagram: from belief state b, an action leads (through the hidden state s, next state s′, reward r′, and observation o) to the next belief state b′; the value function backs up Q(b, a) from max Q(b′, a)]
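A rough sketch of the backup in this diagram for a two-state POMDP, discretizing the belief p(s1) onto a grid and using nearest-neighbor lookup for the next belief. The model reuses the hypothetical numbers sketched above; exact POMDP solvers instead exploit the piecewise-linear structure of the value function [Sondik 72].

```python
import numpy as np

# Hypothetical two-state POMDP: payoffs for actions a/b, a 70%/30% transition
# model (assumed identical for both actions), and an 80%-correct sensor.
R = {"a": np.array([100.0, -100.0]), "b": np.array([-40.0, 80.0])}
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])          # T[s, s'] = p(s' | s)
Z = np.array([[0.8, 0.2],
              [0.2, 0.8]])          # Z[s', o] = p(o | s')
gamma = 0.95

grid = np.linspace(0.0, 1.0, 101)   # discretized belief p(s1)
V = np.zeros_like(grid)

def lookup(p_s1):
    """Value at the nearest grid point (crude interpolation)."""
    return V[int(round(p_s1 * (len(grid) - 1)))]

for _ in range(100):                # value-iteration sweeps
    V_new = np.empty_like(V)
    for i, p in enumerate(grid):
        b = np.array([p, 1.0 - p])
        q_best = -np.inf
        for a in ("a", "b"):
            q = b @ R[a]            # expected immediate reward r(b, a)
            b_pred = b @ T          # prediction step
            for o in (0, 1):        # expectation over observations
                p_o = b_pred @ Z[:, o]
                if p_o < 1e-12:
                    continue
                b_post = b_pred * Z[:, o] / p_o   # Bayes measurement update
                q += gamma * p_o * lookup(b_post[0])
            q_best = max(q_best, q)
        V_new[i] = q_best
    V = V_new

print(V[::25])                      # value at beliefs 0, 0.25, 0.5, 0.75, 1
```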

  23. Why Is This So Complex? State-space planning (no state uncertainty) vs. belief-space planning (full state uncertainty)

  24. Augmented MDPs: the conventional state space augmented with a measure of uncertainty (the belief's entropy) [Roy et al. 98/99]
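The augmented state pairs the conventional state estimate with a scalar uncertainty measure, the entropy of the belief. A minimal sketch; the choice of the most likely state as the conventional component and the discrete belief are illustrative assumptions.

```python
import numpy as np

def entropy(belief):
    """Shannon entropy of a discrete belief (in bits)."""
    p = np.asarray(belief, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def augmented_state(belief, states):
    """Augmented-MDP state: (most likely conventional state, belief entropy)."""
    ml = states[int(np.argmax(belief))]
    return ml, entropy(belief)

print(augmented_state([0.7, 0.2, 0.1], ["s1", "s2", "s3"]))  # ('s1', ~1.16 bits)
```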

  25. Path Planning with Augmented MDPs [figure: paths produced by a conventional planner and by the probabilistic planner, annotated with information gain] [Roy et al. 98/99]

  26. Robot Planning Frameworks
