Classical Situation (heaven vs. hell example)
• World deterministic
• State observable
MDP-Style Planning (heaven vs. hell example)
• Policy / universal plan / navigation function [Koditschek 87, Barto et al. 89]
• World stochastic
• State observable
Stochastic, Partially Observable
[figure: heaven? hell? — which door leads where is unknown; a sign carries the answer]
[Sondik 72] [Littman/Cassandra/Kaelbling 97]
Stochastic, Partially Observable
[figure: the two possible worlds — heaven left / hell right, or hell left / heaven right — each with a sign indicating which case holds]
Stochastic, Partially Observable
[figure: from the start location both worlds are equally likely (50% / 50%); reading the sign resolves which door is heaven and which is hell]
MDP-Style Planning (revisited)
• Policy / universal plan / navigation function [Koditschek 87, Barto et al. 89]
• World stochastic
• State observable
Markov Decision Process (discrete)
[figure: five-state transition graph s1–s5 with transition probabilities on the edges and rewards r = 1, 0, 20, 0, and -10 attached to the states]
[Bellman 57] [Howard 60] [Sutton/Barto 98]
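Before turning to value iteration, it helps to pin the slide's diagram down as data. A minimal Python sketch of a discrete MDP as lookup tables; the state names and reward values follow the figure, but the particular transition probabilities, the action set, and the discount factor are illustrative assumptions, not the figure's exact numbers:

```python
# A discrete MDP as plain lookup tables: P[s][a] is a list of
# (probability, next_state) pairs and R[s] is the immediate reward.
# NOTE: probabilities, actions, and GAMMA below are illustrative only.

GAMMA = 0.9  # discount factor (assumed; not stated on the slide)

STATES = ["s1", "s2", "s3", "s4", "s5"]
ACTIONS = ["a1", "a2"]

# P[s][a] -> list of (prob, next_state); each row sums to 1
P = {
    "s1": {"a1": [(0.7, "s2"), (0.3, "s3")], "a2": [(1.0, "s4")]},
    "s2": {"a1": [(0.9, "s1"), (0.1, "s2")], "a2": [(1.0, "s3")]},
    "s3": {"a1": [(0.3, "s4"), (0.7, "s5")], "a2": [(1.0, "s1")]},
    "s4": {"a1": [(0.8, "s5"), (0.2, "s4")], "a2": [(1.0, "s2")]},
    "s5": {"a1": [(1.0, "s5")],              "a2": [(1.0, "s5")]},
}

# R[s] -> reward for being in s (the figure shows r = 20, 1, 0, 0, -10)
R = {"s1": 20.0, "s2": 1.0, "s3": 0.0, "s4": 0.0, "s5": -10.0}
```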
Value Iteration
• Value function of policy π: V^π(s) = E[ Σ_t γ^t r_t | s_0 = s, π ]
• Bellman equation for the optimal value function: V*(s) = r(s) + γ max_a Σ_{s'} p(s'|s,a) V*(s')
• Value iteration: recursively estimate the value function, V(s) ← r(s) + γ max_a Σ_{s'} p(s'|s,a) V(s')
• Greedy policy: π(s) = argmax_a Σ_{s'} p(s'|s,a) V(s')
[Bellman 57] [Howard 60] [Sutton/Barto 98]
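A direct implementation of the bullets above, reusing the hypothetical P, R, and GAMMA tables from the previous sketch: repeated Bellman backups until the value estimates stop changing, then a greedy policy read off the converged values.

```python
def value_iteration(P, R, gamma, eps=1e-6):
    """Recursively estimate the optimal value function by repeated Bellman backups."""
    V = {s: 0.0 for s in P}                      # initial estimate
    while True:
        delta = 0.0
        for s in P:
            # Bellman backup: best expected next-state value over actions
            best = max(sum(p * V[s2] for p, s2 in P[s][a]) for a in P[s])
            new_v = R[s] + gamma * best
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < eps:                          # converged
            return V

def greedy_policy(P, V):
    """Greedy policy: pick the action maximizing expected next-state value."""
    return {s: max(P[s], key=lambda a: sum(p * V[s2] for p, s2 in P[s][a]))
            for s in P}

V = value_iteration(P, R, GAMMA)
pi = greedy_policy(P, V)
```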
Value Iteration for Motion Planning (assumes knowledge of the robot's location)
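Run on an occupancy grid with a known robot position, the same backup produces a navigation function: each free cell stores its cost-to-go to the goal, and steepest descent of that function steers the robot there. A small sketch with deterministic, 4-connected motion (the grid, the step cost, and the function name are made up for illustration):

```python
# Navigation function on a small occupancy grid via value iteration.
# 1 = obstacle, 0 = free; motion is deterministic and 4-connected.
GRID = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
GOAL = (0, 4)

def navigation_function(grid, goal, step_cost=1.0, sweeps=200):
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    V = [[INF] * cols for _ in range(rows)]
    V[goal[0]][goal[1]] = 0.0
    for _ in range(sweeps):                      # plenty for a grid this small
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue
                best = INF
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                        best = min(best, step_cost + V[nr][nc])
                V[r][c] = best
    return V  # greedy descent of V from any reachable free cell leads to the goal

nav = navigation_function(GRID, GOAL)
```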
Continuous Environments
From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces," Machine Learning, 1995
Approximate Cell Decomposition [Latombe 91]
From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces," Machine Learning, 1995
Parti-Game [Moore 96]
From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces," Machine Learning, 1995
Stochastic, Partially Observable
[figure: the two possible worlds side by side; from the start each is 50% likely, and the sign in each world reveals which configuration holds]
A Quiz

# states | sensors | actions | size of belief space?
3 | perfect | deterministic | 3: s1, s2, s3
3 | perfect | stochastic | 3: s1, s2, s3
3 | abstract states | deterministic | 2^3 - 1 = 7: s1, s2, s3, s12, s13, s23, s123
3 | stochastic | deterministic | 2-dim continuous*: p(S=s1), p(S=s2)
3 | none | stochastic | 2-dim continuous*: p(S=s1), p(S=s2)
1-dim continuous | stochastic | deterministic | ∞-dim continuous*
1-dim continuous | stochastic | stochastic | ∞-dim continuous*
∞-dim continuous | stochastic | stochastic | aargh!

*) countable, but for all practical purposes continuous
Introduction to POMDPs (1 of 3)
[figure: two-state world (s1, s2); each action's payoff in the two states (e.g. +100 / -100 for action a, and a different pair for action b) makes its expected reward a straight line in the belief p(s1), so the best action depends only on p(s1)]
[Sondik 72] [Littman/Cassandra/Kaelbling 97]
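The figure's point is easy to reproduce: with two states the belief is a single number p(s1), each action's expected reward is a straight line in it, and the best action depends only on where the belief sits. A sketch with illustrative payoff numbers (loosely in the spirit of those on the slide, not taken from it):

```python
# Expected immediate reward as a linear function of the belief p(s1).
# The reward numbers below are illustrative assumptions.
R_A = {"s1": +100.0, "s2": -100.0}   # action a
R_B = {"s1": -100.0, "s2": +100.0}   # action b

def expected_reward(r, p_s1):
    """E[r | belief] = p(s1) * r(s1) + (1 - p(s1)) * r(s2): linear in p(s1)."""
    return p_s1 * r["s1"] + (1.0 - p_s1) * r["s2"]

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    best = max(("a", expected_reward(R_A, p)),
               ("b", expected_reward(R_B, p)),
               key=lambda t: t[1])
    print(f"p(s1)={p:.2f}: best action {best[0]}, value {best[1]:+.1f}")
```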
Introduction to POMDPs (2 of 3)
[figure: a third action c stochastically changes the state (80% / 20%), mapping the belief p(s1) over (s1, s2) to a new belief p(s1') over (s1', s2') and thereby transforming the value function]
[Sondik 72] [Littman/Cassandra/Kaelbling 97]
Introduction to POMDPs (3 of 3)
[figure: after acting, an observation A or B arrives (e.g. with 70% / 30% probability in one state and 50% / 50% in the other); Bayes' rule splits the belief into posteriors p(s1'|A) and p(s1'|B), weighted by how likely each observation is]
[Sondik 72] [Littman/Cassandra/Kaelbling 97]
Value Iteration in POMDPs — substitute the belief b for the state s
• Value function of policy π: V^π(b) = E[ Σ_t γ^t r_t | b_0 = b, π ]
• Bellman equation for the optimal value function: V*(b) = max_a [ r(b,a) + γ ∫ p(b'|b,a) V*(b') db' ]
• Value iteration: recursively estimate the value function, V(b) ← max_a [ r(b,a) + γ ∫ p(b'|b,a) V(b') db' ]
• Greedy policy: π(b) = argmax_a [ r(b,a) + γ ∫ p(b'|b,a) V(b') db' ]
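Because the optimal value function of a finite POMDP is piecewise linear and convex in the belief, it can be represented as a finite set of α-vectors and backed up exactly, which is the idea behind [Sondik 72] and [Littman/Cassandra/Kaelbling 97]. Below is a minimal, unpruned backup sketch; the table layouts T[a][s][s'], O[a][s'][z], R[a][s] are assumptions of this sketch, not anyone's published code:

```python
from itertools import product

def pomdp_backup(S, A, Z, T, O, R, gamma, alphas):
    """One exact value-iteration backup over alpha-vectors.

    alphas: list of dicts s -> value; V_n(b) = max_alpha sum_s b(s) alpha(s).
    Returns the (unpruned) alpha-vector set representing V_{n+1};
    its size is |A| * |alphas|^|Z|, so real solvers prune aggressively.
    Assumed layouts: T[a][s][s'], O[a][s'][z], R[a][s].
    """
    new_alphas = []
    for a in A:
        # Project each old vector through the dynamics and one observation z
        proj = {z: [{s: sum(T[a][s][s2] * O[a][s2][z] * alpha[s2] for s2 in S)
                     for s in S}
                    for alpha in alphas]
                for z in Z}
        # Cross-sum: choose one projected vector per observation
        for choice in product(*(proj[z] for z in Z)):
            vec = {s: R[a][s] + gamma * sum(v[s] for v in choice) for s in S}
            new_alphas.append(vec)
    return new_alphas

def value(alphas, belief):
    """V(b) = max over alpha-vectors of the dot product alpha . b."""
    return max(sum(belief[s] * alpha[s] for s in alpha) for alpha in alphas)
```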
Missing Terms: Belief Space
• Expected reward: r(b,a) = ∫ r(s,a) b(s) ds
• Next state density: p(b'|b,a) = ∫ p(b'|b,a,o) p(o|b,a) do, where p(b'|b,a,o) is a Dirac distribution centered on the Bayes-filter update of b — Bayes filters!
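Both missing terms reduce to the discrete Bayes filter: the expected reward averages r(s,a) under the belief, and the next-belief density is a weighted sum of Dirac point masses, one per possible observation, each sitting at the filtered belief. A sketch using the same hypothetical table layout as above:

```python
def expected_reward_b(belief, R, a):
    """r(b, a) = sum_s b(s) * r(s, a)."""
    return sum(belief[s] * R[a][s] for s in belief)

def bayes_filter(belief, a, o, S, T, O):
    """Belief update: b'(s') ∝ O(a, s', o) * sum_s T(s, a, s') * b(s).

    Given (b, a, o) the next belief is a single point (Dirac) in belief space;
    stochasticity enters only through which observation o actually arrives.
    Returns (new_belief, p(o | b, a)). Assumed layouts: T[a][s][s'], O[a][s'][o].
    """
    unnorm = {s2: O[a][s2][o] * sum(T[a][s][s2] * belief[s] for s in S)
              for s2 in S}
    eta = sum(unnorm.values())          # = p(o | b, a), the observation likelihood
    if eta == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / eta for s2, v in unnorm.items()}, eta
```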
Value Iteration in Belief Space
[figure: backup diagram — in state space, state s and an action lead to the next state s' and reward r'; in belief space, belief b and an action lead, via the observation o, to the next belief b'; Q(b, a) backs up from max_a' Q(b', a'), yielding a value function over beliefs]
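Putting the pieces together, one backup in belief space evaluates Q(b,a) by averaging the filtered next beliefs over observations, weighted by their likelihoods p(o|b,a). A sketch reusing the helpers above; V can be any value-function approximation over beliefs (for instance, the α-vector value() sketched earlier):

```python
def q_value(belief, a, S, Z, T, O, R, gamma, V):
    """Q(b, a) = r(b, a) + gamma * sum_o p(o | b, a) * V(b'_o)."""
    q = expected_reward_b(belief, R, a)
    for o in Z:
        try:
            b_next, p_o = bayes_filter(belief, a, o, S, T, O)
        except ValueError:
            continue                     # observation impossible under this belief
        q += gamma * p_o * V(b_next)
    return q

def greedy_belief_action(belief, A, S, Z, T, O, R, gamma, V):
    """Greedy action at a belief, given a value-function approximation V(b)."""
    return max(A, key=lambda a: q_value(belief, a, S, Z, T, O, R, gamma, V))
```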
Why Is This So Complex?
• State-space planning (no state uncertainty)
• Belief-space planning (full state uncertainty)
Augmented MDPs: augment the conventional state space with a measure of uncertainty (the entropy of the belief) [Roy et al, 98/99]
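The trick of [Roy et al, 98/99] is to avoid planning in the full belief space: the belief is compressed to its most likely (or mean) state plus a single uncertainty statistic, its entropy. A sketch of that compression (function names are mine, not from the paper):

```python
import math

def belief_entropy(belief):
    """Entropy of a discrete belief; summarizes positional uncertainty in one number."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)

def augmented_state(most_likely_pose, belief):
    """Augmented-MDP state: a low-dimensional pose estimate plus the belief entropy."""
    return (most_likely_pose, belief_entropy(belief))
```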
Path Planning with Augmented MDPs
[figure: conventional planner vs. probabilistic planner — the probabilistic planner trades path length for information gain, detouring through regions where the robot can stay well localized]
[Roy et al, 98/99]